Accessibility Service and Accessibility Services | Suggestion and Question

Hello,

Over the past weeks I have been reading links and documents about the Linux and GNOME Shell architecture from a point of view I would venture to label as accessibility, comparing them with similar preliminary learnings about the Android stack, and I'd like to suggest an aspect of accessibility that has been integrated, albeit with caveats, into the Android platform's view of accessibility: the capability to bolt on a service that drives the desktop on the user's behalf, with the user's consent. In Android, this is more or less fully implemented as the Accessibility Service API.

My motivation is as follows (though I can envision it being more generic): I am developing an integrated voice and air-gesture interaction layer for using computing devices without, or in ways that augment, the quarter-century-old (?!) keyboard and mouse. We now have voice transcription, computer vision and machine learning that can orchestrate user interaction in ways that augment legacy input devices. I'd like to integrate this layer into Linux, and I am looking at Wayland and GNOME as the integration counterpart.

I am new to this community and still learning its ethics and culture, but I have at least registered for GUADEC 2021, and I hope that my initial rumination on this topic in this thread has not been too much of a distraction.

In a way this may be seen as stretching the boundaries of traditional accessibility as implemented so far; an alternative view would be that this is simply a new input modality. Either way, all of our traditional user interfaces, other than the BIOS beep and outside the realm of Alexa/Siri/voice applications, are currently GUI based (I would say even the terminal can be seen as a very primitive visual user interface), and I am currently thinking of integrating the graphical display paradigm with this new input modality.

In desktop operating systems, accessibility seems to have focused primarily on solutions for the vision impaired (translating visual input and orientation to other senses) and on ways of extending the out-of-the-box keyboard-and-mouse paradigm to special motor and ergonomic needs. These have been exceptional endeavors in making digital equipment accessible, from within the technology itself, to people who happen to have special needs.

Following this line of stretching the bounds of input possibilities, and as long as I see voice and air-touch interaction as an accessibility concern that connects into the visual feedback paradigm of display monitors, I would place voice and gesture with one foot in the accessibility department. I could call this an integrated accessibility service, or a desktop interaction service.

One conceivable set of core capabilities that such a desktop interaction service may need would be:

  • safe injection of input on the user's behalf: robustly avoiding injection of input into a window or field that has shifted out of focus between the time the user started dictating and the time they finished, which may require some sort of transactional API involving signals or callbacks.

  • the ability to manage windows on the user's behalf: maximize, resize, minimize, close, switch to (a rough sketch of what this could look like in a shell extension follows this list).

  • the ability to operate the desktop: anything that is not an application window per se but rather a "desktop" element, such as the (GNOME) apps bar, the activities bar, and widgets that are part of the desktop experience rather than a user application.

  • and the trivial ability to launch apps or initiate sleep/hibernation/shutdown through a customized verification dialog.
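To make the window-management bullet a bit more concrete, here is a minimal, untested sketch of what such a capability could look like as GJS inside a GNOME Shell extension, assuming the voice/gesture recognition happens elsewhere and hands the extension a command; `findWindowByTitle` and `handleWindowCommand` are hypothetical names used only for illustration.

```js
// A minimal, untested sketch of "manage windows on the user's behalf" as GJS
// inside a GNOME Shell extension. The recognizer that produces the command is
// assumed to live elsewhere.
const Meta = imports.gi.Meta;

function findWindowByTitle(fragment) {
    // Walk the windows the compositor knows about and match by title.
    return global.get_window_actors()
        .map(actor => actor.meta_window)
        .find(win => (win.get_title() || '').toLowerCase().includes(fragment.toLowerCase()));
}

function handleWindowCommand(command, titleFragment) {
    const win = findWindowByTitle(titleFragment);
    if (!win)
        return;

    const time = global.get_current_time();
    switch (command) {
    case 'maximize':
        win.maximize(Meta.MaximizeFlags.BOTH);
        break;
    case 'minimize':
        win.minimize();
        break;
    case 'close':
        win.delete(time);
        break;
    case 'switch-to':
        win.activate(time);
        break;
    }
}

// e.g. handleWindowCommand('switch-to', 'firefox');
```

Operating Shell chrome such as the Activities overview would, likewise, be calls into the Shell's own JS objects (e.g. Main.overview.show()) from the same extension.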

I would be happy to learn how you see these capabilities relating to ATK/AT-SPI, the future of GNOME Shell, and/or possible hooks in Mutter, given your vision for GNOME Shell and GTK.

Sincerely,
Matan

P.S. please do PM me if you think I deserve a beating for the long post.

It sounds like an accessibility tool that wants more access than AT-SPI can offer. I think your best bet is to just start hammering out a GNOME Shell extension, assuming you have the input collection (i.e. voice recognition, etc.) in working order.

Thanks for bearing with my exposition.

I wonder what some of the popular/notable GUI toolkits or specific projects using ATK/AT-SPI are, in particular ones that actually use this endpoint of the API, which, if I understand correctly, is the way for an application to voluntarily indicate that a certain action can be triggered in it by a consumer of the ATK/AT-SPI API.

Does GTK itself use these APIs across the board, so that any GTK window / GUI element is open to accessibility interaction by implementations using ATK/AT-SPI from the other side?

For historical reasons, ATK is essentially a toolkit-side abstraction that gets bridged onto AT-SPI, and it's only used by GTK3.

GTK4 uses AT-SPI directly, but not the entire D-Bus API surface, because:

  1. it is not a good D-Bus API
  2. it is not a good accessibility API for the Linux desktop in 2021

From a D-Bus perspective, it's bad because it operates on small data that can be produced in large quantities; complex UIs will end up spamming the accessibility bus with loads of signal emissions, instead of operating in bulk. The "action" interface is very limited, and only maps to actions that are representable by a single entry point—"click", "toggle", "activate"—instead of being parametrised—"select the radio button with the value 'foo'", "pick a color for the following RGBA tuple", "move to the 12th row in a list". It's the reason accessibility is consigned to its own separate bus instead of being on the main session bus. The whole thing was designed around CORBA, and then haphazardly ported to D-Bus for GNOME 3.

From an accessibility perspective, it is a gaping security hole that exposes all applications, even sandboxed ones, on the same low-barrier/high-privilege bus; it's like running X11: anything that claims to be an AT can peek at any application, and the user cannot know it's happening.

In general, AT-SPI is not a great API to deal with. Its design is old, and based on two technologies (X11 and CORBA) that were left behind ages ago.

Additionally, do not expect all applications to implement AT-SPI completely. The subset of GTK3 widgets that survived from the GTK2 era is probably the closest one to a compliant implementation.

For GTK4, we re-implemented the AT-SPI endpoint in the toolkit; sadly, given that nobody really knows how AT-SPI works or how assistive technologies operate, we had to treat it as a black box, which means it has holes where we have no idea what ATs expect.

I’m working on a newer version of AT-SPI, but it’s still in the design and requirement phase, so it might take a little while.

Thanks for this again. Should ad-hoc questions about extensions go in the Extensions channel of the element.io server or is that channel for other things?

Going back to the topic, I am also developing bilingual (multilingual) dictation for all this, as many of us are bilingual and communicate bilingually for work or with friends. I think it is fair to say that at some level this is a form of accessibility even without the multilingualism, as the same considerations apply to simply producing the right symbol that doesn't happen to be on the current keyboard layout now and then.

So for this dictation scenario I was thinking of using IBus to insert Unicode characters robustly and cleanly, instead of resorting to fiddling with the low-level keyboard mappings to cover all the different characters while simulating keystrokes as the input method. Directly inserting Unicode would allow taking bilingual dictation, and dictation involving whatever special symbols the user produces, and "simply" inserting the resulting text in a way that does not involve touching those mappings.

I would like to reuse the IBus API responsible for the "code points" mode of Unicode entry, but only the part that directly inserts the Unicode characters coming from transcribing the dictation into text, not the part that shows the user the code point they are typing, of course.

Are there any special considerations I should take note of when using things like ibus_engine_simple_commit_char from IBus?

Trying to explore the code myself, I so far couldn't find where GNOME Shell calls into the code responsible for the "code points" functionality, which, as I mentioned, ends up here in IBus. I would guess it comes from inputMethod.js, but I'm just not following the cascade of calls between the two. Could you recommend an IDE/dev setup for exploring the shell stack from GJS (in the context of GNOME Shell) down to libraries like IBus?

Now, as for pointing device emulation, this should be possible using libevdev. With libevdev, it would have to happen in a separate process (communicating with the larger shell extension via D-Bus or a GIO socket from GJS; a rough sketch of that extension-side plumbing follows the list below), for one or two reasons:

  1. As far as I know, there is no complete JavaScript wrapper for this library
  2. It may well require root access in order to define a new (virtual) device and send mouse events through it, and I assume that the shell does not run as root (?!).
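For the record, here is a rough, untested sketch of what the extension side of that split might look like in GJS, talking over D-Bus to a separate helper process that owns the libevdev/uinput device; the bus name, object path, and interface below are entirely hypothetical and are only meant to illustrate the plumbing.

```js
const Gio = imports.gi.Gio;

// Hypothetical D-Bus interface exported by the separate libevdev helper process.
const PointerHelperIface = `
<node>
  <interface name="org.example.PointerHelper">
    <method name="MoveRelative">
      <arg type="i" direction="in" name="dx"/>
      <arg type="i" direction="in" name="dy"/>
    </method>
    <method name="Click">
      <arg type="u" direction="in" name="button"/>
    </method>
  </interface>
</node>`;

const PointerHelperProxy = Gio.DBusProxy.makeProxyWrapper(PointerHelperIface);

// Create a proxy for the helper on the session bus (a real extension would
// construct it asynchronously and handle the helper not being there).
const helper = new PointerHelperProxy(Gio.DBus.session,
    'org.example.PointerHelper', '/org/example/PointerHelper');

// Ask the helper to nudge the pointer, then click button 1 (left).
helper.MoveRelativeRemote(10, 0, (result, error) => {
    if (error)
        logError(error);
});
helper.ClickRemote(1, (result, error) => {
    if (error)
        logError(error);
});
```

The helper itself would then be a small daemon (in C or another language with libevdev bindings) that owns the virtual device and exposes that interface.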

I'm just not aware of whether the shell has any API of its own for pointer position manipulation (mousetweaks appears to be a separate C program, not part of the shell nor an extension). Does GNOME Shell have, or use, another API for pointer location manipulation or mouse button emulation?

The RemoteDesktop portal can be used to send synthetic keyboard and mouse events to gnome-shell: Portal API Reference — Flatpak documentation. There are a few other ways to do it, but the portal will work best from within a sandbox.
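For what it's worth, a compressed and untested GJS sketch of that flow might look like the following; each setup call returns a Request object whose Response signal carries the results, and the sketch skips error handling plus the portal's recommended trick of subscribing to the expected Request path before calling, so treat it as an outline rather than a recipe (the token strings are arbitrary).

```js
// Outline of the org.freedesktop.portal.RemoteDesktop handshake and one
// synthetic pointer event, driven over the session bus from GJS.
const { Gio, GLib } = imports.gi;

const bus = Gio.DBus.session;
const BUS_NAME = 'org.freedesktop.portal.Desktop';
const OBJ_PATH = '/org/freedesktop/portal/desktop';
const IFACE = 'org.freedesktop.portal.RemoteDesktop';

// Call a portal setup method, then wait for the matching Request::Response
// signal and hand its results vardict to the callback.
function portalCall(method, params, onResults) {
    bus.call(BUS_NAME, OBJ_PATH, IFACE, method, params,
        new GLib.VariantType('(o)'), Gio.DBusCallFlags.NONE, -1, null,
        (conn, res) => {
            const [requestPath] = conn.call_finish(res).deep_unpack();
            const id = bus.signal_subscribe(BUS_NAME,
                'org.freedesktop.portal.Request', 'Response', requestPath,
                null, Gio.DBusSignalFlags.NONE,
                (c, sender, path, iface, signal, sigParams) => {
                    bus.signal_unsubscribe(id);
                    const [, results] = sigParams.recursiveUnpack();
                    onResults(results);
                });
        });
}

// 1. Create a session, 2. request keyboard + pointer, 3. start it (the user
//    may be prompted to approve), 4. inject synthetic events into the session.
portalCall('CreateSession', new GLib.Variant('(a{sv})', [{
    handle_token: GLib.Variant.new_string('at_req1'),
    session_handle_token: GLib.Variant.new_string('at_session1'),
}]), results => {
    const session = results['session_handle'];
    portalCall('SelectDevices', new GLib.Variant('(oa{sv})', [session, {
        handle_token: GLib.Variant.new_string('at_req2'),
        types: GLib.Variant.new_uint32(1 | 2), // 1 = keyboard, 2 = pointer
    }]), () => {
        portalCall('Start', new GLib.Variant('(osa{sv})', [session, '', {
            handle_token: GLib.Variant.new_string('at_req3'),
        }]), () => {
            // Relative pointer motion: 10 px to the right.
            bus.call(BUS_NAME, OBJ_PATH, IFACE, 'NotifyPointerMotion',
                new GLib.Variant('(oa{sv}dd)', [session, {}, 10.0, 0.0]),
                null, Gio.DBusCallFlags.NONE, -1, null, null);
        });
    });
});
```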

Thanks for this. This suggestion assumes that one opts to use flatpak, doesn't it? Is flatpak going to be somehow integrated with, or a prerequisite for, GNOME Shell in current or future releases? I did some reading about flatpak but am probably ignorant of how much traction it is gaining in the Linux world.

Would I be right in understanding that the Remote Desktop API of flatpak is available to any application that is packaged and installed via flatpak?

The portal APIs are developed by the flatpak project and are available to flatpak applications, but they do not depend on flatpak. Any application can access the portal APIs on any GNOME system, as long as the xdg-desktop-portal-gtk package is installed. I think other sandboxing solutions such as snapd and firejail are starting to recommend the portal APIs as well. If the plan is to ship an accessibility app as a flatpak or snap package, these are likely the APIs you want.

The underlying APIs can also be accessed directly through Mutter, but this API is private and subject to change: org.gnome.Mutter.RemoteDesktop.xml

If these are not adequate, one may also be able to create a gnome-shell extension that directly injects events into Clutter, but I haven't tried it to see how well it works.
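To sketch that last option (untested, and from memory, so the exact API may differ between Shell versions): inside a gnome-shell extension one can create Clutter virtual input devices, which is essentially the mechanism Mutter's own remote-desktop path relies on.

```js
// Untested sketch of injecting events into Clutter from a gnome-shell
// extension via virtual input devices. Older releases exposed this through
// Clutter.DeviceManager rather than the seat object used here.
const Clutter = imports.gi.Clutter;

const seat = Clutter.get_default_backend().get_default_seat();
const virtualPointer =
    seat.create_virtual_device(Clutter.InputDeviceType.POINTER_DEVICE);
const virtualKeyboard =
    seat.create_virtual_device(Clutter.InputDeviceType.KEYBOARD_DEVICE);

// Move the pointer 10 px to the right, then left-click.
let time = Clutter.get_current_event_time();
virtualPointer.notify_relative_motion(time, 10, 0);
virtualPointer.notify_button(time, Clutter.BUTTON_PRIMARY, Clutter.ButtonState.PRESSED);
virtualPointer.notify_button(time, Clutter.BUTTON_PRIMARY, Clutter.ButtonState.RELEASED);

// Press and release Escape by keysym, much like the on-screen keyboard does.
time = Clutter.get_current_event_time();
virtualKeyboard.notify_keyval(time, Clutter.KEY_Escape, Clutter.KeyState.PRESSED);
virtualKeyboard.notify_keyval(time, Clutter.KEY_Escape, Clutter.KeyState.RELEASED);
```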

Thanks for this. So I understand that the portal API is gaining traction nicely, and it appears to be present on at least my Ubuntu system by default. I gather that it is also tightly related to flatpak, as a means for flatpak applications to escape their sandbox through an agreed-upon API, one that is included in GNOME Shell and at least one other desktop environment.

So my take is that the portal API could indeed be a fair solution for pointer emulation, but it does not provide a way to inject multilingual text as in a multilingual dictation scenario, since it can only send key codes. I therefore believe I should probably fall back to the IBus approach I mentioned earlier in this thread as the most robust and clean way to inject Unicode text. Otherwise, tinkering with key mappings in a totally hackish way might be the only path for injecting Unicode text, and that dynamic fiddling with key mappings would be far outside any robust design, as well as interfering with the user's chosen mappings.

The portal API seems like a viable choice for pointing device emulation until I encounter something unexpected about it, in particular because, IIUC, the portal API is entirely user-space, unlike libevdev in the same context.

I don't believe you can use that IBus API without implementing your own input method, which is probably not what you want, but maybe someone more knowledgeable than me can give a better answer here. It seems you can invoke Main.inputMethod.commit directly from within gnome-shell, though, to send already-composed text; that is the method used by GNOME's on-screen keyboard. If you want to send other keys (Tab, Enter, Escape, arrow keys, etc.) you will likely have to use another method that can send the raw keysyms.
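For completeness, a tiny untested sketch of that commit path, as it might be called from an extension running inside gnome-shell (the `dictate` wrapper is just an illustrative name):

```js
// Commit already-composed Unicode text to the focused input from inside
// gnome-shell, the same path the on-screen keyboard uses.
const Main = imports.ui.main;

function dictate(text) {
    // `text` is whatever the speech recognizer produced, in any script.
    Main.inputMethod.commit(text);
}

dictate('שלום, Wayland');
```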
