Hello,
Over the past weeks I have been in and out of links and documents about the Linux and GNOME Shell architecture, from a point of view I would venture to label as accessibility, comparing it with my preliminary learnings about the Android stack. I'd like to bring up an aspect of accessibility that the Android platform has integrated, albeit with caveats: the capability to bolt on a service that drives the desktop on the user's behalf, with the user's consent. In Android, this is more or less fully implemented as the Accessibility Service API.
My own motivation is as follows, although I can envision more generic ones: I am developing an integrated voice and air-gesture interaction layer for using computing devices without, or in ways that augment, the quarter-century-old (?!) use of keyboard and mouse. We now have voice transcription, computer vision and machine learning that can orchestrate user interaction in ways that augment the legacy input devices. I'd like to integrate this layer into Linux, and I am looking at Wayland and GNOME as the integration counterpart.
I am new to this community and still learning its ethics and culture, but I have at least registered for GUADEC 2021, and I hope that my initial rumination on this topic in this thread is not too much of a distraction.
In a way this may be seen as stretching the boundaries of traditional accessibility as implemented so far; an alternative view would be that it is simply a new input modality. Either way, all of our traditional user interfaces, apart from the BIOS beep and outside the realm of Alexa/Siri/voice applications, are currently GUI based (I would say even the terminal can be seen as a very primitive visual user interface), and I am currently thinking of integrating this new input modality with the graphical display paradigm.
In desktop operating systems, my impression is that accessibility has so far focused primarily on solutions for the visually impaired (translating visual output and orientation to other senses) and on ways of extending the out-of-the-box keyboard-and-mouse paradigm to special motor and ergonomic needs. These have been exceptional endeavors in making digital equipment accessible, from within the technology itself, to people who happen to have special needs.
Following this line of thought about stretching the bounds of input possibilities, and as long as voice and air-gesture interaction still connects into the visual feedback paradigm of display monitors, I would place voice and gesture with one foot in the accessibility department. I could call this an integrated accessibility service, or a desktop interaction service.
One conceivable set of core capabilities that such a desktop interaction service would need is:

- safe injection of input on the user's behalf: robustly avoiding injecting input into a window or field that has shifted out of focus between the time the user started dictating and the time they finished, which may require some sort of transactional API involving signals or callbacks (see the sketch after this list).
- the ability to manage windows on the user's behalf: maximize, resize, minimize, close, switch to.
- the ability to operate the desktop itself: anything that is not an application window per se but rather a "desktop" element, such as the (GNOME) apps bar, the activities bar, or widgets that are part of the desktop experience rather than of a user application.
- and the trivial ability to launch apps or initiate sleep/hibernation/shutdown through a customized verification dialog.
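To make the first capability more concrete, here is a minimal sketch of how an out-of-process service might track focus and inject dictated text through the existing AT-SPI layer, using the pyatspi Python bindings. It is only an illustration of the transactional idea (check focus, then inject), not a proposal for the actual mechanism; the event and method names are the standard pyatspi ones, but whether this kind of input synthesis remains viable under Wayland is exactly the sort of question I am asking.

```python
# Sketch: a tiny "desktop interaction service" built on AT-SPI via pyatspi.
# It tracks the currently focused accessible object and only injects
# dictated text if focus has not moved since dictation started.

import pyatspi

focused = None  # the accessible object that currently has keyboard focus


def on_focus_changed(event):
    """Remember the last object that gained the 'focused' state."""
    global focused
    if event.detail1:  # state was set, not cleared
        focused = event.source


def type_text(text, target):
    """Inject text, but only if the target still has focus (the 'transactional' check).

    In a real service this would be called by the voice pipeline, with
    `target` captured at the moment dictation began.
    """
    if target is None or target is not focused:
        print("Focus moved since dictation started; dropping input.")
        return
    for ch in text:
        # KEY_STRING synthesizes the character without needing a hardware keycode.
        pyatspi.Registry.generateKeyboardEvent(0, ch, pyatspi.KEY_STRING)


# Listen for focus changes desktop-wide and start the AT-SPI main loop.
pyatspi.Registry.registerEventListener(on_focus_changed,
                                       "object:state-changed:focused")
pyatspi.Registry.start()
```

The check in `type_text` is obviously racy; a real transactional API would presumably need the compositor or toolkit to confirm the focus target as part of the injection call itself, which is what I meant above by signals or callbacks.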
I would be happy to learn how you see these capabilities relating to ATK/AT-SPI, the future of GNOME Shell, and/or possible hooks in Mutter, as per your vision for GNOME Shell and GTK.
Sincerely,
Matan
P.S. please do PM me if you think I deserve a beating for the long post.