Strawman for a new version of AT-SPI

AT-SPI is not a great DBus API, and it’s still pretty tied to concepts that made sense about 20 years ago.

We should begin thinking about a potential set of changes, driven by toolkits, to the AT-SPI interfaces to ensure that it conforms to DBus and windowing system best practices.

This is a strawman proposal with a set of changes for AT-SPI; it mostly impacts the DBus interfaces, but once those change, it will require a new libatspi major version, as well as a pyatspi set of changes to drop old legacy compatibility code from pyatspi 1.x and 2.x.

AT-SPI v3

  • version the DBus interface: org.a11y.atspi3.* instead of having an “AtSpiVersion” property on the root node
  • use properties instead of Get methods
    • drop org.a11y.atspi.Event.* interface and use org.freedesktop.DBus.Properties.PropertiesChanged signal
  • remove setters on the org.a11y.atspi.Component interface
    • SetExtents()
    • SetPosition()
    • SetSize()
  • remove screen-relative coordinates and keep top-level relative ones
  • remove old methods:
    • org.a11y.atspi.Component.GetPosition() - use GetExtents()
    • org.a11y.atspi.Component.GetSize() - use GetExtents()
    • org.a11y.atspi.Component.GetAlpha()
    • org.a11y.atspi.Component.GetLayer()
    • org.a11y.atspi.Component.GetMDIZOrder()
    • org.a11y.atspi.Image.GetImagePosition() - use org.a11y.atspi.Component
    • org.a11y.atspi.Image.GetImageSize()
    • org.a11y.atspi.Image.GetImageExtents()
  • add parameters to org.a11y.atspi.Action actions
    • return the action signature in the GetActions() method
    • add a variant in argument to DoAction(), containing the payload of the action arguments
  • remove org.a11y.atspi.DeviceEventController and org.a11y.atspi.DeviceEventListener; use platform-specific API to listen to device changes and events
  • remove org.a11y.atspi.Registry
  • add org.a11y.atspi.Value.CurrentValueText; matches WAI-ARIA aria-valuetext to avoid overriding org.a11y.atspi.Accessible Name
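
To make the “add parameters to actions” item a bit more concrete, here is a minimal pure-Python sketch with no DBus involved. It assumes that GetActions() would return (name, signature) pairs and that DoAction() would receive the arguments as a payload checked against that signature. The class names, the “scroll-to” action, and the simplified type table are all hypothetical; the actual wire format would be up to the interface definition.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A few basic DBus type codes mapped to Python types (simplified on purpose).
_TYPE_CODES = {"s": str, "i": int, "d": float, "b": bool}


@dataclass
class Action:
    name: str
    signature: str  # e.g. "ii" for two int32 arguments
    handler: Callable[..., Any]


class ActionSet:
    """Hypothetical stand-in for an object implementing the Action interface."""

    def __init__(self) -> None:
        self._actions = {}

    def add(self, action: Action) -> None:
        self._actions[action.name] = action

    def get_actions(self) -> list:
        # Assumed shape of a GetActions() reply: (name, signature) pairs.
        return [(a.name, a.signature) for a in self._actions.values()]

    def do_action(self, name: str, payload: tuple = ()) -> Any:
        action = self._actions[name]
        if len(payload) != len(action.signature):
            raise TypeError(f"{name} expects signature {action.signature!r}")
        for code, value in zip(action.signature, payload):
            # Strict check so that a bool is not silently accepted as an int.
            if type(value) is not _TYPE_CODES[code]:
                raise TypeError(f"{name}: {value!r} does not match {code!r}")
        return action.handler(*payload)


actions = ActionSet()
actions.add(Action("scroll-to", "ii", lambda x, y: (x, y)))
```

An AT would first read the advertised signature and then build the payload accordingly, for example `actions.do_action("scroll-to", (10, 20))`.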

remove org.a11y.atspi.DeviceEventController and org.a11y.atspi.DeviceEventListener; use platform-specific API to listen to device changes and events

What exactly would this mean? Should Orca, or whatever component wants e.g. registerKeystrokeListener(), talk directly to a DE-specific API, e.g. something that GNOME Shell would expose specifically for Orca, or …? Or would we want a separate AT ↔ display server D-Bus API for things like that?


Ideally, we’d have a windowing system specific protocol, for instance a Wayland protocol like we do for input methods. Orca would register key shortcuts, and be notified by the compositor when those happen; it would also be able to send key/pointer-like events to the currently focused application through the compositor.

What is it that is windowing system specific that it needs to do? Screen readers don’t tend to have open windows, or need any kind of focus tracking themselves, I assume. Wayland protocols are complicated here because they are exposed unfiltered to sandboxes; they don’t have a portal that implements permission control, user-controlled access granting, etc.

I assume if Orca wants to send key/pointer events, it’d eventually use libei, which is intended to be accessible via portals. But for listening, would Orca want to register actions, scrape generic keys, or watch only a very limited select few? What level of eavesdropping does it require? Can you give examples?

The problem is that we want this protocol to work on every Wayland compositor, so proposing a DBus interface for it is going to fly like a leaden balloon. Additionally, Orca is already going to use XInput2 on X11 to subscribe to events because we dropped all the key snooping API from GTK.

The compositor can act as a broker. Either the screen reader (or any other AT) is installed by the OS, in which case it has implicit access to the system; or the AT is installed by the user, in which case:

  • it will need authorisation
  • it will need that authorisation to be accessible, otherwise people using ATs to perceive the UI won’t ever be able to authorise it
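
As a rough illustration of the broker rule described above, here is a small Python model. It only captures the decision itself (system-installed ATs are trusted implicitly, user-installed ones need an explicit grant); the class, the method names, and the AT identifiers are hypothetical, and the hard part, making the authorisation prompt itself accessible, is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical model of a compositor deciding which ATs may snoop."""

    # Identifiers of ATs shipped by the OS; these get implicit access.
    system_ats: frozenset
    # Identifiers of user-installed ATs that the user has authorised.
    granted: set = field(default_factory=set)

    def may_snoop(self, at_id: str) -> bool:
        return at_id in self.system_ats or at_id in self.granted

    def grant(self, at_id: str) -> None:
        # In a real compositor this would only run after an accessible prompt.
        self.granted.add(at_id)


broker = Broker(system_ats=frozenset({"orca"}))
```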

Yes, that’s definitely the most viable option.

The snooping level is a lot. Orca is expected to be able to replay event sequences, or at least see what kind of keys have been pressed from a certain point in time.
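
To give an idea of what “see what kind of keys have been pressed from a certain point in time” implies on the AT side, here is a minimal sketch of a bounded key history in Python. The KeyEvent fields and the KeyHistory class are hypothetical and say nothing about how Orca actually stores events.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    timestamp_ms: int
    keysym: str
    pressed: bool  # True for press, False for release


class KeyHistory:
    """Keeps the most recent key events so a sequence can be inspected later."""

    def __init__(self, capacity: int = 256) -> None:
        # A bounded deque silently drops the oldest event once it is full.
        self._events = deque(maxlen=capacity)

    def record(self, event: KeyEvent) -> None:
        self._events.append(event)

    def since(self, timestamp_ms: int) -> list:
        # Events at or after the given point in time, oldest first.
        return [e for e in self._events if e.timestamp_ms >= timestamp_ms]
```

Whatever delivers the events (XI2 raw events, a Wayland protocol, or a DBus interface) would call `record()`, and the AT would call `since()` when it needs to look back.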

We have a whole discussion about this on GitLab.

I tend to think that there’s a better hope of this becoming standard if the event snooping is an integral part of the at-spi3 DBus interfaces. “There is a Wayland protocol” does not necessarily mean “all compositors will support the protocol”, plus there’s a negotiation layer that everyone will probably have to reinvent.

IMHO, it would be best to make the DBus protocol cover both sides of event snooping (feeding the events from one trusted source, and receiving them). Compositors of course have to support it to be accessible, but the creation and support of a Wayland protocol is just as unlikely to happen overnight. There’s also a chance that compositors will want to be a11y clients in addition, in order to be truly called accessible. Key snooping is just the bare minimum.

From the mutter perspective, the tendency has been towards unifying platform bits in common interfaces (e.g. drawing stuff from g-s-d). I personally think it’d be cleaner to do that here as well, i.e. peeking in X11 with XI2 raw events, getting them out of the box with Wayland/native, and feeding them the same way to the AT.

I understand it is highly important to make a smooth transition, and to support as many scenarios as possible as soon as possible. With X11 environments this could come as a “fallback” daemon that does the XI2 raw event snooping and feeds it to the AT. In Wayland, doing a libinput-based counterpart does not sound far-fetched (although there’s a need for higher permissions). If we involve compositors from day 1 here, it’s sadly likely that this will remain a “you must be this tall to ride” problem for a while…

From working on a11y together with Emmanuele recently, I agree this is the right direction to go: use properties and actions as much as possible, and avoid custom interfaces with methods. Adding parameters to actions would be nice, since it lets us expose a lot more functionality that we already have in this form. I would also recommend that we take a serious look at the current interfaces, and just drop the ones that either overlap with others or don’t add clear value. Examples:

  • Image seems redundant with Component (for the extent part) and redundant with Accessible (for the description)
  • Selection seems cumbersome and could probably just be replaced by a writable ‘selected’ property
  • Socket is about something we don’t support anymore

This reminds me that the registration on the accessibility bus is currently done by calling the Embed() method of the org.a11y.atspi.Socket interface on the AT-SPI registry object itself which is, quite frankly, horrific.

There’s really no need to register an application to begin with: just asking all connected clients whether they expose the root AT-SPI interface is enough. There’s literally no reason to have a separate registry daemon.
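
A sketch of what that registry-less discovery could look like, in Python. On a real bus the names would come from org.freedesktop.DBus.ListNames and the XML from each client’s org.freedesktop.DBus.Introspectable.Introspect reply; to keep the example runnable without a bus, it takes that data as a plain dict. The interface name org.a11y.atspi3.Accessible is an assumption based on the proposed org.a11y.atspi3.* naming, not something that exists today.

```python
import xml.etree.ElementTree as ET

# Hypothetical root interface name under the proposed atspi3 namespace.
ROOT_IFACE = "org.a11y.atspi3.Accessible"


def exposes_interface(introspection_xml: str, iface: str = ROOT_IFACE) -> bool:
    """Return True if the introspection data declares the given interface."""
    node = ET.fromstring(introspection_xml)
    return any(i.get("name") == iface for i in node.iter("interface"))


def accessible_clients(introspection_by_name: dict) -> list:
    """Filter bus names down to those exposing the root a11y interface."""
    return sorted(
        name
        for name, xml in introspection_by_name.items()
        if exposes_interface(xml)
    )
```

The point of the sketch is only that the set of accessible applications can be derived from what clients already expose, with no registration call.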


What would be the replacement for org.a11y.atspi.Registry?

There is no need to have a registry: the compositor can already track all applications.

We can have an interface for ATs to get notified from the compositor of a window focus change; at that point, the AT will either ask the compositor for the object path of the application to which the window belongs, and call a method on the application object itself to initiate a direct connection; or will have a connection already, and will get the event stream.
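
The flow described above can be sketched as follows, in Python. Both callbacks are placeholders for things that do not exist yet: `lookup_app` stands for “ask the compositor which application owns this window” and `connect` stands for “initiate the direct connection to that application”. All names are hypothetical.

```python
class FocusTracker:
    """AT-side bookkeeping for compositor-driven focus notifications."""

    def __init__(self, lookup_app, connect):
        # lookup_app: window id -> (bus name, object path) of the owning app
        self._lookup_app = lookup_app
        # connect: (bus name, object path) -> connection object
        self._connect = connect
        self._connections = {}

    def on_focus_changed(self, window_id):
        app = self._lookup_app(window_id)
        if app not in self._connections:
            # First time this application is seen: set up the direct connection.
            self._connections[app] = self._connect(*app)
        # Otherwise reuse the existing connection and its event stream.
        return self._connections[app]
```

Several windows belonging to the same application share a single connection, so a focus change between them does not trigger a new handshake.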

A new window appearing does not necessarily imply a focus change, but I guess that does not change the premise: the compositor could notify the AT of either event. But then, an AT must not have knowledge of how to talk to a given compositor/wm/whatever, so an abstraction will be needed. My point is, Registry does that job already.

That one is another thing I’ve been battling with, without finding any doc or an obvious API to initiate the direct connection. I guess your proposal on this is to keep the current mechanism, but what is it?

There’s really no need for “an abstraction”: it can be a Wayland protocol, or a DBus interface.

No, it’s really not. And, in any case, I would not want to hijack the Registry interface as it is. At most, I’d rename the interface anyway, to version it properly.

No, my proposal is to get rid of the current mechanism, because it’s entirely based on a bus design. Applications don’t know whether an AT is actually listening in to accessibility events, which means they always have to broadcast to the ether what they are doing. The new direct, P2P connection would not need this separate bus, because ATs would connect to each application.

This also removes the need for ATs to track application focus with events coming from the applications, instead of using the compositor as they should, as well as removing the issue of ATs written using a toolkit with accessibility support suddenly appearing themselves on the accessibility bus.

What happens for non-Wayland apps (X11, whatever)?

DBus would work the same on Wayland and X11; to be fair, since this is GTK4-and-newer, I’m getting less and less interested in X11.

I mean, I thought there was an existing P2P mechanism already; is that a misunderstanding?

There is some code that looks like it sets up a P2P connection, but:

  1. it’s not enabled by default
  2. it’s only used to set up a direct connection between the registry daemon and an undefined “something else”
  3. it’s really not how P2P connections over DBus work