Understanding GNOME Shell’s focus stealing prevention

Focus stealing prevention exists for two main reasons: One is security, since we need to prevent rogue apps from deceiving users into e.g. typing their password into another window. If apps can silently claim keyboard focus and open their own window over the currently focused one, this enables phishing and other similar attacks. The other is user experience: Even if an app isn’t maliciously taking over your focus, it can be annoying to have a new window popping up while you’re typing something and have half your sentence end up in the wrong app.

At the same time, there are cases where you want apps to be able to request focus, for example when you click a link in a chat app and want it to open in your browser. In that case, you want focus to move to the browser window.

This is why our compositor library, mutter, implements focus stealing prevention mechanisms that allow the currently focused app to request that a specific other app be allowed to claim focus.

is ready??

Most users have probably seen an “is ready” notification in GNOME Shell at some point. Unfortunately, this notification doesn’t really explain why it’s being shown or what’s happening, which can cause confusion.

Because of this, there have been proposals to disable focus stealing prevention until it works better (mutter issue #673), and a number of GNOME Shell extensions exist to work around it.

Screenshot of a GNOME Shell notification showing that Telegram Desktop Media viewer is ready

These are the main cases where the notification is shown:

  • A new window is opened and either the launching app or the launched app doesn’t implement the XDG Activation protocol or the startup notification specification
  • An app requests focus for one of its windows, but was not activated in a valid way (e.g. because it wasn’t started by a user action)
  • An app requests focus for a new window, but it’s slow to start and in the meantime there are additional user interactions. In this case we don’t want to interrupt the user, so we show the notification and people can switch at their convenience.
  • An app is launched from an environment that isn’t able to use the XDG Activation protocol (e.g. a terminal)

The protocol responsible for this, XDG Activation (the Wayland equivalent of the X11-specific startup notification spec), was introduced relatively recently (2020) and needs to be adopted by UI toolkits. GNOME 46 and 47 brought a few fixes and polished the feature both on the client side (GTK and xdg-desktop-portal) and in the compositor implementation (mutter), but there are still cases where XDG Activation isn’t hooked up properly.

How XDG activation works

XDG activation flow for moving focus between two existing windows

The way the protocol works is that the currently focused app asks the compositor to create a token linked to the focused window (Wayland surface) and the most recent user interaction (an input event serial associated with a seat).

This token is then used by the app that should receive focus when it requests to be activated. In GNOME Shell, activation means that the window receives focus and is placed on top of other windows. An activation token may still be rejected, for example if the window linked to the token doesn’t have focus or if the linked user interaction isn’t recent enough.
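To make the token lifecycle more concrete, here is a toy model of the compositor side. It is a sketch under stated assumptions, not mutter’s implementation: the class and method names are hypothetical, tokens are treated as single-use, and “recent enough” is simplified to “no newer input event has happened since the token was requested”.

```python
# Toy model of compositor-side XDG Activation token handling.
# ASSUMPTIONS (hypothetical, not mutter's code): tokens are single-use, and a
# user interaction only counts as "recent enough" if no newer input event
# has happened since the token was requested.
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Token:
    surface: str  # window (Wayland surface) that requested the token
    serial: int   # input event serial of the triggering user interaction


class ToyCompositor:
    def __init__(self) -> None:
        self.focused_surface: Optional[str] = None
        self.latest_serial = 0
        self._tokens: Dict[str, Token] = {}

    def user_input(self, surface: str) -> int:
        """Simulate a click or key press on `surface`, which gains focus."""
        self.latest_serial += 1
        self.focused_surface = surface
        return self.latest_serial

    def get_activation_token(self, surface: str, serial: int) -> str:
        """Called by the focused app: create a token tied to surface + serial."""
        token = secrets.token_hex(8)
        self._tokens[token] = Token(surface, serial)
        return token

    def activate(self, token: str, target: str) -> bool:
        """Called by the target app: True means `target` gets focus."""
        info = self._tokens.pop(token, None)  # a token can only be used once
        if info is None:
            return False  # unknown or already used token
        if info.surface != self.focused_surface:
            return False  # the requesting window no longer has focus
        if info.serial != self.latest_serial:
            return False  # the user has interacted with something since
        self.focused_surface = target  # raising the window is not modeled
        return True


if __name__ == "__main__":
    wm = ToyCompositor()
    serial = wm.user_input("chat")                   # user clicks a link
    token = wm.get_activation_token("chat", serial)  # chat app asks for a token
    print(wm.activate(token, "browser"), wm.focused_surface)
```

If the user clicks into another window before the browser presents its token, both the focus check and the serial check fail, and the compositor would fall back to the “is ready” notification instead of moving focus.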

In addition to handling focus, GNOME Shell also tracks app launching. Until the new app window is actually shown, GNOME Shell uses a “loading spinner” mouse cursor to indicate to the user that the app is loading. If the app doesn’t implement the XDG Activation protocol, the loading indicator only disappears after a timeout because GNOME Shell doesn’t know that the application finished loading and has presented the target window.

The protocol doesn’t define how tokens are given to the target app. One reason for this is that it depends on how the app is started. The main options are:

  • Setting the XDG_ACTIVATION_TOKEN environment variable
  • D-Bus Activation using the platform-data field, which contains the activation token
  • XDG portals that will launch an app (e.g. the OpenURI or OpenFile portals)

The target app then needs to collect the token and use it to have its window activated to receive focus and to signal to the compositor that it started successfully.
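As a rough illustration of the receiving side, the hypothetical helper below shows how a launched app might pick up its token from the first two channels listed above. It assumes the launcher used either the XDG_ACTIVATION_TOKEN environment variable or an “activation-token” entry in the D-Bus activation platform-data (our understanding of the key name in the Desktop Entry specification). Toolkits such as GTK are expected to do this for you, and portal-based launches are not modeled here.

```python
# Hypothetical helper: collect the activation token handed to a launched app.
# ASSUMPTIONS: the launcher set XDG_ACTIVATION_TOKEN, or passed the token as
# "activation-token" in D-Bus activation platform-data. Portals not modeled.
import os
from typing import Mapping, MutableMapping, Optional


def collect_activation_token(
    environ: MutableMapping[str, str],
    platform_data: Optional[Mapping[str, object]] = None,
) -> Optional[str]:
    """Return the activation token for this launch, or None if there is none."""
    # Remove the variable so child processes don't inherit a token meant for us.
    env_token = environ.pop("XDG_ACTIVATION_TOKEN", None)
    if platform_data is not None:
        token = platform_data.get("activation-token")
        if isinstance(token, str) and token:
            return token  # D-Bus activation: prefer the token it carries
    return env_token or None  # treat an empty variable as "no token"


if __name__ == "__main__":
    print("activation token:", collect_activation_token(os.environ))
```

The app would then pass this token along when presenting its window, which both activates the window and tells the compositor that the launch finished.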

Not smart enough

When I started looking into how our focus stealing prevention mechanism works to investigate the issues mentioned above, I was initially pretty confused. In many cases switching focus between windows worked fine, but at other times it didn’t. I quickly realized that existing windows got the “is ready” notification, while new windows got focus immediately.

This struck me as odd: Why are new windows allowed to do whatever, but existing windows are restricted in the way they can take over focus?

I first thought this was some sort of bug, but then I discovered that the behavior was by design: Mutter has a gsettings property called focus-new-windows that controls the focus stealing prevention mechanism. This property can be strict or smart (the latter being the default).

  • smart means that in most cases new windows get focus (even without asking for it) and are raised to the top of the window stack
  • strict means they get focus (are “activated”, in technical terms) only when they are actually supposed to
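The difference between the two modes can be summarized with a deliberately simplified decision function. The enum and function names are hypothetical, and mutter’s real smart heuristics take more factors into account; the sketch only captures the behavior described above.

```python
# Simplified model of the focus-new-windows setting.
# ASSUMPTION: real "smart" heuristics in mutter consider more factors; this
# only captures "new windows get focus anyway" vs. "a valid token is required".
from enum import Enum


class FocusNewWindows(Enum):
    SMART = "smart"
    STRICT = "strict"


def should_focus(mode: FocusNewWindows, is_new_window: bool,
                 has_valid_token: bool) -> bool:
    """True: the window is activated. False: show an "is ready" notification."""
    if has_valid_token:
        return True   # properly activated: focus in both modes
    if mode is FocusNewWindows.SMART and is_new_window:
        return True   # smart: new windows get focus even without asking
    return False      # everything else falls back to the notification
```

With smart, a new window without a valid token still gets focus, which is exactly how missing XDG Activation support can go unnoticed.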

The smart mode exists in part because there are some cases where our current focus prevention system does not work well. These issues include:

  • Launching apps from a terminal (vte issue #2788). The main issue is that the terminal executing a command does not know whether that process will present a window or not. For example, if you launch vim there’s no new window, but if you launch firefox there is.
  • Launching apps via Run a Command in GNOME Shell (gnome-shell issue #7704) has similar issues to running apps from the terminal
  • Apps launched via custom keyboard shortcut (e.g. set up in Settings > Keyboard > Keyboard Shortcuts)
  • The lack of implementation of the appropriate protocols in apps or toolkits

Because new windows account for a significant share of the cases where focus stealing prevention is triggered, smart mode makes it appear as though apps implement the XDG Activation protocol even when they don’t. While this somewhat reduces annoyance for users, it gives developers the false impression that they don’t have to do anything.

It also makes it harder to debug cases where something doesn’t work as expected or lacks a correct implementation. For example, even in GTK4 focus transfer is broken in some cases, and this took a long time to discover (gtk issue #6711).

Security implications

Unfortunately, the current situation with smart as the default means that we’re not getting most of the benefits of focus stealing prevention. Apps are able to spawn a new window over your current one and grab keyboard focus, because smart mode simply gives the new window focus, circumventing the safety measures. This is trivial for malicious apps to exploit: All they need to do is open a new window, and focus stealing prevention doesn’t apply.

Next steps

While some people have asked for focus stealing prevention to be disabled completely until it’s implemented by most apps and toolkits, I’m not sure this is the best way forward. If we did that, nobody would notice which apps don’t implement it, so there’d be no reason for toolkits to do so.

On the other hand, there are some remaining issues around terminal applications and similar use cases that we don’t have a plan for yet, so just switching to strict to flush out app bugs isn’t ideal either at the moment.

There is currently no consensus in the team as to how to proceed. The two main directions we could take are:

  • Switch to strict mode by default (mutter issue #3486) once a few remaining issues are resolved, perhaps with a “flag day” deadline so apps have time to implement it.
  • Slowly make the smart mode stricter over time.

Either way we need to raise more awareness of the issue to get app and toolkit developers interested in improving things in this area, which this blogpost is a part of 🙂

It’d also be helpful if more people (especially developers) turned on strict mode on their systems, so we get more testing of which apps work and which don’t. This is the relevant gsetting:

gsettings set org.gnome.desktop.wm.preferences focus-new-windows 'strict'

Thanks

Thanks to the Sovereign Tech Fund for allowing me to take the time to properly work through this as part of my broader effort around improving notifications. Thanks also to Sonny Piers and Tobias Bernard for organizing the STF project, Florian Müllner, Sebastian Wick, Carlos Garnacho, and the rest of the GNOME Shell team for reviewing my MRs, and Jonas Dreßler and Jonas Ådahl for reviewing the blogpost.


This is a companion discussion topic for the original entry at https://blogs.gnome.org/shell-dev/2024/09/20/understanding-gnome-shells-focus-stealing-prevention/

I have been interested in fixing these issues for a long time. I asked many people why it’s not working and nobody seemed to know where to start. So it’s great to have an idea now how things are supposed to work.

I get the “is ready” notification if I drag an image from Nautilus into Loupe and Loupe calls gtk_window_present. How would I debug this? How would Loupe get the token from Nautilus in this case? Where would I need to report this?

PS: This is still on Shell 46.


DND (under Wayland) uses the protocol bits under wl_data_device (wl_data_device, wl_data_source and wl_data_offer). To achieve proper “startup notification” in this case, I can think of a couple of options to explore:

  1. Add a way to create an XDG activation token from a wl_data_device.drop event (received by the destination), which Loupe would then use to activate itself
  2. Add a way to create an XDG activation token from a wl_data_source.dnd_drop_performed event (received by the source), and pass that token via the wl_data_source/wl_data_offer data transfer objects (somehow, using a new request/event pair). Nautilus would request the token, and Loupe would use it to activate itself after the DnD drop.

The former is probably simpler; a new event with a serial number and a seat might be enough. The latter would be more complicated due to the direction of events during a DnD drop and the need to refer to a concept from a protocol extension in the core protocol, but it would allow the source to have a say in whether the destination should be allowed to activate itself.

Is this something worth asking distributions to publicize in their beta release announcements, or to announce on their forums or mailing lists? For example, if folks who are going to download or read about the Fedora 42 beta, or who are looking to enable a testing repository on a rolling release like Tumbleweed or Arch, got the news about this work and how they can contribute, that might reach them in a way that news straight from the desktop environment project might not.

(That’d be based on a theory that folks who are seeking out a beta are more likely to be interested in contributing something back to the project, which would…hopefully be true?)

If that’s something already considered or already in progress, please ignore and apologies for the interruption!

No, we already know about a huge number of issues that nobody is working on, so I don’t think there is any need to make lots of people aware of this.

I have now enabled ‘strict’, and just opening an image or an audio file from Nautilus gives an “is ready” notification as well.

Same for calculating 1+1 in the overview and opening the calculator from there.

I also get “is ready” when opening Pika Backup from a shell notification if it’s running in the background, while it works with Clocks. I guess the issue there is that I’m calling a custom action in Pika Backup to show a specific view instead of just “activate”? (This issue also exists without ‘strict’.)

I’m still not sure where to report these issues. Generally, it looks like focus stealing prevention with ‘strict’ breaks almost every case I can think of that needs window focus.


@sophieherold You are probably seeing a bug in mutter: unfortunately, we have a bug that breaks a lot of apps in strict mode. GNOME 47 includes a fix for it.

If it still happens in GNOME 47, I’d open this on nautilus, since nautilus is the launcher, and likely isn’t creating and forwarding a token.

I think gnome-shell first. The activation is done via org.gnome.Shell.SearchProvider2.ActivateResult. Sadly, it doesn’t have an a{sv} argument and only passes a timestamp. Ideally one would send a token via the ActivateResult call, but we’d have to add ActivateResult2, since ActivateResult can’t be changed.

Then, gnome-calculator, to support plumbing the token from the activation to gtk when showing the calculator window.

I think the activation token still isn’t plumbed through org.freedesktop.portal.Notification; @jsparber, perhaps you can give a rough state of affairs regarding the work there?

The portal normally uses D-Bus activation (we call it exported GActions) to start an application, so it should just work, including on GNOME 46. Apps can also use the org.freedesktop.portal.Notification::ActionInvoked signal, which doesn’t support passing around the activation token.

I think I found the reason why you see the “is ready” notification: Pika doesn’t have “StartupNotify=true” set in its desktop file. In some cases GNOME Shell uses D-Bus activation directly; in other cases it uses GLib’s wrapper methods, which respect the “StartupNotify” option. GLib only adds an activation token if the “StartupNotify” option is set to “true”. In the long term, it may make sense to change GLib’s behavior when the option is missing.
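App developers who want to check their own desktop file could use something like the hypothetical helper below. It only reads the StartupNotify key and does not reproduce GLib’s launch logic; it assumes a well-formed desktop file using the Desktop Entry spec’s lowercase “true”/“false” booleans, and treats a missing key as “false”, matching the behavior described above.

```python
# Hypothetical helper: does a desktop entry opt into startup notification?
# ASSUMPTION: only reads the key; it does not reproduce GLib's launch logic.
import configparser


def wants_startup_notify(desktop_entry: str) -> bool:
    # interpolation=None: Exec lines contain field codes such as %U.
    # strict=False: tolerate duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # desktop entry keys are case-sensitive
    parser.read_string(desktop_entry)
    # A missing section or key counts as "false".
    value = parser.get("Desktop Entry", "StartupNotify", fallback="false")
    return value.strip() == "true"
```

For example, you might pass it the contents of your app’s installed .desktop file; a False result means GLib-based launchers won’t attach an activation token.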

That’s something I initially wanted to explain with this post, but sadly it really depends on the specific situation. What I can recommend is to try to follow the activation token and make sure that it’s handed over correctly.

Very interesting learning why and how this works! Also good to know about the focus-new-windows property.

The actual behavior is also clearly not ideal, but I would be more likely to deem this a non-issue if Adwaita styling made focus status and changes more immediately apparent.

Most of the time, I’d absolutely prefer it if an app raised itself when I get a focus notification.

I don’t think I’d care for apps stealing focus while I’m typing or actively moving the cursor, but I think there are better solutions somewhere between having to interact with a notification again to actually bring up an app, and the app launching, sniping keystrokes, possibly inserting itself under your cursor, and providing only a pretty damn subtle indication that focus has changed.

I use the Forge extension for tiling, so my preferred behavior would be for the spawned window to tile itself to split a region not under my cursor, then let it take focus once my mouse moves over the new window.

Not sure if this behavior could be worked into vanilla GNOME, but cursor avoidance, not stealing keystrokes, and more obvious animations/styling seem like the most important takeaways in most setups.

The principle of least disturbance seems like a good rule of thumb. For example, if “raising” an existing window means I’m ripped out of my context and taken to the workspace three to the left of my current one, I’d prefer the notification. Ideally, the existing window would move to a region of my current workspace that doesn’t interfere with my cursor.

On mobile, this would be less of an issue because the OSK could hide when switching to the newly-spawned app.

But does this have anything to do with enabling previews in folders, both for video and image? Because I haven’t heard anything about it.