Chrome Remote Desktop Support for GNOME/Wayland

Hi All,

(Cross-posting here from the desktop mailing list, since a helpful community member mentioned that people are not active on the mailing list. Also, another member mentioned that there is already a non-interactive/password-driven option; if that is the case, feel free to provide pointers to it.)

I am looking into supporting CRD on GNOME/Wayland. CRD would leverage the remote desktop APIs (along with screencast) exposed by xdg-desktop-portal{,-gnome}. While experimenting with the remote desktop APIs, I noticed that, for enhanced security, an interactive dialog (relevant code: https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/blob/main/src/screencast.c#L345-349) is always presented to the user to select the sources/devices they want to allow to be remotely controlled. This workflow makes perfect sense when a user is directly at the machine and allows someone remote to take control (e.g. to get help from IT), but it is less than ideal for a user who is trying to access their own machine remotely (e.g. accessing their work computer from home).

I would like to hear ideas from the GNOME community about how best to support the latter use case for remote desktop. Is there a secure way to selectively bypass the dialog/prompt for specific apps? There seems to be recent support for restoring capture streams (if the user previously requested persistence) using the flatpak permission store: https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/merge_requests/14, but remembering streams for remote desktop sessions is explicitly disallowed in the portal frontend (though this doesn’t seem to be GNOME specific). Is it reasonable to extend stream restoration to remote desktop sessions as well, and to pre-populate the permission store to allow remote desktop sessions (so that user intervention can be avoided)? Also, would pre-populating the permission store work for a system installation of CRD, or would it only work for the flatpak/sandboxed version of the app?
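For context on the restore mechanism mentioned above: with the ScreenCast portal, an application requests persistence by passing `persist_mode` in the `SelectSources` options, and on a later session hands back the `restore_token` it received in the results of `Start`. The sketch below only builds the options as a plain Python dict; the D-Bus call itself is not shown. The helper name `select_sources_options` and its `is_remote_desktop` check are hypothetical: the check merely models the frontend restriction described in this post and is not xdg-desktop-portal's actual code.

```python
from enum import IntEnum, IntFlag
from typing import Optional


class SourceType(IntFlag):
    """Bitmask for the `types` option of ScreenCast.SelectSources."""
    MONITOR = 1
    WINDOW = 2
    VIRTUAL = 4


class CursorMode(IntEnum):
    """Values for the `cursor_mode` option."""
    HIDDEN = 1
    EMBEDDED = 2
    METADATA = 4


class PersistMode(IntEnum):
    """Values for the `persist_mode` option (ScreenCast interface version 4+)."""
    NONE = 0           # do not persist (default)
    WHILE_RUNNING = 1  # persist while the application is running
    UNTIL_REVOKED = 2  # persist until explicitly revoked


def select_sources_options(
    types: SourceType = SourceType.MONITOR,
    persist_mode: PersistMode = PersistMode.NONE,
    restore_token: Optional[str] = None,
    is_remote_desktop: bool = False,
) -> dict:
    """Build SelectSources options as a plain dict (hypothetical helper).

    A real D-Bus binding would need each value wrapped in its variant type:
    'u' for the integers, 'b' for `multiple`, 's' for `restore_token`.
    """
    # Hypothetical stand-in for the frontend rule described in the thread:
    # sessions created through RemoteDesktop may not ask for persistence.
    if is_remote_desktop and persist_mode != PersistMode.NONE:
        raise ValueError("remote desktop sessions cannot persist")
    options = {
        "types": int(types),
        "multiple": False,
        # Assumes EMBEDDED is listed in the AvailableCursorModes property.
        "cursor_mode": int(CursorMode.EMBEDDED),
        "persist_mode": int(persist_mode),
    }
    if restore_token is not None:
        # Token returned in the results of a previous Start call.
        options["restore_token"] = restore_token
    return options
```

A caller wanting stream restoration would pass `persist_mode=PersistMode.UNTIL_REVOKED` on first use, store the `restore_token` from the `Start` results, and pass it back next time; if the stored selection is still valid, the backend may then skip the dialog.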

Looking at how other software/systems support screen capture/remote desktop, wlroots/sway allows configuring the output to capture in a config file. I believe the Windows.Graphics.Capture APIs also allow Win32 apps to capture a window/screen without user interaction.

Would love to hear thoughts from the community about the best way forward.

Thanks,
Salman

Right now, there is no complete story for how actual remote login should work, in contrast to what you have already discovered works rather well: sharing your screen. The main reason is that the problems remote login introduces are fundamentally different, and the needed solutions will likely be different too.

For example, when a user shares the screen, they are already at the computer, and there should be no way for an application to take control of the system or share its content without the user having explicit control. As you know, this problem has been solved by making both screen sharing and remote control part of the xdg-desktop-portal API family. A central point of these APIs is that every action where the user shares something critical or personal should be an explicit choice, and we’ve gone to great lengths to design these APIs in ways that deliver this. This means, for example, that screen sharing lets the user explicitly choose the content to be shared, and file sharing the file to be shared, in contrast to on/off permission toggles for “screen access” or “file access”, as the latter tend to work poorly at protecting private data.

So while this works fairly well when someone is in front of the computer and in physical control of it, these solutions don’t work very well when no one is there, which tends to be the case with remote login.

Remote functionality instead comes with its own unique set of problems. With remote login, I see two fundamental things that need careful consideration:

  • Continuing a session remotely must guarantee that the physical session is not unlocked or turned on. Logging into your machine at the office from home should not allow colleagues in the office to see what you are doing.

  • Remote login might have to actually start a new session, as you might not have been logged in to begin with.

I’m not sure the portal APIs are a good fit for these use cases. In both, no user is physically present, and the latter sits on a completely different level of the system than portals: it must be a system component rather than something running inside a user’s session. The level of trust one needs to place in such a component is also fundamentally different from that placed in “applications”, which in a way invalidates the usefulness of portals.

Right now, GNOME Remote Desktop, which I suspect is what the interactive/password comment on the mailing list referred to, only supports the use cases that you already know the portal APIs support, but there are plans to expand its functionality to allow the use cases you describe. The rough plan is this:

  1. Allow remote login to persistently running, fully headless sessions

This would mean a user session would contain a running GNOME session using mutter’s headless mode, in which it does not attempt to interact with input devices or monitors. The only way a user would interact with it is via a remote desktop service. The session would be launched on boot, or manually via Cockpit, ssh or something similar.

This is somewhat possible already: with some effort one can launch a session like this, but it’s not yet properly supported, though hopefully it will be relatively soon. This is the easiest possible solution, as it avoids both problems listed earlier.

  2. Allow making a headfull session headless, and a headless session headfull.

This would mean that a user who logged in to a machine physically could later lock it and, while the session was locked, log in remotely. When logged in remotely, the display server would switch to fully headless mode, making sure input and output devices are not interacted with. Implementing support in the display server for toggling between headless and headfull is probably not very difficult, but it needs to be done.

However, there are a few open questions and unsolved implementation details: taking back the physical session should require unlocking, which means the display server must present an unlock screen while being “headless”. This is more complicated than toggling between headless and headfull, since it involves two isolated presentation spaces that must never accidentally end up in the wrong place.

  3. Allow remote login on multi-user systems.

This is a different type of problem from the other two: it would need a remote desktop service running at the system level that can take care of user selection, logging in, session launching, and handing over remote desktop connections to user sessions. I imagine this needs integration with either the gdm greeter or logind itself, depending on how a login screen should work. Past the login phase, I imagine it would work more or less like step 2 or step 1, depending on whether physical access would be needed.
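Step 2 of the plan rests on a few invariants: remote login happens while the physical session is locked, the session content never reaches the physical console while it is used remotely, and taking back the physical session requires unlocking. A minimal Python model of those rules follows, assuming (as in the scenario described) that remote login is only permitted on a locked session. Every name in it is hypothetical and does not correspond to Mutter or GNOME Remote Desktop code.

```python
from enum import Enum, auto


class Mode(Enum):
    HEADFULL = auto()  # session drives the local monitors and input devices
    HEADLESS = auto()  # session is only reachable through remote desktop


class Session:
    """Hypothetical model of a session that toggles between modes."""

    def __init__(self) -> None:
        self.mode = Mode.HEADFULL
        self.locked = False

    def lock(self) -> None:
        self.locked = True

    def remote_login(self) -> None:
        # Only a locked session may be continued remotely, so the desktop
        # is never visible on the physical console while it is in use.
        if not self.locked:
            raise PermissionError("lock the session before logging in remotely")
        self.mode = Mode.HEADLESS
        self.locked = False  # the remote user authenticated via the remote service

    def reclaim_locally(self, unlocked: bool) -> None:
        # `unlocked` abstracts a successful unlock on the physical console.
        if not unlocked:
            raise PermissionError("unlocking is required to take back the session")
        self.mode = Mode.HEADFULL

    def console_view(self) -> str:
        # While headless, the console gets a separate unlock screen; the
        # session content must never end up in that presentation space.
        if self.mode is Mode.HEADLESS or self.locked:
            return "unlock-screen"
        return "desktop"
```

The `console_view` method is where the hard part Jonas describes would live in a real display server: two presentation spaces that must stay isolated.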

I see three options for wider remote desktop service support beyond the existing screen sharing support via portals:

a) Twist and turn the portal APIs to handle use cases where the user of the API is more of a system component than an application. Personally, I don’t think this is necessarily a good path to take, as it’s not the use case portals were designed to handle.

b) Start using private/unstable GNOME APIs and handle the consequences. Also not a very good choice IMHO: it wouldn’t be cross-DE, and would thus require custom solutions to the same fundamental problem. On the other hand, it’d get access to new features faster, as it’s less hindered by stability promises etc.

c) Create a new API under the org.freedesktop. prefix, and make it an xdg-spec in the xdg/xdg-specs repository on GitLab. Personally, I think this is the best path forward; it might even make GNOME Remote Desktop usable in environments that don’t implement the private GNOME APIs it uses, provided they support this API instead. I imagine such an API would make different assumptions about trust than the portal APIs, while also making it impossible for regular applications to use, ensuring it wouldn’t be abused. I also assume it would use the same lower-level technologies as both the GNOME APIs and portals, such as PipeWire for the actual screen casting, and eventually libei for virtual input devices. I have no such API sketched out anywhere other than in my head, but if there is interest, I could scribble it down.


Thank you for the thoughtful response, Jonas.

Remote functionality instead comes with its own unique set of problems. With remote login, I see two fundamental things that need careful consideration:

  • Continuing a session remotely must guarantee that the physical session is not unlocked or turned on. Logging into your machine at the office from home should not allow colleagues in the office to see what you are doing.

Agreed. CRD refers to it as “curtain mode”.

The rough plan is this:

  1. Allow remote login to persistently running, fully headless sessions
    …
  2. Allow making a headfull session headless, and a headless session headfull.
    …
  3. Allow remote login on multi-user systems.

That sounds reasonable. Thanks for the details around it and the intricacies involved. IIUC, #2 will allow the user to continue the remote session from where they left off in physical session (and vice versa)?

I see three options for wider remote desktop service support beyond the existing screen sharing support via portals:

a) Twist and turn the portal APIs to handle use cases where the user of the API is more of a system component than an application. Personally, I don’t think this is necessarily a good path to take, as it’s not the use case portals were designed to handle.

b) Start using private/unstable GNOME APIs and handle the consequences. Also not a very good choice IMHO: it wouldn’t be cross-DE, and would thus require custom solutions to the same fundamental problem. On the other hand, it’d get access to new features faster, as it’s less hindered by stability promises etc.

c) Create a new API under the org.freedesktop. prefix, and make it an xdg-spec in the xdg/xdg-specs repository on GitLab. Personally, I think this is the best path forward; it might even make GNOME Remote Desktop usable in environments that don’t implement the private GNOME APIs it uses, provided they support this API instead. I imagine such an API would make different assumptions about trust than the portal APIs, while also making it impossible for regular applications to use, ensuring it wouldn’t be abused. I also assume it would use the same lower-level technologies as both the GNOME APIs and portals, such as PipeWire for the actual screen casting, and eventually libei for virtual input devices…

I would also advocate for a portable, generic and secure solution, i.e. option “c” above. However, I am also curious to know more about which private GNOME APIs are referred to in option “b”.

… I have no such API sketched out anywhere other than in my head, but if there is interest, I could scribble it down.

Thanks for volunteering here. I would definitely be interested in seeing more details on what this new API would look like.


That sounds reasonable. Thanks for the details around it and the intricacies involved. IIUC, #2 will allow the user to continue the remote session from where they left off in physical session (and vice versa)?

Right, that’s the idea.

I would also advocate for a portable, generic and secure solution, i.e. option “c” above. However, I am also curious to know more about which private GNOME APIs are referred to in option “b”.

The APIs I’m talking about are org.gnome.Mutter.RemoteDesktop and org.gnome.Mutter.ScreenCast. They are used by GNOME Remote Desktop to implement the built-in screen sharing support, and by xdg-desktop-portal-gnome to implement the portal screen casting backend.
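For anyone wanting to look at these private interfaces directly, GLib’s `gdbus introspect` can dump them from a running GNOME session. The sketch below only builds and runs that command; the helper names are hypothetical, and the object paths are assumed to mirror the bus names (e.g. `/org/gnome/Mutter/ScreenCast`). Since these APIs are private and unstable, whatever methods the output lists may change between GNOME releases.

```python
import subprocess
from typing import List

# Bus names of Mutter's private APIs mentioned above.
MUTTER_APIS = {
    "ScreenCast": "org.gnome.Mutter.ScreenCast",
    "RemoteDesktop": "org.gnome.Mutter.RemoteDesktop",
}


def introspect_argv(api: str) -> List[str]:
    """Build a `gdbus introspect` command line for one of the APIs."""
    name = MUTTER_APIS[api]
    # Assumption: the object path mirrors the bus name.
    path = "/" + name.replace(".", "/")
    return ["gdbus", "introspect", "--session", "--dest", name, "--object-path", path]


def introspect(api: str) -> str:
    """Run the command; only works inside a running GNOME session."""
    result = subprocess.run(
        introspect_argv(api), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `introspect("RemoteDesktop")` on a GNOME desktop prints the interface XML; elsewhere it raises, since no such service is on the session bus.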

When transitioning to headless mode, could the display server disconnect from the local console altogether and kick back to the display manager? Then if the same user later logged in locally to the display manager, any ongoing remote session could be terminated and the display server reattached to the console. It seems like some of the needed display manager ↔ session communication is already present for user switching, and it would relieve display servers of the complexity of maintaining a single window with different inputs and outputs.


That should be possible, yes. It would indeed avoid the complexity of, e.g., showing a lock screen on a user session while the actual session is accessed remotely.
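The display-manager handoff suggested in this exchange can be expressed as a small, hypothetical Python model in which only one party ever owns the physical console. The class and method names are illustrative only, and the case of a different user logging in locally is deliberately left out, since the thread does not discuss it.

```python
from dataclasses import dataclass


@dataclass
class Seat:
    """Hypothetical model of the display-manager handoff idea.

    Instead of drawing its own unlock screen while headless, the user's
    display server releases the physical console to the display manager.
    """
    console_owner: str = "user-session"
    remote_connected: bool = False

    def start_remote(self) -> None:
        # Going headless: detach from the console and kick back to the DM.
        self.console_owner = "display-manager"
        self.remote_connected = True

    def same_user_logs_in_locally(self) -> None:
        # The DM authenticated the same user: end the remote connection
        # and reattach the still-running session to the console.
        if self.console_owner != "display-manager":
            raise RuntimeError("console is not held by the display manager")
        self.remote_connected = False
        self.console_owner = "user-session"
```

The design benefit shows up in what is missing: the user session never needs a second presentation space for an unlock screen, because authentication on the console is the display manager’s job.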
