Right now, there is no complete story for how actual remote login should work, in contrast to something you have already discovered works rather well: sharing your screen. The main reason is that the problems the two use cases introduce are fundamentally different, and the solutions they need will likely be different too.
For example, when a user shares their screen, they are already at the computer, and there must be no way for an application to take control of the system, or share its content, without the user being in explicit control. As you know, this problem has been solved by making both screen sharing and remote control part of the xdg-desktop-portal API family. A central point of these APIs is that every action where the user shares something critical or personal should be an explicit choice, and we’ve gone to great lengths to design the APIs so that they deliver this. For example, screen sharing lets the user explicitly pick the content to be shared, and file access the file to be shared, in contrast to on/off permission toggles for “screen access” or “file access”, since the latter tend to work poorly at protecting private data.
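To make the design difference concrete, here is a toy Python model (purely illustrative, not the actual portal implementation or D-Bus API) contrasting a blanket “screen access” toggle with the portal style of per-request, user-mediated selection:

```python
# Toy model of the two permission designs described above. All class
# and parameter names are invented for illustration.

class BlanketToggle:
    """Once the toggle is on, the app can read any source, unprompted."""
    def __init__(self, granted):
        self.granted = granted

    def capture(self, source):
        return source if self.granted else None


class PortalStyle:
    """Every capture request goes through an explicit user choice."""
    def __init__(self, user_choice):
        # user_choice simulates the selection dialog: given the
        # requested sources, it returns the ones the user actually
        # picked for this one request (possibly none).
        self.user_choice = user_choice

    def capture(self, requested):
        picked = self.user_choice(requested)
        return [s for s in requested if s in picked]


# With a blanket toggle, a granted app can read any source unprompted:
toggle = BlanketToggle(granted=True)
assert toggle.capture("secret-window") == "secret-window"

# Portal style: the user picked only one window, so only that is shared:
portal = PortalStyle(user_choice=lambda req: ["browser-window"])
assert portal.capture(["browser-window", "secret-window"]) == ["browser-window"]
```

The point of the model is that in the portal design the grant is scoped to the content of one request, so there is nothing an application can silently capture later.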
So while this works fairly well when someone is in front of the computer and in physical control of it, these solutions don’t work very well when no one is there, which tends to be the case with remote login functionality.
Instead, remote login comes with its own unique set of problems. I see two fundamental things that need careful consideration:
- Continuing a session remotely must guarantee that the physical session is not unlocked or turned on. Logging into your office machine from home should not let colleagues in the office see what you are doing.
- Remote login might have to actually start a new session, as you might not have been logged in to begin with.
I’m not sure the portal APIs are a good fit for these use cases - both have no user physically present, and the latter sits on a completely different level in the system than portals do; it must be a system component rather than something running inside a user’s session. The level of trust one needs to place in such a component is also fundamentally different from that placed in “applications”, which in a way invalidates the usefulness of portals here.
Right now, GNOME Remote Desktop, which I suspect is what was being referred to regarding the interactive/password things in the mailing list, only supports the use cases you already know are supported by the portal APIs, but there are plans to expand its functionality to allow the use cases you describe. The rough plan is this:
- Allow remote login to fully persistently running headless sessions
This would mean a user session would contain a running GNOME session using mutter’s headless mode, in which it does not attempt to interact with input devices or monitors. The only way a user would interact with it is via a remote desktop service. The session would be launched on boot, or manually via Cockpit/ssh or something similar.
This is somewhat possible already; with some effort one can launch a session like this, but it’s not yet properly supported. Hopefully it will be relatively soon. This is the easiest possible solution, as it avoids both problems listed earlier.
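To give an idea of what “launched on boot” could look like, here is a hypothetical systemd user service for such a headless session. Every name, path, and flag below is an invented assumption for illustration; this is not a supported configuration today:

```ini
# HYPOTHETICAL sketch of a headless-session user unit.
# ~/.config/systemd/user/gnome-headless-session.service
[Unit]
Description=Persistent headless GNOME session (sketch)

[Service]
# Assumed entry point that starts GNOME with mutter in headless mode;
# the actual mechanism and session name are not defined yet.
ExecStart=/usr/bin/gnome-session --session=gnome-headless
Restart=on-failure

[Install]
WantedBy=default.target
```

Starting it at boot, rather than at login, would additionally need the user’s manager to linger (`loginctl enable-linger <user>`), since user units otherwise only run while the user has an active session.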
- Allow making a headful session headless, and a headless session headful.
This would mean a user logged in to a machine physically, later locked it, and, while the session was locked, logged in remotely. When logged in remotely, the display server would switch to fully headless mode, making sure input and output devices are not interacted with. Implementing support in the display server for toggling between headless and headful is probably not very difficult, but it needs to be done.
However, there are a few open questions and unsolved implementation details here: taking back the physical session should require unlocking, which means the display server must present an unlock screen while being “headless”. This is more complicated than just toggling between headless and headful, since it involves two isolated presentation spaces whose content must not accidentally end up in the wrong place.
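The invariant described above can be sketched as a small state machine. This is a toy illustration of the rules, not mutter’s actual design; all names are invented:

```python
# Toy state machine for the headless/headful toggle: remote login on a
# locked session goes headless, and reclaiming the physical seat must
# pass through an unlock step before the session goes headful again.

class Session:
    def __init__(self):
        self.mode = "headful"   # "headful" or "headless"
        self.locked = False

    def lock(self):
        self.locked = True

    def remote_login(self):
        # Continuing a session remotely must not expose it physically,
        # so it is only allowed while the session is locked.
        if not self.locked:
            raise RuntimeError("remote takeover of an unlocked session")
        self.mode = "headless"

    def physical_takeover(self, unlock_ok):
        # Taking the session back at the physical seat presents an
        # unlock screen first; only a successful unlock goes headful.
        if self.mode == "headless" and not unlock_ok:
            return False
        self.mode = "headful"
        self.locked = False
        return True


s = Session()
s.lock()
s.remote_login()
assert s.mode == "headless"
assert s.physical_takeover(unlock_ok=False) is False  # stays headless
assert s.mode == "headless"
assert s.physical_takeover(unlock_ok=True) is True
assert s.mode == "headful" and not s.locked
```

The tricky part the text points at is exactly the `physical_takeover` path: the unlock screen has to be rendered on the physical outputs while everything else stays in the headless presentation space.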
- Allow remote login on multi-user systems.
This is a different type of problem than the other two; it would need a remote desktop service running at the system level that can take care of user selection, logging in, and session launching, as well as handing over the remote desktop connection to user sessions. I imagine this needs integration either with the gdm greeter or with logind itself, depending on how a login screen should work. As for how it would work past the login phase, I imagine it would be more or less the same as step 2 or step 1, depending on whether physical access is needed.
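The handover flow could be sketched roughly like this. This is a toy Python model of the control flow only; the authentication backend (PAM via gdm/logind) and the connection handover mechanism are assumptions, and every name is invented:

```python
# Toy sketch of a system-level remote desktop service that
# authenticates a user at a greeter and then hands the live connection
# over to that user's session (which would be a step-1 or step-2
# session from the plan above).

class SystemRemoteDesktop:
    def __init__(self, authenticate, launch_session):
        self.authenticate = authenticate      # stand-in for PAM/gdm/logind
        self.launch_session = launch_session  # returns a per-user session

    def handle_connection(self, conn, user, password):
        if not self.authenticate(user, password):
            return None                       # stay at the greeter
        session = self.launch_session(user)
        session.take_over(conn)               # hand over the connection
        return session


# Minimal fakes to exercise the flow:
class FakeSession:
    def __init__(self, user):
        self.user = user
        self.conn = None

    def take_over(self, conn):
        self.conn = conn


svc = SystemRemoteDesktop(
    authenticate=lambda user, password: password == "hunter2",
    launch_session=FakeSession,
)
assert svc.handle_connection("conn-1", "alice", "wrong") is None
session = svc.handle_connection("conn-1", "alice", "hunter2")
assert session.user == "alice" and session.conn == "conn-1"
```

In a real implementation the interesting part is the `take_over` step: the system service has to pass an already-established, already-encrypted connection into an unprivileged user session without either side trusting the other more than necessary.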
I see three options for wider remote desktop service support beyond the existing screen sharing support via portals:
a) Twist and turn the portal APIs to handle use cases where the user of the API is more of a system component than an application. Personally I don’t think this is a good path to take, as it’s not the kind of use case portals were designed to handle.
b) Start using private/unstable GNOME APIs and handle the consequences. Also not a very good choice IMHO: it wouldn’t be cross-DE, and would thus require custom solutions to the same fundamental problem. On the other hand, it’d get access to new features faster, as it’s less hindered by stability promises and the like.
c) Create a new API under the org.freedesktop. prefix, and make it an xdg spec under https://gitlab.freedesktop.org/xdg/xdg-specs/. Personally, I think this is the best path forward; it might even make GNOME Remote Desktop usable in places that don’t implement the private GNOME APIs it currently uses, if they were to support this API. I imagine such an API would make different assumptions about trust compared to the portal APIs, while also being impossible for regular applications to use, ensuring it wouldn’t be abused. I also assume it would build on the same lower-level technologies as both the GNOME APIs and the portals, such as PipeWire for the actual screen casting and, eventually, libei for virtual input devices. I have no such API sketched out anywhere other than in my head, but if there is interest, I could scribble it down.
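Purely to make the shape of option c) concrete, a system-level D-Bus interface could look something like the following. To be clear, no such spec exists yet; every interface, method, and argument name here is invented for illustration:

```xml
<!-- HYPOTHETICAL: an invented sketch, not a proposed spec. -->
<node>
  <interface name="org.freedesktop.RemoteDesktop1.Manager">
    <!-- System-level entry point; callers would be trusted system
         components, enforced e.g. via D-Bus policy, so regular
         applications cannot use it. -->
    <method name="CreateSession">
      <arg type="a{sv}" name="options" direction="in"/>
      <arg type="o" name="session_path" direction="out"/>
    </method>
  </interface>

  <interface name="org.freedesktop.RemoteDesktop1.Session">
    <!-- Screen content would be carried over PipeWire; input would
         eventually go through libei, as mentioned above. -->
    <method name="ConnectToPipeWireRemote">
      <arg type="h" name="fd" direction="out"/>
    </method>
    <method name="Close"/>
  </interface>
</node>
```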