Sandboxing portal

Opening a thread here to follow up on discussions at GUADEC with @matthiasc @alexl @jamesh @kenvandine @chergert @hadess Olivier Tilloy, myself and whoever else I was able to corner. For context, at present Flatpak totally blocks the syscalls necessary to set up your own filesystem/user namespace (see https://github.com/flatpak/flatpak/blob/master/common/flatpak-run.c#L2476-L2480) and necessarily blocks setuid binaries.

Chromium’s sandboxing relies on either being able to directly open a user namespace (ie the system has unprivileged user namespaces enabled) or relying on its setuid helper to do so. If both of these fail, Chromium errors out. For the time being this has precluded running Chromium inside a Flatpak without either patching out the sandbox code, which seems deeply irresponsible, or passing --no-sandbox, which earns you a very scary warning.
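
A minimal sketch of the first check described above, assuming a Linux system with Python 3 and a libc that exports unshare(). The function name is made up for illustration and this is not how Chromium itself probes. The attempt happens in a forked child so the caller’s own namespaces are never touched.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def can_create_user_namespace() -> bool:
    """Return True if an unprivileged user namespace can be created here."""
    pid = os.fork()
    if pid == 0:
        # Child: try the call and report the outcome via the exit status.
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER) == 0:
                status = 0
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    print("user namespaces available:", can_create_user_namespace())
```

Inside a Flatpak the seccomp filter should make this report False, which is the situation that leaves Chromium with only the setuid helper, and that is blocked too.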

It seems recently that Electron has either updated to a newer Chromium or changed their build defaults, which means that various third-party Electron apps are starting to fail to run inside Flatpak too. This is sad.

Although I have not read the code, speaking with @jamesh and @kenvandine it seems that snapd has a special consideration for browsers, based off a whitelist or a rarely-given privilege, that trusts them to create user namespaces. It wasn’t clear how Electron apps avoid the same fate, given they don’t possess this privilege, but (conjecture) Electron builder itself has a mode where it outputs snaps, so perhaps this also disables the sandboxing.

Similarly, we also have the oddly-inverted situation that certain operations (eg thumbnailing, see https://gitlab.gnome.org/GNOME/gnome-desktop/blob/master/libgnome-desktop/gnome-desktop-thumbnail-script.c#L766-770) which should/could be very heavily sandboxed are run with no additional confinement when they are executed in a Flatpak (and presumably snap?) context.

@refi64 has a proof-of-concept patch to Chromium which shells out to flatpak-spawn at the time it would otherwise call the setuid sandbox helper, but it strikes me that this Flatpak-specific approach is the wrong way to do it and isn’t likely to be upstream-able, so it wouldn’t make it very far through the Electron ecosystem either.

I think if we work on a shared sandboxing portal API which both Flatpak and snapd can implement, we can remove any special-casing for browsers and make a consistent/united case towards toolkit and app developers about how stricter sandboxing should be accessed for confined apps, improving security and our chances of getting one “golden” (ie works everywhere) patch upstream.

My plan would be something like:

  • Discuss the Flatpak sandboxing API and how it could be made generic to work with snapd as well (see https://github.com/flatpak/flatpak/blob/master/data/org.freedesktop.portal.Flatpak.xml)
  • Implement the agreed API all round
  • Re-implement our Chromium patch in terms of calling the portal API over D-Bus and submit upstream
  • Submit to Electron in parallel if that lets us stop breakage sooner
  • Push towards using that API in Firefox as well
  • Drop browser hacks all round
  • Add a “sandbox spawn” API to GLib that could use this or fall back to eg bwrap/seccomp when run on a Linux system (other platforms may be available)
  • Celebrate improved desktop security

Also add WebKitGTK to that list :stuck_out_tongue:

So I wanted to use flatpak-spawn for our purposes there and there were a few limitations that I hope a replacement will cover:

  • Ability to specify directories to mount in the sandbox. Currently it just gives you a shared dir which isn’t ideal.
  • More granular disable toggles. Currently it just has --sandbox and --no-network. I would like --no-x11, --no-dbus (except I do want xdg-desktop-portal access so maybe --filtered-dbus or something, I realize that is probably WebKitGTK specific since Chromium doesn’t care about the portals), etc.

Add a “sandbox spawn” API to GLib

My only concern with this is that it’s impossible to automatically run a sandbox using host data in a way that covers all usage, so you’ll have to make applications set their own mounts, at which point it is a very thin wrapper that doesn’t hide many details. Maybe that is OK but I don’t know if it has a ton of value.

:tada:

I suspect that we won’t be able to (like flatpak-spawn does) start with the assumption that “everything shared will be the same as flatpak does, apart from…” because we need something which can be implemented the same in snapd, and it has a different approach to structuring its filesystem etc.

I am a little puzzled though (and my apologies for not making it to your talk; it’s on my “wait for the video” list) - which part of WebKitGTK wants access to portals and filesystems? Does it not have a UI / renderer split similar to Chromium where the renderer is very unprivileged and so quite a reasonable sandbox boundary?

In a GLib API that might, eg, want to fall back to executing without a sandbox where the OS/environment limits the available techniques, an “assume very little, add things incrementally” approach, built on the set of sandbox controls that Flatpak and snap have in common, might work OK.
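
To make the “assume very little, add things incrementally” idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: SandboxRequest, its field names and unsupported_grants() are invented for illustration and do not correspond to any existing Flatpak, snapd or GLib API. The request starts with nothing granted, and a backend can say which of the requested grants it cannot honour.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SandboxRequest:
    """Hypothetical request: everything is denied unless explicitly added."""
    network: bool = False
    display: bool = False
    portal_dbus: bool = False
    read_only_paths: tuple = ()

    def allow(self, **grants):
        # Return a copy with the named boolean grants switched on.
        return replace(self, **grants)

    def expose_read_only(self, path):
        # Return a copy with one more read-only path exposed.
        return replace(self, read_only_paths=self.read_only_paths + (path,))


def unsupported_grants(request, backend_supports):
    """Names of requested grants that a given backend cannot provide."""
    requested = {name for name in ("network", "display", "portal_dbus")
                 if getattr(request, name)}
    if request.read_only_paths:
        requested.add("read_only_paths")
    return sorted(requested - set(backend_supports))


# Example: a renderer that wants display access and its font directory.
request = SandboxRequest().allow(display=True).expose_read_only("/usr/share/fonts")
print(unsupported_grants(request, {"display"}))
```

A caller could use the returned list to decide between failing, or running with a weaker sandbox where that is acceptable.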

Some random side notes that might be interesting:

  • We’d need to be very careful to make sure the sandboxes are consistent with each other. For instance, a fallback sandbox’s PID namespace would probably be a child of the caller’s, but with flatpak-spawn they’re fully isolated and in parallel.
  • Apparently, you can still use seccomp/bpf sandboxes in Flatpak? I was kinda surprised that it worked, but sure enough it did. Nowadays, that alone is a pretty powerful sandboxing technique; heck, the only thing missing on top of it in Chrome that requires us to call out to flatpak-spawn is the lack of unshare.
  • More granular controls like @pgriffis said would be nice IMO (right now the GPU process needs to run unsandboxed because otherwise it will fail to access the Xorg display), but it might be a hard sell to upstream Flatpak.

It is split to a degree but the WebKitWebProcess still has GTK usage.

As for file system access applications can embed arbitrary plugins in the process (WebKitWebExtensions) and we allow giving them limited filesystem access. WebKit has a few internal directories we grant access to also that are less arbitrary but still don’t fit into the limitations of flatpak-spawn.

I think in terms of our patch for Chromium/Electron we just need to make sure that the code interacting with the sandboxed PIDs is able to work with a different PID namespace. Ie make this an assumption of the API, that a new PID namespace might be unavoidable.

Yes, they can be composed safely because essentially you can’t later permit something that was earlier (by a more privileged process that launched you) denied. You can only add them and deny more stuff.
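
Assuming “they” here means stacked seccomp filters, the composition rule can be modelled in a few lines of Python. This is only a model, not real seccomp code: the filters are plain dictionaries invented for illustration. The ordering of actions follows the precedence documented in seccomp(2), where every installed filter is evaluated and the most restrictive result is applied.

```python
# Seccomp actions from most to least restrictive. When several filters are
# installed, the kernel runs all of them and applies the most restrictive one.
PRECEDENCE = ["KILL_PROCESS", "KILL_THREAD", "TRAP", "ERRNO",
              "USER_NOTIF", "TRACE", "LOG", "ALLOW"]


def effective_action(filters, syscall):
    """Combine stacked filters: the most restrictive verdict wins."""
    verdicts = [f.get(syscall, "ALLOW") for f in filters]
    return min(verdicts, key=PRECEDENCE.index, default="ALLOW")


# Hypothetical filters, for illustration only.
outer = {"unshare": "ERRNO", "mount": "ERRNO"}          # set by the launcher
inner = {"unshare": "ALLOW", "socket": "KILL_PROCESS"}  # added later by the app

for name in ("unshare", "socket", "read"):
    print(name, effective_action([outer, inner], name))
```

The inner filter “allowing” unshare changes nothing, because the outer filter already denied it; the inner filter can only take more away, as it does for socket.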

Well, @alexl is here (or can be) and I am hoping to get @jamesh and Snap folks to weigh in soon. But, I think “this needs display access” or - perhaps more nuanced, considering we have Wayland-or-X11-fallback behaviour in Flatpak - “inherit my display access” is not an unreasonable ask for the API.

Hi. Sorry for dropping the ball on this. I have discussed the basic requirements of this API with some of the core snapd developers (Zygmunt Krynicki in particular). At present it is not trivial for us to support the API, since the mount namespaces, AppArmor profiles and seccomp filters for snaps are generated at package install/upgrade time.

I’m at a sprint with the snapd folk this week, and we have a few meetings scheduled to discuss it. I will keep you guys updated on what happens.

Hello. This is Zygmunt from the snapd team. We discussed this with @jamesh and we’d like to work together towards making this a reality. It’s not something we can easily deliver, mainly because the confinement models differ, but we’d like to have a deeper conversation on how to resolve the problems, issue by issue, so that we can come to a solution that is practical and sensible for everyone.

James is much more familiar with the flatpak implementation and he will be the primary contact point. I’m just saying hi, offering to answer snap-side questions and to help if necessary.

Following on from @zyga’s post, there were a few issues that came up in discussion:

  1. How dynamic do these sub-sandboxes need to be? With snapd’s current model, it would be relatively easy to configure additional confinement profiles for a snap on install, and then wire up the common API to allow use of these profiles. If the app requested a sandbox configuration that had not been configured, the call would fail.

    This seems like it would easily cover things like “no network” or “locked down filesystem access”. I’m less certain about sharing individual files with a subprocess, since I assume it would be fairly common to run multiple similar subprocesses with access to different sets of files.

  2. I think we need to be a bit more rigorous about what the sandbox restrictions actually mean. For example, “no network” could mean any of the following:

    • a seccomp filter will abort the process if it attempts to use any networking related system calls.
    • an LSM will block the creation or use of sockets for certain address families (e.g. even if it is one of the file descriptors passed to the subprocess).
    • there are no restrictions on creating sockets, but the process is trapped in a network namespace that isolates it from the world.

    This doesn’t necessarily mean that all confinement systems need to implement their restrictions in the same way, but it is important that app developers know what to expect.

  3. Do we have an initial batch of apps acting as use cases for this portal API that we could experiment with to test out various implementation strategies?
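
As a rough illustration of point 2, the sketch below probes which flavour of “no network” a process is seeing. It is a sketch under assumptions: the function name and the labels it returns are made up, and the mapping from errno values to a cause is a heuristic rather than a guarantee. It connects a UDP socket to an address from the reserved documentation range, which only performs a route lookup and sends no packets.

```python
import errno
import socket


def classify_network_restriction(probe_addr=("192.0.2.1", 9)):
    """Guess how networking is restricted for the current process."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    except OSError as exc:
        # Socket creation refused: typical of an LSM rule or a seccomp
        # filter that returns an error instead of killing the process.
        if exc.errno in (errno.EPERM, errno.EACCES, errno.EAFNOSUPPORT):
            return "denied"
        return "unknown"
    with sock:
        try:
            sock.connect(probe_addr)  # UDP connect: route lookup only
        except OSError as exc:
            if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
                # Sockets work but nothing is routable, as in an
                # otherwise empty network namespace.
                return "isolated"
            if exc.errno in (errno.EPERM, errno.EACCES):
                return "denied"
            return "unknown"
    return "unrestricted"


if __name__ == "__main__":
    print(classify_network_restriction())
```

A seccomp filter that aborts the process never returns to the caller at all, so that case cannot be detected in-process. That difference in observable behaviour is exactly why the meaning of each restriction needs to be written down for app developers.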


Speaking just for WebKit: only filesystem access is dynamic; other permissions can be known ahead of time.

Personally that is what I’d like to see and it is how all 3 web engines currently limit network access.

At the moment, snapd’s networking restrictions are based around a combination of the first two, which is why I brought this particular issue up. An application designed to avoid all network calls will work in both types of sandboxes, while one that assumes other hosts will simply be unreachable won’t. Are they choosing network namespaces because that’s what they need, or is it just because that’s what happens to be possible as an unprivileged user?

So nailing down what the guarantees of each of these sandbox restrictions mean is important.

Well, they choose it because it’s the best API for the job. Aborting the process any time it tries to use a network related syscall is absolutely going to cause problems.

Olivier here, hi @ramcq and everyone, and sorry for being late to the party.

I wasn’t sure myself how this worked, but from the look of things electron-builder’s snap mode defaults to requesting connection to the browser-support interface, but with allow-sandbox set to false (see isBrowserSandboxAllowed(…) function).

allow-sandbox: true requires the snap publisher to be trusted for auto-connection to be allowed. This is typically the case for browser vendors, not for random electron app publishers.

Hi all, I was reminded of this issue by Skype removing the older .deb (which didn’t use Electron sandboxing) from their archive, breaking the Flatpak unless we deploy nasty hacks (https://github.com/flathub/com.skype.Client/pull/77). But it seems like the electron-builder snap mode essentially does the same (quietly disables sandboxing for Electron) so maybe this is just the status quo at present. :slight_smile:

In terms of requirements, I think we can maybe be even more tactical here and set our initial requirements based on precisely the privileged operation that the Chromium sandbox attempts to do, but is not able to due to the seccomp filter. I think this will start with a very restricted set, and then we would need to test that against additional use-cases such as WebKit Gtk+ (although including Gtk+ in a sandboxed part and requiring filesystem access seems like a very odd sandbox boundary to me).

As a maybe dumb/naive question - if this portal was implemented specifically in terms of bwrap, is there a way that snapd could start a bwrap instance (ie potentially opening some different filesystem/process/network/etc namespaces) that inherited all of the restrictions that a particular snap has? So there wouldn’t need to be any dynamism in snap’s policies beyond what it was already restricting, just additional degrees of freedom (ie, restriction) based on what was dynamically enabled by bwrap?

The reason this appeals to me is that a very reasonable idiom for sandboxed code (eg in GdkPixbuf loading images, or generating thumbnails, or whatever) could then be:

if (sandboxed)
  ask_the_bwrap_portal (restrictions);
else
  run_bwrap (restrictions);
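
A slightly more concrete version of that idiom, sketched in Python. Since the shared portal does not exist yet, flatpak-spawn --sandbox stands in for the portal call here, which is an assumption for illustration only. Detecting Flatpak by the presence of /.flatpak-info is the usual convention; the helper names and the particular set of bwrap options are my own choices, not an agreed policy.

```python
import os


def in_flatpak(root="/"):
    """Flatpak places a .flatpak-info file at the root of the sandbox."""
    return os.path.exists(os.path.join(root, ".flatpak-info"))


def sandboxed_argv(command, no_network=True, inside_flatpak=None):
    """Build the argv that runs `command` with extra confinement."""
    if inside_flatpak is None:
        inside_flatpak = in_flatpak()
    if inside_flatpak:
        # Already confined: ask the Flatpak portal for a sub-sandbox.
        argv = ["flatpak-spawn", "--sandbox"]
        if no_network:
            argv.append("--no-network")
    else:
        # Not confined: set up the namespaces ourselves with bwrap.
        argv = ["bwrap", "--ro-bind", "/", "/", "--dev", "/dev",
                "--proc", "/proc", "--unshare-pid", "--die-with-parent"]
        if no_network:
            argv.append("--unshare-net")
    return argv + list(command)


if __name__ == "__main__":
    # "thumbnailer" is a placeholder command name.
    print(sandboxed_argv(["thumbnailer", "in.png"]))
```

The returned list can be handed to subprocess.run() unchanged. A snap branch would slot in alongside the Flatpak one once snapd has an equivalent entry point.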

GTK is used by the web process because that’s the only practical way to render widgets with GTK style. I think it would be pretty hard to exploit WebKit via the GTK theme support, and anyway, if you don’t trust GTK, then it’s better to use it sandboxed than not. That said, we’ll be dropping the GTK support for GTK 4 regardless due to theme changes and dark mode problems (Firefox is considering this too for the same reasons).

The web process certainly does not require host filesystem access beyond the usual paths also whitelisted by flatpak; that would defeat the point of having a sandbox.