Gdbus behavior cgroup freeze

adrianvovk · March 12, 2024, 5:18pm

Hello! I’m working on integrating systemd-homed w/ GNOME as part of the STF grant. systemd-homed will freeze the user session whenever it is locked, and I want to avoid some issues that may come up from that.

Here’s my question: Let’s say I have a gdbus client that calls some method on some server running outside of the session, then the client is frozen via a cgroup freezer or some other mechanism in the kernel. While the client is frozen the server processes the method call and returns a result or error. The dbus broker will queue this response back to the client, but the client is frozen and thus won’t immediately process it. Eventually, potentially hours later, the client is unfrozen by the cgroup freezer and can start processing events again. Will there be issues that come up because of this sequence of events?

My concern is timeouts. Say the method call is given a 25 second timeout and the client is frozen for 4 hours; the kernel holds the method reply in the socket buffer, waiting for the client to read it out. Upon resume, we get unlucky and the timeout fires before the POLLIN handler for the dbus socket. The monotonic clock continues to run during a cgroup freeze, so the timeout source func will see that it’s been way longer than 25 seconds and return a timeout error for the dbus method call. Then the POLLIN handler will fire and read out the actual method reply, but by then it’s too late. How feasible is this scenario?
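To make the scenario concrete, here is a rough sketch in plain Python with no gdbus involved. FakeClock and PendingCall are hypothetical names made up for illustration; the only assumption is that the deadline is computed once, when the call is sent, from a monotonic clock that keeps advancing while the client is frozen.

```python
class FakeClock:
    """Stand-in for a monotonic clock that keeps ticking during a freeze."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class PendingCall:
    """Hypothetical model of one in-flight method call with a timeout."""

    def __init__(self, clock, timeout):
        self.clock = clock
        self.deadline = clock.now + timeout  # fixed when the call is sent

    def timeout_expired(self):
        return self.clock.now >= self.deadline


clock = FakeClock()
call = PendingCall(clock, timeout=25)

clock.advance(1)            # server replies after 1 s; reply sits in the socket buffer
reply_buffered = True
clock.advance(4 * 60 * 60)  # client frozen for 4 hours, clock keeps running

# After the thaw both conditions hold at once, so whichever handler
# the main loop happens to run first decides the outcome of the call.
print(call.timeout_expired(), reply_buffered)
```

The point of the sketch is only that, after the thaw, the timeout and the readable socket are both "ready" at the same instant, so the result depends entirely on dispatch order.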

I suspect a possible fix is to give the POLLIN source func a higher priority than the timeout, to ensure that all events are read out of the dbus socket before the timeout can fire. I tried to quickly check if this is already the case in gdbus; it seems like it’s not, but I don’t fully grok gdbus’s internals, so I don’t think that’s a conclusive answer. Neither the G_IO_IN source func nor the timeout source func has any kind of priority set on it. So I’m asking here.
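As a rough illustration of the priority idea, here is a toy dispatcher in plain Python. It is not how gdbus or GLib is implemented; dispatch_order and the source names are hypothetical, and the model assumes that sources with equal priority run in the order they were attached. The only thing borrowed from GLib is the convention that a lower number means a higher priority.

```python
# GLib convention: a lower number means a higher priority
# (G_PRIORITY_HIGH is -100, G_PRIORITY_DEFAULT is 0).
PRIORITY_HIGH = -100
PRIORITY_DEFAULT = 0


def dispatch_order(ready_sources):
    """Return source names in the order a priority-based loop would run them.

    ready_sources is a list of (priority, name) tuples for sources that are
    all ready at the same moment, e.g. right after a thaw.
    """
    # sorted() is stable, so equal priorities keep their attachment order
    # (an assumption of this toy model).
    return [name for _prio, name in sorted(ready_sources, key=lambda s: s[0])]


# Both sources at the default priority: the timeout was attached first and wins.
same = [(PRIORITY_DEFAULT, "timeout"), (PRIORITY_DEFAULT, "socket-readable")]
# Proposed fix: the socket source gets a higher priority than the timeout.
fixed = [(PRIORITY_DEFAULT, "timeout"), (PRIORITY_HIGH, "socket-readable")]

print(dispatch_order(same))
print(dispatch_order(fixed))
```

In this model the fix only helps if the socket handler also completes the pending call before the timeout handler gets to run, which is the part I’m unsure about in the real gdbus code.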

Thanks

adrianvovk · March 12, 2024, 5:55pm

Just gave it a test. gdbus doesn’t seem to handle this correctly

Here’s my test setup:

#!/usr/bin/python3
from gi.repository import GLib
import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from time import sleep

class Example(dbus.service.Object):
    def __init__(self, object_path):
        dbus.service.Object.__init__(self, dbus.SessionBus(), object_path)

    @dbus.service.method(dbus_interface='com.example.Sample', out_signature='s')
    def Ping(self):
        print("Sleeping...")
        sleep(5)
        print("Replying...")
        return "pong"

if __name__ == '__main__':
    # Install the GLib main loop integration before creating any bus connection
    DBusGMainLoop(set_as_default=True)

    session_bus = dbus.SessionBus()
    # Keep references so the bus name and the exported object stay alive
    name = dbus.service.BusName("com.example.Sample", session_bus)
    example = Example('/com/example/Sample')

    mainloop = GLib.MainLoop()
    mainloop.run()

Then on the command-line, with different possible TIMEOUT values:

# In one shell
$ systemd-run --user -S
Running as unit run-RANDOM.service
$ gdbus call -e -d com.example.Sample -o /com/example/Sample -m com.example.Sample.Ping -t TIMEOUT
$ busctl call --user com.example.Sample /com/example/Sample com.example.Sample Ping --timeout TIMEOUT

While those commands are waiting, I run systemctl --user freeze run-RANDOM.service and systemctl --user thaw run-RANDOM.service in another shell to freeze and thaw them. Here’s what I discovered:

Without freeze/thaw, setting TIMEOUT to 4 causes both implementations to time out.
Still with TIMEOUT set to 4: if I freeze while it’s waiting, then let the service send its reply, then thaw, I get different behaviors out of the two implementations. gdbus returns a timeout error. sd-bus returns the data that came in while it was frozen, even though that data technically came in after the timeout should have lapsed.
Now I changed the TIMEOUT to 6 and repeated the tests. Without freeze/thaw, neither implementation times out.
Still with TIMEOUT set to 6, I do the freeze, wait for the send, then thaw. gdbus in this case will time out, even though technically the server responded before the timeout should have fired! This is the issue I was worried about. sd-bus doesn’t exhibit this behavior, and returns the data.

To sum up my test: it looks like gdbus processes the timeout first, ignoring any data that is available in the queue, while sd-bus does the opposite: it processes all data in the queue first, ignoring any timeout that has fired. Ideas on how to fix this are welcome!
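For reference, the four observations can be reproduced with a small toy model in plain Python. This is only my reading of the results, not the actual logic of either library: call_result and its parameters are hypothetical, "timeout first" stands in for what gdbus appeared to do, and "data first" for what sd-bus appeared to do.

```python
SERVER_DELAY = 5  # seconds the test service sleeps before replying


def call_result(timeout, thaw_at=None, data_first=False):
    """Toy model of one method call; returns "reply" or "timeout".

    thaw_at: time at which a frozen client wakes up, or None if it was never
    frozen. The model assumes the thaw happens after both the reply and the
    timeout are due.
    data_first: True drains buffered replies before honoring the timeout;
    False checks the timeout first.
    """
    # An unfrozen client wakes as soon as either event is due.
    wake = min(timeout, SERVER_DELAY) if thaw_at is None else thaw_at
    reply_ready = wake >= SERVER_DELAY
    timed_out = wake >= timeout
    if reply_ready and timed_out:
        # Both sources are ready at once: the policy decides.
        return "reply" if data_first else "timeout"
    return "reply" if reply_ready else "timeout"


for timeout in (4, 6):
    for thaw_at in (None, 60):
        timeout_first = call_result(timeout, thaw_at, data_first=False)
        drain_first = call_result(timeout, thaw_at, data_first=True)
        print(f"timeout={timeout} thaw_at={thaw_at}: "
              f"timeout-first={timeout_first}, data-first={drain_first}")
```

The two policies only disagree in the frozen cases, where both events are already due by the time the client wakes up.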
