g_main_context_iteration and signal-safety(7)

Hi All,

Playing around with ‘gsettings get/set org.gnome.mutter check-alive-timeout’ to avoid an annoying popup resulted in a completely unresponsive GNOME. Sprinkling g_main_context_iteration() calls through my code at regular intervals didn’t seem the way to go either.

As I remembered from the early days, timers offer a great solution when it comes to frame rates. So I created a timer with ‘timer_create’ and ‘timer_settime’, using a SIGEV_SIGNAL notifier, because timers created with ‘g_timeout_add’ stop executing the timer callback while we are inside a GTK signal handler such as GtkWidget::realize. Next, I bumped into signal-safety(7) and noticed that ‘g_main_context_iteration’ is not necessarily a signal-safe function. The program indeed crashes with a SIGSEGV, even before the main window shows up, when ‘g_main_context_iteration’ is called from the timer signal handler.

Can anything be done about the signal-safety of g_main_context_pending() and g_main_context_iteration() and what are my other options?

Best regards,
Mischa Baars, The Netherlands.

No, it’s basically impossible to make non-trivial code async-signal-safe, and apart from the few functions in GLib which are already documented as being async-signal-safe, no other functions ever will be.

I don’t understand what problem you’re trying to solve, or how you’re trying to solve it, but libc timers and anything involving sigevent are guaranteed not to be the solution. sigevent is a fairly old, underused set of libc APIs which does not fit in well with anything based around poll() (which is all of GLib and GNOME and basically all modern Linux).

Investigating what’s causing the unresponsiveness (is this in an app you’re writing, or the desktop as a whole?) would be the direction to proceed in.

Hi Philip,

Never mind, it’s already working. I moved the ‘timer_insert()’ / ‘timer_delete()’ from the application level to the start and the end of the loop in question and now it’s working (with a frame rate of 60 frames per second).

Apparently it’s not ‘g_main_context_iteration()’ that is not signal-safe; it’s the window as a whole, not yet on the screen, that is causing trouble.

What is causing the application to become unresponsive is that 32 threads are being used to download data from a database into a GtkTreeStore, each thread taking a significant number of milliseconds to finish the download. For 18000 rows, it takes up to 10 minutes to fill the GtkTreeStore. During the load, a GtkProgressBar is updated after the start of each thread, but the window as a whole isn’t redrawn without explicitly calling ‘g_main_context_iteration()’, since we’re still inside a GTK signal handler.

The progress bar is now updated 60 times per second, instead of after each ‘gtk_progress_bar_set_text()’ and ‘gtk_progress_bar_set_fraction()’. There is more code like this that causes GNOME to complain, like the code for sorting these 18000 rows, for example. I think that too is a thing of the past now.

Thank you for your help!

Best regards,
Mischa Baars.

How are you updating the GtkTreeStore? Are you calling methods on it directly from the 32 worker threads, or are you scheduling updates to it by calling something like g_idle_add() to marshal data back to the main thread?

A GTK application should only become unresponsive if:

  • It has deadlocked in the traditional way due to two mutexes being accessed out of order; or
  • A source on the GMainContext in the main thread is taking too long to return to the main loop; or
  • There are too many high-priority sources being dispatched on the GMainContext in the main thread and they are starving the UI sources of the time needed to redraw the UI.

It sounds like this isn’t a traditional deadlock, but rather that the GMainContext in the main thread is being overloaded and UI updates are being starved out. You might want to look into making sure that not too many sources are being run on it, perhaps by doing bulk updates to the GtkTreeStore rather than individual updates, and/or updating the progress bar less frequently (no human is going to be able to discriminate fractional updates to it at 60fps).

Quoting pwithnall (Philip Withnall, GNOME Team), April 13:

How are you updating the GtkTreeStore? Are you calling methods on it directly from the 32 worker threads, or are you scheduling updates to it by calling something like g_idle_add() to marshal data back to the main thread?

A GTK application should only become unresponsive if:

  • It has deadlocked in the traditional way due to two mutexes being accessed out of order; or
  • A source on the GMainContext in the main thread is taking too long to return to the main loop; or

You got it there. It takes too long to handle signals like GtkWidget::realize and return to the main loop. Then GNOME thinks the application has become unresponsive.

  • There are too many high-priority sources being dispatched on the GMainContext in the main thread and they are starving the UI sources of the time needed to redraw the UI.

It sounds like this isn’t a traditional deadlock, but rather that the GMainContext in the main thread is being overloaded and UI updates are being starved out. You might want to look into making sure that not too many sources are being run on it, perhaps by doing bulk updates to the GtkTreeStore rather than individual updates, and/or updating the progress bar less frequently (no human is going to be able to discriminate fractional updates to it at 60fps).

Frequency is not the issue. Even if I were to update the progress bar at 50% and 100% only, it would still have to be redrawn manually by calling ‘g_main_context_iteration()’, because we’re operating outside the main loop’s scope.

I was simply trying to avoid all kinds of manual insertions of ‘g_main_context_iteration()’ throughout the code by installing a timer that does exactly that.

With the timer installed the annoying popup is gone. Problem solved.

Your code is almost certainly doing the wrong thing. The realize signal should not block unless your code is explicitly blocking there. And you should never call any GTK functions, directly or indirectly (which the call to g_main_context_iteration() does), from a UNIX signal handler. There is a very high chance you will corrupt internal GTK state that way. If it works at all, it is by a pure stroke of luck, given that this is undefined behavior.

AFAIK, in general the only safe way to send a notification back to the main thread from a signal handler is to write to a self-pipe. But that is the same as the GLib timers in that it still requires waking the main thread, so you cannot solve your problem that way.
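For reference, the self-pipe approach can be sketched in plain POSIX C roughly as follows. The names wake_fds, on_signal, self_pipe_install and self_pipe_wait are invented for this illustration. The handler does nothing except write(), which signal-safety(7) lists as async-signal-safe, and the real work happens later in whatever code polls the read end.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

static int wake_fds[2] = { -1, -1 };   /* [0] = read end, [1] = write end */

/* UNIX signal handler: calls only write() and preserves errno. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char) signo;
    ssize_t n = write(wake_fds[1], &byte, 1);

    (void) n;   /* a full pipe just means a wake-up is already pending */
    errno = saved_errno;
}

/* Creates the non-blocking pipe and installs the handler for signo.
 * Returns 0 on success, -1 on error. */
static int self_pipe_install(int signo)
{
    struct sigaction sa = { 0 };

    if (pipe(wake_fds) != 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        int flags = fcntl(wake_fds[i], F_GETFL);
        if (flags == -1 ||
            fcntl(wake_fds[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Waits up to timeout_ms for a signal to arrive through the pipe.
 * Returns the signal number, 0 on timeout, or -1 on error. */
static int self_pipe_wait(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_fds[0], .events = POLLIN };
    unsigned char byte;
    int rc;

    assert(wake_fds[0] >= 0);   /* self_pipe_install() must run first */

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);   /* retry if a signal interrupted us */

    if (rc <= 0)
        return rc;
    return read(wake_fds[0], &byte, 1) == 1 ? (int) byte : -1;
}
```

A caller would run self_pipe_install(SIGUSR1) once and then have its poll loop watch wake_fds[0]. In a GLib program there is no need to hand-roll this: g_unix_signal_add() from glib-unix.h delivers a UNIX signal as an ordinary main loop source. Either way the callback only runs once the main thread gets back to its loop, which is why this does not help while a handler is blocking.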

You shouldn’t have to call g_main_context_iteration() manually to get things to work. This sounds a lot like a hack which is going to cause you problems in the future. It also sounds like you’re calling g_main_context_iteration (NULL) (i.e. iterating the global default main context) from a non-main thread (although all of this is guesswork without seeing your code). If so, that’s a serious thread safety issue as well.

If GtkWidget::realize is taking too long to return to the main loop then perhaps something about your GtkTreeStore usage needs to be made more efficient. I’ll let a GTK expert comment on that though; it’s not something I’m an expert in.

Sorry for pressing you on this, but it sounds a lot like your code is constructed in a way which will cause you further issues in the future, and that’s something which it would be good to avoid :)

People, the only things that caused me trouble were:

First, g_timeout_add() adds a timer which is apparently polled from the g_main_loop, which is not running while being dispatched to a GtkWidget signal handler. This means we cannot solve this problem using GLib timers. Your mistake.

Second, the application crashes when g_main_context_iteration is called and certain conditions are not met, like there most probably has to be a g_application running at the time of invocation. I can try to figure out the details if you want. The timer is now inserted and deleted from within the GtkWidget signal handler. My mistake.

The loop is definitely running. It’s running the signal handler!

Can you link to your code somewhere?

Quoting mcatanzaro (Michael Catanzaro), April 13:

MischaBaars:

First, g_timeout_add() adds a timer which is apparently polled from the g_main_loop, which is not running while being dispatched to a GtkWidget signal handler. This means we cannot solve this problem using GLib timers. Your mistake.

The loop is definitely running. It’s running the signal handler!

It is calling the signal handler. The main loop and the signal handler run on the same processor; they do not run simultaneously.

You started this thread talking about UNIX signals but now you’re talking about GObject signals. The main context (iterated by your main loop) is assuredly calling your GObject signal handler. They don’t get executed via magic. Some code has to call them. For GTK, that’s done by the default main context.

No. You have to solve this with polling if you want to do it reliably. There is no other safe way; that is how the GTK main loop works at a very basic level, and for the same reason it is also how UNIX async-signal-safety is designed.

If it happens to work it is by a fluke, and it could break again at any time. The details are not important; GTK is simply not async-signal-safe at all. You can probably find various situations like this where you can hack it and get it to (sort of) do something, but this is not reliable or intended behavior and it can break at any time for any number of reasons. So I really would suggest you don’t do that. Instead, just fix your GObject signal handlers so they don’t block.

Hi Michael,

I have not been talking about GObject signals, I was talking about g_timeout_add() vs timer_create(2). g_timeout_add() is part of GLib, not of GObject.

What I said was that the timeout handler does not get executed while being dispatched to a GtkWidget signal, like GtkWidget::realize. The GTK main loop and the GtkWidget signal handler run on the same processor, and therefore the GLib timer does not get polled while the GtkWidget signal handler is running.

GLib.timeout_add.

So, if we want to update a GtkLabel with the processor load for example during the execution of a GtkWidget signal handler, we need a POSIX timer and a manual call to g_main_context_iteration() to update the GUI as a whole.

http://mailman.cs.huji.ac.il/pipermail/linux-il/2013-July/010458.html

You win.

It just crashed.

That leaves me with the manual insertion of ‘g_main_context_iteration()’ at non-trivial places, because also the sorter shows the inactivity popup if the number of rows becomes too large.

Most likely because the thread is potentially in the middle of other GTK operations at the moment g_main_context_iteration() is called. They interfere.

I think the best course of action for you is to take advice from GTK experts on how to performantly handle 18000 rows in a GtkTreeStore.

Actually, I thought enclosing the GTK functions in the GtkWidget handler with a ‘sigprocmask()’ to block the timer handler and avoid interference might be important. Now it is working.

I won.

Have a great day!

This level of hackery is going to cause you issues in future. None of GTK or GLib is designed to work like that. You will save yourself time and effort in the long run (not having to diagnose and work around odd crashes in future) by using the toolkit the way it’s designed to be used.