G_idle behavior or priority question in context of other g_thread(s)

I am seeing behavior I did not expect and do not understand:

I have a GTK application which updates a drawing area (via a g_idle function) while receiving external data (via a thread A).
This works fine by itself.

Now I also have an embedded Python interpreter “running” a script for various tasks. One of them is checking on data frequently.

To keep the Python interpreter from blocking the main task when it has to wait for something, I run it in a g_task as thread B (I also tried a plain g_thread, with the same behavior). I see a strange effect on my g_idle function when, for example, I simply call g_usleep (for a few seconds) from an embedded Python callback in thread B:

Thread B seems to block the g_idle function from updating my drawing for as long as it is in “sleep” mode.

It is not blocking the application: I can still work with it and can also trigger a manual update of my drawing.

I do not understand this behavior – as if the idle function has “less priority” than the thread – but they should be totally concurrent.

You should probably show the code for what your thread/application is doing here.

Off the top of my head, the following questions arise:

  • Is that thread the only Python code running? (To double check whether the GIL might be involved; I don’t think so.)
  • Is that thread touching a GMainContext/GMainLoop, if yes, the default one?
  • Do you have custom locking that might be going wrong?

Interesting points.

It’s a larger project altogether. The Python execution thread is here, line 3004:

https://sourceforge.net/p/gxsm/svn/HEAD/tree/trunk/Gxsm4/plug-ins/common/pyremote.cpp

I am not sure what GIL stands for.

I am not touching the main loop from the thread.
A simple g_usleep() is enough to make the issue noticeable.

Another observation: I do have other displays updating machine status periodically using a g_timeout() and this keeps going fine.

I am tempted to see what happens turning the g_idle into a timeout.

GIL is the Python global interpreter lock.

I see.
And yes, that’s the only Python interpreter and the only Python code I have running at any time in this application.

And there are no mutexes used or needed for this thread. Well, I do have some, but they are not touched in this context.

Do you have an example of python code that is loaded/running there? Including the imports?

One obvious bug I see is that push_message_async is not thread safe because the GSList is not. You’ll either need to add locking, or use e.g. a GAsyncQueue instead. But, this is really unlikely to be causing issues.

Anyway, my suggestion would be to attach gdb and then, when you are in the situation where the code hangs, hit ctrl+c and run thread apply all bt. Most likely it’ll tell you more about why the main thread is not processing events.

Here is a python code example.
The first block below is my automated default environment configuration, which also catches messages to put on my console.

Then I just execute a few test python commands at the end.

The simple first command “gxsm.sleep(10)” is blocking the g_idle already.
Neither the Python code NOR the GUI is blocking or hanging!
Only a GUI update (done via a g_idle) stalls for as long as the Python thread runs.

Below the Python code I have also included the C++ code fragments I wrote to start the thread, plus the thread function itself:
see “py_gxsm_console::run_command(…)”.

Another note: at this time I have one more (and at times several more) g_idle function running to watch and update the “console” output from Python. That one actually runs OK (it is started first, along with the Python thread).

Maybe a priority issue between g_idle functions?

Here is a visual demo:

Watch
a) the (noisy test data) scanning image data appearing (top center window), with updates stalling once the script is started (a g_idle is managing the image updates)
b) the tip position indicator at the very top right (this is periodically updated with a g_timeout)
c) the g_idle image update “catching up” once the script is finished.

It’s so weird: the program GUI is functionally just fine and I can do manual updates etc. of the data; only the g_idle that does this automatically stalls, and it catches up again when the Python script is done.

However, for testing I also tried (not ideal) replacing the g_idle for the image update with a g_timeout. That also stalls, and I have no idea why. The simple “sleep” call is not interfering with any of my data.

Python code:

## environment and redirection setup, executed once at initialization
import redirection
import sys
class StdoutCatcher:
    def write(self, stuff):
        redirection.stdoutredirect(stuff)
    def flush(self):
        redirection.stdoutredirect('\n')
class StderrCatcher:
    def write(self, stuff):
        redirection.stdoutredirect(stuff)
    def flush(self):
        redirection.stdoutredirect('\n')
sys.stdout = StdoutCatcher()
sys.stderr = StderrCatcher()

# import own gxsm interface functions
import gxsm

#### actual test script, executed later by user in the same python env.

print ("sleep 10 test")  ## this is doing no more than a "g_usleep() on C level"
gxsm.sleep (10)

print ("start scan test")
gxsm.startscan ()
print ("watching...")

print ('y=', gxsm.y_current())
gxsm.sleep (10)
print ('y=', gxsm.y_current())
gxsm.sleep (10)
print ('y=', gxsm.y_current())


These are the key functions running the Python interpreter in a thread. I tested both a g_thread and a g_task; both have the same outcome of blocking the g_idle.


void py_gxsm_console::PyRun_GTaskThreadFunc (GTask *task,
                                             gpointer source_object,
                                             gpointer task_data,
                                             GCancellable *cancellable){
        PyRunThreadData *s = (PyRunThreadData*) task_data;
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GTaskThreadFunc");
        s->ret = PyRun_String(s->cmd,
                              s->mode,
                              s->dictionary,
                              s->dictionary);
        g_free (s->cmd);
        s->cmd = NULL;
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GTaskThreadFunc done");
}


gpointer py_gxsm_console::PyRun_GThreadFunc (gpointer data){
        PyRunThreadData *s = (PyRunThreadData*) data;
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GThreadFunc");
        s->ret = PyRun_String(s->cmd,
                              s->mode,
                              s->dictionary,
                              s->dictionary);
        g_free (s->cmd);
        s->cmd = NULL;
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GThreadFunc PyRun completed");
        if (!s->ret) PyErr_Print();
        --s->pygc->user_script_running;
        s->pygc->push_message_async (s->ret ?
                                    "\n<<< PyRun user script (as thread) finished. <<<\n" :
                                    "\n<<< PyRun user script (as thread) run raised an exeption. <<<\n");
        s->pygc->push_message_async (NULL); // terminate IDLE push task
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GThreadFunc finished.");
        return NULL;
}

void py_gxsm_console::PyRun_GAsyncReadyCallback (GObject *source_object,
                                                 GAsyncResult *res,
                                                 gpointer user_data){
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GAsyncReadyCallback");
	py_gxsm_console *pygc = (py_gxsm_console *)user_data;
        if (!pygc->run_data.ret) PyErr_Print();
        --pygc->user_script_running;
        pygc->push_message_async (pygc->run_data.ret ?
                                  "\n<<< PyRun user script (as thread) finished. <<<\n" :
                                  "\n<<< PyRun user script (as thread) run raised an exeption. <<<\n");
        pygc->push_message_async (NULL); // terminate IDLE push task
        PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::PyRun_GAsyncReadyCallback done");
}


const gchar* py_gxsm_console::run_command(const gchar *cmd, int mode)
{
   	if (!cmd) {
		g_warning("No command.");
		return NULL;
	}

        PyErr_Clear(); // clear any previous error or interrupts set

        g_idle_add (pop_message_list_to_console, this); // keeps running and watching for async console data to display
        if (!run_data.cmd){
                PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::run_command *** starting console IDLE message pop job.");
                run_data.cmd = g_strdup (cmd);
                run_data.mode = mode;
                run_data.dictionary = dictionary;
                run_data.ret  = NULL;
                run_data.pygc = this;
#if 1
                g_thread_new (NULL, PyRun_GThreadFunc, &run_data);
#else
                GTask *pyrun_task = g_task_new (NULL,
                                                NULL,
                                                PyRun_GAsyncReadyCallback, this);
                g_task_set_task_data (pyrun_task, &run_data, NULL);
                g_task_run_in_thread (pyrun_task, PyRun_GTaskThreadFunc);
#endif
                PI_DEBUG_GM (DBG_L2, "pyremote Plugin :: py_gxsm_console::run_command thread fired up");
                return NULL;
        } else {
                return "Busy";
        }
}

I am now confused: is GSList not thread safe? I thought GLib itself was thread safe in general? OK, I tried.
Update: I tested it with an added mutex for “message_list”, used as shown below. No change in behavior.

        void push_message_async (const gchar *msg){
                g_mutex_lock (&g_list_mutex);
                if (msg)
                        message_list = g_slist_prepend (message_list, g_strdup(msg));
                else
                        message_list = g_slist_prepend (message_list, NULL); // push self terminate IDLE task mark
                g_mutex_unlock (&g_list_mutex);
        }

        static gboolean pop_message_list_to_console (gpointer user_data){
                py_gxsm_console *pygc = (py_gxsm_console*) user_data;

                g_mutex_lock (&g_list_mutex);
                if (!pygc->message_list){
                        g_mutex_unlock (&g_list_mutex);
                        return true;
                }
                GSList* last = g_slist_last (pygc->message_list);
                if (!last){
                        g_mutex_unlock (&g_list_mutex);
                        return true;
                }
                if (last -> data)  {
                        pygc->append (last -> data);
                        g_free (last -> data);
                        pygc->message_list = g_slist_delete_link (pygc->message_list, last);
                        g_mutex_unlock (&g_list_mutex);
                        return true;
                } else { // NULL data mark found
                        pygc->message_list = g_slist_delete_link (pygc->message_list, last);
                        g_mutex_unlock (&g_list_mutex);
                        pygc->append ("--END IDLE--");
                        return false; // finish IDLE task
                }
        }

The C code for the “gxsm.sleep()”:

static PyObject* remote_sleep(PyObject *self, PyObject *args)
{
	PI_DEBUG(DBG_L2, "pyremote: Sleep ");
	double d;
	if (!PyArg_ParseTuple(args, "d", &d))
		return Py_BuildValue("i", -1);
	if (d>0.){ // d in 1/10s
                g_usleep ((useconds_t)round(d*1e5)); // now in a thread and can simply sleep here!
		// sleep_ms((int)(round(d*100)));
	}
	return Py_BuildValue("i", 0);
}

Ugh, of course … you are doing a g_idle_add for pop_message_list_to_console. That is bad, you are busy looping just to poll for messages.

Instead, you want to wake up a GSource every time a message is or may be there. i.e. your pop_message_list_to_console handler should always return G_SOURCE_REMOVE. And then you call g_idle_add from push_message_async.

GSList does no locking of its own. It is not thread-safe unless you add your own external locking to it.

None of GLib (or GObject, or GIO) is thread-safe in general. A number of types and objects are explicitly documented as thread safe (for example, GDBus), but nothing else is.

OK, good to know! I did that now and it indeed fixes this problem :)

However, the message update is now somewhat sluggish and seems to come in “chunks”:
While previously the “line by line” number display came spot on 1,2,3,4,5,6,7,… it now arrives in blocks with jittery delays, like this: 1,2,3,4,…(delay)… 5,6,7,8, … (delay) …

No big deal in this case, but not nice. Potentially related to my other idle functions…:

---- Update:
<<< OK, fine now! I had temporarily increased the scan-view update idle priority!!!
With the same priority it is in sync again!

------ but still curious ----
The scan-line update/display is now normal again. It is also done in a much more complex idle function that operates like a state machine. I want this idle function to run as fast as possible to maintain a more or less live update of the data stream and some indicators. (And here I have possibly unrelated performance troubles with GTK4, which no longer lets me schedule only part (a box region) of a potentially large image for update.)
And yes, this idle function returns TRUE to be rescheduled for as long as the scan is going, since I want it to run again as soon as possible, but of course other idle functions should still get their turn.
There is never a real “dead polling here” as there is always new data streamed in (in a different thread from hardware, potentially very fast), and if it is only a few new pixels to be updated.

This leads me to ask for details about g_idle scheduling and priority management, as I did not expect this pop_message g_idle to be executed almost exclusively while the other g_idle function(s) were not. All of them were added with the default idle priority.

I somewhat expected all g_idle functions with the same priority to be called one by one, and then to start over, rather than one blocking the others.

That does seem to be the case, more or less!
Anyhow, is there detailed info or more insight on this matter?

I do have various idle tasks to be worked on while scanning and this python pop_message console update is one of them when python is running a job.

If there is a delay, then the main thread is probably busy doing something else. You might want to split the task that needs more time into smaller chunks, returning to the mainloop more often. Or, if it is something bigger, it might be worth moving it into a different thread using GTask or similar.

> There is never a real “dead polling here” as there is always new data streamed in (in a different thread from hardware, potentially very fast), and if it is only a few new pixels to be updated.

Maybe, though you can only update the screen at 60fps or so. Processing the data in larger chunks with some throttling is probably more CPU efficient. Not entirely sure, but I have in mind that gtk_widget_queue_draw will actually throttle requests to the display refresh rate (GdkFrameClock in GTK 3). See the documentation of gtk_widget_get_frame_clock.

IIRC, glib should dispatch all ready sources of the same priority together. So, if you have multiple idle handlers of the same priority, they should all be executed in the same mainloop iteration.

That said, GTK also uses idle handlers in various places, so you could still be pre-empting some of those. Put differently, if you have a background idle handler that is always ready, you should probably give it a priority of G_PRIORITY_LOW or lower (i.e. a numeric value >= 300).

But, I would still maintain that you really shouldn’t have an unthrottled idle handler. Even if you are handling streamed data at a high rate.

OK, I see what you mean. I think I got this now :)

If you are curious, this “scan update and start/stop management” is done in a state-machine-like, non-blocking construct and, yes, in various chunks or stages. Here is the g_idle task that manages the main “GUI based” workload:

https://sourceforge.net/p/gxsm/svn/HEAD/tree/trunk/Gxsm4/plug-ins/control/spm_scancontrol.cpp

in line 669:

gboolean SPM_ScanControl::spm_scancontrol_run_scans_task (gpointer data){
…}

and just above this it is launched:
static void spm_scancontrol_start_callback (GtkWidget *w, void *data){
…}

However, until now this idle task was not throttled. I am now testing throttling the scan refresh to 50ms.
The init stages 0, 10 and 11 should be completed as fast as possible.
Then comes the repeating “scan update stage” 20, calling

SPM_ScanControl::scanning_task (gpointer spc)
calling mainly gboolean SPM_ScanControl::scanning_control_run ()
… and for every channel
if (!main_get_gapp()->xsm->hardware->ScanLineM (line, 1, xp_srcs, m2d_xp, sls_config)){…

… this can be throttled. I am now terminating this g_idle after one update: just before returning from the g_idle with G_SOURCE_REMOVE, I start a new g_timeout to catch up ~50ms later!

        case 20:
                SPM_ScanControl::scanning_task (data); // actual scanning "setup, monitoring and update" task



                if (((SPM_ScanControl*)data) -> scanning_task_stage == 0){ // completed?
                        runmode = 30;
                        g_idle_add (SPM_ScanControl::spm_scancontrol_run_scans_task, data);
                } else {
                        g_timeout_add (50, SPM_ScanControl::spm_scancontrol_run_scans_task, data); // throttle to 50ms
                }

                return G_SOURCE_REMOVE; // throttled
                //return TRUE;

So the updating works fine now! Only my “setup and GUI update” is slow. I have a ton of entry fields to manage.

I have the exact same code in a GTK3-based version. I am currently porting it; that is mostly done, but there are some pending issues here and there.

Previously (GTK3) this all worked just fine, with no issues. Now with GTK4 I find that the whole “scan start” procedure done in this idle task takes almost 10x longer, which makes it feel very slow, and I am not sure where the bottleneck is or how to address it.

I have some hints that my add-on “gtk_entry” management (core-source/pcs.cpp), with automated mapping to g_settings and internal variables, slave fields and more, involves some function calls that are newly very slow under GTK4. I am trying to find out which ones.

Well, the g_idle problem is solved, many thanks for the tips!!


PS: The really hard, time-critical data transport from a USB device (maintaining a data stream via a FIFO) is done in a thread in a separate, lower-level hardware plugin. Via an abstraction class I only query the actual progress and the current completed line, and then update one or more scan data view windows accordingly via the task above.
