Process control retained over detaching children


I’m developing a library controlling other processes, like ssh instances.

As a first test, I tried to use the older g_spawn* API and specifically g_spawn_sync() to launch a (special) ssh process and read its stdout and stderr output buffers:

    gint ssh_exit = -1;
    gchar *ssh_stdout = NULL, *ssh_stderr = NULL;
    GError *ssh_err = NULL;
    gboolean ret = g_spawn_sync (NULL, (gchar**)(ssh_cmd->pdata), NULL, G_SPAWN_SEARCH_PATH, NULL, NULL, &ssh_stdout, &ssh_stderr, &ssh_exit, &ssh_err);

    if (ret) {
      g_printf ("Process executed successfully!\nReturn value: %d\nStdout:\n>>>%s<<<\nStderr:\n>>>%s<<<\n", ssh_exit, ssh_stdout, ssh_stderr);
 
      if (ssh_err) {
        g_printf ("Successful execution, but ssh_err set? Weird, here's the message: %s\n", ssh_err->message);
      }
    }
    else {
      g_printf ("Process didn't execute successfully!\nError:\n>>>%s<<<\n", ssh_err->message);
    }

    /* Uninteresting cleanup via g_free () and stuff, redacted. */

ssh_cmd->pdata is essentially a string array containing [ssh] [ionic.de] [-p] [22] [-o] [ControlMaster="yes"] [-o] [ControlPersist="yes"] [-o] [ControlPath="/home/ionic/.libx2goclient//ssh/control"] [-f] [-N] [-T] [-o] [ExitOnForwardFailure="yes"] [uptime] [(null)]. Nothing crazy, but the important parts are the master-slave control socket setup and the -f flag, which tells ssh to detach from the controlling terminal just before actual command execution.

Executing this test led to a result I didn’t expect: the application just hung in g_spawn_sync(), even though the initial process terminated successfully after spawning a new child process (i.e., forking into the background). Killing the forked child process led to the test application spitting out the gathered stdout/stderr buffers.

My initial idea was that the spawned child process inherits its parent’s file descriptors, keeping stdout and stderr open to the controlling application (although that would be weird and would defeat the purpose of daemonizing). I tried adding the G_SPAWN_CLOEXEC_PIPES flag to work around that, but to no avail.
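One possible explanation for the flag making no difference, assuming the child's standard streams are wired up with dup2() or something equivalent (I have not checked the GLib source): close-on-exec is a per-descriptor flag, and POSIX specifies that the descriptor created by dup2() has it cleared. The pipe ends themselves may be close-on-exec, but the copies sitting on fd 1 and fd 2 survive every exec and every fork. A minimal POSIX sketch of just that property, with hypothetical helper names:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
static int
has_cloexec (int fd)
{
  int flags = fcntl (fd, F_GETFD);
  return (flags < 0) ? -1 : ((flags & FD_CLOEXEC) != 0);
}

/*
 * Hypothetical helper: marks a pipe's write end close-on-exec, duplicates
 * it onto target_fd (the way a spawn helper might wire up a child's
 * stderr) and reports whether the duplicate is still close-on-exec.
 * target_fd must be an unused descriptor number; it is closed again here.
 */
static int
dup_keeps_cloexec (int target_fd)
{
  int fds[2];
  int result = -1;

  if (pipe (fds) != 0)
    return -1;

  if (fcntl (fds[1], F_SETFD, FD_CLOEXEC) == 0
      && has_cloexec (fds[1]) == 1
      && dup2 (fds[1], target_fd) == target_fd) {
    /* The original is close-on-exec; check the duplicate. */
    result = has_cloexec (target_fd);
    close (target_fd);
  }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```

dup_keeps_cloexec (100) returns 0 here: the duplicate is not close-on-exec, so whatever ends up on the child's stdout/stderr is inherited by anything that child forks or execs later.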

I’m also aware of the GSubprocess API, which is newer and supposed to be better integrated with other glib APIs. Since the original goal was to use the GSubprocess API anyway and I only used g_spawn_sync() as an interim quick test, I ported the code over to the GSubprocess API:

    GError *ssh_err = NULL;
    self->master_conn = g_subprocess_newv ((const gchar* const*)(ssh_cmd->pdata), G_SUBPROCESS_FLAGS_STDOUT_PIPE | G_SUBPROCESS_FLAGS_STDERR_PIPE, &ssh_err);

    ret = self->master_conn != NULL;

    if (ret) {
      g_printf ("Process started/executed successfully!\n");

      if (ssh_err) {
        g_printf ("Successful execution, but ssh_err set? Weird, here's the message: %s\n", ssh_err->message);
      }

      GCancellable *master_conn_comm_cancel = g_cancellable_new ();
      g_clear_error (&ssh_err);
      GBytes *ssh_stdout = NULL, *ssh_stderr = NULL;
      if (!(g_subprocess_communicate (self->master_conn, NULL, master_conn_comm_cancel, &ssh_stdout, &ssh_stderr, &ssh_err))) {
        g_log (NULL, G_LOG_LEVEL_CRITICAL, "Communication with master connection subprocess failed: %s", ssh_err->message);
      }
      else {
        gsize ssh_stdout_size = 0, ssh_stderr_size = 0;
        const gchar *ssh_stdout_str = g_bytes_get_data (ssh_stdout, &ssh_stdout_size),
                    *ssh_stderr_str = g_bytes_get_data (ssh_stderr, &ssh_stderr_size);
        /* The precision argument of "%.*s" must be an int, not a gsize. */
        g_printf ("Stdout:\n>>>%.*s<<<\nStderr:\n>>>%.*s<<<\n", (int) ssh_stdout_size, ssh_stdout_str, (int) ssh_stderr_size, ssh_stderr_str);

        g_bytes_unref (ssh_stdout);
        g_bytes_unref (ssh_stderr);
      }

      g_clear_error (&ssh_err);
    }
    else {
      g_printf ("Process didn't execute/start successfully!\nError:\n>>>%s<<<\n", ssh_err->message);
    }

    g_clear_error (&ssh_err);

Again, g_subprocess_communicate() hangs while the forked child process is running. Killing that, once again, lets the test application continue.

Why does glib track child processes of children? Is this an intended behavior? I would have expected it to track only the initially spawned process, without an implicit “follow-fork” mode.

(Yes, eventually I will have to handle all of this asynchronously; the synchronous operation was just meant as a quick test to get to know the APIs on a basic level and how they behave. It might also turn out to be very comfortable that glib automatically tracks the ssh master connection in this case, but it still feels odd.)

Calling g_subprocess_get_identifier() will correctly return the string representation of the PID (on Linux, at least), but NULL later on. The documentation says:

If the subprocess has terminated, this will return NULL.

So I tried code like this:

      for (gsize i = 0; i < 100; ++i) {
        const gchar *pid = g_subprocess_get_identifier (self->master_conn);

        if (pid) {
          g_printf ("Process identifier: %s\n", pid);
          g_usleep (50 * 1000);
        }
        else {
          g_log (NULL, G_LOG_LEVEL_DEBUG, "PID is NULL, calling g_subprocess_wait()...");
          g_subprocess_wait (self->master_conn, master_conn_comm_cancel, &ssh_err);
        }
      }

… and indeed, the wait call seems to return immediately. So the process has terminated, but read calls still block?

The “problem” seems to be that my grandchild process has an open pipe on stderr to the test application. Interestingly, the pipe to stdout doesn’t seem to be connected (or does it get connected only when actually fetching the pipe and reading from it?). Reading from the stderr pipe is what seems to block.

I guess that ssh just doesn’t close the stderr FD when detaching and what I see is really a misbehaving child application that doesn’t clean up its FDs correctly?
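The suspected mechanism can be reproduced without GLib or ssh. A pipe only reports EOF once every copy of its write end is closed, so a reader keeps blocking as long as any descendant still holds one, even after the direct child has been reaped. Below is a minimal POSIX sketch under that assumption; seconds_until_eof() and its parameters are hypothetical names made up for this illustration, the direct child stands in for the foreground ssh process and the grandchild for the detached master.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns how many seconds read() blocked on the pipe
 * after the direct child had already been reaped, or -1.0 on error.
 */
static double
seconds_until_eof (unsigned int grandchild_lifetime, int grandchild_closes_pipe)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1.0;

  pid_t child = fork ();
  if (child < 0) {
    close (fds[0]);
    close (fds[1]);
    return -1.0;
  }

  if (child == 0) {
    /* Direct child: forks once and exits right away. */
    close (fds[0]);
    pid_t grandchild = fork ();
    if (grandchild == 0) {
      /* Grandchild: never writes, but inherited the write end. */
      if (grandchild_closes_pipe)
        close (fds[1]);
      sleep (grandchild_lifetime);
      _exit (0);
    }
    _exit (grandchild < 0 ? 1 : 0);
  }

  /* Parent: drop our own write end, then reap the direct child. */
  close (fds[1]);
  int status = 0;
  if (waitpid (child, &status, 0) != child) {
    close (fds[0]);
    return -1.0;
  }

  /* The child is gone, yet EOF only arrives once no write end is left. */
  struct timespec start, end;
  char buf[64];
  clock_gettime (CLOCK_MONOTONIC, &start);
  while (read (fds[0], buf, sizeof buf) > 0)
    ;
  clock_gettime (CLOCK_MONOTONIC, &end);
  close (fds[0]);

  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

With seconds_until_eof (1, 0) the read blocks for roughly the grandchild's lifetime although waitpid() returned immediately, which matches what I see with g_subprocess_wait() versus g_subprocess_communicate(). With seconds_until_eof (1, 1) the grandchild closes its copy first and the read returns almost at once, which is what I would expect from a daemon that cleans up its inherited descriptors.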
