Gnome-shell getting killed with no coredump, artifacting with workaround fix 🤧

Hello everyone :wave:

I’m experiencing an issue where my gnome-shell process is being killed (by the kernel?). No coredump or useful information in either system journal or the wayland user service’s journal. No memory exhaustion or other kernel issues either.

journalctl logs:

Aug 11 23:52:49 suse-pc pipewire[7352]: pw.node: (alsa_output.usb-Topping_DX3_Pro_-00.HiFi__Headphones__sink-54) graph xrun (0 suppressed)
Aug 11 23:52:56 suse-pc nautilus[16304]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc WebExtensions[12343]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc transmission-gtk.desktop[16594]: Gdk-Message: 18:22:56.841: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gnome-shell[11720]: (EE) failed to read Wayland events: Broken pipe
Aug 11 23:52:56 suse-pc tilix[11345]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc thunderbird-bin[12207]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc xdg-desktop-por[11216]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gsd-media-keys[7041]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gsd-color[7026]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gsd-keyboard[7037]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gsd-power[7046]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc gsd-wacom[7119]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc xdg-desktop-por[11254]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc systemd[6182]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
Aug 11 23:52:56 suse-pc evolution-alarm[7079]: Error reading events from display: Broken pipe
Aug 11 23:52:56 suse-pc systemd[6182]: org.gnome.SettingsDaemon.Keyboard.service: Main process exited, code=exited, status=1/FAILURE
Aug 11 23:52:56 suse-pc systemd[6182]: org.gnome.Shell@wayland.service: Main process exited, code=killed, status=9/KILL

org.gnome.Shell@wayland.service logs:

Aug 11 23:46:39 suse-pc gnome-shell[6474]: Received error from D-Bus search provider firefox.desktop: Gio.DBusError: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.mozilla.firefox.SearchProvider was not provided by any .service files
Aug 11 23:52:56 suse-pc gnome-shell[11720]: (EE) failed to read Wayland events: Broken pipe
Aug 11 23:52:56 suse-pc systemd[6182]: org.gnome.Shell@wayland.service: Main process exited, code=killed, status=9/KILL
Aug 11 23:52:57 suse-pc systemd[6182]: org.gnome.Shell@wayland.service: Failed with result 'signal'.
Aug 11 23:52:57 suse-pc systemd[6182]: org.gnome.Shell@wayland.service: Triggering OnFailure= dependencies.
Aug 11 23:52:57 suse-pc systemd[6182]: org.gnome.Shell@wayland.service: Consumed 23min 59.621s CPU time.

The problem started after upgrading to Linux kernel 6.10.3 and gnome-shell version 46.3.1. I’m not entirely sure which caused the issue to surface but I was able to confirm rolling back the update fixed it.

As per this GitLab issue, I have added the environment variable MUTTER_DEBUG_KMS_THREAD_TYPE=user to ~/.config/environment.d/00-gnome_fixes.conf and did the update again. Now gnome-shell does not crash, but there is some artifacting in various apps like Console, Calculator, and Chrome (with Discourse open in a new tab) when I have YouTube videos playing in Chrome or Firefox, for example. It seems to appear whenever the GPU is taxed.
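
For reference, a minimal sketch of how that file could be written from a terminal. The function name `write_kms_workaround` is made up for illustration; the directory and file name are the ones mentioned above. Files in environment.d hold plain KEY=VALUE lines and are typically only picked up at the next login.

```shell
# Hypothetical helper: add the workaround line to an environment.d directory.
# $1 = target directory (e.g. "$HOME/.config/environment.d")
write_kms_workaround() {
    dir="$1"
    file="$dir/00-gnome_fixes.conf"
    line='MUTTER_DEBUG_KMS_THREAD_TYPE=user'
    mkdir -p "$dir"
    # Append only if the exact line is not already there, so reruns are harmless.
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   write_kms_workaround "$HOME/.config/environment.d"
```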

Is there a workaround for the workaround? :pleading_face:

And you’re certain you were not low on memory? SIGKILL is almost always a memory pressure kill.

(But maybe gnome-shell exceeded some other resource limit?)


If gnome-shell gets killed and MUTTER_DEBUG_KMS_THREAD_TYPE=user helps, the reason for it getting killed almost certainly was the RT KMS thread being stalled for too long, usually somewhere in the driver. So that’s probably related to the kernel update more than the shell update.
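
A hedged sketch for checking this on a running system. It assumes (the thread does not state it) that the kill is delivered through the kernel's RLIMIT_RTTIME limit, under which a realtime thread that runs too long without blocking gets SIGKILL once the hard limit is reached. The helper name `rttime_limit` is made up for illustration; it only reads the limits row the kernel exposes in /proc.

```shell
# Print the "Max realtime timeout" row (soft limit, hard limit, unit) for a PID.
# Assumption: RLIMIT_RTTIME is the limit involved in the kill.
rttime_limit() {
    grep -F 'Max realtime timeout' "/proc/$1/limits"
}

# Usage:
#   rttime_limit "$(pgrep -x gnome-shell | head -n1)"
```

If both columns show "unlimited", this limit is probably not what killed the process.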

I have sysstat/SAR running on my machine and was able to confirm that it was not ordinary memory exhaustion and that the OOM killer was not triggered. Perhaps VRAM exhaustion? :thinking:
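
For anyone wanting to double-check the OOM part independently of SAR, here is a rough sketch that filters kernel log text for the usual OOM-killer messages. The function name `oom_hits` is hypothetical and the pattern list is a best guess at common kernel wording, not an exhaustive one.

```shell
# Filter kernel log text on stdin for typical OOM-killer messages.
# Prints matching lines; prints nothing (and still succeeds) if none are found.
oom_hits() {
    grep -iE 'out of memory|oom-kill|invoked oom-killer' || true
}

# Usage (kernel messages of the current boot):
#   journalctl -k -b | oom_hits
```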

Would a fix be forthcoming from GNOME to make it more resilient to these types of issues, or should I file a bug report with kernel.org?

It helps but introduced some artifacts. How could that be happening? I’m not familiar with graphics at all; I know just enough to work out that by RT KMS you meant realtime kernel mode setting. I don’t understand why gnome-shell needs realtime KMS by default yet works without it under this setting, or why these artifacts happen.

A way to avoid this from the gnome side would be to temporarily disable RT in situations where drivers are known to take a while to complete an operation. Similar to what was done here:

I’m also not really familiar with the KMS code, so I probably won’t be able to help with that. This is the first time I’ve heard of such artifacts. Can you describe them or show a screenshot? Also, are you sure those are not related to some other updates?


I’m experiencing an issue where my gnome-shell process is being killed (by the kernel?).

Any hint on what you were doing when it happens? Closing the laptop lid, connecting/disconnecting monitors, switching configuration, making some window fullscreen, for example?

Do you have a more complete journal log, especially for the few seconds before it happened?

The workaround shouldn’t result in any graphical glitches; if it does, I suspect you’re running into a driver bug.

rolling back the update fixed it.

Rolling back to what versions of the components?


Here’s a screencast of the issue happening in gnome calculator with a 4k video playing in the background.

Yep this, fullscreen-ing a playing video can reliably reproduce the issue (get gnome-shell killed) without MUTTER_DEBUG_KMS_THREAD_TYPE=user.

I tried without the workaround and reproduced the problem after upgrading to kernel 6.10.5 yesterday. journal logs for wayland user service:

Aug 18 09:53:23 suse-pc systemd[3098]: Starting GNOME Shell on Wayland...
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Running GNOME Shell (using mutter 46.3.1) as a Wayland display server
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Made thread 'KMS thread' realtime scheduled
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Device '/dev/dri/card1' prefers shadow buffer
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Added device '/dev/dri/card1' (amdgpu) using atomic mode setting.
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Created gbm renderer for '/dev/dri/card1'
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Boot VGA GPU /dev/dri/card1 selected as primary
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Obtained a high priority EGL context
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Obtained a high priority EGL context
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Using public X11 display :0, (using :1 for managed services)
Aug 18 09:53:24 suse-pc gnome-shell[3363]: Using Wayland display name 'wayland-0'
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Unset XDG_SESSION_ID, getCurrentSessionProxy() called outside a user session. Asking logind directly.
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Will monitor session 1
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Could not issue 'GetUnit' systemd call
Aug 18 09:53:25 suse-pc systemd[3098]: Started GNOME Shell on Wayland.
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Failed to launch ibus-daemon: Failed to execute child process “ibus-daemon” (No such file or directory)
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation
Aug 18 09:53:25 suse-pc gnome-shell[3363]: Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.UDisks2VolumeMonitor: Unit gvfs-udisks2-volume-monitor.service is masked. (g-io-error-quark, 36)
Aug 18 09:53:27 suse-pc gnome-shell[3363]: GNOME Shell started at Sun Aug 18 2024 09:53:25 GMT+0530 (India Standard Time)
Aug 18 09:53:27 suse-pc gnome-shell[3363]: Registering session with GDM
Aug 18 09:54:26 suse-pc gnome-shell[3363]: Could not issue 'GetUnit' systemd call
Aug 18 09:54:26 suse-pc gnome-shell[5124]: The XKEYBOARD keymap compiler (xkbcomp) reports:
Aug 18 09:54:26 suse-pc gnome-shell[5124]: > Warning:          Unsupported maximum keycode 708, clipping.
Aug 18 09:54:26 suse-pc gnome-shell[5124]: >                   X11 cannot support keycodes above 255.
Aug 18 09:54:26 suse-pc gnome-shell[5124]: > Warning:          Could not resolve keysym XF86KbdInputAssistPrevgrou
Aug 18 09:54:26 suse-pc gnome-shell[5124]: > Warning:          Could not resolve keysym XF86KbdInputAssistNextgrou
Aug 18 09:54:26 suse-pc gnome-shell[5124]: Errors from xkbcomp are not fatal to the X server
Aug 18 09:54:26 suse-pc gnome-shell[3363]: Failed to launch ibus-daemon: Failed to execute child process “ibus-daemon” (No such file or directory)
Aug 18 09:54:30 suse-pc google-chrome.desktop[5351]: [5344:5344:0818/095430.642591:ERROR:object_proxy.cc(576)] Failed to call method: org.freedesktop.ScreenSaver.GetActive: object_path= /org/freedesktop/ScreenSaver: org.freedesktop.DBus.Error.NotSupported: This method is not part of the idle inhibition specification: https://specifications.freedesktop.org/idle-inhibit-spec/latest/
Aug 18 09:54:30 suse-pc google-chrome.desktop[5351]: [5344:5382:0818/095430.686986:ERROR:nss_util.cc(345)] After loading Root Certs, loaded==false: NSS error code: -8018
Aug 18 09:54:33 suse-pc google-chrome.desktop[5351]: Created TensorFlow Lite XNNPACK delegate for CPU.
Aug 18 09:54:39 suse-pc google-chrome.desktop[5351]: [5390:5390:0818/095439.401918:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 1 times!
Aug 18 09:54:41 suse-pc gnome-shell[3363]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Aug 18 09:54:42 suse-pc google-chrome.desktop[5351]: [5390:5390:0818/095442.399922:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 2 times!
Aug 18 09:54:43 suse-pc google-chrome.desktop[5351]: [5390:5390:0818/095443.396807:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 3 times!
Aug 18 09:54:56 suse-pc gnome-shell[3363]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Aug 18 09:55:09 suse-pc gnome-shell[3363]: Received error from D-Bus search provider org.gnome.Terminal.desktop: Gio.IOErrorEnum: Cannot invoke method; proxy is for the well-known name org.gnome.Terminal without an owner, and proxy was constructed with the G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START flag
Aug 18 09:55:09 suse-pc gnome-shell[3363]: Received error from D-Bus search provider firefox.desktop: Gio.DBusError: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.mozilla.firefox.SearchProvider was not provided by any .service files
Aug 18 09:55:09 suse-pc google-chrome.desktop[5351]: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#58 is a dynamic-sized tensor).
Aug 18 10:04:28 suse-pc gnome-shell[5106]: (EE) failed to read Wayland events: Broken pipe
Aug 18 10:04:28 suse-pc systemd[3098]: org.gnome.Shell@wayland.service: Main process exited, code=killed, status=9/KILL
Aug 18 10:04:28 suse-pc systemd[3098]: org.gnome.Shell@wayland.service: Failed with result 'signal'.
Aug 18 10:04:28 suse-pc systemd[3098]: org.gnome.Shell@wayland.service: Triggering OnFailure= dependencies.
Aug 18 10:04:28 suse-pc systemd[3098]: org.gnome.Shell@wayland.service: Consumed 37.738s CPU time.
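
The two lines that matter in a log like this are the "Made thread 'KMS thread' realtime scheduled" message and the status=9/KILL exit. A small sketch for pulling just those facts out of the service journal follows; the function name `shell_log_summary` is made up, and the patterns are copied from the log lines above.

```shell
# Summarise a gnome-shell service log read from stdin:
# was the KMS thread made realtime, and how many SIGKILL exits were logged?
shell_log_summary() {
    awk '
        /Made thread .KMS thread. realtime scheduled/ { rt = 1 }
        /code=killed, status=9\/KILL/                 { killed++ }
        END {
            print "kms_thread_realtime=" (rt ? "yes" : "no")
            print "sigkill_exits=" (killed + 0)
        }'
}

# Usage:
#   journalctl --user -b -u org.gnome.Shell@wayland.service | shell_log_summary
```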

The problem appeared after upgrading the machine on Aug 8 2024, which installed/upgraded these components, most notably the kernel to v6.10 and gnome-shell to v46.3:

suse-pc:~ # cat /var/log/zypp/history | grep '2024-08-10' | grep '|install|' | grep -iE 'gnome|gtk|kernel-default' | cut -d '|' -f 3,4
gnome-control-center-user-faces|46.3-2.1
gtk3-data|3.24.43-2.1
gtk3-schema|3.24.43-2.1
gtk4-schema|4.15.4-1.1
libgnome-autoar-0-0|0.4.4-1.6
gtk3-tools|3.24.43-2.1
libgtk-3-0|3.24.43-2.1
gtk3-branding-openSUSE|15.0-2.5
libgck-modules-gnome-keyring|46.2-1.1
gtk3-immodule-amharic|3.24.43-2.1
gtk3-immodule-inuktitut|3.24.43-2.1
gtk3-immodule-thai|3.24.43-2.1
gtk3-immodule-tigrigna|3.24.43-2.1
gtk3-immodule-vietnamese|3.24.43-2.1
openssh-askpass-gnome|9.6p1-11.1
libgtkmm-3_0-1|3.24.9-1.3
libgnome-autoar-gtk-0-0|0.4.4-1.6
libdbusmenu-gtk3-4|16.04.0-10.4
pinentry-gnome3|1.3.1-1.1
gnome-keyring|46.2-1.1
gnome-keyring-pam|46.2-1.1
libjavascriptcoregtk-6_0-1|2.44.2-4.0.2.1.sr20240803
libjavascriptcoregtk-4_1-0|2.44.2-4.0.2.1.sr20240803
libjavascriptcoregtk-4_0-18|2.44.2-4.0.2.1.sr20240803
kernel-default-devel|6.10.3-1.1
kernel-default|6.10.3-1.1
libqt5-qtbase-platformtheme-gtk3|5.15.14+kde143-1.2
libqt5-qtstyleplugins-platformtheme-gtk2|5.0.0+git20170311-10.19
libreoffice-gnome|24.2.5.2-1.1
libreoffice-gtk3|24.2.5.2-1.1
gstreamer-plugins-good-gtk|1.24.6-1.1
libgtk-4-1|4.15.4-1.1
gtk4-branding-openSUSE|15.0-3.7
typelib-1_0-Gtk-4_0|4.15.4-1.1
typelib-1_0-Gtk-3_0|3.24.43-2.1
gtk4-tools|4.15.4-1.1
gnome-control-center|46.3-2.1
gnome-music|46.1-1.1
qemu-ui-gtk|9.0.2-1.1
gnome-shell|46.3.1-2.1
gnome-shell-calendar|46.3.1-2.1
gnome-control-center-users|46.3-2.1
gnome-control-center-goa|46.3-2.1
gnome-control-center-color|46.3-2.1
qt6-platformtheme-gtk3|6.7.2-2.2
gnome-extensions|46.3.1-2.1
webkitgtk-6_0-injected-bundles|2.44.2-4.0.2.1.sr20240803
libwebkitgtk-6_0-4|2.44.2-4.0.2.1.sr20240803
webkit2gtk-4_0-injected-bundles|2.44.2-4.0.2.1.sr20240803
libwebkit2gtk-4_0-37|2.44.2-4.0.2.1.sr20240803
libwebkit2gtk-4_1-0|2.44.2-4.0.2.1.sr20240803
webkit2gtk-4_1-injected-bundles|2.44.2-4.0.2.1.sr20240803
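
The same listing can be produced in a single awk pass, which may be handier when comparing several upgrade days. This is only a sketch: the function name `zypp_installed_on` is made up, and the field layout (timestamp, action, package name, version, separated by `|`) is inferred from the `cut -f 3,4` above. Unlike the grep pipeline, it matches the pattern against the package name only rather than the whole line.

```shell
# List "name|version" for packages installed on a given day.
# $1 = date prefix (YYYY-MM-DD), $2 = lowercase regex for package names.
# Reads zypp history text from stdin.
zypp_installed_on() {
    awk -F'|' -v day="$1" -v pat="$2" '
        index($1, day) == 1 && $2 == "install" && tolower($3) ~ pat {
            print $3 "|" $4
        }'
}

# Usage (as root):
#   zypp_installed_on 2024-08-10 'gnome|gtk|kernel-default' < /var/log/zypp/history
```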

Prior to this, I was running kernel v6.9 and gnome-shell v46.2.

Maybe enabling/disabling direct scanout needs to disable RT during that switch. Does it make a difference if you use this extension to disable direct scanout? Disable unredirect fullscreen windows - GNOME Shell Extensions


Can you try downgrading the gtk4 package? I think I saw a recent issue about the corruption above/below text.


That extension fixed the original issue of gnome-shell getting killed without the workaround! :100:

Wow!:tada:

I confirm this is a different issue, probably related to my distro shipping an unstable version of gtk4. Thank you very much for your help! :handshake:

Then the original issue seems to be that, with the kernel upgrade, the kernel graphics driver froze for over 15 frames (assuming a 60 Hz monitor) when merely asked to check whether the fullscreen window could be scanned out, leaving the process in a “running” state while doing so. This should never happen… Could you open a bug report against the kernel for this? Where to do that depends on what GPU you are using.
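
To put the frame count in wall-clock terms, here is the arithmetic as a quick sketch (the helper name `frames_to_ms` is made up, and it uses integer math, so results are truncated to whole milliseconds):

```shell
# Convert a number of frames at a given refresh rate (Hz) to milliseconds.
frames_to_ms() {
    echo $(( $1 * 1000 / $2 ))
}

frames_to_ms 15 60   # prints 250
```

So on a 60 Hz monitor, 15 frames means the driver call was busy for roughly a quarter of a second.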


Thanks for the clarification, I will submit a regression report when I have the time. I’m afraid they might ask for a bisection; I still haven’t fully recovered from doing one for kexec earlier this year :smiling_face_with_tear:


The extension is also a workaround. I would suggest using the environment variable rather than the extension, because the extension disables bypassing the compositor for fullscreen applications, which is worse for performance.

I only suggested the extension to figure out if this is caused by enabling direct scanout, which this extension avoids.
