Old home, new gnome-shell; no login

Context : I multi-boot different GNU/Linux-based operating systems that share a single $HOME partition. My current list is : Debian, Ubuntu, Solus, Fedora.

Last week, I re-installed both Ubuntu and Fedora to the latest releases. Since then I haven’t been able to log in to the Wayland session on either OS. The Xorg session works on Ubuntu but not on Fedora. Both OSes run the latest GNOME Shell, 3.36.1.

This affects only my primary user ( the only one that exists and the only one that I use ) on the system. For testing purposes, I created another test user who is able to login to their wayland session without any issues.

Looking at the journal, I found this weirdness when I attempted to login with my primary user.
ubuntu :

Apr 30 02:09:13 /usr/lib/gdm3/gdm-wayland-session[9811]: grep: unrecognized option '--session=ubuntu'
Apr 30 02:09:13 /usr/lib/gdm3/gdm-wayland-session[9811]: Usage: grep [OPTION]... PATTERNS [FILE]...
Apr 30 02:09:13 /usr/lib/gdm3/gdm-wayland-session[9811]: Try 'grep --help' for more information.
Apr 30 02:09:13 dbus-daemon[9378]: [session uid=1000 pid=9378] Activating via systemd: service name='org.gtk.vfs.Daemon' unit='gvfs-daemon.service' requested by ':1.2' (uid=1000 pid=9798 comm="/usr/libexec/gnome-session-binary --systemd -l --s" label="unconfined")
Apr 30 02:09:13 systemd[6023]: Starting Virtual filesystem service...
Apr 30 02:09:13 dbus-daemon[9378]: [session uid=1000 pid=9378] Successfully activated service 'org.gtk.vfs.Daemon'
Apr 30 02:09:13 systemd[6023]: Started Virtual filesystem service.
Apr 30 02:09:13 gnome-session-b[9798]: Unknown option -l
Apr 30 02:09:13 gdm-password][9736]: pam_unix(gdm-password:session): session closed for user shine
Apr 30 02:09:13 gdm3[932]: GdmDisplay: Session never registered, failing

fedora

Apr 29 23:50:37 systemd[1]: Started Session 7 of user shine.
Apr 29 23:50:37 gdm-password][3445]: pam_unix(gdm-password:session): session opened for user shine by (uid=0)
Apr 29 23:50:37 gdm-password][3445]: gkr-pam: gnome-keyring-daemon started properly and unlocked keyring
Apr 29 23:50:37 audit[3445]: USER_START pid=3445 uid=0 auid=1000 ses=7 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_keyinit,pam_namespace,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_gnome_keyring,pam_umask acct="shine" exe="/usr/libexec/gdm-session-worker" hostname=divine-inifinty-black-hole-gamma-lap addr=? terminal=/dev/tty4 res=success'
Apr 29 23:50:37 audit[3445]: USER_LOGIN pid=3445 uid=0 auid=1000 ses=7 subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 msg='uid=1000 exe="/usr/libexec/gdm-session-worker" hostname=? addr=? terminal=? res=success'
Apr 29 23:50:37 gnome-session-b[3483]: Unknown option -l
Apr 29 23:50:37 gdm-password][3445]: pam_unix(gdm-password:session): session closed for user shine

What is that Unknown option -l that is thrown by gnome-session-b[inary]?
I looked very hard to find any reference to a -l flag on the internet, but couldn’t find any. Even my Debian buster install ( GNOME Shell 3.30.2 ) doesn’t have it.

Considering that this affects only my existing user and it’s the same error on both OSes, the cause should be hidden somewhere within my $HOME directory. I did dig around in $HOME/.config and $HOME/.local, but couldn’t find anything consequential. I’ve been scratching my head at this for 2 days now. Can someone please point me in the direction of where to look?

I have a couple of Ubuntu-specific ‘session_migration’ files in $HOME/.local/share, but they just seem to be lists of migrations, probably kept for tracking metadata or something. And they have nothing about any login / GDM-related migrations either.

That’s a very, very bad idea. Each of these distributions ships with different versions of GNOME and its components, and since the configuration and state for your user is stored inside your $HOME directory, you’ll end up with conflicting data.

GNOME simply does not support this use case. You’ll have to modify environment variables to point directories like $XDG_DATA_HOME, $XDG_CONFIG_HOME, and $XDG_CACHE_HOME to other paths and store them on a per-OS basis, and even then you might get weird conflicts, like per-user systemd services going out of sync. That, of course, has the side effect of not being able to share the configuration among installations.
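
A minimal sketch of that per-OS split, assuming it is evaluated early enough in the login (for example from ~/.profile) that the session inherits the variables. The function name `per_os_xdg_exports` and the `$HOME/.os/<id>` layout are made up for illustration; `ID` is the distribution identifier field in /etc/os-release.

```shell
# Hypothetical helper: print export lines that point the XDG base
# directories at a per-distribution subtree.
# $1 = path to an os-release file (normally /etc/os-release)
# $2 = base directory for the per-OS trees (this layout is an assumption)
per_os_xdg_exports() {
    os_release_file=$1
    base_dir=$2
    # os-release is a shell-compatible KEY=value file. Source it in a
    # subshell so its other variables do not leak into this shell.
    os_id=$( unset ID; . "$os_release_file" && printf '%s' "$ID" )
    [ -n "$os_id" ] || return 1
    printf "export XDG_CONFIG_HOME='%s/%s/config'\n" "$base_dir" "$os_id"
    printf "export XDG_DATA_HOME='%s/%s/data'\n" "$base_dir" "$os_id"
    printf "export XDG_CACHE_HOME='%s/%s/cache'\n" "$base_dir" "$os_id"
}

# Usage (e.g. in ~/.profile):
#   eval "$(per_os_xdg_exports /etc/os-release "$HOME/.os")"
```

The function only prints the assignments, so they can be inspected before being eval'ed. It does not address the other conflicts mentioned above, such as per-user systemd services.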

The gnome-session wrapper script used to launch gnome-session-binary supports the -l switch to match the equivalent bash option, which means “use a login shell”; it’s fed directly to the shell, and in theory it should be dropped from the command line.

I know. I’ve gotten only this answer whenever I’ve had to mention this to anyone.

For some reason, I’ve been successful so far without any issues. Actually, I don’t really run anything system-wide other than what comes with the OS. Most of the “extra” software that I use is compiled from source. Most of the time, the only extras I install from the OS repositories are shared library development headers. I think that’s why my $HOME has been portable across various OSes running different versions of GNOME.

I was looking for that wrapper script that actually makes that call. Where can I find the script that actually adds the flag?

Actually, it is dropped - for newer users. There are probably some leftovers from an old session ( an older version of GNOME ), presumably in my $HOME, that are adding that flag. I need to find that script and nuke it or bring it back to sanity.

PS : the same test with Xorg on fedora threw Unknown option -c too. But it works fine on Ubuntu. I just didn’t want to throw a lot of similar errors and confuse people.
So, essentially, what I’m looking for is the directory of gnome-session wrapper scripts somewhere in my $HOME directory right?
The only directory that I could find with the name gnome-session was $HOME/.config/gnome-session which only had another empty directory called saved-session. So, I believe I don’t have any saved GNOME sessions that GNOME is trying to recover from either.

Have you changed your login shell (in /etc/passwd, e.g. via chsh)?

On Ubuntu 20.04, the default Wayland session /usr/share/wayland-sessions/ubuntu-wayland.desktop has

Exec=env GNOME_SHELL_SESSION_MODE=ubuntu /usr/bin/gnome-session --systemd --session=ubuntu

and /usr/bin/gnome-session does

#!/bin/sh

if [ "x$XDG_SESSION_TYPE" = "xwayland" ] &&
   [ "x$XDG_SESSION_CLASS" != "xgreeter" ] &&
   [  -n "$SHELL" ] &&
   grep -q "$SHELL" /etc/shells &&
   ! (echo "$SHELL" | grep -q "false") &&
   ! (echo "$SHELL" | grep -q "nologin"); then
  if [ "$1" != '-l' ]; then
    exec bash -c "exec -l '$SHELL' -c '$0 -l $*'"
  else
    shift
  fi
fi

#... snip irrelevant code...

if [ -d "${XDG_RUNTIME_DIR}/systemd" ]; then
  exec /usr/libexec/gnome-session-binary --systemd "$@"
else
  exec /usr/libexec/gnome-session-binary --builtin "$@"
fi

So what happens is that first the /usr/bin/gnome-session wrapper script executes, notices that this is a wayland session and that $1 is not ‘-l’ (it’s --systemd), then it tries to re-execute itself by doing that terrible double-exec of bash, then $SHELL, hoping to get a login shell by setting its argv[0] to something starting with a -. The -l argument is there to prevent infinite recursion, AFAIU, and the second invocation is supposed to strip it off with the shift.

The purpose of this is to invoke a login shell that would read your ~/.profile and make the environment variables defined there available for the entire GUI session. Before this was done, many people loudly objected about losing their custom $PATH or locale variables, and would switch back to Xorg sessions.

The errors from grep and gnome-session seem to indicate that your $SHELL is not behaving like a POSIX shell. It’s a weak guess, but I’ve no other theories to explain why you get errors from grep, or why the -l added by the first invocation is not stripped by the shift in the second invocation.
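
A rough sketch of how the two errors could be connected, assuming (and it is only an assumption) that $SHELL held the string grep complained about in the journal, `--session=ubuntu`. It mimics only the guard at the top of the wrapper, not the real script; the function name `strip_login_flag` and the stand-in shells file are made up for illustration.

```shell
# Hypothetical stand-in for the guard in /usr/bin/gnome-session.
# $1 = value to pretend $SHELL has
# $2 = file to use in place of /etc/shells
# remaining arguments = what the second invocation was called with
strip_login_flag() {
    shell_value=$1
    shells_file=$2
    shift 2
    # Same shape as the wrapper's test: grep must find $SHELL in the file.
    # grep's error message is silenced here; the wrapper lets it through.
    if [ -n "$shell_value" ] &&
       grep -q "$shell_value" "$shells_file" 2>/dev/null; then
        if [ "$1" = '-l' ]; then
            shift   # second invocation: drop the recursion marker
        fi
    fi
    # Print the arguments that would reach gnome-session-binary.
    printf '%s\n' "$*"
}

# Usage, with a throwaway shells file:
#   printf '/bin/sh\n/bin/bash\n' > /tmp/shells.demo
#   strip_login_flag /bin/bash /tmp/shells.demo -l --systemd --session=ubuntu
#   strip_login_flag --session=ubuntu /tmp/shells.demo -l --systemd --session=ubuntu
```

With a sane value, grep matches and the -l is shifted away. With a value that starts with a dash, grep treats it as an option and fails, the whole condition is false, and the -l stays on the command line.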

tl;dr perhaps it’s not a file in your home directory, but a custom login shell set in /etc/passwd for your primary user?

I had done that earlier on my previous installs ( switching to compiled versions: bleeding-edge master builds of bash or zsh ), but these ( Ubuntu and Fedora ) are fresh installs and I haven’t gotten a chance to change the shells yet ( though I would have, had these errors not popped up ).

You are correct ( I haven’t confirmed it 100%; but I’m fairly certain ). This has been the case for me even before ( Ubuntu 18.04 LTS Bionic Beaver ) too.
On Ubuntu, my $SHELL ( upon fresh login before the invocation of $HOME/.profile ) has been --session=ubuntu for a long time now. It is the same in the Xorg session as well ( I can’t explain why I can log in there though ).
My SSH ProxyCommands would fail and everything else that relied on $SHELL would either complain or just plain die. It got to a point where it was frustrating to have a command fail due to this and then have to override $SHELL before I could run the command again. And I couldn’t find the root cause of the problem either. So, then I just overrode it in my $HOME/.profile and was done with it.

So, I think you’ve identified the problem for me. I’m not in a position to test this extensively right now. The reason being, I have to restart the machine multiple times between OSes. However, I will test this more extensively and get back with test cases and results.

My suspicion falls on the double-shell invocation in /usr/bin/gnome-session. That’s where I’m going to put most of my effort in testing. I’ll get back with my observations as soon as I can.

Thanks again for identifying the problem and pointing me in the right direction.

Wow. I would like to know how you managed that!

Do you get the bad $SHELL if you switch to a text console with Ctrl+Alt+F3 and log in there? Or if you use SSH to log in? That is, does the bad SHELL problem get introduced by gnome-session’s attempt to execute a login shell, or is it happening before that?
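
One way to run that comparison is to print the login shell recorded in the passwd database next to the current value of $SHELL. This is only a sketch; the helper name `login_shell_from_passwd` is made up, and the sample line in the comment is invented apart from the user name that appears in the journal.

```shell
# Hypothetical helper: print the login shell (7th colon-separated field)
# from a single passwd(5) line read on stdin.
login_shell_from_passwd() {
    cut -d: -f7
}

# Usage on a real system:
#   getent passwd "$(id -un)" | login_shell_from_passwd
#   printf '%s\n' "$SHELL"
#
# Example with an invented passwd line:
#   printf 'shine:x:1000:1000::/home/shine:/bin/bash\n' | login_shell_from_passwd
```

If the two values differ right after a TTY or SSH login, something that runs during login has rewritten $SHELL.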

Do you have a ~/.pam_environment file, and what’s in it?

I think the login works in Xorg sessions despite the bad $SHELL because the first check gnome-session does is compare $XDG_SESSION_TYPE to “wayland”, before checking anything else and trying to re-exec itself.

I solved the mystery!

Actually, it was in my $HOME. Remember I told you that I had overridden the annoying --session=ubuntu in my $HOME/.profile? Turns out, that was the culprit.

SHELL=$( ps $$ | awk '( NR==2 ) { print $NF }' | awk -F / '{ print $NF }' )

I used this hack so that I could make both zsh and bash work whenever I invoked either. I wanted $SHELL to be the path to the binary.
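
To see what that line can produce, here is the same awk pipeline fed with canned `ps` output instead of the live process table. The sample lines are made up for illustration and the function name is hypothetical; only the two awk commands come from the line above.

```shell
# Same awk pipeline as the ~/.profile line, but reading `ps`-style output
# from stdin so that the result is reproducible.
shell_from_ps_output() {
    awk '( NR==2 ) { print $NF }' | awk -F / '{ print $NF }'
}

# Made-up ps output for a login shell on a TTY, whose argv[0] is "-bash".
printf '%s\n' \
    '    PID TTY      STAT   TIME COMMAND' \
    '   4242 tty3     Ss     0:00 -bash' | shell_from_ps_output

# Made-up ps output for a shell that is running a script with arguments.
# $NF is the last word of the command line, not the program name.
printf '%s\n' \
    '    PID TTY      STAT   TIME COMMAND' \
    '   4243 ?        Ss     0:00 /bin/sh /usr/bin/gnome-session --systemd --session=ubuntu' | shell_from_ps_output
```

The first call prints `-bash` and the second prints `--session=ubuntu`: the pipeline takes the last whitespace-separated word of the command line and then strips any directory part, so it only yields a shell name when the shell was started without arguments.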

As soon as I disabled that, I was able to log in.


Answers to previous questions :

I don’t have a ~/.pam_environment file.

Yes, except I was able to log in only on Ubuntu and not on Fedora.


Long story :

The TTY login is where I started my investigation. Though I vaguely remember having the same problem in TTYs on bionic as well, it wasn’t there on focal or Fedora. I was getting -bash ( while I was supposed to at least get /bin/bash ).

I played around with /usr/bin/gnome-session. The file was pretty much identical on every OS, so I ruled out the gnome-session script as the culprit.
I set -o xtrace on the script and figured that the error was happening with the second invocation of the shell.
Then I went around removing the double invocation ( though I knew that wasn’t the solution ). It worked.
Now, I had to figure out what was happening in between the first and the second invocation that could possibly break my login.
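
For anyone repeating the xtrace step, a small sketch of what the trace looks like. It runs a tiny stand-in script (not the real wrapper) under `sh -x`, which has the same effect as putting `set -o xtrace` at the top of the script. The function name `trace_demo` and the stand-in script are made up for illustration.

```shell
# Hypothetical demo: write a tiny stand-in script to a temp file, run it
# with tracing enabled, and show the trace together with its output.
trace_demo() {
    script=$(mktemp)
    cat > "$script" <<'EOF'
if [ "$1" != '-l' ]; then
  echo "would re-exec with -l"
else
  shift
  echo "remaining: $*"
fi
EOF
    # The trace goes to stderr, one line per command, prefixed with "+ ".
    sh -x "$script" "$@" 2>&1
    rm -f "$script"
}

trace_demo -l --systemd
```

Each traced command appears on its own line starting with `+ `, so it is easy to see whether the `shift` branch was taken. For a real session the trace ends up wherever the session's stderr goes, which in the logs above is the journal.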

Then this came to my mind and I realized that was what the -bash was about in my $SHELL when I logged in with a TTY.
Then it took me some more testing before it struck me that I was manipulating $SHELL in my $HOME/.profile and that probably was interfering in between the double-shell invocation.

I turned that off and everything went back to normal.

Many thanks to @mgedmin for the explanations and the right pointers that helped debug and get to the bottom of this issue. ❤️


Now, my non-default shell is going to be messed up because when I invoke it from an existing shell, it already has the $SHELL environment variable set, so it doesn’t re-set it ( at least that’s how I understood it ).
I also realized that I couldn’t use a compiled version of bash or zsh that resides within my $HOME on Fedora, I guess because of SELinux ( I got an avc denied error when attempting it ).
So, now my only option is to default to /bin/bash everywhere because it is on every system and then invoke my own zsh later; but then, unfortunately, zsh doesn’t set $SHELL which in turn should invoke my $HOME/.zshrc. So, now, I’m kind of stuck without getting to use zsh though I love zsh more than bash.

I guess this would be my weekend experiment to figure out.
