g_print vs g_printf and UTF-8

I wanted to print an umlaut using g_print, but instead of the umlaut, I get a “?”.

If I use g_printf, the umlauts print correctly, but then the compiler throws a warning.

Can anyone explain this behavior?

// gcc -o main main.c `pkg-config --cflags --libs glib-2.0`

#include <glib.h>



int main() {
    g_printf("Ich bin ein Umlaut: öäü ÜÄÖ\n");
    g_print("Ich bin ein Umlaut: öäü ÜÄÖ\n");
    return 0;
} 

Output:

gcc -o main main.c `pkg-config --cflags --libs glib-2.0`
main.c: In function ‘main’:
main.c:8:5: warning: implicit declaration of function ‘g_printf’; did you mean ‘g_print’? [-Wimplicit-function-declaration]
    8 |     g_printf("Ich bin ein Umlaut: öäü ÜÄÖ\n");
      |     ^~~~~~~~
      |     g_print
tux@tux-B660M-DS3H-DDR4:~/Schreibtisch/g_printf$ ./main 
Ich bin ein Umlaut: öäü ÜÄÖ
Ich bin ein Umlaut: ??? ???

Hi,

I can’t test right now, but you probably need to add:

setlocale(LC_ALL, "");

at the start of main() to properly support Unicode (setlocale is declared in <locale.h>).

Knowing nothing of the history behind this, I see that <glib.h> includes a number of other headers, among them <glib/gmessages.h>, which declares g_print. g_printf, however, is declared in <glib/gprintf.h>, which <glib.h> does not include. Adding #include <glib/gprintf.h> fixes the warning; the docs mention this.

g_printf, added later in GLib 2.2, is intended to be an implementation of a “latest and greatest” version of the standard printf, to avoid platform-specific quirks. It so happens that plain printf does not know or care about UTF-8: to it, the string is just a byte sequence, provided that is what your source file actually contains (a hexdump will tell you). So g_printf passes the bytes through, and if your terminal understands UTF-8, you see what you expect.
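
The hexdump check can also be done from inside a program. The sketch below uses only the C standard library; hex_bytes is a hypothetical helper name. In UTF-8, "ö" (U+00F6) is the two bytes C3 B6, whereas in Latin-1 it would be the single byte F6.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write each byte of s into out as two uppercase hex
 * digits, separated by single spaces. Returns the number of bytes in s,
 * or 0 if out is too small (out_size must be at least 3 * strlen(s) + 1). */
size_t hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t n = strlen(s);
    if (out_size < 3 * n + 1)
        return 0;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            *p++ = ' ';
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(p, 3, "%02X", (unsigned)(unsigned char)s[i]);
        p += 2;
    }
    *p = '\0';
    return n;
}
```

Passing a string literal from your own source file, for example hex_bytes("ö", buf, sizeof buf), shows which encoding the compiler actually saw.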

The original g_print, however, apparently does care and tries to apply the locale setting. If that is the default minimal 7-bit ASCII "C" locale, those “extended” characters are treated as unprintable and replaced with ?. (Interesting: if you add an emoji, which encodes as four bytes of UTF-8, it is replaced with a single ?, while the umlauts are two bytes each. So, at least with my setup, it is reading the bytes as UTF-8.) Using setlocale as suggested fixes the problem. Does the empty "" mean “don’t bother”, i.e. act like printf? Using a specific locale like "en_US.UTF-8" also works.

Passing "" as the locale to setlocale() means “look up the user’s preferred locale from the environment”.

man setlocale says:

   If locale is an empty string, "", each part of the locale that should be
   modified is set according to the environment variables.  The details are
   implementation-dependent.  For glibc, first (regardless of category),
   the environment variable LC_ALL is inspected, next the environment
   variable with the same name as the category (see the table above), and
   finally the environment variable LANG.  The first existing environment
   variable is used.  If its value is not a valid locale specification, the
   locale is unchanged, and setlocale() returns NULL.
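
Since setlocale() can fail, it may be worth checking its return value and looking at the encoding that ends up active. The sketch below assumes a POSIX system (nl_langinfo and CODESET come from <langinfo.h>); codeset_after_setlocale is a hypothetical helper name.

```c
#include <locale.h>     /* setlocale, LC_ALL */
#include <langinfo.h>   /* nl_langinfo, CODESET (POSIX) */
#include <stddef.h>     /* NULL */

/* Hypothetical helper: try to adopt the given locale ("" = take it from
 * the environment) and return the name of the resulting character
 * encoding, or NULL if setlocale() rejected the locale. */
const char *codeset_after_setlocale(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Called with "" under a locale such as en_US.UTF-8, this would be expected to return "UTF-8"; a NULL result means the environment names a locale the system does not accept.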

Thanks, now it’s clear.

I’ll just use g_printf in the future.