GLib UTF-8 functions assume 8-bit guchar?

narwahl · March 30, 2024, 9:54am

It seems there’s a baked-in assumption that a guchar is 8 bits wide [1]. Is that a safe assumption to make for all platforms on which GLib is supported?

POSIX mandates that a byte is exactly 8 bits. So the question is whether GLib requires (implicitly or explicitly) POSIX compliance in this regard.

[1] Looking at code such as g_utf8_validate() → fast_validate() and at the utf8_skip_data array.
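
To make the concern concrete, here is a minimal sketch of a lead-byte lookup. It is not GLib's code: utf8_seq_len() and utf8_next() are hypothetical names, and the table is filled at first use purely for illustration. It only shows why indexing a 256-entry table with an unsigned char is safe when a char has exactly 8 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GLib code: number of bytes in the UTF-8 sequence
 * introduced by `lead`. Bytes that cannot start a sequence count as 1 so a
 * scanner always makes progress. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 1;  /* 10xxxxxx: stray continuation byte */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF8) return 4;  /* 11110xxx */
    return 1;                   /* 0xF8 and above: never valid in UTF-8 */
}

/* Hypothetical helper: advance to the next character through a 256-entry
 * table, in the spirit of a skip-table lookup. */
static const char *utf8_next(const char *p)
{
    static unsigned char skip[256];
    static int ready = 0;

    assert(p != NULL);
    if (!ready) {
        for (unsigned int i = 0; i < 256; i++) {
            skip[i] = (unsigned char) utf8_seq_len((unsigned char) i);
        }
        ready = 1;
    }
    /* In bounds only if an unsigned char cannot exceed 255; a wider char
     * would need a mask such as ((unsigned char) *p & 0xFF) here. */
    return p + skip[(unsigned char) *p];
}
```

On a platform with a wider char, the index in the last line could exceed 255 unless it were masked first.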

pwithnall · March 30, 2024, 10:30am

Yes, GLib assumes an 8-bit char/uchar. GLib is not supported on platforms that have a wider char, such as PDP machines or various DSPs.

What’s your use case?

narwahl · March 30, 2024, 7:56pm

Thanks. I work on Pacemaker (an HA cluster resource manager). We don’t have an exhaustive list of supported platforms; we test against RHEL, Fedora, CentOS, Debian, Ubuntu, openSUSE, and FreeBSD on several CPU architectures. We don’t explicitly require an 8-bit char. However, AFAIK all of our test systems use an 8-bit char.

We use GLib heavily – mostly for lists, hash tables, option parsing, and the main event loop, and a little bit of GString.

I recently wrote some CHAR_BIT-aware code to handle non-ASCII UTF-8 characters. I wanted to look into replacing it with GLib calls instead of reinventing the wheel.

I’ll discuss with the project lead next week… perhaps we already have enough of a de facto dependency on 8-bit char via GLib to proceed with g_utf8_next_char(), etc.
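
If we do go that way, one option would be to turn the de facto dependency into an explicit one at compile time. A minimal sketch, assuming a C11 compiler (GLib's G_STATIC_ASSERT macro could be used instead where GLib headers are already included); is_utf8_continuation() is a hypothetical helper, not an existing Pacemaker or GLib function:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* Fail the build on any platform where a char is not exactly 8 bits. */
static_assert(CHAR_BIT == 8, "this code assumes an 8-bit char");

/* Hypothetical helper: with the assertion above in place, a byte can be
 * tested for the 10xxxxxx continuation pattern without masking it to
 * 8 bits first. */
static int is_utf8_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}
```

That way a port to an unusual platform would fail loudly at build time rather than misbehave at run time.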
