Some colleagues and I are having problems with a program in which libpango calls the glib function ‘g_utf8_validate()’. It behaves consistently across platforms for very simple UTF-8 strings - i.e. the low-value characters, such as digits and A-Z. But for anything more complicated it gives different results on different platforms and compilers. The higher-value UTF-8 characters seem to get displayed okay if compiled for Linux (with gcc), and some of them work if compiled for Windows (with gcc…), but I haven’t found any yet that’ll display properly if compiled for Windows with MSVC.
So can anyone point me in the direction of a coding example that’ll show ‘g_utf8_validate()’ being used correctly?
Just to discount this possibility first off: are you sure the strings you’re handling on Windows are UTF-8 and not UTF-16? The native encoding on Windows is UTF-16 and depending on where you’re getting the strings from, they potentially are encoded as that.
IIUC the string “\u00A9” should equate to the UTF-8 encoding of the copyright symbol, but after g_utf8_validate() returns, valid is FALSE. Admittedly this is just a single character - so does it need to be NUL-terminated maybe?
The “\uXXXX” escape sequence is a C99 feature, and it’s converted by the compiler, sure; but it’s only going to be applied to string literals, not to any random string.
Good point - in our case they are all string literals, which are interpreted at compile time in C++11 code [1]. This works fine with gcc, clang and mingw, but apparently fails with MSVC for @johne53 for some reason.
Unicode literals in Visual C++ - Stack Overflow suggests that MSVC might be interpreting the \u escapes into the wrong codepage, so they are compiled as something which isn’t UTF-8. Is that the case?
Adding u8 before the relevant strings has helped a lot and doesn’t seem to be upsetting the non-MSVC compilers so far! Many thanks for everyone’s help with this.