GLib new resources

Hi guys.
I recently opened a request to add a title() function to GLib, similar to the one in Python. However, the maintainers asked me to open a discussion here on GNOME Discourse instead. I wrote a small implementation of this function, and it works correctly for me. I would like to know whether the community thinks this functionality is worth having in GLib, and whether my code is correct or needs improvement. Please leave your comments and criticism so that we can get some feedback on the topic.

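#include <glib.h>

/* Note: the g_ prefix is normally reserved for GLib's own API. These
 * helpers are static, so nothing clashes, but a project-specific prefix
 * would be safer. */

/* Trim leading/trailing whitespace and collapse every run of whitespace
 * characters into one (the last character of each run is kept). */
static gchar*
g_utf8_clear_space (const gchar *str)
{
  GString     *string;
  const gchar *p;
  gchar       *format;

  g_return_val_if_fail (str != NULL, NULL);

  string = g_string_new ("");

  format = g_strdup (str);
  g_strstrip (format);  /* strips ASCII whitespace from the copy, in place */

  for (p = format; *p != '\0'; p = g_utf8_next_char (p))
    {
      gunichar c = g_utf8_get_char (p);
      /* Peek at the next *character*: p + 1 would point into the middle of
       * a multi-byte UTF-8 sequence. At the end of the string this reads
       * the terminating NUL, which is not a space. */
      gunichar next = g_utf8_get_char (g_utf8_next_char (p));

      /* Drop a whitespace character that is followed by another one. */
      if (g_unichar_isspace (c) && g_unichar_isspace (next))
        continue;

      g_string_append_unichar (string, c);
    }

  g_free (format);

  /* A GString is always NUL-terminated; appending 0x00 by hand would only
   * embed an extra NUL byte in the result. */
  return g_string_free (string, FALSE);
}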

/* Lowercase the whole string, then title-case the first letter (or
 * combining mark) of each whitespace-separated word. */
static gchar*
g_utf8_title (const gchar *str)
{
  GString     *string;
  gboolean    r;       /* TRUE while waiting for the first letter of a word */
  const gchar *p;
  gchar       *format;

  g_return_val_if_fail (str != NULL, NULL);

  string  = g_string_new ("");
  r       = TRUE;
  format  = g_utf8_strdown (str, -1);

  for (p = format; *p != '\0'; p = g_utf8_next_char (p))
    {
      gunichar c = g_utf8_get_char (p);

      if (r && (g_unichar_isalpha (c) || g_unichar_ismark (c)))
        {
          /* First letter of a word: use the title-case mapping. */
          g_string_append_unichar (string, g_unichar_totitle (c));
          r = FALSE;
        }
      else if (g_unichar_isspace (c))
        {
          /* Whitespace ends the current word. */
          g_string_append_unichar (string, c);
          r = TRUE;
        }
      else
        {
          g_string_append_unichar (string, c);
        }
    }

  g_free (format);

  return g_string_free (string, FALSE);
}

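/* Normalise whitespace, then title-case the result. */
static gchar*
g_utf8_clear_title (const gchar *str)
{
  /* g_autofree frees the intermediate string when it goes out of scope. */
  g_autofree gchar *format = g_utf8_clear_space (str);

  return g_utf8_title (format);
}

/* Returns a newly allocated string; the caller must free it with g_free(). */
gchar*
on_format_text_title (const gchar *str)
{
  return g_utf8_clear_title (str);
}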

I'm a bit confused here: you mention you want to implement a title() function, but then some of the functions in your code block are about clearing whitespace for some reason.

In general: +1 for having a "titlecase" API for strings in GLib (I could actually use this in at least some projects).

Btw, if there is an issue/MR on GitLab which contains prior discussion, it might make sense to link to it in your post.

Thanks for posting this, Danilo; hopefully others can make use of your code if they need to. What license is it under?

On the issue you filed against GLib, I actually said:

Sorry, I think that's too specific to what a particular application would want (and also very hard to do correctly for every locale). If you can show that the same implementation of title casing would be useful for three or more applications (see CONTRIBUTING.md), then please re-open this issue. Thanks.

That response still stands: converting strings to title case is very use-case specific (why would an application want to do it, on what kind of strings, and to produce what kind of title case?) and would be hard to get working across multiple locales. Applications with specific use cases may be able to avoid that difficulty, perhaps by ignoring non-Western-European locales (depending on their use case and target audience; I'm not advising it, though). GLib can't, so any implementation in GLib would have to do something appropriate for all locales.

To be clear, adding this functionality would need a very strongly evidence-backed proposal showing how it would make several application authors' lives easier, and it would need to come with an implementation that can be shown to make sense for all locales.

Hello everyone, anyone can use this code if they want. In my case, I have several input fields for the user, and I don't want the entered data to look like this:
"   my   name   is   danilo.   " (with several blank spaces)

But like this:
My Name is Danilo.

This illustrates my point well: the code you've posted will actually generate the string "My Name Is Danilo." (note the capitalised 'Is'). Typically, though, title case in English keeps short words and connectives ('is', 'or', 'and', etc.) in lowercase, and that's where it becomes difficult to implement in a way that works for every locale and every use case.

This kind of string manipulation is too specific to a single use-case to be in GLib.

In my opinion, the user should decide which spaces, and how many, are needed. Just saying…

Okay, but spaces aren't really the problem here?


Thank you very much for your opinion

It does get a little difficult, but does Python's title() function make this distinction too?

This is, IMO, a bit beyond the Unicode functionality that GLib can and should provide out of the box. We have case conversion and case properties for individual characters, and we have line-breaking properties. But the actual Unicode line breaking algorithm (TR14) is implemented in Pango.
