GLib new resources

galetedanilo · June 28, 2020, 11:20pm

Hi guys.
I recently filed a request to add a title-case function to GLib, similar to Python's title(). The maintainers asked me to open a discussion here on the GNOME Discourse instead. I wrote a small implementation, and it works correctly for my use case. I would like to know whether the community thinks this functionality is viable for GLib, and whether my code is correct or needs improvement. Please leave your comments and criticism so that we can gather feedback on the topic.

static gchar*
g_utf8_clear_space (const gchar *str)
{
  GString     *string;
  const gchar *p;
  gchar       *format;
  
  g_return_val_if_fail (str != NULL, NULL);

  string = g_string_new ("");

  /* g_strstrip() trims leading and trailing whitespace in place. */
  format = g_strstrip (g_strdup (str));
  
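  for (p = format; *p != 0x00; p = g_utf8_next_char (p))
    {
      /* Skip a space that is followed by another space, so each run of
       * spaces collapses to its last character. Step with
       * g_utf8_next_char() rather than p+1, which could land in the
       * middle of a multibyte character. */
      if (g_unichar_isspace (g_utf8_get_char (p)) &&
          g_unichar_isspace (g_utf8_get_char (g_utf8_next_char (p))))
        continue;

      g_string_append_unichar (string, g_utf8_get_char (p));
    }

  /* GString keeps its buffer NUL-terminated, so no explicit terminator is appended. */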
  
  g_free (format);
  
  return g_string_free (string, FALSE);
}

static gchar*
g_utf8_title (const gchar *str)
{
  GString     *string;
  gboolean    r;
  const gchar *p;
  gchar       *format;
  
  g_return_val_if_fail (str != NULL, NULL);
  
  string  = g_string_new ("");
  r       = TRUE;
  format  = g_utf8_strdown (str, -1);
  
  for (p = format; *p != 0x00; p = g_utf8_next_char (p))
    {
      if ((g_unichar_isalpha (g_utf8_get_char (p)) || g_unichar_ismark (g_utf8_get_char (p))) && r)
        {
          g_string_append_unichar (string, g_unichar_totitle (g_utf8_get_char (p)));
          r = FALSE;
        }
      else if (g_unichar_isspace (g_utf8_get_char (p)))
        {
          g_string_append_unichar (string, g_utf8_get_char (p));
          r = TRUE;          
        }
      else
        {
          g_string_append_unichar (string, g_utf8_get_char (p));
        }

    }
  
  /* GString keeps its buffer NUL-terminated, so no explicit terminator is appended. */
  
  g_free (format);
  
  return g_string_free (string, FALSE);
}

static gchar*
g_utf8_clear_title (const gchar *str)
{
  g_autofree gchar *format = g_utf8_clear_space (str);
  
  return g_utf8_title (format);
}

gchar*
on_format_text_title (const gchar *str)
{
  gchar *title = g_utf8_clear_title (str);
    
  return title;
}

nielsdg · June 29, 2020, 6:53am

I’m a bit confused here: you mention you want to implement a title() function, but then some of the functions in your code block are about clearing whitespace for some reason.

In general: +1 for having a “titlecase” API for strings in GLib (I could actually use this in some projects at least).

Btw, if there is an issue/MR on GitLab which contains prior discussion, it might make sense to link to that in your post.

pwithnall · June 29, 2020, 8:13am

Thanks for posting this Danilo, hopefully others can make use of your code if they need to. What license is it under?

On the issue you filed against GLib, I actually said:

Sorry, I think that’s too specific to what a particular application would want (and also very hard to do correctly for every locale). If you can show that the same implementation of title casing would be useful for three or more applications (see CONTRIBUTING.md), then please re-open this issue. Thanks.

That response still stands — converting strings to title case is very use-case specific (why would an application want to do it; on what kind of strings; to produce what kind of title case) and would be hard to get working across multiple locales. Applications with specific use cases may be able to avoid that difficulty perhaps by ignoring non-Western-European locales (depending on their use case and target audience, I’m not advising it though). GLib can’t, so any implementation in GLib would have to do something appropriate for all locales.

To be clear, adding this functionality would need a very strongly evidence-backed proposal of how it would make several application authors’ lives easier, and would need to come with an implementation which can be shown to make sense for all locales.

galetedanilo · June 29, 2020, 11:07am

Hello everyone. Anyone can use this code if they want. In my case I have several input fields for the user, and I don’t want the entered data to look like this (all lowercase, with several extra blank spaces):
‘ my name is danilo.’

But rather like this:
My Name is Danilo.

pwithnall · June 29, 2020, 11:21am

This illustrates my point well: the code you’ve posted will actually generate the string My Name Is Danilo. (note the capitalised ‘Is’). Typically title case in English will keep short words and connectives (‘is’, ‘or’, ‘and’, etc.) in lowercase, though; and that’s where it becomes difficult to implement in a way that works for every locale and every use-case.

This kind of string manipulation is too specific to a single use-case to be in GLib.
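
To make the ‘Is’ versus ‘is’ difference concrete, here is a minimal sketch in plain C (no GLib) of the English convention described above. It is ASCII-only, and the names title_case_en, is_small_word and the small_words list are hypothetical choices for illustration, not a GLib API. Another language or style guide would need a different list, which is exactly the difficulty.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, English-only list of words kept lowercase inside a title. */
static const char *const small_words[] = { "a", "an", "and", "is", "of", "or", "the" };

/* Return 1 if the len-byte word starting at `word` is in small_words. */
static int
is_small_word (const char *word, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof small_words / sizeof small_words[0]; i++)
    if (strlen (small_words[i]) == len && strncmp (small_words[i], word, len) == 0)
      return 1;

  return 0;
}

/* ASCII-only sketch: copy `src` into `dst` in lowercase (at most
 * dst_size - 1 bytes), then capitalise the first character of every
 * whitespace-separated word, except small words that are not the
 * first word of the string. */
void
title_case_en (const char *src, char *dst, size_t dst_size)
{
  size_t n, i, start;
  int first_word = 1;

  if (dst_size == 0)
    return;

  n = strlen (src);
  if (n >= dst_size)
    n = dst_size - 1;

  for (i = 0; i < n; i++)
    dst[i] = (char) tolower ((unsigned char) src[i]);
  dst[n] = '\0';

  i = 0;
  while (i < n)
    {
      /* Skip the whitespace before the next word. */
      while (i < n && isspace ((unsigned char) dst[i]))
        i++;

      /* Find the end of the word. */
      start = i;
      while (i < n && !isspace ((unsigned char) dst[i]))
        i++;

      if (i > start)
        {
          if (first_word || !is_small_word (dst + start, i - start))
            dst[start] = (char) toupper ((unsigned char) dst[start]);
          first_word = 0;
        }
    }
}
```

With this sketch, the input ‘my name is danilo.’ becomes ‘My Name is Danilo.’. A small word with attached punctuation, such as ‘is,’, does not match the list and is still capitalised, which shows how quickly the special cases pile up even for English alone.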

MichiB · June 29, 2020, 3:27pm

In my opinion, the user should decide which spaces are needed and how many. Just saying …

zbrown · June 29, 2020, 4:05pm

Okay but spaces aren’t really the problem here?

galetedanilo · June 29, 2020, 11:32pm

Thank you very much for your opinion.

galetedanilo · June 29, 2020, 11:36pm

It really does get a little difficult, but does Python’s title() function make this distinction too?

matthiasc · July 13, 2020, 8:18pm

This is, IMO, a bit beyond the Unicode functionality that GLib can and should provide out of the box. We have case conversion and case properties for individual characters, and we have breaking properties. But the actual Unicode line-breaking algorithm (TR14) is implemented in Pango.
