Different methods to use GSList as value of a GHashTable

Given the code snippet:

#include <glib.h>
#include <stdio.h>

void print_city(gpointer value, gpointer data)
{
    printf("%s, ", value);
}
void print(gpointer key, gpointer value, gpointer data)
{
    printf("Here are some cities in %s: ", key);
    g_slist_foreach((GSList *)value, (GFunc)print_city, NULL);
    printf("\n");
}

void destroy(gpointer key, gpointer value, gpointer data)
{
    printf("Freeing a GSList, first item is %s\n", ((GSList*)value)->data);
    g_slist_free(value);
}

int main(int argc, char *argv[])
{
    GHashTable *hash = g_hash_table_new(g_str_hash, g_str_equal);

    g_hash_table_insert(hash,
            "Virginia",
            g_slist_append(g_hash_table_lookup(hash, "Virginia"), "Richmond")
    );

    g_hash_table_insert(hash,
            "Virginia",
            g_slist_append(g_hash_table_lookup(hash, "Virginia"), "Keysville")
    );

    g_hash_table_insert(hash,
            "Texas",
            g_slist_append(g_hash_table_lookup(hash, "Texas"), "Houston")
    );

    g_hash_table_insert(hash,
            "Texas",
            g_slist_append(g_hash_table_lookup(hash, "Texas"), "Austin")
    );

    g_hash_table_foreach(hash, print, NULL);
    g_hash_table_foreach(hash, destroy, NULL);
    g_hash_table_destroy(hash);
    return 0;
}

We can notice that the lines in main function that calls g_hash_table_insert, looks a bit convoluted, repeating itself, hence I was looking code alternatives to simplify the value parameter. With that in mind, my first attempt to change the code is the following (comments added)

GSList *list = NULL;

// Adding the first element to `list`
// list contains the pointer to the GSList head
list = g_slist_append(list, "Austin");

// Inserting the list (an address  to a GSLIST) into
// the value of "Texas"  key.
g_hash_table_insert(hash, "Texas", list);

// Retrieves the content of "Texas"  key.
// (That is right now the pointer to the  GSList head)
// and stores into value pointer
list = g_hash_table_lookup(hash, "Texas");

// Add the string "Houston" to the GSList and stores 
// the new head at `value` to keep track of head.
list = g_slist_append(list, "Houston");

// Updates the 'key' Texas with list.
g_hash_table_insert(hash, "Texas", list);

Does not looks much better. I have also added a few comments in code to describe my understanding.

Question 1: Does my understanding (expressed in comments) are correct?

Question 2: Keeping the code consistency, is there any way to improve the code above?

I noticed that I could remove all calls to g_hash_table_insert after the first one, maybe sacrificing some consistency. The code in this case would looks like this

GSList *list = NULL;

list = g_slist_append(list, "Austin");
g_hash_table_insert(hash, "Texas", list);
list = g_hash_table_lookup(hash, "Texas");
list = g_slist_append(list, "Houston");

Question 3: How this change affects the consistency?

Question 4: Which behaviors could emerge from this change?

Anyhow, given this previous snippet I tried to make things more “organized” hence I reached the follow snippet to update the list in GHashTable:

GSList *list = NULL;
g_hash_table_insert(hash, "Texas", list);
list = g_hash_table_lookup(hash, "Texas");

list = g_slist_append(list, "Austin");
list = g_slist_append(list, "Houston");

The code above seems more organized, but fails completely in work (even triggering segfault on free calls). I could not understand why this last snippet fails.

Question 5: Why the segfault is happening?

Question 6: Is there a less convoluted way to handle GSLists in GHashTables that not uses the initial construction?

Interesting.

I have to admit that I know nothing about glib lists and hashes – and I would prefer to use a real programming language whenever possible. But for your last code sniped I may have an idea: You put a pointer with value null into the hash table and got that null pointer back. After that you use g_slist_append() to add elements – I assume g_slist_append() has a special handling for cases when list is still null, so it creates the list then. But then the list is alive, and is in no way related to the hash any more. When I had to use C and glib, I would start with this strategy: First create the list and add elements, where adding the first elements seem to do the creation. After that add the list to the hash table. When necessary get the list back from hash table and add more elements. But maybe I would consult API docs before. And generally adding some testing code, like test for null result of course.

Unless there’s some good reason to use linked lists (single or double), I would suggest using a GPtrArray instead. Lower allocation and traversal overheads, and less complex to handle.

1 Like

Sorry Stefan, I could not understand what you mean about “real programming language”, isn’t C a real programming language?

Completely agree with you here. Probably that is the reason.

@pwithnall if linked lists are not good for any reason why do they exist in GLib?

I could also go for, why do we teach/learn them in CS lessons?

But yes. I think I will go with GPtrArray or GArray (if all elements in array are the same).

Thank you.

Well, C is an early attempt of a programming language, about 50 years ago, at a time where languages like Cobol, Algol and Fortran were created. The ceators of C did a great job at that time, C is still fine for microcontrollers, device drivers, linux kernel, and gtk core supported by glib and gobject. The big benefit of plain C is easy interfacing with other languages.

But for other software development we have so many modern languages now – like Rust, D, Nim, Zig, Go, Haskell, Crystal to name only a few of those running native on bare metal/without stuff like virtual machine. And many more like Scala, Julia or Kotlin running on virtual machines.

Coding in plain C is such hard work with slow progress – but of course it is good exercise like weight lifting.

I have made an initial implementation using these ideas. Those can be seen here Learning GLib - Graph implementation (a draft)

A general suggestion: when you have this kind of problems is much cleaner to wrap the GLib code with your own specialized functions and hide the implementation as much as you can, for example (untested):

// country.h: public interface

typedef struct _Country Country;

Country *   country_new         (void);
void        country_town_add    (Country *      country,
                                 const gchar *  state,
                                 const gchar *  town);
...


// country.c: implementation

struct _Country {
    GHashTable *states;
};

Country *
country_new(void)
{
    Country *coutry = g_new(Country, 1);
    country->states = g_hash_table_new(g_str_hash, g_str_equal);
    return country;
}

void
country_town_add(Country *country, const gchar *state, const gchar *town)
{
    GSList *towns = g_hash_table_lookup(country->states, state);
    towns = g_slist_append(towns, town);
    g_hash_table_insert(country->states, state, towns);
}

This has the huge advantage you can reimplement country.c in whatever way you want, e.g. by using @pwithnall idea, without having to touch the code above it.

@ntd have you seen the code here? Learning GLib - Graph implementation (a draft)

ps: I’m not a fan on typedeffing everything.

ps2: But for sake of export a clean object, maybe is a good approach.

Yes, you are still exposing everything. Don’t get me wrong: it is perfectly fine for an example or inside an application but if you want to improve code reusability you should be able to freely rework the implementation (both code and data structure) without touching the API (the public interface).

Just dont’ use typedef, it is not mandatory. The principle still stands: hide as much as you can.

Probably I don’t know how to not expose. IMHO the Graph implementation code (in the link above ) isn’t different of your code. Is just a matter of separate what is in .h and what is in .c?

Yes, that would help a lot. The .h should be left untouched, apart adding new code.

For instance, struct graph should be a forward declaration in the header, so if you decide to use something different from a hash table you are free to do it. The same goes for struct node and struct neighbor, although I would use a more radical approach and completely hide all of them (together with some dubious functions such as graph_node, graph_{node,neighbors}_print, graph_node_{neighbor,value}_destroyed that IMO are exposing too much).

If you carefully review country.h, you’ll see that I’m just exposing an opaque data type (Country), an obvious API (country_town_add) and not much more. An external application does not need to know what Country is or how country_town_add is implemented, it just needs to know what that function does at conceptual level.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.