GHashTable using 64-bit ints

cottrell · July 3, 2019, 2:49pm

There’s something I’m not getting about using 64-bit int keys with GHashTable; if anyone can correct me I’d be grateful. Here’s the basics of my code:

int main (void)
{
    GHashTable *ht;
    gpointer p;
    gint64 key;
    gint i, val, N = 4;

    ht = g_hash_table_new(g_int64_hash, g_int64_equal);

    for (i=0; i<N; i++) {
	key = i + 1;
	val = i + 10;
	p = GINT_TO_POINTER(val);
	g_hash_table_insert(ht, &key, p);
	printf("%d: insert: key %ld, value %d (%p)\n", i, key, val, p);
    }

    for (i=0; i<N; i++) {
	key = i + 1;
	p = g_hash_table_lookup(ht, &key);
	val = GPOINTER_TO_INT(p);
	printf("%d: lookup: key %ld, value %d (%p)\n", i, key, val, p);
    }

    g_hash_table_destroy(ht);

    return 0;
}

What I’m getting by way of output is

0: insert: key 1, value 10 (0xa)
1: insert: key 2, value 11 (0xb)
2: insert: key 3, value 12 (0xc)
3: insert: key 4, value 13 (0xd)
0: lookup: key 1, value 11 (0xb)  // hmmm
1: lookup: key 2, value 11 (0xb)
2: lookup: key 3, value 12 (0xc)
3: lookup: key 4, value 13 (0xd)

cottrell · July 3, 2019, 5:14pm

A little more info I should have added: the results above (with an apparently erroneous lookup for key = 1) were obtained with glib 2.60.4 on Arch, and glib 2.58.3 on Fedora 29.

Could this be a case of a hashing “collision”? (As you can tell, I’m not a hashing expert.)

Anyway, the error goes away if I start the key sequence at 17 rather than at 1. And it does not occur (even with the key sequence starting at 1) if I use 32-bit integer keys and the g_direct_hash method. If there’s not an actual error in my code, perhaps the glIb docs should state that very small key values should not be used with the g_int64_hash method?

postiffm · July 3, 2019, 6:12pm

No. A hash table data structure is supposed to handle that for the
user without him having to worry about it.
That’s luck.
Very small key values should work. If they don’t, it is a bug in the
library. More likely a bug in your code.

cottrell · July 3, 2019, 6:56pm

If there’s a bug in my code (not unlikely) I think it has to be a case of misuse of the GHashTable API (otherwise it’s very straightforward C). What I’m fishing for is a diagnosis of the misuse.

postiffm · July 3, 2019, 7:32pm

Hi Allin, I was first wondering about the print statement. Is %ld correct for 64-bit int, or is it %lld?

I think more likely the problem is you are passing a pointer to key and val, which are a local (stack/temporary) variables. The pointer is stored in the hash table, and whatever changes you are making to key/val actually is changing what is in the hash table. Maybe you have to make newval = new gint with initial value = val, and store that into the hash table.

Also, the %p printout “pointed out” seemed odd to me. You are printing an address in computer memory, which should be like 0x1df33d3fd… but the address is the exact same as the numerical value that you are trying to store into the hashtable. Well, that makes sense because GINT_TO_POINTER does exactly that. But if you are telling the hash table to store whatever is in address 0xa in your program’s address space as the value associated with key, that is most certainly not what you want.

Another potential issue, from the documentation at https://developer.gnome.org/glib/stable/glib-Type-Conversion-Macros.html#GINT-TO-POINTER:CAPS

#define GINT_TO_POINTER(i) ((gpointer) (glong) (i))

Stuffs an integer into a pointer type.

Remember, you may not store pointers in integers. This is not portable in any way, shape or form. These macros only allow storing integers in pointers, and only preserve 32 bits of the integer; values outside the range of a 32-bit integer will be mangled.

This isn’t 64-bit safe.

Hope this is helpful.

cottrell · July 3, 2019, 9:08pm

Thanks again. The format “%ld” is correct for 64-bit int on Linux (which I’m running), though admittedly it’s not portable.

I think you’ve hit my problem with your comment on the keys. I took the glib doc to mean that giving a pointer to a 64-bit int was just a means of getting that int into the table as key, but if it’s really the pointer that’s stored then clearly my code is broken; I need a distinct int object for each key (not just a distinct value for the same object).

On the other hand, I believe my use of GINT_TO_POINTER to set the value (not the key), and
GPOINTER_TO_INT to retrieve it, is correct; the value is a 32-bit int which should make the round-trip successfully, according to the glib doc (and in practice).

StefanSalewski · July 4, 2019, 5:38am

I took a short look at your initial post and at the glib hash API doc yesterday.

My initial impression was, that g_hash_table_insert() expects a real pointer pointing to a real object, while you give it just an object encoded into the pointer. So when the function tries to use it to access the object where that encoded pointer points, the location is fully undefined, and results should be wrong. I was still confused because most of your results seemed to be correct, but maybe that was just “luck”.

pwithnall · July 4, 2019, 9:45am

That is precisely the problem here. You need to allocate storage for each key with g_new0 (gint64, 1) (or, to be more efficient with the allocations, allocate an array of gint64s in one call, and then index into it).

cottrell · July 4, 2019, 12:37pm

The “object encoded into a pointer” is a 32-bit int, whose role is the “value” stored in the table, and that’s legit. This “fake pointer” is never dereferenced, it’s just converted back to an integer value via GPOINTER_TO_INT on g_hash_table_lookup().

It was the key that was wrong: that was a proper pointer (to a 64-bit int) alright, but it was a pointer to one and the same object on each call to g_hash_table_insert(), with just the value stored in the object updated between iterations. I now understand (with Philip’s help) why that’s not going to work. (What I still don’t understand is why it even appeared to work, mostly.)

greg-king · July 4, 2019, 9:39pm

It “appeared to work, mostly” because most of the hashes were different. The first hash was the same as the second hash (yep, a collision).

system · July 18, 2019, 9:39pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.