Determining whether container elements are modifiable from GI annotations

pclayton · July 23, 2020, 9:55pm

Have you already solved

Gtk – 3.0

For me, such functions with a GSList result are hard to handle in fully automatic bindings generation.

For SML bindings, I have put in much of the framework required for collections but I haven’t supported G[S]List types because I haven’t needed them yet.

The API docs note [element-type GtkWindow][transfer none]. Do you know which function in the gobject-introspection API provides the element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. Might that work for GSList as well?

Yes, I expect it to work for G[S]List and also for GHashTable, which has two parameters, I believe, the key type and the element type.

pclayton · July 30, 2020, 11:15am

I’ve not seen an answer so I’ll ask a simpler question, this time about struct parameters.

Take, for example, gdk_rgba_parse. The parameter rgba is an ‘in’ parameter whose ownership is not transferred to the function temporarily, so the function modifies the parameter without having ownership. Is this allowed? Or do annotations need to be changed so that rgba has direction ‘inout’ (which defaults to transfer ‘full’), as done for e.g. pango_matrix_transform_rectangle. I presume gdk_rgba_parse is fine otherwise lots of functions would be missing annotations but I would be grateful for confirmation.

fmuellner · July 30, 2020, 11:47am

No, the rgba parameter is the instance parameter. Conceptually it is the same as gtk_window_set_title(), where the window parameter isn’t an inout parameter that the function manipulates, but the instance the method operates on.

The difference that GtkWindow is a GObject and GdkRGBA is a GBoxed isn’t relevant for this particular question.

StefanSalewski · July 30, 2020, 12:03pm

gdk_rgba_parse() is indeed an interesting example. I have just inspected the Nim bindings and I am not really happy with them. I would like to pass the GdkRGBA struct as a Nim var parameter because it is modified by GDK, but it is passed as a pointer. If gobject-introspection marked it as an out or inout parameter, my bindings would pass it as var automatically. (Nim uses var procedure parameters, as Pascal did.) I have to investigate that.

Types like GdkRGBA, Rectangle or TextIter are a bit special for the Nim bindings: because they are very simple objects, we allocate them on the stack and pass their addresses to GTK. But that fact is not relevant here.

pclayton · July 30, 2020, 2:42pm

Yes, good point! That example didn’t help with my question.

I think a better example would be cairo_transform_to_window. The parameter cr is not the instance parameter according to the GIR file and ownership is not transferred (AFAICS) but it is modified. Perhaps in this case, as cairo_t is a reference counted type so is inherently single-instance, there is no need to say the parameter is ‘inout’ with ownership temporarily transferred. Is that the right way to look at this?

ebassi · July 30, 2020, 4:15pm

Cairo is not an introspectable API, and it’s not based on GObject, so you cannot apply any mechanism in gobject-introspection to make it work.

StefanSalewski · July 30, 2020, 5:00pm

I am still a bit confused by your terminology. I still think that ownership transfer refers to memory deallocation, so it is used for larger types allocated on the heap, and when ownership is transferred the user is responsible for some sort of deallocation. The in, out and inout specification refers to the modification of the data: a plain integer variable with out direction transfers information to the user, and the Wirthian languages label such parameters with the var keyword. Florian’s reply makes sense of course, but it is a bit strange to regard the GdkRGBA struct as an instance variable, as if it were a widget that is passed to GTK as a pointer and whose internal state is modified by GTK. But generally it is OK; it works for Nim at least.

[EDIT]

https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

Transfer modes:

full: the recipient owns the entire value. For a refcounted type, this means the recipient owns a ref on the value. For a container type, this means the recipient owns both container and elements.

pclayton · August 4, 2020, 2:14pm

For an ‘inout’ parameter with some ownership being transferred (‘container’ or ‘full’), ownership is transferred to the function when called and the same ownership is transferred back from the function when it returns. In that sense, ownership is temporarily transferred to the function. (Maybe there is better terminology.)

Certainly, but I think there is more to it. Until recently it was my understanding that ownership is needed to ensure a value persists and ownership must be relinquished using unref or free as appropriate. I now believe that ownership is also needed just to modify a value, even if it isn’t freed.

For example, I believe that is why pango_matrix_transform_rectangle has parameter rect annotated as ‘inout’ which defaults to transfer ‘full’. If rect were just read it could have been an ‘in’ parameter but, because it is modified, it must be an ‘inout’. Note that the function can’t possibly free rect because the C type for this ‘inout’ parameter doesn’t have the extra level of indirection usually found for ‘inout’ parameters: afterwards, the caller assumes that *rect is a PangoRectangle that it owns.

If the GIR file says a GdkRGBA struct is an instance parameter that’s good enough for me - I’m not going to question the abstraction! It makes sense that an instance parameter can be modified and this doesn’t need to be stated explicitly.

Also, it is understandable that the rules are not applicable to non-GObject types as pointed out by @ebassi.

pclayton · August 10, 2020, 10:06am

@ebassi, that is good to know. I may raise an issue to document the scope of GI annotations.

Unfortunately, that example didn’t help with my question either. Third time lucky. Hopefully a better example is g_socket_receive. The parameter buffer is an ‘in’ parameter whose ownership is not transferred to the function, so the function modifies it without having ownership. Can you confirm that this is not allowed and therefore the annotations must be changed? For example, by adding (out caller-allocates).

ebassi · August 10, 2020, 11:29am

Of course it’s allowed: you can modify arguments that you don’t own, otherwise the entirety of GLib would not be possible, since all setter functions will modify the instance argument. Ownership does not imply mutability: you can modify the contents of things you don’t own, just like you can own things you can’t modify.

In this case, the caller owns the buffer argument—the caller allocates it, and it must allocate at least size bytes. Then the buffer is passed to the callee, which will fill it out. At most you could construe the case that the argument is inout, but it’s not really accurate: the callee will not read the contents of buffer and change them: buffer is merely the container where the data will be stored. Additionally, since it’s a char* and not a container type—i.e. GList or GArray—the transfer: container annotation does not apply. Ownership describes who is in charge of releasing the memory when the scope ends; in this case, the caller owns the memory, and the data is stored inside the C array. Since it’s not pointer data—the function fills out uint8—the data block is still owned by the caller, not the callee.

Of course, this is a description of a C API: just because we can describe the C API in a machine readable way, it does not mean all other languages can map to it natively. In various cases we had to add a bytes variant that either takes or allocates a GBytes instance; in other cases, language bindings had to write wrappers.

StefanSalewski · August 10, 2020, 1:40pm

I fully expected such a reply; I think I already tried to tell him the same.

But he has a valid point: It is hard to discover by use of gobject-introspection that the buffer is modified by gio. Current Nim bindings generate

$ grep -A9 g_socket_receive nim_gi/gio.nim 
proc g_socket_receive(self: ptr Socket00; buffer: uint8Array; size: uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc receive*(self: Socket; buffer: seq[uint8] | string; cancellable: Cancellable = nil): int64 =
  let size = uint64(buffer.len)
  var gerror: ptr glib.Error
  let resul0 = g_socket_receive(cast[ptr Socket00](self.impl), unsafeaddr(buffer[0]), size, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

That should work, as we pass the address of the Nim buffer content to glib. But a fully correct function signature would be

proc receive*(self: Socket; buffer: var seq[uint8] | var string; cancellable: Cancellable = nil): int64 =

Note the var keyword after buffer.

That indicates that the buffer can be modified by the procedure call, while omitting the var keyword would give the wrong impression that buffer is not modified.

Do you have an idea how gobject-introspection can tell me that buffer is modified? You mentioned the inout parameter type; maybe that would help.

ebassi · August 10, 2020, 1:51pm

Anything that is passed by pointer and is not explicitly marked as a C const pointer, or with an inout or out annotation, is by definition a mutable reference that you pass to a callable. We don’t have a “mutable” annotation or keyword in C, or in the C ABI. This is not Rust.

To be fair, the GSocket API is heavily modeled on the equivalent recv() function call from POSIX, which takes a generic void* buffer of bytes, and fills it up; it would be extremely inefficient if recv()/g_socket_receive() returned a new buffer for every call, as it would prevent recycling the buffer in a loop, or allocating the buffer on the stack to minimise memory fragmentation.

This is why it’s typically better to use GSocketClient, GSocketService, or GSocketConnection, which use streams—as the documentation for GSocket mentions.

pclayton · August 10, 2020, 2:05pm

For sure, an instance parameter must be modifiable, as mentioned in an earlier message. I meant to ask about non-instance parameters only.

Thanks - that’s exactly what I wanted to know. So there is no way to know from GI annotations whether a non-instance parameter is not modified.

Interestingly, this means that the const qualifiers in C offer some protection when working with C arrays that isn’t available in other languages via introspection. Of course, that doesn’t extend to other container types in C. It’s just a perk when using C arrays in C.

I don’t understand this: if you can’t modify something, then surely you can’t free it (because freeing would allow the memory to be used for another purpose).

In this case, the caller owns the buffer argument—the caller allocates it, and it must allocate at least size bytes. Then the buffer is passed to the callee, which will fill it out. At most you could construe the case that the argument is inout, but it’s not really accurate: the callee will not read the contents of buffer and change them: buffer is merely the container where the data will be stored. Additionally, since it’s a char* and not a container type—i.e. GList or GArray—the transfer: container annotation does not apply. Ownership describes who is in charge of releasing the memory when the scope ends; in this case, the caller owns the memory, and the data is stored inside the C array. Since it’s not pointer data—the function fills out uint8—the data block is still owned by the caller, not the callee.

It was a similar situation with the 2.60 version of g_input_stream_read, where the parameter buffer was an ‘in’ parameter that was modified without ownership of its elements being transferred. You’ve confirmed that the annotations there were acceptable.

In the 2.62 version of g_input_stream_read, the parameter buffer was changed to ‘out caller-allocates’. I’ve questioned whether the same should be done for g_socket_receive. GLib should surely provide a consistent interface.

pclayton · August 10, 2020, 2:35pm

As mentioned in another reply, g_input_stream_read has the parameter buffer annotated as ‘out caller-allocates’, so wouldn’t a high-level language using the stream-based API get a new buffer allocated on every call?

StefanSalewski · August 10, 2020, 2:46pm

For me that indicates that the caller, that is the user, provides the buffer. The buffer may be allocated by the user on the heap, or the user may provide the address of an array living on the stack. In no case is there a reason to allocate a buffer more than once, as we can use the same buffer again: gio will fill the buffer with new data on each call.

ebassi · August 10, 2020, 2:49pm

Sure, and maybe it’s entirely okay to change the annotation of g_socket_receive() to be out caller-allocates to match the change in GInputStream. After all: yes, the bytes buffer is allocated by the caller.

The performance angle matters a lot less if you’re using a high level language; after all, you’re allocating a lot of stuff anyway, and it’s up to the language to deal with memory fragmentation.

As I said above: either constness or an out direction can be used to determine whether an argument is a mutable reference or not.

pclayton · August 10, 2020, 5:17pm

@StefanSalewski, that suggests to me that you don’t treat ‘in’ any differently from ‘out caller-allocates’. That’s the case in C and seems reasonable for bindings with a similar level of abstraction in that respect. For the SML bindings I’m developing, there is a higher level of abstraction: ‘out’ means a value is passed from the callee to the caller, and nothing is passed in the other direction regardless of who allocates the parameter.

pclayton · August 11, 2020, 9:00am

Yes, though in situations where most work is done by the library calls, e.g. computing a checksum of a file, I wouldn’t expect performance to suffer much due to use of a high-level language.

Allocating a new buffer on each read introduces scope for inefficiency that reuse of the same buffer would avoid. Perhaps this is just the price to pay for a higher level of abstraction and I have found that the price isn’t too bad, at least in the checksum example: a small performance loss and the need to manually trigger GC in the application code. I compared two examples that compute a checksum: 1. using a binding to g_input_stream_read where the caller supplies the same buffer on each call and 2. using a binding to g_input_stream_read that allocates a new buffer on each call. For 1, I found the average time for a SHA-256 sum of a 3.9 GiB file to be 23.85 s. For 2, the performance was terrible due to virtual memory use but by explicitly triggering GC in the application code when each buffer is no longer required, the average time was 25.35 s (with ~115 % CPU use due to the parallelized GC). That’s acceptable performance but requiring memory management hints in the application code has its downsides.

I presume the constness information here is from the c:type attribute of the GIR file and this isn’t available via the girepository API.

ebassi · August 11, 2020, 12:39pm

The C ABI has no concept of constness, so it cannot be available to libraries that wrap that C ABI via libffi, like libgirepository.
