Determining whether container elements are modifiable from GI annotations

Am I right in thinking that an introspectable C function requires ownership of the elements of a container parameter to be passed to it just to modify, not free, the elements, even if the elements are not allocated, i.e. don’t have ownership?

It sounds obvious but I originally thought a C function could modify C array elements with a basic type, e.g. guint8, without ownership being transferred, due to the 2.60 version of g_input_stream_read where the ‘in’ parameter buffer was modified without ownership of its elements being transferred. I can see that without transferring ownership, this could result in bindings allowing code that writes to read-only data. Still I’d like to check that I’ve now understood correctly.

Aside: The fix in the 2.62 version of g_input_stream_read uses an ‘out caller-allocates’ parameter. Although that means the same array can be used for successive reads when calling from C, it’s not possible to reuse the same parameter for a high-level language that hides the caller/callee allocation, which could be detrimental for performance, as I’ve noted.

Your question is a bit difficult. And I am currently not sure how much you know about gobject-introspection. Maybe you are the one who created the fortran bindings, I can not remember. I am the Nim guy.

where the ‘in’ parameter buffer was modified without ownership of its elements being transferred.

That statement seems strange. I think ownership is never transferred from the user program to gtklibs, only the other way: gtklibs may allocate memory and the user program can be responsible for freeing it later.

Your example g_input_stream_read () is very hard for binding generation of course. Not because it is so hard to understand, but because of the fact that the user has to provide the buffer which is then filled by gio lib. Generating such a binding manually is easy, and yes we can reuse the same buffer for multiple calls in principle. But generating that automatically only by using data available from gobject-introspection is very hard.

And finally, do we really want to use that function from other languages? Most languages have their own functions for stream handling, so why bother using gio for that?

Indeed, the automatically generated Nim wrapper code is not that bad:

grep -A20 "proc g_input_stream_read(" ~/.nimble/pkgs/gintro-#head/gintro/gio.nim 
proc g_input_stream_read(self: ptr InputStream00; buffer: uint8Array; count: var uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc read*(self: InputStream; buffer: var seq[uint8]; count: var uint64;
    cancellable: Cancellable = nil): int64 =
  var gerror: ptr glib.Error
  let resul0 = g_input_stream_read(cast[ptr InputStream00](self.impl), unsafeaddr(buffer[0]), count, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

The first proc is the low-level gio call, and the exported read() proc is the high-level Nim interface. The user provides a seq[uint8] as the buffer, and the parameter is marked with the var keyword to make it mutable. In Nim a seq is a variable-size container whose actual length we can query with the len() proc. The problem in the auto-generated bindings is that we still use the count parameter but do not compare it to the actual length of the user-provided seq, so a count that is too large can cause writes to unallocated memory. What is missing here is something like

if count > uint64(buffer.len):
  buffer.setLen(int(count))
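
To make the missing check concrete, here is a minimal sketch in plain C rather than Nim. It is not the generated binding code: raw_read and checked_read are hypothetical names, and raw_read only stands in for a C call that trusts the caller about the buffer size. Instead of growing the buffer as suggested above, this variant clamps count to the capacity the wrapper already knows, which is an alternative way to avoid the out-of-bounds write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a C read call such as g_input_stream_read():
 * it writes `count` bytes into `buffer` and trusts the caller that the
 * buffer is at least that large. */
static ptrdiff_t raw_read(uint8_t *buffer, size_t count)
{
    memset(buffer, 0xAB, count);
    return (ptrdiff_t)count;
}

/* Hypothetical high-level wrapper: the binding knows the real capacity of
 * the caller's buffer, so it never forwards a larger count to the C call. */
static ptrdiff_t checked_read(uint8_t *buffer, size_t capacity, size_t count)
{
    if (count > capacity)
        count = capacity;   /* clamp instead of writing past the end */
    if (count == 0)
        return 0;
    return raw_read(buffer, count);
}
```

With an 8-byte buffer, checked_read(buf, sizeof buf, 100) forwards only 8 to the underlying call.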

Your question is a bit difficult. And I am currently not sure how much you know about gobject-introspection. Maybe you are the one who created the fortran bindings, I can not remember. I am the Nim guy.

Thanks for the response. I am not the Fortran guy, I am developing bindings for Standard ML but you won’t have seen these because I have not yet published them.

where the ‘in’ parameter buffer was modified without ownership of its elements being transferred.

That statement seems strange. I think ownership is never transferred from the user program to gtklibs, only the other way: gtklibs may allocate memory and the user program can be responsible for freeing it later.

I don’t see any indication in the annotation documentation that ownership can’t be passed to a C function. In practice, from what I have seen, I don’t think your claim is entirely true. (Instead of saying ‘never’, I think it would be fair to say ‘almost never’.) Some counterexamples:

  1. There do exist functions that take ownership via an ‘in’ parameter. For example, in g_environ_setenv and g_environ_unsetenv, envp is an ‘in’ parameter with transfer mode ‘full’. However, as an ‘in’ parameter does not return ownership, these functions transfer ownership of envp back to the caller via the return value. It is an unusual style - it would be more idiomatic to make envp an ‘inout’ parameter - but perhaps there were C reasons for this. (I believe my Standard ML bindings would give the same interface either way.) Still, this style seems acceptable.

  2. An ‘inout’ parameter with transfer mode ‘full’ (the default for an ‘inout’ parameter) does transfer ownership to the C function. There are quite a few functions that do this, for example gtk_init. Although ownership is also transferred back to the caller and there is no overall change in ownership, giving the C function temporary ownership allows it to reallocate the parameter. In the case that the ‘inout’ parameter is nullable, it may be freed with nothing allocated in its place.
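
The first style can be sketched in plain C as follows. This is only an illustration of the ownership flow, not GLib code: strv_append and strv_free are hypothetical helpers working on a NULL-terminated string vector, assuming the callee is allowed to reallocate what it was given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the g_environ_setenv() style: `envp` is an
 * 'in' parameter with transfer full, so the function owns it on entry and
 * may reallocate it. Ownership comes back through the return value; the
 * caller must not use the old pointer afterwards. */
static char **strv_append(char **envp, const char *entry)
{
    size_t n = 0;
    while (envp != NULL && envp[n] != NULL)
        n++;

    /* realloc(NULL, ...) behaves like malloc, so an empty vector works. */
    char **grown = realloc(envp, (n + 2) * sizeof *grown);
    if (grown == NULL)
        abort();            /* out of memory: keep the sketch simple */

    size_t len = strlen(entry) + 1;
    grown[n] = malloc(len);
    if (grown[n] == NULL)
        abort();
    memcpy(grown[n], entry, len);
    grown[n + 1] = NULL;
    return grown;
}

/* Frees a vector built with strv_append(). */
static void strv_free(char **envp)
{
    for (size_t i = 0; envp != NULL && envp[i] != NULL; i++)
        free(envp[i]);
    free(envp);
}
```

The caller writes env = strv_append(env, "A=1"), which has the same shape as an ‘inout’ parameter: the pointer passed in may no longer be valid, and the returned one is what the caller now owns.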

Your example g_input_stream_read () is very hard for binding generation of course. Not because it is so hard to understand, but because of the fact that the user has to provide the buffer which is then filled by gio lib. Generating such a binding manually is easy, and yes we can reuse the same buffer for multiple calls in principle. But generating that automatically only by using data available from gobject-introspection is very hard.

I did not find it a problem to generate such a binding, though I have put quite a lot of effort into supporting C arrays. At the moment I avoid copying unowned C arrays returned from C functions for efficiency on the grounds that their lifespan should be sufficiently long but I found that there was nothing, in principle, preventing such an unowned guint8 array being overwritten by passing it to the 2.60 version of g_input_stream_read. Upon investigation, I have come to realize that it was g_input_stream_read breaking the rules.

And finally, do we really want to use that function from other languages? Most languages have their own functions for stream handling, so why bother using gio for that?

Indeed, it is not an essential function to support. Still, I am not excluding usable functions, even if they have alternatives in the target high-level language, so that it is easier to copy existing GLib-based code.

Indeed, the automatically generated Nim wrapper code is not that bad:

grep -A20 "proc g_input_stream_read(" ~/.nimble/pkgs/gintro-#head/gintro/gio.nim 
proc g_input_stream_read(self: ptr InputStream00; buffer: uint8Array; count: var uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc read*(self: InputStream; buffer: var seq[uint8]; count: var uint64;
    cancellable: Cancellable = nil): int64 =
  var gerror: ptr glib.Error
  let resul0 = g_input_stream_read(cast[ptr InputStream00](self.impl), unsafeaddr(buffer[0]), count, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

The first proc is the low level gio call, and the exported read() proc is the high level Nim interface. User provides a seq[uint8] as buffer, parameter is marked with var keyword to be mutable. In Nim a seq is a variable size container, we can query actual length by len() proc. …

Is that code based on the 2.60 version of g_input_stream_read? Presumably the caller wouldn’t pass in the buffer with the 2.62 version of g_input_stream_read?

Good that we will soon get one more set of GTK bindings. But you are a bit late, Standard ML is a really old language, first mentioned in 1983.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that API docs are not really designed for someone starting from scratch :frowning: Now I have spent 1400 hours total, and it is still far from complete…

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

For Nim I also tried to support all gtk functions, mostly because I can not properly decide which may be useful for users and which not. But gobject-introspection does not provide information for all C functions.

Does Standard ML use a garbage collector or manual memory management? For Nim we had a GC in the past, but now prefer fully deterministic automatic memory management, similar to C++ destructors.

[EDIT] And yes, my above Nim bindings example uses still gio 2.60.

Have you already solved

https://developer.gnome.org/gtk3/stable/GtkApplication.html#gtk-application-get-windows

For me such functions with a GSList result are hard for fully automatic bindings generation.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. Might that work for GSList as well?

Good that we will soon get one more set of GTK bindings. But you are a bit late, Standard ML is a really old language, first mentioned in 1983.

Well, some languages are even older, but they have managed to evolve their standards due to demand. Standard ML (SML) is still a useful functional language but suffers from a lack of libraries and is creaking in some other respects, e.g. no built-in Unicode support, no concept of classes. Whilst the (overly) stable standard (‘The Definition’) has led to a range of implementations, the standard did not include an FFI so there is considerable variation in the FFIs and providing libraries is an even greater challenge. Still SML continues to be used in niche areas.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that API docs are not really designed for someone starting from scratch :frowning: Now I have spent 1400 hours total, and it is still far from complete…

I started modifying the mGTK project which was a useful proof of concept but very limited in terms of coverage and in terms of maintainability because it predated GIR and relied on DEFS files. I did then start from scratch using GIR, inheriting some of the ideas from mGTK. Like others, I was lured into using the girepository API because it was easy to get started - write bindings for girepository and you’re off! After a lot of effort I realized that it couldn’t be used to generate platform-independent source code because the sizes of numeric C types are resolved for a specific platform, namely the platform of the TYPELIB files you are using. Worse, TYPELIB files don’t store alias types - they are resolved to the underlying type - so the generated SML code didn’t declare alias types so application code couldn’t mention alias types, which was a usability issue because we like to mention types in SML programs!

To avoid rewriting much of what I had done, I ended up writing an SML interface to GIR files that was almost a drop in replacement for the SML interface to girepository. Not a small undertaking. It differs in ways you would expect, e.g. the enumeration for GITypeTag also has enums for INT, UINT, SHORT, USHORT etc., and there is a module GIRepository.AliasInfo. The GIR interface also has extra functions that either weren’t required for TYPELIB files, e.g. GIRepository.FunctionInfo.getMovedTo, or provide additional information, e.g. GIRepository.Repository.getPackages, and omits some functions that provide platform-specific information, e.g. GIRepository.FieldInfo.getSize/getOffset. Also, some functions in GIRepository.Repository are parameterized by a map from namespace to version, to allow multiple versions of the same namespace to be loaded in the same session.

The SML interface to girepository was useful for validating my SML interface to GIR files, so I’ve kept it working in the codebase for that purpose.

I estimate my total effort, as a background task over the years, exceeds 4,000 hours. I’m not sure if that will make you feel better or worse.

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

The generated SML bindings code will create a duplicate where ownership is passed to a C function. If ownership is passed back, then bindings code will assume ownership of that duplicate.

For Nim I also tried to support all gtk functions, mostly because I can not properly decide which may be useful for users and which not. But gobject-introspection does not provide information for all C functions.

I have explicitly excluded functions that don’t make sense for bindings, but I can’t claim that it is complete. As mentioned, I wouldn’t exclude a perfectly usable function.

Does Standard ML use a garbage collector or manual memory management? For Nim we had a GC in the past, but now prefer fully deterministic automatic memory management, similar to C++ destructors.

To my knowledge, the Definition of SML doesn’t specify implementation details like whether to use garbage collection but most implementations do, I believe. Both compilers that I am targeting (MLton, Poly/ML) do use garbage collection. I’ve not kept track of all the implementations and their spin-offs, for example there is RTMLton which introduces a real-time garbage collector, which I know nothing about!

[EDIT] And yes, my above Nim bindings example uses still gio 2.60.

Right, so with the 2.62 version, buffer wouldn’t be an input of the proc because it is ‘out caller-allocates’?

Have you already solved

https://developer.gnome.org/gtk3/stable/GtkApplication.html#gtk-application-get-windows

For me such functions with a GSList result are hard for fully automatic bindings generation.

For SML bindings, I have put in much of the framework required for collections but I haven’t supported G[S]List types because I haven’t needed them yet.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. Might that work for GSList as well?

Yes, I expect it to work for G[S]List and also for GHashTable, which has two parameters, I believe, the key type and the element type.

I’ve not seen an answer so I’ll ask a simpler question, this time about struct parameters.

Take, for example, gdk_rgba_parse. The parameter rgba is an ‘in’ parameter whose ownership is not transferred to the function temporarily, so the function modifies the parameter without having ownership. Is this allowed? Or do annotations need to be changed so that rgba has direction ‘inout’ (which defaults to transfer ‘full’), as done for e.g. pango_matrix_transform_rectangle. I presume gdk_rgba_parse is fine otherwise lots of functions would be missing annotations but I would be grateful for confirmation.

No, the rgba parameter is the instance parameter. Conceptually it is the same as gtk_window_set_title(), where the window parameter isn’t an inout parameter that the function manipulates, but the instance the method operates on.

The difference that GtkWindow is a GObject and GdkRGBA is a GBoxed isn’t relevant for this particular question.
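
As an illustration of the instance-parameter pattern, here is a small plain-C sketch. Color and color_parse are hypothetical and only loosely modelled on GdkRGBA and gdk_rgba_parse; the sketch accepts just the #rrggbb form, which is an assumption made to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical boxed-style struct, loosely modelled on GdkRGBA. */
typedef struct { double red, green, blue, alpha; } Color;

/* Converts two hex digits to 0..255, or returns -1 if they are not hex. */
static int hex_pair(const char *s)
{
    if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
        return -1;
    char pair[3] = { s[0], s[1], '\0' };
    return (int)strtol(pair, NULL, 16);
}

/* Method-style function: `self` is the instance the method operates on,
 * not an out or inout argument. It is modified on success and left
 * untouched if `spec` is not of the form "#rrggbb". */
static bool color_parse(Color *self, const char *spec)
{
    if (strlen(spec) != 7 || spec[0] != '#')
        return false;
    int r = hex_pair(spec + 1), g = hex_pair(spec + 3), b = hex_pair(spec + 5);
    if (r < 0 || g < 0 || b < 0)
        return false;
    self->red = r / 255.0;
    self->green = g / 255.0;
    self->blue = b / 255.0;
    self->alpha = 1.0;
    return true;
}
```

The caller keeps ownership of the Color throughout (it can live on the stack), yet the method is free to change its fields, just as a setter changes the state of a widget.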

gdk_rgba_parse () is indeed an interesting example. I have just inspected the Nim bindings and I am not really happy with them. I would like to pass the GdkRGBA struct as a Nim var parameter because it is modified by GDK, but it is passed as a pointer. If gobject-introspection marked it as an out or inout parameter, my bindings would pass it as var automatically. (Nim uses var procedure parameters like Pascal did.) I have to investigate that.

Types like GdkRGBA, Rectangle or TextIter are a bit special in the Nim bindings: because they are very simple objects, we allocate them on the stack and pass their addresses to GTK. But that fact is not relevant here.

Yes, good point! That example didn’t help with my question.

I think a better example would be cairo_transform_to_window. The parameter cr is not the instance parameter according to the GIR file and ownership is not transferred (AFAICS) but it is modified. Perhaps in this case, as cairo_t is a reference counted type so is inherently single-instance, there is no need to say the parameter is ‘inout’ with ownership temporarily transferred. Is that the right way to look at this?

Cairo is not an introspectable API, and it’s not based on GObject, so you cannot apply any mechanism in gobject-introspection to make it work.

I am still a bit confused by your terminology. I still think that ownership transfer refers to memory deallocation, so it is used for larger types allocated on the heap, and when ownership is transferred the user is responsible for some sort of deallocation. The in, out and inout specification refers to the modification of the data: a plain integer variable with out direction transfers information to the user, and the Wirthian languages label such parameters with the var keyword. The reply of Florian makes sense of course, but it is a bit strange to regard the GdkRGBA struct as an instance variable, as if it were a widget that is passed to GTK as a pointer and whose internal state is modified by GTK. But generally it is OK, it works for Nim at least.

[EDIT]

https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

Transfer modes:

full: the recipient owns the entire value. For a refcounted type, this means the recipient owns a ref on the value. For a container type, this means the recipient owns both container and elements.

For an ‘inout’ parameter with some ownership being transferred (‘container’ or ‘full’), ownership is transferred to the function when called and the same ownership is transferred back from the function when it returns. In that sense, ownership is temporarily transferred to the function. (Maybe there is better terminology.)

Certainly, but I think there is more to it. Until recently it was my understanding that ownership is needed to ensure a value persists and ownership must be relinquished using unref or free as appropriate. I now believe that ownership is also needed just to modify a value, even if it isn’t freed.

For example, I believe that is why pango_matrix_transform_rectangle has parameter rect annotated as ‘inout’, which defaults to transfer ‘full’. If rect were just read it could have been an ‘in’ parameter but, because it is modified, it must be an ‘inout’. Note that the function can’t possibly free rect because the C type for this ‘inout’ parameter doesn’t have the extra level of indirection usually found for ‘inout’ parameters: afterwards, the caller assumes that *rect is a PangoRectangle that it owns.
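
The point about the missing level of indirection can be shown with a small plain-C sketch. Rect, rect_translate and text_append_bang are hypothetical names; the first mirrors the shape of the pango_matrix_transform_rectangle parameter, the second the usual pointer-to-pointer ‘inout’.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct, similar in spirit to PangoRectangle. */
typedef struct { int x, y, width, height; } Rect;

/* 'inout' without extra indirection: the callee receives Rect *, so it
 * can rewrite the fields but has no way to hand back a different
 * allocation to the caller. */
static void rect_translate(Rect *rect, int dx, int dy)
{
    rect->x += dx;
    rect->y += dy;
}

/* 'inout' with the usual extra level of indirection: the callee may
 * reallocate *text and store the new pointer for the caller. */
static void text_append_bang(char **text)
{
    size_t len = strlen(*text);
    char *grown = realloc(*text, len + 2);
    if (grown == NULL)
        abort();            /* out of memory: keep the sketch simple */
    grown[len] = '!';
    grown[len + 1] = '\0';
    *text = grown;
}
```

Only in the second form does temporary ownership matter to the callee, because only there can the allocation itself be replaced.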

If the GIR file says a GdkRGBA struct is an instance parameter that’s good enough for me - I’m not going to question the abstraction! It makes sense that an instance parameter can be modified and this doesn’t need to be stated explicitly.

Also, it is understandable that the rules are not applicable to non-GObject types as pointed out by @ebassi.

@ebassi, that is good to know. I may raise an issue to document the scope of GI annotations.

Unfortunately, that example didn’t help with my question either. Third time lucky. Hopefully a better example is g_socket_receive. The parameter buffer is an ‘in’ parameter whose ownership is not transferred to the function, so the function modifies it without having ownership. Can you confirm that this is not allowed and therefore the annotations must be changed? For example, by adding (out caller-allocates).

Of course it’s allowed: you can modify arguments that you don’t own, otherwise the entirety of GLib would not be possible, since all setter functions will modify the instance argument. Ownership does not imply mutability: you can modify the contents of things you don’t own, just like you can own things you can’t modify.

In this case, the caller owns the buffer argument—the caller allocates it, and it must allocate at least size bytes. Then the buffer is passed to the callee, which will fill it out. At most you could construe the case that the argument is inout, but it’s not really accurate: the callee will not read the contents of buffer and change them: buffer is merely the container where the data will be stored. Additionally, since it’s a char* and not a container type—i.e. GList or GArray—the transfer: container annotation does not apply. Ownership describes who is in charge of releasing the memory when the scope ends; in this case, the caller owns the memory, and the data is stored inside the C array. Since it’s not pointer data—the function fills out uint8—the data block is still owned by the caller, not the callee.
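
A minimal plain-C sketch of the two cases described above, with hypothetical function names: fill_buffer modifies memory it does not own, while consume_message owns memory it never modifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Callee mutates memory it does not own: the caller allocated `buffer`
 * (it can even live on the stack) and remains responsible for it. */
static size_t fill_buffer(uint8_t *buffer, size_t size)
{
    static const uint8_t payload[] = { 'd', 'a', 't', 'a' };
    size_t n = size < sizeof payload ? size : sizeof payload;
    memcpy(buffer, payload, n);
    return n;
}

/* Callee owns memory it is not meant to modify: it takes ownership of a
 * heap-allocated read-only message, only reads it, then releases it. */
static size_t consume_message(const char *message)
{
    size_t len = strlen(message);
    free((void *)message);   /* the owner frees; contents were never written */
    return len;
}
```

A caller can pass a stack array to fill_buffer and must never free it through the callee, whereas a pointer handed to consume_message must come from malloc and must not be used afterwards.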

Of course, this is a description of a C API: just because we can describe the C API in a machine readable way, it does not mean all other languages can map to it natively. In various cases we had to add a bytes variant that either takes or allocates a GBytes instance; in other cases, language bindings had to write wrappers.

I strongly expected such a reply, I think I tried to tell him the same already.

But he has a valid point: It is hard to discover by use of gobject-introspection that the buffer is modified by gio. Current Nim bindings generate

$ grep -A9 g_socket_receive nim_gi/gio.nim 
proc g_socket_receive(self: ptr Socket00; buffer: uint8Array; size: uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc receive*(self: Socket; buffer: seq[uint8] | string; cancellable: Cancellable = nil): int64 =
  let size = uint64(buffer.len)
  var gerror: ptr glib.Error
  let resul0 = g_socket_receive(cast[ptr Socket00](self.impl), unsafeaddr(buffer[0]), size, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

That should work, as we pass the address of the Nim buffer content to glib. But a fully correct function signature would be

proc receive*(self: Socket; buffer: var seq[uint8] | var string; cancellable: Cancellable = nil): int64 =

Note the var keyword after buffer: it indicates that the buffer can be modified by the procedure call, while omitting the var keyword would give the wrong impression that the buffer is not modified.

Do you have an idea how gobject-introspection can tell me that buffer is modified? You mentioned the inout parameter type, maybe that would help.

Anything that is passed by pointer and is not explicitly marked as a C const pointer, or with an inout or out annotation, is by definition a mutable reference that you pass to a callable. We don’t have a “mutable” annotation or keyword in C, or in the C ABI. This is not Rust.

To be fair, the GSocket API is heavily modeled on the equivalent recv() function call from POSIX, which takes a generic void* buffer of bytes, and fills it up; it would be extremely inefficient if recv()/g_socket_receive() returned a new buffer for every call, as it would prevent recycling the buffer in a loop, or allocating the buffer on the stack to minimise memory fragmentation.
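
The buffer-recycling pattern looks roughly like this in plain C. Source, source_receive and drain are hypothetical; an in-memory source stands in for the socket so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for a socket. */
typedef struct { const char *data; size_t len; size_t pos; } Source;

/* recv()-style call: copies at most `size` bytes into the caller's buffer
 * and returns the number copied (0 once the data is exhausted). */
static size_t source_receive(Source *src, char *buffer, size_t size)
{
    size_t left = src->len - src->pos;
    size_t n = left < size ? left : size;
    memcpy(buffer, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Drains the source through one small stack buffer that is reused on
 * every iteration; no allocation happens inside the loop. Returns the
 * total number of bytes and stores the number of non-empty reads. */
static size_t drain(Source *src, size_t *calls)
{
    char buffer[4];
    size_t total = 0, n;
    *calls = 0;
    while ((n = source_receive(src, buffer, sizeof buffer)) > 0) {
        total += n;
        (*calls)++;
    }
    return total;
}
```

If the receive call returned a fresh buffer each time instead, the loop would allocate once per iteration.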

This is why it’s typically better to use GSocketClient, GSocketService, or GSocketConnection, which use streams—as the documentation for GSocket mentions.

For sure, an instance parameter must be modifiable, as mentioned in an earlier message. I meant to ask about non-instance parameters only.

Thanks - that’s exactly what I wanted to know. So there is no way to know from GI annotations whether a non-instance parameter is not modified.

Interestingly, this means that the const qualifiers in C offer some protection when working with C arrays that isn’t available in other languages via introspection. Of course, that doesn’t extend to other container types in C. It’s just a perk when using C arrays in C.
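
A short plain-C sketch of that protection, with hypothetical function names: the const qualifier on the element type is part of the C signature, so the compiler enforces it even though it has no counterpart in the introspection data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* const-qualified elements: the compiler rejects writes through `data`. */
static unsigned int checksum(const uint8_t *data, size_t len)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    /* data[0] = 0;   would not compile: `data` points to const elements */
    return sum;
}

/* No const qualifier: the callee may overwrite the caller's array. */
static void zero_fill(uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] = 0;
}
```

A C caller can tell from the prototypes alone that checksum leaves the array intact and that zero_fill may not.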

I don’t understand this: if you can’t modify something, then surely you can’t free it (because freeing would allow the memory to be used for another purpose).

It was a similar situation with the 2.60 version of g_input_stream_read, where the parameter buffer was an ‘in’ parameter that was modified without ownership of its elements being transferred. You’ve confirmed that the annotations there were acceptable.

In the 2.62 version of g_input_stream_read, the parameter buffer was changed to ‘out caller-allocates’. I’ve questioned whether the same should be done for g_socket_receive. GLib should surely provide a consistent interface.

As mentioned in another reply, g_input_stream_read has the parameter buffer annotated as ‘out caller-allocates’, so wouldn’t a high-level language using the stream-based API get a new buffer allocated on every call?