Determining whether container elements are modifiable from GI annotations

StefanSalewski · July 23, 2020, 12:29pm

Fine that we soon will get one more GTK bindings. But you are a bit late, Standard ML is a really old language, first mentioned in 1983.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that API docs are not really designed for someone starting from scratch Now I have spent 1400 hours total, and is still far from complete…

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

For Nim I tried also to support all gtk functions. Mostly because I can not properly decide which may be useful for users and which not. But not for all C functions gobject-introspection provides information.

Does Standard ML uses a Garbage Collector or manual memory management? For Nim we had a GC in the past, but now prefer a fully deterministic automatic memory management, similar to C++ destructors.

[EDIT] And yes, my above Nim bindings example uses still gio 2.60.

StefanSalewski · July 23, 2020, 6:50pm

Have you already solved

https://developer.gnome.org/gtk3/stable/GtkApplication.html#gtk-application-get-windows

For me such functions with GSlist result are hard for fully automatic bindings generation.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. May that work for GSlist as well?

pclayton · July 23, 2020, 9:41pm

Fine that we soon will get one more GTK bindings. But you are a bit late, Standard ML is a really old language, first mentioned in 1983.

Well, some languages are even older, but they have managed to evolve their standards due to demand. Standard ML (SML) is still a useful functional language but suffers from a lack of libraries and is creaking in some other respects, e.g. no built-in Unicode support, no concept of classes. Whilst the (overly) stable standard (‘The Definition’) has led to a range of implementations, the standard did not include an FFI so there is considerable variation in the FFIs and providing libraries is an even greater challenge. Still SML continues to be used in niche areas.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that API docs are not really designed for someone starting from scratch Now I have spent 1400 hours total, and is still far from complete…

I started modifying the mGTK project which was a useful proof of concept but very limited in terms of coverage and in terms of maintainability because it predated GIR and relied on DEFS files. I did then start from scratch using GIR, inheriting some of the ideas from mGTK. Like others, I was lured into using the girepository API because it was easy to get started - write bindings for girepository and you’re off! After a lot of effort I realized that it couldn’t be used to generate platform-independent source code because the sizes of numeric C types are resolved for a specific platform, namely the platform of the TYPELIB files you are using. Worse, TYPELIB files don’t store alias types - they are resolved to the underlying type - so the generated SML code didn’t declare alias types so application code couldn’t mention alias types, which was a usability issue because we like to mention types in SML programs!

To avoid rewriting much of what I had done, I ended up writing an SML interface to GIR files that was almost a drop in replacement for the SML interface to girepository. Not a small undertaking. It differs in ways you would expect, e.g. the enumeration for GITypeTag also has enums for INT, UINT, SHORT, USHORT etc., and there is a module GIRepository.AliasInfo. The GIR interface also has extra functions that either weren’t required for TYPELIB files, e.g. GIRepository.FunctionInfo.getMovedTo, or provide additional information, e.g. GIRepository.Repository.getPackages, and omits some functions that provide platform-specific information, e.g. GIRepository.FieldInfo.getSize/getOffset. Also, some functions in GIRepository.Repository are parameterized by a map from namespace to version, to allow multiple versions of the same namespace to be loaded in the same session.

The SML interface to girepository was useful for validating my SML interface to GIR files so I’ve kept it working in the codebase as its output is useful for validation.

I estimate my total effort, as a background task over the years, exceeds 4,000 hours. I’m not sure if that will make you feel better or worse.

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

The generated SML bindings code will create a duplicate where ownership is passed to a C function. If ownership is passed back, then bindings code will assume ownership of that duplicate.

For Nim I tried also to support all gtk functions. Mostly because I can not properly decide which may be useful for users and which not. But not for all C functions gobject-introspection provides information.

I have explicitly excluded functions that don’t make sense for bindings, but I can’t claim that it is complete. As mentioned, I wouldn’t exclude a perfectly usable function.

Does Standard ML uses a Garbage Collector or manual memory management? For Nim we had a GC in the past, but now prefer a fully deterministic automatic memory management, similar to C++ destructors.

To my knowledge, the Definition of SML doesn’t specify implementation details like whether to use garbage collection but most implementations do, I believe. Both compilers that I am targeting (MLton, Poly/ML) do use garbage collection. I’ve not kept track of all the implementations and their spin-offs, for example there is RTMLton which introduces a real-time garbage collector, which I know nothing about!

[EDIT] And yes, my above Nim bindings example uses still gio 2.60.

Right, so with the 2.62 version, buffer wouldn’t be an input of the proc because it is ‘out caller-allocates’?

pclayton · July 23, 2020, 9:55pm

Have you already solved

Gtk – 3.0

For me such functions with GSlist result are hard for fully automatic bindings generation.

For SML bindings, I have put in much of the framework required for collections but I haven’t supported G[S]List types because I haven’t needed them yet.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. May that work for GSlist as well?

Yes, I expect it to work for G[S]List and also for GHashTable, which has two parameters, I believe, the key type and the element type.

pclayton · July 30, 2020, 11:15am

I’ve not seen an answer so I’ll ask a simpler question, this time about struct parameters.

Take, for example, gdk_rgba_parse. The parameter rgba is an ‘in’ parameter whose ownership is not transferred to the function temporarily, so the function modifies the parameter without having ownership. Is this allowed? Or do annotations need to be changed so that rgba has direction ‘inout’ (which defaults to transfer ‘full’), as done for e.g. pango_matrix_transform_rectangle. I presume gdk_rgba_parse is fine otherwise lots of functions would be missing annotations but I would be grateful for confirmation.

fmuellner · July 30, 2020, 11:47am

No, the rgba parameter is the instance parameter. Conceptually it is the same as gtk_window_set_title(), where the window parameter isn’t an inout parameter that the function manipulates, but the instance the method operates on.

The difference that GtkWindow is a GObject and GdkRGBA is a GBoxed isn’t relevant for this particular question.

StefanSalewski · July 30, 2020, 12:03pm

gdk_rgba_parse () is indeed an interesting example. I have just inspected the Nim bindings and I am not really happy with it. I would like to pass the GdkRGBA struct as a Nim var parameter because it is modified by GDK, but it is passed as a pointer. If gobject-introspection would call it out or inout type my bindings would pass it as var automatically. (Nim uses var procedure parameters like Pascal did it.) I have to investigate that.

Types like GdkRGBA or Rectangle or Textiter and such are a bit special for the Nim bindings, we allocate such objects on the stack as the objects are very simple objects, and pass its address to GTK. But that fact is not relevant here.

pclayton · July 30, 2020, 2:42pm

Yes, good point! That example didn’t help with my question.

I think a better example would be cairo_transform_to_window. The parameter cr is not the instance parameter according to the GIR file and ownership is not transferred (AFAICS) but it is modified. Perhaps in this case, as cairo_t is a reference counted type so is inherently single-instance, there is no need to say the parameter is ‘inout’ with ownership temporarily transferred. Is that the right way to look at this?

ebassi · July 30, 2020, 4:15pm

Cairo is not an introspectable API, and it’s not based on GObject, so you cannot apply any mechanism in gobject-introspection to make it work.

StefanSalewski · July 30, 2020, 5:00pm

I am a bit confused by your terminology still. I still think that ownership transfer refers to memory deallocation, so it is used for larger types allocated on the heap, and when ownership is transferred then the user is responsible for some sort of deallocation. The in, out and inout specification refers to the modification of the data – an plain integer variable with out direction transfers information to the user, the wirthian languages label such parameters with var keyword. The reply of Florian makes sense of course, but it is bit strange to regard
the GdkRGBA struct as an instance variable like it is a widget which may be passed to GTK as a pointer and which intern state is modified by GTK. But generally it is OK, it works for Nim at least.

[EDIT]

https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

Transfer modes:

full: the recipient owns the entire value. For a refcounted type, this means the recipient owns a ref on the value. For a container type, this means the recipient owns both container and elements .

pclayton · August 4, 2020, 2:14pm

For an ‘inout’ parameter with some ownership being transferred (‘container’ or ‘full’), ownership is transferred to the function when called and the same ownership is transferred back from the function when it returns. In that sense, ownership is temporarily transferred to the function. (Maybe there is better terminology.)

Certainly, but I think there is more too it. Until recently it was my understanding that ownership is needed to ensure a value persists and ownership must be relinquished using unref or free as appropriate. I now believe that ownership is also needed just to modify a value, even if it isn’t freed.

For example, I believe that is why pango_matrix_transform_rectangle has parameter rect annotated as ‘inout’ which defaults to transfer ‘full’. If rect were just read it could have been an ‘in’ parameter but, because it is modfied, it must be an ‘inout’. Note that the function can’t possibly free rect because the C type for this ‘inout’ parameter doesn’t have the extra level of indirection usually found for ‘inout’ parameters: afterwards, the caller assumes that *rect is PangoRectangle that it owns.

If the GIR file says a GdkRGBA struct is an instance parameter that’s good enough for me - I’m not going to question the abstraction! It makes sense that an instance parameter can be modified and this doesn’t need to be stated explicitly.

Also, it is understandable that the rules are not applicable to non-GObject types as pointed out by @ebassi.

pclayton · August 10, 2020, 10:06am

@ebassi, that is good to know. I may raise an issue to document the scope of GI annotations.

Unfortunately, that example didn’t help with my question either. Third time lucky. Hopefully a better example is g_socket_receive. The parameter buffer is an ‘in’ parameter whose ownership is not transferred to the function, so the function modifies it without having ownership. Can you confirm that this is not allowed and therefore the annotations must be changed? For example, by adding (out caller-allocates).

ebassi · August 10, 2020, 11:29am

Of course it’s allowed: you can modify arguments that you don’t own, otherwise the entirety of GLib would not be possible, since all setter functions will modify the instance argument. Ownership does not imply mutability: you can modify the contents of things you don’t own, just like you can own things you can’t modify.

In this case, the caller owns the buffer argument—the caller allocates it, and it must allocate at least size bytes. Then the buffer is passed to the callee, which will fill it out. At most you could construe the case that the argument is inout, but it’s not really accurate: the callee will not read the contents of buffer and change them: buffer is merely the container where the data will be stored. Additionally, since it’s a char* and not a container type—i.e. GList or GArray—the transfer: container annotation does not apply. Ownership describes who is in charge of releasing the memory when the scope ends; in this case, the caller owns the memory, and the data is stored inside the C array. Since it’s not pointer data—the function fills out uint8—the data block is still owned by the caller, not the callee.

Of course, this is a description of a C API: just because we can describe the C API in a machine readable way, it does not mean all other languages can map to it natively. In various cases we had to add a bytes variant that either takes or allocates a GBytes instance; in other cases, language bindings had to write wrappers.

StefanSalewski · August 10, 2020, 1:40pm

I strongly expected such a reply, I think I tried to tell him the same already.

But he has a valid point: It is hard to discover by use of gobject-introspection that the buffer is modified by gio. Current Nim bindings generate

$ grep -A9 g_socket_receive nim_gi/gio.nim 
proc g_socket_receive(self: ptr Socket00; buffer: uint8Array; size: uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc receive*(self: Socket; buffer: seq[uint8] | string; cancellable: Cancellable = nil): int64 =
  let size = uint64(buffer.len)
  var gerror: ptr glib.Error
  let resul0 = g_socket_receive(cast[ptr Socket00](self.impl), unsafeaddr(buffer[0]), size, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

That should work, as we pass the address of the Nim buffer content to glib. But a fully correct function signature would be

proc receive*(self: Socket; buffer: var seq[uint8] | var string; cancellable: Cancellable = nil): int64 =

Note the var keyword after buffer:

That indicates that the buffer can be modified by the procedure call, while missing of var keyword would give the wrong impression that buffer is not modified.

Do you have an idea how gobject-introspection can tell me that buffer is modiefied? You mentioned inout parameter type, maybe that would help.

ebassi · August 10, 2020, 1:51pm

Anything that is passed by pointer and is not explicitly marked as a C const pointer, or with an inout or out annotations, is by definition a mutable reference that you pass to a callable. We don’t have a “mutable” annotation or keyword in C, or in the C ABI. This is not Rust.

To be fair, the GSocket API is heavily modeled on the equivalent recv() function call from POSIX, which takes a generic void* buffer of bytes, and fills it up; it would be extremely inefficient if recv()/g_socket_receive() returned a new buffer for every call, as it would prevent recycling the buffer in a loop, or allocating the buffer on the stack to minimise memory fragmentation.

This is why it’s typically better to use GSocketClient, GSocketService, or GSocketConnection, which use streams—as the documentation for GSocket mentions.

pclayton · August 10, 2020, 2:05pm

For sure, an instance parameter must be modifiable, as mentioned in an earlier message. I meant to ask about non-instance parameters only.

Thanks - that’s exactly what I wanted to know. So there is no way to know from GI annotations whether a non-instance parameter is not modified.

Interestingly, this means that the const qualifiers in C offer some protection when working with C arrays that isn’t available in other languages via introspection. Of course, that doesn’t extend to other container types in C. It’s just a perk when using C arrays in C.

I don’t understand this: if you can’t modify something, then surely you can’t free it (because freeing would allow the memory to be used for another purpose).

In this case, the caller owns the buffer argument—the caller allocates it, and it must allocate at least size bytes. Then the buffer is passed to the callee, which will fill it out. At most you could construe the case that the argument is inout, but it’s not really accurate: the callee will not read the contents of buffer and change them: buffer is merely the container where the data will be stored. Additionally, since it’s a char* and not a container type—i.e. GList or GArray—the transfer: container annotation does not apply. Ownership describes who is in charge of releasing the memory when the scope ends; in this case, the caller owns the memory, and the data is stored inside the C array. Since it’s not pointer data—the function fills out uint8—the data block is still owned by the caller, not the callee.

It was a similar situation with the 2.60 version of g_input_stream_read, where the parameter buffer was an ‘in’ parameter that was modified without ownership of its elements being transferred. You’ve confirmed that the annotations there were acceptable.

In the 2.62 version of g_input_stream_read , the parameter buffer was change to ‘out caller-allocates’. I’ve questioned whether the same should be done for g_socket_receive. GLib should surely provide a consistent interface.

pclayton · August 10, 2020, 2:35pm

As mentioned in another reply, g_input_stream_read has the parameter buffer annotated as ‘out caller-allocates’, so wouldn’t a high-level language using the stream-based API get a new buffer allocated on every call?

StefanSalewski · August 10, 2020, 2:46pm

For me that indicates the caller, that is the user, provides the buffer. Buffer may be allocated by the user on the heap, or maybe user can provide the address of an array living on the stack. In no case there is a reason to allocate a buffer more than once, as we can use the same buffer again, gio will fill the buffer with new data for each call.

ebassi · August 10, 2020, 2:49pm

Sure, and maybe it’s entirely okay to change the annotation of g_socket_receive() to be out caller-allocates to match the change in GInputStream. After all: yes, the bytes buffer is allocated by the caller.

The performance angle matters a lot less if you’re using a high level language; after all, you’re allocating a lot of stuff anyway, and it’s up to the language to deal with memory fragmentation.

As I said above: either constness or an out direction can be used to determine whether an argument is a mutable reference or not.

pclayton · August 10, 2020, 5:17pm

@StefanSalewski, that suggests to me that you don’t treat ‘in’ any differently from ‘out caller-allocates’. That’s the case in C and seems reasonable for bindings with a similar level of abstraction in that respect. For the SML bindings I’m developing, there is a higher level of abstraction: ‘out’ means a value is passed from the callee to the caller, and nothing is passed in the other direction regardless of who allocates the parameter.