Determining whether container elements are modifiable from GI annotations

Am I right in thinking that an introspectable C function must be given ownership of the elements of a container parameter merely to modify them (not free them), even when the elements are not separately allocated, i.e. there is nothing to own?

It sounds obvious but I originally thought a C function could modify C array elements with a basic type, e.g. guint8, without ownership being transferred, due to the 2.60 version of g_input_stream_read where the ‘in’ parameter buffer was modified without ownership of its elements being transferred. I can see that without transferring ownership, this could result in bindings allowing code that writes to read-only data. Still I’d like to check that I’ve now understood correctly.

Aside: The fix in the 2.62 version of g_input_stream_read uses an ‘out caller-allocates’ parameter. Although that means the same array can be used for successive reads when calling from C, it’s not possible to reuse the same parameter for a high-level language that hides the caller/callee allocation, which could be detrimental for performance, as I’ve noted.
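To make the aside concrete, here is a minimal C sketch of the caller-allocates pattern, in which one buffer is reused for successive reads. MockStream, mock_stream_read and drain_with_one_buffer are hypothetical stand-ins invented for illustration; they are not GIO API and only mimic the shape of g_input_stream_read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream; a stand-in for GInputStream, not GIO API. */
typedef struct {
    const unsigned char *data;  /* source bytes, not owned by the stream */
    size_t len;                 /* total number of source bytes */
    size_t pos;                 /* current read position */
} MockStream;

/* Fills the caller's buffer with up to `count` bytes and returns the number
 * of bytes written. The callee never allocates or frees `buffer`; it only
 * writes into memory the caller already owns. */
size_t mock_stream_read(MockStream *s, unsigned char *buffer, size_t count)
{
    assert(s != NULL && buffer != NULL);
    size_t remaining = s->len - s->pos;
    size_t n = count < remaining ? count : remaining;
    memcpy(buffer, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Drains the stream through a single fixed-size buffer that is reused for
 * every read, and returns the total number of bytes read. */
size_t drain_with_one_buffer(MockStream *s)
{
    unsigned char buffer[4];    /* allocated once, by the caller */
    size_t total = 0;
    size_t n;

    while ((n = mock_stream_read(s, buffer, sizeof buffer)) > 0)
        total += n;
    return total;
}
```

A binding that hides the buffer and returns a fresh array from each call would presumably have to allocate (or copy) on every read instead, which is the performance concern raised above.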

Your question is a bit difficult. And I am currently not sure how much you know about gobject-introspection. Maybe you are the one who created the fortran bindings, I can not remember. I am the Nim guy.

where the ‘in’ parameter buffer was modified without ownership of its elements being transferred.

That statement seems strange. I think ownership is never transferred from the user program to gtklibs. Only the other way: gtklibs may allocate memory and the user program can be responsible for freeing it later.

Your example g_input_stream_read () is very hard for binding generation of course. Not because it is so hard to understand, but because of the fact that the user has to provide the buffer which is then filled by gio lib. Generating such a binding manually is easy, and yes we can reuse the same buffer for multiple calls in principle. But generating that automatically only by using data available from gobject-introspection is very hard.

And finally, do we want to use that function really from other languages? Most languages have their own functions for stream handling, so why care to use gio for that.

Indeed the automatically Nim generated wrapper code is not that bad:

grep -A20 "proc g_input_stream_read(" ~/.nimble/pkgs/gintro-#head/gintro/gio.nim 
proc g_input_stream_read(self: ptr InputStream00; buffer: uint8Array; count: var uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc read*(self: InputStream; buffer: var seq[uint8]; count: var uint64;
    cancellable: Cancellable = nil): int64 =
  var gerror: ptr glib.Error
  let resul0 = g_input_stream_read(cast[ptr InputStream00](self.impl), unsafeaddr(buffer[0]), count, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

The first proc is the low-level gio call, and the exported read() proc is the high-level Nim interface. The user provides a seq[uint8] as the buffer, and the parameter is marked with the var keyword so that it is mutable. In Nim a seq is a variable-size container whose current length can be queried with len(). The problem in the auto-generated binding is that it still takes the count parameter but does not compare it with the actual length of the user-provided seq, so a count that is too large can cause writes beyond the allocated memory. What is missing here is something like

if count.int > buffer.len:
  buffer.setLen(count.int)

Your question is a bit difficult. And I am currently not sure how much you know about gobject-introspection. Maybe you are the one who created the fortran bindings, I can not remember. I am the Nim guy.

Thanks for the response. I am not the Fortran guy, I am developing bindings for Standard ML but you won’t have seen these because I have not yet published them.

where the ‘in’ parameter buffer was modified without ownership of its elements being transferred.

That statement seems strange. I think ownership is never transferred from the user program to gtklibs. Only the other way: gtklibs may allocate memory and the user program can be responsible for freeing it later.

I don’t see any indication in the annotation documentation that ownership can’t be passed to a C function. In practice, from what I have seen, I don’t think your claim is entirely true. (Instead of saying ‘never’, I think it would be fair to say ‘almost never’.) For a counter example:

  1. There do exist functions that take ownership via an ‘in’ parameter. For example, in g_environ_setenv and g_environ_unsetenv, envp is an ‘in’ parameter with transfer mode ‘full’. However, as an ‘in’ parameter does not return ownership, these functions transfer ownership of envp back to the caller via the return value. It is an unusual style (it would be more idiomatic to make envp an ‘inout’ parameter) but perhaps there were C reasons for this. (I believe my Standard ML bindings would give the same interface either way.) Still, this style seems acceptable.

  2. An ‘inout’ parameter with transfer mode ‘full’ (the default for an ‘inout’ parameter) does transfer ownership to the C function. There are quite a few functions that do this, for example gtk_init. Although ownership is also transferred back to the caller and there is no overall change in ownership, giving the C function temporary ownership allows it to reallocate the parameter. In the case that the ‘inout’ parameter is nullable, it may be freed with nothing allocated in its place.
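The two styles in the list above can be sketched in plain C. This is only an illustration under stated assumptions: dup_string, env_append, env_append_inout and env_free are hypothetical helpers written for this example, not GLib functions. They imitate the ownership flow of the g_environ_setenv style (take ownership through an ‘in’ parameter, hand it back through the return value) and of the ‘inout’ style. Aborting on allocation failure mirrors what GLib's own allocators do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicates a string with malloc; returns NULL on allocation failure. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* 'in' + transfer full style (hypothetical): takes ownership of the
 * NULL-terminated vector `envp` and returns an owned vector with `entry`
 * appended. Because realloc may move the block, the caller must use only
 * the return value afterwards. */
char **env_append(char **envp, const char *entry)
{
    size_t n = 0;

    assert(entry != NULL);
    while (envp != NULL && envp[n] != NULL)
        n++;
    char **grown = realloc(envp, (n + 2) * sizeof *grown);
    char *copy = dup_string(entry);
    if (grown == NULL || copy == NULL)
        abort();
    grown[n] = copy;
    grown[n + 1] = NULL;
    return grown;
}

/* 'inout' + transfer full style (hypothetical): ownership goes in and comes
 * back through the same parameter, so the callee is free to reallocate it. */
void env_append_inout(char ***envp, const char *entry)
{
    assert(envp != NULL);
    *envp = env_append(*envp, entry);
}

/* Frees both the container and its elements, as a full owner must. */
void env_free(char **envp)
{
    if (envp == NULL)
        return;
    for (size_t i = 0; envp[i] != NULL; i++)
        free(envp[i]);
    free(envp);
}
```

Either way the net ownership is unchanged after the call; the temporary transfer is what permits the reallocation.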

Your example g_input_stream_read () is very hard for binding generation of course. Not because it is so hard to understand, but because of the fact that the user has to provide the buffer which is then filled by gio lib. Generating such a binding manually is easy, and yes we can reuse the same buffer for multiple calls in principle. But generating that automatically only by using data available from gobject-introspection is very hard.

I did not find it a problem to generate such a binding, though I have put quite a lot of effort into supporting C arrays. At the moment I avoid copying unowned C arrays returned from C functions for efficiency on the grounds that their lifespan should be sufficiently long but I found that there was nothing, in principle, preventing such an unowned guint8 array being overwritten by passing it to the 2.60 version of g_input_stream_read. Upon investigation, I have come to realize that it was g_input_stream_read breaking the rules.

And finally, do we want to use that function really from other languages? Most languages have their own functions for stream handling, so why care to use gio for that.

Indeed, it is not an essential function to support. Still, I am not excluding usable functions, even if they have alternatives in the target high-level language, so that it is easier to copy existing GLib-based code.

Indeed the automatically Nim generated wrapper code is not that bad:

grep -A20 "proc g_input_stream_read(" ~/.nimble/pkgs/gintro-#head/gintro/gio.nim 
proc g_input_stream_read(self: ptr InputStream00; buffer: uint8Array; count: var uint64;
    cancellable: ptr Cancellable00; error: ptr ptr glib.Error = nil): int64 {.
    importc, libprag.}

proc read*(self: InputStream; buffer: var seq[uint8]; count: var uint64;
    cancellable: Cancellable = nil): int64 =
  var gerror: ptr glib.Error
  let resul0 = g_input_stream_read(cast[ptr InputStream00](self.impl), unsafeaddr(buffer[0]), count, if cancellable.isNil: nil else: cast[ptr Cancellable00](cancellable.impl), addr gerror)
  if gerror != nil:
    let msg = $gerror.message
    g_error_free(gerror[])
    raise newException(GException, msg)
  result = resul0

The first proc is the low level gio call, and the exported read() proc is the high level Nim interface. User provides a seq[uint8] as buffer, parameter is marked with var keyword to be mutable. In Nim a seq is a variable size container, we can query actual length by len() proc. …

Is that code based on the 2.60 version of g_input_stream_read ? Presumably the caller wouldn’t pass in the buffer with the 2.62 version of g_input_stream_read ?

It is good that we will soon get one more set of GTK bindings. But you are a bit late: Standard ML is a really old language, first mentioned in 1983.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that the API docs are not really designed for someone starting from scratch :frowning: Now I have spent 1400 hours in total, and it is still far from complete…

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

For Nim I also tried to support all gtk functions, mostly because I cannot properly decide which may be useful for users and which not. But gobject-introspection does not provide information for all C functions.

Does Standard ML use a garbage collector or manual memory management? For Nim we had a GC in the past, but now prefer fully deterministic automatic memory management, similar to C++ destructors.

[EDIT] And yes, my Nim bindings example above still uses gio 2.60.

Have you already solved

https://developer.gnome.org/gtk3/stable/GtkApplication.html#gtk-application-get-windows

For me such functions with a GSList result are hard for fully automatic bindings generation.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. Might that work for GSList as well?

Fine that we soon will get one more GTK bindings. But you are a bit late, Standard ML is a really old language, first mentioned in 1983.

Well, some languages are even older, but they have managed to evolve their standards due to demand. Standard ML (SML) is still a useful functional language but suffers from a lack of libraries and is creaking in some other respects, e.g. no built-in Unicode support, no concept of classes. Whilst the (overly) stable standard (‘The Definition’) has led to a range of implementations, the standard did not include an FFI so there is considerable variation in the FFIs and providing libraries is an even greater challenge. Still SML continues to be used in niche areas.

Do you start from scratch with your bindings, only consulting the gobject-introspection API docs, or from other existent bindings?

I started from scratch, and had a hard time with the API docs. Later, when I had already worked for more than 1000 hours on it, Mr Bassi told me that the API docs are not really designed for someone starting from scratch :frowning: Now I have spent 1400 hours in total, and it is still far from complete…

I started modifying the mGTK project which was a useful proof of concept but very limited in terms of coverage and in terms of maintainability because it predated GIR and relied on DEFS files. I did then start from scratch using GIR, inheriting some of the ideas from mGTK. Like others, I was lured into using the girepository API because it was easy to get started - write bindings for girepository and you’re off! After a lot of effort I realized that it couldn’t be used to generate platform-independent source code because the sizes of numeric C types are resolved for a specific platform, namely the platform of the TYPELIB files you are using. Worse, TYPELIB files don’t store alias types - they are resolved to the underlying type - so the generated SML code didn’t declare alias types so application code couldn’t mention alias types, which was a usability issue because we like to mention types in SML programs!

To avoid rewriting much of what I had done, I ended up writing an SML interface to GIR files that was almost a drop in replacement for the SML interface to girepository. Not a small undertaking. It differs in ways you would expect, e.g. the enumeration for GITypeTag also has enums for INT, UINT, SHORT, USHORT etc., and there is a module GIRepository.AliasInfo. The GIR interface also has extra functions that either weren’t required for TYPELIB files, e.g. GIRepository.FunctionInfo.getMovedTo, or provide additional information, e.g. GIRepository.Repository.getPackages, and omits some functions that provide platform-specific information, e.g. GIRepository.FieldInfo.getSize/getOffset. Also, some functions in GIRepository.Repository are parameterized by a map from namespace to version, to allow multiple versions of the same namespace to be loaded in the same session.

The SML interface to girepository was useful for validating my SML interface to GIR files, so I’ve kept it working in the codebase.

I estimate my total effort, as a background task over the years, exceeds 4,000 hours. I’m not sure if that will make you feel better or worse.

For the ownership transfer, for me transferring ownership to a C library function would indicate that the user allocates a buffer and then the C lib frees it. Such functions may exist, but I can not remember one.

The generated SML bindings code will create a duplicate where ownership is passed to a C function. If ownership is passed back, then bindings code will assume ownership of that duplicate.

For Nim I also tried to support all gtk functions, mostly because I cannot properly decide which may be useful for users and which not. But gobject-introspection does not provide information for all C functions.

I have explicitly excluded functions that don’t make sense for bindings, but I can’t claim that it is complete. As mentioned, I wouldn’t exclude a perfectly usable function.

Does Standard ML use a garbage collector or manual memory management? For Nim we had a GC in the past, but now prefer fully deterministic automatic memory management, similar to C++ destructors.

To my knowledge, the Definition of SML doesn’t specify implementation details like whether to use garbage collection but most implementations do, I believe. Both compilers that I am targeting (MLton, Poly/ML) do use garbage collection. I’ve not kept track of all the implementations and their spin-offs, for example there is RTMLton which introduces a real-time garbage collector, which I know nothing about!

[EDIT] And yes, my above Nim bindings example uses still gio 2.60.

Right, so with the 2.62 version, buffer wouldn’t be an input of the proc because it is ‘out caller-allocates’?

Have you already solved

https://developer.gnome.org/gtk3/stable/GtkApplication.html#gtk-application-get-windows

For me such functions with a GSList result are hard for fully automatic bindings generation.

For SML bindings, I have put in much of the framework required for collections but I haven’t supported G[S]List types because I haven’t needed them yet.

API docs note [element-type GtkWindow][transfer none] : Do you know which function in gobject-introspection API provides element-type?

Well there is

https://developer.gnome.org/gi/stable/gi-GITypeInfo.html#g-type-info-get-param-type

which I have already used for arrays. Might that work for GSList as well?

Yes, I expect it to work for G[S]List and also for GHashTable, which has two parameters, I believe, the key type and the element type.

I’ve not seen an answer so I’ll ask a simpler question, this time about struct parameters.

Take, for example, gdk_rgba_parse. The parameter rgba is an ‘in’ parameter whose ownership is not transferred to the function temporarily, so the function modifies the parameter without having ownership. Is this allowed? Or do annotations need to be changed so that rgba has direction ‘inout’ (which defaults to transfer ‘full’), as done for e.g. pango_matrix_transform_rectangle. I presume gdk_rgba_parse is fine otherwise lots of functions would be missing annotations but I would be grateful for confirmation.

No, the rgba parameter is the instance parameter. Conceptually it is the same as gtk_window_set_title(), where the window parameter isn’t an inout parameter that the function manipulates, but the instance the method operates on.

The difference that GtkWindow is a GObject and GdkRGBA is a GBoxed isn’t relevant for this particular question.
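A small C sketch of the instance-parameter view: the method writes through a pointer to a struct the caller already owns, and no ownership changes hands. Color and color_parse are hypothetical names for this example; they only resemble GdkRGBA and gdk_rgba_parse in shape, and the parser accepts just the "#rrggbb" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical boxed-style struct, similar in shape to GdkRGBA. */
typedef struct {
    double red, green, blue, alpha;
} Color;

/* Method-style function: `self` is the instance the method operates on.
 * It is modified in place, but the caller keeps ownership throughout, so
 * the struct may live on the caller's stack. Returns false and leaves
 * *self untouched if `spec` is not of the form "#rrggbb". */
bool color_parse(Color *self, const char *spec)
{
    unsigned int r, g, b;

    assert(self != NULL && spec != NULL);
    if (strlen(spec) != 7 || sscanf(spec, "#%2x%2x%2x", &r, &g, &b) != 3)
        return false;
    self->red = r / 255.0;
    self->green = g / 255.0;
    self->blue = b / 255.0;
    self->alpha = 1.0;
    return true;
}
```

A caller would declare a Color on the stack and pass its address, just as it would pass a GtkWindow pointer to gtk_window_set_title(); in neither case is anything allocated or freed by the callee.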


gdk_rgba_parse () is indeed an interesting example. I have just inspected the Nim bindings and I am not really happy with them. I would like to pass the GdkRGBA struct as a Nim var parameter because it is modified by GDK, but it is passed as a pointer. If gobject-introspection marked it as an out or inout parameter, my bindings would pass it as var automatically. (Nim uses var procedure parameters as Pascal did.) I have to investigate that.

Types like GdkRGBA, Rectangle or TextIter are a bit special for the Nim bindings: because they are very simple objects, we allocate them on the stack and pass their addresses to GTK. But that fact is not relevant here.

Yes, good point! That example didn’t help with my question.

I think a better example would be cairo_transform_to_window. The parameter cr is not the instance parameter according to the GIR file and ownership is not transferred (AFAICS) but it is modified. Perhaps in this case, as cairo_t is a reference counted type so is inherently single-instance, there is no need to say the parameter is ‘inout’ with ownership temporarily transferred. Is that the right way to look at this?

Cairo is not an introspectable API, and it’s not based on GObject, so you cannot apply any mechanism in gobject-introspection to make it work.

I am still a bit confused by your terminology. I still think that ownership transfer refers to memory deallocation, so it applies to larger types allocated on the heap: when ownership is transferred, the user is responsible for some sort of deallocation. The in, out and inout specification refers to the modification of the data: a plain integer variable with out direction transfers information to the user, and the Wirthian languages label such parameters with the var keyword. Florian’s reply makes sense of course, but it is a bit strange to regard the GdkRGBA struct as an instance variable, as if it were a widget that is passed to GTK as a pointer and whose internal state is modified by GTK. But generally it is OK; it works for Nim at least.

[EDIT]

https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

Transfer modes:

full: the recipient owns the entire value. For a refcounted type, this means the recipient owns a ref on the value. For a container type, this means the recipient owns both container and elements.
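As a rough illustration of what the transfer mode means for the recipient of a container, the sketch below shows what would have to be freed in each case. The Transfer enum and release_received are hypothetical names made up for this example; ‘none’ and ‘container’ are the other transfer modes listed on the same annotations page. The container is assumed to be a NULL-terminated vector of malloc-allocated strings.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the transfer annotation values. */
typedef enum { TRANSFER_NONE, TRANSFER_CONTAINER, TRANSFER_FULL } Transfer;

/* Releases whatever the recipient owns of a NULL-terminated string vector
 * and returns the number of free() calls made (for illustration only):
 *   none      - the recipient owns nothing and frees nothing;
 *   container - the recipient frees the vector but not the strings;
 *   full      - the recipient frees the strings and then the vector. */
size_t release_received(char **vec, Transfer mode)
{
    size_t frees = 0;

    if (vec == NULL || mode == TRANSFER_NONE)
        return 0;
    if (mode == TRANSFER_FULL) {
        for (size_t i = 0; vec[i] != NULL; i++) {
            free(vec[i]);
            frees++;
        }
    }
    free(vec);
    return frees + 1;
}
```

On this reading, ownership is about who must deallocate; whether a callee may write into memory it does not own is a separate matter, which is what the discussion above turns on.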