Can bindings for TextIter be memory-safe?

I am working on autogenerating Gtk 4 bindings for OCaml (btj/ocaml-gtk on GitHub). I have the very basics (calling constructors and methods for GObject and its subclasses) working. So far, I think my bindings are memory-safe; i.e. OCaml programs cannot trigger undefined behavior through these bindings. But now I am looking at adding support for APIs involving TextIter. I wonder if it is possible to add such support while maintaining memory safety.

I guess my main question is this: if I create two TextIters that point to some span of text in the middle of a large document, and then clear the TextBuffer, and then try to get the text between the two TextIters, can this cause arbitrary undefined behavior? More generally: how bad is it to use an invalid iterator: will this always crash cleanly (perhaps even with a helpful error message), or can this do arbitrarily weird things?

If using invalid iterators can do arbitrarily nasty things, is there a feasible way to still create safe language bindings? Are there any language bindings out there (for any language) that attempt this?

Answering my own question: I had a peek inside gtktextiter.c and gtktextbtree.c. It appears that the first thing any function that uses a TextIter does is check that the TextIter’s chars_changed_stamp equals the B-tree’s chars_changed_stamp. This stamp is initialized to a random value and incremented whenever the contents change. So using an invalid TextIter seems to either segfault (if the tree is gone) or produce a nice warning (if the tree still exists but the stamps differ). It therefore seems safe for a language binding not to take special measures here.
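To make the stamp check concrete, here is a minimal toy model in Rust. It is not GTK's code: `Buffer`, `Iter`, `iter_at`, `clear` and `slice` are hypothetical names, the stamp starts at 0 instead of a random value, and an error is returned where GTK would print a warning. It only models the "contents changed" case, not the case where the tree itself has been freed.

```rust
// Toy model of a stamp-checked iterator (hypothetical types, not GTK's API).
struct Buffer {
    text: String,
    // Bumped on every content change. GTK seeds this randomly; the model starts at 0.
    chars_changed_stamp: u32,
}

#[derive(Clone, Copy)]
struct Iter {
    offset: usize,            // byte offset into the buffer's text
    chars_changed_stamp: u32, // stamp observed when the iterator was created
}

impl Buffer {
    fn new(text: &str) -> Self {
        Buffer { text: text.to_string(), chars_changed_stamp: 0 }
    }

    fn iter_at(&self, offset: usize) -> Iter {
        Iter {
            offset: offset.min(self.text.len()),
            chars_changed_stamp: self.chars_changed_stamp,
        }
    }

    fn clear(&mut self) {
        self.text.clear();
        // Every existing iterator now carries a stale stamp.
        self.chars_changed_stamp = self.chars_changed_stamp.wrapping_add(1);
    }

    fn slice(&self, start: Iter, end: Iter) -> Result<&str, &'static str> {
        // The stamp comparison comes first, before any offset is used.
        if start.chars_changed_stamp != self.chars_changed_stamp
            || end.chars_changed_stamp != self.chars_changed_stamp
        {
            return Err("invalid iterator: buffer changed since it was created");
        }
        self.text
            .get(start.offset..end.offset)
            .ok_or("iterators out of order or not on a character boundary")
    }
}

fn main() {
    let mut buf = Buffer::new("hello world");
    let (a, b) = (buf.iter_at(0), buf.iter_at(5));
    println!("{:?}", buf.slice(a, b)); // iterators are still valid here
    buf.clear();
    println!("{:?}", buf.slice(a, b)); // stale stamps are rejected
}
```

The check works only because the buffer that holds the current stamp is still alive when the iterator is used; it says nothing about an iterator that outlives its buffer.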


By definition, this is memory unsafe. In the Rust bindings we probably need to add a lifetime to this so it does not outlive the buffer. I am not sure how you would handle this in OCaml.
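As a rough illustration of the lifetime idea, the sketch below ties a hypothetical `TextIter<'a>` to a borrow of a hypothetical `TextBuffer`, so the compiler refuses to let the iterator outlive the buffer. This is not the gtk-rs API. It also assumes `clear` takes `&mut self`, which additionally blocks modification while iterators are live; bindings whose buffer methods take a shared reference would get only the "does not outlive" guarantee.

```rust
// Hypothetical stand-ins for illustration; not the real binding types.
struct TextBuffer {
    text: String,
}

// The lifetime 'a ties every iterator to a shared borrow of its buffer.
#[derive(Clone, Copy)]
struct TextIter<'a> {
    buffer: &'a TextBuffer,
    offset: usize,
}

impl TextBuffer {
    fn new(text: &str) -> Self {
        TextBuffer { text: text.to_string() }
    }

    fn iter_at(&self, offset: usize) -> TextIter<'_> {
        TextIter { buffer: self, offset: offset.min(self.text.len()) }
    }

    // Assumption of this sketch: mutation needs exclusive access.
    fn clear(&mut self) {
        self.text.clear();
    }
}

impl<'a> TextIter<'a> {
    // Text between `self` and `end`, or None if the range is not valid.
    fn text_to(&self, end: &TextIter<'a>) -> Option<&'a str> {
        self.buffer.text.get(self.offset..end.offset)
    }
}

fn main() {
    let mut buffer = TextBuffer::new("hello world");
    let start = buffer.iter_at(0);
    let end = buffer.iter_at(5);
    // Uncommenting the next line is rejected by the borrow checker,
    // because `start` and `end` are still used afterwards:
    // buffer.clear();
    println!("{:?}", start.text_to(&end));
    // Allowed: the iterators are not used after this point.
    buffer.clear();
}
```

OCaml has no comparable lifetime tracking, so a binding there would have to rely on a runtime check or on keeping the buffer reachable from the iterator.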

I just submitted this bug report. Once this is fixed, it seems fine to treat TextIter records as plain values: misusing them causes either a segfault on the offending call, which seems easy enough to diagnose, or a helpful warning saying an invalid TextIter is being used.

BTW there are several other types we had a similar problem with in Rust bindings:

  • GtkBitsetIter
  • PangoAttrIterator
  • PangoGlyphItemIter
  • PangoScriptIter

So this is not just a one-off thing, it may be a general issue of API policy.

In many cases, it’s basically impossible to invalidate an iterator after the object that created it goes away. For GObjects, we could store a weak reference into the iterator, to nullify it when the object goes away, but that wouldn’t work for non-GObjects. Additionally, most iterator types are stack-allocated structures, and we can’t change them in an ABI-compatible way.
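The weak-reference idea can be modelled with Rust's `Rc` and `Weak`, which play a role loosely similar to a GObject and a weak reference to it. The `Buffer` and `Iter` types and the `rest` method are hypothetical; this is only a sketch of the mechanism, not a proposal for the actual struct layout.

```rust
use std::rc::{Rc, Weak};

// Hypothetical reference-counted object that hands out iterators.
struct Buffer {
    text: String,
}

// The iterator keeps only a weak reference, so it does not keep the buffer alive.
struct Iter {
    buffer: Weak<Buffer>,
    offset: usize,
}

impl Iter {
    fn new(buffer: &Rc<Buffer>, offset: usize) -> Self {
        Iter { buffer: Rc::downgrade(buffer), offset }
    }

    // Text from the iterator to the end of the buffer.
    // Returns None once the buffer is gone, instead of touching freed memory.
    fn rest(&self) -> Option<String> {
        let buffer = self.buffer.upgrade()?;
        buffer.text.get(self.offset..).map(str::to_owned)
    }
}

fn main() {
    let buffer = Rc::new(Buffer { text: String::from("hello world") });
    let iter = Iter::new(&buffer, 6);
    println!("{:?}", iter.rest()); // Some("world")
    drop(buffer); // last strong reference goes away
    println!("{:?}", iter.rest()); // None
}
```

The extra `Weak` field is exactly the kind of layout change that the ABI constraint rules out for existing stack-allocated iterator structs.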

Ideally, we’d have a proper iterator type in GLib/GObject, but it would still require breaking API all over the place.

Adding a lifetime reference to the introspection data is another option, but it’d likely be advisory and only live in the GIR.

If it were ever possible to change the API, the proper way would probably be copying what GVariantIter does:

  • Have a heap allocated new/free/copy holding an owning reference, and expose it to introspection.
  • Have a stack allocated init that does not hold an owning reference, and is not available to introspected bindings. Manual bindings can be written for languages that support stack allocation (C++, Vala, Rust, etc.).
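The two flavours above can be sketched side by side in Rust. All names here (`Container`, `OwnedIter`, `BorrowedIter`) are hypothetical: `OwnedIter` stands in for the heap-allocated new/free/copy variant that holds an owning reference, and `BorrowedIter` for the stack-allocated init variant that does not.

```rust
use std::rc::Rc;

// Hypothetical container being iterated.
struct Container {
    items: Vec<i32>,
}

// "new/free/copy" flavour: holds a strong reference, so the container
// cannot be freed while any iterator (or copy of one) exists.
#[derive(Clone)]
struct OwnedIter {
    container: Rc<Container>,
    pos: usize,
}

// "init" flavour: no owning reference; the caller must keep the container alive.
struct BorrowedIter<'a> {
    container: &'a Container,
    pos: usize,
}

impl OwnedIter {
    fn new(container: &Rc<Container>) -> Self {
        OwnedIter { container: Rc::clone(container), pos: 0 }
    }
}

impl<'a> BorrowedIter<'a> {
    fn new(container: &'a Container) -> Self {
        BorrowedIter { container, pos: 0 }
    }
}

impl Iterator for OwnedIter {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let item = self.container.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

impl<'a> Iterator for BorrowedIter<'a> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let item = self.container.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

fn main() {
    let shared = Rc::new(Container { items: vec![1, 2, 3] });
    let owned = OwnedIter::new(&shared);
    drop(shared); // the iterator's own reference keeps the data alive
    println!("{:?}", owned.collect::<Vec<_>>());

    let local = Container { items: vec![4, 5] };
    println!("{:?}", BorrowedIter::new(&local).collect::<Vec<_>>());
}
```

The owning variant is safe to hand to a garbage-collected language because freeing the iterator is the only cleanup the binding has to get right; the borrowed variant is cheaper but depends on the host language to enforce the lifetime.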