Can bindings for TextIter be memory-safe?

btj · January 23, 2023, 3:55pm

I am working on autogenerating Gtk 4 bindings for OCaml (btj/ocaml-gtk on GitHub). I have the very basics (calling constructors and methods for GObject and its subclasses) working. So far, I think my bindings are memory-safe; i.e. OCaml programs cannot trigger undefined behavior through these bindings. But now I am looking at adding support for APIs involving TextIter. I wonder if it is possible to add such support while maintaining memory safety.

I guess my main question is this: if I create two TextIters that point to some span of text in the middle of a large document, then clear the TextBuffer, and then try to get the text between the two TextIters, can this cause arbitrary undefined behavior? More generally, how bad is it to use an invalid iterator? Will it always crash cleanly (perhaps even with a helpful error message), or can it do arbitrarily weird things?

If using invalid iterators can do arbitrarily nasty things, is there a feasible way to still create safe language bindings? Are there any language bindings out there (for any language) that attempt this?

btj · January 23, 2023, 4:58pm

Answering my own question: I had a peek inside gtktextiter.c and gtktextbtree.c. It appears that the first thing any function that uses a TextIter does is check that the TextIter’s chars_changed_stamp equals the B-tree’s chars_changed_stamp. This value is initialized to a random value and incremented whenever the contents change. So using an invalid TextIter seems to either segfault (if the tree is gone) or produce a nice warning (if the tree still exists but its contents have changed). It therefore seems safe for a language binding not to take special measures here.
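To make the stamp check concrete, here is a minimal Rust model of it as described above. It is a sketch, not GTK's code: `Buffer`, `Iter`, `iter_at`, `clear` and `text_between` are hypothetical names, and the initial stamp is passed in by the caller rather than drawn at random.

```rust
/// Toy stand-in for the text B-tree: owns the text and a change stamp.
struct Buffer {
    text: String,
    chars_changed_stamp: u32,
}

/// Toy stand-in for a TextIter: a byte offset plus the stamp seen at creation.
#[derive(Clone, Copy)]
struct Iter {
    offset: usize,
    chars_changed_stamp: u32,
}

impl Buffer {
    /// `initial_stamp` is supplied by the caller here (assumption for the sketch).
    fn new(text: &str, initial_stamp: u32) -> Self {
        Buffer { text: text.to_string(), chars_changed_stamp: initial_stamp }
    }

    /// Create an iterator that remembers the current stamp.
    fn iter_at(&self, offset: usize) -> Iter {
        Iter {
            offset: offset.min(self.text.len()),
            chars_changed_stamp: self.chars_changed_stamp,
        }
    }

    /// Any content change bumps the stamp, invalidating older iterators.
    fn clear(&mut self) {
        self.text.clear();
        self.chars_changed_stamp = self.chars_changed_stamp.wrapping_add(1);
    }

    /// Check both stamps before touching the text, then slice.
    fn text_between(&self, start: Iter, end: Iter) -> Result<&str, &'static str> {
        if start.chars_changed_stamp != self.chars_changed_stamp
            || end.chars_changed_stamp != self.chars_changed_stamp
        {
            return Err("invalid iterator: buffer changed since it was created");
        }
        self.text.get(start.offset..end.offset).ok_or("bad range")
    }
}
```

Note that this only models the case where the buffer is still alive. The check cannot help once the tree itself has been freed, because the stamp comparison then reads freed memory.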

jfrancis · January 23, 2023, 6:58pm

By definition, this is memory unsafe. In the Rust bindings we probably need to add a lifetime to this so it does not outlive the buffer. I am not sure how you would handle this in OCaml.
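Roughly what adding a lifetime could look like, as a plain-Rust sketch rather than the actual gtk-rs API. `TextBuffer` and `TextIter` here are toy types defined in the snippet, and `iter_at` and `text_to` are hypothetical names.

```rust
/// Toy buffer; not the gtk-rs type.
struct TextBuffer {
    text: String,
}

/// The iterator borrows the buffer, so it cannot outlive it.
struct TextIter<'buf> {
    buffer: &'buf TextBuffer,
    offset: usize,
}

impl TextBuffer {
    /// The returned iterator is tied to the borrow of `self`.
    fn iter_at(&self, offset: usize) -> TextIter<'_> {
        TextIter { buffer: self, offset: offset.min(self.text.len()) }
    }

    /// Needs `&mut self`, so it cannot be called while any iterator is alive.
    fn clear(&mut self) {
        self.text.clear();
    }
}

impl<'buf> TextIter<'buf> {
    /// Text from this iterator up to `end`; empty if the range is not valid.
    fn text_to(&self, end: &TextIter<'buf>) -> &'buf str {
        self.buffer.text.get(self.offset..end.offset).unwrap_or("")
    }
}

// The compiler rejects code that drops or clears the buffer while an
// iterator is still in use, for example:
//
//     let mut buf = TextBuffer { text: String::from("hello") };
//     let it = buf.iter_at(0);
//     buf.clear();               // rejected: `buf` is still borrowed by `it`
//     let _ = it.text_to(&it);
```

One caveat with this sketch: it relies on `clear` taking `&mut self`. GObject wrappers in gtk-rs are reference-counted handles whose methods take `&self`, so a lifetime alone would keep the iterator from outliving the buffer but would not stop the buffer from being modified underneath it.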

btj · January 24, 2023, 9:22am

I just submitted this bug report. Once this is fixed, it seems fine to treat TextIter records as plain values: misusing one causes either a segfault on the offending call, which seems easy enough to diagnose, or a helpful warning saying an invalid TextIter is being used.

jfrancis · January 24, 2023, 2:23pm

BTW there are several other types we had a similar problem with in Rust bindings:

GtkBitsetIter
PangoAttrIterator
PangoGlyphItemIter
PangoScriptIter

So this is not just a one-off thing, it may be a general issue of API policy.

ebassi · January 24, 2023, 2:31pm

In many cases, it’s basically impossible to invalidate an iterator after the object that created it has gone away. For GObjects, we could store a weak reference in the iterator, to nullify it when the object goes away, but that wouldn’t work for non-GObjects. Additionally, most iterator types are stack-allocated structures, and we can’t change them in an ABI-compatible way.
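As an illustration of the weak-reference idea only, here is a small Rust analogue using `Rc` and `Weak` in place of GObject weak references. `Buffer`, `WeakIter` and `rest` are hypothetical names, and nothing here reflects how GTK iterators are actually laid out.

```rust
use std::rc::{Rc, Weak};

/// Toy reference-counted object standing in for a GObject.
struct Buffer {
    text: String,
}

/// Iterator that holds only a weak reference to its buffer.
struct WeakIter {
    buffer: Weak<Buffer>,
    offset: usize,
}

impl WeakIter {
    fn new(buffer: &Rc<Buffer>, offset: usize) -> Self {
        WeakIter { buffer: Rc::downgrade(buffer), offset }
    }

    /// Text from the iterator to the end, or `None` if the buffer is gone
    /// (or the offset is not a valid position).
    fn rest(&self) -> Option<String> {
        let buffer = self.buffer.upgrade()?;
        buffer.text.get(self.offset..).map(str::to_owned)
    }
}
```

Once the last strong reference to the buffer is dropped, `rest` returns `None` instead of touching freed memory, which is the "nullify it when the object goes away" behavior in miniature.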

Ideally, we’d have a proper iterator type in GLib/GObject, but it would still require breaking API all over the place.

Adding a lifetime reference to the introspection data is another option, but it’d likely be advisory and only live in the GIR.

jfrancis · January 24, 2023, 2:37pm

If it were ever possible to change the API, the proper way would probably be copying what GVariantIter does:

Have a heap allocated new/free/copy holding an owning reference, and expose it to introspection.
Have a stack allocated init that does not hold an owning reference, and is not available to introspected bindings. Manual bindings can be written for languages that support stack allocation. (C++, Vala, Rust, etc)
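A rough Rust analogue of that two-tier design, to show the difference between the two variants. This is a sketch with hypothetical names (`Buffer`, `OwnedIter`, `BorrowedIter`); `Rc` stands in for the owning reference and a borrow stands in for the non-owning stack iterator.

```rust
use std::rc::Rc;

/// Toy container being iterated over.
struct Buffer {
    items: Vec<i32>,
}

/// Analogue of the new/copy/free variant: keeps the container alive.
#[derive(Clone)]
struct OwnedIter {
    buffer: Rc<Buffer>,
    pos: usize,
}

impl OwnedIter {
    fn new(buffer: &Rc<Buffer>) -> Self {
        OwnedIter { buffer: Rc::clone(buffer), pos: 0 }
    }
}

impl Iterator for OwnedIter {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let item = self.buffer.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

/// Analogue of the init variant: no owning reference, so the caller must
/// keep the container alive (here the borrow checker enforces that).
struct BorrowedIter<'a> {
    buffer: &'a Buffer,
    pos: usize,
}

impl<'a> BorrowedIter<'a> {
    fn init(buffer: &'a Buffer) -> Self {
        BorrowedIter { buffer, pos: 0 }
    }
}

impl<'a> Iterator for BorrowedIter<'a> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let item = self.buffer.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}
```

The owning variant is the one that is safe to hand to introspected bindings, since the iterator stays valid even after the caller drops its own reference. The borrowed variant avoids the allocation and reference counting but only suits languages that can express or enforce the "buffer outlives iterator" rule.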
