What would the steps be for creating a new language binding for a high-level scripting language? (I’m thinking in particular of Janet — a small interpreted language, implemented in C. I’d like to be able to write GTK desktop apps with it on Debian GNU/Linux. But hopefully this thread would be of use to anyone embarking on such a journey.)
Is there a tutorial available?
What would a rough outline of the required steps look like for creating a new language binding using GObject Introspection?
I assume that one early prerequisite is to be able to write a GTK hello-world app in C. Correct? I found the docs for that in the GTK 4 Reference Manual: https://developer.gnome.org/gtk4/stable/. What other prerequisite skills/knowledge would you suggest?
My understanding is that it would involve using libgirepository. I found some docs for that at https://developer.gnome.org/gi/stable/. (Sorry, new user, cannot put more than 2 links in this post.)
Most important tip: Do not start from scratch. The GObject-Introspection API docs are not enough to do that. I did, and spent 1600 hours on the gintro Nim bindings, with a lot of help from Mr. Bassi, Mr. Droege and others. And the bindings are still not as good as the Rust bindings, and maybe never will be.
Today I would not try again, but would hire some GTK and Rust devs to create the Nim bindings. Indeed we are still considering that, but due to the small number of users we would need a sponsor; the budget may be about 50k Euro?
We already created an initial low-level version of the Nim bindings in 2015 by processing the GTK header files. But we noticed that low-level bindings are useless, so we used GObject-Introspection. Later Mr. Bassi suggested that using the XML GIR files directly might be an alternative. That way one can also extract comments and some other data which GObject-Introspection does not provide.
As Janet is a scripting language, you may investigate how the Python bindings work. I did not, as Nim is compiled. I think I looked at the D and C++ bindings in 2016; I think Rust did not have such good GTK support at that time.
But the experts for bindings are Mr. Bassi and Mr. Droege.
It is probably not a good idea to write new bindings without a complete understanding of GObject memory management, as well as your target language’s GC. Maybe you can start by using GIFunctionInfo to call some of the simpler toplevel GLib functions, i.e. ones that operate on primitives and strings, and then go from there.
More comprehensive documentation is needed in this area. However, it is difficult to envisage a tutorial that could address a wide audience given the range of target languages and various possibilities. The current guidance provides useful points but I don’t think it is prescriptive enough, and there may not be enough context for newcomers to understand the points being made.
The very first decision IMO is about the architecture, in particular whether to generate bindings from GIR files or TYPELIB files.
A GIR file is an XML description of introspected entities. The XML format is captured in a schema. A GIR file provides rich information that is intended to be used offline and is suitable for generating bindings code. Note that GIR files may contain some platform-specific data but are mostly platform-independent. For example: they still refer to C types such as gint and glong; they don’t provide the size of a struct type or field offsets in a struct; a value such as GLib.SIZEOF_LONG is inherently platform-specific; they contain documentation.
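To make the shape of a GIR file concrete, here is a minimal sketch in Python that reads a tiny GIR-style fragment with the standard library. The fragment is hand-written and heavily abbreviated, and the `Demo` namespace and `demo_add` function are hypothetical; a real GIR file carries far more detail. It shows that the C type names (`gint`, `glong`) and the documentation are both present in the XML.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by the GIR format.
NS = {
    "core": "http://www.gtk.org/introspection/core/1.0",
    "c": "http://www.gtk.org/introspection/c/1.0",
}
C_TYPE = "{%s}type" % NS["c"]
C_IDENTIFIER = "{%s}identifier" % NS["c"]

# Hand-written, abbreviated fragment for illustration only (hypothetical API).
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Demo" version="1.0">
    <function name="add" c:identifier="demo_add">
      <doc>Adds two integers.</doc>
      <return-value transfer-ownership="none">
        <type name="gint" c:type="gint"/>
      </return-value>
      <parameters>
        <parameter name="a" transfer-ownership="none">
          <type name="gint" c:type="gint"/>
        </parameter>
        <parameter name="b" transfer-ownership="none">
          <type name="glong" c:type="glong"/>
        </parameter>
      </parameters>
    </function>
  </namespace>
</repository>
"""


def describe_functions(gir_xml):
    """Return name, C identifier, parameter C types and docs per function."""
    root = ET.fromstring(gir_xml)
    result = []
    for fn in root.findall("core:namespace/core:function", NS):
        params = [
            (p.get("name"), p.find("core:type", NS).get(C_TYPE))
            for p in fn.findall("core:parameters/core:parameter", NS)
        ]
        result.append({
            "name": fn.get("name"),
            "c_identifier": fn.get(C_IDENTIFIER),
            "params": params,
            "doc": fn.findtext("core:doc", default="", namespaces=NS),
        })
    return result


functions = describe_functions(GIR_FRAGMENT)
```

Note that `glong` survives as a name here; its size on a given platform is not recorded, which is the platform-independence point made above.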
A TYPELIB file is an efficient binary description of introspected entities, accessed via libgirepository, designed to be used for run-time introspection. It is compiled for the target platform with unnecessary information removed. For example: numeric type sizes are resolved for the platform, so you can’t in general determine the original C type; there are no alias types; function shadowing has been resolved.
Of course, it is possible to use GIR files at run-time but that is likely to be unnecessary and certainly very inefficient. Conversely, it is possible to use TYPELIB files offline, as mentioned in the current guidance, but this approach needs a big warning sign. It is easier to get started using TYPELIB files via libgirepository because the library provides a ready-made introspection API. With GIR files, you need to parse the GIR files and create your own API which involves determining default values for entities not mentioned in the XML, resolving function shadowing/moving, managing dependencies on other GIR files, possibly other things. For this reason it can be tempting to use TYPELIB files to generate bindings offline without considering the consequences. This approach should be used only if you expect to generate code for each target platform, not if you want to generate one platform-independent library. Also, this should not be done if you need alias types in the generated code or, in future, you want to include the documentation in the generated code. Generally, this should probably not be done if you want a human-readable library. This approach may be suitable if, for example, you generate code behind-the-scenes that is just machine read.
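As an illustration of the extra work that reading GIR files directly involves, here is a rough Python sketch of resolving function shadowing and moved functions. It assumes the `shadows`, `shadowed-by` and `moved-to` attributes as they appear in GIR XML; the fragment and all `demo_*` names are hypothetical, and a real implementation would also have to handle methods, constructors and name clashes.

```python
import xml.etree.ElementTree as ET

NS_CORE = "http://www.gtk.org/introspection/core/1.0"
NS_C = "http://www.gtk.org/introspection/c/1.0"
FUNCTION = "{%s}function" % NS_CORE
C_IDENTIFIER = "{%s}identifier" % NS_C

# Hand-written fragment for illustration only (hypothetical API).
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Demo" version="1.0">
    <function name="open" c:identifier="demo_open" shadowed-by="open_full"/>
    <function name="open_full" c:identifier="demo_open_full" shadows="open"/>
    <function name="close" c:identifier="demo_close"/>
    <function name="legacy_close" c:identifier="demo_legacy_close"
              moved-to="Handle.close"/>
  </namespace>
</repository>
"""


def resolve_shadowing(gir_xml):
    """Map each exposed function name to the C symbol that implements it."""
    root = ET.fromstring(gir_xml)
    exposed = {}
    for fn in root.iter(FUNCTION):
        # A shadowed or moved function is not exposed under its own name.
        if fn.get("shadowed-by") is not None or fn.get("moved-to") is not None:
            continue
        # A shadowing function takes over the name of the one it shadows.
        name = fn.get("shadows") or fn.get("name")
        exposed[name] = fn.get(C_IDENTIFIER)
    return exposed


exposed = resolve_shadowing(GIR_FRAGMENT)
```

With TYPELIB files this resolution has already been done, which is part of why libgirepository is the easier starting point.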
(Aside: I learnt this the hard way. I developed SML bindings to libgirepository only to find the generated code wasn’t platform-independent. To reuse all that I had done, I developed an interface to GIR files that has an almost identical API. For example, for GIRepository.Repository, compare the TYPELIB API with the GIR API.)
In summary, the decision about whether to use GIR files or TYPELIB files will depend on how you expect to use the introspection metadata. Considerations may include: development stages (editing, compiling, running); distribution of your bindings and applications that use them, etc.
As others have said, it is not a small undertaking and you really do need a good understanding of the C API. The amount of work in producing bindings will depend on various factors. If your target language is quite low-level, close to C with little abstraction, you should have less work than for a language with a high level of abstraction.
The only other bindings for an interpreted Lisp written in C that I know of are guile-gi.
Thanks! Looks like a useful reference! Note, Janet is certainly a Lisp-like language, but I’m not sure if it’s really a Lisp per se (it has regular array and hash data structures, rather than cons cells).
… [understanding of] your target language’s GC
Yes. I see there’s a chapter of the Janet docs on this. Thanks.
Phil, sorry for the delay in getting back to you. Thanks so much for your detailed reply.
You mention the choice of:
using libgirepository (read TYPELIB at runtime), vs
using the GIR files (at build-time?)
The first bullet point in the Guidelines doc mentions deciding between being implementation-agnostic vs implementation-specific. Does that correlate with your two points above, or is it orthogonal to using GIR vs TYPELIB?
I see that the guidelines doc suggests using libgirepository for an interpreted language.
Is there a tool to generate a skeleton language binding to get you started if you give it such parameters as interpreted vs compiled, dynamic vs statically-typed, or some other params?
Regardless, it appears to me that there are at least a few prerequisites:
refresh one’s C skills if necessary,
familiarize oneself with using GTK from C if not already familiar, and
learn your language’s API for communicating with native C libraries (e.g. Janet’s CAPI)
One thing I don’t yet see is the connection between Janet’s CAPI and what libgirepository provides.
I think that a high-level overview of how the whole thing works would be useful, even if it doesn’t contain all the details. That is, an outline describing:
what code (GIR, TYPELIB) is generated (or maybe supplied with GTK),
what code you yourself have to write,
which pieces connect to what,
what needs to be compiled and what build products are produced, and the tools used for this,
what other tools there are available to smooth along the process.
(Though, I’m sure that my learning more about the HLL’s CAPI would clarify some of the above.)
I’d be curious to read a comparison of using Swig to create bindings vs libgirepository/GIR/TYPELIB, if anyone knows of any links for that. (Was that how the previous Python binding (PyGTK) was built?)
For gobject-based libraries, I don’t think I would suggest using swig for any non-trivial purposes. The gir XML is roughly solving the same task, but it contains more information. Swig may work if you don’t need to do much and just want to get something simple up and running quickly.
One point I forgot in my first post is that you have to decide whether you will use proxy objects and the GObject toggle references, or whether you will use only GObject-based types for subclassing. The gintro Nim bindings currently use proxy objects and toggle references. But Mr. Droege told us recently that the Rust bindings have managed to avoid toggle references, which saves some overhead and better allows low-level work.
Yes, build-time, i.e. sometime no later than compile-time.
I read that as an orthogonal matter: where you have a language with multiple compilers/interpreters available, there may be a choice between using standard features of the language that should work with all compilers/interpreters (implementation-agnostic) and using features specific to a particular compiler/interpreter (implementation-specific). This isn’t even a simple binary choice. You could use some implementation-agnostic features and some implementation-specific features. For example, in Giraffe Library, which generates platform-independent SML source code, the signatures are implementation-agnostic but the structures are implementation-specific (necessarily so because there is no standard C FFI in SML, each compiler has its own).
A reasonable suggestion: an interpreter will resolve dependencies on the target platform (in the same process in which the code runs) and this information can be efficiently obtained via libgirepository. (Note that if the language allows type annotations, you can’t write alias types e.g. Gtk.Allocation using libgirepository: you would have to refer to its underlying type Gdk.Rectangle. Type aliases are available only in the GIR files.)
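For reference, this is roughly how the alias information could be read from a GIR file. It is a minimal Python sketch over a hand-written, abbreviated fragment modelled on the Gtk.Allocation case mentioned above; the helper name `collect_aliases` is my own.

```python
import xml.etree.ElementTree as ET

NS_CORE = "http://www.gtk.org/introspection/core/1.0"
NAMESPACE = "{%s}namespace" % NS_CORE
ALIAS = "{%s}alias" % NS_CORE
TYPE = "{%s}type" % NS_CORE

# Hand-written, abbreviated fragment for illustration only.
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Gtk" version="3.0">
    <alias name="Allocation" c:type="GtkAllocation">
      <type name="Gdk.Rectangle" c:type="GdkRectangle"/>
    </alias>
  </namespace>
</repository>
"""


def collect_aliases(gir_xml):
    """Map each qualified alias name to the name of its underlying type."""
    root = ET.fromstring(gir_xml)
    aliases = {}
    for ns in root.findall(NAMESPACE):
        for alias in ns.findall(ALIAS):
            qualified = "%s.%s" % (ns.get("name"), alias.get("name"))
            aliases[qualified] = alias.find(TYPE).get("name")
    return aliases


aliases = collect_aliases(GIR_FRAGMENT)
```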
I don’t know of such a tool and I suspect nobody’s attempted to create a general tool because there is so much variation in the possible mechanisms for language bindings. (For example, there may be approaches that don’t generate any code because an interpreter is extended to interact with libgirepository, using libffi to dynamically create calls to functions.)
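To give a feel for the "no generated code" approach, here is a small Python sketch. Python's `ctypes` module is itself built on libffi, so it can stand in for what an extended interpreter would do: take a function signature known only as data at run time and build the call dynamically. The type-name table and the helper `call_dynamically` are my own illustration, the signature is written by hand rather than read from libgirepository, and the example assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library.

```python
import ctypes

# Symbols already loaded into this process, including the C library (POSIX).
libc = ctypes.CDLL(None)

# Illustrative mapping from introspection-style type names to ctypes types.
CTYPES_BY_NAME = {
    "utf8": ctypes.c_char_p,
    "gsize": ctypes.c_size_t,
    "gint": ctypes.c_int,
}


def call_dynamically(library, symbol, arg_type_names, return_type_name, *args):
    """Look up a C function by name and call it with a run-time signature."""
    fn = getattr(library, symbol)
    fn.argtypes = [CTYPES_BY_NAME[name] for name in arg_type_names]
    fn.restype = CTYPES_BY_NAME[return_type_name]
    return fn(*args)


# The signature here is hand-written data; a real binding would obtain it
# from the introspection metadata instead.
length = call_dynamically(libc, "strlen", ["utf8"], "gsize", b"hello")
```

In a real binding the hard part is not the call itself but converting values and managing ownership on each side of it.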
Yes, a complete understanding of these topics is required.
Looking briefly at Janet’s C API described in the link you provided, it looks like you want to generate a Janet module, one module for each GObject namespace, that is compiled up front. Your Janet functions will be calling the C functions and you’ll need a #include directive at the top for the C functions you depend on, e.g. #include <gtk/gtk.h>. That in itself is enough to require you to generate code from GIR files because the C include information is not available via libgirepository.
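As a sketch of where the include information lives, the following Python reads the `c:include` and `include` elements from a hand-written, abbreviated fragment shaped like the top of a Gtk GIR file. The helper name `headers_and_dependencies` is my own, and a generator for Janet modules would do the equivalent in whatever language it is written in.

```python
import xml.etree.ElementTree as ET

NS_CORE = "http://www.gtk.org/introspection/core/1.0"
NS_C = "http://www.gtk.org/introspection/c/1.0"
C_INCLUDE = "{%s}include" % NS_C        # C header to #include
GIR_INCLUDE = "{%s}include" % NS_CORE   # dependency on another GIR file

# Hand-written, abbreviated fragment for illustration only.
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <include name="Gdk" version="4.0"/>
  <include name="Gsk" version="4.0"/>
  <c:include name="gtk/gtk.h"/>
  <namespace name="Gtk" version="4.0"/>
</repository>
"""


def headers_and_dependencies(gir_xml):
    """Return the #include lines to emit and the GIR files depended on."""
    root = ET.fromstring(gir_xml)
    include_lines = [
        "#include <%s>" % element.get("name")
        for element in root.findall(C_INCLUDE)
    ]
    dependencies = [
        (element.get("name"), element.get("version"))
        for element in root.findall(GIR_INCLUDE)
    ]
    return include_lines, dependencies


include_lines, dependencies = headers_and_dependencies(GIR_FRAGMENT)
```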
Even then, there is a key decision to make. Do you:
1. distribute pre-generated modules that users can compile on their machines or
2. expect users to generate these modules on their machines, i.e. run your code generator?
If you do 1, then you will need to guard all functions according to their availability, e.g.
If you do 2, then you shouldn’t need to guard functions because users should use the GIR files for the version they have installed.
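For option 1, one possible shape for such guards is sketched below in Python. This is my own illustration, not an example from the post above. It assumes functions carry a `version` attribute in the GIR file, and it emits a preprocessor check around each wrapper. `DEMO_CHECK_VERSION` is a hypothetical macro modelled on GTK's `GTK_CHECK_VERSION`, and the `demo_*` functions are made up.

```python
import xml.etree.ElementTree as ET

NS_CORE = "http://www.gtk.org/introspection/core/1.0"
NS_C = "http://www.gtk.org/introspection/c/1.0"
FUNCTION = "{%s}function" % NS_CORE
C_IDENTIFIER = "{%s}identifier" % NS_C

# Hand-written fragment for illustration only (hypothetical API).
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Demo" version="1.0">
    <function name="init" c:identifier="demo_init"/>
    <function name="reload" c:identifier="demo_reload" version="1.4"/>
  </namespace>
</repository>
"""


def guarded_wrapper_stubs(gir_xml, check_macro="DEMO_CHECK_VERSION"):
    """Emit C text in which versioned functions are wrapped in #if guards."""
    root = ET.fromstring(gir_xml)
    lines = []
    for fn in root.iter(FUNCTION):
        stub = "/* wrapper for %s() goes here */" % fn.get(C_IDENTIFIER)
        version = fn.get("version")
        if version is None:
            # No version recorded: treat as always available.
            lines.append(stub)
            continue
        # Pad "1.4" to (1, 4, 0) for a three-argument version check.
        parts = [int(piece) for piece in version.split(".")]
        parts += [0] * (3 - len(parts))
        lines.append("#if %s(%d, %d, %d)" % (check_macro, *parts[:3]))
        lines.append(stub)
        lines.append("#endif")
    return "\n".join(lines)


generated = guarded_wrapper_stubs(GIR_FRAGMENT)
```

The Janet side would also need to know which functions were compiled out, which this sketch does not address.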
The code generator in Giraffe Library could be useful as a starting point because it already generates C wrapper functions for one of the compilers, which is a similar task. However, I would advise doing several examples by hand first, writing the Janet modules manually, before generating anything. You will need to understand memory management and using Janet’s GC. Even for the basic Hello World example, you will need to put a mechanism in place for calling Janet functions from GObject closures.
For example, libgirepository tells you about the arguments of a C function - their type, direction, nullability, etc. You would use this to determine how to wrap/unwrap the arguments in the Janet wrapper function.
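The same argument properties are also recorded in the GIR XML, so as a rough illustration here is a Python sketch that extracts them from a hand-written fragment. It reads the XML rather than calling libgirepository so that it runs without the library installed. The `demo_measure` function is hypothetical, and the defaults used for missing attributes (direction `in`, not nullable, transfer `none`) are my assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

NS = {"core": "http://www.gtk.org/introspection/core/1.0"}

# Hand-written fragment for illustration only (hypothetical API).
GIR_FRAGMENT = """\
<repository version="1.2"
            xmlns="http://www.gtk.org/introspection/core/1.0"
            xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <namespace name="Demo" version="1.0">
    <function name="measure" c:identifier="demo_measure">
      <return-value transfer-ownership="none">
        <type name="gboolean" c:type="gboolean"/>
      </return-value>
      <parameters>
        <parameter name="text" transfer-ownership="none" nullable="1">
          <type name="utf8" c:type="const gchar*"/>
        </parameter>
        <parameter name="length" direction="out" transfer-ownership="full">
          <type name="gsize" c:type="gsize*"/>
        </parameter>
      </parameters>
    </function>
  </namespace>
</repository>
"""


def describe_parameters(gir_xml, function_name):
    """Return type, direction, nullability and ownership for each argument."""
    root = ET.fromstring(gir_xml)
    described = []
    for fn in root.findall("core:namespace/core:function", NS):
        if fn.get("name") != function_name:
            continue
        for p in fn.findall("core:parameters/core:parameter", NS):
            described.append({
                "name": p.get("name"),
                "type": p.find("core:type", NS).get("name"),
                # Defaults below are assumptions made for this sketch.
                "direction": p.get("direction", "in"),
                "nullable": p.get("nullable", "0") == "1",
                "transfer": p.get("transfer-ownership", "none"),
            })
    return described


parameters = describe_parameters(GIR_FRAGMENT, "measure")
```

A wrapper would use this to decide, for example, that `text` may be passed as nil and that `length` is returned to the caller rather than supplied by it.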
There are different possible architectures but documentation as you describe for common architectures would be very useful. Possibly some sort of decision chart helping you decide on an approach would be useful.
I don’t know about Swig. A quick read suggests that it would work from the C source code. If so, you would have to provide some configuration for Swig for every library that would have to be maintained going forward, which sounds like quite a burden. Also, the introspection metadata is provided in gtk-doc comments and is more abstract than what is available in the C code, e.g. C function arguments are classified as ‘in’/‘inout’/‘out’ parameters but I can’t see how Swig would know about that.
Re. distribution, Janet comes with a jpm package manager tool which can also build native code, so I’d need to explore how that fits in here as well (in addition to all the review/learning of prerequisites).
Ah, yes, that would be great. If I’m understanding correctly, maybe some prose could be added to the architecture page.
Ah, I’d forgotten about the gtk-doc comment annotations. Thanks.