Pango passes wrong glyph indices to custom renderer

I’m trying to implement a custom renderer for Pango, but in my draw_glyphs function the PangoGlyph codes that Pango passes me (via the PangoGlyphInfo entries of the PangoGlyphString) appear to be wrong:

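/* draw_glyph() is my own per-glyph drawing helper. */
static void
draw_glyphs(PangoRenderer *renderer, PangoFont *font,
            PangoGlyphString *glyphs, int x, int y) {
    for (int i = 0; i < glyphs->num_glyphs; i++) {
        PangoGlyphInfo *gi = &glyphs->glyphs[i];
        printf("glyph: %u\n", gi->glyph);  /* font-internal glyph index */
        draw_glyph(font, gi->glyph, x, y);
        x += gi->geometry.width;  /* advance the pen by this glyph's width */
    }
}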

Looking at the output of this function: if the input text contains a c, for example, the value of gi->glyph for it is 102, which according to the ASCII table is the letter f, not the letter c.

On top of that, this offset changes depending on the font I use - in the LiberationSans family, with the regular, bold and italic variants the offset is -29, so for an uppercase Z in the input text I get a greater-than sign; with the bold-italic variant, the offset is +3, so I get an f for a c in the input. With other fonts, the offset is again different.

Am I supposed to do any mapping from the glyph code, or could my .ttf files be corrupted in some way? (I am using a custom font map as well, so it uses custom .ttf files instead of the system-provided ones via the provided PangoFT2FontMap, but in font viewers they both seem okay.)

What I could do is keep track of how many glyphs I’ve drawn in total and get the actual charcode from the text string - but that feels hacky, and also it would probably break if I turn on ellipsization.

PangoGlyph is a typedef for guint32 and represents a glyph index in a font, not an ASCII or Unicode value. The mapping between font-internal indices and Unicode is done via the font’s cmap table.

See also Pango – 1.0: Fonts and Glyphs

Ooh okay, RTFM problem I guess XD

But I’m still a little confused about how to use the cmap table: is there a function that I can pass the glyph index to that gives me back a character code, or how should I do the map-back?

Mapping of Unicode characters to glyph indices is usually done by the shaping engine (i.e. HarfBuzz). I don’t know if this is exposed to the user, however: [HarfBuzz] Getting glyph information using Harfbuzz API.

If you’re using PangoCairo, there is cairo_scaled_font_text_to_glyphs.

I’m using Pango without Cairo as I am building my own rendering backend. The link to the mailing list was a good pointer because it mentioned hb_font_get_nominal_glyph, which allows me to get a glyph ID from a Unicode character code.

So basically now I have to iterate through Unicode character codes 0x0000 to 0xFFFF, save the glyph IDs to a map, and then perform a lookup in that map when I get passed a PangoGlyph, right?

Yes, that should work! Note, however, that advanced glyphs used for OpenType features (like ligatures) are not part of the cmap. After all, you cannot easily map them to Unicode codepoints.
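A minimal sketch of that reverse map, under stated assumptions. To keep it compilable without HarfBuzz, the lookup is passed in as a callback that has the same shape as hb_font_get_nominal_glyph (font, codepoint, out glyph). ReverseCmap, reverse_cmap_build, reverse_cmap_lookup and toy_nominal_glyph are hypothetical names made up for illustration; in a real renderer the callback would wrap hb_font_get_nominal_glyph and the glyph count would come from the font. Note that 0xFFFF only covers the Basic Multilingual Plane, so the upper bound is a parameter here (Unicode goes up to 0x10FFFF).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as hb_font_get_nominal_glyph(), with the font passed as void*
 * so the sketch compiles without HarfBuzz. Returns nonzero on success. */
typedef int (*nominal_glyph_fn)(void *font, uint32_t unicode, uint32_t *glyph);

typedef struct {
    uint32_t *codepoint_for_glyph; /* indexed by glyph id, 0 = no mapping */
    uint32_t  num_glyphs;
} ReverseCmap;

/* Build glyph id -> codepoint by probing every codepoint in 1..max_cp.
 * If several codepoints share a glyph, the lowest codepoint wins. */
static ReverseCmap reverse_cmap_build(void *font, nominal_glyph_fn lookup,
                                      uint32_t num_glyphs, uint32_t max_cp) {
    ReverseCmap map;
    map.num_glyphs = num_glyphs;
    map.codepoint_for_glyph = calloc(num_glyphs, sizeof(uint32_t));
    assert(map.codepoint_for_glyph != NULL);
    for (uint32_t cp = 1; cp <= max_cp; cp++) {
        uint32_t gid;
        if (lookup(font, cp, &gid) && gid < num_glyphs &&
            map.codepoint_for_glyph[gid] == 0)
            map.codepoint_for_glyph[gid] = cp;
    }
    return map;
}

/* Returns 0 for glyphs with no cmap entry (ligatures, alternates, ...). */
static uint32_t reverse_cmap_lookup(const ReverseCmap *map, uint32_t gid) {
    return gid < map->num_glyphs ? map->codepoint_for_glyph[gid] : 0;
}

/* Hypothetical stand-in font: printable ASCII at a fixed offset of -29,
 * similar to the per-font offsets described earlier in the thread. */
static int toy_nominal_glyph(void *font, uint32_t unicode, uint32_t *glyph) {
    (void)font;
    if (unicode < 0x20 || unicode > 0x7E) return 0;
    *glyph = unicode - 29;
    return 1;
}
```

Building the table once per font and doing an array lookup per glyph keeps draw_glyphs cheap. A result of 0 means the glyph has no cmap entry, which is the ligature case mentioned above, so the caller still needs a fallback for it.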

Maybe you don’t need shaping at all? To work only at the text level you may apply itemization and line breaking. See this overview of the rendering pipeline.

And when do I need shaping? Sorry, but that is not very clear to me despite reading the overview you linked.

Shaping makes fine adjustments to glyph positions (see kerning) and applies overall “embellishments” (for example ligatures). Those things don’t make much sense when working at the text level.

Out of curiosity, how do you plan to draw text?

I’m implementing markup text for a custom GUI library based on OSG for FlightGear, a free, open source flight simulator.
Could you please clarify what you mean by “working at the text level”?

I mean anything that can be done on strings (sequences of Unicode codepoints). Once you do shaping, the output is a sequence of font-internal glyphs (sets of Bézier curves) and their positions. You don’t have Unicode codepoints anymore.
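To make that concrete, here is a toy sketch (not how HarfBuzz works internally). It assumes a made-up font where ASCII characters sit at glyph id = codepoint - 29 and where the pair "fi" is replaced by a single ligature glyph. toy_cmap, toy_shape and TOY_GLYPH_FI are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_GLYPH_FI = 500 }; /* hypothetical ligature glyph, no cmap entry */

/* Toy cmap: ASCII codepoint -> glyph id at a made-up offset of -29. */
static uint32_t toy_cmap(uint32_t cp) { return cp - 29; }

/* Toy "shaper": maps each character through the cmap, but merges the
 * pair "fi" into one ligature glyph. Returns the number of glyphs written. */
static size_t toy_shape(const char *text, size_t len,
                        uint32_t *out, size_t cap) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len && n < cap; i++) {
        if (text[i] == 'f' && i + 1 < len && text[i + 1] == 'i') {
            out[n++] = TOY_GLYPH_FI;
            i++; /* the ligature consumed two characters */
        } else {
            out[n++] = toy_cmap((unsigned char)text[i]);
        }
    }
    return n;
}
```

Shaping the three characters of "fin" with this toy yields two glyphs, and the first one has no single codepoint to map back to. That is why a pure glyph-to-codepoint table cannot recover the text in general.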

Nice! May I ask why you have to convert the font’s glyph ids back to Unicode codepoints? Font rendering libraries typically operate on glyph ids. For example:

I see that OSG has osgText::Glyph::Glyph(Font *font, unsigned int glyphCode), do you know what glyphCode represents? Is it a Unicode codepoint or a glyph id?

I’m getting glyph images through osgText::Font::getGlyph, and that function uses the Unicode codepoint, not the glyph code.

At the moment I’m just doing this:

pango_layout_set_markup(_layout, markup);
pango_renderer_draw_layout(_renderer, _layout);

where _layout is a PangoLayout, markup is the input markup string, and _renderer is a derived PangoRenderer (which implements draw_glyphs).
What would I need to change to skip the shaping step?

There are many ways, but you could override draw_glyph_item rather than draw_glyphs. This gives both the glyph ids and the original associated text.

Do you have to support i18n scripts like Arabic, CJK, etc.? You could disable extended shaping features via HarfBuzz:

OK, but consider that you cannot do advanced text rendering that way. Perhaps in the future you may switch to a dedicated, GL-based text rendering library. There are many out there using SDF (Signed Distance Fields).

Yes, CJK, and in the future probably also Arabic.

So no ligatures etc., as needed for Arabic fonts? Could you recommend one that is preferably written in C++, not C?