Pango Indexes to String Positions Correctly

williamkappler · June 17, 2023, 8:38am

To get straight to the point: in C/C++, given Pango indexes obtained from a function like pango_layout_xy_to_index, how am I supposed to find the corresponding position in the string I fed into the layout? I need this to be safe for UTF-8, ampersand-style special characters (see below), and formatting tags.

This is for the purpose of implementing a copy/cut/replace system for custom UI elements using Pango-Cairo for text rendering. So, essentially, I’m trying to find a reliable way to get the text between two click-points run through pango_layout_xy_to_index.

I ask primarily because I’m trying to re-implement some spaghetti I programmed almost a decade ago that is supposed to do this. There’s a special case claiming ampersand-style characters (stuff like nbsp) are counted as 1 character in Pango indexes, but obviously occupy several in the string given to Pango. Thus, I need to do conversion. I have zero confidence in any of this, but I wouldn’t have added that without a reason and it had been working (I think). What I don’t know is why Pango would be doing that, since the documentation seems to imply the index should also be the index into the original UTF-8 string given to Pango. Perhaps it’s something to do with GMarkup? It could even be an ancient bug that has since been fixed.

What I’m hoping for is an authoritative answer on how I’m supposed to do this, so I can avoid any more experimentation/guessing/kludges/bugs.

That said, I am aware Pango handles some extreme cases (like text going different directions within a line). If that makes a difference in the answer, let’s assume I only care about text going a uniform direction. I have enough problems with that.

I’m also aware there are going to be some gotchas with formatting tags. Right now, I’m only concerned about their presence not breaking my ability to locate character positions after they appear.

For the record, this is how I feed text into the layout:

pango_layout_set_markup( mPangoLayout, mText.c_str( ), -1 );

matthiasc · June 17, 2023, 11:34pm

pango_layout_xy_to_index is about going from a position in the formatted output (think mouse click) to a byte index in the UTF-8 input.
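One detail worth spelling out for the markup case: as far as I understand it, pango_layout_set_markup parses the markup and stores the resulting plain text in the layout, with tags removed and entities such as &amp;amp; decoded to a single character. The byte indexes reported by pango_layout_xy_to_index (whose x and y are in Pango units) then refer to that parsed text, which pango_layout_get_text returns, rather than to the markup string. That would explain why an entity appears to count as one character. Below is a minimal sketch of pulling out the bytes between two such indexes. The helper name copy_byte_range is hypothetical, and the sketch assumes both indexes already sit on character boundaries of the text passed in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Pango or GLib).
 * Copies the bytes in [index_a, index_b) out of `text`, in whichever order
 * the two indexes were given (the user may drag right-to-left).
 * Assumption: `text` is the layout's parsed text, e.g. what
 * pango_layout_get_text() returns, NOT the original markup string.
 * Returns a newly allocated string the caller must free(), or NULL on
 * allocation failure. */
char *copy_byte_range(const char *text, int index_a, int index_b)
{
    assert(text != NULL);

    size_t len = strlen(text);
    size_t start = index_a < 0 ? 0 : (size_t)index_a;
    size_t end = index_b < 0 ? 0 : (size_t)index_b;

    /* Normalise so that start <= end. */
    if (start > end) {
        size_t tmp = start;
        start = end;
        end = tmp;
    }

    /* Clamp both ends to the length of the text. */
    if (end > len)
        end = len;
    if (start > len)
        start = len;

    char *out = malloc(end - start + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, text + start, end - start);
    out[end - start] = '\0';
    return out;
}
```

To map a selection back onto the markup string itself (for a replace operation, say), a separate mapping from parsed-text positions to markup positions would still be needed; this sketch does not attempt that.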

As you say, Pango, like any other library in the GTK stack, assumes that text is encoded in UTF-8, so a single character takes up 1-4 bytes. GLib has APIs for moving through UTF-8 encoded text, and for extracting characters from it.
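To illustrate the byte-versus-character distinction, here is a small sketch in plain C that converts between byte indexes and character offsets in a UTF-8 string. The function names are made up for this example; in a GLib program, g_utf8_pointer_to_offset and g_utf8_offset_to_pointer cover the same ground. The sketch assumes the input is valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
static int is_continuation(char c)
{
    return ((unsigned char)c & 0xC0) == 0x80;
}

/* Hypothetical helper: number of characters that start before `byte_index`.
 * Assumes `s` is valid, NUL-terminated UTF-8. */
size_t utf8_byte_to_char_offset(const char *s, size_t byte_index)
{
    assert(s != NULL);

    size_t count = 0;
    for (size_t i = 0; i < byte_index && s[i] != '\0'; i++) {
        /* Every byte that is not a continuation byte starts a character. */
        if (!is_continuation(s[i]))
            count++;
    }
    return count;
}

/* Hypothetical helper: byte index at which character number `char_offset`
 * starts. Stops at the terminating NUL if the string is shorter. */
size_t utf8_char_to_byte_index(const char *s, size_t char_offset)
{
    assert(s != NULL);

    size_t i = 0;
    while (s[i] != '\0' && char_offset > 0) {
        i++;                            /* step over the lead byte */
        while (is_continuation(s[i]))   /* then over its continuation bytes */
            i++;
        char_offset--;
    }
    return i;
}
```

For the string "héllo", the é occupies two bytes, so the first l sits at byte index 3 but character offset 2.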

williamkappler · June 18, 2023, 3:33am

Thanks for the reply. That’s good confirmation. UTF-8 multibyte isn’t a problem for me.

After thinking about this more, what I thought Pango was doing may have been my fault from the beginning. Whatever the case, it sounds like everything should be straightforward now.

system · July 18, 2023, 3:34am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.