Vala, GTK 4: apply_tag on a GtkSource.Buffer is shifted by accented characters

Hello everyone,
sorry for my English,
I'm writing with the help of https://www.reverso.net/traduction-texte

apply_tag is thrown off by accented characters.
Example:

using Gtk;
/*
 valac --pkg gtk4 --pkg gtksourceview-5 g.vala && ./g
 apply_tag on a GtkSource.Buffer: problems when the text contains accents (or other characters)
*/
int main (string[] argv) {
    Gtk.Application app = new Gtk.Application (null, GLib.ApplicationFlags.HANDLES_OPEN);
    app.activate.connect (() => {
        Gtk.ApplicationWindow window = new Gtk.ApplicationWindow (app);
        window.set_default_size (640, 320);
        GtkSource.View src = new GtkSource.View ();
        Gtk.TextTagTable tag = new Gtk.TextTagTable ();
        Gtk.TextTag tagl = new Gtk.TextTag ("link");
        tagl.set_property ("underline", Pango.Underline.SINGLE);
        tagl.foreground = "#F6624A";
        tag.add (tagl);
        GtkSource.Buffer buf = new GtkSource.Buffer (tag);
        buf.text = "\n\taa https://www.aa" +
                   "\n\tbb http://www.bb" +
                   "\n\téé https://www.cc" +
                   "\n\tdd https://www.dd" +
                   "\n\tee http://www.ee";
        src.set_buffer (buf);
        window.set_child (src);
        window.present ();
        Gtk.TextIter deb, fin;
        int ts, te;
        buf.get_start_iter (out deb);
        buf.get_end_iter (out fin);
        MatchInfo trv;
        try {
            Regex url = new Regex ("https?://[a-zA-Z0-9=?./-]+");
            // Keep the text in a variable: the MatchInfo refers to this string
            // (it is not copied), so it must stay alive while iterating.
            string txt = buf.get_text (deb, fin, false);
            if (url.match (txt, 0, out trv)) {
                do {
                    trv.fetch_pos (0, out ts, out te);
                    Gtk.TextIter start;
                    buf.get_iter_at_offset (out start, ts);
                    Gtk.TextIter end;
                    buf.get_iter_at_offset (out end, te);
                    print (txt.substring (ts, te - ts) + "\n");
                    buf.apply_tag (tagl, start, end);
                } while (trv.next ());
            }
        } catch (RegexError e) {
            warning ("%s", e.message);
        }
    });
    return app.run (argv);
}

It works, but after the accents the "link" tag is shifted by the number of accented characters.
Where is my mistake?
Thanks in advance.

Forgot to mention: Linux Manjaro, GTK 4, GtkSourceView 5.

Hi,

fetch_pos() will give you a position as pointer (i.e. byte index), while get_iter_at_offset() takes a character position (i.e. independent of the underlying size of each character).

Accented characters like “é” are encoded in multiple bytes (2 bytes in UTF-8), so the following character will have a +1 character offset but a +2 byte position.

You will need to convert the byte indexes ts and te into character offsets, using functions like GLib.utf8_pointer_to_offset.
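A minimal sketch of the mismatch, using only GLib so it runs without a window: on the text `éé https://www.cc`, `fetch_pos()` reports the URL starting at byte 5, while a text buffer counts it as character 3. In Vala, `string.char_count(max)` counts the characters contained in the first `max` bytes, which is one way to do the byte index to character offset conversion. The helper name `byte_to_char_offset` is hypothetical, and the sample string is made up for illustration.

```vala
// Sketch: byte index (from GLib.MatchInfo.fetch_pos) vs. character offset
// (what Gtk.TextBuffer.get_iter_at_offset expects).
// Only GLib is needed: valac demo.vala && ./demo

// Hypothetical helper: number of characters contained in the first
// `byte_index` bytes of `txt`.
int byte_to_char_offset (string txt, int byte_index) {
    return (int) txt.char_count (byte_index);
}

int main () {
    string txt = "éé https://www.cc";

    // "é" is one character but two bytes in UTF-8.
    print ("bytes=%d chars=%d\n", "é".length, (int) "é".char_count ());

    try {
        var url = new Regex ("https?://[a-zA-Z0-9=?./-]+");
        MatchInfo info;
        if (url.match (txt, 0, out info)) {
            int ts, te;
            info.fetch_pos (0, out ts, out te);
            // ts and te are byte positions.
            print ("byte range: %d..%d\n", ts, te);
            // The same range expressed as character offsets.
            print ("char range: %d..%d\n",
                   byte_to_char_offset (txt, ts),
                   byte_to_char_offset (txt, te));
        }
    } catch (RegexError e) {
        printerr ("%s\n", e.message);
    }
    return 0;
}
```

Here the byte range is 5..19 and the character range is 3..17: each "é" before the URL adds one extra byte.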

Thank you very much,
I suspected UTF-8!
Now I have to figure out how to use GLib.utf8_pointer_to_offset
from Vala, because the documentation is more than terse!
See you!


Finally, with some difficulty, I got a sequence that works,
even if I'm sure it's not the best solution:

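        try {
            Regex url = new Regex ("https?://[a-zA-Z0-9=?./-]+");
            string txt = buf.get_text (deb, fin, false);
            if (url.match (txt, 0, out trv)) {
                do {
                    trv.fetch_pos (0, out ts, out te);   // byte positions
                    // Workaround: correct each byte position by its difference
                    // with index_of_nth_char(). Caution: index_of_nth_char()
                    // expects a character offset, not a byte index.
                    int ts1 = (int) txt.index_of_nth_char (ts);
                    ts += (ts - ts1);
                    int te1 = (int) txt.index_of_nth_char (te);
                    te += (te - te1);
                    Gtk.TextIter start, end;
                    buf.get_iter_at_offset (out start, ts);
                    buf.get_iter_at_offset (out end, te);
                    buf.apply_tag (tagl, start, end);
                } while (trv.next ());
            }
        } catch (Error e) {
        }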

Thanks again.

Oops…
characters like ╭ ┬ ┼ ├ break it!
(There must be others.)
Vala and GTK 4 are not simple.

No, the other way around: index_of_nth_char() turns a character offset into a byte index. You already have a byte index, and now you need a character offset.

Have you tried string.pointer_to_offset (glib-2.0)?
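To make the direction concrete, a small sketch with a 3-byte character: "╭" (U+256D) takes 3 bytes in UTF-8 and "é" takes 2. `index_of_nth_char(c)` goes from a character offset to a byte index, while `char_count(b)` goes from a byte index back to a character offset, which is what `get_iter_at_offset()` needs. The sample string is made up for illustration.

```vala
// Sketch: the two conversions go in opposite directions.
int main () {
    string txt = "╭é http://x";

    // 'h' is character number 3 (╭, é, space)...
    long c = 3;
    // ...and starts at byte 3 + 2 + 1 = 6.
    long b = txt.index_of_nth_char (c);                // offset -> byte index
    int back = (int) txt.char_count ((ssize_t) b);     // byte index -> offset

    print ("char %ld -> byte %ld -> char %d\n", c, b, back);
    return 0;
}
```

Since `fetch_pos()` already returns byte indexes, only the second call (`char_count`) is needed before `get_iter_at_offset()`.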

Note that I’m not sure these pointer games are really possible from vala… Language bindings usually don’t like that. In the worst case, you may have to implement a small C file with a helper function, and make valac compile and link it with your vala code.

"`string.pointer_to_offset` is deprecated. Use `string.char_count`":
I followed that advice,
and I will dig deeper.

I had time to look at it again,
and with `char_count()` it works.
See you!
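For reference, a sketch of the matching loop rewritten with `char_count()`, assuming that is roughly how it was used: each byte position from `fetch_pos()` is converted to a character offset before it reaches the buffer. To keep it runnable without a display, the offsets are collected by a hypothetical helper `url_char_ranges` that uses only GLib; the sample text mixes "é" and box-drawing characters.

```vala
// Sketch: find URLs and return their [start, end) ranges as *character*
// offsets, ready for Gtk.TextBuffer.get_iter_at_offset().
// url_char_ranges is a hypothetical helper name; only GLib is used here.
int[] url_char_ranges (string txt) throws RegexError {
    int[] ranges = {};
    var url = new Regex ("https?://[a-zA-Z0-9=?./-]+");
    MatchInfo info;
    // txt stays alive for the whole loop, as MatchInfo requires.
    if (url.match (txt, 0, out info)) {
        do {
            int ts, te;
            info.fetch_pos (0, out ts, out te);    // byte positions
            ranges += (int) txt.char_count (ts);   // -> character offsets
            ranges += (int) txt.char_count (te);
        } while (info.next ());
    }
    return ranges;
}

int main () {
    string txt = "\n\taa https://www.aa" +
                 "\n\téé https://www.cc" +
                 "\n\t╭┬ http://www.ee";
    try {
        int[] r = url_char_ranges (txt);
        for (int i = 0; i < r.length; i += 2) {
            print ("%d..%d\n", r[i], r[i + 1]);
        }
    } catch (RegexError e) {
        printerr ("%s\n", e.message);
    }
    return 0;
}
```

In the GTK program, each pair then feeds the iterators exactly as in the original loop: `buf.get_iter_at_offset (out start, r[i]); buf.get_iter_at_offset (out end, r[i + 1]); buf.apply_tag (tagl, start, end);`.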
