Spellcheck language selection in Web (Epiphany)

GNOME Web (Epiphany) populates options for spellcheck based on the installed locales.

Some issues with determining the available languages in the Flatpak version of Web have been reported. Web also suggests languages that do not have spellchecking installed, as reported separately. Another seemingly related issue is the lack of support for typing in multiple languages at once, which was reported earlier.

Is there a consensus that no approach other than relying on the configured locales will be taken in the future?

For example, GNOME Text Editor relies on enchant-2 to identify and provide spellcheck support based on the installed dictionaries. As a side note, when I was testing the Flatpak version, it did not find an Aspell dictionary specifically installed in the system as a package. Furthermore, when I installed “langpacks-el” in Fedora, it did not show up in Flatpak versions of Text Editor and Web.

Evolution is somehow locating dictionaries too.

And there is also gspell, which seems to provide unified support.

Were any of these considered in the past and rejected?

The locales are configured by the system administrator but the spelling dictionaries could be installed by the end user, probably depending on what the system is. For example, this is the case on a Guix System. Is it expected to work differently for Flatpak?

The languages offered are all locales that exist, not all locales that are configured. On your host OS this should be comprehensive. But freedesktop flatpak runtimes only include an English locale, i.e. other locales do not exist at all unless they are provided by locale extensions.

I’m not sure why it would ever make sense to have spellcheck dictionaries installed for a locale that doesn’t exist?

enchant2 is the spellchecker used by WebKitGTK. However, dictionaries installed on the host system won’t be used because flatpaks don’t see your host system. I see this bug exists to provide dictionaries in locale extensions, although it’s reported against flatpak rather than freedesktop-sdk, which is not the right issue tracker for it to actually get fixed.

Well Evolution also uses WebKitGTK, so its capabilities should be equivalent to Epiphany? Are you comparing apples to apples (flatpak Ephy to flatpak Evo)?

I don’t know about gspell or what advantages it provides vs. enchant.

Indeed, the spellcheck languages list in natively-installed Web or Evolution seems to correspond to locale -a output and /usr/share/locales contents in Fedora, which is comprehensive.

Sorry if my question is uneducated, but is it mandatory to have the corresponding locales? It might be reasonable to expect them all to be present in a desktop installation. But I still could remove all of them but the one that I use. In Guix, the locales are present in the store but the user is responsible for installing them in that user’s environment, so only the system-wide configured locale is shown by Web by default.

I have several spellchecking dictionaries installed for languages that I do not use often. However, I am not yet convinced that I need the locale data for them.

As a bonus, having Web determine the available dictionaries should solve issue #254, where Web shows languages that do not provide spellchecking.

Indeed, Text Editor in Flatpak does not show the dictionaries that the natively installed version finds.

I primarily compared natively installed applications, but also considered whether the issues with Flatpak are relevant.

Evolution somehow finds aspell dictionaries in a Guix System, while Web doesn’t. There might be little incentive to remove unused locales on a desktop installation, but when I do so in Fedora as a matter of experiment, Evolution finds hunspell dictionaries while Web doesn’t. So, Evolution does something different.

I think so. Without locale data, would we even have any way to know the names of the various languages?

The names of languages are specific to the currently used language, so they must be in the currently used locale. Thus, a single locale must be enough for this purpose as long as this is the currently used locale.

I’ve just tried an old trick of removing unused locale data in a Debian installation, which is sometimes done to save disk space. I left only the Ukrainian locale.

$ locale -a
$ LC_ALL=C apt list aspell-* --installed
Listing... Done
aspell-el/stable,now 0.50-3-7 all [installed]
aspell-en/stable,now 2020.12.07-0-1 all [installed,automatic]
aspell-uk/stable,now 1.8.0+dfsg-1 all [installed,automatic]
$ LC_ALL=C apt list hunspell-* --installed
Listing... Done
hunspell-en-us/stable,now 1:2020.12.07-2 all [installed,automatic]
$ enchant-lsmod-2 -list-dicts
el (aspell)
en (aspell)
en_AU (aspell)
en_CA (aspell)
en_GB (aspell)
en_US (aspell)
uk (hunspell)
uk_UA (hunspell)
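
As a rough illustration of what a dictionary-based approach would start from, the listing above can be turned into a mapping from language tag to provider. This is only a minimal Python sketch: `parse_list_dicts` and `SAMPLE_OUTPUT` are hypothetical names, and the "tag (provider)" line format is assumed from the output shown in this post, not from any Enchant documentation.

```python
# Sample text copied from the enchant-lsmod-2 output quoted in this post.
SAMPLE_OUTPUT = """\
el (aspell)
en (aspell)
en_AU (aspell)
en_CA (aspell)
en_GB (aspell)
en_US (aspell)
uk (hunspell)
uk_UA (hunspell)
"""


def parse_list_dicts(output):
    """Map each language tag to the provider named in parentheses.

    Assumes every non-empty line looks like "tag (provider)".
    """
    dicts = {}
    for line in output.splitlines():
        tag, _, rest = line.strip().partition(" ")
        if not tag:
            continue  # skip blank lines
        dicts[tag] = rest.strip("()")
    return dicts


dictionaries = parse_list_dicts(SAMPLE_OUTPUT)
print(sorted(dictionaries))
```

An application would more likely ask Enchant directly than parse command output; the sketch only shows which tags are available as input.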

However, the names of the languages are shown accurately in Ukrainian in Evolution, including the dictionaries that were installed separately.

[Screenshot from 2023-08-31 23-27-03]

Meanwhile, Web also shows appropriate names but doesn’t show any additionally installed dictionaries.

So does Evolution let you choose spellcheck languages from all languages with dictionaries installed? That seems fine to me.

Yes, Evolution allows me to choose languages from:

  • all languages of the existing locales and
  • all languages of the installed dictionaries.

The latter is the desired behaviour in case there are dictionaries for languages without a locale.

In contrast, Web allows me to choose languages only from all languages of the existing locales, even if dictionaries for other languages are installed too. It would be nice if Web instead behaved the same way as Evolution in this situation.
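
The difference between the two behaviours can be written down as simple set operations. This is a minimal Python sketch, not the actual logic of either application: the function names are hypothetical, the locale set assumes the single remaining locale is tagged `uk_UA`, and the dictionary tags are taken from the enchant-lsmod-2 output earlier in the thread.

```python
# Assumed inputs, based on the Debian experiment described in this thread.
LOCALES = {"uk_UA"}
DICTIONARIES = {"el", "en", "en_AU", "en_CA", "en_GB", "en_US", "uk", "uk_UA"}


def web_like_choices(locales, dictionaries):
    """Offer only languages that have a locale (what Web is observed to do)."""
    return set(locales)


def evolution_like_choices(locales, dictionaries):
    """Offer languages from existing locales and from installed dictionaries."""
    return set(locales) | set(dictionaries)


missing_in_web = (evolution_like_choices(LOCALES, DICTIONARIES)
                  - web_like_choices(LOCALES, DICTIONARIES))
print(sorted(missing_in_web))
```

With these inputs, the Greek and English dictionaries are the ones that only the union-based list would offer.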

How to find the names of the installed languages? Currently we read that from locale data.

E.g. how to figure out that “es” means “Español”?

You do not need to show each individual language in the list in this particular language, e.g.

  • es → “Español”,
  • uk → “Українська”,
  • el → “Ελληνικά”, etc.

I’ve just noticed that it is how Epiphany shows them.

Instead, you could show all languages in the list in the language of the current locale, e.g. for Ukrainian it would be all in Ukrainian:

  • en_US → “англійська (США)”,
  • uk_UA → “українська (Україна)”,
  • el_GR → “грецька (з 1453) (Греція)”, etc.

Thus, a single locale would be enough. That is how Evolution shows them, even if multiple locales are available.

It would be confusing to show languages in one locale if it were a selector for interface languages. But it is appropriate for spellchecking because the user is assumed to know the language of the locale, maybe even better than the others.

I’ve just made a clean install of Debian 12 and found that it also hides extra locales by default. So, Web only allows selecting the Ukrainian language for spellchecking out of the box. However, Evolution shows both Ukrainian and US English out of the box because the hunspell-en-us package is pre-installed. It seems that this issue should be common on Debian too, not just in Guix System.

Without locale data, I think all we can show is “uk_UA.” We have to read the language name from somewhere.

After some research, I conclude that it is necessary to utilise iso-codes to get the language name from the language code when the locale is missing.

It is possible to get a locale’s language name from an existing locale with something like the nl_langinfo_l function and _NL_IDENTIFICATION_LANGUAGE. However, when the locale was not generated or cannot be accessed, I do not see how to get the value with glibc.

Evolution parses an ISO 639 XML data file from iso-codes and uses it to get language names for language codes. Evolution translates the language name into the language of the current locale with dgettext. A side note: iso-codes deprecated XML files in favour of JSON data. Evolution implements a wrapper around Enchant. Enchant identifies dictionaries by language codes, so Evolution derives the language names separately with iso-codes data.
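
To make the idea concrete, here is a minimal Python sketch of a lookup from language code to language name using iso-codes style data, followed by an optional gettext translation. It is not Evolution's code. The helper names are hypothetical, the inline sample only imitates the layout I assume for the iso-codes JSON file for ISO 639-2 (a top-level "639-2" list with "alpha_2", "alpha_3" and "name" keys), and the "iso_639-2" gettext domain is likewise an assumption.

```python
import gettext
import json

# Inline sample imitating the assumed layout of iso-codes' ISO 639-2 JSON data.
SAMPLE_ISO_639_2 = json.loads("""
{"639-2": [
  {"alpha_2": "el", "alpha_3": "ell", "name": "Greek, Modern (1453-)"},
  {"alpha_2": "en", "alpha_3": "eng", "name": "English"},
  {"alpha_2": "uk", "alpha_3": "ukr", "name": "Ukrainian"}
]}
""")


def build_index(data):
    """Index English language names by two- and three-letter codes."""
    index = {}
    for entry in data["639-2"]:
        for key in ("alpha_2", "alpha_3"):
            if key in entry:
                index[entry[key]] = entry["name"]
    return index


def make_translator(domain="iso_639-2"):
    """Return a gettext function; falls back to identity if no catalog is found."""
    return gettext.translation(domain, fallback=True).gettext


def language_name(tag, index, translate=lambda s: s):
    """Name for a tag like 'uk_UA'; returns the raw tag if the code is unknown."""
    code = tag.replace("-", "_").split("_")[0].lower()
    english = index.get(code)
    return translate(english) if english else tag


index = build_index(SAMPLE_ISO_639_2)
print(language_name("el", index))
```

Passing `make_translator()` as `translate` would give names in the language of the current locale where a catalog is installed, so a single locale is enough; the territory part of a tag such as `uk_UA` would need a similar lookup against ISO 3166 data, which the sketch leaves out.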

gspell is a GTK library that uses Enchant to provide spellchecking GUI elements and dialogues. However, the stable release of gspell is currently for GTK 3, which I assume to be a problem for potential use in GTK 4 applications, including Epiphany, unless the GTK 4 version becomes stable at some point in the future.

gspell provides the language names by utilising ICU. A comment in the code of gspell states that it used iso-codes in the past but switched to ICU to simplify non-Linux packaging. So, there seems to be no apparent advantage to choosing ICU over iso-codes for Epiphany, which already has iso-codes among its dependencies.

For what it’s worth, Evolution also uses enchant2 (it can use either of the two enchants), the same version as WebKitGTK, at least on Fedora, I believe. enchant_broker_list_dicts() is involved in Edit->Preferences->Composer Preferences->Spellchecking.

To get the dictionary/language name, Evolution implements its own e_util_get_language_name(), which involves the iso-codes package.

That approach sounds good to me. Roman, feel free to add a comment to the Epiphany issue “preferences: can't add other preferred languages” (#2066) describing your proposed solution.

That said, you would then need to have a spellcheck dictionary installed to select a language, rather than the locale itself. This probably has basically the same problem? If dictionaries are not available in the flatpak runtime, you won’t have many options…