Orca multilingual use

Orca does not seem to support multilingual documents, or languages other than the default LANG.
If, for example, I am on a French web page, I expect the chosen language to be “fr”; on a multilingual page, text marked with lang=“en” should be read with the matching voice.
Currently everything is read in the configured language, e.g. LANG=de_DE.utf8.
Foreign-language text is hard to understand that way.
As speech synthesizer I use pico, which sounds better than festival, espeak(-ng) and so on.

With spd-say I can set the language and get the correct pronunciation.
Is there a way to make Orca multilingual?
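
For reference, the spd-say call mentioned above can be wrapped in a few lines of Python. This is only a sketch: it assumes spd-say is installed and on PATH and relies on its -l (language) option; the helper names spd_say_command and say are made up for this example.

```python
import shutil
import subprocess


def spd_say_command(text, language):
    """Build the spd-say argument list for speaking text in one language."""
    # -l sets the language that Speech Dispatcher hands to the synthesizer.
    return ["spd-say", "-l", language, text]


def say(text, language):
    """Speak text via spd-say; return False when spd-say is not installed."""
    if shutil.which("spd-say") is None:
        return False
    subprocess.run(spd_say_command(text, language), check=False)
    return True
```

Calling say("Bonjour, comment allez vous ?", "fr") should then use a French voice, provided the configured synthesizer has one.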

at-spi has a text language attribute, but afaict, firefox does not yet translate the lang= html attribute into the at-spi text language attribute.
And orca doesn't support that attribute yet either, but we need firefox to produce it in any case.

Okay. Do you know how I can intercept the messages Firefox sends to Orca?

You can use accerciser to inspect how firefox exposes the html elements. Notably in the preferences dialog, you can set the “Quick Select” global hotkeys to quickly get to your element inside accerciser.

accerciser is not very useful for my case; I have to learn more about it.
I think I must first review a lot of code.

I have just launched orca with debugging.

<html lang="en">

<h1>Orca Test</h1>
<p lang="fr">Bonjour, comment allez vous ?</p>
<p lang="de">Gutentag, wie geht es ihnen?</p>
<p lang="en">Hello, how are you?</p>
</html>

The log file shows that Firefox returns the language to use.

WEB: Results for text at offset 27 for [paragraph | ] using TEXT_BOUNDARY_LINE_START:
String: ‘Gutentag, wie geht es ihnen?’, Start: 0, End: 28.
INFO: Getting text attributes for [paragraph | ] (chars: 0-28)
INFO: Attributes at 0: [‘font-weight:400’, ‘font-style:normal’, ‘font-size:12pt’, ‘text-position:baseline’, ‘background-color:rgb(255, 255, 255)’, ‘font-family:DejaVu Sans’, ‘language:de’, ‘color:rgb(0, 0, 0)’] (0-28)
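
To make the point concrete, here is a small sketch of how the language could be pulled out of an attribute list like the one logged above. It assumes the attributes arrive as "name:value" strings, as they appear in the log; the function names are hypothetical and not part of Orca.

```python
def parse_text_attributes(attributes):
    """Turn a list of 'name:value' strings into a dict."""
    parsed = {}
    for item in attributes:
        # Split on the first colon only, since values such as
        # 'rgb(0, 0, 0)' contain further punctuation.
        name, sep, value = item.partition(":")
        if sep:
            parsed[name.strip()] = value.strip()
    return parsed


def text_language(attributes, default=None):
    """Return the 'language' attribute, or default when it is absent."""
    return parse_text_attributes(attributes).get("language", default)


attrs = ["font-weight:400", "font-style:normal", "language:de",
         "color:rgb(0, 0, 0)"]
print(text_language(attrs))  # de
```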

jjsa via GNOME Discourse, on Mon. 20 Mar 2023 14:19:17 +0000, wrote:

WEB: Results for text at offset 27 for [paragraph | ] using TEXT_BOUNDARY_LINE_START:
String: ‘Gutentag, wie geht es ihnen?’, Start: 0, End: 28.
INFO: Getting text attributes for [paragraph | ] (chars: 0-28)
INFO: Attributes at 0: [‘font-weight:400’, ‘font-style:normal’, ‘font-size:12pt’, ‘text-position:baseline’, ‘background-color:rgb(255, 255, 255)’, ‘font-family:DejaVu Sans’, ‘language:de’, ‘color:rgb(0, 0, 0)’] (0-28)

Oh!

It seems text.getAttributes(0) is not working, while
text.getAttributeRun(0) is working.
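
As an illustration of the getAttributeRun route, the sketch below walks a text object run by run and records the language of each run. It assumes the pyatspi-style interface, i.e. a characterCount property and getAttributeRun(offset) returning a list of "name:value" strings plus start and end offsets, which matches the shape seen in the Orca log; language_runs itself is a made-up helper.

```python
def language_runs(text, default_language):
    """Split an accessible text into (language, start, end) runs."""
    runs = []
    offset = 0
    length = text.characterCount
    while offset < length:
        attrs, start, end = text.getAttributeRun(offset)
        language = default_language
        for item in attrs:
            name, _, value = item.partition(":")
            if name == "language" and value:
                language = value
        # Guard against a run that does not advance, to avoid looping forever.
        end = max(end, offset + 1)
        runs.append((language, offset, min(end, length)))
        offset = end
    return runs
```

Runs without a language attribute fall back to default_language, which would normally be derived from $LANG.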

So that's good :) Now it needs to be plugged into orca, so that
src/orca/speechdispatcherfactory.py requests a language change on the
fly.

Samuel
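
The on-the-fly language switch described above could look roughly like the following. This is not Orca's code, just a sketch under assumptions: client is any object offering set_language() and speak(), which is the interface of the SSIPClient class in Speech Dispatcher's Python bindings, and speak_with_languages is a hypothetical helper.

```python
def speak_with_languages(client, segments, default_language):
    """Speak (language, text) segments, switching language only when needed."""
    current = None
    for language, text in segments:
        # Segments without a language fall back to the configured default.
        wanted = language or default_language
        if wanted != current:
            client.set_language(wanted)
            current = wanted
        client.speak(text)
    # Restore the default so later messages use the configured language.
    if current != default_language:
        client.set_language(default_language)
```

With the real bindings, client could be created with speechd.SSIPClient("some-name") and closed with client.close() afterwards; the segments would come from the text attribute runs.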

I had a look at LibreOffice Writer. According to the log output, the language of each element is returned, but everything is still spoken in the language set by $LANG (de_DE.utf8 in my case):

12:35:39.785449 - INFO: Getting text attributes for [paragraph | ] (chars: 0-19)
12:35:39.785735 - INFO: Attributes at 0: [‘text-decoration:none’, ‘writing-mode:lr-tb’, ‘indent:0mm’, ‘pixels-below-lines:0mm’, ‘vertical-align:baseline’, ‘stretch:normal’, ‘text-rotation:0’, ‘style:normal’, ‘direction:ltr’, ‘language:en-us’, ‘paragraph-style:Standard’, ‘tab-interval:12,51mm’, ‘strikethrough:none’, ‘variant:normal’, ‘size:12’, ‘family-name:Liberation Serif’, ‘weight:400’, ‘bg-color:255,255,255’, ‘pixels-above-lines:0mm’, ‘line-height:100%’, ‘invisible:false’, ‘justification:left’, ‘underline:none’, ‘text-shadow:none’, ‘left-margin:0mm’, ‘font-effect:none’, ‘right-margin:0mm’, ‘scale:1’, ‘fg-color:0,0,0’] (0-19)
12:35:39.785754 - INFO: 1 attribute ranges found in 0.0003s
12:35:39.785817 - SPEECH GENERATOR: None voice requested with language=‘en’, dialect=‘us’

12:35:39.787332 - SPEECH: Last spoke 2.6057 seconds ago
12:35:39.787354 - SPEECH OUTPUT: ‘Hello, how are you?’{‘established’: True, ‘family’: {‘name’: ‘sabrina’, ‘lang’: ‘de’, ‘dialect’: ‘DE’, ‘variant’: ‘none’}}
12:35:39.787657 - SPEECH DISPATCHER: Speaking ‘Hello, how are you?’
ORCA rate 5.0, pitch 5.0, volume 5.0, language de, punctuation: MOST
SD rate 2, pitch 10, volume 100, language de-de
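
The log shows the gap: the text attribute says en-us, but the voice family passed to speech still has lang de. A minimal sketch of the missing step, overriding the family's language from the text attribute, might look like this. The dict keys are taken from the log above; the function names and the decision to clear the voice name are assumptions for illustration only.

```python
def split_language_tag(tag):
    """Split 'en-us' or 'en_US' into ('en', 'us'); dialect may be ''."""
    language, _, dialect = tag.replace("_", "-").partition("-")
    return language.lower(), dialect.lower()


def family_for_language(default_family, tag):
    """Return a copy of the voice family with lang/dialect taken from tag."""
    language, dialect = split_language_tag(tag)
    family = dict(default_family)
    family["lang"] = language
    family["dialect"] = dialect
    # Assumption: the configured voice name belongs to the default language,
    # so it is cleared and the synthesizer picks a voice for the new one.
    family["name"] = None
    return family
```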