Add Spiel speech API #1335
base: gtk4
If you mean supporting providers that don't support mark events but do support word boundary events, that is indeed not something currently supported by foliate-js. I do plan on adding this, since SSML support seems to be problematic in browsers. (Ideally I'd like to switch to the Web Speech API once it's supported by WebKitGTK.) One slight snag is that the marks are currently also used to implement "speak from here" and pausing, mainly to ensure that the speech always begins from word boundaries. Maybe it could do without this (or decouple this from the speech text), in which case it shouldn't be too hard to do without marks. Just need to maintain a text walker instance (see https://github.com/johnfactotum/foliate-js/blob/main/text-walker.js) to convert string offsets to ranges.
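To make the offset-to-range step concrete, here is a minimal sketch of the bookkeeping involved. It is not the foliate-js text walker API: `makeOffsetLocator` and the `chunks` array are hypothetical names, with `chunks` standing in for the text content of consecutive DOM text nodes in the spoken block.

```javascript
// Hypothetical helper, not part of foliate-js.
// `chunks` holds the text of consecutive text nodes, in document order.
function makeOffsetLocator(chunks) {
    if (!chunks.length) throw new RangeError('no text chunks')
    // starts[i] is the offset in the concatenated string where chunk i begins
    const starts = []
    let total = 0
    for (const chunk of chunks) {
        starts.push(total)
        total += chunk.length
    }
    return offset => {
        if (offset < 0 || offset > total)
            throw new RangeError(`offset ${offset} is outside 0..${total}`)
        // binary search for the last chunk that starts at or before `offset`
        let lo = 0
        let hi = starts.length - 1
        while (lo < hi) {
            const mid = (lo + hi + 1) >> 1
            if (starts[mid] <= offset) lo = mid
            else hi = mid - 1
        }
        return { index: lo, offset: offset - starts[lo] }
    }
}

// A word boundary event reporting (start = 6, length = 5) for "Hello world!"
const locate = makeOffsetLocator(['Hello ', 'world', '!'])
const start = locate(6)
const end = locate(6 + 5)
console.log(start, end)
```

In a web view, the two results could then be passed to `range.setStart(nodes[start.index], start.offset)` and `range.setEnd(nodes[end.index], end.offset)`, assuming `nodes` is the array of text nodes that `chunks` was built from.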
Moving this into the web view makes sense. Happy to see that it is at least proposed in WebKitGTK. Note: because of the way Spiel works, there is no need to track the speech progress for pausing purposes. Calling pause will simply pause the stream. So I think SSML marks would need to stay exclusively for Speech Dispatcher support. And yeah, we would need to deserialize the string offset to a DOM range, which may or may not be trivial?
Ideally I think it should be capable of rewinding to the start of the last word or sentence, rather than simply pausing the stream. The current behavior in Foliate isn't good either because it simply restarts from a word boundary, which would result in incorrect pronunciation or intonation. The same goes for "speak from here" (i.e. starting speech from a user-selected position). Avoiding partial words isn't really an important feature, though. Mostly it's just what you get "for free" since the text is already segmented. It's probably fine to just drop it.
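As a rough illustration of the rewinding idea, the sketch below uses the standard `Intl.Segmenter` to find where the word or sentence containing a given offset begins. `rewindOffset` is a hypothetical name, the `'en'` locale is an assumption, and this is not how Foliate currently resumes speech.

```javascript
// Hypothetical helper: given the plain text being spoken and the offset
// where speech stopped, return the offset where the containing word or
// sentence starts, so speech can resume from there.
function rewindOffset(text, offset, granularity = 'sentence', locale = 'en') {
    if (!text.length) return 0
    // containing() returns undefined for out-of-range indices, so clamp first
    const clamped = Math.min(Math.max(offset, 0), text.length - 1)
    const segments = new Intl.Segmenter(locale, { granularity }).segment(text)
    return segments.containing(clamped).index
}

const text = 'Hello there. How are you?'
// Speech was paused in the middle of "are" (offset 18)
console.log(rewindOffset(text, 18, 'word'))
console.log(rewindOffset(text, 18, 'sentence'))
```

Resuming from the sentence start rather than the word start is what would address the pronunciation and intonation concern, at the cost of repeating a few words.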
For plain text it should be okay. It might be non-trivial when using SSML, when the offsets are those of the SSML source string, because in general mapping source offsets to nodes is difficult if not impossible with the browser's DOM APIs. But since the XML here is controlled and rather simple, maybe one can just count the number of characters between
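One possible way to do this kind of counting is sketched below: scan the SSML source up to the reported offset, skipping tags and counting each entity as a single text character. `ssmlOffsetToTextOffset` is a hypothetical name, and the sketch assumes the generated SSML contains no comments, CDATA sections, or `>` characters inside attribute values.

```javascript
// Hypothetical helper: convert an offset into the SSML source string to an
// offset into the plain text it contains. Assumes simple, controlled markup:
// only tags and entity references, no comments, no CDATA, no '>' in attributes.
function ssmlOffsetToTextOffset(ssml, sourceOffset) {
    let text = 0
    let i = 0
    while (i < sourceOffset && i < ssml.length) {
        const ch = ssml[i]
        if (ch === '<') {
            // skip the whole tag; it contributes no text
            const end = ssml.indexOf('>', i)
            if (end === -1) throw new SyntaxError('unterminated tag')
            i = end + 1
        } else if (ch === '&') {
            // an entity such as &amp; is one character of text;
            // an offset landing inside it is rounded up past it
            const end = ssml.indexOf(';', i)
            if (end === -1) throw new SyntaxError('unterminated entity')
            i = end + 1
            text += 1
        } else {
            i += 1
            text += 1
        }
    }
    return text
}

const ssml = '<speak><mark name="0"/>Tom &amp; Jerry</speak>'
// Offset of "Jerry" in the source, mapped to its offset in "Tom & Jerry"
console.log(ssmlOffsetToTextOffset(ssml, ssml.indexOf('Jerry')))
```

The resulting plain-text offset can then be mapped to a DOM range the same way as for plain-text speech.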
I noticed that the upstream issue has been fixed: WebKit/WebKit@290d009. So I guess going forward, one should fix this by implementing all TTS functionality in foliate-js with the Web Speech API. The implementation should ideally have the exact same interface as how foliate-js currently handles media overlays. That would simplify things a lot.
It says
Spiel is a modern speech synthesis API for the desktop that will hopefully support many kinds of providers and voices. It has GI bindings, so adding it to Foliate shouldn't be hard.
I started a port, but ran into some trouble with how Foliate creates SSML mark elements to report speech progress. Spiel has speech boundary events, including SSML marks, but not all providers support them. Unfortunately I think changes will be needed in foliate-js as well to make this work. Specifically, pre-segmenting the text into marks gets in the way here.