Missing phonemes when converting from MusicXML to DS #619

Open
fcnjd opened this issue Dec 13, 2024 · 4 comments

Comments

@fcnjd
Contributor

fcnjd commented Dec 13, 2024

Hi,
what is the best way to convert a MusicXML file to a Diffsinger script? When I try this, I get the error:
ParamsError: The source file lacks phoneme parameters.
I basically understand what this means: MusicXML doesn't provide phonemes natively. So how do I do that, then? I thought opencpop-extension would specify those so they could be used.
Thank you for any hint you have on this.

@SoulMelody
Owner

The source file needs to store a list of phonemes with their exact durations in the note attribute. Currently only a few project file formats have implemented this feature, such as acep, svip, tlp, etc.
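To illustrate the requirement, here is a minimal sketch of a note that carries a phoneme list with durations, plus a check that fails when the list is absent. All names here (`Phoneme`, `Note`, `MissingPhonemesError`, `check_phonemes`) are hypothetical and chosen for illustration; they are not LibreSVIP's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phoneme:
    symbol: str              # phoneme symbol, e.g. "l" or "a"
    duration_seconds: float  # exact duration of this phoneme


@dataclass
class Note:
    lyric: str
    # Formats like MusicXML carry no such list, so it stays empty.
    phonemes: List[Phoneme] = field(default_factory=list)


class MissingPhonemesError(ValueError):
    """Hypothetical error raised when a note has no phoneme data."""


def check_phonemes(notes: List[Note]) -> None:
    """Raise if any note lacks the phoneme list a DS export would need."""
    for index, note in enumerate(notes):
        if not note.phonemes:
            raise MissingPhonemesError(
                f"note {index} ({note.lyric!r}) has no phoneme list"
            )
```

A note converted from MusicXML would look like `Note("la")` in this sketch and fail the check, while a note from a format that stores phonemes would pass.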

@fcnjd
Contributor Author

fcnjd commented Dec 16, 2024

Thank you for the explanation. However, I'm still not sure how to solve the problem(s) I currently have, maybe you have an idea.
My workflow right now is converting MusicXML to USTX and then using Diffsinger through OpenUtau. However, OpenUtau separates syllables a bit strangely, so whole words are not processed, which leads to wrong pronunciations in many cases. Plus, connected notes won't sing as they should; instead, the lyric la is always inserted. I had hoped to solve both problems by converting directly to Diffsinger, which should be able to process whole words and connected notes. But this doesn't look like it's working: even if I convert to an intermediate format that supports those parameters, like SVIP or USTX, I can't continue directly to DS without first running the software to let the phonemizer add the flags. Do you have any idea what I could try to improve that?
Thank you very much in advance if you have any advice, and for all your effort. I'll make a PR as soon as I have the German translation ready.

@oxygen-dioxide
Contributor

oxygen-dioxide commented Dec 17, 2024

MusicXML represents printed sheet music for a human reader, while USTX represents the input to a machine singing synthesis program. That's why they handle lyrics differently. We should have smarter lyric conversion logic.

For multisyllable lyrics: in OpenUtau, we have to input the whole word on the first note, use + on the following notes to distribute the syllables, and use +~ to extend the current syllable, which is different from printed sheet music.
image

For "connected notes" (in OpenUtau we call them slur notes), they are represented in sheet music as two notes connected with a slur symbol:
image

In OpenUtau, input lyrics on the first note, and input +~ on the second note.
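The convention described above could be automated roughly as follows. This is only a sketch under stated assumptions: each note arrives as a `(text, syllabic)` pair, where `syllabic` is one of the MusicXML values `single`, `begin`, `middle`, or `end`, and a note with no lyric of its own is passed as `None` and treated as a slur continuation. The function name `to_openutau_lyrics` is hypothetical and is not part of LibreSVIP or OpenUtau.

```python
from typing import List, Optional, Tuple

# (text, syllabic) for a note with a lyric; None for a note without one,
# which this sketch assumes is slurred to the previous note.
SheetLyric = Optional[Tuple[str, str]]


def to_openutau_lyrics(notes: List[SheetLyric]) -> List[str]:
    """Convert per-note sheet-music syllables to OpenUtau-style lyrics."""
    result: List[str] = []
    word_start: Optional[int] = None  # index of the note that opens the current word
    for note in notes:
        if note is None:
            # No lyric on this note: extend the current syllable.
            result.append("+~")
            continue
        text, syllabic = note
        if syllabic in ("single", "begin"):
            result.append(text)
            word_start = len(result) - 1 if syllabic == "begin" else None
        elif word_start is not None:
            # "middle" or "end": move the text onto the word's first note
            # and mark this note as the next syllable.
            result[word_start] += text
            result.append("+")
            if syllabic == "end":
                word_start = None
        else:
            # "middle"/"end" without a preceding "begin": keep the text as is.
            result.append(text)
    return result
```

For example, the notes `("Christ", "begin")`, `None`, `("mas", "end")` would become `Christmas`, `+~`, `+`. How the whole word is then split back into syllables is up to the phonemizer, so the result may not always match the hyphenation in the score.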

@fcnjd
Contributor Author

fcnjd commented Dec 20, 2024

@oxygen-dioxide Thank you for pointing that out. Maybe I should give some more context on why I created this issue: I'm blind and use a screen reader, so the images you sent don't really help me. I'm familiar with sheet music and work with MuseScore a lot, but I'm new to SVS. Since OpenUtau isn't really accessible, I wouldn't use it if I didn't have to; for me it's nothing more than middleware between MusicXML and Diffsinger. Because of this, I was hoping to be able to convert MusicXML directly to DS, which is why I asked. But as SoulMelody wrote, this is not possible because of the missing parameters.
Since I don't know whether DiffScope is already finished, and LibreSVIP doesn't support the dsp format, it looks like I still need to stick with OpenUtau for my Diffsinger creations. I have very limited access to it: selecting a singer and phonemizer works, but I can forget about the editor. That's why your hint about +~ for slur notes was already very valuable. LibreSVIP doesn't insert it; it inserts la instead. I tried it, though, and now after each conversion I run:

sed -i 's/\blyric: la\b/lyric: +~/g' file.ustx

That fixes the slurs. However, since this was not very intuitive, maybe LibreSVIP could do that automatically in the future.
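The same substitution as the sed command can be sketched in Python, for anyone who prefers to script the whole conversion in one place. The function name `replace_la_with_slur` is hypothetical. Like the sed pattern, it rewrites every note whose lyric is exactly la, so a note that is genuinely meant to be sung as "la" would be changed too.

```python
import re

# Same pattern as the sed command: a lyric value that is the whole word "la".
LA_LYRIC = re.compile(r"\blyric: la\b")


def replace_la_with_slur(ustx_text: str) -> str:
    """Replace every 'lyric: la' entry in USTX text with 'lyric: +~'."""
    return LA_LYRIC.sub("lyric: +~", ustx_text)
```

To apply it to a file, read the USTX text (for instance with `pathlib.Path.read_text`), pass it through the function, and write the result back.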
As for words with more than one syllable: knowing that a + is used is already helpful, but going through the whole file and inserting it inside every word would take a lot of time, since it is a manual process. If the lyric handling could get smarter, it could save a lot of time.
Thank you again, +~ was already very helpful, and maybe we can simplify this process further in the future. Have a merry Christmas!

3 participants