Description
It's not always appropriate to normalize a word with embedded punctuation by inserting spaces before and after the embedded punctuation character.
Two counterexamples from target sentences in recent XRI datasets:
Original sentence: Niri poki pule kua-kua dabe edieng wao pihak ruha ihi partai nbe tenama tule hu'a gu mege wai.
Normalized sentence: Niri poki pule kua - kua dabe edieng wao pihak ruha ihi partai nbe tenama tule hu'a gu mege wai.
Original sentence: Ge a bi? nulu-waleng nu tenama dia wai dabe soro hulu mata nbe
Normalized sentence: Ge a bi? nulu - waleng nu tenama dia wai dabe soro hulu mata nbe
Normalizing the word 'kua-kua' to 'kua - kua', or the word 'nulu-waleng' to 'nulu - waleng', is not correct: in both cases the hyphen is part of the word and should stay attached.
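
One possible direction, sketched below in Python, is to make the set of word-internal characters configurable so that they are never padded. This is only an illustration under stated assumptions: `normalize_embedded_punctuation` and its `keep_intact` parameter are hypothetical names, not the project's actual normalizer, and the default of keeping hyphens and apostrophes intact is an assumption that may not hold for every language in the datasets.

```python
import re

# A punctuation character with a word character directly on both sides,
# i.e. punctuation embedded inside a token such as "kua-kua" or "hu'a".
EMBEDDED_PUNCT = re.compile(r"(?<=\w)([^\w\s])(?=\w)")


def normalize_embedded_punctuation(sentence, keep_intact="-'"):
    """Pad embedded punctuation with spaces, except characters in keep_intact.

    keep_intact is a hypothetical, per-language setting: characters listed
    there are assumed to be part of the word and are left untouched.
    """
    def pad(match):
        char = match.group(1)
        if char in keep_intact:
            return char          # word-internal: leave the token as written
        return f" {char} "       # otherwise split it off as its own token

    return EMBEDDED_PUNCT.sub(pad, sentence)


original = "Ge a bi? nulu-waleng nu tenama dia wai dabe soro hulu mata nbe"

# With hyphens kept intact, the sentence passes through unchanged.
print(normalize_embedded_punctuation(original))

# With an empty keep_intact, the behavior reported in this issue is reproduced.
print(normalize_embedded_punctuation(original, keep_intact=""))
```

Passing `keep_intact` per language or per dataset would let the current padding behavior remain available where it is wanted, while leaving words like 'kua-kua' and 'nulu-waleng' untouched.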