
Commit d1b64a7

Update crfcut.py to modify the logic for sentence splitting.
Modified how empty strings and whitespace-only tokens are handled when splitting sentences.
1 parent 41558bb commit d1b64a7

1 file changed

+1
-1
lines changed

pythainlp/tokenize/crfcut.py

Lines changed: 1 addition & 1 deletion
@@ -204,7 +204,7 @@ def segment(text: str) -> List[str]:
         if toks[idx].strip().endswith(("!", ".", "?")):
             labs[idx] = "E"
         # Spaces or empty strings would no longer be treated as end of sentence.
-        elif toks[idx].strip() == "":
+        elif (idx == 0 or labs[idx-1] == "E") and toks[idx].strip() == "":
             labs[idx] = "I"
 
     sentences = []
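
To illustrate what the changed condition does, here is a minimal, self-contained sketch that replays both versions of the relabeling step on a hand-written input. The helper names (`relabel_old`, `relabel_new`, `join_sentences`), the sample tokens, and the starting labels are all hypothetical: in the real code the labels come from the CRF tagger, and the joining logic below is only an assumption about how "E" labels close a sentence, not a copy of what `segment` does.

```python
from typing import List


def relabel_old(toks: List[str], labs: List[str]) -> List[str]:
    """Condition before this commit: every blank token becomes 'I'."""
    labs = list(labs)  # work on a copy so the input stays untouched
    for idx in range(len(toks)):
        if toks[idx].strip().endswith(("!", ".", "?")):
            labs[idx] = "E"
        elif toks[idx].strip() == "":
            labs[idx] = "I"
    return labs


def relabel_new(toks: List[str], labs: List[str]) -> List[str]:
    """Condition after this commit: a blank token becomes 'I' only when it
    is the first token or directly follows an end-of-sentence label."""
    labs = list(labs)
    for idx in range(len(toks)):
        if toks[idx].strip().endswith(("!", ".", "?")):
            labs[idx] = "E"
        elif (idx == 0 or labs[idx - 1] == "E") and toks[idx].strip() == "":
            labs[idx] = "I"
    return labs


def join_sentences(toks: List[str], labs: List[str]) -> List[str]:
    """Hypothetical joiner: an 'E' label closes the current sentence."""
    sentences: List[str] = []
    current = ""
    for tok, lab in zip(toks, labs):
        current += tok
        if lab == "E":
            sentences.append(current)
            current = ""
    if current:
        sentences.append(current)
    return sentences


# Made-up tokens and made-up tagger output, for illustration only.
toks = ["w1", " ", "w2", ".", " ", "w3"]
labs = ["I", "E", "I", "I", "E", "I"]

old_labs = relabel_old(toks, labs)
new_labs = relabel_new(toks, labs)
old_sentences = join_sentences(toks, old_labs)
new_sentences = join_sentences(toks, new_labs)
```

Under these assumptions, the old condition overwrites both blank tokens with "I", so the split after `w1` that the tagger proposed is lost. The new condition keeps that split and only relabels the blank token that directly follows the "." token, which would otherwise end a sentence made of nothing but whitespace.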
