Applicability of the difflib-based get_errors #546

Open
yongzhuo opened this issue Jan 9, 2025 · 2 comments
Labels
question Further information is requested

Comments

yongzhuo commented Jan 9, 2025

Describe the Question

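difflib does not give replace the highest weight, so it may not be suitable for equal-length tasks such as CSC (Chinese spelling correction).

For example, I only want replace operations, but the result contains many delete and insert operations: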

y_org: 良好得行业生态不应只让消费者炼就"火眼金睛",自辨真伪,而是应该主动提供看的见摸的着得透明服务。
y_new:良好的行业生态不应只让消费者炼就"火眼金睛",自辨真伪,而是应该主动提供看得见摸得着的透明服务。
y_new_op, errors = get_errors(y_new, y_org)
print(errors)
[('得', '的', 2), ('的', '得', 37), ('', '得', 40), ('', '着', 41), ('着', '', 41), ('得', '', 42)]
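
For equal-length input, one possible workaround is to skip difflib and compare the two strings position by position, which can only produce replace-style entries. The following is a minimal sketch, not the library's implementation: the helper name `get_errors_equal_length` is hypothetical, and the `(wrong_char, right_char, index)` tuple layout is assumed from the output above.

```python
def get_errors_equal_length(corrected_text, origin_text):
    """Hypothetical helper: list (wrong_char, right_char, index) for equal-length texts."""
    if len(corrected_text) != len(origin_text):
        # Position-wise comparison is only meaningful when no character was added or dropped.
        raise ValueError("texts must have the same length")
    return [
        (wrong, right, idx)
        for idx, (wrong, right) in enumerate(zip(origin_text, corrected_text))
        if wrong != right
    ]


# Tail of the sentence from the example above, shortened for readability.
y_org = "看的见摸的着得透明服务"
y_new = "看得见摸得着的透明服务"
print(get_errors_equal_length(y_new, y_org))
# [('的', '得', 1), ('的', '得', 4), ('得', '的', 6)]
```

Texts of different length would still need an alignment step such as difflib, so this only covers the equal-length CSC case.
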
@yongzhuo yongzhuo added the question Further information is requested label Jan 9, 2025
Author

yongzhuo commented Jan 10, 2025

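This is triggered when there are multiple errors (consecutive or one character apart) and the same character appears just before or after them; in that case difflib tends to produce delete and insert instead of replace.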

original_text: 耻蓇骨浙多久可以坐
 correct_text: 耻骨骨折多久可以坐
    wrong_ids: [1, 3]
       errors: [('蓇', '', 1), ('浙', '骨', 3)]

original_text: 蛱蝶双飞芍药前,鸯鸳对浴芙蓉水。
 correct_text: 蛱蝶双飞芍药前,鸳鸯对浴芙蓉水。
    wrong_ids: [8, 9]
       errors: [('', '鸯', 8), ('鸯', '', 9)]
       
original_text: 滕膝以下冰凉改善方法
 correct_text: 膝盖以下冰凉改善方法
    wrong_ids: [0, 1]
       errors: [('滕', '', 0), ('', '盖', 1)]
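
The behaviour in the samples above can be inspected directly with the standard library. The sketch below assumes `get_errors` is built on `difflib.SequenceMatcher` opcodes (the issue title suggests so, but the exact implementation is not shown here); the helper name `diff_ops` is hypothetical.

```python
import difflib


def diff_ops(origin_text, corrected_text):
    """Hypothetical helper: return the non-equal opcodes as (tag, origin_part, corrected_part, origin_index)."""
    matcher = difflib.SequenceMatcher(None, origin_text, corrected_text)
    return [
        (tag, origin_text[i1:i2], corrected_text[j1:j2], i1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


ops = diff_ops("蛱蝶双飞芍药前,鸯鸳对浴芙蓉水。", "蛱蝶双飞芍药前,鸳鸯对浴芙蓉水。")
print(ops)
# [('insert', '', '鸳', 8), ('delete', '鸳', '', 9)]
```

Because `SequenceMatcher` first aligns the longest matching blocks, the shared character 鸯 is matched across the swap, and the remaining character is reported as one insert plus one delete rather than two replaces.
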

One more sample here seems to show a bug; the result does not look right.

original_text: 婴儿发饶褪烧喉手脚冰凉怎么回事
 correct_text: 婴儿发烧退烧后手脚冰凉怎么回事
    wrong_ids: [3, 4, 6]
       errors: [('饶', '', 3), ('褪', '', 4), ('喉', '退', 6)]

@shibing624
Owner

I'll take a look. If it can't be fixed, I'll roll back to the original get_errors method.
