Fix batch split bug #257
@microsoft-github-policy-service agree |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #257 +/- ##
===========================================
+ Coverage 80.70% 96.60% +15.89%
===========================================
Files 35 92 +57
Lines 6910 74355 +67445
===========================================
+ Hits 5577 71828 +66251
- Misses 1063 2191 +1128
- Partials 270 336 +66
@shueybubbles could you take a look at this? |
Pull Request Overview
This PR fixes a bug in the batch splitter where the word GOTO was incorrectly recognized as a GO batch delimiter by adding a forward lookup to ensure the separator isn’t part of a larger word, and adds a corresponding test case.
- Added a check in `hasPrefixFold` to reject matches where the next character is a letter.
- Introduced a unit test to verify that `GOTO` isn't split like a `GO` batch command.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| batch/batch.go | Added a forward lookup in hasPrefixFold to ensure the next character isn’t a letter when matching. |
| batch/batch_test.go | Added a test item verifying that GOTO doesn’t trigger a batch split at GO. |
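The word-boundary check summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from batch/batch.go: the name `hasPrefixFoldSketch` is hypothetical, and the case-insensitive comparison via `strings.EqualFold` is an assumption about how the real helper matches the separator.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// hasPrefixFoldSketch is a hypothetical stand-in for the hasPrefixFold helper
// discussed in this PR; the real implementation may differ.
// It reports whether s starts with sep (ignoring case) and the character
// immediately after sep, if any, is not a letter.
func hasPrefixFoldSketch(s, sep string) bool {
	// Case-insensitive prefix comparison (assumed behavior).
	if len(s) < len(sep) || !strings.EqualFold(s[:len(sep)], sep) {
		return false
	}
	// Forward lookup: a following letter means sep is part of a larger word.
	if len(s) > len(sep) {
		r, _ := utf8.DecodeRuneInString(s[len(sep):])
		if unicode.IsLetter(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, in := range []string{"GO", "go 5", "GOTO Bookmark", "gotoflag"} {
		fmt.Printf("%q -> %v\n", in, hasPrefixFoldSketch(in, "GO"))
	}
}
```

With this sketch, `GO` and `go 5` are accepted as separators, while `GOTO Bookmark` and `gotoflag` are rejected because a letter follows the prefix.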
Comments suppressed due to low confidence (1)
batch/batch_test.go:70
- [nitpick] Add a test case covering lowercase `go` (e.g., `go` vs. `gotoflag`) to ensure the splitting logic is case-insensitive and handles lowercase separators correctly.
testItem{
    if len(s) > len(sep) && unicode.IsLetter(rune(s[len(sep)])) {
        return false
When checking the character after the prefix, use utf8.DecodeRuneInString to properly handle multi-byte runes instead of casting a single byte to a rune.
Suggested change:
    - if len(s) > len(sep) && unicode.IsLetter(rune(s[len(sep)])) {
    -     return false
    + if len(s) > len(sep) {
    +     r, _ := utf8.DecodeRuneInString(s[len(sep):])
    +     if unicode.IsLetter(r) {
    +         return false
    +     }
@heppu add some double byte char test cases to cover this.
Addresses review feedback on the GO/GOTO word-boundary fix:
- Use utf8.DecodeRuneInString so the follower-char letter check sees the full rune, not the leading byte of a multi-byte UTF-8 sequence. Casting rune(byte) misclassifies multi-byte runes (for example, Hebrew aleph U+05D0 has leading byte 0xD7, which is the MULTIPLICATION SIGN, not a letter).
- Extend TestHasPrefixFold to cover word-boundary cases (GOTO, gotoflag, GO1, GO_FOO), a Latin-1 follower, and the Hebrew-aleph case that distinguishes the two implementations.

Co-authored-by: Henri Koski <[email protected]>
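The difference between the two follower checks described in this commit message can be reproduced in isolation. The sketch below is illustrative only: the helper names `letterAfterByteCast` and `letterAfterDecode` are hypothetical and do not appear in the repository.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// letterAfterByteCast mirrors the original check: it converts only the single
// byte at offset n to a rune, so a multi-byte character is judged by its
// leading byte alone.
func letterAfterByteCast(s string, n int) bool {
	return len(s) > n && unicode.IsLetter(rune(s[n]))
}

// letterAfterDecode mirrors the suggested check: it decodes the full UTF-8
// sequence starting at offset n before classifying it.
func letterAfterDecode(s string, n int) bool {
	if len(s) <= n {
		return false
	}
	r, _ := utf8.DecodeRuneInString(s[n:])
	return unicode.IsLetter(r)
}

func main() {
	// "GO" followed by Hebrew aleph (U+05D0, UTF-8 bytes 0xD7 0x90).
	s := "GO\u05d0"
	fmt.Println(letterAfterByteCast(s, 2)) // false: 0xD7 is read as U+00D7, a symbol
	fmt.Println(letterAfterDecode(s, 2))   // true: U+05D0 is a letter
}
```

A Latin-1 follower such as `é` (UTF-8 bytes 0xC3 0xA9) does not separate the two checks, because its leading byte 0xC3 read as a code point is `Ã`, which is also a letter. That is why the aleph case is the one that distinguishes the implementations.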
|
@heppu — pushed a follow-up commit to your branch addressing the review feedback so we can move this forward (you had What changed:
You're credited via |
Keep both new TestBatchSplit cases: the GOTO/Bookmark case from this branch and the create-table 'gone_ts' case from microsoft#248 on main. Both exercise the word-boundary protection from different angles.
|
Resolved the merge conflict with Heads-up on a redundancy that the merge surfaced: #248 (already merged to Two notable differences between the two defenses:
Options for getting this to land:
@shueybubbles, any preference? I'm fine with any of the three. The default-easiest path is leaving it as-is now that the merge is clean. |
Parsing breaks on the `GOTO` keyword, so I added a lookahead with length validation to fix this. Without this fix, running DB migration scripts isn't possible. PR waiting for this here.