Skip to content

Fix batch split bug#257

Open
heppu wants to merge 4 commits into
microsoft:mainfrom
heppu:fix/batch-split-bug
Open

Fix batch split bug#257
heppu wants to merge 4 commits into
microsoft:mainfrom
heppu:fix/batch-split-bug

Conversation

@heppu
Copy link
Copy Markdown

@heppu heppu commented May 1, 2025

Parsing breaks on GOTO word so I added ahead lookup with length validation to fix this. Without this fix running DB migration scripts isn't possible. PR waiting for this here.

@heppu
Copy link
Copy Markdown
Author

heppu commented May 1, 2025

@microsoft-github-policy-service agree

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.60%. Comparing base (b935441) to head (53ec202).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main     #257       +/-   ##
===========================================
+ Coverage   80.70%   96.60%   +15.89%     
===========================================
  Files          35       92       +57     
  Lines        6910    74355    +67445     
===========================================
+ Hits         5577    71828    +66251     
- Misses       1063     2191     +1128     
- Partials      270      336       +66     
Flag Coverage Δ
unittests 96.53% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
batch/batch.go 90.00% <100.00%> (ø)

... and 58 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@heppu
Copy link
Copy Markdown
Author

heppu commented May 20, 2025

@shueybubbles could you take a look at this?

@apoorvdeshmukh apoorvdeshmukh requested a review from Copilot July 2, 2025 04:03
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug in the batch splitter where the word GOTO was incorrectly recognized as a GO batch delimiter by adding a forward lookup to ensure the separator isn’t part of a larger word, and adds a corresponding test case.

  • Added a check in hasPrefixFold to reject matches where the next character is a letter.
  • Introduced a unit test to verify that GOTO isn’t split like a GO batch command.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
batch/batch.go Added a forward lookup in hasPrefixFold to ensure the next character isn’t a letter when matching.
batch/batch_test.go Added a test item verifying that GOTO doesn’t trigger a batch split at GO.
Comments suppressed due to low confidence (1)

batch/batch_test.go:70

  • [nitpick] Add a test case covering lowercase go (e.g., go vs. gotoflag) to ensure the splitting logic is case-insensitive and handles lowercase separators correctly.
		testItem{

Comment thread batch/batch.go Outdated
Comment on lines +51 to +52
if len(s) > len(sep) && unicode.IsLetter(rune(s[len(sep)])) {
return false
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When checking the character after the prefix, use utf8.DecodeRuneInString to properly handle multi-byte runes instead of casting a single byte to a rune.

Suggested change
if len(s) > len(sep) && unicode.IsLetter(rune(s[len(sep)])) {
return false
if len(s) > len(sep) {
r, _ := utf8.DecodeRuneInString(s[len(sep):])
if unicode.IsLetter(r) {
return false
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heppu add some double byte char test cases to cover this.

@dlevy-msft-sql dlevy-msft-sql added bug Something isn't working Size: S Small issue (less than one week effort, less than 250 lines of code) Area - parameters Issues with requests for additional parameters needs-work and removed needs-work labels Jan 31, 2026
@dlevy-msft-sql dlevy-msft-sql removed bug Something isn't working Size: S Small issue (less than one week effort, less than 250 lines of code) Area - parameters Issues with requests for additional parameters labels Apr 20, 2026
Addresses review feedback on the GO/GOTO word-boundary fix:

- Use utf8.DecodeRuneInString so the follower-char letter check sees the full rune, not the leading byte of a multi-byte UTF-8 sequence. Casting rune(byte) misclassifies multi-byte runes (for example Hebrew aleph U+05D0 has leading byte 0xD7 which is the MULTIPLICATION SIGN, not a letter).

- Extend TestHasPrefixFold to cover word-boundary cases (GOTO, gotoflag, GO1, GO_FOO), Latin-1 follower, and the Hebrew-aleph case that distinguishes the two implementations.

Co-authored-by: Henri Koski <[email protected]>
@dlevy-msft-sql
Copy link
Copy Markdown

@heppu — pushed a follow-up commit to your branch addressing the review feedback so we can move this forward (you had maintainer_can_modify enabled). Happy to revert if you'd rather take it yourself.

What changed:

  • hasPrefixFold now uses utf8.DecodeRuneInString for the follower-char letter check. The rune(s[i]) cast misclassifies multi-byte UTF-8 sequences; e.g. Hebrew aleph (U+05D0) has leading byte 0xD7 which decodes as × (MULTIPLICATION SIGN, not a letter), so the old check would have allowed GO to match inside GOא....
  • Expanded TestHasPrefixFold with word-boundary cases (GOTO, gotoflag, GO1, GO_FOO), a Latin-1 follower, and the Hebrew-aleph case that distinguishes the byte-cast and rune-decode implementations.

You're credited via Co-authored-by on the new commit. @shueybubbles this should also pick up your "add some double byte char test cases" ask.

Keep both new TestBatchSplit cases: the GOTO/Bookmark case from this branch and the create-table 'gone_ts' case from microsoft#248 on main. Both exercise the word-boundary protection from different angles.
@dlevy-msft-sql
Copy link
Copy Markdown

Resolved the merge conflict with main (commit 53ec202). The conflict was test-table positional — both this branch and #248 added new TestBatchSplit cases at the same spot; kept both.

Heads-up on a redundancy that the merge surfaced: #248 (already merged to main) added its own word-boundary check in stateSep that returns stateText when the byte after the separator is a letter. This branch's check sits in hasPrefixFold and returns false so stateSep is never entered. Both reach the same outcome for the GOTO case, so we now have two defenses for the same scenario.

Two notable differences between the two defenses:

  1. Location. hasPrefixFold is the predicate; rejecting there is arguably the cleaner architecture. stateSep's check is a workaround at the consumer.
  2. Multi-byte UTF-8. stateSep uses unicode.IsLetter(rune(s[0])), which misclassifies the leading byte of a multi-byte sequence. The updated hasPrefixFold here uses utf8.DecodeRuneInString, so it correctly rejects GO followed by, say, Hebrew aleph. Without this PR's fix, stateSep's defense would let multi-byte letter followers through.

Options for getting this to land:

@shueybubbles, any preference? I'm fine with any of the three. The default-easiest path is leaving it as-is now that the merge is clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants