Commit ce57866

Authored by mumiao, LuckyFBB, Cythia828, liuxy0551, HaydenOrz
Sync next to main (#418)
* feat: improve errorListener msg (#281)
* feat: add mysql errorListener and commonErrorListener
* feat: improve other sql error msg
* feat: support i18n for error msg
* feat: add all sql errorMsg unit test
* feat: update locale file and change i18n funtion name
* test: upate error unit test
* feat(flinksql): collect comment, type attribute for entity (#319)
* feat(flinksql): collect comment, type attribute for entity
* feat(flinksql): delete console log
* fix(#305): delete function ctxToWord,using ctxToText instead of ctxToWord
* feat: update attribute's type
* feat(flinksql): update flinksql's entitycollect unit test
* feat: optimize interface and update unit test
* feat: update collect attr detail
* feat: optimize interface and some function's arguments
* feat: add comment and update params' name
* feat: collect alias in select statement
* feat: update collect attribute function and update unit test

---------

Co-authored-by: zhaoge <>

* fix: spell check (#337)

Co-authored-by: liuyi <[email protected]>

* ci: check-types and test unit update
* feat: collect entity's attribute(#333)
* feat(trinosql): collect trino sql's attribute(comment,alias,colType)
* feat(hivesql): collect hive sql's attribute(comment,alias,colType)
* feat(impalasql): collect attribute(comment, colType, alias)
* feat(sparksql): collect entity's attribute (comment,alias, colType)
* feat: update endContextList of collect attribute
* feat(postgresql): collect hive sql's attribute(alias,colType)
* feat: update interface of attrInfo and alter entitycollect ts file
* feat(mysql): collect entity's attribute(comment,colType,alias)
* ci: fix check-types problem

---------

Co-authored-by: zhaoge <>

* chore(release): 4.1.0-beta.0
* fix: #362 set hiveVar value (#369)
* fix: #371 export EntityContext types (#372)
* fix: minimum collect candidates boundary to fix parse performance (#378)
* fix: minimum collect candidates boundary to fix parse performance
* fix: fix check-types
* fix: remove debugger code
* fix(flink): fix flinksql syntax error about ROW and function using (#383)

Co-authored-by: zhaoge <>

* build: pnpm antlr4 --lang all
* Feat/follow keywords (#407)
* feat: provide follow keywords when get suggestions
* chore: add watch script
* refactor: optimize spark grammar (#360)
* feat: support semantic context of isNewStatement (#361)
* feat: support semantic context of isStatamentBeginning
* docs: add docs for semantic context
* feat: unify variables in lexer (#366)
* feat: unify variables in lexer
* fix: all sql use WHITE_SPACE
* feat: complete after error syntax (#334)
* refactor: split getMinimumParserInfo to slice input and parser again
* test: complete after error syntax
* feat: complete after error syntax
* feat: use createParser to get parserIns and remove parserWithNewInput
* feat(all sql): add all sql expression column (#358)
* feat(impala): add impala expression column
* feat(trino): add expression column
* feat(hive): add hive expression column
* feat(spark): add spark expression column
* feat(mysql): add mysql expression column unit test
* feat(flink): add flink expression column
* feat(postgresql): add pg expression column
* feat: #410 optimize processCandidates tokenIndexOffset (#411)
* test: test suggestion wordRanges with range when processCandidates without tokenIndexOffset
* feat: #410 optimize processCandidates tokenIndexOffset

---------

Co-authored-by: 霜序 <[email protected]>
Co-authored-by: XCynthia <[email protected]>
Co-authored-by: 琉易 <[email protected]>
Co-authored-by: liuyi <[email protected]>
Co-authored-by: zhaoge <>
Co-authored-by: Hayden <[email protected]>
Co-authored-by: JackWang032 <[email protected]>
Co-authored-by: JackWang032 <[email protected]>
1 parent 8e11012 commit ce57866

174 files changed: +42022 -38263 lines changed

README-zh_CN.md

49 additions
@@ -356,6 +356,55 @@ console.log(sqlSlices)

 Line and column information is optional. If it is passed, any collected entity located in the statement at that position will have an `isContainCaret` flag on the statement object it belongs to. Combined with auto-completion, this helps you quickly filter out the entities you need.

+
+### Get semantic context information
+Call the `getSemanticContextAtCaretPosition` method on the SQL instance, passing in the SQL text and the line and column numbers of the specified position, for example:
+```typescript
+import { HiveSQL } from 'dt-sql-parser';
+
+const hive = new HiveSQL();
+const sql = 'SELECT * FROM tb;';
+const pos = { lineNumber: 1, column: 18 }; // after 'tb;'
+const semanticContext = hive.getSemanticContextAtCaretPosition(sql, pos);
+
+console.log(semanticContext);
+```
+
+*Output*
+
+```typescript
+/*
+{
+isStatementBeginning: true,
+}
+*/
+```
+
+The semantic context information that can currently be collected is listed below. If you need more, feel free to open an [issue](https://github.com/DTStack/dt-sql-parser/issues)
+- `isStatementBeginning` Whether the current input position is the beginning of a statement
+
+By default, the collection strategy for `isStatementBeginning` is `SqlSplitStrategy.STRICT`
+
+Two strategies are available:
+- `SqlSplitStrategy.STRICT` Strict strategy: only the statement delimiter `;` marks the end of the previous statement
+- `SqlSplitStrategy.LOOSE` Loose strategy: the SQL is split based on the syntax parse tree
+
+Difference between the two strategies:
+For example, given the input SQL
+```sql
+CREATE TABLE tb (id INT)
+
+SELECT
+```
+The CREATE statement has no trailing semicolon, so when the semantic context after SELECT is requested:
+Under `SqlSplitStrategy.STRICT`, `isStatementBeginning` is `false`, because the CREATE statement does not end with a semicolon and is therefore considered unfinished;
+Under `SqlSplitStrategy.LOOSE`, `isStatementBeginning` is `true`, because the syntax parse tree splits this SQL into an independent CREATE statement and an independent SELECT statement.
+
+You can set the strategy through the third `options` parameter:
+```typescript
+hive.getSemanticContextAtCaretPosition(sql, pos, { splitSqlStrategy: SqlSplitStrategy.LOOSE });
+```
+
 ### Other API

 - `createLexer` Creates an Antlr4 Lexer instance and returns it;

README.md

51 additions
@@ -357,6 +357,57 @@ Call the `getAllEntities` method on the SQL instance, and pass in the sql text a

 Position is not required, if the position is passed, then in the collected entities, if the entity is located under the statement where the corresponding position is located, then the statement object to which the entity belongs will be marked with `isContainCaret`, which can help you quickly filter out the required entities when combined with the code completion function.

+### Get semantic context information
+
+Call the `getSemanticContextAtCaretPosition` method on the SQL instance, passing in the sql text and the line and column numbers at the specified position, for example:
+
+```typescript
+import { HiveSQL } from 'dt-sql-parser';
+
+const hive = new HiveSQL();
+const sql = 'SELECT * FROM tb;';
+const pos = { lineNumber: 1, column: 18 }; // after 'tb;'
+const semanticContext = hive.getSemanticContextAtCaretPosition(sql, pos);
+
+console.log(semanticContext);
+```
+
+*output*
+
+```typescript
+/*
+{
+isStatementBeginning: true,
+}
+*/
+```
+
+Currently, the semantic context information that can be collected is as follows. If there are more requirements, please submit an [issue](https://github.com/DTStack/dt-sql-parser/issues).
+
+- `isStatementBeginning` Whether the current input position is the beginning of a statement
+
+The **default strategy** for `isStatementBeginning` is `SqlSplitStrategy.STRICT`
+
+There are two optional strategies:
+- `SqlSplitStrategy.STRICT` Strict strategy, only the statement delimiter `;` is used as the identifier for the end of the previous statement
+- `SqlSplitStrategy.LOOSE` Loose strategy, based on the syntax parsing tree to split SQL
+
+The difference between the two strategies:
+For example, if the input SQL is:
+```sql
+CREATE TABLE tb (id INT)
+
+SELECT
+```
+In the `SqlSplitStrategy.STRICT` strategy, `isStatementBeginning` is `false`, because the CREATE statement is not terminated by a semicolon.
+
+In the `SqlSplitStrategy.LOOSE` strategy, `isStatementBeginning` is `true`, because the syntax parsing tree splits the SQL into two independent statements: CREATE and SELECT.
+
+You can set the strategy through the third `options` parameter:
+```typescript
+hive.getSemanticContextAtCaretPosition(sql, pos, { splitSqlStrategy: SqlSplitStrategy.LOOSE });
+```
+
 ### Other API

 - `createLexer` Create an instance of Antlr4 Lexer and return it;
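
Reviewer note: the strict rule documented in the README hunk above ("only `;` ends the previous statement") can be illustrated with a small stand-alone sketch. This is a toy approximation, not dt-sql-parser's implementation: the names `CaretPosition`, `textBeforeCaret` and `strictIsStatementBeginning` are hypothetical, and the sketch ignores comments, string literals and anything else a real lexer would handle. The second call places the caret at the start of the `SELECT` line.

```typescript
// Toy approximation of the STRICT rule only; NOT dt-sql-parser's implementation.
// It ignores comments, string literals and custom delimiters.
interface CaretPosition {
    lineNumber: number; // 1-based, as in the README example
    column: number; // 1-based, as in the README example
}

// Return the part of `sql` that precedes the caret.
function textBeforeCaret(sql: string, pos: CaretPosition): string {
    const lines = sql.split('\n');
    const head = lines.slice(0, pos.lineNumber - 1);
    const current = (lines[pos.lineNumber - 1] || '').slice(0, pos.column - 1);
    return head.concat(current).join('\n');
}

// Under the strict reading, a statement begins only at the start of the input
// or right after a `;` (trailing whitespace before the caret is ignored).
function strictIsStatementBeginning(sql: string, pos: CaretPosition): boolean {
    const prefix = textBeforeCaret(sql, pos).replace(/\s+$/, '');
    return prefix.length === 0 || prefix.charAt(prefix.length - 1) === ';';
}

// README example: caret right after 'tb;'
console.log(strictIsStatementBeginning('SELECT * FROM tb;', { lineNumber: 1, column: 18 })); // true

// CREATE without a trailing semicolon, caret at the start of the SELECT line
console.log(
    strictIsStatementBeginning('CREATE TABLE tb (id INT)\n\nSELECT', { lineNumber: 3, column: 1 })
); // false
```

The loose strategy cannot be reduced to a string check like this, since per the README it relies on how the syntax parse tree splits the input into statements.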

benchmark/benchmark.config.ts

5 additions
@@ -80,6 +80,11 @@ const testFiles: TestFile[] = [
         includes: ['flink'],
         testTypes: ['getSuggestionAtCaretPosition'],
     },
+    {
+        name: 'Collect Semantics',
+        sqlFileName: 'select.sql',
+        testTypes: ['getSemanticContextAtCaretPosition'],
+    },
 ];

 export default {

benchmark/data/params.json

3 additions
@@ -6,5 +6,8 @@
     "suggestion_flink": {
         "getAllEntities": ["$sql", { "lineNumber": 1020, "column": 38 }],
         "getSuggestionAtCaretPosition": ["$sql", { "lineNumber": 1020, "column": 38 }]
+    },
+    "select": {
+        "getSemanticContextAtCaretPosition": ["$sql", { "lineNumber": 997, "column": 25 }]
     }
 }
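
Reviewer note: the entries in `params.json` appear to use `"$sql"` as a placeholder for the contents of the benchmark SQL file, followed by the caret position. The sketch below shows one way such a placeholder might be resolved into an argument list. `Param` and `resolveParams` are hypothetical names introduced for illustration; this is an assumption about the mechanism, not the repository's benchmark runner.

```typescript
// Hypothetical helper; not taken from the repository's benchmark code.
type Param = string | { lineNumber: number; column: number };

// Replace every "$sql" placeholder with the actual SQL text and
// leave all other parameters (e.g. the caret position) untouched.
function resolveParams(params: Param[], sql: string): Param[] {
    return params.map((p) => (p === '$sql' ? sql : p));
}

// Parameter list shaped like the "select" entry in params.json.
const rawParams: Param[] = ['$sql', { lineNumber: 997, column: 25 }];
const resolvedParams = resolveParams(rawParams, 'SELECT * FROM tb;');

console.log(resolvedParams);
```

The resolved list could then be spread into the method under test, for example `getSemanticContextAtCaretPosition(...)`, assuming the runner passes parameters positionally.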

benchmark_reports/cold_start/flink.benchmark.md

15 additions, 14 deletions
@@ -4,33 +4,34 @@
 FlinkSQL

 ### Report Time
-2024/9/9 19:55:03
+2024/12/18 14:50:08

 ### Device
-macOS 14.4.1
+macOS 15.0.1
 (8) arm64 Apple M1 Pro
 16.00 GB

 ### Version
 `nodejs`: v21.6.1
-`dt-sql-parser`: v4.0.2
+`dt-sql-parser`: v4.1.0-beta.0
 `antlr4-c3`: v3.3.7
 `antlr4ng`: v2.0.11

 ### Running Mode
 Cold Start

 ### Report
-| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
-|----------------|----------------------------|--------|----------------|
-|Query Collection| getAllTokens | 1015 | 227 |
-|Query Collection| validate | 1015 | 221 |
-| Insert Columns | getAllTokens | 1001 | 65 |
-| Insert Columns | validate | 1001 | 65 |
-| Create Table | getAllTokens | 1004 | 27 |
-| Create Table | validate | 1004 | 26 |
-| Split SQL | splitSQLByStatement | 999 | 52 |
-|Collect Entities| getAllEntities | 1056 | 141 |
-| Suggestion |getSuggestionAtCaretPosition| 1056 | 131 |
+| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
+|-----------------|---------------------------------|--------|----------------|
+| Query Collection| getAllTokens | 1015 | 257 |
+| Query Collection| validate | 1015 | 277 |
+| Insert Columns | getAllTokens | 1001 | 66 |
+| Insert Columns | validate | 1001 | 67 |
+| Create Table | getAllTokens | 1004 | 27 |
+| Create Table | validate | 1004 | 28 |
+| Split SQL | splitSQLByStatement | 999 | 53 |
+| Collect Entities| getAllEntities | 1056 | 191 |
+| Suggestion | getSuggestionAtCaretPosition | 1056 | 185 |
+|Collect Semantics|getSemanticContextAtCaretPosition| 1015 | 247 |

benchmark_reports/cold_start/hive.benchmark.md

17 additions, 16 deletions
@@ -4,35 +4,36 @@
 HiveSQL

 ### Report Time
-2024/9/9 19:55:03
+2024/12/18 14:50:08

 ### Device
-macOS 14.4.1
+macOS 15.0.1
 (8) arm64 Apple M1 Pro
 16.00 GB

 ### Version
 `nodejs`: v21.6.1
-`dt-sql-parser`: v4.0.2
+`dt-sql-parser`: v4.1.0-beta.0
 `antlr4-c3`: v3.3.7
 `antlr4ng`: v2.0.11

 ### Running Mode
 Cold Start

 ### Report
-| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
-|----------------|----------------------------|--------|----------------|
-|Query Collection| getAllTokens | 1015 | 185 |
-|Query Collection| validate | 1015 | 179 |
-| Update Table | getAllTokens | 1011 | 112 |
-| Update Table | validate | 1011 | 109 |
-| Insert Columns | getAllTokens | 1001 | 329 |
-| Insert Columns | validate | 1001 | 329 |
-| Create Table | getAllTokens | 1002 | 21 |
-| Create Table | validate | 1002 | 20 |
-| Split SQL | splitSQLByStatement | 1001 | 72 |
-|Collect Entities| getAllEntities | 1066 | 106 |
-| Suggestion |getSuggestionAtCaretPosition| 1066 | 100 |
+| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
+|-----------------|---------------------------------|--------|----------------|
+| Query Collection| getAllTokens | 1015 | 194 |
+| Query Collection| validate | 1015 | 194 |
+| Update Table | getAllTokens | 1011 | 126 |
+| Update Table | validate | 1011 | 119 |
+| Insert Columns | getAllTokens | 1001 | 326 |
+| Insert Columns | validate | 1001 | 323 |
+| Create Table | getAllTokens | 1002 | 21 |
+| Create Table | validate | 1002 | 20 |
+| Split SQL | splitSQLByStatement | 1001 | 71 |
+| Collect Entities| getAllEntities | 1066 | 338 |
+| Suggestion | getSuggestionAtCaretPosition | 1066 | 148 |
+|Collect Semantics|getSemanticContextAtCaretPosition| 1015 | 201 |

benchmark_reports/cold_start/impala.benchmark.md

17 additions, 16 deletions
@@ -4,35 +4,36 @@
 ImpalaSQL

 ### Report Time
-2024/9/9 19:55:03
+2024/12/18 14:50:08

 ### Device
-macOS 14.4.1
+macOS 15.0.1
 (8) arm64 Apple M1 Pro
 16.00 GB

 ### Version
 `nodejs`: v21.6.1
-`dt-sql-parser`: v4.0.2
+`dt-sql-parser`: v4.1.0-beta.0
 `antlr4-c3`: v3.3.7
 `antlr4ng`: v2.0.11

 ### Running Mode
 Cold Start

 ### Report
-| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
-|----------------|----------------------------|--------|----------------|
-|Query Collection| getAllTokens | 1015 | 71 |
-|Query Collection| validate | 1015 | 71 |
-| Update Table | getAllTokens | 1011 | 113 |
-| Update Table | validate | 1011 | 108 |
-| Insert Columns | getAllTokens | 1001 | 208 |
-| Insert Columns | validate | 1001 | 213 |
-| Create Table | getAllTokens | 1002 | 23 |
-| Create Table | validate | 1002 | 23 |
-| Split SQL | splitSQLByStatement | 1001 | 65 |
-|Collect Entities| getAllEntities | 1066 | 82 |
-| Suggestion |getSuggestionAtCaretPosition| 1066 | 83 |
+| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
+|-----------------|---------------------------------|--------|----------------|
+| Query Collection| getAllTokens | 1015 | 77 |
+| Query Collection| validate | 1015 | 72 |
+| Update Table | getAllTokens | 1011 | 120 |
+| Update Table | validate | 1011 | 121 |
+| Insert Columns | getAllTokens | 1001 | 218 |
+| Insert Columns | validate | 1001 | 217 |
+| Create Table | getAllTokens | 1002 | 25 |
+| Create Table | validate | 1002 | 25 |
+| Split SQL | splitSQLByStatement | 1001 | 67 |
+| Collect Entities| getAllEntities | 1066 | 93 |
+| Suggestion | getSuggestionAtCaretPosition | 1066 | 101 |
+|Collect Semantics|getSemanticContextAtCaretPosition| 1015 | 80 |

benchmark_reports/cold_start/mysql.benchmark.md

17 additions, 16 deletions
@@ -4,35 +4,36 @@
 MySQL

 ### Report Time
-2024/9/9 19:55:03
+2024/12/18 14:50:08

 ### Device
-macOS 14.4.1
+macOS 15.0.1
 (8) arm64 Apple M1 Pro
 16.00 GB

 ### Version
 `nodejs`: v21.6.1
-`dt-sql-parser`: v4.0.2
+`dt-sql-parser`: v4.1.0-beta.0
 `antlr4-c3`: v3.3.7
 `antlr4ng`: v2.0.11

 ### Running Mode
 Cold Start

 ### Report
-| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
-|----------------|----------------------------|--------|----------------|
-|Query Collection| getAllTokens | 1015 | 1281 |
-|Query Collection| validate | 1015 | 1254 |
-| Update Table | getAllTokens | 1011 | 876 |
-| Update Table | validate | 1011 | 842 |
-| Insert Columns | getAllTokens | 1001 | 261 |
-| Insert Columns | validate | 1001 | 266 |
-| Create Table | getAllTokens | 1002 | 48 |
-| Create Table | validate | 1002 | 45 |
-| Split SQL | splitSQLByStatement | 1001 | 287 |
-|Collect Entities| getAllEntities | 1066 | 474 |
-| Suggestion |getSuggestionAtCaretPosition| 1066 | 462 |
+| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
+|-----------------|---------------------------------|--------|----------------|
+| Query Collection| getAllTokens | 1015 | 1339 |
+| Query Collection| validate | 1015 | 1305 |
+| Update Table | getAllTokens | 1011 | 860 |
+| Update Table | validate | 1011 | 898 |
+| Insert Columns | getAllTokens | 1001 | 282 |
+| Insert Columns | validate | 1001 | 284 |
+| Create Table | getAllTokens | 1002 | 48 |
+| Create Table | validate | 1002 | 50 |
+| Split SQL | splitSQLByStatement | 1001 | 305 |
+| Collect Entities| getAllEntities | 1066 | 653 |
+| Suggestion | getSuggestionAtCaretPosition | 1066 | 637 |
+|Collect Semantics|getSemanticContextAtCaretPosition| 1015 | 1418 |

benchmark_reports/cold_start/postgresql.benchmark.md

17 additions, 16 deletions
@@ -4,35 +4,36 @@
 PostgreSQL

 ### Report Time
-2024/9/9 19:55:03
+2024/12/18 14:50:08

 ### Device
-macOS 14.4.1
+macOS 15.0.1
 (8) arm64 Apple M1 Pro
 16.00 GB

 ### Version
 `nodejs`: v21.6.1
-`dt-sql-parser`: v4.0.2
+`dt-sql-parser`: v4.1.0-beta.0
 `antlr4-c3`: v3.3.7
 `antlr4ng`: v2.0.11

 ### Running Mode
 Cold Start

 ### Report
-| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
-|----------------|----------------------------|--------|----------------|
-|Query Collection| getAllTokens | 1015 | 1086 |
-|Query Collection| validate | 1015 | 1078 |
-| Update Table | getAllTokens | 1011 | 1193 |
-| Update Table | validate | 1011 | 1183 |
-| Insert Columns | getAllTokens | 1001 | 539 |
-| Insert Columns | validate | 1001 | 565 |
-| Create Table | getAllTokens | 1002 | 294 |
-| Create Table | validate | 1002 | 275 |
-| Split SQL | splitSQLByStatement | 1001 | 597 |
-|Collect Entities| getAllEntities | 1066 | 797 |
-| Suggestion |getSuggestionAtCaretPosition| 1066 | 776 |
+| Benchmark Name | Method Name |SQL Rows|Average Time(ms)|
+|-----------------|---------------------------------|--------|----------------|
+| Query Collection| getAllTokens | 1015 | 1008 |
+| Query Collection| validate | 1015 | 955 |
+| Update Table | getAllTokens | 1011 | 941 |
+| Update Table | validate | 1011 | 936 |
+| Insert Columns | getAllTokens | 1001 | 534 |
+| Insert Columns | validate | 1001 | 547 |
+| Create Table | getAllTokens | 1002 | 288 |
+| Create Table | validate | 1002 | 288 |
+| Split SQL | splitSQLByStatement | 1001 | 522 |
+| Collect Entities| getAllEntities | 1066 | 744 |
+| Suggestion | getSuggestionAtCaretPosition | 1066 | 719 |
+|Collect Semantics|getSemanticContextAtCaretPosition| 1015 | 941 |
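
Reviewer note: to compare the old and new cold-start reports, the relative change per dialect can be computed directly from the tables. The sketch below does this for the `getAllEntities` rows, using the before/after averages copied from the five reports in this commit. `percentChange` and `getAllEntitiesMs` are names introduced here for illustration. Note that the two runs also differ in macOS version (14.4.1 vs 15.0.1), so the deltas should be read as indicative rather than attributed to the code changes alone.

```typescript
// Average cold-start times (ms) for getAllEntities, copied from the report tables:
// `before` is the removed row (v4.0.2), `after` is the added row (v4.1.0-beta.0).
const getAllEntitiesMs = [
    { dialect: 'flink', before: 141, after: 191 },
    { dialect: 'hive', before: 106, after: 338 },
    { dialect: 'impala', before: 82, after: 93 },
    { dialect: 'mysql', before: 474, after: 653 },
    { dialect: 'postgresql', before: 797, after: 744 },
];

// Relative change in percent; positive means the new run was slower.
function percentChange(before: number, after: number): number {
    return ((after - before) / before) * 100;
}

for (const row of getAllEntitiesMs) {
    const delta = percentChange(row.before, row.after);
    console.log(`${row.dialect}: ${row.before} -> ${row.after} ms (${delta.toFixed(1)}%)`);
}
```

The same helper works for any other row of the tables, for example `getSuggestionAtCaretPosition`.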
