Make sure all internal lists do not rely on keywords order #117

Merged
merged 2 commits into from
May 24, 2024
95 changes: 49 additions & 46 deletions src/Tokenizer.php
@@ -94,8 +94,8 @@ final class Tokenizer
         'ENCLOSED',
         'END',
         'ENGINE',
-        'ENGINE_TYPE',
         'ENGINES',
+        'ENGINE_TYPE',
         'ESCAPE',
         'ESCAPED',
         'EVENTS',
@@ -111,9 +111,9 @@ final class Tokenizer
         'FIRST',
         'FIXED',
         'FLUSH',
+        'FOLLOWING',
         'FOR',
         'FORCE',
-        'FOLLOWING',
         'FOREIGN',
         'FULL',
         'FULLTEXT',
@@ -187,12 +187,12 @@ final class Tokenizer
         'NULL',
         'OFFSET',
         'ON',
+        'ON DELETE',
+        'ON UPDATE',
         'OPEN',
         'OPTIMIZE',
         'OPTION',
         'OPTIONALLY',
-        'ON UPDATE',
-        'ON DELETE',
         'OUTFILE',
         'OVER',
         'PACK_KEYS',
@@ -209,11 +209,11 @@ final class Tokenizer
         'PROCESSLIST',
         'PURGE',
         'QUICK',
-        'RANGE',
         'RAID0',
         'RAID_CHUNKS',
         'RAID_CHUNKSIZE',
         'RAID_TYPE',
+        'RANGE',
         'READ',
         'READ_ONLY',
         'READ_WRITE',
@@ -254,20 +254,20 @@ final class Tokenizer
         'SQL_BIG_SELECTS',
         'SQL_BIG_TABLES',
         'SQL_BUFFER_RESULT',
+        'SQL_CACHE',
         'SQL_CALC_FOUND_ROWS',
         'SQL_LOG_BIN',
         'SQL_LOG_OFF',
         'SQL_LOG_UPDATE',
         'SQL_LOW_PRIORITY_UPDATES',
         'SQL_MAX_JOIN_SIZE',
+        'SQL_NO_CACHE',
         'SQL_QUOTE_SHOW_CREATE',
         'SQL_SAFE_UPDATES',
         'SQL_SELECT_LIMIT',
         'SQL_SLAVE_SKIP_COUNTER',
         'SQL_SMALL_RESULT',
         'SQL_WARNINGS',
-        'SQL_CACHE',
-        'SQL_NO_CACHE',
         'START',
         'STARTING',
         'STATUS',
@@ -314,47 +314,47 @@ final class Tokenizer
      * @var list<string>
      */
     private array $reservedToplevel = [
-        'WITH',
-        'SELECT',
-        'FROM',
-        'WHERE',
-        'SET',
-        'ORDER BY',
-        'GROUP BY',
-        'LIMIT',
-        'DROP',
-        'VALUES',
-        'UPDATE',
-        'HAVING',
         'ADD',
-        'CHANGE',
-        'MODIFY',
         'ALTER TABLE',
+        'CHANGE',
         'DELETE FROM',
-        'UNION ALL',
-        'UNION',
+        'DROP',
         'EXCEPT',
+        'FROM',
+        'GROUP BY',
+        'GROUPS',
+        'HAVING',
         'INTERSECT',
+        'LIMIT',
+        'MODIFY',
+        'ORDER BY',
         'PARTITION BY',
-        'ROWS',
         'RANGE',
-        'GROUPS',
+        'ROWS',
+        'SELECT',
+        'SET',
+        'UNION',
+        'UNION ALL',
+        'UPDATE',
+        'VALUES',
+        'WHERE',
         'WINDOW',
+        'WITH',
     ];
 
     /** @var list<string> */
     private array $reservedNewline = [
-        'LEFT OUTER JOIN',
-        'RIGHT OUTER JOIN',
-        'LEFT JOIN',
-        'RIGHT JOIN',
-        'OUTER JOIN',
+        'AND',
+        'EXCLUDE',
         'INNER JOIN',
         'JOIN',
-        'XOR',
+        'LEFT JOIN',
+        'LEFT OUTER JOIN',
         'OR',
-        'AND',
-        'EXCLUDE',
+        'OUTER JOIN',
+        'RIGHT JOIN',
+        'RIGHT OUTER JOIN',
+        'XOR',
     ];
 
     /** @var list<string> */
@@ -575,9 +575,9 @@ final class Tokenizer
         'ORD',
         'OVERLAPS',
         'PASSWORD',
-        'PERCENT_RANK',
         'PERCENTILE_CONT',
         'PERCENTILE_DISC',
+        'PERCENT_RANK',
         'PERIOD_ADD',
         'PERIOD_DIFF',
         'PI',
@@ -625,13 +625,13 @@ final class Tokenizer
         'SRID',
         'STARTPOINT',
         'STD',
-        'STDEV',
-        'STDEVP',
         'STDDEV',
         'STDDEV_POP',
         'STDDEV_SAMP',
-        'STRING_AGG',
+        'STDEV',
+        'STDEVP',
         'STRCMP',
+        'STRING_AGG',
         'STR_TO_DATE',
         'SUBDATE',
         'SUBSTR',
@@ -725,11 +725,14 @@ final class Tokenizer
      */
     public function __construct()
     {
-        // Sort reserved word list from longest word to shortest, 3x faster than usort
-        $reservedMap = array_combine($this->reserved, array_map(strlen(...), $this->reserved));
-        assert($reservedMap !== false);
-        arsort($reservedMap);
-        $this->reserved = array_keys($reservedMap);
+        // Sort list from longest word to shortest, 3x faster than usort
+        $sortByLengthFx = static function ($values) {
+            $valuesMap = array_combine($values, array_map(strlen(...), $values));
+            assert($valuesMap !== false);
+            arsort($valuesMap);
+
+            return array_keys($valuesMap);
+        };
 
         // Set up regular expressions
         $this->regexBoundaries = '(' . implode(
@@ -738,18 +741,18 @@ public function __construct()
         ) . ')';
         $this->regexReserved = '(' . implode(
             '|',
-            $this->quoteRegex($this->reserved),
+            $this->quoteRegex($sortByLengthFx($this->reserved)),
         ) . ')';
         $this->regexReservedToplevel = str_replace(' ', '\\s+', '(' . implode(
             '|',
-            $this->quoteRegex($this->reservedToplevel),
+            $this->quoteRegex($sortByLengthFx($this->reservedToplevel)),
         ) . ')');
         $this->regexReservedNewline = str_replace(' ', '\\s+', '(' . implode(
             '|',
-            $this->quoteRegex($this->reservedNewline),
+            $this->quoteRegex($sortByLengthFx($this->reservedNewline)),
         ) . ')');
 
-        $this->regexFunction = '(' . implode('|', $this->quoteRegex($this->functions)) . ')';
+        $this->regexFunction = '(' . implode('|', $this->quoteRegex($sortByLengthFx($this->functions))) . ')';
     }
 
     /**
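
Note on why every list now goes through a longest-first pass before being joined into a regex: PCRE tries alternatives from left to right and stops at the first one that matches, so in alphabetical order `UNION` would shadow `UNION ALL`. The sketch below is a simplified illustration and not the Tokenizer's actual patterns (those are built differently, for example spaces become `\s+`). The two-keyword list and the names `$sortByLength` and `$buildRegex` are assumptions made up for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical two-keyword list, in the alphabetical order the lists now use.
$keywords = ['UNION', 'UNION ALL'];

// Same idea as the $sortByLengthFx closure in the diff: longest entries first.
$sortByLength = static function (array $values): array {
    $lengths = array_combine($values, array_map('strlen', $values));
    arsort($lengths); // sort by length, descending, keeping the keyword keys

    return array_keys($lengths);
};

// Hypothetical helper: join the keywords into one anchored alternation.
$buildRegex = static function (array $values): string {
    $quoted = array_map(static fn (string $v): string => preg_quote($v, '/'), $values);

    return '/^(' . implode('|', $quoted) . ')/';
};

$sql = 'UNION ALL SELECT 1';

preg_match($buildRegex($keywords), $sql, $alphabetical);
preg_match($buildRegex($sortByLength($keywords)), $sql, $longestFirst);

echo $alphabetical[1], "\n"; // UNION
echo $longestFirst[1], "\n"; // UNION ALL
```

Sorting at regex-build time rather than in the property declarations is what lets the declared lists stay alphabetical.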
34 changes: 34 additions & 0 deletions tests/TokenizerTest.php
@@ -7,9 +7,43 @@
 use Doctrine\SqlFormatter\Tokenizer;
 use PHPUnit\Framework\Attributes\DoesNotPerformAssertions;
 use PHPUnit\Framework\TestCase;
+use ReflectionClass;
+
+use function sort;
 
 final class TokenizerTest extends TestCase
 {
+    /**
+     * @param 'reserved'|'reservedToplevel'|'reservedNewline'|'functions' $propertyName
+     *
+     * @return list<string>
+     */
+    private function getTokenizerList(string $propertyName): array
+    {
+        $tokenizerReflClass = new ReflectionClass(Tokenizer::class);
+        /** @var list<string> $res */
+        $res = $tokenizerReflClass->getProperty($propertyName)->getDefaultValue();
+
+        return $res;
+    }
+
+    public function testInternalKeywordListsAreSortedForEasierMaintenance(): void

Member

I think this should be a code style check rather than a unit test.

Contributor Author

If you know how to do that, great, please tell me; if not, let's move forward with this PR. In #106 I need to add many keywords, and I want to keep the git diff minimal both now and in the long term.

+    {
+        foreach (
+            [
+                $this->getTokenizerList('reserved'),
+                $this->getTokenizerList('reservedToplevel'),
+                $this->getTokenizerList('reservedNewline'),
+                $this->getTokenizerList('functions'),
+            ] as $list
+        ) {
+            $listSorted = $list;
+            sort($listSorted);
+
+            self::assertSame($listSorted, $list);
+        }
+    }
+
     #[DoesNotPerformAssertions]
     public function testThereAreNoRegressions(): void
     {
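
Note on the ordering the new test enforces: `sort()` with its default flags compares non-numeric strings byte by byte, so a space or a digit sorts before any uppercase letter, and `_` sorts after all of them. That is why `ENGINES` now precedes `ENGINE_TYPE` and `PERCENTILE_CONT` precedes `PERCENT_RANK`. A minimal sketch, using keyword pairs taken from this diff; the `$samples` and `$sorted` variables are illustrative only:

```php
<?php

declare(strict_types=1);

// Keyword pairs from the diff, deliberately listed in their old relative order.
$samples = [
    ['ENGINE_TYPE', 'ENGINES'],
    ['PERCENT_RANK', 'PERCENTILE_CONT'],
    ['GROUPS', 'GROUP BY'],
    ['UNION ALL', 'UNION'],
];

$sorted = [];
foreach ($samples as $pair) {
    sort($pair); // default flags: these strings are compared byte by byte
    $sorted[] = $pair;
    echo implode(', ', $pair), "\n";
}

// Prints:
// ENGINES, ENGINE_TYPE
// PERCENTILE_CONT, PERCENT_RANK
// GROUP BY, GROUPS
// UNION, UNION ALL
```

When adding a keyword, placing it where `sort()` would put it keeps the test green and the diff down to a single added line.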