Skip to content

Latest commit

Β 

History

History
1554 lines (1087 loc) Β· 31 KB

File metadata and controls

1554 lines (1087 loc) Β· 31 KB

Plan

For the future of Squawk

General

PL/pgSQL strings

parse & lint in things like function definitions

should support all the features of normal SQL

src/pl/plpgsql/src/pl_gram.y

Notebook / Juypter

linter and IDE support

Embedded SQL Strings

support SQL embedded in other languages

Psql Meta Commands

parse and warn, helps with copy pasting examples

https://www.enterprisedb.com/blog/psqls-scripting-language-turing-complete-or-fibonacci-psql

Other Dialects

support Trino, BigQuery, Aurora DSLQ, etc.

Validations

Check format() calls

https://www.postgresql.org/docs/18/functions-string.html#FUNCTIONS-STRING-FORMAT

SELECT format('Hello %s', 'World');
-- ok

SELECT format('Hello %s %s', 'World');
-- error                ^^

Warn about invalid column notation

with t as (select 1 as data)
select (data).id from t;

postgres says:

Query 1 ERROR at Line 40: : ERROR:  column notation .id applied to type integer, which is not a composite type
LINE 2: select (data).id from t;
                ^

Warn about invalid missing column notation

create type f as (
  id integer,
  name text
);
with t as (select (1, 'a')::f as data)
select data.id from t;

postgres says:

Query 1 ERROR at Line 41: : ERROR:  missing FROM-clause entry for table "data"
LINE 2: select data.id from t;
               ^

we should suggest an autofix to:

with t as (select (1, 'a')::f as data)
select (data).id from t;

Formatter

squawk format

example:

-- some query comment
SELECT name username, "email",
        "weird-column-name"
    FROM    bar

becomes

-- some query comment
select name username, email, "weird-column-name" from bar

config ideas:

  • lower / upper case keywords (default lowercase)
  • indent (default 2)
  • quoted idents (default avoid)
  • aliases (include as or not)

links:

LSIF/SCIP

Benchmark & Profile Parser

https://www.dolthub.com/blog/2024-10-10-yacc-union-types/

sql for benchmarks maybe?

CLI

from deno

  • check command for type checking
  • lsp for language server
  • lint for linter

Type Checker

https://www.postgresql.org/docs/17/datatype-pseudo.html

context aware errors

type checking should allow us to give type errors without the user having to run the query

support more advanced lint rules

type check create type t as range

create type timerange as range (
  subtype = time,
  subtype_diff = time_subtype_diff
);

if set, subtype_diff must be of type function time_subtype_diff(time without time zone, time without time zone)

function createRangeType<T>(subtype: T, subtype_diff: (T, T) => float8)

https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-DEFINING

PGO

Linter

Support for auto fixes on the command line and in an IDE for rules where it makes sense.

Rule: validating json path expressions

Make sure they've valid and won't error at runtime

https://www.postgresql.org/docs/17/functions-json.html#FUNCTIONS-SQLJSON-PATH-OPERATORS

Rule: limit-missing-order-by

https://www.postgresql.org/docs/17/queries-limit.html

warn about missing order by with limit

not important for a lot of queries, like:

select * from t limit 10;

Rule: simple array types

foo array[]

-- becomes

foo[]

Rule: simple alter column

alter table t alter column c;

-- becomes

alter table t alter c;
alter table t
  alter column c set data type int;

-- becomes

alter table t
  alter column c type int;

Rule: casing

select x as Foo from t;
--          ^^^ casing: `Foo` will get parsed as `foo`.

Autofixes to

select x as "Foo" from t;

Rule: pointless cascade / restrict

These key words do not have any effect, since there are no dependencies on $name.

https://www.postgresql.org/docs/17/sql-dropcast.html

Applies to:

  • drop cast
  • drop conversion
  • drop policy
  • drop publication
  • drop statistics
  • drop subscription

Rule: unnecessary casts

select a::int from t;
--     ^^^^^^ pointless cast, `a` is already an `int`

Rule: simple joins

Inner join

select * from t inner join u using (id);

-- becomes:

select * from t join u using (id);

Left join

select * from t left outer join u using (id);

-- becomes:

select * from t left join u using (id);

Right join

select * from t right outer join u using (id);

-- becomes:

select * from t right join u using (id);

Full join

select * from t full outer join u using (id);

-- becomes:

select * from t full join u using (id);

Rule: british spellings

   analyse foo.bar(a, b);
-- ^^^^^^^ Prefer analyze

Rule: unsupported sequence function

select c from unnest(sequence(1, 10, 2)) as t(c);
--                   ^^^^^^^^ unknown function `sequence`, did you mean `generate_series`?

-- quick fix gives:

select c from generate_series(1, 10, 2) as t(c);

Trino has sequence instead of generate_series.

Rule: unsupported suffixed literal

set session query_max_run_time = 10m;

Trino has support for suffixed literals

Running this in Postgres gives you:

Query 1 ERROR at Line 1: : ERROR:  trailing junk after numeric literal at or near "10m"
LINE 1: set session query_max_run_time = 10m;
                                         ^

Rule: cross join

select * from t join u on true;
select * from t1, t2;

suggests / autofixes to:

select * from t cross join u;
select * from t1 cross join t2;

with config to change desired destination format

Rule: natural join

warn about natural joins and autofix to the equivalent

select * from t natural join u;

suggests / autofixes to:

select * from t join u using (id, name, ip, description, meta);

also offer an autofix back to natural join:

select * from t natural join u;

Rule: using unsupported lambdas

This actually parsers in Postgres, but could work off a heuristic

select array_filter(
  array[1, 2, 2, 3],
  e -> (e % 2) = 0
--^^^^^^^^^^^^^^^^ Rule: lambdas aren't support in postgres
);

-- suggest

select (
  select coalesce(
    array_agg(e),
    array[]::int[]
  )
  from unnest(array[1, 2, 2, 3]) t(e)
  where (e % 2) = 0
);

https://blog.jooq.org/when-sql-meets-lambda-expressions/

Rule: friendly sql

https://duckdb.org/docs/stable/sql/dialect/friendly_sql

Parse friendly sql syntax, like leading from clause, and warn with an autofix to convert to valid postgres syntax.

Rule: values to scalars

SELECT * FROM machines
WHERE ip_address IN (VALUES('192.168.0.1'::inet), ('192.168.0.10'), ('192.168.1.43'));

becomes:

SELECT * FROM machines
WHERE ip_address IN ('192.168.0.1', '192.168.0.10', '192.168.1.43');

via: https://www.postgresql.org/docs/17/sql-values.html

Tip: For simple IN tests, it's better to rely on the list-of-scalars form of IN than to write a VALUES query as shown above. The list of scalars method requires less writing and is often more efficient.

example to test this:

with t(name) as (select '1 month'::interval)
select count(*) from t where t.name in ('1 month', '2 month')
-- type checks!

Rule: no-unncesssary-parens

select (((x * 2)));
-- becomes
select x * 2;
-- ok, indexes on expressions require extra parens
create index foo on t((1));

-- ok
select (x * 2) + 4;

related: https://eslint.style/rules/no-extra-parens

Rule: no-constant-condition

select a from t where false = true;
--                    ^^^^^^^^^^^^ constant condition

select case when 1 = 2 then 2 else 3 end;
--               ^^^^^ constant condition

related: https://eslint.org/docs/latest/rules/no-constant-condition

Rule: with query missing returning clause

create table t(a int, b int);
create table u(a int, b int);
with x as (merge into t
  using u on true
  when matched then do nothing)
select * from x;
-- Query 1 ERROR at Line 19: : ERROR:  WITH query "x" does not have a RETURNING clause
-- LINE 6:     select * from x;

Rule: dialect: now() to dest

should support various fixes so people can write in one dialect of SQL and have it easily convert to the other one

Rule: group by all

requires pg >=19

SELECT customer_id, merchant_id, request_id, count(*) from t
group by 1, 2, 3;
--       ^^^^^^^ implicit group by all

becomes:

SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;

and an action to go backwards:

SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;
--        ^action:expand-group-by-all-names

becomes:

SELECT customer_id, merchant_id, request_id, count(*) from t
group by customer_id, merchant_id, request_id;

or with column numbers:

SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;
--        ^action:expand-group-by-all-numbers

becomes:

SELECT customer_id, merchant_id, request_id, count(*) from t
group by 1, 2, 3;

Rule: missing column in group by

select a, b from t
--        ^ b must appear in group by, quick fix:
group by a;

-- becomes

select a, b from t
group by a, b;

Rule: unresolved column

create function foo(a int, b int) returns int
as 'select $0'
--         ^^ unresolved column, did you mean `a` or `b`?
language sql;

Rule: unused column

SELECT customer_id, total_amount
FROM (
  SELECT customer_id, SUM(amount) AS total_amount, min(created_at)
--                                                 ^^^^^^^^^^^^^^^ unused
  FROM orders
  GROUP BY customer_id
) AS customer_totals
WHERE total_amount > 1000;

Rule: implicit column name in references

create table u(
  id int primary key
);
create table t(
  u_id int references u
--                    ^quickfix
);

-- becomes

create table t(
  u_id int references u(id)
);

and vice versa

Rule: aggregate free having condition

select a from t group by a having a > 10;
--                                ^^^^^^

-- quick fix to:
select a from t where a > 10 group by a;

Rule: conflicting function and aggregate definitions

create function foo(int) returns int as $$
  select $1 * 2;
$$ language sql;

create aggregate foo(int) (
  sfunc = int4pl,
  stype = int,
  initcond = '0'
);
-- Query 1 ERROR at Line 1: : ERROR:  function "foo" already exists with same argument types

Rule: order direction is redundent

select * from t order by a asc;
--                         ^^^ order direction is redundent. asc is the default.

-- quick fix to:
select * from t order by a;

Rule: sum(boolean) to case stmt

with t(x) as (select 1, 2, 3)
select sum(x > 1) from t;
Query 1 ERROR at Line 2: : ERROR:  function sum(boolean) does not exist
LINE 2: select sum(x  > 1) from t;
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

should instead offer an autofix to change it to a case statement:

with t(x) as (select 1, 2, 3)
select sum(case when x > 1 then 1 else 0 end) from t;

rel: https://duckdb.org/2025/03/06/gems-of-duckdb-1-2.html

Rule: missing column in group by

suggest using an aggregate or grouping by

Rule: field does not exist

Rule: table does not exist

Rule: ambiguous column name

Provide options to select from in quick fix

create table t (t_id int, name text, created timestampz);
create table u (u_id int, t_id int, created timestampz);

select name, created from u join t using (t_id);
--           ^^^^^^^ error: ambiguous column name `created`, prefix with either `t` or `u`
--                   action: Prefix with...
--                            Prefix with `t`
--                            Prefix with `u`

-- gives

select name, u.created from u join t using (t_id);

Rule: invalid literal

Check for invalid literals

select '{"foo": 1,}'::json;
--               ^ invalid json, unexpected trailing comma
create type foo as (
  a int8,
  b int8
);
select foo '(1)';
--            ^ malformed record literal, missing column b
select '[1,,2)'::numrange;
--         ^ malformed range literal, extra comma

create table reservation (room int, during tsrange);
insert into reservation values
    (1108, '[2010-01-01 14:30,, 2010-01-01 15:30)');
--                            ^ malformed range literal, extra comma

Rule: column label is the same as an existing column

create table t (
    a int,
    b int,
    c int
);

select a, b c from t;
--          ^warn: column label takes the same name as a table column

Rule: exists to count equal 0

select u.* from users u
where 0 = (select count(*) from addresses a where a.user_id = u.id);
--    ^^^warn: NOT EXISTS instead of comparing against zero

-- instead:

select u.* from users u
where NOT EXISTS (select from addresses a where a.user_id = u.id);

via: https://www.depesz.com/2024/12/01/sql-best-practices-dont-compare-count-with-0/

Rule: unnamed columns in view

via: https://www.postgresql.org/docs/17/sql-createview.html

CREATE VIEW vista AS SELECT 'Hello World';

is bad form because the column name defaults to ?column?; also, the column data type defaults to text, which might not be what you wanted. Better style for a string literal in a view's result is something like:

CREATE VIEW vista AS SELECT text 'Hello World' AS hello;

Rule: drop role to drop group

DROP GROUP is now an alias for DROP ROLE.

https://www.postgresql.org/docs/17/sql-dropgroup.html

Rule: select into to create table as

SELECT * INTO films_recent FROM films WHERE date_prod >= '2002-01-01';

-- becomes

CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';

CREATE TABLE AS is functionally similar to SELECT INTO. CREATE TABLE AS is the recommended syntax, since this form of SELECT INTO is not available in ECPG or PL/pgSQL, because they interpret the INTO clause differently. Furthermore, CREATE TABLE AS offers a superset of the functionality provided by SELECT INTO.

In contrast to CREATE TABLE AS, SELECT INTO does not allow specifying properties like a table's access method with USING method or the table's tablespace with TABLESPACE tablespace_name. Use CREATE TABLE AS if necessary. Therefore, the default table access method is chosen for the new table. See default_table_access_method for more information.

via https://www.postgresql.org/docs/17/sql-selectinto.html

Rule: delete without a where clause

Rule: overflow warnings

https://sql-info.de/postgresql/postgres-gotchas.html#1_4

select 256 * 256 * 256 * 256;
-- Query 1 ERROR at Line 1: : ERROR:  integer out of range
select 256::int8 * 256 * 256 * 256;
-- ?column?
-- 4294967296

IDE

Find References

https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html

Go to Definition

Column / table

select name username, email, "weird-column-name" from bar
--      ^$ option+click

navigates to schema definition file or if setup, navigates to the the source code from a user provided command

Function

select json_array_length('["foo", "bar", "buzz"]');
--      ^$ option+click

for builtins, we return the same stuff as:

select pg_get_functiondef('pg_catalog.json_array_length'::regproc)

so in this case, we'd get:

CREATE OR REPLACE FUNCTION pg_catalog.json_array_length(json)
 RETURNS integer
 LANGUAGE internal
 IMMUTABLE PARALLEL SAFE STRICT
AS $function$json_array_length$function$

maybe simplify to:

create function pg_catalog.json_array_length(
  json
) returns integer
  language internal immutable parallel safe strict
  as $$json_array_length$$

non-builtin:

select inc(10)
--     ^$ option+click

gives:

create function inc(int) returns int
  language sql immutable
  as 'select $1 + 1';

if we call that postgres function to get the def we get:

CREATE OR REPLACE FUNCTION public.inc(integer)
 RETURNS integer
 LANGUAGE sql
 IMMUTABLE
AS $function$SELECT $1 + 1$function$
create function pg_catalog.jsonb_extract_path(
  from_json jsonb,
  variadic path_elems text[]
) returns jsonb
  language internal immutable parallel safe strict
  as $$jsonb_extract_path$$

Json path expressions

select * from json_table(
  jsonb '[1,2,3]',
  '$[*] ? (@ < $x)'
  --           ^^ go to def jumps to the def in the passing clause
  passing 10 as x, 3 as y
  --                    ^ find references
  --            ^ find references
  columns (a text format json path '$ ? (@ < $y)')
  --      references json_table arg ^
  --          also references passing clause ^^
);

nextval

-- via https://www.postgresql.org/docs/current/datatype-oid.html
nextval('foo')              -- operates on sequence foo
nextval('FOO')              -- same as above
nextval('"Foo"')            -- operates on sequence Foo
nextval('myschema.foo')     -- operates on myschema.foo
nextval('"myschema".foo')   -- same as above
nextval('foo')              -- searches search path for foo

Autocomplete

Joins

select id, email, created_at from users
join login_att
--            ^$:suggest

suggests

select id, email, created_at from users
join login_attempts on login_attempts.user_id = users.id

or maybe it does the snippet style thing:

select id, email, created_at from users
join login_attempts on login_attempts.$0 = $1

data grips has nice snippet support: https://www.jetbrains.com/datagrip/features/editor.html#live

CTEs

We should make writing and using CTEs easy since they're common in analytical queries

WITH customer_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id, c.total_amount
FROM customer_totals c
WHERE c.total_amount > 1000;

Hover Info

Column

create table bar (name varchar(255), email text, "weird-column-name" boolean);
select name username, email, "weird-column-name" from bar;
--      ^$ hover

gives:

-- size = 24 (0x18)
name: nullable varchar(255)
--------------------------------------
The name of the user in the system.

rust analyzer gives:

squawk_parser::event::Event
Error { pub(crate) msg: String, }
size = 24 (0x18), align = 0x8, needs Drop

other fields?

Table

another example:

select name username, email, "weird-column-name" from bar
--                                                     ^$ hover

gives:

-- size = 48 (0x30)
-- storage = 150 million rows, 500GB
create table user (
  id integer seq_123123,
  name varchar(255),
  email varchar(125)
)
--------------------------------
Users of the platform.

Function

another example:

SELECT customer_id, SUM(amount) AS total_amount, min(created_at)
--                  ^$ hover
FROM orders
-- sum of expression across all input values
sum(expression smallint | int) -> bigint
sum(expression bigint) -> numeric
sum(expression double precision) -> double precision
sum(expression real) -> real
sum(expression interval) -> interval
sum(expression numeric) -> numeric

unnset

select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']);
--            ^$ hover
-- Expands multiple arrays (possibly of different data types) into a set of rows. If the arrays are not all the same length then the shorter ones are padded with NULLs. This form is only allowed in a query's FROM clause;
unnest(anyarray, anyarray [, ... ] ) -> setof anyelement, anyelement [, ... ]
select * from unnest(ARRAY[1,2]);
--            ^$ hover
-- Expands an array into a set of rows. The array's elements are read out in storage order.
unnest(anyarray) -> setof anyelement

Column Number

another example:

SELECT customer_id, sum(cost) from cpus
group by 1;
--       ^$ hover
field: cpus.customer_id
type: string
-- id of the customer that rents the CPU

Star

case expr

select * from (select case
--     ^$ hover
  when random() > 0.5 then
  	true
  else
    false
  end)
case  boolean

unnest

select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']);
--     ^$ hover
unnest integer,
unnest text

Semantic Syntax Highlighting

https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide

VSCode Syntax Highlighting

Aka a non-semantic version

Monaco Support

  • Monaco Syntax Highlighting

Codemirror Support

Show Syntax Tree Command

replicate what we have in WASM in IDE

also show lex command

Snippets

Quick Fix: alias query

select * from bar
--              ^$ action:rename-alias

-- becomes after filling in alias name with `b`

select b.* from bar b

another example:

select name, email from bar
--                       ^$ action:rename-alias

-- becomes after filling in alias name with `b`

select b.name, b.email from bar
select * from (
  select a, b from t where x > 10
);
-- ^introduce alias

-- becomes

select * from (
  select a, b from t where x > 10
) as foo;

should prompt for table name for each entry when there is an ambiguous column

related:

Quick Fix: introduce table alias

select t.a from t;
-- becomes (with user input for value)
select t2.a from t t2;

Quick Fix: flip $op

select 1 > 2;
--     ^ flip >
select 1 < 2;

select 1 = 2;
--     ^ flip =
select 2 = 1;

-- etc, etc.

Quick Fix: flip join (may change semantics)

select * from t join u using (u_id);
-- becomes
select * from u join t using (u_id);

Quick Fix: replace equality checks with in expression

select * from t where a = 1 or a = 2;
-- becomes
select * from t where a in (1, 2);

Quick Fix: replace using with on

select * from t join u using (g_id);
select * from t join u on t.g_id = u.g_id;

Quick Fix: table to select

table t;
-- ^$ action: convert to select

select * from t;
-- ^$ action: convert to table

Quick Fix: wrap side-effect explain analyze in a transaction

explain analyze insert into t values (1);
-- ^$ action: wrap possible side effect in a transaction

-- becomes

begin;
explain analyze insert into t values (1);
rollback;

Important

Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, MERGE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

via: https://www.postgresql.org/docs/17/sql-explain.html

Quick Fix: expand star

select * from bar
--     ^$ action:expand

becomes

select bar.name, bar.email, bar.buzz, bar.foo, bar."weird-column-name" from bar

or maybe

select name, email, buzz, foo, "weird-column-name" from bar

related:

Quick Fix: create table definition

select a, b from t;
--               ^ unknown table, quick fix: create table
-- becomes
create table t (a int not null, b int not null);
select a, b from t;

We should have markers so you can quickly tab through and fill in the type and nullability for each field.

Quick Fix: field rename

select name, email, buzz, foo, "weird-column-name" from bar
--      ^$ action:rename

becomes:

select name as username, email, buzz, foo, "weird-column-name" from bar

or maybe:

select name username, email, buzz, foo, "weird-column-name" from bar

Quick Fix: strings and quoted idents

if the quoted ident doesn't exist, the user probably meant to use single quotes to create a string literal. Add an auto fix suggestion.

select foo, "a" from t;
--          ^^^ unknown column `"a"`, did you mean to convert this to write a string literal?
--          ^^^ Quick Fix: convert to string literal

-- gives

select foo, 'a' from t;

Quick Fix: Quote and Unquote columns, tables, etc.

select "x" from "t";
--     ^ Quick Fix: unquote

gives

select x from "t";

and vice versa:

select x from "t";
--     ^ Quick Fix: quote

gives:

select "x" from "t";

Note: there's some gotchas with this that we need to handle:

-- okay
with t("X") as (select 1)
select "X" from t;

-- err
with t("X") as (select 1)
select X from t;

-- err
with t("X") as (select 1)
select x from t;

or invalid column names:

-- ok
with t("a-b") as (select 1)
select "a-b" from t;

-- err
with t("a-b") as (select 1)
select a-b from t;
-- Query 1 ERROR at Line 2: ERROR:  column "a" does not exist
-- LINE 2: select a-b from t;

Quick Fix: in array

select 1 in [1,2,3];
-- or
select 1 in array[1,2,3];

-- become

select 1 = any(array[1,2,3]);

Quick Fix: in array to in tuple

select 1 in [1,2,3];
-- or
select 1 in array[1,2,3];

-- become

select 1 in (1,2,3);

Quick Fix: subquery to CTE

select * from (select 1);
-- becomes (with user input for value)
with t as (select 1)
select * from t;

Quick Fix: inline CTE

with t as (select 1)
select * from t;
-- becomes
select * from (select 1) as t;

with t(a) as (select 1)
select * from t;
-- becomes
select * from (select 1) as t(a);

Quick Fix: qualify identifier

create table foo.t(a int);
select a from t;
--     ^?
-- becomes
select t.a from t;
create table t(a int);
select a from t;
--            ^?
-- becomes
select a from public.t;

Quick Fix: convert to subquery

select a, b from t where x > 10;
-- ^ quickfix:convert to subquery

-- becomes

select * from (
  select a, b from t where x > 10
);

Inlay Hints

datagrip has good support for these.

Column Names

insert into t values (/* column_a: */ 1, /* column_c: */ 'foo');
create view v(a, b) as select /* a: */ 1, /* b: */ 'foo';
create view v(a, b) as select /* a,y: */ * from foo;
with t(x, y) as (
  select /* x: */ 1, /* y: */ 2
)
select x, y from t;
create table t(id int, a text, b int);
create table u(id int, a text, b int);

select * from t
union
select /* id: */ 1, /* a: */ 'x', /* b: */ 'y' from u;

Function param names

create function foo(x int) returns text
as $$select 'foo'$$ language sql;

select foo(/* x: */ 'foo');

Join cardinality

See datagrip docs

select * from t join u /* 1<->1..n: */ using (user_id);

select * from t left join u /* 1<->0..n: */ using (user_id);

select * from t inner join u /* 1<->0..n: */ using (user_id);

select * from t full join u /* 0..n<->1: */ using (user_id);