For the future of Squawk
parse & lint in things like function definitions
should support all the features of normal SQL
linter and IDE support
- https://jupyterlab.readthedocs.io/en/latest/user/lsp.html
- https://github.com/denoland/deno/blob/57dd66ec3dae2a0fe376f6a43c476dfade421c84/cli/tools/jupyter/server.rs
- https://docs.astral.sh/ruff/configuration/#jupyter-notebook-discovery
support SQL embedded in other languages
-
requires supporting their various formatting strings among other things
-
https://peps.python.org/pep-0249/#paramstyle
- qmark Question mark style, e.g. ...WHERE name=?
- numeric Numeric, positional style, e.g. ...WHERE name=:1
- named Named style, e.g. ...WHERE name=:name
- format ANSI C printf format codes, e.g. ...WHERE name=%s
- pyformat Python extended format codes, e.g. ...WHERE name=%(name)s
-
https://fljd.in/en/2024/11/25/substituting-a-variable-in-a-sql-script/
-
https://www.timescale.com/blog/how-to-build-an-iot-pipeline-for-real-time-analytics-in-postgresql
$__timeFrom(),$__timeTo(), and$sensor_id- https://grafana.com/docs/grafana/latest/dashboards/variables/
parse and warn, helps with copy pasting examples
https://www.enterprisedb.com/blog/psqls-scripting-language-turing-complete-or-fibonacci-psql
support Trino, BigQuery, Aurora DSLQ, etc.
https://www.postgresql.org/docs/18/functions-string.html#FUNCTIONS-STRING-FORMAT
SELECT format('Hello %s', 'World');
-- ok
SELECT format('Hello %s %s', 'World');
-- error ^^with t as (select 1 as data)
select (data).id from t;postgres says:
Query 1 ERROR at Line 40: : ERROR: column notation .id applied to type integer, which is not a composite type
LINE 2: select (data).id from t;
^
create type f as (
id integer,
name text
);
with t as (select (1, 'a')::f as data)
select data.id from t;postgres says:
Query 1 ERROR at Line 41: : ERROR: missing FROM-clause entry for table "data"
LINE 2: select data.id from t;
^
we should suggest an autofix to:
with t as (select (1, 'a')::f as data)
select (data).id from t;squawk formatexample:
-- some query comment
SELECT name username, "email",
"weird-column-name"
FROM barbecomes
-- some query comment
select name username, email, "weird-column-name" from bar-
autofix trailing commas
- http://peter.eisentraut.org/blog/2025/02/11/how-about-trailing-commas-in-sql
- clickhouse & duckdb support them
-
comments maintained
-
disable for sections / files with special comment
config ideas:
- lower / upper case keywords (default lowercase)
- indent (default 2)
- quoted idents (default avoid)
- aliases (include
asor not)
links:
- https://mcyoung.xyz/2025/03/11/formatters/
- https://archive.jlongster.com/A-Prettier-Formatter
- https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
- datagrip code formatting
https://www.dolthub.com/blog/2024-10-10-yacc-union-types/
sql for benchmarks maybe?
-
pg_graphql
-
postgrest
-
hasura
https://github.com/hasura/graphql-engine/blob/master/server/src-rsr/initialise.sql
-
pg_vector
https://github.com/pgvector/pgvector/blob/master/sql/vector.sql
-
doomql
-
rrule_plpgsql
-
pqrsafe
https://github.com/tanelpoder/tpt-postgres/blob/main/demos/pqrsafe.sql
from deno
checkcommand for type checkinglspfor language serverlintfor linter
https://www.postgresql.org/docs/17/datatype-pseudo.html
context aware errors
- aka disambgiuate types with their namespace if necessary: https://gleam.run//news/context-aware-compilation#context-aware-errors
type checking should allow us to give type errors without the user having to run the query
support more advanced lint rules
create type timerange as range (
subtype = time,
subtype_diff = time_subtype_diff
);if set, subtype_diff must be of type function time_subtype_diff(time without time zone, time without time zone)
function createRangeType<T>(subtype: T, subtype_diff: (T, T) => float8)https://www.postgresql.org/docs/current/rangetypes.html#RANGETYPES-DEFINING
Support for auto fixes on the command line and in an IDE for rules where it makes sense.
Make sure they've valid and won't error at runtime
https://www.postgresql.org/docs/17/functions-json.html#FUNCTIONS-SQLJSON-PATH-OPERATORS
https://www.postgresql.org/docs/17/queries-limit.html
warn about missing order by with limit
not important for a lot of queries, like:
select * from t limit 10;foo array[]
-- becomes
foo[]alter table t alter column c;
-- becomes
alter table t alter c;alter table t
alter column c set data type int;
-- becomes
alter table t
alter column c type int;select x as Foo from t;
-- ^^^ casing: `Foo` will get parsed as `foo`.Autofixes to
select x as "Foo" from t;These key words do not have any effect, since there are no dependencies on $name.
https://www.postgresql.org/docs/17/sql-dropcast.html
Applies to:
drop castdrop conversiondrop policydrop publicationdrop statisticsdrop subscription
select a::int from t;
-- ^^^^^^ pointless cast, `a` is already an `int`select * from t inner join u using (id);
-- becomes:
select * from t join u using (id);select * from t left outer join u using (id);
-- becomes:
select * from t left join u using (id);select * from t right outer join u using (id);
-- becomes:
select * from t right join u using (id);select * from t full outer join u using (id);
-- becomes:
select * from t full join u using (id); analyse foo.bar(a, b);
-- ^^^^^^^ Prefer analyzeselect c from unnest(sequence(1, 10, 2)) as t(c);
-- ^^^^^^^^ unknown function `sequence`, did you mean `generate_series`?
-- quick fix gives:
select c from generate_series(1, 10, 2) as t(c);Trino has sequence instead of generate_series.
set session query_max_run_time = 10m;Trino has support for suffixed literals
Running this in Postgres gives you:
Query 1 ERROR at Line 1: : ERROR: trailing junk after numeric literal at or near "10m"
LINE 1: set session query_max_run_time = 10m;
^
select * from t join u on true;
select * from t1, t2;suggests / autofixes to:
select * from t cross join u;
select * from t1 cross join t2;with config to change desired destination format
warn about natural joins and autofix to the equivalent
select * from t natural join u;suggests / autofixes to:
select * from t join u using (id, name, ip, description, meta);also offer an autofix back to natural join:
select * from t natural join u;This actually parsers in Postgres, but could work off a heuristic
select array_filter(
array[1, 2, 2, 3],
e -> (e % 2) = 0
--^^^^^^^^^^^^^^^^ Rule: lambdas aren't support in postgres
);
-- suggest
select (
select coalesce(
array_agg(e),
array[]::int[]
)
from unnest(array[1, 2, 2, 3]) t(e)
where (e % 2) = 0
);https://blog.jooq.org/when-sql-meets-lambda-expressions/
https://duckdb.org/docs/stable/sql/dialect/friendly_sql
Parse friendly sql syntax, like leading from clause, and warn with an autofix to convert to valid postgres syntax.
SELECT * FROM machines
WHERE ip_address IN (VALUES('192.168.0.1'::inet), ('192.168.0.10'), ('192.168.1.43'));becomes:
SELECT * FROM machines
WHERE ip_address IN ('192.168.0.1', '192.168.0.10', '192.168.1.43');via: https://www.postgresql.org/docs/17/sql-values.html
Tip: For simple IN tests, it's better to rely on the list-of-scalars form of IN than to write a VALUES query as shown above. The list of scalars method requires less writing and is often more efficient.
example to test this:
with t(name) as (select '1 month'::interval)
select count(*) from t where t.name in ('1 month', '2 month')
-- type checks!select (((x * 2)));
-- becomes
select x * 2;-- ok, indexes on expressions require extra parens
create index foo on t((1));
-- ok
select (x * 2) + 4;related: https://eslint.style/rules/no-extra-parens
select a from t where false = true;
-- ^^^^^^^^^^^^ constant condition
select case when 1 = 2 then 2 else 3 end;
-- ^^^^^ constant conditionrelated: https://eslint.org/docs/latest/rules/no-constant-condition
create table t(a int, b int);
create table u(a int, b int);
with x as (merge into t
using u on true
when matched then do nothing)
select * from x;
-- Query 1 ERROR at Line 19: : ERROR: WITH query "x" does not have a RETURNING clause
-- LINE 6: select * from x;should support various fixes so people can write in one dialect of SQL and have it easily convert to the other one
SELECT customer_id, merchant_id, request_id, count(*) from t
group by 1, 2, 3;
-- ^^^^^^^ implicit group by allbecomes:
SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;and an action to go backwards:
SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;
-- ^action:expand-group-by-all-namesbecomes:
SELECT customer_id, merchant_id, request_id, count(*) from t
group by customer_id, merchant_id, request_id;or with column numbers:
SELECT customer_id, merchant_id, request_id, count(*) from t
group by all;
-- ^action:expand-group-by-all-numbersbecomes:
SELECT customer_id, merchant_id, request_id, count(*) from t
group by 1, 2, 3;select a, b from t
-- ^ b must appear in group by, quick fix:
group by a;
-- becomes
select a, b from t
group by a, b;create function foo(a int, b int) returns int
as 'select $0'
-- ^^ unresolved column, did you mean `a` or `b`?
language sql;SELECT customer_id, total_amount
FROM (
SELECT customer_id, SUM(amount) AS total_amount, min(created_at)
-- ^^^^^^^^^^^^^^^ unused
FROM orders
GROUP BY customer_id
) AS customer_totals
WHERE total_amount > 1000;create table u(
id int primary key
);
create table t(
u_id int references u
-- ^quickfix
);
-- becomes
create table t(
u_id int references u(id)
);and vice versa
select a from t group by a having a > 10;
-- ^^^^^^
-- quick fix to:
select a from t where a > 10 group by a;create function foo(int) returns int as $$
select $1 * 2;
$$ language sql;
create aggregate foo(int) (
sfunc = int4pl,
stype = int,
initcond = '0'
);
-- Query 1 ERROR at Line 1: : ERROR: function "foo" already exists with same argument typesselect * from t order by a asc;
-- ^^^ order direction is redundent. asc is the default.
-- quick fix to:
select * from t order by a;with t(x) as (select 1, 2, 3)
select sum(x > 1) from t;Query 1 ERROR at Line 2: : ERROR: function sum(boolean) does not exist
LINE 2: select sum(x > 1) from t;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
should instead offer an autofix to change it to a case statement:
with t(x) as (select 1, 2, 3)
select sum(case when x > 1 then 1 else 0 end) from t;rel: https://duckdb.org/2025/03/06/gems-of-duckdb-1-2.html
suggest using an aggregate or grouping by
Provide options to select from in quick fix
create table t (t_id int, name text, created timestampz);
create table u (u_id int, t_id int, created timestampz);
select name, created from u join t using (t_id);
-- ^^^^^^^ error: ambiguous column name `created`, prefix with either `t` or `u`
-- action: Prefix with...
-- Prefix with `t`
-- Prefix with `u`
-- gives
select name, u.created from u join t using (t_id);Check for invalid literals
select '{"foo": 1,}'::json;
-- ^ invalid json, unexpected trailing commacreate type foo as (
a int8,
b int8
);
select foo '(1)';
-- ^ malformed record literal, missing column bselect '[1,,2)'::numrange;
-- ^ malformed range literal, extra comma
create table reservation (room int, during tsrange);
insert into reservation values
(1108, '[2010-01-01 14:30,, 2010-01-01 15:30)');
-- ^ malformed range literal, extra comma
create table t (
a int,
b int,
c int
);
select a, b c from t;
-- ^warn: column label takes the same name as a table columnselect u.* from users u
where 0 = (select count(*) from addresses a where a.user_id = u.id);
-- ^^^warn: NOT EXISTS instead of comparing against zero
-- instead:
select u.* from users u
where NOT EXISTS (select from addresses a where a.user_id = u.id);via: https://www.depesz.com/2024/12/01/sql-best-practices-dont-compare-count-with-0/
via: https://www.postgresql.org/docs/17/sql-createview.html
CREATE VIEW vista AS SELECT 'Hello World';is bad form because the column name defaults to
?column?; also, the column data type defaults to text, which might not be what you wanted. Better style for a string literal in a view's result is something like:CREATE VIEW vista AS SELECT text 'Hello World' AS hello;
DROP GROUP is now an alias for DROP ROLE.
https://www.postgresql.org/docs/17/sql-dropgroup.html
SELECT * INTO films_recent FROM films WHERE date_prod >= '2002-01-01';
-- becomes
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';CREATE TABLE AS is functionally similar to SELECT INTO. CREATE TABLE AS is the recommended syntax, since this form of SELECT INTO is not available in ECPG or PL/pgSQL, because they interpret the INTO clause differently. Furthermore, CREATE TABLE AS offers a superset of the functionality provided by SELECT INTO.
In contrast to CREATE TABLE AS, SELECT INTO does not allow specifying properties like a table's access method with USING method or the table's tablespace with TABLESPACE tablespace_name. Use CREATE TABLE AS if necessary. Therefore, the default table access method is chosen for the new table. See default_table_access_method for more information.
via https://www.postgresql.org/docs/17/sql-selectinto.html
https://sql-info.de/postgresql/postgres-gotchas.html#1_4
select 256 * 256 * 256 * 256;
-- Query 1 ERROR at Line 1: : ERROR: integer out of rangeselect 256::int8 * 256 * 256 * 256;
-- ?column?
-- 4294967296https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html
select name username, email, "weird-column-name" from bar
-- ^$ option+clicknavigates to schema definition file or if setup, navigates to the the source code from a user provided command
select json_array_length('["foo", "bar", "buzz"]');
-- ^$ option+clickfor builtins, we return the same stuff as:
select pg_get_functiondef('pg_catalog.json_array_length'::regproc)so in this case, we'd get:
CREATE OR REPLACE FUNCTION pg_catalog.json_array_length(json)
RETURNS integer
LANGUAGE internal
IMMUTABLE PARALLEL SAFE STRICT
AS $function$json_array_length$function$maybe simplify to:
create function pg_catalog.json_array_length(
json
) returns integer
language internal immutable parallel safe strict
as $$json_array_length$$non-builtin:
select inc(10)
-- ^$ option+clickgives:
create function inc(int) returns int
language sql immutable
as 'select $1 + 1';if we call that postgres function to get the def we get:
CREATE OR REPLACE FUNCTION public.inc(integer)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $function$SELECT $1 + 1$function$create function pg_catalog.jsonb_extract_path(
from_json jsonb,
variadic path_elems text[]
) returns jsonb
language internal immutable parallel safe strict
as $$jsonb_extract_path$$select * from json_table(
jsonb '[1,2,3]',
'$[*] ? (@ < $x)'
-- ^^ go to def jumps to the def in the passing clause
passing 10 as x, 3 as y
-- ^ find references
-- ^ find references
columns (a text format json path '$ ? (@ < $y)')
-- references json_table arg ^
-- also references passing clause ^^
);-- via https://www.postgresql.org/docs/current/datatype-oid.html
nextval('foo') -- operates on sequence foo
nextval('FOO') -- same as above
nextval('"Foo"') -- operates on sequence Foo
nextval('myschema.foo') -- operates on myschema.foo
nextval('"myschema".foo') -- same as above
nextval('foo') -- searches search path for fooselect id, email, created_at from users
join login_att
-- ^$:suggestsuggests
select id, email, created_at from users
join login_attempts on login_attempts.user_id = users.idor maybe it does the snippet style thing:
select id, email, created_at from users
join login_attempts on login_attempts.$0 = $1data grips has nice snippet support: https://www.jetbrains.com/datagrip/features/editor.html#live
We should make writing and using CTEs easy since they're common in analytical queries
WITH customer_totals AS (
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
)
SELECT c.customer_id, c.total_amount
FROM customer_totals c
WHERE c.total_amount > 1000;create table bar (name varchar(255), email text, "weird-column-name" boolean);
select name username, email, "weird-column-name" from bar;
-- ^$ hovergives:
-- size = 24 (0x18)
name: nullable varchar(255)
--------------------------------------
The name of the user in the system.
rust analyzer gives:
squawk_parser::event::Event
Error { pub(crate) msg: String, }
size = 24 (0x18), align = 0x8, needs Drop
other fields?
-
alignment
-
common values and distribution
-
index size on hover
-
data staleness -- if you have a daily batch job to calculate data, we could expose the staleness date
another example:
select name username, email, "weird-column-name" from bar
-- ^$ hovergives:
-- size = 48 (0x30)
-- storage = 150 million rows, 500GB
create table user (
id integer seq_123123,
name varchar(255),
email varchar(125)
)
--------------------------------
Users of the platform.another example:
SELECT customer_id, SUM(amount) AS total_amount, min(created_at)
-- ^$ hover
FROM orders-- sum of expression across all input values
sum(expression smallint | int) -> bigint
sum(expression bigint) -> numeric
sum(expression double precision) -> double precision
sum(expression real) -> real
sum(expression interval) -> interval
sum(expression numeric) -> numericunnset
select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']);
-- ^$ hover-- Expands multiple arrays (possibly of different data types) into a set of rows. If the arrays are not all the same length then the shorter ones are padded with NULLs. This form is only allowed in a query's FROM clause;
unnest(anyarray, anyarray [, ... ] ) -> setof anyelement, anyelement [, ... ]select * from unnest(ARRAY[1,2]);
-- ^$ hover-- Expands an array into a set of rows. The array's elements are read out in storage order.
unnest(anyarray) -> setof anyelementanother example:
SELECT customer_id, sum(cost) from cpus
group by 1;
-- ^$ hoverfield: cpus.customer_id
type: string
-- id of the customer that rents the CPU
case expr
select * from (select case
-- ^$ hover
when random() > 0.5 then
true
else
false
end)case boolean
unnest
select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']);
-- ^$ hoverunnest integer,
unnest text
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
Aka a non-semantic version
- Monaco Syntax Highlighting
replicate what we have in WASM in IDE
also show lex command
- datagrip live templates
- insert
- select
- create table
- postgresql-snippets
select * from bar
-- ^$ action:rename-alias
-- becomes after filling in alias name with `b`
select b.* from bar banother example:
select name, email from bar
-- ^$ action:rename-alias
-- becomes after filling in alias name with `b`
select b.name, b.email from barselect * from (
select a, b from t where x > 10
);
-- ^introduce alias
-- becomes
select * from (
select a, b from t where x > 10
) as foo;should prompt for table name for each entry when there is an ambiguous column
related:
select t.a from t;
-- becomes (with user input for value)
select t2.a from t t2;select 1 > 2;
-- ^ flip >
select 1 < 2;
select 1 = 2;
-- ^ flip =
select 2 = 1;
-- etc, etc.select * from t join u using (u_id);
-- becomes
select * from u join t using (u_id);select * from t where a = 1 or a = 2;
-- becomes
select * from t where a in (1, 2);select * from t join u using (g_id);select * from t join u on t.g_id = u.g_id;table t;
-- ^$ action: convert to select
select * from t;
-- ^$ action: convert to tableexplain analyze insert into t values (1);
-- ^$ action: wrap possible side effect in a transaction
-- becomes
begin;
explain analyze insert into t values (1);
rollback;Important
Keep in mind that the statement is actually executed when the
ANALYZEoption is used. AlthoughEXPLAINwill discard any output that aSELECTwould return, other side effects of the statement will happen as usual. If you wish to useEXPLAIN ANALYZEon anINSERT,UPDATE,DELETE,MERGE,CREATE TABLE AS, orEXECUTEstatement without letting the command affect your data, use this approach:BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;
via: https://www.postgresql.org/docs/17/sql-explain.html
select * from bar
-- ^$ action:expandbecomes
select bar.name, bar.email, bar.buzz, bar.foo, bar."weird-column-name" from baror maybe
select name, email, buzz, foo, "weird-column-name" from barrelated:
select a, b from t;
-- ^ unknown table, quick fix: create table
-- becomes
create table t (a int not null, b int not null);
select a, b from t;We should have markers so you can quickly tab through and fill in the type and nullability for each field.
select name, email, buzz, foo, "weird-column-name" from bar
-- ^$ action:renamebecomes:
select name as username, email, buzz, foo, "weird-column-name" from baror maybe:
select name username, email, buzz, foo, "weird-column-name" from barif the quoted ident doesn't exist, the user probably meant to use single quotes to create a string literal. Add an auto fix suggestion.
select foo, "a" from t;
-- ^^^ unknown column `"a"`, did you mean to convert this to write a string literal?
-- ^^^ Quick Fix: convert to string literal
-- gives
select foo, 'a' from t;select "x" from "t";
-- ^ Quick Fix: unquotegives
select x from "t";and vice versa:
select x from "t";
-- ^ Quick Fix: quotegives:
select "x" from "t";Note: there's some gotchas with this that we need to handle:
-- okay
with t("X") as (select 1)
select "X" from t;
-- err
with t("X") as (select 1)
select X from t;
-- err
with t("X") as (select 1)
select x from t;or invalid column names:
-- ok
with t("a-b") as (select 1)
select "a-b" from t;
-- err
with t("a-b") as (select 1)
select a-b from t;
-- Query 1 ERROR at Line 2: ERROR: column "a" does not exist
-- LINE 2: select a-b from t;select 1 in [1,2,3];
-- or
select 1 in array[1,2,3];
-- become
select 1 = any(array[1,2,3]);select 1 in [1,2,3];
-- or
select 1 in array[1,2,3];
-- become
select 1 in (1,2,3);select * from (select 1);
-- becomes (with user input for value)
with t as (select 1)
select * from t;with t as (select 1)
select * from t;
-- becomes
select * from (select 1) as t;
with t(a) as (select 1)
select * from t;
-- becomes
select * from (select 1) as t(a);create table foo.t(a int);
select a from t;
-- ^?
-- becomes
select t.a from t;create table t(a int);
select a from t;
-- ^?
-- becomes
select a from public.t;select a, b from t where x > 10;
-- ^ quickfix:convert to subquery
-- becomes
select * from (
select a, b from t where x > 10
);datagrip has good support for these.
Column Names
insert into t values (/* column_a: */ 1, /* column_c: */ 'foo');create view v(a, b) as select /* a: */ 1, /* b: */ 'foo';create view v(a, b) as select /* a,y: */ * from foo;with t(x, y) as (
select /* x: */ 1, /* y: */ 2
)
select x, y from t;create table t(id int, a text, b int);
create table u(id int, a text, b int);
select * from t
union
select /* id: */ 1, /* a: */ 'x', /* b: */ 'y' from u;Function param names
create function foo(x int) returns text
as $$select 'foo'$$ language sql;
select foo(/* x: */ 'foo');Join cardinality
See datagrip docs
select * from t join u /* 1<->1..n: */ using (user_id);
select * from t left join u /* 1<->0..n: */ using (user_id);
select * from t inner join u /* 1<->0..n: */ using (user_id);
select * from t full join u /* 0..n<->1: */ using (user_id);