Getting `fmt` to the finish-line #2757

max-sixty · 2023-06-07T17:22:04Z

What's up?

After @aljazerzen built the foundations of this, I have filled a few gaps, and now all the examples format into correct PRQL. But we still have a few gaps in getting to something we can use everywhere.

In order to roll this out fully, it's quite important to be close to 100%, since all PRQL will be formatted this way — in the book, whenever someone saves a file in VS Code, whenever someone runs pre-commit (both in our repo and others'), etc. Without being close to 100%, it's not that instrumentally useful.

- We currently remove comments!
- Do we want to elide parentheses around function calls in {x = (sum foo)}? (From feat: Format UnOp correctly #2803 (comment).) This is in the docs. It might require understanding the grandparent node (need to think more).
- module breaks, which breaks the standard library (though I actually thought we were only going to do files as modules, @aljazerzen?). feat: codegen for module, type and annotations #2949
- Do we want from t=tracks or from t = tracks? I marginally preferred the former: given that we have fairly few separators, having tighter expressions groups things better. (But this is not a strong view, maybe +0.3, and only aesthetics.) style: Use foo=bar style in codegen #2779
- Do we want to break inner transforms at all? E.g. this is a lot on one line. But I think my reaction is mostly based on what I'm used to, and I don't think there would be a good rule that lets us handle inner transforms differently. Though we could linebreak earlier (e.g. line break at 40).

  prql/prql-compiler/tests/integration/snapshots/integration__fmt@distinct_on.prql.snap

  Line 8 in 31c3e71

  group {genre_id, media_type_id} (sort {(-album_id)} | take 1)

  feat: improve codegen #2950
- {(-foo)} has extra parentheses:

  prql/prql-compiler/tests/integration/snapshots/integration__fmt@distinct_on.prql.snap

  Lines 8 to 9 in 31c3e71

  group {genre_id, media_type_id} (sort {(-album_id)} | take 1)

  sort {(-genre_id), media_type_id}

  feat: Format UnOp correctly #2803

aljazerzen · 2023-06-07T18:10:20Z

> module breaks, which breaks the standard library

We use it only internally, because I don't want to have 7 different files with 5 lines in each.

> Do we want from t=tracks or from t = tracks?

I'm +0.5 on t=tracks.

> Do we want to break inner transforms at all?

Right now, they are broken based on width only. We can add other heuristics, for example "more than two elements of a pipeline will always break into multiple lines".
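
To make the two rules concrete, here is a minimal Python sketch (not the actual codegen, which lives in the Rust compiler). It assumes each stage of the inner pipeline has already been formatted to a string; `format_pipeline`, `MAX_WIDTH` and `MAX_INLINE_STAGES` are hypothetical names and values chosen for illustration.

```python
from typing import List

MAX_WIDTH = 80          # hypothetical line-width limit
MAX_INLINE_STAGES = 2   # hypothetical: more stages than this always break


def format_pipeline(stages: List[str], indent: int = 0,
                    max_width: int = MAX_WIDTH) -> str:
    """Format an inner pipeline from its already-formatted stages."""
    one_line = "(" + " | ".join(stages) + ")"
    fits = indent + len(one_line) <= max_width
    # Keep it inline only if it fits AND it has few enough stages.
    if fits and len(stages) <= MAX_INLINE_STAGES:
        return one_line
    # Otherwise put each stage on its own line, indented by two spaces.
    pad = " " * (indent + 2)
    body = "\n".join(pad + stage for stage in stages)
    return "(\n" + body + "\n" + " " * indent + ")"
```

With these assumed limits, `(sort {-album_id} | take 1)` stays on one line, while any pipeline with three or more stages is split regardless of width.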

max-sixty · 2023-06-10T07:54:10Z

> I'm +0.5 on t=tracks.

#2779

max-sixty · 2023-06-15T06:37:07Z

One big gap I just realized — comments are erased!

This is something we have to think about — we want to retain some aesthetics; for example:

from t  # comment about `t`
# comment about the select
select f

...these two comments are in the same place as far as AST nodes go — but can't be treated the same.
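
As a rough illustration of the distinction, the Python sketch below records, for each comment, whether code precedes it on the same line. It is a naive text scan rather than anything in prqlc; `Comment` and `find_comments` are hypothetical names, and it assumes `#` never appears inside a string literal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    line: int        # 0-based source line
    text: str        # comment text, including the leading '#'
    trailing: bool   # True if code precedes the comment on the same line


def find_comments(source: str) -> List[Comment]:
    """Naive scan: treats the first '#' on each line as the comment start.

    Assumption: no '#' inside string literals; a real lexer would handle that.
    """
    comments = []
    for i, line in enumerate(source.splitlines()):
        pos = line.find("#")
        if pos == -1:
            continue
        has_code_before = line[:pos].strip() != ""
        comments.append(Comment(i, line[pos:].rstrip(), has_code_before))
    return comments
```

For the example above, the first comment comes out as trailing (it belongs with `from t`) and the second as an own-line comment (it belongs with the `select` that follows).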

aljazerzen · 2023-06-15T10:16:01Z

Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

max-sixty · 2023-06-15T19:14:59Z

> Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

Interesting link, thanks for sharing. One option would be — since we're also in control of the parser — to have comments in the initial AST. During compilation, we could then run through a cheap remove_comments function...
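
A toy Python sketch of that idea, under the assumption that comments are stored as ordinary children of AST nodes. `Node`, `CommentNode` and this `remove_comments` are hypothetical stand-ins, not the compiler's actual types.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class CommentNode:
    text: str


@dataclass
class Node:
    name: str
    children: List[Union["Node", CommentNode]] = field(default_factory=list)


def remove_comments(node: Node) -> Node:
    """Return a copy of the tree with every CommentNode dropped."""
    kept = [remove_comments(child) for child in node.children
            if isinstance(child, Node)]
    return Node(node.name, kept)
```

The formatter would work on the tree with comments still attached, while compilation would call `remove_comments` first; the pass is a single walk over the tree and leaves the input untouched.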

max-sixty · 2023-06-15T19:27:44Z

To share my thinking — part of the reason for spending time here is: without getting all the way there, it's not that instrumentally useful — to auto-format files, we need it to produce reasonable PRQL.

It doesn't have to be completely perfect — we're OK with small line-length changes etc — but it needs to not lose information (e.g. comments), and be acceptable PRQL.

max-sixty · 2023-06-15T19:37:10Z

(I edited this a few times, was working through it in my own mind...)

I just added this to the description:

> Do we want to elide parentheses around function calls in {x = (sum foo)}? (From feat: Format UnOp correctly #2803 (comment).) This is in the docs. It might require understanding the grandparent node (need to think more).

As a reminder, the reason we require parentheses around function calls in select x = (sum foo) but not in {x = sum foo} is that the latter has an = after the first ident, and so isn't a function call.

x is aliased to sum foo

select x = (sum foo)

...but here, x is aliased to sum and then select receives a single arg of foo...

select x = sum foo  # wrong!

e is aliased to employees (not to employees (==id))

join e=employees (==id)

Here, x can't be a function call, since after the initial ident there's a =

derive {
  x = sum foo
}

I'm not sure it's possible to elide the parentheses in both without changing the syntax between an assign & alias, or resolving much later in the compilation pipeline
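
For reference, the rule as it stands (parentheses required when the alias is a direct argument of a function call, optional inside a tuple) could be sketched in Python like this. `format_alias` and its flags are hypothetical; the real decision would have to be made from the surrounding AST nodes.

```python
def format_alias(name: str, value: str, value_is_call: bool, in_tuple: bool) -> str:
    """Render `name = value`, parenthesizing a function call unless inside a tuple."""
    if value_is_call and not in_tuple:
        # e.g. `select x = (sum foo)`: without parentheses, x would alias `sum` only
        value = f"({value})"
    return f"{name} = {value}"
```

This only covers the tuple case; whether the parentheses can also be dropped outside a tuple is exactly the open question above.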

aljazerzen · 2023-06-16T06:30:54Z

To have a proper formatter, we do need to fix the comments. Your approach with having comments in AST would probably not work well, because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

max-sixty · 2023-06-16T06:40:08Z

> because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

Hmmm, if I look at the output of rustfmt, it seems to retain:

- Same line vs. next line
- Whether there's a linebreak

...but that's all: can't have different indentations, can't have more than one linebreak!

(this might still be too much to store in the AST)

> But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

Sure, that's nice :) But much less impactful than being able to format on every save!

aljazerzen · 2023-07-04T14:28:49Z

> Do we want to break inner transforms at all? e.g. this is a lot on one line. But I think my reaction is mostly based on what I'm used to, and I don't think there would be a good rule that lets us handle inner transforms differently. Though we could linebreak earlier

Codegen framework is working only partially, I'm fixing it.

max-sixty · 2024-01-16T17:47:15Z

FYI I looked at adding comments to the lexer (not the parser), so we could use that to grab the comments. (#4094 was some of this)

It's not difficult in the lexer, but it would involve having lots of .then_ignore(comment.or_not()) throughout the parser (which isn't a herculean effort, but I didn't want to do without being confident we wanted to go this path). Another option would be to remove comments from the lexer output, before passing it into the Parser.
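
The second option (filtering comments out of the lexer output before parsing) might look roughly like the Python sketch below. The `Token` type and its `kind` strings are hypothetical and are not prqlc's lexer types; the point is only that the parser sees a comment-free stream while the comments are kept aside with their original positions.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str   # e.g. "ident", "comment", "newline" (hypothetical kinds)
    text: str


def split_comments(
    tokens: List[Token],
) -> Tuple[List[Token], List[Tuple[int, Token]]]:
    """Separate comment tokens from the stream the parser will see.

    Each comment is returned with the index it had in the original stream,
    so a formatter could later try to re-attach it.
    """
    code: List[Token] = []
    comments: List[Tuple[int, Token]] = []
    for i, tok in enumerate(tokens):
        if tok.kind == "comment":
            comments.append((i, tok))
        else:
            code.append(tok)
    return code, comments
```

This avoids touching the parser at all, at the cost of having to map the saved indices back onto the formatted output afterwards.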

If we can get comments working, then we're within spitting distance of prqlc fmt working. Once it's working, that'll be a big achievement!

aljazerzen · 2024-01-16T18:44:21Z

For future reference: I've posted a few screenshots with general outline of how it is possible to implement formatting with comments on Discord.

Also, I've ticked off a few things from the checklist:

- module is now properly formatted,
- we decided we want from t = tracks,
- inner pipelines now get split onto multiple lines (when needed).

max-sixty · 2024-01-18T23:59:16Z

I've done a few PRs to add important whitespace to the lexer — I think now complete. We still need to decide how we add that to the output (@aljazerzen gave a few options on discord).

Another thing we might want to add is a way of turning off formatting. That could be as simple as a comment such as # fmt:off, and plausibly # fmt:on.
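
A minimal Python sketch of how such markers could work, assuming the formatter is available as a function from source text to formatted text. `format_with_pragmas` and `format_chunk` are hypothetical names, and the sketch assumes the markers fall between complete statements so each chunk can be formatted on its own.

```python
from typing import Callable, List


def format_with_pragmas(source: str, format_chunk: Callable[[str], str]) -> str:
    """Apply format_chunk to everything outside '# fmt:off' ... '# fmt:on' regions."""
    out: List[str] = []
    pending: List[str] = []   # lines waiting to be formatted
    enabled = True

    def flush() -> None:
        # Format the lines collected so far as one chunk.
        if pending:
            out.append(format_chunk("\n".join(pending)))
            pending.clear()

    for line in source.splitlines():
        marker = line.strip()
        if enabled and marker == "# fmt:off":
            flush()
            enabled = False
            out.append(line)
        elif not enabled and marker == "# fmt:on":
            enabled = True
            out.append(line)
        elif enabled:
            pending.append(line)
        else:
            out.append(line)   # inside an off region: copy verbatim
    flush()
    return "\n".join(out)
```

If `# fmt:on` never appears, everything after `# fmt:off` is left untouched, which matches the "plausibly `# fmt:on`" framing above.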

max-sixty added help wanted friendliness labels Jun 7, 2023

This was referenced Jun 16, 2023

feat: Format UnOp correctly #2803

Merged

feat: elide parentheses in aliases in tuples #2851

Merged

max-sixty added the priority label Jan 16, 2024
