Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting fmt to the finish-line #2757

Open
5 of 6 tasks
max-sixty opened this issue Jun 7, 2023 · 13 comments
Open
5 of 6 tasks

Getting fmt to the finish-line #2757

max-sixty opened this issue Jun 7, 2023 · 13 comments

Comments

@max-sixty
Copy link
Member

max-sixty commented Jun 7, 2023

What's up?

After @aljazerzen built the foundations of this, I have filled a few gaps, and now all the examples format into correct PRQL. But we still have a few gaps in getting to something we can use everywhere.

In order to roll this out fully, it's quite important to be close to 100%, since all PRQL will be formatted this way — in the book, whenever someone saves a file in VS Code, whenever someone runs pre-commit (both in our repo and others'), etc. Without being close to 100%, it's not that instrumentally useful.

@aljazerzen
Copy link
Member

module breaks, which breaks the standard library

We use it only internally, because I don't want to have 7 different files with 5 lines in each.

Do we want from t=tracks or from t = tracks?

I'm +0.5 on t=tracks.

Do we want to break inner transforms at all?

Right now, they are broken based on width only. We can add other heuristics, for example "more than two elements of a pipeline will always break into multiple lines".

@max-sixty
Copy link
Member Author

I'm +0.5 on t=tracks.

#2779

@max-sixty
Copy link
Member Author

max-sixty commented Jun 15, 2023

One big gap I just realized — comments are erased!

This is something we have to think about — we want to retain some aesthetics; for example:

from t  # comment about `t`
# comment about the select
select f

...these two comments are in the same place as far as AST nodes go — but can't be treated the same.

@aljazerzen
Copy link
Member

Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

@max-sixty
Copy link
Member Author

Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

Interesting link, thanks for sharing. One option would be — since we're also in control of the parser — to have comments in the initial AST. During compilation, we could then run through a cheap remove_comments function...

@max-sixty
Copy link
Member Author

To share my thinking — part of the reason for spending time here is: without getting all the way there, it's not that instrumentally useful — to auto-format files, we need it to produce reasonable PRQL.

It doesn't have to be completely perfect — we're OK with small line-length changes etc — but it needs to not lose information (e.g. comments), and be acceptable PRQL.

@max-sixty
Copy link
Member Author

max-sixty commented Jun 15, 2023

(I edited this a few times, was working through it in my own mind...)

I just added this to the description:

As a reminder, the reason we require parentheses around function calls in select x = (sum foo) but not in {x = sum foo} is that the latter has an = after the first ident, and so isn't a function call.

x is aliased to sum foo

select x = (sum foo)

...but here, x is aliased to sum and then select receives a single arg of foo...

select x = sum foo  # wrong!

e is aliased to employees (not to employees (==id))

join e=employees (==id)

Here, x can't be a function call, since after the initial ident there's a =

derive {
  x = sum foo
}

I'm not sure it's possible to elide the parentheses in both without changing the syntax between an assign & alias, or resolving much later in the compilation pipeline

@aljazerzen
Copy link
Member

To have a proper formatter, we do need to fix the comments. Your approach with having comments in AST would probably not work well, because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

@max-sixty
Copy link
Member Author

because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

Hmmm, if I look at the output of rustfmt, it seems to retain:

  • Same line vs. next line
  • Whether there's a linebreak

...but that's all — can't have different identations, can't have more than one linebreak!

(this might still be too much to store in the AST)


But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

Sure, that's nice :) But much less impactful than being able to format on every save!

@aljazerzen
Copy link
Member

Do we want to break inner transforms at all? e.g. this is a lot on one line. But I think my reaction is mostly based on what I'm used to, and I don't think there would be a good rule that lets us handle inner transforms differently. Though we could linebreak earlier

Codegen framework is working only partially, I'm fixing it.

@max-sixty
Copy link
Member Author

FYI I looked at adding comments to the lexer (not the parser), so we could use that to grab the comments. (#4094 was some of this)

It's not difficult in the lexer, but it would involve having lots of .then_ignore(comment.or_not()) throughout the parser (which isn't a herculean effort, but I didn't want to do without being confident we wanted to go this path). Another option would be to remove comments from the lexer output, before passing it into the Parser.

If we can get comments working, they we're in spitting distance of prqlc fmt working. Once it's working then that'll be a big achievement!

@aljazerzen
Copy link
Member

For future reference: I've posted a few screenshots with general outline of how it is possible to implement formatting with comments on Discord.

Also, I've ticked off a few things from the checklist:

  • module is now properly formatted,
  • we decided we want from t = tracks
  • inner pipelines now get split onto multiple lines (when needed)

@max-sixty
Copy link
Member Author

I've done a few PRs to add important whitespace to the lexer — I think now complete. We still need to decide how we add that to the output (@aljazerzen gave a few options on discord).


Another thing we might want to add is a way of turning off formatting. That could be as simple as a comment such as # fmt:off, and plausibly # fmt:on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants