Skip to content

Rework oxc_prettier #5068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Boshen opened this issue Aug 22, 2024 · 43 comments
Open

Rework oxc_prettier #5068

Boshen opened this issue Aug 22, 2024 · 43 comments
Assignees
Labels
C-enhancement Category - New feature or request E-Help Wanted Experience level - For the experienced collaborators

Comments

@Boshen
Copy link
Member

Boshen commented Aug 22, 2024

Note

We have decided to fork the Biome formatter code and rework as oxc_formatter.
@Dunqing is now actively working on this to improve compatibility and resolve #10179.

Original description

crates/oxc_prettier was my attempt at the prettier bounty.

I thought I could finish it in time, except the fact that I rushed too quickly without looking at all the requirements ... It was too late when I got blocked by printing comments.

In order to rework oxc_prettier, we need to understand at least:

As for the infrastructure, we already have most of the code:

Feel free to remove everything and start from scratch, and copy over the format code https://github.com/oxc-project/oxc/tree/main/crates/oxc_prettier/src/format

@Boshen Boshen added C-enhancement Category - New feature or request A-prettier E-Help Wanted Experience level - For the experienced collaborators labels Aug 22, 2024
@parkin-lin

This comment was marked as off-topic.

@DonIsaac

This comment was marked as off-topic.

@Boshen
Copy link
Member Author

Boshen commented Sep 4, 2024

@leaysgur is writing a series of articles in preparation of this task:


I'm also working on comment attachments to unblock prettier.

@Boshen Boshen pinned this issue Sep 4, 2024
@leaysgur

This comment has been minimized.

@Boshen

This comment has been minimized.

@IWANABETHATGUY
Copy link
Contributor

IWANABETHATGUY commented Sep 7, 2024

For those who are interested in algorithms under the hood, prettier is based on https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf,
https://prettier.io/docs/en/technical-details

@leaysgur
Copy link
Member

It has been 3 weeks since I started reading the Prettier source code.
It's still far from being complete, but I'd like to leave some progress and summary here.

How to debug Prettier

There are 3 ways:

  • Playground
    • Enable "Debug" options in the sidebar
  • CLI
    • With --debug-* args
  • Node.js API
    • Under __debug exports

https://leaysgur.github.io/posts/2024/09/03/111109/

It is written in Japanese, but it is all code, so you can understand it. 😉

I also recommend to run node --inspect-brk the code with debugger and inspect it from Chrome DevTools.

How to handle comments

I will post some topics for discussion in a few days.

@leaysgur
Copy link
Member

How to handle comments

As you may know, Prettier's formatting process consists of roughly 3 phases:

  • P1: original text > AST
  • P2: AST > Doc
  • P3: Doc > formatted text

Comments are collected in P1 and used in P2.

In P1:

  • It simply parses to the AST with comments in the output
    • However, there are also some adjustments made to the AST nodes
    • Some parsers, such as Babel, already attach comments to nodes at this stage
      • However, Prettier does not use them
      • The reason is to support parsers other than Babel
  • As the final step of P1 (more technically, the beginning of P2), comments are classified and attached to nodes
    • First, it starts by finding nearby nodes for each comment
    • Based on that, it determines the placement(ownLine, endOfLine, remaining) from the lines before and after each comment
    • Then, it handles about 30(!) known special patterns for each placement
    • Finally, it finishes using unique tie-breaking algorithm

As a result, some AST nodes have comments property with array of Comment extended with leading, trailing and few more props.

In P2 (I haven’t read the code in detail here yet),

  • When each node is converted into a Doc, comments are also converted into Docs
    • Therefore, how they are output seems to have already been decided in P1

In OXC, part of the necessary information is already implemented and can be obtained. / #5785

However, just like with Babel, that information may be different from what Prettier requires...


So, I think I’ve generally understood "what" Prettier is doing.

However, as for "why" Prettier does it that way, I can only say it’s because that’s Prettier’s opinion.

Incidentally, there seem to be at least around 120 issues related to JS/TS at the moment, but

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug

about 50 of them are related to comments, with some remaining unresolved since as far back as 2017.

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug+label%3Aarea%3Acomments

@leaysgur

This comment has been minimized.

@Boshen

This comment has been minimized.

@Boshen

This comment has been minimized.

@leaysgur

This comment has been minimized.

@magic-akari
Copy link
Contributor

For the long run, I envision a more configurable and less opinionated formatter.

Does this mean that oxc_prettier will provide a wide range of configurable options, and offer a preset called prettier to match with prettier?

@leaysgur
Copy link
Member

How to handle comments

(Follow up of #5068 (comment))

As I posted above, comments are collected and attached to AST nodes in P1.
Then in P2, comments are printed as Doc along with other AST nodes.

Most comments are printed with their attached nodes like:

[leadingCommentsDoc, nodeDoc]
// or
[nodeDoc, trailingCommentsDoc]

https://github.com/prettier/prettier/blob/e3b8a16c7fa247db02c483fb86fc2049d45763b8/src/main/ast-to-doc.js#L128-L138

But the rest of the comments are handled as needed.

  • Dangling comments
  • Not dangling(leading or trailing) but need special treatment

There are about 40 files for printing ESTree AST to Doc.

https://github.com/prettier/prettier/tree/main/src/language-js/print

And 15 files of them print comments on demand.

❯ rg 'print(Dangling)?Comments(Separately)?' -l src/language-js/print
estree.js
class.js
type-annotation.js
function-parameters.js
mapped-type.js
component.js
module.js
function.js
ternary.js
array.js
binaryish.js
property.js
call-arguments.js
block.js
jsx.js
arrow-function.js
object.js
member-chain.js
type-parameters.js

@Boshen

This comment has been minimized.

@leaysgur

This comment has been minimized.

@Boshen

This comment has been minimized.

@srijan-paul

This comment has been minimized.

@pumano

This comment has been minimized.

@YurySolovyov

This comment has been minimized.

@leaysgur

This comment has been minimized.

@leaysgur

This comment has been minimized.

@Boshen

This comment has been minimized.

@leaysgur

This comment has been minimized.

@Boshen
Copy link
Member Author

Boshen commented Nov 25, 2024

@leaysgur will lead this project, and is free to make any changes to the oxc_prettier crate.

Boshen pushed a commit that referenced this issue Feb 6, 2025
Part of #5068 

Verified and completed `print/array.rs`, except for comment handling.
Boshen pushed a commit that referenced this issue Feb 6, 2025
Part of #5068 

Cosmetic changes only.
Boshen pushed a commit that referenced this issue Feb 10, 2025
Part of #5068 

Update `doc.to_string()` output to `Prettier.__debug.formatDoc()`
compatible Doc AST json format.

```sh
# Usecase
cargo run -p oxc_prettier --example prettier --quiet -- --debug | jq .

# Advanced
cargo run -p oxc_prettier --example prettier --quiet -- --debug | pbcopy
# Open Prettier playground, select doc-explorer as parser option, then paste as input!
```
Boshen pushed a commit that referenced this issue Feb 12, 2025
Part of #5068 

Full rewrite `print/object`, slight improvement. 😇
Boshen pushed a commit that referenced this issue Feb 14, 2025
Part of #5068 

Support `objectWrap` option added in v3.5.0.

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Boshen pushed a commit that referenced this issue Feb 27, 2025
Part of #5068 

Verify and refactor `print/*`.

- Verify the main printing logic w/ refactoring
  - The `propagate_breaks()` needs to be reworked soon
- Test coverage drops slightly because of this, but the root cause is
the original `Doc` structure
- Export `print_doc_to_string` as a function
- Some functions in `format/*` require the printed result to determine
formatting...
- Update the `Doc` macro comments
- Properly escape strings in `Doc` JSON
@MichaReiser
Copy link

MichaReiser commented Feb 27, 2025

I'm probably a bit late here but what we did for ruff (Python formatter) was to vendor biome_formatter, adjust it to use our own AST (you don't want to convert between different representations for best performance). We, obviously, couldn't reuse any of the formatting logic because it's Python ;)

I also think that building/vendoring is the right choice. It's a fundamental piece of your formatter and you want to have control over it.

@leaysgur
Copy link
Member

leaysgur commented Feb 28, 2025

@MichaReiser Thanks for the very useful information!

It's not too late at all. I'm still asking myself every day whether to base on Prettier or biome_formatter. 😅

So far, I mainly checked if it is possible to use biome_formatter directly as an external crate instead of vendor(fork)ing it, and concluded that it seems not to be possible.

On top of that,

  • fork biome_formatter and start from scratch
  • or continue to improve the existing Prettier-based code

To determine which directions is more promising, I spent time to diff the current code with Prettier, examine what is missing while picking low-hanging fruit. (And it's been two months already...!)

Even though, if I want to base it on biome_formatter, just getting it to work seems to take a lot of time due to my limited time and knowledge. 🙈

The situation might be different for JS(+JSX+TS) and Python, but do you think it would also be better for OXC to be based on biome_formatter like ruff? Do you have any advice? 👀

@jycouet
Copy link

jycouet commented Feb 28, 2025

Would love to try the prototype from npm 😇

@MichaReiser
Copy link

It's not too late at all. I'm still asking myself every day whether to base on Prettier or biome_formatter. 😅

Yeah, it's a hard decision!

The situation might be different for JS(+JSX+TS) and Python, but do you think it would also be better for OXC to be based on biome_formatter like ruff? Do you have any advice? 👀

I can tell you why we decided to use biome_formatter. Ultimately, that's a decision you have to make for your project and I'm probably biased because I wrote most of biomejs_formatter.

  • Using biomejs_formatter gave us a huge head start because it gave us most infrastructure for free. The only thing we had to do was fix the compile errors (because of CST dependencies).
  • I authored most of biomejs_formatter. I'm obviously happy with most its design decisions and understand it well. You might not agree with some of my design decisions or have ideas to make the formatter even faster.
  • biomejs_formatter reimplements Prettier very closely (the semantics are 99.99% the same, it's mainly that I named things differently and that it's Rusty)

Our approach was to copy over biomejs_formatter and then rip out/replace everything that didn't fit well into our architecture. That worked fairly well. We ultimately also ended up introducing new IR elements that allow us to format some common Python syntax more performantly.

You could also take a middle ground and copy code more incrementally. That could also be a good learning experience. Start with the Printer and the IR (which is, by the way, the most important foundation already). Then go ahead and change the IR however you like or make changes to the Printer. The advantage of this is that you already get most of the IR semantics right (it took me a long time to reverse engineer all the semantics!) It also gives you the freedom to disagree with all my naming decisions and rename the IR elements or even decide to use an entirely different representation (struct of arrays?). You could even decide that the Printer is just not what you want and that means you have your decision not to use biomejs_formatter.

From there, you can move on and vendor more of biomejs (or decide to go different ways). Vendor the Format trait or decide that you want to use standalone functions with the signature fn format_node(node: &Node, context: &Context) -> FormatResult. Rip out the invalid-syntax error if that's something you don't want to support... You can also decide that you aren't interested in having two separate crates (biomejs_formatter and biomejs_javascript_formatter). This removes the need for all those FormatWith traits because the orphan rules no longer apply (https://github.com/astral-sh/ruff/blob/0945803427ef971724ce29565cffb58ffc06166c/crates/ruff_python_formatter/orphan_rules_in_the_formatter.svg#L16)

You can then move on to start investigating comments. Biome comes with a CommentsMap, but maybe that's not needed for OXC or there's a better representation for OXC.

I hope you find some of this helpful. Overall, I encourage you to take what you find useful and throw away/rewrite everything that you don't. You can also take a look at ruff's formatter to see how we changed biomejs_formatter to work with an AST instead of CST. Let me know if you have any questions (here or in the discord formatter channel).

@leaysgur
Copy link
Member

Thank you so much, senpai! 🥹 Your help is extremely valuable, and I really appreciate it.

It was a nice discovery for me that there is an example of using the biome_formatter code for a custom AST rather than one based on the Biome CST.

Now that I have a general idea of the approach based on Prettier, I will try to start by copying the biome_formatter code next time.

Thanks again~!

@leaysgur
Copy link
Member

leaysgur commented Mar 12, 2025

I've been busy with another matter, so there's been a slight delay, but I wanted to report my progress.

For the past few weeks, I have been working on a PoC implementation based on biome(_js)_formatter, following the kind guidance(thanks again for that! 😉). The current code is available in my repo.

https://github.com/leaysgur/oxc_formatter

As a first step, I have implemented the foundational part that performs the minimal formatting.

But at this point, I'd like to consult on the future direction. In simple terms, I want to decide whether to continue with the current Prettier-based or to go with the Biome-based.

Originally, considering future extensibility and maintainability, it seemed better to go with the already normalized Biome-based code, which is why I started this PoC.

However, while the IR and Printer could be mostly ported as-is, I'm concerned that the amount of code has turned out to be larger than expected to port IR builders and other stuff.

Furthermore, I feel that more time is needed to understand the purpose and background of them.(And for future contributors.)

Of course, I think that could be simplified with some effort, but judging is the matter, and there is no way to ensure that those changes wouldn't cause issues for now.
Also, at least for now, I am not capable of restructuring from a performance perspective centered around oxc_allocator.

@Boshen (I apologize for bothering you when you're busy, but...)
Could you take a quick look at the PoC code when you have some time and share your thoughts?

Until then, I'll continue working on the Biome-based approach. 💪🏻


FYI: This is also a WIP, but the numbers are quite interesting. 👀

https://github.com/leaysgur/prettier-plugin-oxc

@MichaReiser
Copy link

However, while the IR and Printer could be mostly ported as-is, I'm concerned that the amount of code has turned out to be larger than expected to port IR builders.

Can you tell more about which parts your concerned? Or what has turned out to be larger than expected?

I saw a question around source text in the git repo. You may want to take a look at the FormatElements in Ruff (not a CST)

https://github.com/astral-sh/ruff/blob/b63c2e126b928fab4180eda5d9132552beeee7fe/crates/ruff_formatter/src/format_element.rs#L36-L50

I saw another question around lines_before. The way we solve this in Ruff is by doing ad-hoc re-lexing. This is similar to what Prettier does

https://github.com/astral-sh/ruff/blob/c9fdb1f5e3a4d0d60b4537741f2c9c19e2426038/crates/ruff_python_trivia/src/tokenizer.rs#L66-L113

FYI: This is also a WIP, but the numbers are quite interesting. 👀

This is interesting indeed and might be a good and very easy improvement for downstream users

@leaysgur
Copy link
Member

Thanks as always!

Can you tell more about which parts you're concerned about? Or what has turned out to be larger than expected?

It was mainly about things related to the Buffer trait and BufferExtensions, BufferSnapshot, Recording, etc.

However, on second thought, this impression might just be due to the fact that I'm just feeling unfamiliar with these mechanisms because they are not present in Prettier (right?). And as soon as I encounter real use cases, I might understand their necessity.

And as for the total amount of code, it might just be that I’m not very familiar with Rust yet, and for experienced developers it might not be a significant concern at all.

I saw a question around source text

Sorry, this is my skill issue too. 😓

When I attempted to include source_text: &'a str in FormatContext, I ran into numerous lifetime errors related to Buffer trait, so I just postponed it.

In any case, I think I need to spend a little more time on it for myself.
While it’s fine for me to invest my time, I realize that the main team have their own priorities, which is why I wanted to raise this concern.

@MichaReiser
Copy link

It was mainly about things related to the Buffer trait and BufferExtensions, BufferSnapshot, Recording, etc.

Prettier doesn't have Buffer trait because each formatting function returns the formatted IR elements. This is somewhat bad for performance, which is why the formatter instead uses the Buffer trait where functions can write the IR elements to, which avoids allocating a Vec for each format function only to return the elements. It's very similar to rust's Write trait.

If I were you, I'd probably delete BufferExtension, BufferSnapshot and Recording for now. You'll see why they exist when you start implementing specific formatting rules. But for a short overview:

  • BufferExtension: It's mainly syntax sugar to make some convenience methods directly available on Buffer (which Formatter implements`). It follows Rust's extension trait approach
  • BufferSnapshot: I don't remember which specific syntax it is used for but I think it's lambda formatting. Either way, there's one syntax where Prettier tries to format it using one layout but it may "fail" if it realize that this layout looks poorly. The way this is implemented in Prettier is by raising an exception and catching it. In biome, this is implemented by returning a FormatError when seeing that the layout looks poorly. However, biome has an additional challenge due to using Buffer: The Buffer now contains format elements using the "poor layout" and they need to be discarded. The way this is implemented is by first taking a snapshot of the buffer (remembering how many elements were written thus far) before trying the first layout. The buffer gets restored (reset the buffer length to the number of elements before trying the first formatting) when the FormatError::PoorLayout gets caught.
  • Recording: It's sometimes necessary to inspect the IR generated by some formatting function to make a layout decision. Recording allows you to do that with low overhead (without any extra allocations). The alternative is to use Memoized but that comes at an extra cost.

@leaysgur
Copy link
Member

Thank you, I've learned a lot from you!

If I were you, I'd probably delete BufferExtension, BufferSnapshot and Recording

Actually, I hadn't copied them in the initial stage.

However, since they seemed to be used in builders related to BlockIndent that would likely be used frequently, I ended up copying them somehow.

https://github.com/biomejs/biome/blob/aaa9443c551dcea220adae1d26494b9850e1849e/crates/biome_formatter/src/builders.rs#L1574-L1583

More importantly, I'm glad I learned how FormatError is used. I now realize that I need to restore them which completly removed... 😧

@MichaReiser
Copy link

Oh, I wasn't aware that it is used in block_indent. It's mainly an ergonomics feature where it avoids writing the indent if there are no arguments. I'm not sure if I'd implement it the same again. You could consider removing it and re-add it if the explicit checks at the call sites become too annoying.

@PodaruDragos
Copy link

Hello, apologies if this has been discussed before but would the formatter be opinionated as prettier ?
Or do we can have config files to define rules for formatting ? That can also accept feedback from the community

@leaysgur
Copy link
Member

Ultimately, we talked about making it configurable/flexible.
However, as for now, we're focusing on implementing something equivalent to Prettier.

#5068 (comment)

Boshen added a commit that referenced this issue Mar 30, 2025
Part of #5068, suggested by @ematipico and @MichaReiser, where @leaysgur
and I explored using biome formatter as the base and made a success. All
credits to the old Rome and the new Biome team.

This PR is in a boilerplate state where some of the functions are marked
as `todo!()`.

I also made the following changes:

* changed the base to be less generic, removed the `Language` and
`Context` generics
* expanded the formatter lifetime to include an `'ast` lifetime
* added access to the source text and comments
* added a mechanism for tracking and accessing parent ast nodes (done by
our codegen)
* removed most of the unit tests because they no longer compile
* removed the `oxc_prettier` crate
@Boshen Boshen removed the A-prettier label Mar 30, 2025
@Boshen Boshen unpinned this issue Apr 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category - New feature or request E-Help Wanted Experience level - For the experienced collaborators
Projects
None yet
Development

No branches or pull requests