Language annotation support #343

dyegoaurelio · 2025-09-22T17:24:57Z

Initial work to get language annotation support working

This pr is not ready, but I created it mostly to get feedback on how we should handle edge cases and the direction I'm taking.

closes #179

github-actions · 2025-09-22T17:25:11Z

Nixpkgs diff

dyegoaurelio · 2025-09-22T17:25:37Z

src/Nixfmt/Lexer.hs

+isLanguageIdentifier :: Text -> Bool
+isLanguageIdentifier content =
+  let stripped = strip content
+  in not (Text.null stripped)
+      && Text.length stripped <= 30 -- TODO: make configurable or remove limit


Do we want a length limit here?

I added this logic as well as a test case for it, but I'm not sure if that's the intended behavior

Based on

The rendering engine currently ignores all line length calculations involving comments, however this strongly depends on the assumption that comment tokens are always strictly followed by a line break.
-- @piegamesde #179 (comment)

I'm assuming yes.

As IIUC, the code responsible for line wrapping will assume this token is small and not worth wrapping, since it doesn't know how long comments actually are.

It's also possible that I've completely misunderstood this though!

I don't think this is related to this comment
This is length limit for considering a inline comment a language notation block.
This is the reason this test case

nixfmt/test/diff/language-annotation/in.nix

Lines 100 to 103 in 6830219

# Edge case: very long language annotation

longAnnotation = /* this-is-a-very-long-language-annotation-that-might-affect-line-length */ ''

content

'';

Gets formatted as a regular comment

nixfmt/test/diff/language-annotation/out.nix

Lines 105 to 109 in 6830219

# Edge case: very long language annotation

longAnnotation = # this-is-a-very-long-language-annotation-that-might-affect-line-length

''

content

'';

I don't think this is related to this comment

My interpretation is the reason we want a length limit on what is considered a "language annotation" is because we can't wrap lines based on line length when the line is long because of a comment.

If we had smarter line wrapping, then we probably wouldn't need a special case for "annotations" and could simply preserve more inline comments more generally.

If we had smarter line wrapping, your test case could probably render something like:

# Edge case: very long language annotation longAnnotation = /* this-is-a-very-long-language-annotation-that-might-affect-line-length */ '' content '';

Or a similar example with some combos of long/short binding names and/or inline comments:

thisIsAVeryLongBindingNameThatWillAffectLineLengthForSure = /* short */ '' content ''; thisIsALongBindingNameButNotTooLong = /* short */ '' content ''; thisIsAnotherLongBindingNameButNotTooLong = /* medium-length-comment */ '' content '';

My argument is that the only reason we need to handle language annotations specially, is because we currently handle these kinda long comments poorly. Therefore, if we don't cap the length of the language annotations, they will suffer from the same issues: long comments being allowed to stay at the end of long lines.

...at least, if I understand correctly.

Makes sense

Is it fine to hard code this limit here or should this come from some cli option?

Is it fine to hard code this limit here or should this come from some cli option?

I think it's niche enough not to need a CLI option, personally.

I'll let others review the implementation details, as my Haskell is very weak.

I'd be fine with removing the limit here. The formatter doesn't promise to keep lines under a certain length, this would just be another way to have a long line.

That said, I'm also totally OK with the code as written. We can always make tweaks in the future, and erring on the side of preserving the old behaviour is a safe choice.

Without a limit, we could end up with things like:

somewhat_long_name = /* very-long-comment-that-makes-this-line-so-long-you-cant-really-see-the-string-value */ "a moderate length string";

IIUC, the formatter will count that comment as having zero length when deciding whether to do any line wrapping.

Maybe there's a way we could have it so that normal comments are still not counted to the line length, but annotations are?

That sounds like the ideal solution, but I personally don't want to block this PR on it.

dyegoaurelio · 2025-09-22T17:26:27Z

test/diff/language-annotation/in.nix

+  # Language annotation in function arguments
+  processCode = builtins.readFile (/* lua */ ''
+    return "Hello"
+  '');


This input is not idempotent in master

Checking test/diff/language-annotation/in.nix … <stdin>: Nixfmt is not idempotent. This is a bug in nixfmt. Please report it at https://github.com/NixOS/nixfmt After one formatting: builtins.readFile ( # lua '' return "Hello" '' ) After two: builtins.readFile ( # lua '' return "Hello" '')

so we're also fixing this bug here

Yes, but also I can imagine inputs not covered by this PR which still exhibit this bug, so it might be worth tracking separately

piegamesde · 2025-09-22T19:21:18Z

Can you please outline the rough high-level approach taken here to make reading the code easier?

dyegoaurelio · 2025-09-23T01:17:20Z

Sure!

The test cases on test/diff/language-annotation/in.nix are pretty extensive and cover both when we should and when we shouldn't treat block comments as language annotation.

I added a new Trivium as well a ParseTrivium type to represent language annotation

Most of the logic is on Lexer.hs:
I implemented the languageAnnotation parser to check if a trivia block is a language annotation before it gets assigned to some other type

MattSturgeon · 2025-09-23T12:21:26Z

Looks like some test cases are still using the "old" formatting: https://github.com/NixOS/nixfmt/actions/runs/17923265870/job/50963146661?pr=343#step:6:1004

Is that why this is still a draft? Or are there other things to resolve too?

dyegoaurelio · 2025-09-23T12:58:39Z

@MattSturgeon Yes.

I also want to get feedback regarding some edge cases that I decided how to handle, such as

#343 (comment)

both of these are ignored:

nixfmt/test/diff/language-annotation/in.nix

Lines 44 to 53 in 6830219

    
             # Multiple block comments in sequence 
        
             sequentialComments = /* first */ /* second */ '' 
        
               some content 
        
             ''; 
        
             # Block comment with line breaks 
        
             multilineBlockComment = /* this is a 
        
                                        multiline comment */ '' 
        
               content 
        
             '';

nixfmt/test/diff/language-annotation/out.nix

Lines 44 to 58 in 6830219

    
             # Multiple block comments in sequence 
        
             sequentialComments = # first second 
        
               '' 
        
                 some content 
        
               ''; 
        
             # Block comment with line breaks 
        
             multilineBlockComment = 
        
               /* 
        
                 this is a 
        
                 multiline comment 
        
               */ 
        
               '' 
        
                 content 
        
               '';

test/diff/language-annotation/out.nix

test/diff/language-annotation/in.nix

test/diff/language-annotation/out.nix

MattSturgeon

Thank you so much for working on this, I've been looking forward to this feature since before I joined the team!

I can't review the haskell, but from my perspective this PR LGTM. @piegamesde how does the haskell look to you?

FYI our next formatting team meeting is Tuesday 30th September. The meetings are public so you're welcome to attend: https://github.com/NixOS/nixfmt/tree/master/team

That link also has info about other things like our matrix room.

I'm assuming it'd be best to discuss this at the meeting prior to merge, but I'm fine with merging earlier if we get enough approvals.

jfly

LGTM, just one minor tweak, please!

Thanks to @infinisil for reviewing the Haskell while I screen shared.

src/Nixfmt/Lexer.hs

nixos-discourse · 2025-09-30T19:09:48Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/formatting-team-meeting-2025-09-30/70099/1

also update existing tests to match new logic

language annotations should remain as block comments when directly preceding string literals, while other block comments get converted to line comments. - Detect language annotations: single-line, non-doc comments with valid language identifiers - Preserve as `/* lang */` block comment syntax instead of converting to `# lang` line comments - Works with both regular strings `"..."` and indented strings `''...''`

github-project-automation bot added this to Nix formatting Sep 22, 2025

dyegoaurelio commented Sep 22, 2025

View reviewed changes

MattSturgeon reviewed Sep 23, 2025

View reviewed changes

dyegoaurelio force-pushed the language-annotation-support branch 5 times, most recently from 96b1c0e to 0bde506 Compare September 24, 2025 13:48

dyegoaurelio marked this pull request as ready for review September 24, 2025 13:50

MattSturgeon approved these changes Sep 25, 2025

View reviewed changes

jfly requested changes Sep 30, 2025

View reviewed changes

src/Nixfmt/Lexer.hs Outdated Show resolved Hide resolved

jfly moved this to Approved in Nix formatting Sep 30, 2025

dyegoaurelio added 3 commits September 30, 2025 17:35

Add test cases for language annotation support

9f57124

also update existing tests to match new logic

standard: Include language annotation support

cfbfc7d

dyegoaurelio force-pushed the language-annotation-support branch from 0bde506 to 5438c2b Compare September 30, 2025 20:39

dyegoaurelio requested a review from jfly September 30, 2025 20:40

	# Edge case: very long language annotation
	longAnnotation = /* this-is-a-very-long-language-annotation-that-might-affect-line-length */ ''
	content
	'';

	# Edge case: very long language annotation
	longAnnotation = # this-is-a-very-long-language-annotation-that-might-affect-line-length
	''
	content
	'';

Uh oh!

Language annotation support #343

Are you sure you want to change the base?

Language annotation support #343

Uh oh!

Conversation

dyegoaurelio commented Sep 22, 2025

Uh oh!

github-actions bot commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dyegoaurelio Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

piegamesde commented Sep 22, 2025

Uh oh!

dyegoaurelio commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MattSturgeon commented Sep 23, 2025

Uh oh!

dyegoaurelio commented Sep 23, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MattSturgeon left a comment

Choose a reason for hiding this comment

Uh oh!

jfly left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nixos-discourse commented Sep 30, 2025

Uh oh!

Uh oh!

github-actions bot commented Sep 22, 2025 •

edited

Loading

dyegoaurelio Sep 22, 2025 •

edited

Loading

dyegoaurelio commented Sep 23, 2025 •

edited

Loading

jfly left a comment •

edited

Loading