Skip to content

Conversation

dyegoaurelio
Copy link

Initial work to get language annotation support working

This pr is not ready, but I created it mostly to get feedback on how we should handle edge cases and the direction I'm taking.

closes #179

Copy link

github-actions bot commented Sep 22, 2025

Nixpkgs diff

Comment on lines 169 to 171
isLanguageIdentifier :: Text -> Bool
isLanguageIdentifier content =
let stripped = strip content
in not (Text.null stripped)
&& Text.length stripped <= 30 -- TODO: make configurable or remove limit
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a length limit here?

I added this logic as well as a test case for it, but I'm not sure if that's the intended behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on

The rendering engine currently ignores all line length calculations involving comments, however this strongly depends on the assumption that comment tokens are always strictly followed by a line break.
-- @piegamesde #179 (comment)

I'm assuming yes.

As IIUC, the code responsible for line wrapping will assume this token is small and not worth wrapping, since it doesn't know how long comments actually are.

It's also possible that I've completely misunderstood this though!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is related to this comment
This is length limit for considering a inline comment a language notation block.
This is the reason this test case

# Edge case: very long language annotation
longAnnotation = /* this-is-a-very-long-language-annotation-that-might-affect-line-length */ ''
content
'';

Gets formatted as a regular comment

# Edge case: very long language annotation
longAnnotation = # this-is-a-very-long-language-annotation-that-might-affect-line-length
''
content
'';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is related to this comment

My interpretation is the reason we want a length limit on what is considered a "language annotation" is because we can't wrap lines based on line length when the line is long because of a comment.

If we had smarter line wrapping, then we probably wouldn't need a special case for "annotations" and could simply preserve more inline comments more generally.

If we had smarter line wrapping, your test case could probably render something like:

   # Edge case: very long language annotation 
   longAnnotation =
     /* this-is-a-very-long-language-annotation-that-might-affect-line-length */
     '' 
       content 
     ''; 

Or a similar example with some combos of long/short binding names and/or inline comments:

   thisIsAVeryLongBindingNameThatWillAffectLineLengthForSure =
     /* short */ '' 
       content 
     '';
   thisIsALongBindingNameButNotTooLong = /* short */ '' 
     content 
   '';
   thisIsAnotherLongBindingNameButNotTooLong =
     /* medium-length-comment */ '' 
       content 
     '';

My argument is that the only reason we need to handle language annotations specially, is because we currently handle these kinda long comments poorly. Therefore, if we don't cap the length of the language annotations, they will suffer from the same issues: long comments being allowed to stay at the end of long lines.

...at least, if I understand correctly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine to hard code this limit here or should this come from some cli option?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine to hard code this limit here or should this come from some cli option?

I think it's niche enough not to need a CLI option, personally.

I'll let others review the implementation details, as my Haskell is very weak.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with removing the limit here. The formatter doesn't promise to keep lines under a certain length, this would just be another way to have a long line.

That said, I'm also totally OK with the code as written. We can always make tweaks in the future, and erring on the side of preserving the old behaviour is a safe choice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a limit, we could end up with things like:

somewhat_long_name = /* very-long-comment-that-makes-this-line-so-long-you-cant-really-see-the-string-value */ "a moderate length string";

IIUC, the formatter will count that comment as having zero length when deciding whether to do any line wrapping.

Maybe there's a way we could have it so that normal comments are still not counted to the line length, but annotations are?

That sounds like the ideal solution, but I personally don't want to block this PR on it.

Comment on lines +61 to +54
# Language annotation in function arguments
processCode = builtins.readFile (/* lua */ ''
return "Hello"
'');
Copy link
Author

@dyegoaurelio dyegoaurelio Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This input is not idempotent in master

Checking test/diff/language-annotation/in.nix …
<stdin>: Nixfmt is not idempotent. This is a bug in nixfmt. Please report it at https://github.com/NixOS/nixfmt

After one formatting:
builtins.readFile (
  # lua
  ''
    return "Hello"
  ''
)


After two:
builtins.readFile (
  # lua
  ''
    return "Hello"
  '') 

so we're also fixing this bug here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but also I can imagine inputs not covered by this PR which still exhibit this bug, so it might be worth tracking separately

@piegamesde
Copy link
Member

Can you please outline the rough high-level approach taken here to make reading the code easier?

@dyegoaurelio
Copy link
Author

dyegoaurelio commented Sep 23, 2025

Sure!

The test cases on test/diff/language-annotation/in.nix are pretty extensive and cover both when we should and when we shouldn't treat block comments as language annotation.

I added a new Trivium as well a ParseTrivium type to represent language annotation

Most of the logic is on Lexer.hs:
I implemented the languageAnnotation parser to check if a trivia block is a language annotation before it gets assigned to some other type

@MattSturgeon
Copy link

Looks like some test cases are still using the "old" formatting: https://github.com/NixOS/nixfmt/actions/runs/17923265870/job/50963146661?pr=343#step:6:1004

Is that why this is still a draft? Or are there other things to resolve too?

@dyegoaurelio
Copy link
Author

@MattSturgeon Yes.

I also want to get feedback regarding some edge cases that I decided how to handle, such as

#343 (comment)

both of these are ignored:

# Multiple block comments in sequence
sequentialComments = /* first */ /* second */ ''
some content
'';
# Block comment with line breaks
multilineBlockComment = /* this is a
multiline comment */ ''
content
'';

# Multiple block comments in sequence
sequentialComments = # first second
''
some content
'';
# Block comment with line breaks
multilineBlockComment =
/*
this is a
multiline comment
*/
''
content
'';

@dyegoaurelio dyegoaurelio force-pushed the language-annotation-support branch 5 times, most recently from 96b1c0e to 0bde506 Compare September 24, 2025 13:48
@dyegoaurelio dyegoaurelio marked this pull request as ready for review September 24, 2025 13:50
Copy link

@MattSturgeon MattSturgeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for working on this, I've been looking forward to this feature since before I joined the team!

I can't review the haskell, but from my perspective this PR LGTM. @piegamesde how does the haskell look to you?

FYI our next formatting team meeting is Tuesday 30th September. The meetings are public so you're welcome to attend: https://github.com/NixOS/nixfmt/tree/master/team

That link also has info about other things like our matrix room.

I'm assuming it'd be best to discuss this at the meeting prior to merge, but I'm fine with merging earlier if we get enough approvals.

Copy link
Collaborator

@jfly jfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor tweak, please!

Thanks to @infinisil for reviewing the Haskell while I screen shared.

@jfly jfly moved this to Approved in Nix formatting Sep 30, 2025
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/formatting-team-meeting-2025-09-30/70099/1

also update existing tests to match new logic
language annotations should remain as block comments when directly preceding string literals,
while other block comments get converted to line comments.

- Detect language annotations: single-line, non-doc comments with valid language identifiers
- Preserve as `/* lang */` block comment syntax instead of converting to `# lang` line comments
- Works with both regular strings `"..."` and indented strings `''...''`
@dyegoaurelio dyegoaurelio force-pushed the language-annotation-support branch from 0bde506 to 5438c2b Compare September 30, 2025 20:39
@dyegoaurelio dyegoaurelio requested a review from jfly September 30, 2025 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

Allow language annotation /* comments */ inline
5 participants