From d9e71f5e6ebae6464d4a51b09ab62a6e02e6072d Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:01:50 +0200 Subject: [PATCH 01/78] . --- content/dedented-string-literals.md | 451 ++++++++++++++++++++++++++++ 1 file changed, 451 insertions(+) create mode 100644 content/dedented-string-literals.md diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md new file mode 100644 index 00000000..c1ded9ec --- /dev/null +++ b/content/dedented-string-literals.md @@ -0,0 +1,451 @@ +--- +layout: sip +permalink: /sips/:title.html +stage: implementation +status: under-review +title: SIP-XX - Dedented String Literals +--- + +**By: Li Haoyi** + +## History + +| Date | Version | +|---------------|--------------------| +| Aug 15th 2025 | Initial Draft | + +## Summary + +This SIP proposes a `'''` syntax for dedented multiline string literals that remove leading +indentation at a language level, rather than using the `.stripMargin` library method: + +```scala +> def helper = { + val x = ''' + i am cow + hear me moo + ''' + x + } + +> println(helper) +i am cow +hear me moo +``` + +Dedented strings automatically strip: + +- The first newline after the opening `'''` +- The final newline before the closing `'''` +- Any indentation up to the position of the closing `'''` + +The opening `'''` MUST be followed immediately by a newline, and the trailing `'''` MUST +be preceded by a newline followed by whitespace characters. 
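The stripping rules listed above can be modelled as an ordinary function. The following is only a sketch, assuming a hypothetical helper named `dedent` that receives the raw text found between the opening and closing `'''` delimiters; it is not the compiler implementation (which this draft still marks as TODO). How whitespace-only lines shorter than the closing indentation are handled is also an assumption here: the sketch rejects them.

```scala
// Hypothetical helper, not part of the SIP: models the dedent rules on the raw
// text that sits between the opening and closing ''' delimiters.
def dedent(raw: String): Either[String, String] = {
  // A limit of -1 keeps trailing empty segments, so the closing line is preserved.
  val lines = raw.split("\n", -1).toList
  if (lines.length < 2)
    Left("dedented strings must span multiple lines")
  else if (lines.head.nonEmpty)
    Left("opening ''' must be followed immediately by a newline")
  else {
    // Everything on the last line is the indentation in front of the closing '''.
    val closingIndent = lines.last
    if (!closingIndent.forall(c => c == ' ' || c == '\t'))
      Left("closing ''' must be preceded only by whitespace on its line")
    else {
      // Drop the first (empty) line and the closing-delimiter line.
      val body = lines.tail.init
      // Assumption: every non-empty line must start with exactly the closing indentation.
      body.find(l => l.nonEmpty && !l.startsWith(closingIndent)) match {
        case Some(l) => Left(s"line is indented less than the closing delimiter: '$l'")
        case None    => Right(body.map(_.stripPrefix(closingIndent)).mkString("\n"))
      }
    }
  }
}
```

With the canonical example, `dedent("\n    i am cow\n    hear me moo\n    ")` yields `Right("i am cow\nhear me moo")`, while a single-line input such as `dedent("hello")` is rejected with a `Left`.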
+ +If a user explicitly wants indentation to be present in the string, they +can simply adjust the contents accordingly: + +```scala +> def helper = { + // string with two-space indents before each line + val x = ''' + i am cow + hear me moo + ''' + x + } + +> println(helper) + i am cow + hear me moo +``` + +And if a user wants leading or trailing newlines, they can add those as well + +If a user explicitly wants indentation to be present in the string, they +can simply adjust the contents accordingly: + +```scala +> def helper = { + // string with two-space indents before each line, and leading and trailing newlines + val x = ''' + + i am cow + hear me moo + + ''' + x + } + +> println(helper) + + i am cow + hear me moo + +``` + +In most use cases, we expect `'''` to be preferred, although `"""` strings can continue to +exist for backwards compatibility and referred to as "raw multiline strings". + +Dedented string literals should be able to be used anywhere a normal `"` or triple `"""` +can be used: + +- Literal types (`String & Singleton`) +- String interpolation, with builtin or custom interpolators +- Inline methods and Macros that require the literal value at compile-time +- Pattern matching + +```scala + +foo match{ + case ''' + i am cow + hear me moo + ''' => +} +``` + +As `'''` is not valid syntax in Scala today, there are no backwards compatibility +concerns. + +## Motivation + + +This proposal resolves a lot of issues working with status-quo triple-quoted strings. +These issues aren't rocket science, but are a constant friction that makes working +with triple-quoted strings annoying and unpleasant. + +## Verbosity & Visual Clarity + +Using traditional `""".stripMargin` strings with `|` and `.stripMargin` is very verbose, which +interferes with visually reading the code. 
+ +Furthermore, very often you don't want the leading or trailing newline either, which means +you need to put text on the first line of the multi-line string which breaks vertical +alignment and makes it hard to skim. e.g. see the canonical example above translated +from `'''` strings to status-quo `"""` strings below: + +```scala +def helper = { + val x = """i am cow + |hear me moo""".stripMargin + x +} +``` + +## Incorrectness with Multiline Interpolation + +`""".stripMargin` strings can misbehave when used with string interpolations that may +span multiple lines. For example: + +```scala +def helper = { + val scalazOperators = Seq("<$>", "<*>", "|@|", "|->").mkString(",\n ") + s""" + |import scalaz.{ + | $scalazOperators + |} + |""".stripMargin +} +println("SCALAZ CODE EXAMPLE:\n" + helper) +``` +```scala +SCALAZ CODE EXAMPLE: +import scalaz.{ + <$>, + <*>, +@|, +-> +} +``` + +Note how `.stripMargin` accidentally stripped the `|` from `|@|` and `|->`, which +is not what the user expects, causing a compile error. This is not just a theoretical +concern, but has resulted in multiple bugs in widely-used tools and libraries: + +- `stripMargin`-related bugs were encountered when implementing the + [Ammonite REPL](https://github.com/com-lihaoyi/Ammonite)'s import handling and wrapper-code + generation, which inspired the minimized example above + +- Mill encountered a similar bug recently where `|`s in interpolating strings + were being removed by `stripMargin`, resulting in the documentation examples being incorrect + https://github.com/com-lihaoyi/mill/pull/4544 + +## Literal/Singleton Types + +`.stripLiteral` strings are not literals, and cannot generate `String & Singleton` types +even though from a user perspective the user really may just want a string literal. 
+ +```scala +def helper = { + val x: String & Singleton = """i am cow + |hear me moo""".stripMargin + x +} +``` +```scala +-- [E007] Type Mismatch Error: ------------------------------------------------- +2 | val x: String & Singleton = """i am cow +3 | |hear me moo""".stripMargin + | ^ + | Found: String + | Required: String & Singleton + | + | longer explanation available when compiling with `-explain` +``` + +This means that `""".stripLiteral` strings cannot take part in type-level logic +on `String & Singleton` types like normal strings can: + +```scala +scala> val x: "hello" = "hello" +``` +```scala +val x: "hello" = hello +``` +```scala +scala> val x: """i am cow + | |hear me moo""".stripMargin = """i am cow + | |hear me moo""".stripMargin +``` +```scala +-- Error: ---------------------------------------------------------------------- +2 | |hear me moo""".stripMargin = """i am cow + | ^ + | end of statement expected but '.' found +``` + +This also means that any macros that may work on string literals, e.g. validating +the string literal at build time, would not be able to work with multiline strings. +This includes `inline def`s or macros that may want to validate or process these +string literals at compile time (e.g. validating SQL literals, preventing +directory traversal attacks, pre-compiling regexes or parsers, etc.). + +One example is FastParse's `StringIn` parser which generates code +for efficiently parsing the given strings at compile time, fails when `""".stripMargin` +strings are given: + +```scala +@ def foo[_:P] = P( + StringIn( + """i am cow + |hear me moo""".stripMargin + ) + ) +cmd2.sc:2: Function can only accept constant singleton type + StringIn( + ^ +``` + +`""".stripMargin` strings also cannot participate in pattern matches: + +```scala +def foo: String = ??? 
+ +foo match { + case """i am cow + |hear me moo""".stripMargin => +``` +```scala +-- [E040] Syntax Error: -------------------------------------------------------- +3 | |hear me moo""".stripMargin => + | ^ + | '=>' expected, but '.' found +``` + +## Implementation + +TODO + +## Limitations + +All lines within a dedented `'''` string MUST be indented further than the closing +delimiter. That means this is illegal: + +```scala +def helper = { + // string with two-space indents before each line + val x = ''' + i am cow +hear me moo + ''' + x + } +``` + +Furthermore, the indentation of each line MUST start with the same whitespace characters +as the indentation of the closing delimiter, and cannot e.g. use a different mix of tabs +and spaces. Doing so is an error. + +## Alternatives + +### `.stripMargin` + +The current status quo solution for this is `""".stripMargin` strings, which we have +discussed the limitations and problems with in the [Motivation](#motivation) section above. + +### Dedenting Interpolator + +One option is to use current triple-quoted strings with an interpolator, e.g. + +```scala +def helper = { + val x = tq""" + i am cow + hear me moo + """ + x +} +``` + +This `tq"""` interpolator could be a macro that looks at the source code and removes +indentation, avoiding the problems with runtime indentation removal we +[discussed above](#incorrectness-with-mutliline-interpolation). 
However, using an interpolator +does not solve the other issues of multiline strings not being valid +[literal types](#literalsingleton-types) + +Having a dedicate `tq"""` interpolator also means multiline strings cannot be used with +other existing interpolators, such as `s""`, `r""`, or user-specified interpolators +like `sql""` introduced by libraries like [ScalaSql](https://github.com/com-lihaoyi/scalasql) + +### Other syntaxes for multiline strings + +`'''` was chosen as a currently-unused syntax in Scala, but other options are also +possible: + +- Triple-double-quotes `"""` are already used with a particular semantic, so we cannot + change those, despite them being used in every other language like [Java](#java), + [C#](#c), and [Swift](#swift). + +- Single-quoted strings with `"` cannot currently span multiple lines, and so + they could be specified to have these semantics when used multi-line. This has + the advantage of not introducing a new delimiter, but the disadvantage that a + single `"` isn't very visually distinct when used for demarcating vertical blocks + of text + +- Triple-backticks are another syntax that is currently available, and so could be used as + a multi-line delimiter. This has the advantage of being similar to blocks used in + markdown, with a similar meaning, but the disadvantage that it would collide if a + user tries to embed Scala code in a markdown code block + +- Other syntaxes like `@"..."` are possible, but probably too esoteric to be worth considering + + +### Triple-Backticked Multiline Strings + +## Prior Art + +Many other languages have exactly this feature, all with exactly the same reason +and exactly the same specification: trimming the leading and trailing newlines, along +with indentation. 
+ +### Java + +Java since [JEP 378](https://openjdk.org/jeps/378) now multiline strings called "text blocks" +that implement exactly this, with identical leading/trailing newline and indentation removal +policies: + +```java +String html = """ + <html> + <body> + <p>Hello, world</p> + </body> + </html> +"""; +``` + +> The re-indentation algorithm takes the content of a text block whose line terminators have +> been normalized to LF. It removes the same amount of white space from each line of content +> until at least one of the lines has a non-white space character in the leftmost position. +> The position of the opening """ characters has no effect on the algorithm, but the position +> of the closing """ characters does have an effect if placed on its own line. + +### C# + +C# has [Raw String Literals](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-11.0/raw-string-literal) +with identical leading/trailing newline and indentation removal policies + +```csharp +var xml = """ + <element attr="content"> + <body> + </body> + </element> + """; +``` + +> To make the text easy to read and allow for indentation that developers like in code, +> these string literals will naturally remove the indentation specified on the last line +> when producing the final literal value. For example, a literal of the form: + +### Swift + +Swift has [Multiline Strings](https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/#Multiline-String-Literals) that behave identically to that described in this proposal, +with identical leading/trailing newline and indentation removal policies + +```swift +let quotation = """ +The White Rabbit put on his spectacles. "Where shall I begin, +please your Majesty?" he asked. + +"Begin at the beginning," the King said gravely, "and go on +till you come to the end; then stop." +""" +``` + +> A multiline string literal includes all of the lines between its opening and closing +> quotation marks. The string begins on the first line after the opening quotation marks +> (""") and ends on the line before the closing quotation marks, which means that neither +> of the strings below start or end with a line break: +> +> A multiline string can be indented to match the surrounding code. 
The whitespace before the +> closing quotation marks (""") tells Swift what whitespace to ignore before all of the other +> lines. However, if you write whitespace at the beginning of a line in addition to what’s before +> the closing quotation marks, that whitespace is included. + +### Elixer + +Elixer's [Multiline Strings](https://hexdocs.pm/elixir/1.7.4/syntax-reference.html#strings) +behave exactly as this proposal: + +```elixer +...> test = """ +...> this +...> is +...> a +...> test +...> """ +" this\n is\n a\n test\n" +...>test = """ +...> This +...> Is +...> A +...> Test +...> """ +"This\nIs\nA\nTest\n" +``` + +> Multi-line strings in Elixir are written with three double-quotes, and can have unescaped +> quotes within them. The resulting string will end with a newline. The indentation of the +> last """ is used to strip indentation from the inner string. For example: + +### Ruby + +Ruby has [Squiggly Heredoc](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Strings) +strings that have similar leading/trailing newline removal, but has a +"least indented non-whitespace-only" line policy for removing indentation, rather than a +closing-delimiter policy like the other languages above + +```ruby +expected_result = <<~SQUIGGLY_HEREDOC + This would contain specially formatted text. + + That might span many lines +SQUIGGLY_HEREDOC +``` + +> The indentation of the least-indented line will be removed from each line of the content. +> Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored +> for the purposes of determining indentation, but escaped tabs and spaces are considered +> non-indentation characters. \ No newline at end of file From 096c305c43defc676a6ca38bdd2d9e6ff4aea4eb Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:02:25 +0200 Subject: [PATCH 02/78] . 
--- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index c1ded9ec..bb453dec 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -3,7 +3,7 @@ layout: sip permalink: /sips/:title.html stage: implementation status: under-review -title: SIP-XX - Dedented String Literals +title: SIP-XX - Dedented Multiline String Literals --- **By: Li Haoyi** From c67d1cf478b97caff56b434587fdd385de85070a Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:04:37 +0200 Subject: [PATCH 03/78] . --- content/dedented-string-literals.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index bb453dec..a78fdbed 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -259,6 +259,19 @@ TODO ## Limitations +Dedented `'''` strings MUST be multiline strings. Using this syntax for single-line +strings is not allowed, + +```scala +val x = '''hello''' +``` + +As mentioned above, the opening and closing delimiters MUST have a leading/trailing +newline, making the `'''` delimiters "vertical" delimiters that are easy to scan +rather than "horizontal" delimiters like `"` or `"""` which requires the reader +to scan left and right to determine the bounds of the string literal. + + All lines within a dedented `'''` string MUST be indented further than the closing delimiter. That means this is illegal: From 5d65b00bd41e6e5d592f908a20d861086ca83416 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:06:21 +0200 Subject: [PATCH 04/78] . 
--- content/dedented-string-literals.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index a78fdbed..0790dbee 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -33,6 +33,10 @@ i am cow hear me moo ``` +This is a common feature in other languages (see [Prior Art](#prior-art)) with exactly +the same semantics, although unlike other languages Scala's `"""` already has an existing +semantic, and so for this proposal the currently-unused `'''` syntax is chosen instead. + Dedented strings automatically strip: - The first newline after the opening `'''` From 1fbdebe57bd47b81860e1863b9cefa645ee9630f Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:32:35 +0200 Subject: [PATCH 05/78] . --- content/dedented-string-literals.md | 49 +++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 0790dbee..5b7961c9 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -137,6 +137,55 @@ def helper = { } ``` +In particular, the "shape" of raw multiline string is the line-wrapping +zig-zag that is hard to skim at a glance, and hard to correlate with the +actual contents of the string + +``` + i am cow + hear me moo +``` + +This can be mitigated by indenting the subsequent lines, but that results in lots +of unnecessary indentation + +```scala +def helper = { + val x = """i am cow + |hear me moo""".stripMargin + x +} +``` + +Or it can be solved by following `.stripMargin` with `.trim`, at a cost of more verbosity +and visual noise: + +```scala +def helper = { + val x = """ + |i am cow + |hear me moo + |""".stripMargin.trim + x +} +``` + +There are a huge number of ways to write and format dedented multiline strings today, and yet +none of them are great to look at visually, and there are many ways you can format 
them badly. +Overall this zoo of options seems inferior to the proposed dedented multiline string syntax, +which has a single valid way of writing the example above, with much better visual clarity than +any of the existing options: + +```scala +def helper = { + val x = ''' + i am cow + hear me moo + ''' + x +} +``` + ## Incorrectness with Multiline Interpolation `""".stripMargin` strings can misbehave when used with string interpolations that may From 34a3018587054a0ad36f81e64751a33ac1d9b209 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:45:38 +0200 Subject: [PATCH 06/78] . --- content/dedented-string-literals.md | 30 +++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 5b7961c9..572fb9b2 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -396,6 +396,36 @@ possible: - Other syntaxes like `@"..."` are possible, but probably too esoteric to be worth considering +### Alternative ways of specifying indentation + +The proposed rule of specifies the indentation to be removed relies on the indentation of +the trailing `'''` delimiter. Other possible approaches include: + +- The minimum indentation of any non-whitespace line within the string, which is why [Ruby does](#ruby) + - This does not allow the user to define strings with all lines indented by some amount, + unless the indentation of the closing delimiter is counted as well. But if the indentation + of the closing delimiter is counted, then it is simpler to just use that, and prohibit + other lines from being indented less than the delimiter + +- An explicit indentation-counter, which is what YAML does, e.g. 
with the below text block + dedenting the string by 4 characters: +```yaml +example: >4 + Several lines of text, + with some "quotes" of various 'types', + and also a blank line: + + and some text with + extra indentation + on the next line, + plus another line at the end. +``` + +This works, but it is very unintuitive for users to have to translate the indentation to be +removed (which is a visual thing) into a number that gets written at the top of the block. In +contrast, the current proposal specifies the indentation to be removed in terms of the +indentation of the closing delimiter, which keeps it within the "visual" domain without +needing the user to count spaces. ### Triple-Backticked Multiline Strings From 39134b571ae94fec2f2cacb32d3d566d9516485f Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:45:49 +0200 Subject: [PATCH 07/78] . --- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 572fb9b2..c7a17968 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -374,7 +374,7 @@ Having a dedicate `tq"""` interpolator also means multiline strings cannot be us other existing interpolators, such as `s""`, `r""`, or user-specified interpolators like `sql""` introduced by libraries like [ScalaSql](https://github.com/com-lihaoyi/scalasql) -### Other syntaxes for multiline strings +### Other Delimiters `'''` was chosen as a currently-unused syntax in Scala, but other options are also possible: From a6791455afd41914083f827c6a76756f15f0d61d Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:48:23 +0200 Subject: [PATCH 08/78] . 
--- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index c7a17968..465305e4 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -171,7 +171,7 @@ def helper = { ``` There are a huge number of ways to write and format dedented multiline strings today, and yet -none of them are great to look at visually, and there are many ways you can format them badly. +none of them are great to look at visually, and even more ways you can format them badly. Overall this zoo of options seems inferior to the proposed dedented multiline string syntax, which has a single valid way of writing the example above, with much better visual clarity than any of the existing options: From e51fb7c8355d6843e4d8b1445be896f143e8ade1 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:48:44 +0200 Subject: [PATCH 09/78] . --- content/dedented-string-literals.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 465305e4..df1c618e 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -500,12 +500,12 @@ till you come to the end; then stop." > lines. However, if you write whitespace at the beginning of a line in addition to what’s before > the closing quotation marks, that whitespace is included. -### Elixer +### Elixir Elixer's [Multiline Strings](https://hexdocs.pm/elixir/1.7.4/syntax-reference.html#strings) behave exactly as this proposal: -```elixer +```elixir ...> test = """ ...> this ...> is From 32ed3303cb72b8ab8912801532a5cb830b45de1f Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:48:53 +0200 Subject: [PATCH 10/78] . 
--- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index df1c618e..72f1ce82 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -502,7 +502,7 @@ till you come to the end; then stop." ### Elixir -Elixer's [Multiline Strings](https://hexdocs.pm/elixir/1.7.4/syntax-reference.html#strings) +Elixir's [Multiline Strings](https://hexdocs.pm/elixir/1.7.4/syntax-reference.html#strings) behave exactly as this proposal: ```elixir From 61553810ef1287f5d63c3d059c7e412efe0198a1 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:50:54 +0200 Subject: [PATCH 11/78] . --- content/dedented-string-literals.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 72f1ce82..8ede2c1f 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -202,6 +202,7 @@ def helper = { } println("SCALAZ CODE EXAMPLE:\n" + helper) ``` + ```scala SCALAZ CODE EXAMPLE: import scalaz.{ @@ -236,6 +237,7 @@ def helper = { x } ``` + ```scala -- [E007] Type Mismatch Error: ------------------------------------------------- 2 | val x: String & Singleton = """i am cow @@ -253,14 +255,17 @@ on `String & Singleton` types like normal strings can: ```scala scala> val x: "hello" = "hello" ``` + ```scala val x: "hello" = hello ``` + ```scala scala> val x: """i am cow | |hear me moo""".stripMargin = """i am cow | |hear me moo""".stripMargin -``` +``` + ```scala -- Error: ---------------------------------------------------------------------- 2 | |hear me moo""".stripMargin = """i am cow @@ -299,6 +304,7 @@ foo match { case """i am cow |hear me moo""".stripMargin => ``` + ```scala -- [E040] Syntax Error: -------------------------------------------------------- 3 | |hear me moo""".stripMargin => From 
beb15977b804877b5d9c47bba05f22747800d0f5 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:52:06 +0200 Subject: [PATCH 12/78] . --- content/dedented-string-literals.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 8ede2c1f..6fa16743 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -40,11 +40,13 @@ semantic, and so for this proposal the currently-unused `'''` syntax is chosen i Dedented strings automatically strip: - The first newline after the opening `'''` -- The final newline before the closing `'''` +- The final newline and any whitespace before the closing `'''` - Any indentation up to the position of the closing `'''` The opening `'''` MUST be followed immediately by a newline, and the trailing `'''` MUST -be preceded by a newline followed by whitespace characters. +be preceded by a newline followed by whitespace characters. Lines within the +dedented string MUST be either empty, or have indentation equal-to-or-greater-than +the closing delimiter. If a user explicitly wants indentation to be present in the string, they can simply adjust the contents accordingly: From b4cacb25109d315668baf18cf4aa41cbf2d67710 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:52:29 +0200 Subject: [PATCH 13/78] . 
--- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 6fa16743..1ad66d24 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -41,7 +41,7 @@ Dedented strings automatically strip: - The first newline after the opening `'''` - The final newline and any whitespace before the closing `'''` -- Any indentation up to the position of the closing `'''` +- Any indentation on every line up to the position of the closing `'''` The opening `'''` MUST be followed immediately by a newline, and the trailing `'''` MUST be preceded by a newline followed by whitespace characters. Lines within the From d6fae4d23922967beba41c63236ddbf19e1238da Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 21:59:18 +0200 Subject: [PATCH 14/78] . --- content/dedented-string-literals.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 1ad66d24..bfee7f45 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -229,7 +229,7 @@ concern, but has resulted in multiple bugs in widely-used tools and libraries: ## Literal/Singleton Types -`.stripLiteral` strings are not literals, and cannot generate `String & Singleton` types +`.stripMargin` strings are not literals, and cannot generate `String & Singleton` types even though from a user perspective the user really may just want a string literal. 
```scala @@ -251,7 +251,7 @@ def helper = { | longer explanation available when compiling with `-explain` ``` -This means that `""".stripLiteral` strings cannot take part in type-level logic +This means that `""".stripMargin` strings cannot take part in type-level logic on `String & Singleton` types like normal strings can: ```scala From 61f2706ef1374857b57a190fd8220e85a4661256 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 22:40:22 +0200 Subject: [PATCH 15/78] . --- content/dedented-string-literals.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index bfee7f45..87d63ce9 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -275,6 +275,8 @@ scala> val x: """i am cow | end of statement expected but '.' found ``` +## Literal String Expressions + This also means that any macros that may work on string literals, e.g. validating the string literal at build time, would not be able to work with multiline strings. This includes `inline def`s or macros that may want to validate or process these @@ -314,6 +316,26 @@ foo match { | '=>' expected, but '.' found ``` +`""".stripMargin` cannot be used in annotations like `@implicitNotFound`. As shown below, +it does not properly update the error message, because `""".stripMargin` is not a literal +string. 
Using triple-quoted strings without `.stripMargin` results in the error message being +updated correctly, but then you lose the ability to properly dedent the error: + +```scala +scala> @scala.annotation.implicitNotFound( + | """i am cow + | |hear me moo""".stripMargin.toUpperCase) class Foo() +// defined class Foo + +scala> implicitly[Foo] +-- [E172] Type Error: ---------------------------------------------------------- +1 |implicitly[Foo] + | ^ + |No given instance of type Foo was found for parameter e of method implicitly in object Predef +1 error found + +``` + ## Implementation TODO From b59276fc3ab072c111b0863f7909c7f5fad0b13f Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 15 Aug 2025 22:40:44 +0200 Subject: [PATCH 16/78] . --- content/dedented-string-literals.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 87d63ce9..9c0d255d 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -121,7 +121,7 @@ This proposal resolves a lot of issues working with status-quo triple-quoted str These issues aren't rocket science, but are a constant friction that makes working with triple-quoted strings annoying and unpleasant. -## Verbosity & Visual Clarity +### Verbosity & Visual Clarity Using traditional `""".stripMargin` strings with `|` and `.stripMargin` is very verbose, which interferes with visually reading the code. @@ -188,7 +188,7 @@ def helper = { } ``` -## Incorrectness with Multiline Interpolation +### Incorrectness with Multiline Interpolation `""".stripMargin` strings can misbehave when used with string interpolations that may span multiple lines. 
For example: @@ -227,7 +227,7 @@ concern, but has resulted in multiple bugs in widely-used tools and libraries: were being removed by `stripMargin`, resulting in the documentation examples being incorrect https://github.com/com-lihaoyi/mill/pull/4544 -## Literal/Singleton Types +### Literal/Singleton Types `.stripMargin` strings are not literals, and cannot generate `String & Singleton` types even though from a user perspective the user really may just want a string literal. @@ -275,7 +275,7 @@ scala> val x: """i am cow | end of statement expected but '.' found ``` -## Literal String Expressions +### Literal String Expressions This also means that any macros that may work on string literals, e.g. validating the string literal at build time, would not be able to work with multiline strings. From e0995cccc9592508fe0896dd665a02babf4be8a9 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 06:38:38 +0200 Subject: [PATCH 17/78] . --- content/dedented-string-literals.md | 71 ++++++++++++++++++++++++++++- 1 file changed, 69 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 9c0d255d..880fe833 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -93,6 +93,29 @@ can simply adjust the contents accordingly: In most use cases, we expect `'''` to be preferred, although `"""` strings can continue to exist for backwards compatibility and referred to as "raw multiline strings". +We allow _Extended Delimiters_ with more than three `'''`, to allow the strings to contain arbitrary +contents, similar to what is provided in [C#](#c) and [Swift](#swift). e.g. 
if you want the string to contain `'''`, you can use a four-`''''` delimiter to +stop the `'''` within the body from prematurely closing the literal: + + +```scala +> def helper = { + val x = '''' + ''' + i am cow + hear me moo + ''' + '''' + x + } + +> println(helper) +''' +i am cow +hear me moo +''' +``` + Dedented string literals should be able to be used anywhere a normal `"` or triple `"""` can be used: @@ -463,7 +486,8 @@ needing the user to count spaces. Many other languages have exactly this feature, all with exactly the same reason and exactly the same specification: trimming the leading and trailing newlines, along -with indentation. +with indentation. Many have similar rules for flexible delimiters to allow the strings +to contain arbitrary contents ### Java @@ -487,6 +511,9 @@ String html = """ > The position of the opening """ characters has no effect on the algorithm, but the position > of the closing """ characters does have an effect if placed on its own line. +Java doesn't have extended delimiters like those proposed here, but requires you to escape +`\"""` included in the text block using a backslash to prevent premature closing of the literal. + ### C# C# has [Raw String Literals](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-11.0/raw-string-literal) @@ -505,6 +532,17 @@ var xml = """ > these string literals will naturally remove the indentation specified on the last line > when producing the final literal value. 
For example, a literal of the form: +C# also allows arbitrary-length delimiters as described in this proposal: + +```csharp +var xml = """" + Ok to use """ here + """"; +``` + +> Because the nested contents might itself want to use """ then the starting/ending +> delimiters can be longer + ### Swift Swift has [Multiline Strings](https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/#Multiline-String-Literals) that behave identically to that described in this proposal, @@ -530,6 +568,19 @@ till you come to the end; then stop." > lines. However, if you write whitespace at the beginning of a line in addition to what’s before > the closing quotation marks, that whitespace is included. +Swift also supports extended delimiters similar to those described in this proposal: + +```swift +let threeMoreDoubleQuotationMarks = #""" +Here are three more double quotes: """ +"""# +``` + +> String literals created using extended delimiters can also be multiline string literals. +> You can use extended delimiters to include the text """ in a multiline string, overriding +> the default behavior that ends the literal. For example: + + ### Elixir Elixir's [Multiline Strings](https://hexdocs.pm/elixir/1.7.4/syntax-reference.html#strings) @@ -556,6 +607,10 @@ behave exactly as this proposal: > quotes within them. The resulting string will end with a newline. The indentation of the > last """ is used to strip indentation from the inner string. For example: +Elixir allows both `"""` and `'''` syntax for multi-line strings, with `'''`-delimited strings +allowing you to embed `"""` in the body (and vice versa). This is similar to Python's syntax +for triple-quoted strings + ### Ruby Ruby has [Squiggly Heredoc](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Strings) @@ -574,4 +629,16 @@ SQUIGGLY_HEREDOC > The indentation of the least-indented line will be removed from each line of the content.
> Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored > for the purposes of determining indentation, but escaped tabs and spaces are considered -> non-indentation characters. \ No newline at end of file +> non-indentation characters. + +As a HEREDOC-inspired syntax, you can change the header of your multi-line string in Ruby +to include arbitrary text in the body. This is different in form but similar in function to +the extended delimiters described in this proposal + +```ruby +expected_result = <<~MY_CUSTOM_SQUIGGLY_HEREDOC + This would contain specially formatted text. + SQUIGGLY_HEREDOC + That might span many lines +MY_CUSTOM_SQUIGGLY_HEREDOC +``` \ No newline at end of file From 37dc4d919cee989d0eede4fc91aa981d4e136d5d Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:48:23 +0200 Subject: [PATCH 18/78] . --- content/dedented-string-literals.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 880fe833..22d75507 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -436,6 +436,16 @@ possible: change those, despite them being used in every other language like [Java](#java), [C#](#c), and [Swift](#swift). +- Similarly, we cannot use four-double-quotes as the delimiter for the new semantics, + because those are already valid syntax today, and (perhaps unintuitively) represent + triple-quoted strings with quotes inside of them: + +```scala +@ """" + """".toCharArray +res0: Array[Char] = Array('\"', '\n', '\"') +``` + - Single-quoted strings with `"` cannot currently span multiple lines, and so they could be specified to have these semantics when used multi-line. 
This has the advantage of not introducing a new delimiter, but the disadvantage that a From 2ae17f3a13df4fb857d2d3a2b44c66e4a8808fe7 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:53:15 +0200 Subject: [PATCH 19/78] . --- content/dedented-string-literals.md | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 22d75507..71280e57 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -195,11 +195,22 @@ def helper = { } ``` -There are a huge number of ways to write and format dedented multiline strings today, and yet -none of them are great to look at visually, and even more ways you can format them badly. -Overall this zoo of options seems inferior to the proposed dedented multiline string syntax, -which has a single valid way of writing the example above, with much better visual clarity than -any of the existing options: +It can also be mitigated by indenting it as follows: + +```scala +def helper = { + val x = + """i am cow + |hear me moo""".stripMargin + x +} +``` + +In general, there are a huge number of ways to write and format dedented multiline strings +today, and yet none of them are great to look at visually, and there are even more ways you +can format them badly. Overall this zoo of options seems inferior to the proposed dedented +multiline string syntax, which has a single valid way of writing the example above, with +much better visual clarity than any of the existing options: ```scala def helper = { @@ -211,6 +222,14 @@ def helper = { } ``` +Note how with this dedented multi-line string, the string contents forms a single rectangular +block on screen, so you don't need to read the code line-by-line left-to-right to see +the contents of the string as you sometimes have to do with triple-quoted strings. 
There is also +no non-string contents to the left or to the right of the string contents: `|`s, opening or +closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand +out clearly from the rest of the code without distraction. + + ### Incorrectness with Multiline Interpolation `""".stripMargin` strings can misbehave when used with string interpolations that may From 07b78c8b704907c19ce6652587aadd74ebe0b804 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:53:33 +0200 Subject: [PATCH 20/78] . --- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 71280e57..92a3d0aa 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -222,7 +222,7 @@ def helper = { } ``` -Note how with this dedented multi-line string, the string contents forms a single rectangular +Note how with this dedented string literal, the string contents forms a single rectangular block on screen, so you don't need to read the code line-by-line left-to-right to see the contents of the string as you sometimes have to do with triple-quoted strings. There is also no non-string contents to the left or to the right of the string contents: `|`s, opening or From 2d2750e45204793e5d0e67dcc21a30d3db6eaf43 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:53:54 +0200 Subject: [PATCH 21/78] . 
--- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 92a3d0aa..a271dfe2 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -206,7 +206,7 @@ def helper = { } ``` -In general, there are a huge number of ways to write and format dedented multiline strings +There are a huge number of ways to write and format dedented multiline strings today, and yet none of them are great to look at visually, and there are even more ways you can format them badly. Overall this zoo of options seems inferior to the proposed dedented multiline string syntax, which has a single valid way of writing the example above, with From 7deb896562f2be8f8b1da2456368fbc8fb846f99 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:56:40 +0200 Subject: [PATCH 22/78] . --- content/dedented-string-literals.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index a271dfe2..747b20e4 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -222,12 +222,18 @@ def helper = { } ``` -Note how with this dedented string literal, the string contents forms a single rectangular -block on screen, so you don't need to read the code line-by-line left-to-right to see -the contents of the string as you sometimes have to do with triple-quoted strings. There is also -no non-string contents to the left or to the right of the string contents: `|`s, opening or -closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand -out clearly from the rest of the code without distraction. 
+Note how with this dedented string literal: + +* The string contents forms a single rectangular block on screen, so you don't need to + read the code in a zig-zag fashion line-by-line left-to-right to see the contents of the string + +* There is also no non-string contents to the left or to the right of the string contents: `|`s, opening or + closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand + out clearly from the rest of the code without distraction. + +* The amount of horizontal-space used is much less than the examples using traditional multiline + strings above: without multiple levels of indentations and without a trailing `""".stripMargin` + extending the last line. ### Incorrectness with Multiline Interpolation From f00f2f6627f08082856fa53483143e804ce4a74b Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:57:09 +0200 Subject: [PATCH 23/78] . --- content/dedented-string-literals.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 747b20e4..d20dd0ae 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -226,6 +226,7 @@ Note how with this dedented string literal: * The string contents forms a single rectangular block on screen, so you don't need to read the code in a zig-zag fashion line-by-line left-to-right to see the contents of the string + (as you would have to do with the first of the examples above) * There is also no non-string contents to the left or to the right of the string contents: `|`s, opening or closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand From 68d1050c9740bc7a265c2d22f07d4aae941e480a Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 07:57:41 +0200 Subject: [PATCH 24/78] .
--- content/dedented-string-literals.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index d20dd0ae..6cb48c62 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -233,9 +233,8 @@ Note how with this dedented string literal: out clearly from the rest of the code without distraction. * The amount of horizontal-space used is much less than the examples using traditional multiline - strings above: without multiple levels of indentations and without a trailing `""".stripMargin` - extending the last line. - + strings above: without multiple levels of indentations, without a trailing `""".stripMargin` + extending the last line, or `.stripMargin.trim` ### Incorrectness with Multiline Interpolation From fd84024702ba0e0664b335015568a70462aaae45 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 08:03:56 +0200 Subject: [PATCH 25/78] . --- content/dedented-string-literals.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 6cb48c62..67469417 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -515,8 +515,6 @@ contrast, the current proposal specifies the indentation to be removed in terms indentation of the closing delimiter, which keeps it within the "visual" domain without needing the user to count spaces. -### Triple-Backticked Multiline Strings - ## Prior Art Many other languages have exactly this feature, all with exactly the same reason From b21ef428567f568c78d9c8b5234c32b6c8b8dce3 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sat, 16 Aug 2025 08:07:55 +0200 Subject: [PATCH 26/78] . 
--- content/dedented-string-literals.md | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 67469417..1842850c 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -472,15 +472,25 @@ res0: Array[Char] = Array('\"', '\n', '\"') ``` - Single-quoted strings with `"` cannot currently span multiple lines, and so - they could be specified to have these semantics when used multi-line. This has - the advantage of not introducing a new delimiter, but the disadvantage that a - single `"` isn't very visually distinct when used for demarcating vertical blocks - of text + they could be specified to have these dedenting semantics when used multi-line. + - This has the advantage of not introducing a new delimiter, as `"` is already + used for strings + - This has the disadvantage that a single `"` isn't very visually distinct + when used for demarcating blocks of text, and separating them vertically + from the code before and after + - Some languages do have multi-line strings with single-character delimiters, + e.g. JavaScript's template literals use a single backtick - Triple-backticks are another syntax that is currently available, and so could be used as a multi-line delimiter. This has the advantage of being similar to blocks used in - markdown, with a similar meaning, but the disadvantage that it would collide if a - user tries to embed Scala code in a markdown code block + markdown, with a similar meaning, but several disadvantages: + - It would collide if a user tries to embed Scala code in a markdown code block. In fact, + I couldn't even figure out how to embed triple-backticks in this document! + - Backticks are currently used for identifiers while single-quotes are used for literals, + so single-quotes seem more appropriate to use for multi-line literals than backticks.
+ - Single-quotes also would look more familiar to anyone coming from other languages like + Python or Elixir (albeit with slightly different semantics) while triple-backticks have + no precedent in any programming language. - Other syntaxes like `@"..."` are possible, but probably too esoteric to be worth considering From 3a3aeb037d810dc59aa802a0d528349678fa47a0 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sun, 17 Aug 2025 06:56:54 +0200 Subject: [PATCH 27/78] . --- content/dedented-string-literals.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 1842850c..ae056819 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -446,11 +446,15 @@ This `tq"""` interpolator could be a macro that looks at the source code and rem indentation, avoiding the problems with runtime indentation removal we [discussed above](#incorrectness-with-mutliline-interpolation). However, using an interpolator does not solve the other issues of multiline strings not being valid -[literal types](#literalsingleton-types) +[literal types](#literalsingleton-types) or [literal string expressions](#literal-string-expressions). -Having a dedicate `tq"""` interpolator also means multiline strings cannot be used with -other existing interpolators, such as `s""`, `r""`, or user-specified interpolators -like `sql""` introduced by libraries like [ScalaSql](https://github.com/com-lihaoyi/scalasql) +Custom interpolators also do not compose: having a dedicated `tq"""` interpolator also +means multiline strings cannot be used with other existing interpolators, such as `s""`, +`r""`, or user-specified interpolators like `sql""` introduced by libraries like +[ScalaSql](https://github.com/com-lihaoyi/scalasql).
+ +A macro-based `.stripMarginMacro` could avoid the issue with composition of interpolators, but +otherwise suffers from all the other issues mentioned above. ### Other Delimiters From 3bf22d6141e86326cfc92a0c5de2ceb8a6e9402e Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sun, 17 Aug 2025 06:59:34 +0200 Subject: [PATCH 28/78] . --- content/dedented-string-literals.md | 47 +++++++++++++++++------------ 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index ae056819..89465a83 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -347,23 +347,6 @@ cmd2.sc:2: Function can only accept constant singleton type ^ ``` -`""".stripMargin` strings also cannot participate in pattern matches: - -```scala -def foo: String = ??? - -foo match { - case """i am cow - |hear me moo""".stripMargin => -``` - -```scala --- [E040] Syntax Error: -------------------------------------------------------- -3 | |hear me moo""".stripMargin => - | ^ - | '=>' expected, but '.' found -``` - `""".stripMargin` cannot be used in annotations like `@implicitNotFound`. As shown below, it does not properly update the error message, because `""".stripMargin` is not a literal string. Using triple-quoted strings without `.stripMargin` results in the error message being @@ -384,6 +367,26 @@ scala> implicitly[Foo] ``` +### Pattern Matching + + +`""".stripMargin` strings also cannot participate in pattern matches: + +```scala +def foo: String = ??? + +foo match { + case """i am cow + |hear me moo""".stripMargin => +``` + +```scala +-- [E040] Syntax Error: -------------------------------------------------------- +3 | |hear me moo""".stripMargin => + | ^ + | '=>' expected, but '.' found +``` + ## Implementation TODO @@ -447,14 +450,20 @@ indentation, avoiding the problems with runtime indentation removal we [discussed above](#incorrectness-with-mutliline-interpolation). 
However, using an interpolator does not solve the other issues of multiline strings not being valid [literal types](#literalsingleton-types) or [literal string expressions](#literal-string-expressions). +A custom interpolator could also work in [pattern matching](#pattern-matching). Custom interpolators also do not compose: having a dedicated `tq"""` interpolator also means multiline strings cannot be used with other existing interpolators, such as `s""`, `r""`, or user-specified interpolators like `sql""` introduced by libraries like [ScalaSql](https://github.com/com-lihaoyi/scalasql). -A macro-based `.stripMarginMacro` could avoid the issue with composition of interpolators, but -otherwise suffers from all the other issues mentioned above. +### Macro-based `.stripMargin` + +A macro-based `.stripMarginMacro` could avoid the issue with composition of interpolators +mentioned above, but would still suffer from the issue of not being +[literal types](#literalsingleton-types) or +[literal string expressions](#literal-string-expressions), and also would not work +in [pattern matching](#pattern-matching). ### Other Delimiters From 3ef22ee27f7c5a35ca5be8d36d27b2aadc150458 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sun, 17 Aug 2025 07:01:40 +0200 Subject: [PATCH 29/78] . --- content/dedented-string-literals.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 89465a83..7654af91 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -387,6 +387,19 @@ foo match { | '=>' expected, but '.'
found ``` +Both normal single-quoted `"` and triple-quoted `"""` strings can be pattern matched on, +but with triple-quoted strings it includes indentation and so is frustrating to use in practice +due to needing to manually de-dent the string to avoid matching the indentation + +```scala +def helper = { + foo match { + case """i am cow +hear me moo""" => + } +} +``` + ## Implementation TODO From eb9fe0b89ecc52cb93b0280130dc634321bb0c96 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sun, 17 Aug 2025 12:59:40 +0200 Subject: [PATCH 30/78] . --- content/dedented-string-literals.md | 33 +++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 7654af91..d1ed7485 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -400,6 +400,39 @@ hear me moo""" => } ``` +### Downstream Tooling Complexity + +The last major problem with the existing `""".stripMargin` pattern is that all tools +that read, write, or analyze Scala source files or classfiles need to be aware of it. +This results in complexity of these tools' user experience or implementation, or bugs +if the tool does not handle it precisely. + +1. [pprint.log](https://github.com/com-lihaoyi/PPrint) prints out multi-line strings + triple-quoted, but these cannot be pasted into source code without fiddling + with `|`s and `.stripMargin`s to make sure the indentation is fixed + +2. [uTest's assertGoldenLiteral](https://github.com/com-lihaoyi/utest?tab=readme-ov-file#assertgoldenliteral) + does not work for multi-line strings because of not handling indentation properly + +3. [munit hardcodes support for """.stripMargin strings](https://scalameta.org/munit/docs/assertions.html#assertnodiff) + when pretty-printing values in errors + +4. 
[Mill's bytecode change-detection](https://github.com/com-lihaoyi/mill/pull/2417) + will detect spurious changes if a `""".stripMargin` string is indented or de-dented, + due to not recognizing the pattern in the bytecode + +5. [ScalaFmt's flag assumeStandardLibraryStripMargin](https://scalameta.org/scalafmt/docs/configuration.html#assumestandardlibrarystripmargin) + adds a special case, complicating its configuration schema due to the fact that `stripMargin` + is a library method and thus ScalaFmt cannot guarantee its behavior + +6. IntelliJ IDEA needs [substantial amounts of special casing](https://github.com/JetBrains/intellij-scala/blob/idea252.x/scala/scala-impl/src/org/jetbrains/plugins/scala/format/StripMarginParser.scala) + to parse, format, and generate `""".stripMargin` strings. + +All this complexity would go away with the proposed de-dented multiline strings: rather +than every downstream tool needing hard-coded support to be `stripMargin`-aware, +tools will only need to generate `'''` multiline strings, which can then be pasted into +user code with arbitrary indentation and they will do the right thing. + ## Implementation TODO From 00a2f4442f7b72b242ddced6604354a9c3bb85f2 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Sun, 17 Aug 2025 14:50:52 +0200 Subject: [PATCH 31/78] . --- content/dedented-string-literals.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index d1ed7485..c436c2d6 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -539,6 +539,21 @@ res0: Array[Char] = Array('\"', '\n', '\"') from the code before and after - Some languages do have multi-line strings with single-character delimiters, e.g. JavaScript's template literals use a single backtick + - A single `"` would require that `"`s in the multi-line string be escaped.
Given + that `"`s are very common characters to have in strings, that would be very annoying, + and mean that people would still need to use `""".stripMargin` strings in common cases + - If we don't rely on `"`s to be escaped in the common case, it could be hard to + define a rule to say which `"` does close the string and which `"` does not, + and users may have trouble visually parsing code following that rule. e.g. + +```scala +def openingParagraph = " + One dark and stormy night, + he said + "...i am cow + hear me moo" +".toJson +``` - Triple-backticks are another syntax that is currently available, and so could be used as a multi-line delimiter. This has the advantage of being similar to blocks used in From 77e03e5eb82a8ebc25d6768d3e8547d43ca43dd2 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 08:37:32 +0200 Subject: [PATCH 32/78] . --- content/dedented-string-literals.md | 314 ++++++++++++++++++++-------- 1 file changed, 225 insertions(+), 89 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index c436c2d6..4edcb97c 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -39,12 +39,12 @@ semantic, and so for this proposal the currently-unused `'''` syntax is chosen i Dedented strings automatically strip: -- The first newline after the opening `'''` +- The first newline after the opening `'''` - The final newline and any whitespace before the closing `'''` - Any indentation on every line up to the position of the closing `'''` The opening `'''` MUST be followed immediately by a newline, and the trailing `'''` MUST -be preceded by a newline followed by whitespace characters. Lines within the +be preceded by a newline followed by whitespace characters. Lines within the dedented string MUST be either empty, or have indentation equal-to-or-greater-than the closing delimiter. 
@@ -94,7 +94,7 @@ In most use cases, we expect `'''` to be preferred, although `"""` strings can c exist for backwards compatibility and referred to as "raw multiline strings". We allow _Extended Delimiters_ with more than three `'''`, to allow the strings to contain arbitrary -contents, similar to what is provided in [C#](#c) and [Swift](#swift). e.g. if you want the string to contain `'''`, you can use a four-`''''` delimiter to +contents, similar to what is provided in [C#](#c) and [Swift](#swift). e.g. if you want the string to contain `'''`, you can use a four-`''''` delimiter to stop the `'''` within the body from prematurely closing the literal: @@ -117,7 +117,7 @@ hear me moo ``` Dedented string literals should be able to be used anywhere a normal `"` or triple `"""` -can be used: +can be used: - Literal types (`String & Singleton`) - String interpolation, with builtin or custom interpolators @@ -125,7 +125,6 @@ can be used: - Pattern matching ```scala - foo match{ case ''' i am cow @@ -134,8 +133,12 @@ foo match{ } ``` -As `'''` is not valid syntax in Scala today, there are no backwards compatibility -concerns. +As `'''` is not valid syntax in Scala today, there are no technical backwards compatibility +concerns. See the section on [Choice Of Delimiters](#choice-of-delimiters) for a discussion on why +`'''` is proposed and some viable alternatives + +We expect that traditional `"""` strings will remain in use - e.g. for single-line scenarios - +but most multi-line strings would be served better by `'''` as the default choice ## Motivation @@ -147,7 +150,8 @@ with triple-quoted strings annoying and unpleasant. ### Verbosity & Visual Clarity Using traditional `""".stripMargin` strings with `|` and `.stripMargin` is very verbose, which -interferes with visually reading the code. +interferes with visually reading the code. There are many different ways to format them, +none of them particularly good, and many of them bad. 
Furthermore, very often you don't want the leading or trailing newline either, which means you need to put text on the first line of the multi-line string which breaks vertical @@ -199,14 +203,14 @@ It can also be mitigated by indenting it as follows: ```scala def helper = { - val x = + val x = """i am cow |hear me moo""".stripMargin x } ``` -There are a huge number of ways to write and format dedented multiline strings +There are a huge number of ways to write and format dedented multiline strings today, and yet none of them are great to look at visually, and there are even more ways you can format them badly. Overall this zoo of options seems inferior to the proposed dedented multiline string syntax, which has a single valid way of writing the example above, with @@ -228,17 +232,17 @@ Note how with this dedented string literal: read the code in a zig-zag fashion line-by-line left-to-right to see the contents of the string (as you would have to do with the first of the examples above) -* There is also no non-string contents to the left or to the right of the string contents: `|`s, opening or - closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand +* There is also no non-string contents to the left or to the right of the string contents: `|`s, opening or + closing `"""`s, or `.stripMargin` method calls. This makes the multiline string contents stand out clearly from the rest of the code without distraction.
-* The amount of horizontal-space used is much less than the examples using traditional multiline - strings above: without multiple levels of indentations, without a trailing `""".stripMargin` +* The amount of horizontal-space used is much less than the examples using traditional multiline + strings above: without multiple levels of indentations, without a trailing `""".stripMargin` extending the last line, or `.stripMargin.trim` ### Incorrectness with Multiline Interpolation -`""".stripMargin` strings can misbehave when used with string interpolations that may +`""".stripMargin` strings can misbehave when used with string interpolations that may span multiple lines. For example: ```scala @@ -246,7 +250,7 @@ def helper = { val scalazOperators = Seq("<$>", "<*>", "|@|", "|->").mkString(",\n ") s""" |import scalaz.{ - | $scalazOperators + | $scalazOperators |} |""".stripMargin } @@ -259,7 +263,7 @@ import scalaz.{ <$>, <*>, @|, --> +-> } ``` @@ -278,7 +282,7 @@ concern, but has resulted in multiple bugs in widely-used tools and libraries: ### Literal/Singleton Types `.stripMargin` strings are not literals, and cannot generate `String & Singleton` types -even though from a user perspective the user really may just want a string literal. +even though from a user perspective the user really may just want a string literal. ```scala def helper = { @@ -314,7 +318,7 @@ val x: "hello" = hello scala> val x: """i am cow | |hear me moo""".stripMargin = """i am cow | |hear me moo""".stripMargin -``` +``` ```scala -- Error: ---------------------------------------------------------------------- @@ -326,12 +330,12 @@ scala> val x: """i am cow ### Literal String Expressions This also means that any macros that may work on string literals, e.g. validating -the string literal at build time, would not be able to work with multiline strings. +the string literal at build time, would not be able to work with `""".stripMargin` strings. 
This includes `inline def`s or macros that may want to validate or process these string literals at compile time (e.g. validating SQL literals, preventing -directory traversal attacks, pre-compiling regexes or parsers, etc.). +directory traversal attacks, pre-compiling regexes or parsers, etc.). -One example is FastParse's `StringIn` parser which generates code +One example is FastParse's `StringIn` parser which generates code for efficiently parsing the given strings at compile time, fails when `""".stripMargin` strings are given: @@ -341,7 +345,7 @@ strings are given: """i am cow |hear me moo""".stripMargin ) - ) + ) cmd2.sc:2: Function can only accept constant singleton type StringIn( ^ @@ -357,14 +361,14 @@ scala> @scala.annotation.implicitNotFound( | """i am cow | |hear me moo""".stripMargin.toUpperCase) class Foo() // defined class Foo - + scala> implicitly[Foo] -- [E172] Type Error: ---------------------------------------------------------- 1 |implicitly[Foo] | ^ |No given instance of type Foo was found for parameter e of method implicitly in object Predef 1 error found - + ``` ### Pattern Matching @@ -374,7 +378,7 @@ scala> implicitly[Foo] ```scala def foo: String = ??? - + foo match { case """i am cow |hear me moo""".stripMargin => @@ -382,13 +386,13 @@ foo match { ```scala -- [E040] Syntax Error: -------------------------------------------------------- -3 | |hear me moo""".stripMargin => +3 | |hear me moo""".stripMargin => | ^ | '=>' expected, but '.' 
found ``` -Both normal single-quoted `"` and triple-quoted `"""` strings can be pattern matched on, -but with triple-quoted strings it includes indentation and so is frustrating to use in practice +Both normal single-quoted `"` and triple-quoted `"""` strings can be pattern matched on, +but with triple-quoted strings it includes indentation and so is frustrating to use in practice due to needing to manually de-dent the string to avoid matching the indentation ```scala @@ -429,7 +433,7 @@ if the tool does not handle it precisely. to parse, format, and generate `""".stripMargin` strings. All this complexity would go away with the proposed de-dented multiline strings: rather -than every downstream tool needing hard-coded support to be `stripMargin`-aware, +than every downstream tool needing hard-coded support to be `stripMargin`-aware, tools will only need to generate `'''` multiline strings, which can then be pasted into user code with arbitrary indentation and they will do the right thing. @@ -440,16 +444,16 @@ TODO ## Limitations Dedented `'''` strings MUST be multiline strings. Using this syntax for single-line -strings is not allowed, +strings is not allowed, ```scala val x = '''hello''' ``` -As mentioned above, the opening and closing delimiters MUST have a leading/trailing +As mentioned above, the opening and closing delimiters MUST have a leading/trailing newline, making the `'''` delimiters "vertical" delimiters that are easy to scan rather than "horizontal" delimiters like `"` or `"""` which requires the reader -to scan left and right to determine the bounds of the string literal. +to scan left and right to determine the bounds of the string literal. 
All lines within a dedented `'''` string MUST be indented further than the closing @@ -492,15 +496,15 @@ def helper = { ``` This `tq"""` interpolator could be a macro that looks at the source code and removes -indentation, avoiding the problems with runtime indentation removal we -[discussed above](#incorrectness-with-mutliline-interpolation). However, using an interpolator -does not solve the other issues of multiline strings not being valid -[literal types](#literalsingleton-types) or [literal string expressions](#literal-string-expressions). -A custom interpolator could also work in [pattern matching](#pattern-matching) +indentation, avoiding the problems with runtime indentation removal we +[discussed above](#incorrectness-with-mutliline-interpolation). A custom interpolator could also work in [pattern matching](#pattern-matching). +However, using an interpolator does not solve the other issues of multiline strings +not being valid [literal types](#literalsingleton-types) or [literal string expressions](#literal-string-expressions). + Custom interpolators also do not compose: having a dedicate `tq"""` interpolator also means multiline strings cannot be used with other existing interpolators, such as `s""`, -`r""`, or user-specified interpolators like `sql""` introduced by libraries like +`r""`, or user-specified interpolators like `sql""` introduced by libraries like [ScalaSql](https://github.com/com-lihaoyi/scalasql). ### Macro-based `.stripMargin` @@ -511,12 +515,45 @@ mentioned above, but still will suffer from the issue of not being [literal string expressions](#literal-string-expressions), and also would not work in [pattern matching](#pattern-matching). -### Other Delimiters +### Choice Of Delimiters + +`'''` was chosen as a currently-unused syntax in Scala, with plenty of precedence +for `'''`-quoted strings in other languages. 
Languages like Python, Groovy, +Dart, and Elixir all have both `"""` and `'''` strings without any apparent issue, +with several (e.g. Groovy and Elixir) having different semantics between the two syntaxes. + +The similar "single-quote Char" syntax is `'\''` is relatively rare in typical +Scala code - a quick search of the libraries I have checked out finds 141 uses of `'\''`, +compared to 24331 uses of `.stripMargin` that could benefit from this improved syntax - +which suggests that the benefit will be widespread and the similarity with `'\''` would +be edge case that occurs rarely and cause minimal confusion. + +Other options to consider are listed below + +#### Double-single-quotes + +Like `'''`, `''` is also currently invalid syntax in Scala, and could be used for +defining multi-line strings: + +```scala +def helper = { + val x = '' + i am cow + hear me moo + '' + x +} +``` + +For all intents and purposes this is identical to the `'''` proposal, with some tweaks: + +- `''` looks less similar to a `Char` literal `'\''`, so less chance of confusion +- `''` looks less simila to the triple-quoted strings common in other languages, so + there is less benefit of familiarity. -`'''` was chosen as a currently-unused syntax in Scala, but other options are also -possible: +#### Triple-or-more double-quotes -- Triple-double-quotes `"""` are already used with a particular semantic, so we cannot +- `"""` are already used with a particular semantic, so we cannot change those, despite them being used in every other language like [Java](#java), [C#](#c), and [Swift](#swift). @@ -526,25 +563,42 @@ possible: ```scala @ """" - """".toCharArray + """".toCharArray res0: Array[Char] = Array('\"', '\n', '\"') ``` -- Single-quoted strings with `"` cannot currently span multiple lines, and so - they could be specified to have these dedenting semantics when used multi-line. 
- - This has the advantage of not introducing a new delimiter, as `"` is already - used for strings - - This has the disadvantage that a single `"` isn't very visually distinct - when used for demarcating blocks of text, and separating them vertically - from the code before and after - - Some languages do have multi-line strings with single-character delimiters, - e.g. Javascripts template literals use a single-backtick - - A single `"` would require that `"`s in the multi-line string be escaped. Given - that `"`s are very common characters to have in strings, that would be very annoying, - and mean that people would still need to use `""".stripMargin` strings in common cases - - If we don't rely on `"`s to be escaped in the common case, it could be hard to - define a rule to say which `"` does close the string and which `"` does not, - and users may have trouble visually parsing code following that rule. e.g. +#### Single-quoted Multiline Strings +Single-quoted strings with `"` cannot currently span multiple lines, and so +they could be specified to have these dedenting semantics when used multi-line. + +```scala +def openingParagraph = " + i am cow + hear me moo +" +``` + +This has the advantage of not introducing a new delimiter, as `"` is already +used for strings. + + +A single `"` would require that `"`s in the multi-line string be escaped. Given +that `"`s are very common characters to have in strings, that would be very annoying, +and mean that people would still need to use `""".stripMargin` strings in common cases + +```scala +def openingParagraph = " + { + \"i am\": \"cow\", + \"hear me\": \"moo\" + } +" +``` + +It is possible to define rules such that `"`s do not need to escape, but it could +complicate parsing. e.g. 
one suggested rule is _"the first line starting with a `"` +and with an odd number of `"`s terminates the multi-line string"_, but that fails +in simple cases like: ```scala def openingParagraph = " @@ -555,31 +609,61 @@ def openingParagraph = " ".toJson ``` +In general, such rules also are difficult to implement: while it is possible to do +such "line-based" lexing in the Scala compiler's hand-written parser, I expect it will +be challenging for other external tools, e.g. FastParse's parser combinators or syntax +highlighters like Github Linguist, Highlight.js, or Prism.js are not typically able +to encode rules such as _"the first line starting with a `"` +and with an odd number of `"`s terminates the multi-line string"_ + + +#### Single-Quote with Header + +One delimiter that uses `"`s, avoids introducing a new `'''` delimiter, and also +avoids the parsing edge cases and implementation challenges would be , e.g. `"---\n` would +need to be followed by `\n---"`. This header could be variable length, allowing the ability +to embed arbitrary contents without escaping, similar to the extendable `'''` delimiters +proposed above and present in [C#'s](#c) or [Swift's](#swift). + +```scala +def openingParagraph = "--- + One dark and stormy night, + he said + "...i am cow + hear me moo" +---" +``` + +Although in theory the delimiter between `"` and `\n` could contain any characters except +`"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to +a small set e.g. dashes-only to avoid unnecessary flexibility in the syntax + +### Other Syntaxes - Triple-backticks are another syntax that is currently available, and so could be used as a multi-line delimiter. This has the advantage of being similar to blocks used in markdown, with a similar meaning, but several disadvantages: - It would collide if a user tries to embed Scala code in a markdown code block. In fact, I couldn't even figure out how to embed triple-backticks in this document! 
- - Backticks are currently used for identifiers while single-quotes are used for literals, - so single-quotes seems more appropriate to use for multi-line literals than backticks. - - Single-quotes also would look more familiar to anyone coming from other languages like - Python or Elixir (albeit with slightly different semantics) while triple-backticks have + - Backticks are currently used for identifiers while single-quotes are used for literals, + so single-quotes seems more appropriate to use for multi-line literals than backticks. + - Single-quotes also would look more familiar to anyone coming from other languages like + Python or Elixir (albeit with slightly different semantics) while triple-backticks have no precedence in any programming language. - Other syntaxes like `@"..."` are possible, but probably too esoteric to be worth considering ### Alternative ways of specifying indentation -The proposed rule of specifies the indentation to be removed relies on the indentation of +The proposed rule of specifies the indentation to be removed relies on the indentation of the trailing `'''` delimiter. Other possible approaches include: - The minimum indentation of any non-whitespace line within the string, which is why [Ruby does](#ruby) - - This does not allow the user to define strings with all lines indented by some amount, + - This does not allow the user to define strings with all lines indented by some amount, unless the indentation of the closing delimiter is counted as well. But if the indentation of the closing delimiter is counted, then it is simpler to just use that, and prohibit other lines from being indented less than the delimiter -- An explicit indentation-counter, which is what YAML does, e.g. with the below text block +- An explicit indentation-counter, which is what YAML does, e.g. with the below text block dedenting the string by 4 characters: ```yaml example: >4 @@ -593,16 +677,16 @@ example: >4 plus another line at the end. 
``` -This works, but it is very unintuitive for users to have to translate the indentation to be -removed (which is a visual thing) into a number that gets written at the top of the block. In -contrast, the current proposal specifies the indentation to be removed in terms of the -indentation of the closing delimiter, which keeps it within the "visual" domain without +This works, but it is very unintuitive for users to have to translate the indentation to be +removed (which is a visual thing) into a number that gets written at the top of the block. In +contrast, the current proposal specifies the indentation to be removed in terms of the +indentation of the closing delimiter, which keeps it within the "visual" domain without needing the user to count spaces. ## Prior Art -Many other languages have exactly this feature, all with exactly the same reason -and exactly the same specification: trimming the leading and trailing newlines, along +Many other languages have dedented string literals, all with exactly the same reason +and almost the same specification: trimming the leading and trailing newlines, along with indentation. Many have similar rules for flexible delimiters to allow the strings to contain arbitrary contents @@ -622,10 +706,10 @@ String html = """ """; ``` -> The re-indentation algorithm takes the content of a text block whose line terminators have +> The re-indentation algorithm takes the content of a text block whose line terminators have > been normalized to LF. It removes the same amount of white space from each line of content > until at least one of the lines has a non-white space character in the leftmost position. -> The position of the opening """ characters has no effect on the algorithm, but the position +> The position of the opening """ characters has no effect on the algorithm, but the position > of the closing """ characters does have an effect if placed on its own line. 
Java doesn't have extended delimiters like those proposed here, but requires you to escape @@ -646,7 +730,7 @@ var xml = """ ``` > To make the text easy to read and allow for indentation that developers like in code, -> these string literals will naturally remove the indentation specified on the last line +> these string literals will naturally remove the indentation specified on the last line > when producing the final literal value. For example, a literal of the form: C# also allows arbitrary-length delimiters as described in this propsoal @@ -657,7 +741,7 @@ var xml = """" """"; ``` -> Because the nested contents might itself want to use """ then the starting/ending +> Because the nested contents might itself want to use """ then the starting/ending > delimiters can be longer ### Swift @@ -675,11 +759,11 @@ till you come to the end; then stop." """ ``` -> A multiline string literal includes all of the lines between its opening and closing +> A multiline string literal includes all of the lines between its opening and closing > quotation marks. The string begins on the first line after the opening quotation marks -> (""") and ends on the line before the closing quotation marks, which means that neither +> (""") and ends on the line before the closing quotation marks, which means that neither > of the strings below start or end with a line break: -> +> > A multiline string can be indented to match the surrounding code. The whitespace before the > closing quotation marks (""") tells Swift what whitespace to ignore before all of the other > lines. However, if you write whitespace at the beginning of a line in addition to what’s before @@ -693,7 +777,7 @@ Here are three more double quotes: """ """# ``` -> String literals created using extended delimiters can also be multiline string literals. +> String literals created using extended delimiters can also be multiline string literals. 
> You can use extended delimiters to include the text """ in a multiline string, overriding > the default behavior that ends the literal. For example: @@ -721,19 +805,45 @@ behave exactly as this proposal: ``` > Multi-line strings in Elixir are written with three double-quotes, and can have unescaped -> quotes within them. The resulting string will end with a newline. The indentation of the +> quotes within them. The resulting string will end with a newline. The indentation of the > last """ is used to strip indentation from the inner string. For example: Elixir allows both `"""` and `'''` syntax for multi-line strings, with `'''`-delimited strings allowing you to embed `"""` in the body (and vice versa). This is similar to Python's syntax for triple-quoted strings +### Bash + +Bash has multiple variants of `<< HEREDOC`: + +```bash +cat << EOF +The current working directory is: $PWD +You are logged in as: $(whoami) +EOF +``` + +This includes `<<- HEREDOC` strings that strip indentation. This is done relatively naively, +by simply removing all leading `\t` tab characters. + +> The first line starts with an optional command followed by the special redirection +> operator `<<` and the delimiting identifier. +> +> * You can use any string as a delimiting identifier, the most commonly used are EOF or END. +> * If the delimiting identifier is unquoted, the shell will substitute all variables, commands +> and special characters before passing the here-document lines to the command. +> * Appending a minus sign to the redirection operator <<-, will cause all leading tab characters +> to be ignored. This allows you to use indentation when writing here-documents in shell scripts. Leading whitespace characters are not allowed, only tab. +> * The here-document block can contain strings, variables, commands and any other type of input. +> * The last line ends with the delimiting identifier. White space in front of the delimiter is +> not allowed. 
+ ### Ruby -Ruby has [Squiggly Heredoc](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Strings) -strings that have similar leading/trailing newline removal, but has a -"least indented non-whitespace-only" line policy for removing indentation, rather than a -closing-delimiter policy like the other languages above +Ruby has [Squiggly Heredoc](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Strings), +inspired by Bash, but with a different "least indented non-whitespace-only" line policy +for removing indentation, rather than Bash's tab-based removal or a +closing-delimiter-indentation policy like the other languages above ```ruby expected_result = <<~SQUIGGLY_HEREDOC @@ -743,9 +853,9 @@ expected_result = <<~SQUIGGLY_HEREDOC SQUIGGLY_HEREDOC ``` -> The indentation of the least-indented line will be removed from each line of the content. -> Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored -> for the purposes of determining indentation, but escaped tabs and spaces are considered +> The indentation of the least-indented line will be removed from each line of the content. +> Note that empty lines and lines consisting solely of literal tabs and spaces will be ignored +> for the purposes of determining indentation, but escaped tabs and spaces are considered > non-indentation characters. As a HEREDOC-inspired syntax, you can change the header of your multi-line string in Ruby @@ -758,4 +868,30 @@ expected_result = <<~MY_CUSTOM_SQUIGGLY_HEREDOC SQUIGGLY_HEREDOC That might span many lines MY_CUSTOM_SQUIGGLY_HEREDOC -``` \ No newline at end of file +``` + +### Ocaml + +Ocaml [allows single-quoted to span multiple lines](https://ocaml.org/manual/5.3/lex.html#sss:stringliterals), +and automatically removes indentation if the newline character is escaped with a preceding `\`: + +```ocaml +# let contains_unexpected_spaces = + "This multiline literal + contains three consecutive spaces." 
+ + let no_unexpected_spaces = + "This multiline literal \n\ + uses a single space between all words.";; +val contains_unexpected_spaces : string = + "This multiline literal\n contains three consecutive spaces." +val no_unexpected_spaces : string = + "This multiline literal \nuses a single space between all words." +``` + +However, Ocaml's "raw" string syntax `{| |}` does not have a mode that removes indentation, +which is an [open issue on the OCaml repo](https://github.com/ocaml/ocaml/issues/13860) + +## Other String Syntaxes + +- \ No newline at end of file From d53dd4e8b0dfeae70d1480d790ee5d24f96a7976 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 08:39:07 +0200 Subject: [PATCH 33/78] . --- content/dedented-string-literals.md | 35 +++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 4edcb97c..c7cda477 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -597,10 +597,22 @@ def openingParagraph = " It is possible to define rules such that `"`s do not need to escape, but it could complicate parsing. e.g. one suggested rule is _"the first line starting with a `"` -and with an odd number of `"`s terminates the multi-line string"_, but that fails -in simple cases like: +and with an odd number of `"`s terminates the multi-line string"_. That works for the scenario +above: ```scala +def openingParagraph = " + { + "i am": "cow", + "hear me": "moo" + } +" +``` + +But fails in other simple cases like: + +```scala +// The `"...` closes the string prematurely def openingParagraph = " One dark and stormy night, he said @@ -609,6 +621,25 @@ def openingParagraph = " ".toJson ``` +```scala +def openingParagraph = " + { + "i am": "cow", + "hear me": "moo" + } +" + '"' // This becomes an unclosed string! 
+``` + +```scala +def openingParagraph = " + { + "i am": "cow", + "hear me": "moo" + } +" // A single-`"` string +// The preceding comment causes this to become an unclosed string literal! +``` + In general, such rules also are difficult to implement: while it is possible to do such "line-based" lexing in the Scala compiler's hand-written parser, I expect it will be challenging for other external tools, e.g. FastParse's parser combinators or syntax From 5df49d6b82ce0e19724d587b98bda44b6095652a Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 08:52:21 +0200 Subject: [PATCH 34/78] . --- content/dedented-string-literals.md | 107 +++++++++++++++++++++++++--- 1 file changed, 98 insertions(+), 9 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index c7cda477..a26be8b6 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -137,8 +137,17 @@ As `'''` is not valid syntax in Scala today, there are no technical backwards co concerns. See the section on [Choice Of Delimiters](#choice-of-delimiters) for a discussion on why `'''` is proposed and some viable alternatives -We expect that traditional `"""` strings will remain in use - e.g. for single-line scenarios - -but most multi-line strings would be served better by `'''` as the default choice +We expect that traditional `"""` strings will remain in use - e.g. 
for single-line scenarios such +as those found in +[Scalatags](https://github.com/com-lihaoyi/scalatags/blob/0024ce995f301b10a435c672ff643f2a432a7f3b/scalatags/test/src/scalatags/generic/BasicTests.scala#L46-L61), +[Mill](https://github.com/com-lihaoyi/mill/blob/50e775b31d3f8fc8734c0a90dc231a4dd5ba1d4f/integration/invalidation/invalidation/src/ScriptsInvalidationTests.scala#L29), +[Cask](https://github.com/com-lihaoyi/cask/blob/2bbee717e176a62d6a9af6c8187fbf219aad913d/docs/build.sc#L42), +[Ammonite](https://github.com/com-lihaoyi/Ammonite/blob/2fdc440b23c9bc7eb782c496c05ec1d3c10ee3d6/amm/repl/src/test/scala/ammonite/interp/AutocompleteTests.scala#L62-L104), +[PPrint](https://github.com/com-lihaoyi/PPrint/blob/abea5a533dcb054ab0ef67a4418636faf8e243a5/pprint/test/src/test/pprint/VerticalTests.scala#L32), +[OS-Lib](https://github.com/com-lihaoyi/os-lib/blob/72605235899b65e144ffe48821c63085cb9062ad/os/test/src/PathTests.scala#L34), +[Requests-Scala](https://github.com/com-lihaoyi/requests-scala/blob/a9541623017816a53ecafc5052d02ef7ec62cf2c/requests/src/requests/Requester.scala#L257), +[FastParse](https://github.com/com-lihaoyi/fastparse/blob/d8f95daef21d6e6f9734624237f993f4cebfa881/fastparse/test/src-2.12%2B/fastparse/CustomWhitespaceMathTests.scala#L54-L58), +and other projects - but most multi-line strings would be served better by `'''` as the default choice ## Motivation @@ -318,7 +327,7 @@ val x: "hello" = hello scala> val x: """i am cow | |hear me moo""".stripMargin = """i am cow | |hear me moo""".stripMargin -``` +``` ```scala -- Error: ---------------------------------------------------------------------- @@ -361,14 +370,14 @@ scala> @scala.annotation.implicitNotFound( | """i am cow | |hear me moo""".stripMargin.toUpperCase) class Foo() // defined class Foo - + scala> implicitly[Foo] -- [E172] Type Error: ---------------------------------------------------------- 1 |implicitly[Foo] | ^ |No given instance of type Foo was found for parameter e of method 
implicitly in object Predef 1 error found - + ``` ### Pattern Matching @@ -378,7 +387,7 @@ scala> implicitly[Foo] ```scala def foo: String = ??? - + foo match { case """i am cow |hear me moo""".stripMargin => @@ -640,7 +649,89 @@ def openingParagraph = " // The preceding comment causes this to become an unclosed string literal! ``` -In general, such rules also are difficult to implement: while it is possible to do +Furthermore, this could cause confusion when embedding the multi-line string in surrounding code: + +```scala +// This parses as `foo` being passed one parameter +foo( + " + this is + "," + not a drill + " +) +// This parses as `foo` being passed two parameters +foo( + " +this is + ", +" +not a drill +" +) +``` + +Another possible rule is indentation-based: _"the first single-quote preceded on a line +only by whitespace that is indented equal-or-less than the opening quote closes the string"_. +_"indentation of the opening quote"_ could mean one of: + +1. The column offset of the `"` character itself. That would mean the entire string body must + be to the right of the opening quote, which does force a more verbose layout that takes + more vertical and horizontal space: + +```scala +def openingParagraph = " + i am cow + hear me moo + " +def openingParagraph = + " + i am cow + hear me moo + " +``` + +2. The indentation of the statement which contains the opening quote. This allows a more compact + syntax in some cases, but not others + +```scala +def openingParagraph = " + i am cow + hear me moo +" + +// The indentation of the statement below starts at "hello" +"hello" + .map( + foo => " +i am cow +hear me moo +" + ) +``` + +3.
The column-offset of the first non-whitespace character on the line containing the opening `"` + +```scala +def openingParagraph = " + i am cow + hear me moo +" + +// The indentation/closing-quote is measured from the start of `foo =>` +"hello" + .map( + foo => " + i am cow + hear me moo + " + ) +``` + +In general, such lexing rules are very unusual: there is no precedent for this kind of +_"string terminates on a line with an odd number of quotes sprinkled anywhere within it"_ +syntax anywhere in the broader programming landscape. Apart from violating users' expectations, +such rules also violate tooling assumptions: while it is possible to do such "line-based" lexing in the Scala compiler's hand-written parser, I expect it will be challenging for other external tools, e.g. FastParse's parser combinators or syntax highlighters like Github Linguist, Highlight.js, or Prism.js are not typically able @@ -923,6 +1014,4 @@ val no_unexpected_spaces : string = However, Ocaml's "raw" string syntax `{| |}` does not have a mode that removes indentation, which is an [open issue on the OCaml repo](https://github.com/ocaml/ocaml/issues/13860) -## Other String Syntaxes - - \ No newline at end of file From bd13059388973ba7b87ad1c372a727920fa834da Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 08:58:46 +0200 Subject: [PATCH 35/78] .
--- content/dedented-string-literals.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index a26be8b6..81fd721f 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -66,10 +66,7 @@ can simply adjust the contents accordingly: hear me moo ``` -And if a user wants leading or trailing newlines, they can add those as well - -If a user explicitly wants indentation to be present in the string, they -can simply adjust the contents accordingly: +And if a user wants leading or trailing newlines, they can add those as well: ```scala > def helper = { From 4bd654c72909a9be7accf3d0456ba366b5091264 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 09:05:22 +0200 Subject: [PATCH 36/78] . --- content/dedented-string-literals.md | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 81fd721f..af759697 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -456,10 +456,31 @@ strings is not allowed, val x = '''hello''' ``` -As mentioned above, the opening and closing delimiters MUST have a leading/trailing +For such single-line scenarios, the recommendation is to continue using traditional `"""` +strings, which do not need `.stripMargin` and thus avoid all the pitfalls of multi-line +`""".stripMargin` strings discussed above. + +The opening and closing delimiters MUST have a trailing/leading newline, making the `'''` delimiters "vertical" delimiters that are easy to scan rather than "horizontal" delimiters like `"` or `"""` which requires the reader -to scan left and right to determine the bounds of the string literal. +to scan left and right to determine the bounds of the string literal. String
String +contents on the same line as the delimiters is not allowed: + +```scala +// not allowed! +val x = '''hello +world +i am cow''' +``` + +But code _outside_ the string literal has no limitations: + +```scala +val x = println(''' +hello +world +''') // this is fine! +``` All lines within a dedented `'''` string MUST be indented further than the closing From 9171b8295002fc27767783cbdd3f001b0a3bec22 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Tue, 19 Aug 2025 09:05:41 +0200 Subject: [PATCH 37/78] . --- content/dedented-string-literals.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index af759697..8ccd6198 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -514,7 +514,7 @@ One option is to use current triple-quoted strings with an interpolator, e.g. ```scala def helper = { - val x = tq""" + val x = dedent""" i am cow hear me moo """ @@ -522,14 +522,14 @@ def helper = { } ``` -This `tq"""` interpolator could be a macro that looks at the source code and removes +This `dedent"""` interpolator could be a macro that looks at the source code and removes indentation, avoiding the problems with runtime indentation removal we [discussed above](#incorrectness-with-mutliline-interpolation). A custom interpolator could also work in [pattern matching](#pattern-matching). However, using an interpolator does not solve the other issues of multiline strings not being valid [literal types](#literalsingleton-types) or [literal string expressions](#literal-string-expressions). 
-Custom interpolators also do not compose: having a dedicate `tq"""` interpolator also
+Custom interpolators also do not compose: having a dedicate `dedent"""` interpolator also
 means multiline strings cannot be used with other existing interpolators, such as `s""`, `r""`,
 or user-specified interpolators like `sql""` introduced by libraries like
 [ScalaSql](https://github.com/com-lihaoyi/scalasql).

From 0755c61ad81a5aafd04643e859b23f0c8bac8b6e Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:07:07 +0200
Subject: [PATCH 38/78] .

---
 content/dedented-string-literals.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 8ccd6198..18bda2e6 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -550,9 +550,10 @@ Dart, and Elixir all have both `"""` and `'''` strings without any apparent issu
 with several (e.g. Groovy and Elixir) having different semantics between the two syntaxes.

 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical
-Scala code - a quick search of the libraries I have checked out finds 141 uses of `'\''`,
-compared to 24331 uses of `.stripMargin` that could benefit from this improved syntax -
-which suggests that the benefit will be widespread and the similarity with `'\''` would
+Scala code: a quick search of the libraries I have checked out finds 141 uses of `'\''`,
+compared to 24331 uses of `.stripMargin` that could benefit from this improved syntax,
+172 times as many use sites.
+This suggests that the benefit will be widespread and the similarity with `'\''` would
 be edge case that occurs rarely and cause minimal confusion.

 Other options to consider are listed below

From fc73c07ea33ca29c3268128fb2aa0d9bdd3146ec Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:07:49 +0200
Subject: [PATCH 39/78] .
---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 18bda2e6..954751c8 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -576,7 +576,7 @@ def helper = {
 For all intents and purposes this is identical to the `'''` proposal, with some tweaks:

 - `''` looks less similar to a `Char` literal `'\''`, so less chance of confusion
-- `''` looks less simila to the triple-quoted strings common in other languages, so
+- `''` looks less similar to the triple-quoted strings common in other languages, so
 there is less benefit of familiarity.

 #### Triple-or-more double-quotes

From d8608e9f0771f1047544f52533f9eb1a8b2135c1 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:10:53 +0200
Subject: [PATCH 40/78] .

---
 content/dedented-string-literals.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 954751c8..738a1394 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -546,7 +546,7 @@ in [pattern matching](#pattern-matching).

 `'''` was chosen as a currently-unused syntax in Scala, with plenty of precedence
 for `'''`-quoted strings in other languages. Languages like Python, Groovy,
-Dart, and Elixir all have both `"""` and `'''` strings without any apparent issue,
+Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparent issue,
 with several (e.g. Groovy and Elixir) having different semantics between the two syntaxes.
 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical
@@ -577,7 +577,10 @@ For all intents and purposes this is identical to the `'''` proposal, with some

 - `''` looks less similar to a `Char` literal `'\''`, so less chance of confusion
 - `''` looks less similar to the triple-quoted strings common in other languages, so
- there is less benefit of familiarity.
+ there is less benefit of familiarity. We are not aware of any language in the world
+ which uses `''` as a delimiter for string literals.
+- `''` means "empty string" in a _very_ large number of programming languages. Using it
+ as a string _delimiter_ in Scala would likely cause confusion on that basis.

 #### Triple-or-more double-quotes


From 46c239c4930fbe747f6d7faecc851fae5ad6b52c Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:13:33 +0200
Subject: [PATCH 41/78] .

---
 content/dedented-string-literals.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 738a1394..f7d4df67 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -685,11 +685,11 @@ foo(
 // This parses as `foo` being passed two parameter
 foo(
 "
-this is
+ this is
 ",
-"
-not a drill
-"
+ "
+ not a drill
+ "
 )
 ```


From 07b1af740004ef327714495d54d715ce37142630 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:13:47 +0200
Subject: [PATCH 42/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index f7d4df67..4a67a3d8 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -695,7 +695,7 @@ foo(

 Another possible rule is indentation-based: _"the first single-quote preceded on a line
 only by whitespace that is indented equal-or-less than the opening quote"_ closes the string"_.
-_"indentation of the opening quote"_ could mean one
+_"indentation of the opening quote"_ could mean one of three things:

 1. The column offset of the `"` character itself. That would mean the entire string body must
 be to the right of the opening quote, which does force a more verbose layout that takes

From 141adb5cd53ad7fde19496f05a6c6dd0caede957 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:16:17 +0200
Subject: [PATCH 43/78] .

---
 content/dedented-string-literals.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 4a67a3d8..67c56058 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -767,7 +767,8 @@ One delimiter that uses `"`s, avoids introducing a new `'''` delimiter, and also
 avoids the parsing edge cases and implementation challenges would be , e.g. `"---\n` would
 need to be followed by `\n---"`. This header could be variable length, allowing the ability
 to embed arbitrary contents without escaping, similar to the extendable `'''` delimiters
-proposed above and present in [C#'s](#c) or [Swift's](#swift).
+proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC strings in
+[Bash](#bash) or [Ruby](#ruby).

 ```scala
 def openingParagraph = "---

From b76fad31d74f83c66ce1092f0572dec27491081b Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:22:31 +0200
Subject: [PATCH 44/78] .
---
 content/dedented-string-literals.md | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 67c56058..83e84d3a 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -752,11 +752,15 @@ def openingParagraph = "

 In general, such lexing rules very unusual: there is no precedence for this kind of
 _"string terminates on a line with an odd number of quotes sprinkled anywhere within it"_
-syntax anywhere in the broader programming landscape. Apart from violating users expectations,
-such rules also violate tooling assumptions: while it is possible to do
-such "line-based" lexing in the Scala compiler's hand-written parser, I expect it will
-be challenging for other external tools, e.g. FastParse's parser combinators or syntax
-highlighters like Github Linguist, Highlight.js, or Prism.js are not typically able
+syntax anywhere in the broader programming landscape. While the edge cases where they misbehave
+may not be super common, the misbehavior is sufficiently _weird_ that I expect it will cause
+significant user confusion every time an edge case is encountered.
+
+
+Apart from violating users expectations, such rules also violate tooling assumptions: while
+it is possible to do such "line-based" lexing in the Scala compiler's hand-written parser,
+I expect it will be challenging for other external tools, e.g. FastParse's parser combinators
+or syntax highlighters like Github Linguist, Highlight.js, or Prism.js are not typically able
 to encode rules such as _"the first line starting with a `"`
 and with an odd number of `"`s terminates the multi-line string"_

@@ -781,7 +785,11 @@ def openingParagraph = "---

 Although in theory the delimiter between `"` and `\n` could contain any characters except
 `"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to
-a small set e.g. dashes-only to avoid unnecessary flexibility in the syntax
+a small set e.g. dashes-only to avoid unnecessary flexibility in the syntax.
+
+If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax seems
+like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls
+of the raw [Single-Quoted Multi-line Strings](#single-quoted-multiline-strings) delimiter above.

 ### Other Syntaxes
 - Triple-backticks are another syntax that is currently available, and so could be used as

From 62ff3ca39d18ec5972fbd3c0c88af8eb3c0ac124 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:25:21 +0200
Subject: [PATCH 45/78] .

---
 content/dedented-string-literals.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 83e84d3a..e91b018b 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -753,7 +753,8 @@ def openingParagraph = "
 In general, such lexing rules very unusual: there is no precedence for this kind of
 _"string terminates on a line with an odd number of quotes sprinkled anywhere within it"_
 syntax anywhere in the broader programming landscape. While the edge cases where they misbehave
-may not be super common, the misbehavior is sufficiently _weird_ that I expect it will cause
+may not be super common, the code examples above show they aren't _rare_ either, and
+the misbehavior is sufficiently _weird_ that I expect it will cause
 significant user confusion every time an edge case is encountered.



From 32ba7187d4632d781fd712bbde01d08b4a2617af Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:27:45 +0200
Subject: [PATCH 46/78] .
---
 content/dedented-string-literals.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index e91b018b..08d06447 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -768,9 +768,11 @@ and with an odd number of `"`s terminates the multi-line string"_

 #### Single-Quote with Header

-One delimiter that uses `"`s, avoids introducing a new `'''` delimiter, and also
-avoids the parsing edge cases and implementation challenges would be , e.g. `"---\n` would
-need to be followed by `\n---"`. This header could be variable length, allowing the ability
+This would mean the multi-line string starts with a delimiter such as
+`"---\n`, and would terminate with a corresponding delimiter `\n---"`.
+This uses `"`s, avoids introducing a new `'''` delimiter, and also
+avoids the parsing edge cases and implementation challenges of using `"` alone.
+This `---` header could be variable length, allowing the ability
 to embed arbitrary contents without escaping, similar to the extendable `'''` delimiters
 proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC strings in
 [Bash](#bash) or [Ruby](#ruby).

From 327b1e7e92232ef3eb0d971ca2c39c77cb6453f1 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:28:54 +0200
Subject: [PATCH 47/78] .

---
 content/dedented-string-literals.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 08d06447..99ca4753 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -770,12 +770,6 @@ and with an odd number of `"`s terminates the multi-line string"_

 This would mean the multi-line string starts with a delimiter such as
 `"---\n`, and would terminate with a corresponding delimiter `\n---"`.
-This uses `"`s, avoids introducing a new `'''` delimiter, and also
-avoids the parsing edge cases and implementation challenges of using `"` alone.
-This `---` header could be variable length, allowing the ability
-to embed arbitrary contents without escaping, similar to the extendable `'''` delimiters
-proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC strings in
-[Bash](#bash) or [Ruby](#ruby).

 ```scala
 def openingParagraph = "---
@@ -786,6 +780,13 @@ def openingParagraph = "---
 ---"
 ```

+This uses `"`s, avoids introducing a new `'''` delimiter, and also
+avoids the parsing edge cases and implementation challenges of using `"` alone.
+This `---` header could be variable length, allowing the ability
+to embed arbitrary contents without escaping, similar to the extendable `'''` delimiters
+proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC strings in
+[Bash](#bash) or [Ruby](#ruby).
+
 Although in theory the delimiter between `"` and `\n` could contain any characters except
 `"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to
 a small set e.g. dashes-only to avoid unnecessary flexibility in the syntax.

From 671819ecb066f4e8a34a3978eb2ce4b9891b9f8b Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:29:46 +0200
Subject: [PATCH 48/78] .
---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 99ca4753..dfc08a55 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -849,7 +849,7 @@ to contain arbitrary contents

 ### Java

-Java since [JEP 378](https://openjdk.org/jeps/378) now multiline strings called "text blocks"
+Java 15 since [JEP 378](https://openjdk.org/jeps/378) now multiline strings called "text blocks"
 that implement exactly this, with identical leading/trailing newline and indentation removal
 policies:


From add558f395205ee7460c41f90a0e8d3674227f0b Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:29:59 +0200
Subject: [PATCH 49/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index dfc08a55..1552fffc 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -849,7 +849,7 @@ to contain arbitrary contents

 ### Java

-Java 15 since [JEP 378](https://openjdk.org/jeps/378) now multiline strings called "text blocks"
+Java since [JEP 378](https://openjdk.org/jeps/378)/Java-15 now multiline strings called "text blocks"
 that implement exactly this, with identical leading/trailing newline and indentation removal
 policies:


From d91e056f2e94ff1781b3cfa8b3621a4145d722a9 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:30:24 +0200
Subject: [PATCH 50/78] .
---
 content/dedented-string-literals.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 1552fffc..ee7219a9 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -849,9 +849,9 @@ to contain arbitrary contents

 ### Java

-Java since [JEP 378](https://openjdk.org/jeps/378)/Java-15 now multiline strings called "text blocks"
-that implement exactly this, with identical leading/trailing newline and indentation removal
-policies:
+Java since [JEP 378](https://openjdk.org/jeps/378)/Java-15 (released Sep-2020) now supports
+multiline strings called "text blocks" that implement exactly this, with identical
+leading/trailing newline and indentation removal policies:

 ```java
 String html = """

From 8ba6966d5fe6f1333f43e6afe798a1b3b3791c3d Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:31:01 +0200
Subject: [PATCH 51/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index ee7219a9..80c5b9a6 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -850,7 +850,7 @@ to contain arbitrary contents
 ### Java

 Java since [JEP 378](https://openjdk.org/jeps/378)/Java-15 (released Sep-2020) now supports
-multiline strings called "text blocks" that implement exactly this, with identical
+multiline strings called "text blocks" that implement exactly this, with similar
 leading/trailing newline and indentation removal policies:

 ```java

From dd78d78ba0372d6a9cb4cb849c8b2bba56aa6296 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:32:51 +0200
Subject: [PATCH 52/78] .
---
 content/dedented-string-literals.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 80c5b9a6..27b2d3b3 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -981,7 +981,10 @@ EOF
 ```

 This includes `<<- HEREDOC` strings that strip indentation. This is done relatively naively,
-by simply removing all leading `\t` tab characters.
+by simply removing all leading `\t` tab characters. The customizable `EOF` header serves a
+similar purpose to the _extended delimiters_ included in this proposal, and allows
+the user to choose a delimiter that does not exist in the literal avoiding the need
+for escaping entirely.

 > The first line starts with an optional command followed by the special redirection
 > operator `<<` and the delimiting identifier.

From f6064cd72136fc10304f8e335752315f6d7958da Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:33:00 +0200
Subject: [PATCH 53/78] .

---
 content/dedented-string-literals.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 27b2d3b3..51fa9a65 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -1052,4 +1052,3 @@ val no_unexpected_spaces : string =

 However, Ocaml's "raw" string syntax `{| |}` does not have a mode that removes indentation,
 which is an [open issue on the OCaml repo](https://github.com/ocaml/ocaml/issues/13860)
--
\ No newline at end of file

From 9dcd438f597c6570cab54ac88df57f97d403b3f1 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:36:59 +0200
Subject: [PATCH 54/78] .
---
 content/dedented-string-literals.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 51fa9a65..d686fcf6 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -789,7 +789,8 @@ proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC string

 Although in theory the delimiter between `"` and `\n` could contain any characters except
 `"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to
-a small set e.g. dashes-only to avoid unnecessary flexibility in the syntax.
+to avoid unnecessary flexibility in the syntax. `---` seems like a reasonable choice,
+but other syntaxes are possible.

 If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax seems
 like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls

From 6fb49703c491cbd06478d948edb2048ed93e32a8 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 09:57:55 +0200
Subject: [PATCH 55/78] .

---
 content/dedented-string-literals.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index d686fcf6..7d02799c 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -790,7 +790,8 @@ proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC string
 Although in theory the delimiter between `"` and `\n` could contain any characters except
 `"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to
 to avoid unnecessary flexibility in the syntax. `---` seems like a reasonable choice,
-but other syntaxes are possible.
+inspired by the widespread use of `---` as a vertical document separator
+(YAML, Markdown, Asciidoc, Pandoc), but other syntaxes are possible.
 If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax seems
 like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls

From a92feace76c2a37879f1687e4301847fb67b84fd Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 10:00:04 +0200
Subject: [PATCH 56/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 7d02799c..140b2625 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -547,7 +547,7 @@ in [pattern matching](#pattern-matching).
 `'''` was chosen as a currently-unused syntax in Scala, with plenty of precedence
 for `'''`-quoted strings in other languages. Languages like Python, Groovy,
 Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparent issue,
-with several (e.g. Groovy and Elixir) having different semantics between the two syntaxes.
+with several (Groovy, Elixir, TOML) having different semantics between the two syntaxes.

 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical
 Scala code: a quick search of the libraries I have checked out finds 141 uses of `'\''`,

From 7f12c5c0f57ab25b59ab307bfb786042b69cb936 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 10:00:45 +0200
Subject: [PATCH 57/78] .

---
 content/dedented-string-literals.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 140b2625..5d38ee4a 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -548,6 +548,7 @@ in [pattern matching](#pattern-matching).
 for `'''`-quoted strings in other languages. Languages like Python, Groovy,
 Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparent issue,
 with several (Groovy, Elixir, TOML) having different semantics between the two syntaxes.
+We expect this will provide familiarity for anyone coming to Scala from other languages.

 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical
 Scala code: a quick search of the libraries I have checked out finds 141 uses of `'\''`,

From 5bfbaeebb89a4882c83294461d41928baf3a0c4b Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 11:21:59 +0200
Subject: [PATCH 58/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 5d38ee4a..ad01f12c 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -126,7 +126,7 @@ foo match{
 case '''
 i am cow
 hear me moo
- ''' =>
+ ''' =>
 }
 ```


From 0d78ae34891471dafe6035e24da682a37ae6b574 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 11:27:48 +0200
Subject: [PATCH 59/78] .

---
 content/dedented-string-literals.md | 70 +++++++++++++++++++++++++----
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index ad01f12c..44bafe0b 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -798,7 +798,7 @@ If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax
 like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls
 of the raw [Single-Quoted Multi-line Strings](#single-quoted-multiline-strings) delimiter above.

-### Other Syntaxes
+#### Other Syntaxes
 - Triple-backticks are another syntax that is currently available, and so could be used as
 a multi-line delimiter. This has the advantage of being similar to blocks used in markdown,
 with a similar meaning, but several disadvantages:
@@ -817,14 +817,68 @@ of the raw [Single-Quoted Multi-line Strings](#single-quoted-multiline-strings)
This has the advantage of being similar to blocks used in markdown, with a similar meaning, but several disadvantages: @@ -817,14 +817,68 @@ of the raw [Single-Quoted Multi-line Strings](#single-quoted-multiline-strings) The proposed rule of specifies the indentation to be removed relies on the indentation of the trailing `'''` delimiter. Other possible approaches include: -- The minimum indentation of any non-whitespace line within the string, which is why [Ruby does](#ruby) - - This does not allow the user to define strings with all lines indented by some amount, - unless the indentation of the closing delimiter is counted as well. But if the indentation - of the closing delimiter is counted, then it is simpler to just use that, and prohibit - other lines from being indented less than the delimiter +#### Minimum Indentation Within String +By this rule, rather than looking at the indentation of the closing delimiter to determine how +much to remove, we instead remove the minimum indentation of any non-whitespace line within the +string. This is why [Ruby does](#ruby) -- An explicit indentation-counter, which is what YAML does, e.g. with the below text block - dedenting the string by 4 characters: +The advantage of this is that some people may think indenting the contents of the string +looks subjectively better + +```scala +def helper = { + val x = ''' + i am cow + hear me moo + ''' + x +} +``` + +The downsides are twofold: + +- This does not allow the user to define strings with all lines indented by some amount, + unless the indentation of the closing delimiter is counted as well. This is likely an + uncommon use case, and can be mitigated by calling `.indent()`, but such strings are + then no longer literals with all the issues that that entails + +- There are now multiple ways to write the same string, adding some unnecessary flexibility. 
+ In current proposal there is a single valid multi-line syntax for any particular string,
+ whereas with the rule of using the minimum-indentation-within-string the same string can
+ be written in many different ways.
+
+```scala
+def helper = { // two space indented contents
+ val x = '''
+ i am cow
+ hear me moo
+ '''
+ x
+}
+```
+```scala
+def helper = { // non-indented contents
+ val x = '''
+ i am cow
+ hear me moo
+ '''
+ x
+}
+```
+```scala
+def helper = { // four space indented contents
+ val x = '''
+ i am cow
+ hear me moo
+ '''
+ x
+}
+```
+
+#### An explicit indentation-counter
+
+This is what YAML does, e.g. with the below text block
+dedenting the string by 4 characters:
 ```yaml
 example: >4
 Several lines of text,

From 593f5102bfcaaf9f498871ad0128af314677d51c Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 13:01:01 +0200
Subject: [PATCH 60/78] .

---
 content/dedented-string-literals.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 44bafe0b..68d8f54d 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -548,6 +548,8 @@ in [pattern matching](#pattern-matching).
 for `'''`-quoted strings in other languages. Languages like Python, Groovy,
 Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparent issue,
 with several (Groovy, Elixir, TOML) having different semantics between the two syntaxes.
+The "ambiguity" of `'''` looking like it should mean the 1-character string `"'"` also does
+not appear to be a problem in practice.
 We expect this will provide familiarity for anyone coming to Scala from other languages.

 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical

From f27e08cfd830470ca1ab4c2f7872c008bcbb93c0 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 17:30:32 +0200
Subject: [PATCH 61/78] .
---
 content/dedented-string-literals.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 68d8f54d..e3f49c35 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -548,9 +548,9 @@ in [pattern matching](#pattern-matching).
 for `'''`-quoted strings in other languages. Languages like Python, Groovy,
 Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparent issue,
 with several (Groovy, Elixir, TOML) having different semantics between the two syntaxes.
-The "ambiguity" of `'''` looking like it should mean the 1-character string `"'"` also does
-not appear to be a problem in practice.
-We expect this will provide familiarity for anyone coming to Scala from other languages.
+The "ambiguity" of `'''` looking similar to the 1-character string `'\''` also does
+not appear to be a problem in practice. We expect this will provide familiarity for
+anyone coming to Scala from other languages.

 The similar "single-quote Char" syntax is `'\''` is relatively rare in typical
 Scala code: a quick search of the libraries I have checked out finds 141 uses of `'\''`,

From 9408ba6cf47f0d2c763c919b94ee149c093149cd Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Tue, 19 Aug 2025 17:54:18 +0200
Subject: [PATCH 62/78] .

---
 content/dedented-string-literals.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index e3f49c35..337ce074 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -580,8 +580,8 @@ For all intents and purposes this is identical to the `'''` proposal, with some

 - `''` looks less similar to a `Char` literal `'\''`, so less chance of confusion
 - `''` looks less similar to the triple-quoted strings common in other languages, so
- there is less benefit of familiarity. We are not aware of any language in the world
- which uses `''` as a delimiter for string literals.
+ there is less benefit of familiarity. The only language I'm aware of that uses this
+ syntax for multi-line strings is Nix.
 - `''` means "empty string" in a _very_ large number of programming languages. Using it
 as a string _delimiter_ in Scala would likely cause confusion on that basis.


From aa699caeb17c34729fb8f5cdc024c86d03580a2e Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Wed, 20 Aug 2025 08:04:02 +0200
Subject: [PATCH 63/78] .

---
 content/dedented-string-literals.md | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 337ce074..8edb0319 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -791,10 +791,19 @@ proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC string
 [Bash](#bash) or [Ruby](#ruby).

 Although in theory the delimiter between `"` and `\n` could contain any characters except
-`"` and `\n` while remaining unambiguous, in practice we will likely want to limit it to
-to avoid unnecessary flexibility in the syntax. `---` seems like a reasonable choice,
-inspired by the widespread use of `---` as a vertical document separator
-(YAML, Markdown, Asciidoc, Pandoc), but other syntaxes are possible.
+`"` and `\n` while remaining unambiguous. However, given `"""` is not available, it seems
+that most syntaxes would look somewhat out of place in Scala:
+
+* `"---` looks reasonable, but people would be more used to seeing this kind of `---` separator
+ in config and markup languages like YAML or Markdown, less so in a programming language like Scala
+
+* `"HEREDOC` and similar headers could work, but again are more commonly seen in "shell"
+ languages like Bash, Ruby, and Perl, and looks somewhat out of place in Scala
+
+Compared to these syntaxes, `'''` is seen in Python, and the closely-related `"""` is seen
+in, Python, Java, C#, and Swift, all of which are languages more similar to Scala than
+YAML, Markdown, Bash, or Perl. So `'''` would likely fit better into the conventions of this
+family of programming languages.

 If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax seems
 like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls
@@ -1089,6 +1098,11 @@ expected_result = <<~MY_CUSTOM_SQUIGGLY_HEREDOC
 MY_CUSTOM_SQUIGGLY_HEREDOC
 ```

+
+### Perl
+
+Perl has a similar HEREDOC syntax to Ruby
+
 ### Ocaml

 Ocaml [allows single-quoted to span multiple lines](https://ocaml.org/manual/5.3/lex.html#sss:stringliterals),

From 96e858404f33607b5a96b6794bfe02f8b3c8db71 Mon Sep 17 00:00:00 2001
From: Li Haoyi
Date: Wed, 20 Aug 2025 08:25:00 +0200
Subject: [PATCH 64/78] .

---
 content/dedented-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md
index 8edb0319..d7d58329 100644
--- a/content/dedented-string-literals.md
+++ b/content/dedented-string-literals.md
@@ -790,7 +790,7 @@ to embed arbitrary contents without escaping, similar to the extendable `'''` de
 proposed above and present in [C#](#c) or [Swift](#swift), or the HEREDOC strings in
 [Bash](#bash) or [Ruby](#ruby).
-Although in theory the delimiter between `"` and `\n` could contain any characters except +In theory the delimiter between `"` and `\n` could contain any characters except `"` and `\n` while remaining unambiguous. However, given `"""` is not available, it seems that most syntaxes would look somewhat out of place in Scala: From 5258105fb9ae31cda2199d9b8550905eb5995d96 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 20 Aug 2025 08:25:52 +0200 Subject: [PATCH 65/78] . --- content/dedented-string-literals.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index d7d58329..2ee83c35 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -800,10 +800,10 @@ that most syntaxes would look somewhat out of place in Scala: * `"HEREDOC` and similar headers could work, but again are more commonly seen in "shell" languages like Bash, Ruby, and Perl, and looks somewhat out of place in Scala -Compared to these syntaxes, `'''` is seen in Python, and the closely-related `"""` is seen -in, Python, Java, C#, and Swift, all of which are languages more similar to Scala than -YAML, Markdown, Bash, or Perl. So `'''` would likely fit better into the conventions of this -family of programming languages. +Compared to these syntaxes, `'''` is seen in Python, Dart, Elixir, and Groovy, and the +closely-related `"""` is seen in Python, Java, C#, and Swift, all of which are languages +more similar to Scala than YAML, Markdown, Bash, or Perl. So `'''` would likely fit better +into the conventions of this family of programming languages. If we want to stick with `"` for strings, this _Single-Quote with Header_ syntax seems like a good compromise that provides a `"`-based syntax while avoiding all the pitfalls From b06990400dde5bc86be0e9c448cb7fb62fd1e013 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Thu, 21 Aug 2025 10:24:02 +0200 Subject: [PATCH 66/78] .
--- content/dedented-string-literals.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 2ee83c35..059f2218 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -333,6 +333,22 @@ scala> val x: """i am cow | end of statement expected but '.' found ``` +This means multi-line strings cannot be used together with libraries like +[Iron](https://github.com/Iltotore/iron), which rely on literal types. For example, you +cannot use `""".stripMargin` strings to add error messages to Iron's refinement types +and aid users in understanding failures + +```scala +object helper { + type UUIDConstraint = DescribedAs[ + Match["^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"], + """This string must be a UUID + |A UUID comprises 5 dash-separated blocks of 8, 4, 4, 4, and 12 + |hexadecimal characters, all lowercase""".stripMargin + ] +} +``` + ### Literal String Expressions This also means that any macros that may work on string literals, e.g. validating From 652874096d5a7f2104654f2cc4548c67c0a07bcf Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 29 Aug 2025 09:31:13 +0800 Subject: [PATCH 67/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 059f2218..5f8ca6c1 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -568,12 +568,14 @@ The "ambiguity" of `'''` looking similar to the 1-character string `'\''` also d not appear to be a problem in practice. We expect this will provide familiarity for anyone coming to Scala from other languages. 
-The similar "single-quote Char" syntax is `'\''` is relatively rare in typical -Scala code: a quick search of the libraries I have checked out finds 141 uses of `'\''`, -compared to 24331 uses of `.stripMargin` that could benefit from this improved syntax, -172 times as many use sites. -This suggests that the benefit will be widespread and the similarity with `'\''` would -be edge case that occurs rarely and cause minimal confusion. +While it is tempting to come up with clever syntax and novel lexing strategies to add a new +multi-line string syntax into the language, that is exactly the wrong approach here. Rather, +we should pick the available multi-line string syntax that would be most familiar to programmers +learning Scala with experience in other languages. After triple double-quoted strings `"""` which +are already taken, triple single-quoted strings `'''` are without doubt the second-most +widely used multi-line string syntax across the programming landscape as a whole. And so +this proposal chooses `'''` as the delimiter to maximize familiarity and learnability to +users who will overwhelmingly have already seen such syntax before in Python or elsewhere. Other options to consider are listed below From d945694edaf759617b2baba819a4e07a4cafffcc Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 29 Aug 2025 09:53:23 +0800 Subject: [PATCH 68/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 5f8ca6c1..e0cd98b8 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -566,7 +566,9 @@ Dart, Elixir, and TOML all have both `"""` and `'''` strings without any apparen with several (Groovy, Elixir, TOML) having different semantics between the two syntaxes. 
The "ambiguity" of `'''` looking similar to the 1-character string `'\''` also does not appear to be a problem in practice. We expect this will provide familiarity for -anyone coming to Scala from other languages. +anyone coming to Scala from other languages, and should be a relatively easy delimiter +to lex/parse both in the compiler and in other ancillary tools (e.g. syntax highlighters, +autoformatters, etc.) that should minimize the burden on downstream tool maintainers. While it is tempting to come up with clever syntax and novel lexing strategies to add a new multi-line string syntax into the language, that is exactly the wrong approach here. Rather, From 32e58a8cfad8c8bc7bc3b18c6513f5f206cc014b Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 29 Aug 2025 17:43:32 +0800 Subject: [PATCH 69/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index e0cd98b8..5bcd7067 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -569,6 +569,14 @@ not appear to be a problem in practice. We expect this will provide familiarity anyone coming to Scala from other languages, and should be a relatively easy delimiter to lex/parse both in the compiler and in other ancillary tools (e.g. syntax highlighters, autoformatters, etc.) that should minimize the burden on downstream tool maintainers. +We have prototype implementations of the new syntax in: + +- IntelliJ-Scala: https://github.com/JetBrains/intellij-scala/pull/702 +- Tree-Sitter-Scala https://github.com/tree-sitter/tree-sitter-scala + +Many of the other delimiters discussed below are based on indentation or other "2D" syntax, +which would be much harder to implement in third-party lexers and parsers which overwhelmingly +work on a "1D" character stream with regexes or similar frameworks.
While it is tempting to come up with clever syntax and novel lexing strategies to add a new multi-line string syntax into the language, that is exactly the wrong approach here. Rather, From 4eb5f0874bb0ab2401b880426a57718796cbcc1d Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 29 Aug 2025 19:38:19 +0800 Subject: [PATCH 70/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 5bcd7067..389a157f 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -572,7 +572,7 @@ autoformatters, etc.) that should minimize the burden on downstream tool maintai We have prototype implementations of the new syntax in: - IntelliJ-Scala: https://github.com/JetBrains/intellij-scala/pull/702 -- Tree-Sitter-Scala https://github.com/tree-sitter/tree-sitter-scala +- Tree-Sitter-Scala https://github.com/tree-sitter/tree-sitter-scala/pull/477 Many of the other delimiters discussed below are based on indentation or other "2D" syntax, which would be much harder to implement in third-party lexers and parsers which overwhelmingly From 776a2a5730e17dda2c7620c459154b00085662a0 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Fri, 29 Aug 2025 22:08:45 +0800 Subject: [PATCH 71/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 389a157f..e7760b98 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -572,6 +572,7 @@ autoformatters, etc.) 
that should minimize the burden on downstream tool maintai We have prototype implementations of the new syntax in: - IntelliJ-Scala: https://github.com/JetBrains/intellij-scala/pull/702 +- VsCode-Scala-Syntax https://github.com/scala/vscode-scala-syntax/pull/291 - Tree-Sitter-Scala https://github.com/tree-sitter/tree-sitter-scala/pull/477 Many of the other delimiters discussed below are based on indentation or other "2D" syntax, From 14f1fdc70ef1b4aafaee743da6ef9d805f8456ad Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:00:29 +0800 Subject: [PATCH 72/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 41 +++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index e7760b98..d6a84174 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -113,6 +113,19 @@ hear me moo ''' ``` +Lastly, we _normalize newlines_ within the dedented string literals: regardless of whether the source +file uses `\n` or `\r` or `\r\n` or `\n\r` between those lines, the value of the `'''` string +literal contains `\n` as the line separator. That should avoid a common problem where code such as that below may +compile or not compile depending on what operating system you are running in and how the code +was acquired: + +```scala +val x: "a\nb" = """a +b""" +``` + +See [Newline Confusion](#newline-confusion) for more details on this + Dedented string literals should be able to be used anywhere a normal `"` or triple `"""` can be used: @@ -426,6 +439,32 @@ hear me moo""" => } ``` +### Newline Confusion + +One problem with current triple-quoted strings is that the value depends on many factors outside +of the code itself. For example consider the `"""` string below: + +```scala +"""a +b""" +``` + +What is the value of this string? 
You actually have no idea, and it depends on: +| | `git checkout` | Download Zip | +|--------------------------------------------|----------------|--------------| +| Written on Windows, Checked out on Windows | "a\r\nb" | "a\r\nb" | +| Written on Unix, Checked out on Windows | "a\r\nb" | "a\nb" | +| Written on Windows, Checked out on Unix | "a\nb" | "a\r\nb" | +| Written on Unix, Checked out on Unix | "a\nb" | "a\nb" | + +Thus `"""` multi-line strings are currently unusable for many use scenarios without +first normalizing them, which we can see in many codebases: + +- [Ammonite's `Util.normalizeNewlines`](https://github.com/search?q=repo%3Acom-lihaoyi%2FAmmonite%20normalizenewlines&type=code) +- [Mill's `.replaceAll("\r\n", "\n")](https://github.com/search?q=repo%3Acom-lihaoyi%2Fmill+replaceAll+%5Cr%5Cn&type=code) + +For `'''` strings, we normalize all newlines to `\n` regardless of the source file contents. + ### Downstream Tooling Complexity The last major problem with the existing `""".stripMargin` pattern is that all tools @@ -459,6 +498,8 @@ than every downstream tool needing hard-coded support to be `stripMargin`-aware, tools will only need to generate `'''` multiline strings, which can then be pasted into user code with arbitrary indentation and they will do the right thing. 
+ + ## Implementation TODO From 6d33fc74e8cf89f2b92056724905db4007fde965 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:03:21 +0800 Subject: [PATCH 73/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index d6a84174..8cc06a1f 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -33,6 +33,18 @@ i am cow hear me moo ``` +Replacing the old equivalent: + +```scala +> def helper = { + val x = """ + |i am cow + |hear me moo + |""".stripMargin.trip.replaceAll("\r\n", "\n") + x + } +``` + This is a common feature in other languages (see [Prior Art](#prior-art)) with exactly the same semantics, although unlike other languages Scala's `"""` already has an existing semantic, and so for this proposal the currently-unused `'''` syntax is chosen instead. From 0c1822dcb27e2e037fb8ab6eb378e15528271817 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:03:42 +0800 Subject: [PATCH 74/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 8cc06a1f..b2840fef 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -40,7 +40,7 @@ Replacing the old equivalent: val x = """ |i am cow |hear me moo - |""".stripMargin.trip.replaceAll("\r\n", "\n") + |""".stripMargin.trim.replaceAll("\r\n", "\n") x } ``` From d275d57150d0ddc75a792479e75ec1fc24c9aaf5 Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:18:27 +0800 Subject: [PATCH 75/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md 
b/content/dedented-string-literals.md index b2840fef..f3ca96ed 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -630,7 +630,23 @@ We have prototype implementations of the new syntax in: Many of the other delimiters discussed below are based on indentation or other "2D" syntax, which would be much harder to implement in third-party lexers and parsers which overwhelmingly -work on a "1D" character stream with regexes or similar frameworks. +work on a "1D" character stream with regexes or similar frameworks: + +- VsCode-Scala-Syntax, like other VSCode plugins, relies on [TextMate Grammars](https://macromates.com/manual/en/language_grammars) + to lex your code for highlighting. TextMate Grammars rely on regexes to match tokens and + do not support the more sophisticated indentation-dependent delimiters discussed below + +- Tree-Sitter-Scala relies on [Tree-Sitter's External Scanners](https://tree-sitter.github.io/tree-sitter/creating-parsers/4-external-scanners.html), + which are imperative C functions invoked when the starting delimiter is recognized and + imperatively lex the subsequent characters until deciding when to stop. External Scanners + support look-ahead, and support getting the current column offset (e.g. of the starting delimiter), + but do not support the kind of look-behind necessary for some of the indentation-dependent delimiters + discussed below + +- IntelliJ-Scala seems to be the only IDE that can support more sophisticated indentation-based + grammars, with it's [Flex-based Grammar specifications](https://github.com/JetBrains/intellij-scala/blob/idea252.x/scala/scala-impl/src/org/jetbrains/plugins/scala/lang/lexer/core/_ScalaCoreLexer.flex).
+ But doing so will be a lot more complicated than supporting `'''`-based delimiters that we did + in the prototype implementation above While it is tempting to come up with clever syntax and novel lexing strategies to add a new multi-line string syntax into the language, that is exactly the wrong approach here. Rather, From c99a6e0e33b5c55246fa9a702e5207f6d6ba06cb Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:20:17 +0800 Subject: [PATCH 76/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index f3ca96ed..520ace4e 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -643,10 +643,11 @@ work on a "1D" character stream with regexes or similar frameworks: but do not support the kind of look-behind necessary for some of the indentation-dependent delimiters discussed below -- IntelliJ-Scala seems to be the only IDE that can support more sophisticated indentation-based - grammars, with it's [Flex-based Grammar specifications](https://github.com/JetBrains/intellij-scala/blob/idea252.x/scala/scala-impl/src/org/jetbrains/plugins/scala/lang/lexer/core/_ScalaCoreLexer.flex). +- IntelliJ-Scala seems to be the only IDE that may be able to support more sophisticated indentation-based + grammars, with its [Flex-based Grammar specifications](https://github.com/JetBrains/intellij-scala/blob/idea252.x/scala/scala-impl/src/org/jetbrains/plugins/scala/lang/lexer/core/_ScalaCoreLexer.flex) + allowing stateful imperative Java code to be injected into the Lexer codegen. But doing so will be a lot more complicated than supporting `'''`-based delimiters that we did - in the prototype implementation above + in the prototype implementation above.
While it is tempting to come up with clever syntax and novel lexing strategies to add a new multi-line string syntax into the language, that is exactly the wrong approach here. Rather, From 54492877c4916e17e4a258c0a28b0ee5ebfb687e Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:22:11 +0800 Subject: [PATCH 77/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index 520ace4e..acff5988 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -475,6 +475,8 @@ first normalizing them, which we can see in many codebases: - [Ammonite's `Util.normalizeNewlines`](https://github.com/search?q=repo%3Acom-lihaoyi%2FAmmonite%20normalizenewlines&type=code) - [Mill's `.replaceAll("\r\n", "\n")](https://github.com/search?q=repo%3Acom-lihaoyi%2Fmill+replaceAll+%5Cr%5Cn&type=code) +See also https://youtrack.jetbrains.com/issue/SCL-19643 + For `'''` strings, we normalize all newlines to `\n` regardless of the source file contents. ### Downstream Tooling Complexity From 9d043fa323d89946e6785aefd28611c024755b2a Mon Sep 17 00:00:00 2001 From: Li Haoyi Date: Wed, 3 Sep 2025 16:25:19 +0800 Subject: [PATCH 78/78] Update dedented-string-literals.md --- content/dedented-string-literals.md | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/content/dedented-string-literals.md b/content/dedented-string-literals.md index acff5988..78e9a9c3 100644 --- a/content/dedented-string-literals.md +++ b/content/dedented-string-literals.md @@ -136,7 +136,8 @@ val x: "a\nb" = """a b""" ``` -See [Newline Confusion](#newline-confusion) for more details on this +See [Newline Confusion](#newline-confusion) for more details on this normalization, which is also done by +the [Java Text Block syntax](#java). 
Dedented string literals should be able to be used anywhere a normal `"` or triple `"""` can be used: @@ -1038,6 +1039,19 @@ String html = """ > The position of the opening """ characters has no effect on the algorithm, but the position > of the closing """ characters does have an effect if placed on its own line. +Java text blocks also support the _normalized newlines_ discussed in this proposal: + +> Line terminators in the content are normalized from CR (`\u000D`) and CRLF (`\u000D\u000A`) +> to LF (`\u000A`) by the Java compiler. This ensures that the string derived from the content +> is equivalent across platforms, even if the source code has been translated to a platform +> encoding (see `javac -encoding`). + +> For example, if Java source code that was created on a Unix platform (where the line terminator +> is LF) is edited on a Windows platform (where the line terminator is CRLF), then without +> normalization, the content would become one character longer for each line. Any algorithm +> that relied on LF being the line terminator might fail, and any test that needed to verify +> string equality with `String::equals` would fail. + Java doesn't have extended delimiters like those proposed here, but requires you to escape `\"""` included in the text block using a backslash to prevent premature closing of the literal.
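
The newline normalization that these patches describe for `'''` strings can be approximated at the library level. The sketch below is only an illustration and not the compiler implementation: `NewlineSketch` and `normalizeNewlines` are hypothetical names, and it assumes that each of `\r\n`, `\n\r`, and a lone `\r` is collapsed to a single `\n`, as the proposal text states.

```scala
// Hypothetical helper for illustration; not part of the SIP or of any library.
object NewlineSketch {
  // Collapse CRLF, LFCR, and lone CR into a single LF.
  // The two-character alternatives come first so a pair is replaced as one unit.
  def normalizeNewlines(s: String): String =
    s.replaceAll("\r\n|\n\r|\r", "\n")

  def main(args: Array[String]): Unit = {
    val unix    = "a\nb"
    val windows = "a\r\nb"
    // After normalization the two checkouts compare equal.
    println(normalizeNewlines(windows) == normalizeNewlines(unix))
  }
}
```

With a helper like this, a `"""` literal read from a CRLF checkout and one read from an LF checkout produce the same value, which is the behaviour the proposal builds into `'''` literals so that no such call is needed.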