Add a new operator ^ for pinning of pattern variables #2951

richcarl · 2020-12-24T19:52:21Z

This allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code. An new compiler flag 'warn_unpinned_vars' (disabled by default) provides warnings for all uses of already bound variables in patterns that have not been marked with ^.

Note that this branch is a preview, and is based on another branch (#2944) to avoid future complications. Only the last commit pertains to this PR.

josevalim · 2020-12-25T07:26:19Z

Hi @richcarl, I have a couple questions based on places where we need to handle ^ specially in Elixir:

What happens when you are binding the variable for the first time but it is repeated?
```
 {X, X} = {ok, ok}
```
In Elixir we don't require ^ because ^ is meant to bind to the previous value before the assignment and there is none here. This is also to avoid left-to-right semantics in pattern matching. For example, pinning only the right one would be weird.
There are two places in patterns that support guard expression instead of patterns: the size expression in binaries and map keys. Are you warning and requiring variables to be bound for them? So you have to write:
```
 %{^X => Value} = Map,
```
In binaries, you will only warn if you are matching on a previous value:
```
 Size = ...,
 <<Foo:^Size>>,
```
But not on this:
```
 <<X:X>>
```

Regardless of the direction to go, you should probably have tests around those scenarios.

There is another warning we have added to Elixir which I found extremely helpful which is to warn if an underscored variable is matched or repeated in a pattern. For example, if you are matching on Erlang AST, you may write:

some_fun({match, _Anno, {var, _Anno, Var}, _}) ->

And then this doesn't work because the underscored _Anno variables need to match. This will provide similar guarantees to the pin variable but only for underscored variables - but that's something I would say can be done by default because 99% of the cases where this happens, it is accidental.

josevalim · 2020-12-25T07:28:30Z

lib/stdlib/src/erl_lint.erl

+                  %% the stack var must be unused after processing the pattern;
+                  %% it can be used either if bound/unsafe before the try, or
+                  %% if it occurs in the class or term part of the pattern
+                  add_error(L, {stacktrace_bound,V}, St);


You have a commit from #2944 in this PR. :)

Thanks, in the hurry I forgot to mention that in the description. This is to avoid future conflicts.

richcarl · 2020-12-25T12:04:26Z

What happens when you are binding the variable for the first time but it is repeated?
 {X, X} = {ok, ok}
In Elixir we don't require ^ because ^ is meant to bind to the previous value before the assignment and there is none here. This is also to avoid left-to-right semantics in pattern matching. For example, pinning only the right one would be weird.

The current semantics does not change, so the meaning is still that these are both instances of the same new variable, with the constraint that it has to have the same value in both positions. (The compiler implements this using temporary names for both fields, adds a guard to check that they are equal, and then if they match, the value is assigned to X.)

Pinning does not apply here, because that says "evaluate this thing in the environment surrounding the pattern", and there is no previous X there. There is also no specific left/right evaluation order in patterns: conceptually, both variable instances are bound in parallel given the same current environment, and then they are checked for equality.

richcarl · 2020-12-25T12:14:24Z

There are two places in patterns that support guard expression instead of patterns: the size expression in binaries and map keys. Are you warning and requiring variables to be bound for them? So you have to write:
 %{^X => Value} = Map,

These subexpressions have the semantics of guard expressions. Their variables are already evaluated in the surrounding environment and can never be previously unbound, so pinning is never required (or even allowed). But you are right that I should add some more test cases.

richcarl · 2020-12-25T12:17:03Z

There is another warning we have added to Elixir which I found extremely helpful which is to warn if an underscored variable is matched or repeated in a pattern. For example, if you are matching on Erlang AST, you may write:
some_fun({match, _Anno, {var, _Anno, Var}, _}) ->
And then this doesn't work because the underscored _Anno variables need to match. This will provide similar guarantees to the pin variable but only for underscored variables - but that's something I would say can be done by default because 99% of the cases where this happens, it is accidental.

I think this is a good idea (the AST example has bitten me once or twice), but that's for another PR.

alavrik · 2021-01-11T14:40:13Z

A new EEP 55 was added to consider inclusion of this change: https://github.com/erlang/eep/blob/master/eeps/eep-0055.md

attah · 2021-01-12T15:02:47Z

Doesn't the purpose of this get a bit broken when the operator is optional?
Would it not match that fact better if unbound variables can be annotated to ask the compiler to tell you if it becomes an unintentional match instead?

richcarl · 2021-01-14T12:21:07Z

Doesn't the purpose of this get a bit broken when the operator is optional?

No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead.

Would it not match that fact better if unbound variables can be annotated to ask the compiler to tell you if it becomes an unintentional match instead?

No, unbound variables is by far the most common case in patterns, being how you introduce new variable bindings in Erlang. You don't want to annotate each and every one of those.

RaimoNiskanen · 2021-01-14T12:49:59Z

As others have said: for Elixir this operator is essential, since they
rebind variables without it.

For Erlang, if using a pinning operator had been required from the start;
I think that would have been a bit better than the current "match
if already bound". It is hard to be sure by looking at the code
if the variable is already bound - you have to make a machine search.

Introducing a pinning operator now is trickier...

Having a compiler option to choose if pinning is allowed/required makes it
hard to know what to expect from the code. The compiler option is set in
some Makefile far away from the source code.

I think I would prefer that instead there should be a compiler pragma
(I wish it would not be allowed from an include file but that is probably
impossible to enforce) so it is visible in the current module what to
expect about operator pinning. Without the pragma the pinning operator is
not allowed, with it pinning is mandatory; not a warning - an error if
a pinning operator is missing.

You get the idea: it should be possible from the source code how to read
it, at least on the module level.

How to take the next step i.e when code not using pinning is the exception,
to remove the compiler pragma, I have not thought about yet...

ferd · 2021-01-14T13:23:19Z

No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead.

I would absolutely hate that stated end state. I don't necessarily have a cogent argument there, if only that the base pattern matching semantics behind Erlang are one of the things I found the most useful and impressive about the language and getting away from there is something I can't find myself agreeing with. It's the kind of thing other languages do and then I'm forced to use them and think "it's much nicer in Erlang."

josevalim · 2021-01-14T13:30:54Z

stated end state

And I think this is the one of the most important aspect to discuss. If this is to be added but there is no agreement on the end state, then I would say it is best not to add it in the first place, rather than having two ways of writing the same code based on a compiler flag (or a pragma).

richcarl · 2021-01-14T13:47:57Z

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm). You'd only need to mark these up as being intentional, and everything else stays exactly the same. /Richard Den tors 14 jan. 2021 kl 14:23 skrev Fred Hebert <[email protected]>:

…

No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead. I would absolutely hate that stated end state. I don't necessarily have a cogent argument there, if only that the base pattern matching semantics behind Erlang are one of the things I found the most useful and impressive about the language and getting away from there is something I can't find myself agreeing with. It's the kind of thing other languages do and then I'm forced to use them and think "it's much nicer in Erlang." — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2951 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABPAAOGLKN2GWWMRTHEIDTSZ3V5RANCNFSM4VIPVJ2A> .

essen · 2021-01-14T13:56:33Z

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm).

Not true in tests. And tests are often included in the same module as the code when using eunit so you can't just disable this for tests.

richcarl · 2021-01-14T14:07:36Z

I see how a test suite with a large number of lines with matches containing one or more already bound variables could be painful to update by hand, but on the whole I think even this kind of code would benefit from being explicit about when you're referring to a known value. It's a pity if a test stops working because you renamed a variable further up and the match-as-assert now simply passes quietly. Also for a person reading the test suite, it is hard to see what is actually being tested, until they have parsed the code enough to figure out which variables are already bound. /Richard Den tors 14 jan. 2021 kl 14:56 skrev Loïc Hoguin <[email protected]>:

…

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm). Not true in tests. And tests are often included in the same module as the code when using eunit so you can't just disable this for tests. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2951 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABPAAIRZN3WMBCE52V2MGTSZ3Z2FANCNFSM4VIPVJ2A> .

ferd · 2021-01-14T14:27:49Z

I don't quite see what it is that you're against, though, Fred. On average, you'll only find a couple of instances of already-bound variables in any module (and the relative rarety is what makes it so much harder to spot when it actually is an important detail of the algorithm). You'd only need to mark these up as being intentional, and everything else stays exactly the same.

I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer.

This is a stance that I take in a vacuum about my own preferences, where I don't have to care for other people's opinions and how confused they might feel by the unfamiliar semantics. Their views are valid and I can understand them, but I won't spend the energy to defend them here. If everything had to be in line with what felt the most welcoming and easy, we'd all be stuck using Go right now and mostly always building from a foundation of familiarity that wouldn't have given rise to Erlang itself; on the other hand without that perspective, there's probably gonna be a more limited community and adoption wouldn't pick up at all, and the sustainability would be in jeopardy.

It's a difficult balance to strike, and I don't know where the proper tradeoff is to be done. So amidst that uncertainty, I'm just voicing my own preferences.

richcarl · 2021-01-14T15:00:02Z

Den tors 14 jan. 2021 kl 15:28 skrev Fred Hebert <[email protected]

: I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer.

Sounds like you're focusing on the effects in fun heads and list comprehensions, where ^-annotation would add possibilities which don't exist today. But this is far from the main point of the suggestion, and could even be dropped (but it would be an assymmetry if ^ was not allowed in those patterns as well). But for the common case of patterns in functions, =, and case/if/receive/try, there is no shadowing involved, and yet that is where ^-annotations are most important. Without annotations, you cannot read and understand a pattern until you have parsed enough of the context to know which variables are already bound - is it an assertion or just a binding? And the status of this can change without warning if someone does a simple renaming or adds what they thought was a new variable further up. When you're looking at a dozen pull requests every day, this sort of boobytrap is not fun to have around. Automatic matching on already bound variables can make short pieces of code quite elegant, but I have come to see this sort of thing as a cuteness that a language for large systems with many developers cannot afford - and I think it's also one of the little unnecessary obstacles that can turn off beginners who are trying to make sense of the examples they stumble over: "oh, you should simply have noted that X was bound up there, better luck next time". /Richard

kjnilsson · 2021-01-14T16:07:39Z

Automatic matching on already bound variables can make short pieces of code
quite elegant, but I have come to see this sort of thing as a cuteness that
a language for large systems with many developers cannot afford - and I
think it's also one of the little unnecessary obstacles that can turn off
beginners who are trying to make sense of the examples they stumble over:
"oh, you should simply have noted that X was bound up there, better luck
next time".

Absolutely agree here. This happens all the time in substantial code-bases. I even know of developers that have started favouring guards (ugh) for matching on previously bound values to avoid confusion of whether a value in a match is bound or not or if it should be or not. This way it match can be made explicit which is really valuable.

ferd · 2021-01-14T16:16:44Z

Sounds like you're focusing on the effects in fun heads and list comprehensions, where ^-annotation would add possibilities which don't exist today. But this is far from the main point of the suggestion, and could even be dropped (but it would be an assymmetry if ^ was not allowed in those patterns as well). But for the common case of patterns in functions, =, and case/if/receive/try, there is no shadowing involved, and yet that is where ^-annotations are most important. Without annotations, you cannot read and understand a pattern until you have parsed enough of the context to know which variables are already bound - is it an assertion or just a binding?

I don't really find too many of these "but what if you are reading code out of context" situations realistic. The first thing I always do in a code base is trying to figure out context because it drives everything. Performance optimizations are premature unless contextualized, fixes may break things at a distance unless they're contextualized, and naming of variables can be confusing unless they're contextualized in the domain they steep.

I never gave a crap about the arguments about "I'm refactoring and swapping lines and the , ; . stuff makes it hard to do" precisely because a change in return value never happens in isolation and always has to consider the caller to work. I won't give a crap about the argument of whether a thing is a rebinding match or not because whether it is a rebinding match or not will still need to be placed in its current context to know if a thing is useful or not, and if it's systematic I'm just asked to always provide markers for that context and intent that would be implicit otherwise.

Nothing we ever do here will remove the need for me to understand context. It will ask me to always declare my intent which arguably can be beneficial at times, but in my 10 years of Erlang I can remember 1 bug where this was very significant in a rare edge case situation (one that was too deeply nested and tricky to trigger for it to have been properly tested).

It's very possible other people feel this is very significant and important, but I personally don't.

And the status of this can change without warning if someone does a simple renaming or adds what they thought was a new variable further up. When you're looking at a dozen pull requests every day, this sort of boobytrap is not fun to have around. Automatic matching on already bound variables can make short pieces of code quite elegant, but I have come to see this sort of thing as a cuteness that a language for large systems with many developers cannot afford - and I think it's also one of the little unnecessary obstacles that can turn off beginners who are trying to make sense of the examples they stumble over: "oh, you should simply have noted that X was bound up there, better luck next time".

I haven't found that to be a significant case personally. Again, I'm not denying that experience in other people, but I'm speaking from my own position of comfort from this process. I likely have internalized the pitfall enough that I never mind it anymore.

Either way, the better systematic fix here if we really want to address the unexpected surprises would be to make these conditional expressions with matches have their own scope. I have found that almost everyone I introduced to the language just assumed that the scope of a new conditional was a new scope the way they would be in C-like languages, and that the problem with the pattern match was one of being surprised of that scope.

Arguably, introducing per-expression scopes would remove the very rare "I'm exporting the variable from all clauses so I can use it outside of the expression" case, bring in the shadowing warnings of funs and list comprehensions to case/if/receive/try, and make the use of a ^ pinning marker extremely obvious in intent with warnings otherwise. It would remove any need to specify stuff unless there is an actual risk of confusion (because the scoping rules tell you so), and require the least amount of work to get much clearer semantics. I've got this gut feeling that this current EEP hopes to get to that final result without actually fixing the scoping issues that actually underpin the perceived problem, and they'll be making me do work for the compiler that language semantics could instead address better than by having me toggle an operator every time.

ferd · 2021-01-14T16:20:15Z

I no kidding have had a lot more cases where the problem was that one of the thing I matched in the clause of a case expression is something I wanted to use externally and had to rename variables inside the case to not clash with usage after the case than when it happens beforehand, and per-expression scope there would fix that on top of the perceived problem behind this EEP. It would be a much more general solution.

Fixing the scoping creates a need for the operator. It feels backwards to create the operator to address the needs you'd have if you had the actual scoping in place, and doing one without the other feels misguided.

essen · 2021-01-14T16:32:43Z

I suspect people who find ^ most useful are also those who have to deal with larger function clauses. I do not see this being useful for my own code bases, but this is most likely very appealing for other code bases.

RaimoNiskanen · 2021-01-14T16:55:49Z

@ferd wrote:

the problem was that one of the thing I matched in the clause of a case expression is something I wanted to use externally and had to rename variables inside the case to not clash with usage after the case than when it happens beforehand,

I also encounter this every now and then, which raises a question: If you have Y partially matched in a case (not from all clauses), after the case end using ^Y would fail compilation, but a Y could be seen as either a) a deliberately new variable with no compile time problem, or, b) as both using a partially matched variable and missing the pinning operator.

I think a) could work since if you one day would match Y in all clauses you would get an error for missing the pinning operator. Ugly, though, since to utilize this you would have to conciously make sure Y is not matched in all clauses, so b) is probably the safe choice...

richcarl · 2021-01-15T13:23:03Z

Either way, the better systematic fix here if we really want to address the unexpected surprises would be to make these conditional expressions with matches have their own scope. I have found that almost everyone I introduced to the language just assumed that the scope of a new conditional was a new scope the way they would be in C-like languages, and that the problem with the pattern match was one of being surprised of that scope.

Arguably, introducing per-expression scopes would remove the very rare "I'm exporting the variable from all clauses so I can use it outside of the expression" case, bring in the shadowing warnings of funs and list comprehensions to case/if/receive/try, and make the use of a ^ pinning marker extremely obvious in intent with warnings otherwise. It would remove any need to specify stuff unless there is an actual risk of confusion (because the scoping rules tell you so), and require the least amount of work to get much clearer semantics. I've got this gut feeling that this current EEP hopes to get to that final result without actually fixing the scoping issues that actually underpin the perceived problem, and they'll be making me do work for the compiler that language semantics could instead address better than by having me toggle an operator every time.

You're getting ahead of me, but that's definitely something I and the team at WhatsApp have been discussing. But it would be a really big change to the language, and to have any hope of succeeding it would need to be done in stages, with a smooth migration path. The pinning stage is a good first step, making the meaning of each pattern exactly clear regardless of whether shadowing is used or not. I'll be following this up with other suggestions, and local scopes is one of the things we've been trying out - very little code actually uses variables that are exported from subexpressions. But if all of that was done in a big bang change, it might as well be a fork of the language because of all the adoption problems.

RaimoNiskanen · 2021-01-15T14:26:04Z

@ferd wrote:

I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer.

As I have understood Prolog (never used it myself), unification is quite different from Erlang's first bind then match approach, with more symmetry between the variable occurences.

I think it would be reasonable to have a syntax for matching, as opposed to binding, since they are fundamentally different operations (in contrast to in Prolog), and matching is the rare one.

galdor · 2021-01-15T15:15:33Z

For reference, a discussion on this very topic is in progress on the erlang-questions mailing-list.

HoloRin · 2021-01-15T15:52:03Z

stated end state

And I think this is the one of the most important aspect to discuss. If this is to be added but there is no agreement on the end state, then I would say it is best not to add it in the first place, rather than having two ways of writing the same code based on a compiler flag (or a pragma).

With regard to end state, I would be happy to see this be added but remain an optional annotation. It requires no changes to existing codebases as I understand it - which means that individual developers or teams can be free to adopt it should it be a net increase of clarity in their particular case.

michaelklishin · 2021-01-15T16:01:02Z

Great to see some feedback from @josevalim. Pinning from my experience is very useful to have in Elixir.

This feature certainly would have saved team RabbitMQ hours in investigations of some subtle bugs. And sure, there were larger functions involved. It's reasonable to expect that some functions will need some nesting of cases, and there are functions that accept anonymous functions as arguments. So shadowing is a very real thing in Erlang, even in relatively small modules. And so are the pains and time waste associated with it.

Oh, and the "unified Prolog syntax" argument made me literally LOL. Erlang has a certain reputation in our industry for its syntax. Whether justified or not, let us be mindful of it. Unless we want the industrial Erlang user community to eventually reach the size of that of Prolog!

fisher · 2021-01-15T16:45:05Z

I don't like this idea at all, the very existence of such a proposal makes me upset.
Instead, I'd like Elixir to fail on x = x + 1 because there is no variable, it is a binding, and it is not an assingment, it is a pattern matching, and this is a clear, concise default.

Adding this as an optional operator makes things even worse in my opinion, because it adds ambiguity and as a consequence, unnecessary complexity. Please keep it simple.

michalmuskala · 2021-01-15T18:49:08Z

@fisher then you'd be happy that this proposal does not make this work, moreover it would make this fail at compile-time, rather than runtime as it does today. The compiler would warn you that if you indeed want to match (preserving current Erlang semantics) you should have used ^X = X + 1.

Observing the conversation around this proposal here and on the mailing list, I have a general remark. It seems that large number of people commenting have not read through the proposal and are opposing or commenting about things completely unrelated. This is not helpful.

michaelklishin · 2021-01-15T20:43:16Z

@fisher how does an explicit, opt-in way to express the exact intent of the developer add meaningfully to the complexity? What really does add to the practical complexity of using Erlang is the way things work today. Several members of my team can attest to that.

essen · 2021-01-17T08:54:45Z

@okeuday I don't think ROK has a very good reply. There's no reason there can't still be a shadow warning. When ^ is used the variable refers to the same value. Otherwise the shadow warning can still tell you to be careful about what you're doing.

Funs are not commonly used like this, so translate the example into a scoped case expression and imagine what things could be like. For example we could write something like this with scoped case expressions:

Config = case get_config() of
    no_config -> default_config();
    Config -> Config
end

Today we have to use an intermediate variable, despite Config being the same value. With scopes, we end up with two Config variables essentially, one for each scope, and they simply refer to the same value.

Now if we have scoped case (and other) expressions, we need some way to "import" the variable into the scope for matching. This would be one way. We could write the following, where the "case .. of" part is still in the parent scope and the clause is its own scope:

OldConfig = get_config(),
NewConfig = maybe_update_config(OldConfig),
case OldConfig of
    ^NewConfig -> ok;
    _ -> write_config(NewConfig)
end

There might be other ways than the operator, I don't know, but this is what I see as desirable.

Now I don't know if adding this operator without local scoping is a good idea, but I agree with Richard that local scoping would be good to have and that there needs to be some way to identify if a variable was previously bound. Perhaps the steps to get there should be something like:

Add warning about matching existing variables (optional and would help some people)
Completely disallow exporting variables from case/if expressions, make the warning an error
Add local scoping with the operator (or some other solution, but what?) and clarify and make consistent existing scoping rules
Allow the operator to be used anywhere

Steps 1 and 2 could be done today with little pushback I believe. Step 3 and 4 later on whenever there's consensus on what it would look like. Step 4 should remain optional. I personally have no strong feelings about these steps as long as usage of the operator remains optional.

okeuday · 2021-01-17T10:30:56Z

@essen Your example is clearer like this (the example below is clear without the operator):

OldConfig = get_config(),
case maybe_update_config(OldConfig) of
    OldConfig -> ok;
    NewConfig -> write_config(NewConfig)
end

Even with the operator as an optional feature, an intended use to break scopes makes the feature worse. Keeping expressions referentially transparent is important. Benefiting from the isolation of scopes, to isolate complexity is also important (more of a concern in C/C++/Java, but still important here). I haven't seen any positive benefit that the operator could provide in Erlang source code.

essen · 2021-01-17T10:46:42Z

@essen Your example is clearer like this (the example below is clear without the operator):
OldConfig = get_config(),
case maybe_update_config(OldConfig) of
    OldConfig -> ok;
    NewConfig -> write_config(NewConfig)
end

This wouldn't work under the conditions I laid out because you'd have to write ^OldConfig -> ok;, otherwise the second OldConfig is a different variable not bound and it would always match and never write the config.

slepher · 2021-01-20T07:40:28Z

Rebinding Variables in erlang could be implemented by parse_transform
maybe you want to try this:
https://github.com/slepher/astranaut#rebinding

paulo-ferraz-oliveira · 2021-01-20T12:53:17Z

EEP-55's Backwards Compatibility seems to refer to using old code on an Erlang/OTP version that already supports the pinning operator (e.g. OTP 24, 25, ...).

I'm wondering how we'd be able to use new code that uses it in older Erlang/OTP-supported software (e.g. OTP 19, 20, ...). Would a parse transform of any kind be possible, here?

richcarl · 2021-01-20T13:20:15Z

That would be forwards compatibility. Older versions of the compiler do not know about the ^, and will fail to compile such code. A parse transform could easily be made to strip them from the parse tree, but then you'd have to use a newer parser that accepts the ^ first. One possibility is a more general conversion tool that I've been working on, using the erlfmt formatter for reading and prettyprinting the source code and doing transformations like these to automatically "upgrade or downgrade" code, but it's still in an early stage. /Richard Den ons 20 jan. 2021 kl 13:53 skrev Paulo F. Oliveira < [email protected]>:

…

EEP-55's Backwards Compatibility <https://github.com/erlang/eep/blob/master/eeps/eep-0055.md#backwards-compatibility> seems to refer to using old code on an Erlang/OTP version that already supports the pinning operator (e.g. OTP 24, 25, ...). I'm wondering how we'd be able to use new code that uses it in older Erlang/OTP-supported software (e.g. OTP 19, 20, ...). Would a parse transform of any kind be possible, here? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2951 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABPAAIFTRYIYMUDV4QXNADS23G47ANCNFSM4VIPVJ2A> .

michaelklishin · 2021-01-23T05:27:05Z

We now have some practical examples of what this feature could reveal in real world open source projects.

@richcarl experimented with annotating the OTP codebase and published his findings. This seems to be very promising, so @kjnilsson decided to do the same thing to one of the smaller but key RabbitMQ dependencies, Ra. We also found code that looks suspicious as a result: rabbitmq/ra#209.

In a few days worth of work, this helped reveal potential bugs in pretty complex Erlang codebases.

ranjanified · 2021-01-23T05:40:05Z

These empirical evidences are only making the case for a "Warning" stronger, to which every body is already consenting to. The divergence is over the annotation capability, as that improves programming in Erlang better and not the language itself. नमस्ते। नलिन रंजन

…

On Sat, Jan 23, 2021, 10:57 AM Michael Klishin ***@***.***> wrote: We now have some practical examples of what this feature could reveal in real world open source projects. @richcarl <https://github.com/richcarl> experimented with annotating the OTP codebase and published his findings <http://erlang.org/pipermail/erlang-questions/2021-January/100468.html>. This seems to be very promising, so @kjnilsson <https://github.com/kjnilsson> decided to do the same thing to one of the smaller but key RabbitMQ dependencies, Ra. We also found code that looks suspicious as a result: rabbitmq/ra#209 <rabbitmq/ra#209>. In a few days worth of work, this helped reveal potential bugs in pretty complex Erlang codebases. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#2951 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABRD7V2FYSMQGXKTW4D6V6TS3JM4JANCNFSM4VIPVJ2A> .

This allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code. An new compiler flag 'warn_unpinned_vars' (disabled by default) provides warnings for all uses of already bound variables in patterns that have not been marked with ^.

paulo-ferraz-oliveira · 2021-01-31T00:58:03Z

I tried this today on a lib. I maintain. [will try in more if I have time for it]

I liked what I saw. The ^ does improve readability for intent. [mind you, this is a VERY superficial analysis on my behalf, but I'd though about some kind of mechanism like this, in the past, so am probably "more open" to the idea]

I also found a potential issue regarding dead code, as in:

try create_table(A) of
    A -> dead_code; % :-)
    _ -> something_else
...
end.

(even if it doesn't go forward, at least it will help me find some situations that were apparently there that shouldn't be).

Also, yeah, we tend to move code a lot in refactors (not saying that this is something everybody does) and I've been bit by pattern matching errors quite a few times: one can argue "But there should be tests!", but there's not always time to do the right thing (which I'm not, again, proposing is the best way to go).

I also do maintain functions that are sometimes a couple of dozens of lines and having to figure out if a given variable was already bound before the code I copied or wrote, is sometimes a pain (think _Result, which is rather common in our code and could otherwise cause a matching error).

All in all, I'd probably like it to exist in the language (optionally, as proposed) but have not a very strong opinion on the matter. [I tried going through the whole mailing list -stuff, but it got overwhelming after a while]

fisher · 2021-01-31T12:31:27Z

I also do maintain functions that are sometimes a couple of dozens of lines and having to figure out if a given variable

It's not a variable, it does not change its value.

was already bound before the code I copied or wrote, is sometimes a pain (think _Result, which is rather common in our code and could otherwise cause a matching error).

If you don't care about the result, don't pattern match it.

richcarl · 2021-01-31T12:40:18Z

I tried this today on a lib. I maintain. [will try in more if I have time for it]

Thanks for trying it out!

I also found a potential issue regarding dead code, as in:
try create_table(A) of
    A -> dead_code; % :-)
    _ -> something_else
...
end.

I'm not clear what you mean that the issue is - if the code is dead because the pattern A could never match the result of create_table(A) (but Dialyzer could warn about that)? Or if the author thought the A in the clause pattern was a new variable, but then the _ case would be dead instead and the compiler would say so.

paulo-ferraz-oliveira · 2021-01-31T13:02:31Z

It's not a variable, it does not change its value.

:-) I don't want to get into an argument over semantics (I'm well aware of how to use variables in Erlang). Let's just say it's documented as being "a variable", even if it is single-assignment.

If you don't care about the result, don't pattern match it.

That was an example, not the whole truth. There might be good reason to name an ignored (_) variable, for readability/maintainability. It's helped us before.

Why else would people do stuff like

    case compile:file(File, Opts) of
        {ok, _Mod} ->

otherwise? [this code is from rebar3]

To me, though, that code is much more readable than

    case compile:file(File, Opts) of
        {ok, _} ->

(and especially when the return is not a simple two-element tuple).

(I also mentioned copying code between modules, which, when left not-completely-tested, may only surface issues much later - which the pinning would help surface sooner).

paulo-ferraz-oliveira · 2021-01-31T13:17:01Z

the code is dead because the pattern A could never match the result of create_table(A) (but Dialyzer could warn about that)

You are right that dialyzer "could" warn about that, but if my function depends on code not available for the analysis that'll not be so, right?

local_func(A) ->
    case remote:func(A) of
       A -> dead_code;
       _ -> ok
    end.

(@richcarl, maybe I'm missing something)

Note: I used rebar3 dialyzer with options (error_handling, underspecs, unmatched_returns) - unknown left out since the function will only be available after boot.

In the meantime, in another lib., it's helped me find another case where it was useful to re-examine the code:

A = do_this(),
case A of
    {ok, _} ->
        A = do_this(),
        done
    _ ->
        ok
end

... do_this() being executed twice for no reason (and also, with no error), but hindering performance.

richcarl · 2021-01-31T13:25:58Z

You are right that dialyzer "could" warn about that, but if my function depends on code not available for the analysis that'll not be so, right?
local_func(A) ->
    case remote:func(A) of
       A -> dead_code;
       _ -> ok
    end.

Sure, but the thing is, we can't be sure it's dead unless we know what remote:func/1 does (at least its return type). It could be that it's the identity function, for some values of A at least, and then dead_code would in fact be used.

In the meantime, in another lib., it's helped me find another case where it was useful to re-examine the code:
A = do_this(),
case A of
    {ok, _} ->
        A = do_this(),
        done
    _ ->
        ok
end
... do_this() being executed twice for no reason (and also, with no error), but hindering performance.

Yeah, I found one of those in the xmerl library as well. It can happen when copying and pasting, but not cutting out the original line. And the code quietly keeps working. :-)

paulo-ferraz-oliveira · 2021-01-31T13:31:30Z

Sure, but the thing is, we can't be sure it's dead unless we know what remote:func/1 does (at least its return type). It could be that it's the identity function, for some values of A at least, and then dead_code would in fact be used.

You're right (I'm not stating we should have an automatic way of finding out if it's dead 😄 - unless I wrapped remote:func/1 in a locally -spec(_).ed function - which would least give me the static analysis guarantee). In this case, though:

before pinning, dialyzer wouldn't complain,
after pinning, dialyzer wouldn't complain, but warn_unpinned_vars would.

After examining that bit of code I found out that it was dead. I wouldn't have, otherwise, if not for this warning: I'm not stating pinning is for that, it just happened to have helped.

Edit:
I've now re-read my message and see how it can be misleading/confusing.

"I also found a potential issue regarding dead code, as in..."
should have read
"I also found dead code in one of our lib.s, using the warn_unpinned_vars, and after analysing the warning it issued".

(not a native English speaker, sorry about that)

richcarl · 2021-01-31T16:26:40Z

after pinning, dialyzer wouldn't complain, but warn_unpinned_vars would.

After examining that bit of code I found out that it was dead. I wouldn't have, otherwise, if not for this warning: I'm not stating pinning is for that, it just happened to have helped.

Yes, the code was not obviously wrong, but definitely suspicious looking.

paulo-ferraz-oliveira · 2021-01-31T16:33:16Z

Yes, the code was not obviously wrong, but definitely suspicious looking.

That's what I meant: not wrong (in the same module, the same pattern was used 3 more times), but suspicious 😄.

In the meantime, I've ran it (the warn_unpinned_vars) over 40 something lib.s we maintain internally. Most of the time I find the warnings were for stuff like lists:keyfind (and similar) or pattern-matching in catch, thus the code was OK as was, but still I had to re-read it to understand the developer's initial intention.

My only worries would now be forward compatibility (as expressed before) and the fact that it can/should remain optional.

KennethL · 2021-02-01T14:01:58Z

The OTP team had a meeting about EEP 55 and this pull request, and we came to the following conclusion.

We will not approve this pull request for inclusion in OTP 24. OTP 24 will be released in May and before that there will be 3 release candidates with the first one on February 24:th.

We agree that there is a problem that binding when match was intended, and vice-versa,
is a source of error especially in big projects with many beginners and large functions.
This can be worth to solve.

But we need to investigate this more before OTP 25 i order to take a decision.
We also note that if a change like this should be released many other things must be in place before an introduction. Things such as: Erlang editor modes, other tools, documentation, examples, test suites. parse transforms, ...

s-ledenev · 2021-04-24T06:20:00Z

The rationale itself is absolutely concocted out of thin air. For anyone who face such "problems" I suggest to write smaller functions and think better on their code composition.

Arguments like "let's do X in our language because I like this at Y" with such flimsy arguments is absolute heresy. And defending arguments like "no context" and so on are weak!

For 20+ years I've seen this many times. Any small unnecessary addition to language itself is the first step for spoiling language. Look at C++ it was mess then and it is complete mess now! Look at the last changes of C# (ex., default implementation for interface) - mess is on the way. Erlang as the language is absolutely perfect! It has small vocabulary and simple usage rules. Don't spoil it! If anyone likes something in Elixir - use it! When in Rome, do as the Romans do!

And I am unhappily surprised that such EEP was registered at all. How could absolutely reasonable Erlang community do that? I hope this EEP would be rejected and this incident would be over and done with.

lhoguin · 2021-04-24T07:26:02Z

@s-ledenev EEPs being registered is not indicative of anything other than them being a proposal for Erlang. There are plenty of EEPs that never led to anything, regardless of their perceived merits.

TD5 · 2022-09-29T14:15:44Z

I would personally have been saved a headache if I'd had this when refactoring some legacy code and accidentally reused a variable.

We agree that there is a problem that binding when match was intended, and vice-versa, is a source of error especially in big projects with many beginners and large functions.

...

We also note that if a change like this should be released many other things must be in place before an introduction.
Things such as: Erlang editor modes, other tools, documentation, examples, test suites. parse transforms, ...

Would putting this change behind an erl_features toggle make the switch more palatable? Then we have an onboarding path so those who want it can get it quickly, but it can stay disabled for maximum compatibility by default until there has been plenty of time to add support in community tooling.

yushli · 2023-06-05T07:34:16Z

Stumble across this pull request and the corresponding EEP. Looks like it can nicely fix the variable shadowing issue in ets:fun2ms: https://erlangforums.com/t/variable-shadowing-constraint-of-match-spec-generated-using-ms-transform/2656

yushli · 2023-06-05T18:25:13Z

Is it desirable to constraint this pinning operation to be allowed just inside fun header to solve this particular variable shadowing issue?
Also this is still a valid use case for allowing pinning inside a fun head:

You need an operator to avoid shadowing issues, regardless of other uses. This example:
f(X, Y) ->
    F = fun ({a, ^Y}) -> {ok, Y};
            (_) -> error
        end,
    F(X).

josevalim reviewed Dec 25, 2020

View reviewed changes

rickard-green added the team:VM Assigned to OTP team VM label Jan 4, 2021

richcarl force-pushed the pinning-operator branch from ae2a5cf to c8ab938 Compare January 28, 2021 13:16

maxno-kivra mentioned this pull request Jan 29, 2021

Contributing to syntax tools? slepher/astranaut#11

Open

zuiderkwast mentioned this pull request May 28, 2021

Variable bindings in comprehensions #4862

Closed

potatosalad deleted the pinning-operator branch May 31, 2022 23:31

jhogberg added team:LG Assigned to OTP language group and removed team:VM Assigned to OTP team VM labels Oct 16, 2023

Add a new operator ^ for pinning of pattern variables #2951

Are you sure you want to change the base?

Add a new operator ^ for pinning of pattern variables #2951

Conversation

richcarl commented Dec 24, 2020 • edited Loading

josevalim commented Dec 25, 2020 • edited Loading

josevalim Dec 25, 2020

Choose a reason for hiding this comment

richcarl Dec 25, 2020

Choose a reason for hiding this comment

richcarl commented Dec 25, 2020

richcarl commented Dec 25, 2020

richcarl commented Dec 25, 2020

alavrik commented Jan 11, 2021

attah commented Jan 12, 2021 • edited Loading

richcarl commented Jan 14, 2021

RaimoNiskanen commented Jan 14, 2021

ferd commented Jan 14, 2021 • edited Loading

josevalim commented Jan 14, 2021 • edited Loading

richcarl commented Jan 14, 2021 via email

essen commented Jan 14, 2021

richcarl commented Jan 14, 2021 via email

ferd commented Jan 14, 2021 • edited Loading

richcarl commented Jan 14, 2021 via email

kjnilsson commented Jan 14, 2021

ferd commented Jan 14, 2021

ferd commented Jan 14, 2021 • edited Loading

essen commented Jan 14, 2021

RaimoNiskanen commented Jan 14, 2021

richcarl commented Jan 15, 2021

RaimoNiskanen commented Jan 15, 2021

galdor commented Jan 15, 2021

HoloRin commented Jan 15, 2021

michaelklishin commented Jan 15, 2021

fisher commented Jan 15, 2021

michalmuskala commented Jan 15, 2021

michaelklishin commented Jan 15, 2021

essen commented Jan 17, 2021

okeuday commented Jan 17, 2021

essen commented Jan 17, 2021

slepher commented Jan 20, 2021

paulo-ferraz-oliveira commented Jan 20, 2021

richcarl commented Jan 20, 2021 via email

michaelklishin commented Jan 23, 2021 • edited Loading

ranjanified commented Jan 23, 2021 via email

paulo-ferraz-oliveira commented Jan 31, 2021

fisher commented Jan 31, 2021

richcarl commented Jan 31, 2021 • edited Loading

paulo-ferraz-oliveira commented Jan 31, 2021 • edited Loading

paulo-ferraz-oliveira commented Jan 31, 2021

richcarl commented Jan 31, 2021

paulo-ferraz-oliveira commented Jan 31, 2021 • edited Loading

richcarl commented Jan 31, 2021

paulo-ferraz-oliveira commented Jan 31, 2021 • edited Loading

KennethL commented Feb 1, 2021

s-ledenev commented Apr 24, 2021

lhoguin commented Apr 24, 2021

TD5 commented Sep 29, 2022 • edited Loading

yushli commented Jun 5, 2023

yushli commented Jun 5, 2023 • edited Loading

richcarl commented Dec 24, 2020 •

edited

Loading

josevalim commented Dec 25, 2020 •

edited

Loading

attah commented Jan 12, 2021 •

edited

Loading

ferd commented Jan 14, 2021 •

edited

Loading

josevalim commented Jan 14, 2021 •

edited

Loading

ferd commented Jan 14, 2021 •

edited

Loading

ferd commented Jan 14, 2021 •

edited

Loading

michaelklishin commented Jan 23, 2021 •

edited

Loading

richcarl commented Jan 31, 2021 •

edited

Loading

paulo-ferraz-oliveira commented Jan 31, 2021 •

edited

Loading

paulo-ferraz-oliveira commented Jan 31, 2021 •

edited

Loading

paulo-ferraz-oliveira commented Jan 31, 2021 •

edited

Loading

TD5 commented Sep 29, 2022 •

edited

Loading

yushli commented Jun 5, 2023 •

edited

Loading