Add a new operator ^ for pinning of pattern variables #2951
Hi @richcarl, I have a couple questions based on places where we need to handle
Regardless of the direction to go, you should probably have tests around those scenarios. There is another warning we have added to Elixir which I found extremely helpful which is to warn if an underscored variable is matched or repeated in a pattern. For example, if you are matching on Erlang AST, you may write:
And then this doesn't work because the underscored |
lib/stdlib/src/erl_lint.erl
Outdated
%% the stack var must be unused after processing the pattern;
%% it can be used either if bound/unsafe before the try, or
%% if it occurs in the class or term part of the pattern
add_error(L, {stacktrace_bound,V}, St);
You have a commit from #2944 in this PR. :)
Thanks, in the hurry I forgot to mention that in the description. This is to avoid future conflicts.
The current semantics does not change, so the meaning is still that these are both instances of the same new variable, with the constraint that it has to have the same value in both positions. (The compiler implements this using temporary names for both fields, adds a guard to check that they are equal, and then if they match, the value is assigned to X.) Pinning does not apply here, because that says "evaluate this thing in the environment surrounding the pattern", and there is no previous X there. There is also no specific left/right evaluation order in patterns: conceptually, both variable instances are bound in parallel given the same current environment, and then they are checked for equality. |
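To make the description above concrete, here is a minimal sketch of how a repeated variable in a pattern behaves in Erlang today, next to a hand-written version that uses two fresh names and an equality guard. The module and function names are hypothetical and chosen only for illustration; the second function mirrors the described compiler strategy in spirit and is not the compiler's actual output.

```erlang
-module(repeated_var_demo).
-export([same_implicit/1, same_explicit/1]).

%% X occurs twice in one pattern: both positions must hold the same value
%% for the clause to match, and X is bound only if they do.
same_implicit({X, X}) -> {match, X};
same_implicit(_) -> nomatch.

%% Hand-written equivalent (illustrative): two fresh names plus a guard
%% that checks exact equality before the value is used.
same_explicit({X1, X2}) when X1 =:= X2 -> {match, X1};
same_explicit(_) -> nomatch.
```

Both functions should agree on every input; in particular `{1, 1.0}` does not match, because pattern matching compares terms exactly.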
These subexpressions have the semantics of guard expressions. Their variables are already evaluated in the surrounding environment and can never be previously unbound, so pinning is never required (or even allowed). But you are right that I should add some more test cases. |
I think this is a good idea (the AST example has bitten me once or twice), but that's for another PR. |
A new EEP 55 was added to consider inclusion of this change: https://github.com/erlang/eep/blob/master/eeps/eep-0055.md |
Doesn't the purpose of this get a bit broken when the operator is optional? |
No, that's the way it needs to be introduced, to allow people with existing codebases to start using the new feature and preparing their code, but not breaking their builds with new warnings unless they enable the flag. In a subsequent major release, the warnings can be made on by default and off by option instead.
No, unbound variables are by far the most common case in patterns, being how you introduce new variable bindings in Erlang. You don't want to annotate each and every one of those.
As others have said: for Elixir this operator is essential, since they For Erlang, if using a pinning operator had been required from the start; Introducing a pinning operator now is trickier... Having a compiler option to choose if pinning is allowed/required makes it I think I would prefer that instead there should be a compiler pragma You get the idea: it should be possible from the source code how to read How to take the next step i.e when code not using pinning is the exception, |
I would absolutely hate that stated end state. I don't necessarily have a cogent argument there, if only that the base pattern matching semantics behind Erlang are one of the things I found the most useful and impressive about the language and getting away from there is something I can't find myself agreeing with. It's the kind of thing other languages do and then I'm forced to use them and think "it's much nicer in Erlang." |
And I think this is one of the most important aspects to discuss. If this is to be added but there is no agreement on the end state, then I would say it is best not to add it in the first place, rather than having two ways of writing the same code based on a compiler flag (or a pragma).
I don't quite see what it is that you're against, though, Fred. On average,
you'll only find a couple of instances of already-bound variables in any
module (and the relative rarity is what makes it so much harder to spot
when it actually is an important detail of the algorithm). You'd only need
to mark these up as being intentional, and everything else stays exactly
the same.
/Richard
On Thu, 14 Jan 2021 at 14:23, Fred Hebert <[email protected]> wrote:
… No, that's the way it needs to be introduced, to allow people with
existing codebases to start using the new feature and preparing their code,
but not breaking their builds with new warnings unless they enable the
flag. In a subsequent major release, the warnings can be made on by default
and off by option instead.
I would absolutely hate that stated end state. I don't necessarily have a
cogent argument there, if only that the base pattern matching semantics
behind Erlang are one of the things I found the most useful and impressive
about the language and getting away from there is something I can't find
myself agreeing with. It's the kind of thing other languages do and then
I'm forced to use them and think "it's much nicer in Erlang."
Not true in tests. And tests are often included in the same module as the code when using eunit so you can't just disable this for tests. |
I see how a test suite with a large number of lines with matches containing
one or more already bound variables could be painful to update by hand, but
on the whole I think even this kind of code would benefit from being
explicit about when you're referring to a known value. It's a pity if a
test stops working because you renamed a variable further up and the
match-as-assert now simply passes quietly.
Also for a person reading the test suite, it is hard to see what is
actually being tested, until they have parsed the code enough to figure out
which variables are already bound.
/Richard
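A minimal sketch of the failure mode described above, using current Erlang semantics. The module, the function names, and the values 41 and 42 are hypothetical and only serve the illustration. In the first function `Expected` is already bound, so the second match acts as an assertion; in the second function the earlier variable has been renamed, so the same line quietly becomes a fresh binding and the check disappears.

```erlang
-module(assert_demo).
-export([check_bound/0, check_renamed/0]).

%% Stand-in for the code under test (hypothetical).
compute() -> 41.

%% Expected is bound before the match, so the match is an assertion
%% and fails with badmatch when the values differ.
check_bound() ->
    Expected = 42,
    try
        Expected = compute(),
        passed
    catch
        error:{badmatch, Got} -> {failed, Got}
    end.

%% The earlier variable was renamed, so Expected is now unbound here:
%% the same line simply binds it, and nothing is checked any more.
check_renamed() ->
    _Wanted = 42,
    Expected = compute(),
    {passed, Expected}.
```

With the proposed annotation the intent of the assertion would be written as `^Expected = compute()`, so the renamed version would no longer compile quietly into a plain binding.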
On Thu, 14 Jan 2021 at 14:56, Loïc Hoguin <[email protected]> wrote:
… I don't quite see what it is that you're against, though, Fred. On
average, you'll only find a couple of instances of already-bound variables
in any module (and the relative rarity is what makes it so much harder to
spot when it actually is an important detail of the algorithm).
Not true in tests. And tests are often included in the same module as the
code when using eunit so you can't just disable this for tests.
I generally like the power of Prolog-style unification and feel that moving towards it would be more desirable than moving away from it. I'd rather have to explicitly mark cases where you want to re-bind and be told "look, this one isn't a match", than cases where you don't want to rebind, and therefore to me this EEP moves in a direction opposite to what I'd prefer. This is a stance that I take in a vacuum about my own preferences, where I don't have to care for other people's opinions and how confused they might feel by the unfamiliar semantics. Their views are valid and I can understand them, but I won't spend the energy to defend them here. If everything had to be in line with what felt the most welcoming and easy, we'd all be stuck using Go right now and mostly always building from a foundation of familiarity that wouldn't have given rise to Erlang itself; on the other hand without that perspective, there's probably gonna be a more limited community and adoption wouldn't pick up at all, and the sustainability would be in jeopardy. It's a difficult balance to strike, and I don't know where the proper tradeoff is to be done. So amidst that uncertainty, I'm just voicing my own preferences. |
On Thu, 14 Jan 2021 at 15:28, Fred Hebert <[email protected]> wrote:
I generally like the power of Prolog-style unification and feel that
moving towards it would be more desirable than moving away from it. I'd
rather have to explicitly mark cases where you want to re-bind and be told
"look, this one isn't a match", than cases where you don't want to rebind,
and therefore to me this EEP moves in a direction opposite to what I'd
prefer.
Sounds like you're focusing on the effects in fun heads and list
comprehensions, where ^-annotation would add possibilities which don't
exist today. But this is far from the main point of the suggestion, and
could even be dropped (but it would be an asymmetry if ^ was not allowed
in those patterns as well). But for the common case of patterns in
functions, =, and case/if/receive/try, there is no shadowing involved, and
yet that is where ^-annotations are most important. Without annotations,
you cannot read and understand a pattern until you have parsed enough of
the context to know which variables are already bound - is it an assertion
or just a binding? And the status of this can change without warning if
someone does a simple renaming or adds what they thought was a new variable
further up. When you're looking at a dozen pull requests every day, this
sort of boobytrap is not fun to have around.
Automatic matching on already bound variables can make short pieces of code
quite elegant, but I have come to see this sort of thing as a cuteness that
a language for large systems with many developers cannot afford - and I
think it's also one of the little unnecessary obstacles that can turn off
beginners who are trying to make sense of the examples they stumble over:
"oh, you should simply have noted that X was bound up there, better luck
next time".
/Richard
Absolutely agree here. This happens all the time in substantial code-bases. I even know of developers that have started favouring guards (ugh) for matching on previously bound values, to avoid confusion over whether a variable in a match is already bound, or whether it should be. This way the match can be made explicit, which is really valuable.
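For readers unfamiliar with the guard workaround mentioned above, here is a small sketch comparing it with the implicit match. The module and function names are hypothetical. Both functions behave the same in current Erlang; the second spells out the comparison in a guard so the reader does not need to know that `Key` is already bound.

```erlang
-module(guard_demo).
-export([find_implicit/2, find_guarded/2]).

%% Implicit style: Key is already bound (it is a function argument),
%% so {Key, Value} only matches entries whose first element equals Key.
%% Under the proposal in this PR this would be written {^Key, Value}.
find_implicit(Key, Entry) ->
    case Entry of
        {Key, Value} -> {ok, Value};
        _ -> error
    end.

%% Guard style: a fresh variable K is bound, and the comparison with
%% the existing Key is made explicit in the guard.
find_guarded(Key, Entry) ->
    case Entry of
        {K, Value} when K =:= Key -> {ok, Value};
        _ -> error
    end.
```

The guard version is more verbose, which is presumably the source of the "ugh" above, but it cannot silently turn into a binding if `Key` is renamed.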
I don't really find too many of these "but what if you are reading code out of context" situations realistic. The first thing I always do in a code base is trying to figure out context because it drives everything. Performance optimizations are premature unless contextualized, fixes may break things at a distance unless they're contextualized, and naming of variables can be confusing unless they're contextualized in the domain they steep. I never gave a crap about the arguments about "I'm refactoring and swapping lines and the Nothing we ever do here will remove the need for me to understand context. It will ask me to always declare my intent which arguably can be beneficial at times, but in my 10 years of Erlang I can remember 1 bug where this was very significant in a rare edge case situation (one that was too deeply nested and tricky to trigger for it to have been properly tested). It's very possible other people feel this is very significant and important, but I personally don't.
I haven't found that to be a significant case personally. Again, I'm not denying that experience in other people, but I'm speaking from my own position of comfort from this process. I likely have internalized the pitfall enough that I never mind it anymore. Either way, the better systematic fix here if we really want to address the unexpected surprises would be to make these conditional expressions with matches have their own scope. I have found that almost everyone I introduced to the language just assumed that the scope of a new conditional was a new scope the way they would be in C-like languages, and that the problem with the pattern match was one of being surprised of that scope. Arguably, introducing per-expression scopes would remove the very rare "I'm exporting the variable from all clauses so I can use it outside of the expression" case, bring in the shadowing warnings of funs and list comprehensions to case/if/receive/try, and make the use of a |
I no kidding have had a lot more cases where the problem was that one of the things I matched in the clause of a case expression is something I wanted to use externally and had to rename variables inside the case to not clash with usage after the case than when it happens beforehand, and per-expression scope there would fix that on top of the perceived problem behind this EEP. It would be a much more general solution. Fixing the scoping creates a need for the operator. It feels backwards to create the operator to address the needs you'd have if you had the actual scoping in place, and doing one without the other feels misguided.
I suspect people who find |
@ferd wrote:
I also encounter this every now and then, which raises a question: If you have I think a) could work since if you one day would match |
You're getting ahead of me, but that's definitely something I and the team at WhatsApp have been discussing. But it would be a really big change to the language, and to have any hope of succeeding it would need to be done in stages, with a smooth migration path. The pinning stage is a good first step, making the meaning of each pattern exactly clear regardless of whether shadowing is used or not. I'll be following this up with other suggestions, and local scopes is one of the things we've been trying out - very little code actually uses variables that are exported from subexpressions. But if all of that was done in a big bang change, it might as well be a fork of the language because of all the adoption problems. |
@ferd wrote:
As I have understood Prolog (never used it myself), unification is quite different from Erlang's first bind then match approach, with more symmetry between the variable occurrences. I think it would be reasonable to have a syntax for matching, as opposed to binding, since they are fundamentally different operations (in contrast to in Prolog), and matching is the rare one.
For reference, a discussion on this very topic is in progress on the erlang-questions mailing-list. |
With regard to end state, I would be happy to see this be added but remain an optional annotation. It requires no changes to existing codebases as I understand it - which means that individual developers or teams can be free to adopt it should it be a net increase of clarity in their particular case. |
Great to see some feedback from @josevalim. Pinning from my experience is very useful to have in Elixir. This feature certainly would have saved team RabbitMQ hours in investigations of some subtle bugs. And sure, there were larger functions involved. It's reasonable to expect that some functions will need some nesting of Oh, and the "unified Prolog syntax" argument made me literally LOL. Erlang has a certain reputation in our industry for its syntax. Whether justified or not, let us be mindful of it. Unless we want the industrial Erlang user community to eventually reach the size of that of Prolog! |
I don't like this idea at all, the very existence of such a proposal makes me upset. Adding this as an optional operator makes things even worse in my opinion, because it adds ambiguity and as a consequence, unnecessary complexity. Please keep it simple. |
@fisher then you'd be happy that this proposal does not make this work, moreover it would make this fail at compile-time, rather than runtime as it does today. The compiler would warn you that if you indeed want to match (preserving current Erlang semantics) you should have used Observing the conversation around this proposal here and on the mailing list, I have a general remark. It seems that large number of people commenting have not read through the proposal and are opposing or commenting about things completely unrelated. This is not helpful. |
@fisher how does an explicit, opt-in way to express the exact intent of the developer add meaningfully to the complexity? What really does add to the practical complexity of using Erlang is the way things work today. Several members of my team can attest to that. |
@okeuday I don't think ROK has a very good reply. There's no reason there can't still be a shadow warning. When Funs are not commonly used like this, so translate the example into a scoped case expression and imagine what things could be like. For example we could write something like this with scoped case expressions:

Config = case get_config() of
    no_config -> default_config();
    Config -> Config
end

Today we have to use an intermediate variable, despite Now if we have scoped case (and other) expressions, we need some way to "import" the variable into the scope for matching. This would be one way. We could write the following, where the "case .. of" part is still in the parent scope and the clause is its own scope:

OldConfig = get_config(),
NewConfig = maybe_update_config(OldConfig),
case OldConfig of
    ^NewConfig -> ok;
    _ -> write_config(NewConfig)
end

There might be other ways than the operator, I don't know, but this is what I see as desirable. Now I don't know if adding this operator without local scoping is a good idea, but I agree with Richard that local scoping would be good to have and that there needs to be some way to identify if a variable was previously bound. Perhaps the steps to get there should be something like:
Steps 1 and 2 could be done today with little pushback I believe. Step 3 and 4 later on whenever there's consensus on what it would look like. Step 4 should remain optional. I personally have no strong feelings about these steps as long as usage of the operator remains optional. |
@essen Your example is clearer like this (the example below is clear without the operator):

OldConfig = get_config(),
case maybe_update_config(OldConfig) of
    OldConfig -> ok;
    NewConfig -> write_config(NewConfig)
end

Even with the operator as an optional feature, an intended use to break scopes makes the feature worse. Keeping expressions referentially transparent is important. Benefiting from the isolation of scopes, to isolate complexity is also important (more of a concern in C/C++/Java, but still important here). I haven't seen any positive benefit that the operator could provide in Erlang source code.
This wouldn't work under the conditions I laid out because you'd have to write |
Rebinding Variables in erlang could be implemented by parse_transform |
EEP-55's Backwards Compatibility seems to refer to using old code on an Erlang/OTP version that already supports the pinning operator (e.g. OTP 24, 25, ...). I'm wondering how we'd be able to use new code that uses it in older Erlang/OTP-supported software (e.g. OTP 19, 20, ...). Would a parse transform of any kind be possible, here? |
That would be forwards compatibility. Older versions of the compiler do not
know about the ^, and will fail to compile such code. A parse transform
could easily be made to strip them from the parse tree, but then you'd have
to use a newer parser that accepts the ^ first. One possibility is a more
general conversion tool that I've been working on, using the erlfmt
formatter for reading and prettyprinting the source code and doing
transformations like these to automatically "upgrade or downgrade" code,
but it's still in an early stage.
/Richard
On Wed, 20 Jan 2021 at 13:53, Paulo F. Oliveira <[email protected]> wrote:
… EEP-55's Backwards Compatibility
<https://github.com/erlang/eep/blob/master/eeps/eep-0055.md#backwards-compatibility>
seems to refer to using old code on an Erlang/OTP version that already
supports the pinning operator (e.g. OTP 24, 25, ...).
I'm wondering how we'd be able to use new code that uses it in older
Erlang/OTP-supported software (e.g. OTP 19, 20, ...). Would a parse
transform of any kind be possible, here?
We now have some practical examples of what this feature could reveal in real world open source projects. @richcarl experimented with annotating the OTP codebase and published his findings. This seems to be very promising, so @kjnilsson decided to do the same thing to one of the smaller but key RabbitMQ dependencies, Ra. We also found code that looks suspicious as a result: rabbitmq/ra#209. In a few days worth of work, this helped reveal potential bugs in pretty complex Erlang codebases. |
This empirical evidence only strengthens the case for a "Warning",
to which everybody already consents.
The disagreement is over the annotation capability, since that improves
programming in Erlang rather than the language itself.
Namaste,
Nalin Ranjan
…On Sat, Jan 23, 2021, 10:57 AM Michael Klishin ***@***.***> wrote:
We now have some practical examples of what this feature could reveal in
real world open source projects.
@richcarl <https://github.com/richcarl> experimented with annotating the
OTP codebase and published his findings
<http://erlang.org/pipermail/erlang-questions/2021-January/100468.html>.
This seems to be very promising, so @kjnilsson
<https://github.com/kjnilsson> decided to do the same thing to one of the
smaller but key RabbitMQ dependencies, Ra. We also found code that looks
suspicious as a result: rabbitmq/ra#209
<rabbitmq/ra#209>.
In a few days worth of work, this helped reveal potential bugs in pretty
complex Erlang codebases.
This allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code. A new compiler flag 'warn_unpinned_vars' (disabled by default) provides warnings for all uses of already bound variables in patterns that have not been marked with ^.
ae2a5cf
to
c8ab938
I tried this today on a lib. I maintain. [will try in more if I have time for it] I liked what I saw. The I also found a potential issue regarding dead code, as in:

try create_table(A) of
    A -> dead_code; % :-)
    _ -> something_else
...
end.

(even if it doesn't go forward, at least it will help me find some situations that were apparently there that shouldn't be). Also, yeah, we tend to move code a lot in refactors (not saying that this is something everybody does) and I've been bit by pattern matching errors quite a few times: one can argue "But there should be tests!", but there's not always time to do the right thing (which I'm not, again, proposing is the best way to go). I also do maintain functions that are sometimes a couple of dozens of lines and having to figure out if a given variable was already bound before the code I copied or wrote, is sometimes a pain (think All in all, I'd probably like it to exist in the language (optionally, as proposed) but have not a very strong opinion on the matter. [I tried going through the whole mailing list -stuff, but it got overwhelming after a while]
It's not a variable, it does not change its value.
If you don't care about the result, don't pattern match it. |
Thanks for trying it out!
I'm not clear what you mean that the issue is - if the code is dead because the pattern A could never match the result of create_table(A) (but Dialyzer could warn about that)? Or if the author thought the A in the clause pattern was a new variable, but then the |
:-) I don't want to get into an argument over semantics (I'm well aware of how to use variables in Erlang). Let's just say it's documented as being "a variable", even if it is single-assignment.
That was an example, not the whole truth. There might be good reason to name an ignored ( Why else would people do stuff like

case compile:file(File, Opts) of
    {ok, _Mod} ->

otherwise? [this code is from To me, though, that code is much more readable than

case compile:file(File, Opts) of
    {ok, _} ->

(and especially when the return is not a simple two-element tuple). (I also mentioned copying code between modules, which, when left not-completely-tested, may only surface issues much later - which the pinning would help surface sooner).
You are right that
(@richcarl, maybe I'm missing something) Note: I used rebar3 dialyzer with options ( In the meantime, in another lib., it's helped me find another case where it was useful to re-examine the code:
... |
Sure, but the thing is, we can't be sure it's dead unless we know what remote:func/1 does (at least its return type). It could be that it's the identity function, for some values of A at least, and then dead_code would in fact be used.
Yeah, I found one of those in the xmerl library as well. It can happen when copying and pasting, but not cutting out the original line. And the code quietly keeps working. :-) |
You're right (I'm not stating we should have an automatic way of finding out if it's dead 😄 - unless I wrapped
After examining that bit of code I found out that it was dead. I wouldn't have, otherwise, if not for this warning: I'm not stating pinning is for that, it just happened to have helped. Edit: "I also found a potential issue regarding dead code, as in..." (not a native English speaker, sorry about that) |
Yes, the code was not obviously wrong, but definitely suspicious looking. |
That's what I meant: not wrong (in the same module, the same pattern was used 3 more times), but suspicious 😄. In the meantime, I've run it (the My only worries would now be forward compatibility (as expressed before) and the fact that it can/should remain optional.
The OTP team had a meeting about EEP 55 and this pull request, and we came to the following conclusion. We will not approve this pull request for inclusion in OTP 24. OTP 24 will be released in May and before that there will be 3 release candidates with the first one on February 24th. We agree that there is a problem that binding when match was intended, and vice-versa, But we need to investigate this more before OTP 25 in order to make a decision.
The rationale itself is absolutely concocted out of thin air. For anyone who faces such "problems" I suggest writing smaller functions and thinking harder about their code composition. Proposals like "let's do X in our language because I like this in Y", backed by such flimsy arguments, are absolute heresy. And defending arguments like "no context" and so on are weak! For 20+ years I've seen this many times. Any small unnecessary addition to the language itself is the first step towards spoiling the language. Look at C++: it was a mess then and it is a complete mess now! Look at the latest changes to C# (e.g., default implementations for interfaces): a mess is on the way. Erlang as a language is absolutely perfect! It has a small vocabulary and simple usage rules. Don't spoil it! If anyone likes something in Elixir, use it! When in Rome, do as the Romans do! And I am unhappily surprised that such an EEP was registered at all. How could the absolutely reasonable Erlang community do that? I hope this EEP will be rejected and this incident will be over and done with.
@s-ledenev EEPs being registered is not indicative of anything other than them being a proposal for Erlang. There are plenty of EEPs that never led to anything, regardless of their perceived merits. |
I would personally have been saved a headache if I'd had this when refactoring some legacy code and accidentally reused a variable.
Would putting this change behind an |
Stumbled across this pull request and the corresponding EEP. Looks like it can nicely fix the variable shadowing issue in ets:fun2ms: https://erlangforums.com/t/variable-shadowing-constraint-of-match-spec-generated-using-ms-transform/2656
Would it be desirable to constrain this pinning operation so that it is allowed only inside fun heads, to solve this particular variable shadowing issue?
This allows you to annotate already-bound pattern variables as ^X, like in Elixir. This is less error prone when code is being refactored and moved around so that variables previously new in a pattern may become bound, or vice versa, and makes it easier for the reader to see the intent of the code. A new compiler flag 'warn_unpinned_vars' (disabled by default) provides warnings for all uses of already bound variables in patterns that have not been marked with ^.
Note that this branch is a preview, and is based on another branch (#2944) to avoid future complications. Only the last commit pertains to this PR.