Commit 11b93a4

Authored by Jonathan Turner
Merge pull request #4 from LhKipp/typeDeduction
Type deduction RFC
2 parents 8c228dc + ad3c2cb commit 11b93a4

1 file changed (+201 -0 lines changed)

text/0004-type-deduction.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
- Feature Name: type_deduction
- Start Date: 2020-10-13
- RFC PR: [nushell/rfcs#0004](https://github.com/nushell/rfcs/pull/4)
- Nushell Issue: [nushell/nushell#0000](https://github.com/nushell/nushell/issues/0000)

# Summary

[summary]: #summary

This RFC explores how type deduction can be implemented for nushell. Type deduction is helpful for, e.g., aliases with typed variables and static analysis.

# Motivation

[motivation]: #motivation

Currently, no module dedicated to type deduction is implemented. Groundwork for such a module has been laid out in `alias.rs`. However, the code in `alias.rs` does not handle all cases.

# Guide-level explanation

[guide-level-explanation]: #guide-level-explanation

Given a set of variables and a block of code, the purpose of the type deduction module (TDM) is to infer all possible types each variable can have such that the block of code is still valid. In nushell, a "type" is represented by a SyntaxShape variant.

The TDM infers types by looking at the kind of expression in which a variable is used (hereafter: the *containing expression*). It then checks which expressions (hereafter: *sub-expressions*) are allowed inside the containing expression at the position of the variable, and maps them to their corresponding SyntaxShape variants.

In most cases, the nushell grammar is sufficient to determine the allowed sub-expressions. For example, in `echo 1..$var`, `$var` is of type Int, as only Int is allowed in a range expression.

Special cases are listed below.
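
The two-step lookup described above (allowed sub-expressions first, then their shapes) might be sketched as follows. This is a minimal illustration, not nushell's implementation: `Shape` is a hypothetical, reduced stand-in for SyntaxShape, and `Containing`, `SubExpr`, `allowed_sub_exprs`, `shape_of` and `deduce` are hypothetical names for a toy grammar that only knows range bounds and list elements.

```rust
/// Hypothetical, reduced stand-in for nushell's `SyntaxShape`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Shape {
    Int,
    Number,
    String,
}

/// Hypothetical kinds of sub-expressions a toy grammar can produce.
#[derive(Clone, Copy, Debug)]
enum SubExpr {
    IntLiteral,
    DecimalLiteral,
    StringLiteral,
}

/// Hypothetical containing expressions (with the variable's position).
#[derive(Clone, Copy, Debug)]
enum Containing {
    /// Either bound of `from..to`.
    RangeBound,
    /// An element of a list literal; this toy grammar allows any literal.
    ListElement,
}

/// Step 1: which sub-expressions does the grammar allow at this position?
fn allowed_sub_exprs(containing: Containing) -> Vec<SubExpr> {
    match containing {
        Containing::RangeBound => vec![SubExpr::IntLiteral],
        Containing::ListElement => vec![
            SubExpr::IntLiteral,
            SubExpr::DecimalLiteral,
            SubExpr::StringLiteral,
        ],
    }
}

/// Step 2: map each allowed sub-expression to its shape.
fn shape_of(sub: SubExpr) -> Shape {
    match sub {
        SubExpr::IntLiteral => Shape::Int,
        SubExpr::DecimalLiteral => Shape::Number,
        SubExpr::StringLiteral => Shape::String,
    }
}

/// All shapes a variable may have at the given position (sorted, no duplicates).
fn deduce(containing: Containing) -> Vec<Shape> {
    let mut shapes: Vec<Shape> = allowed_sub_exprs(containing)
        .into_iter()
        .map(shape_of)
        .collect();
    shapes.sort();
    shapes.dedup();
    shapes
}

fn main() {
    // `echo 1..$var`: $var is a range bound, so only Int is possible.
    println!("{:?}", deduce(Containing::RangeBound)); // [Int]
}
```

Splitting the grammar lookup from the shape mapping keeps the grammar knowledge in one place; the special cases below are the positions where this simple lookup is not enough.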

## As a command's mandatory positional argument | As an argument to a flag
```shell Example positional argument
ls | where $filter
$filter -> SyntaxShape::Math
```
```shell Example named argument
cal --full-year $year
$year -> SyntaxShape::Int
```

The shape of a variable used as a positional argument or as the argument to a flag can be inferred from the command's signature. If the command's signature is not available, no inference is done. At the time of writing, no signatures are available for external commands.
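
Looking up a variable's shape in a command signature might look like the following sketch. `Signature`, `ArgPosition` and `infer_from_signature` are hypothetical names, and the signatures of `where` and `cal` are reduced to the single parameter each example uses; nushell's real signature type is richer. A missing signature (as for external commands) simply yields no deduction.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Shape {
    Int,
    Math,
}

/// Hypothetical, reduced command signature: shapes of the positional
/// parameters (in order) and of the flags' arguments.
struct Signature {
    positional: Vec<Shape>,
    named: HashMap<&'static str, Shape>,
}

/// Where a variable appears in a command call.
enum ArgPosition<'a> {
    Positional(usize),
    Flag(&'a str),
}

/// Shape demanded by the signature, or `None` if there is no signature
/// (e.g. an external command) or no matching parameter.
fn infer_from_signature(sig: Option<&Signature>, pos: ArgPosition) -> Option<Shape> {
    let sig = sig?;
    match pos {
        ArgPosition::Positional(index) => sig.positional.get(index).copied(),
        ArgPosition::Flag(name) => sig.named.get(name).copied(),
    }
}

fn main() {
    // Only the parameters used in the examples are modelled.
    let where_sig = Signature {
        positional: vec![Shape::Math],
        named: HashMap::new(),
    };
    let cal_sig = Signature {
        positional: vec![],
        named: HashMap::from([("full-year", Shape::Int)]),
    };

    // ls | where $filter
    println!("{:?}", infer_from_signature(Some(&where_sig), ArgPosition::Positional(0)));
    // cal --full-year $year
    println!("{:?}", infer_from_signature(Some(&cal_sig), ArgPosition::Flag("full-year")));
    // An external command: no signature, so no inference.
    println!("{:?}", infer_from_signature(None, ArgPosition::Positional(0)));
}
```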

## As a command's optional positional argument
Note: At the time of writing this RFC, optional positional arguments are parsed in order, meaning that no optional positional argument can be left out. Trickier deductions might be needed if the parser handled optional arguments differently.
```shell One optional variable
git log signature: git log [<options>] [<revision range>] [[--] <path>...]
git log $arg ./src/
```
One can infer that $arg has to be a revision range, as there is no other possibility.

```shell Multiple variables
cmd signature: cmd [<FilePath>] [<FileSize>] [<Block>] [<Int>...]
cmd $a1 $a2
```
Since optional positional arguments are parsed in order, one can infer that $a1 is a FilePath and $a2 is a FileSize.
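
Under the in-order parsing described in the note, the multiple-variable case reduces to walking the call's arguments alongside the signature's optional slots. The sketch below assumes exactly that, plus a hypothetical `OptionalSlot` type whose `rest` flag marks a trailing `[<Int>...]`-style parameter that absorbs all remaining arguments; arguments beyond the last slot get no deduction. `infer_optional` is likewise a hypothetical name.

```rust
/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Shape {
    FilePath,
    FileSize,
    Block,
    Int,
}

/// A hypothetical optional positional slot; `rest` marks a trailing
/// `[<Shape>...]` parameter that absorbs all remaining arguments.
#[derive(Clone, Copy)]
struct OptionalSlot {
    shape: Shape,
    rest: bool,
}

/// Assigns a shape to each of `arg_count` optional arguments, assuming they
/// are parsed strictly in order and none can be left out.
fn infer_optional(slots: &[OptionalSlot], arg_count: usize) -> Vec<Option<Shape>> {
    let mut result = Vec::with_capacity(arg_count);
    let mut slot_index = 0;
    for _ in 0..arg_count {
        match slots.get(slot_index) {
            Some(slot) => {
                result.push(Some(slot.shape));
                // A rest parameter keeps matching; any other slot is used up.
                if !slot.rest {
                    slot_index += 1;
                }
            }
            // More arguments than slots: nothing can be deduced.
            None => result.push(None),
        }
    }
    result
}

fn main() {
    // cmd signature: cmd [<FilePath>] [<FileSize>] [<Block>] [<Int>...]
    let cmd = [
        OptionalSlot { shape: Shape::FilePath, rest: false },
        OptionalSlot { shape: Shape::FileSize, rest: false },
        OptionalSlot { shape: Shape::Block, rest: false },
        OptionalSlot { shape: Shape::Int, rest: true },
    ];
    // cmd $a1 $a2
    println!("{:?}", infer_optional(&cmd, 2)); // [Some(FilePath), Some(FileSize)]
}
```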

## As part of a column in a table
Example
```shell
echo [[names, $ranking_col_name]; [$best_shell, 1] [fish, 2] [zsh, $zsh_ranking] [bash, 4]]
$ranking_col_name -> SyntaxShape::String
$best_shell -> SyntaxShape::Any
$zsh_ranking -> SyntaxShape::Any
```
The values within a column may have heterogeneous types, so the type of a variable in a table cell cannot be deduced (it stays SyntaxShape::Any). For variables in table headers, String seems to be the most applicable SyntaxShape.
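
A sketch of the table rule, assuming a hypothetical `TableLiteral` representation in which each cell is either a variable or some literal: header variables get String and cell variables get Any. A variable occurring in several places would have to be merged as described in "Merging of deductions"; this sketch simply keeps the last deduction it sees. `Cell` and `deduce_table` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Shape {
    String,
    Any,
}

/// A cell of a table literal: a variable, or some literal whose value is
/// irrelevant here.
enum Cell {
    Var(&'static str),
    Literal,
}

/// Hypothetical representation of `[[h1, h2]; [c11, c12] ...]`.
struct TableLiteral {
    headers: Vec<Cell>,
    rows: Vec<Vec<Cell>>,
}

/// Header variables must be Strings; cell variables can be anything,
/// since a column may hold values of different types.
fn deduce_table(table: &TableLiteral) -> BTreeMap<&'static str, Shape> {
    let mut deduced = BTreeMap::new();
    for cell in &table.headers {
        if let Cell::Var(name) = cell {
            deduced.insert(*name, Shape::String);
        }
    }
    for cell in table.rows.iter().flatten() {
        if let Cell::Var(name) = cell {
            deduced.insert(*name, Shape::Any);
        }
    }
    deduced
}

fn main() {
    // echo [[names, $ranking_col_name]; [$best_shell, 1] [fish, 2] [zsh, $zsh_ranking] [bash, 4]]
    let table = TableLiteral {
        headers: vec![Cell::Literal, Cell::Var("ranking_col_name")],
        rows: vec![
            vec![Cell::Var("best_shell"), Cell::Literal],
            vec![Cell::Literal, Cell::Literal],
            vec![Cell::Literal, Cell::Var("zsh_ranking")],
            vec![Cell::Literal, Cell::Literal],
        ],
    };
    println!("{:?}", deduce_table(&table));
}
```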

## As part of a binary expression
Deduction depends on the operator in use and, for some operators, additionally on the side of the expression on which the variable appears.

### Operator && || (Logical Operators)
```shell Example
ls | where $it.name == LICENSE || $catch_all
$catch_all -> SyntaxShape::Boolean
```
The variable can be of any type that can be coerced to a boolean value.

### Operator In NotIn
#### Variable on right side
```shell Example
ls | where name in $values
$values -> SyntaxShape::Table
```
#### Variable on left side
```shell Example
ls | where $value in [...]
$value -> All SyntaxShapes present in the Table ([...])
```
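
The two In/NotIn cases can be sketched as below. `Position` and `deduce_in` are hypothetical names, and for the left-hand case the sketch assumes the shapes of the table's elements are already known. If the right-hand side were itself a variable, the left-hand case would turn into a dependency in the sense discussed under "Correct Inference for dependencies".

```rust
/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Shape {
    Int,
    String,
    Table,
}

/// Where the variable sits in an `In` / `NotIn` expression.
enum Position<'a> {
    /// `... in $values`
    Rhs,
    /// `$value in [...]`, given the shapes of the table's elements.
    Lhs { rhs_elements: &'a [Shape] },
}

fn deduce_in(position: Position) -> Vec<Shape> {
    match position {
        Position::Rhs => vec![Shape::Table],
        Position::Lhs { rhs_elements } => {
            // Every shape present in the table is a candidate.
            let mut shapes = rhs_elements.to_vec();
            shapes.sort();
            shapes.dedup();
            shapes
        }
    }
}

fn main() {
    // ls | where name in $values
    println!("{:?}", deduce_in(Position::Rhs)); // [Table]
    // $value in a hypothetical table holding two strings and an int
    let elements = [Shape::String, Shape::String, Shape::Int];
    println!("{:?}", deduce_in(Position::Lhs { rhs_elements: &elements })); // [Int, String]
}
```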

### Operator Plus Minus
```shell Example 1
ls | where size < 1kb + $offset
$offset -> SyntaxShape::Unit
```
```shell Example 2
ls | where 1.5 < 1 + $var
$var -> SyntaxShape::Number, SyntaxShape::Int
```
The shape of the variable is Unit if the other side of the binary expression is a Unit; otherwise, it is Number or Int.
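
The Plus/Minus rule is small enough to state directly in code. In this sketch, `Shape` is a hypothetical stand-in for the relevant SyntaxShape variants and `deduce_plus_minus` is a hypothetical name; `other` is the already known shape of the opposite operand. If that operand were a variable too, the rule would instead be applied during dependency resolution.

```rust
/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Shape {
    Int,
    Number,
    Unit,
}

/// Candidate shapes for a variable in `other + $var` or `other - $var`,
/// given the known shape of the other operand (the side does not matter).
fn deduce_plus_minus(other: Shape) -> Vec<Shape> {
    match other {
        // A Unit on one side requires a Unit on the other.
        Shape::Unit => vec![Shape::Unit],
        // Otherwise the variable is a plain number.
        Shape::Int | Shape::Number => vec![Shape::Number, Shape::Int],
    }
}

fn main() {
    // ls | where size < 1kb + $offset
    println!("{:?}", deduce_plus_minus(Shape::Unit)); // [Unit]
    // ls | where 1.5 < 1 + $var
    println!("{:?}", deduce_plus_minus(Shape::Int)); // [Number, Int]
}
```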

### Operator Multiply Divide
```shell Example
ls | where size < 1.5 * $size
$size -> SyntaxShape::Unit
```
In general, the variable can be an Int, a Number, or a Unit.

The variable cannot be a Unit if that would give the result expression an undefined unit type (e.g. Unit * Unit and Int / Unit both yield an undefined unit type for every unit nu currently implements).

The variable must be of Unit type if the result of the binary expression is used as a Unit and the other (non-variable) side of the expression is an Int or a Number (see the example above).
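
One way to encode the Multiply/Divide rules is to try each candidate shape and keep only those that yield a defined result of the required kind. The sketch below does this. All names are hypothetical, and it assumes that any division by a Unit is undefined, which generalizes the Int / Unit case named above. `need_unit` records whether the result is used as a Unit (for example, when compared with `size`).

```rust
/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Shape {
    Int,
    Number,
    Unit,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Multiply,
    Divide,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Side {
    Left,
    Right,
}

/// `Some(true)` if `lhs op rhs` is a Unit, `Some(false)` if it is a plain
/// number, `None` if the resulting unit type is undefined.
fn yields_unit(lhs: Shape, op: Op, rhs: Shape) -> Option<bool> {
    match (op, lhs, rhs) {
        (Op::Multiply, Shape::Unit, Shape::Unit) => None,
        (Op::Multiply, Shape::Unit, _) | (Op::Multiply, _, Shape::Unit) => Some(true),
        // Assumption: division by any Unit is undefined (generalizes Int / Unit).
        (Op::Divide, _, Shape::Unit) => None,
        (Op::Divide, Shape::Unit, _) => Some(true),
        _ => Some(false),
    }
}

/// Candidate shapes for a variable on `var_side` of a Multiply/Divide whose
/// other operand has shape `other`. `need_unit` is `Some(true)` if the result
/// is used as a Unit, `Some(false)` if it is used as a plain number, and
/// `None` if its use is unknown.
fn deduce_mul_div(op: Op, var_side: Side, other: Shape, need_unit: Option<bool>) -> Vec<Shape> {
    [Shape::Int, Shape::Number, Shape::Unit]
        .iter()
        .copied()
        .filter(move |&candidate| {
            let (lhs, rhs) = match var_side {
                Side::Left => (candidate, other),
                Side::Right => (other, candidate),
            };
            match yields_unit(lhs, op, rhs) {
                None => false, // undefined unit type: reject the candidate
                Some(is_unit) => need_unit.map_or(true, |need| need == is_unit),
            }
        })
        .collect()
}

fn main() {
    // ls | where size < 1.5 * $size
    println!("{:?}", deduce_mul_div(Op::Multiply, Side::Right, Shape::Number, Some(true))); // [Unit]
    // ls | where size < 1kb * $a
    println!("{:?}", deduce_mul_div(Op::Multiply, Side::Right, Shape::Unit, Some(true))); // [Int, Number]
}
```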

### Operator Contains NotContains
The variable must be of type String.

## Correct Inference for dependencies
Extra care has to be taken when deducing types in binary expressions whose operator makes the variable's type depend on the type of the other side: Plus, Minus, Multiply, Divide, and In / NotIn with the variable on the left-hand side.

```shell Variables on both sides
ls | where size < $a * $b; kill $a
$a -> SyntaxShape::Int
$b -> SyntaxShape::Unit
```
```shell Variable on one side, math expression with variables on other
ls | where size < $a * ($b * $c); kill $b $c
$a -> SyntaxShape::Unit
$b -> SyntaxShape::Int
$c -> SyntaxShape::Int
```

While the AST is being traversed, the deduction for a variable may not be available yet, and even if it is, it may not be complete. Therefore, constellations in which the deduction of one variable depends on the deduction of another (we will call these constellations 'dependencies') have to be postponed.

Once the AST has been traversed, the dependencies have to be resolved.

Please note: One might think of a dependency as an edge in a directed graph, with variables as nodes. Every node with no outgoing edge is completely deduced. Every node with an edge towards such a completely deduced node can then be inferred. Nodes on a cycle (e.g. in `ls | where size < $a * $b`, $a depends on $b and $b depends on $a) can't be deduced completely.

It is an open question what is best to return in this case, depending on the operator in use. One might solve this elegantly by returning the dependency and letting the caller handle it (if needed).

(Note: One might also think of this not as a graph problem, but as a constraint satisfaction problem (CSP).)
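
The CSP view can be sketched as repeated propagation along the dependency edges: whenever the variable a rule depends on has a deduction, the rule's result is intersected with what is already known, until nothing changes any more (similar in spirit to arc-consistency propagation). This is a sketch, not the RFC's prescribed algorithm. `Dependency`, `factor_of_unit_product` and `resolve` are hypothetical names, and only the Multiply rule for `size < $x * $y` is modelled. With `kill $a` supplying Int for $a, $b resolves to Unit; without it, both variables sit on a cycle and stay unresolved.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the relevant `SyntaxShape` variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Shape {
    Int,
    Number,
    Unit,
}

type Shapes = BTreeSet<Shape>;

/// A postponed deduction, i.e. the edge `var -> depends_on`.
struct Dependency {
    var: &'static str,
    depends_on: &'static str,
    /// Computes the shapes allowed for `var` from those of `depends_on`.
    rule: fn(&Shapes) -> Shapes,
}

/// Rule for `size < $x * $y`: the product must be a Unit, and Unit * Unit is
/// undefined, so exactly one factor is a Unit.
fn factor_of_unit_product(other: &Shapes) -> Shapes {
    let mut allowed = Shapes::new();
    for shape in other {
        match shape {
            Shape::Unit => {
                allowed.insert(Shape::Int);
                allowed.insert(Shape::Number);
            }
            Shape::Int | Shape::Number => {
                allowed.insert(Shape::Unit);
            }
        }
    }
    allowed
}

/// Propagates deductions along the edges until a fixpoint is reached, merging
/// by set intersection (an empty set would signal incompatible usages).
/// Returns the variables that never received any deduction.
fn resolve(deduced: &mut BTreeMap<&'static str, Shapes>, deps: &[Dependency]) -> Vec<&'static str> {
    loop {
        let mut changed = false;
        for dep in deps {
            if let Some(target) = deduced.get(dep.depends_on).cloned() {
                let derived = (dep.rule)(&target);
                let merged: Shapes = match deduced.get(dep.var) {
                    Some(existing) => existing.intersection(&derived).copied().collect(),
                    None => derived,
                };
                if deduced.get(dep.var) != Some(&merged) {
                    deduced.insert(dep.var, merged);
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
    }
    let mut unresolved: Vec<&'static str> = deps
        .iter()
        .map(|dep| dep.var)
        .filter(|var| !deduced.contains_key(var))
        .collect();
    unresolved.sort();
    unresolved.dedup();
    unresolved
}

fn main() {
    let deps = [
        Dependency { var: "a", depends_on: "b", rule: factor_of_unit_product },
        Dependency { var: "b", depends_on: "a", rule: factor_of_unit_product },
    ];

    // ls | where size < $a * $b; kill $a   (`kill` contributes Int for $a)
    let mut deduced = BTreeMap::from([("a", Shapes::from([Shape::Int]))]);
    let unresolved = resolve(&mut deduced, &deps);
    println!("{:?}, unresolved: {:?}", deduced, unresolved);

    // ls | where size < $a * $b   (a cycle without outside information)
    let mut deduced = BTreeMap::new();
    let unresolved = resolve(&mut deduced, &deps);
    println!("{:?}, unresolved: {:?}", deduced, unresolved);
}
```

Since sets only shrink once a variable has a deduction, the loop is guaranteed to terminate; the returned list is one possible answer to the open question of what to hand back to the caller.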

## Merging of deductions
A variable can occur multiple times within the block, and in each position it can have a different set of possible types.
```shell Example
config | where size < 1kb * $a; kill $a
$a -> SyntaxShape::Int
```
In the first position, $a might be SyntaxShape::Int or SyntaxShape::Number. In the second, it is SyntaxShape::Int.

The resulting set of possible types for a variable can be computed as the intersection of the already deduced types and the newly deduced types. An empty intersection means that the variable has usages with different, incompatible types, and an error is returned.

```rust
fn checked_insert(existing_deductions, new_deductions) -> Result<set_of_deductions, ShellError>{
    if existing_deductions == None return new_deductions
    else return set_intersection(existing_deductions, new_deductions)?
}

existing_deductions = checked_insert(existing_deductions, new_deductions);
```

Special cases arise if one (or both) of the sets contain SyntaxShape::Any, a placeholder for any possible type. The above pseudo code then has to be changed to:
```rust
fn checked_insert(existing_deductions, new_deductions) -> Result<set_of_deductions, ShellError>{
    if existing_deductions == None return new_deductions
    else match (has_any(existing_deductions), has_any(new_deductions)){
        (true, true) -> set_union_including_any(existing_deductions, new_deductions)
        (true, false) -> set_union_excluding_any(new_deductions, set_intersection(existing_deductions, new_deductions))
        (false, true) -> set_union_excluding_any(existing_deductions, set_intersection(existing_deductions, new_deductions))
        (false, false) -> set_intersection(existing_deductions, new_deductions)?
    }
}

existing_deductions = checked_insert(existing_deductions, new_deductions);
```
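
A runnable version of the SyntaxShape::Any-aware pseudo code might look as follows. It is a sketch under stated assumptions: `Shape` is a hypothetical stand-in for SyntaxShape, sets are `BTreeSet`s, and the error is a plain `String` instead of nushell's `ShellError`. The `(false, false)` arms cover the plain intersection of the first pseudo code block, including the error on an empty result.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for `SyntaxShape`; `Any` means "any possible type".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Shape {
    Any,
    Int,
    Number,
    String,
}

type Shapes = BTreeSet<Shape>;

/// Union of `a` and `b`, with `Shape::Any` removed.
fn union_excluding_any(a: &Shapes, b: &Shapes) -> Shapes {
    a.union(b).copied().filter(|shape| *shape != Shape::Any).collect()
}

/// Merges newly deduced shapes into a variable's existing deduction.
fn checked_insert(existing: Option<&Shapes>, new: &Shapes) -> Result<Shapes, String> {
    let existing = match existing {
        Some(existing) => existing,
        None => return Ok(new.clone()),
    };
    let common: Shapes = existing.intersection(new).copied().collect();
    match (existing.contains(&Shape::Any), new.contains(&Shape::Any)) {
        (true, true) => Ok(existing.union(new).copied().collect()),
        // "Anything" narrowed by the new usage.
        (true, false) => Ok(union_excluding_any(new, &common)),
        // The new usage allows anything, so the existing deduction stands.
        (false, true) => Ok(union_excluding_any(existing, &common)),
        (false, false) if common.is_empty() => Err(format!(
            "incompatible usages: {:?} vs {:?}",
            existing, new
        )),
        (false, false) => Ok(common),
    }
}

fn main() {
    // config | where size < 1kb * $a; kill $a
    let first = Shapes::from([Shape::Int, Shape::Number]);
    let second = Shapes::from([Shape::Int]);
    let merged = checked_insert(None, &first).and_then(|a| checked_insert(Some(&a), &second));
    println!("{:?}", merged); // Ok({Int})

    // A String usage and an Int usage cannot be reconciled.
    println!("{:?}", checked_insert(Some(&Shapes::from([Shape::String])), &second));
}
```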

# Reference-level explanation

[reference-level-explanation]: #reference-level-explanation

A starting point may be https://github.com/nushell/nushell/pull/2685.

# Drawbacks

[drawbacks]: #drawbacks

None I can think of.

# Rationale and alternatives

[rationale-and-alternatives]: #rationale-and-alternatives

# Prior art

[prior-art]: #prior-art

# Unresolved questions

[unresolved-questions]: #unresolved-questions

## Deducing the result type of column paths
Currently, it is not possible to infer the shape of column paths pointing into a table produced by a prior command in the pipeline. The input and output types of each command would be needed to support this deduction.
```shell Neither $size nor $name is deducible
ls | where size == $size | get name | where $it == $name
```

It is not yet clear how best to integrate this into nushell. Some thoughts can be found at https://github.com/nushell/nushell/pull/2486#issuecomment-687704131.

# Future possibilities

[future-possibilities]: #future-possibilities
