forked from dsyme/FSharp.Compiler.Service.HandsOn
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbonus untypedtree.fsx
242 lines (213 loc) · 10.9 KB
/
bonus untypedtree.fsx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
#I "packages/FSharp.Compiler.Service.0.0.73/lib/net45/"
(**
Compiler Services: Processing untyped syntax tree
=================================================
This tutorial demonstrates how to get the untyped abstract syntax tree (AST)
for F# code and how to walk over the tree. This can be used for creating tools
such as code formatter, basic refactoring or code navigation tools. The untyped
syntax tree contains information about the code structure, but does not contain
types and there are some ambiguities that are resolved only later by the type
checker. You can also combine the untyped AST information with the API available
from [editor services](editor.html).
> **NOTE:** The API used below is experimental and subject to change when later versions of the nuget package are published
Getting the untyped AST
-----------------------
To access the untyped AST, you need to create an instance of `InteractiveChecker`.
This type represents a context for type checking and parsing and corresponds either
to a stand-alone F# script file (e.g. opened in Visual Studio) or to a loaded project
file with multiple files. Once you have an instance of `InteractiveChecker`, you can
use it to perform "untyped parse" which is the first step of type-checking. The
second phase is "typed parse" and is used by [editor services](editor.html).
To use the interactive checker, reference `FSharp.Compiler.Service.dll` and open the
`SourceCodeServices` namespace:
*)
#r "FSharp.Compiler.Service.dll"
open System
open Microsoft.FSharp.Compiler.SourceCodeServices
(**
### Performing untyped parse
The untyped parse operation is very fast (compared to type checking, which can
take notable amount of time) and so we can perform it synchronously. First, we
need to create `InteractiveChecker` - the constructor takes an argument that
can be used to notify the checker about file changes (which we ignore).
*)
// Create an interactive checker instance
let checker = FSharpChecker.Create()
(**
To get the AST, we define a function that takes file name and the source code
(the file is only used for location information and does not have to exist).
We first need to get "interactive checker options" which represents the context.
For simple tasks, you can use `GetProjectOptionsFromScriptRoot` which infers
the context for a script file. Then we use the `ParseFileInProject` method and
return the `ParseTree` property:
*)
/// Get untyped tree for a specified input
let getUntypedTree (file, input) =
// Get compiler options for the 'project' implied by a single script file
let projOptions =
checker.GetProjectOptionsFromScript(file, input)
|> Async.RunSynchronously
// Run the first phase (untyped parsing) of the compiler
let parseFileResults =
checker.ParseFileInProject(file, input, projOptions)
|> Async.RunSynchronously
match parseFileResults.ParseTree with
| Some tree -> tree
| None -> failwith "Something went wrong during parsing!"
(**
The documentation for `InteractiveChecker` is currently work in progress, but
you can find some information in the inline comments in the F# source code
repository ([see `service.fsi` source code](https://github.com/fsharp/fsharp/blob/fsharp_31/src/fsharp/vs/service.fsi)).
Walking over the AST
--------------------
The abstract syntax tree is defined as a number of discriminated unions that represent
different syntactical elements (such as expressions, patterns, declarations etc.). The best
way to understand the AST is to look at the definitions in [`ast.fs` in the source
code](https://github.com/fsharp/fsharp/blob/master/src/fsharp/ast.fs#L464).
The relevant parts are in the following namespace:
*)
open Microsoft.FSharp.Compiler.Ast
(**
When processing the AST, you will typically write a number of mutually recursive functions
that pattern match on the different syntactical elements. There is a number of elements
that need to be supported - the top-level element is module or namespace declaration,
containing declarations inside a module (let bindings, types etc.). A let declaration inside
a module then contains expression, which can contain patterns.
### Walking over patterns and expressions
We start by looking at functions that walk over expressions and patterns - as we walk,
we print information about the visited elements. For patterns, the input is of type
`SynPat` and has a number of cases including `Wild` (for `_` pattern), `Named` (for
`<pat> as name`) and `LongIdent` (for a `Foo.Bar` name). Note that the parsed pattern
is occasionally more complex than what is in the source code (in particular, `Named` is
used more often):
*)
/// Walk over a pattern - this is for example used in
/// let <pat> = <expr> or in the 'match' expression
let rec visitPattern = function
| SynPat.Wild(_) ->
printfn " .. underscore pattern"
| SynPat.Named(pat, name, _, _, _) ->
visitPattern pat
printfn " .. named as '%s'" name.idText
| SynPat.LongIdent(LongIdentWithDots(ident, _), _, _, _, _, _) ->
let names = String.concat "." [ for i in ident -> i.idText ]
printfn " .. identifier: %s" names
| pat -> printfn " .. other pattern: %A" pat
(**
The function is recursive (for nested patterns such as `(foo, _) as bar`), but it does not
call any of the functions defined later (because patterns cannot contain other syntactical
elements).
The next function iterates over expressions - this is where most of the work would be and
there are around 20 cases to cover (type `SynExpr.` and you'll get completion with other
options). In the following, we only show how to handle `if .. then ..` and `let .. = ...`:
*)
/// Walk over an expression - if expression contains two or three
/// sub-expressions (two if the 'else' branch is missing), let expression
/// contains pattern and two sub-expressions
let rec visitExpression = function
| SynExpr.IfThenElse(cond, trueBranch, falseBranchOpt, _, _, _, _) ->
// Visit all sub-expressions
printfn "Conditional:"
visitExpression cond
visitExpression trueBranch
falseBranchOpt |> Option.iter visitExpression
| SynExpr.LetOrUse(_, _, bindings, body, _) ->
// Visit bindings (there may be multiple
// for 'let .. = .. and .. = .. in ...'
printfn "LetOrUse with the following bindings:"
for binding in bindings do
let (Binding(access, kind, inlin, mutabl, attrs, xmlDoc,
data, pat, retInfo, init, m, sp)) = binding
visitPattern pat
visitExpression init
// Visit the body expression
printfn "And the following body:"
visitExpression body
| expr -> printfn " - not supported expression: %A" expr
(**
The `visitExpression` function will be called from a function that visits all top-level
declarations inside a module. In this tutorial, we ignore types and members, but that would
be another source of calls to `visitExpression`.
### Walking over declarations
As mentioned earlier, the AST of a file contains a number of module or namespace declarations
(top-level node) that contain declarations inside a module (let bindings or types) or inisde
a namespace (just types). The following functions walks over declarations - we ignore types,
nested modules and all other elements and look only at top-level `let` bindings (values and
functions):
*)
/// Walk over a list of declarations in a module. This is anything
/// that you can write as a top-level inside module (let bindings,
/// nested modules, type declarations etc.)
let visitDeclarations decls =
for declaration in decls do
match declaration with
| SynModuleDecl.Let(isRec, bindings, range) ->
// Let binding as a declaration is similar to let binding
// as an expression (in visitExpression), but has no body
for binding in bindings do
let (Binding(access, kind, inlin, mutabl, attrs, xmlDoc,
data, pat, retInfo, body, m, sp)) = binding
visitPattern pat
visitExpression body
| _ -> printfn " - not supported declaration: %A" declaration
(**
The `visitDeclarations` function will be called from a function that walks over a
sequence of module or namespace declarations. This corresponds, for example, to a file
with multiple `namespace Foo` declarations:
*)
/// Walk over all module or namespace declarations
/// (basically 'module Foo =' or 'namespace Foo.Bar')
/// Note that there is one implicitly, even if the file
/// does not explicitly define it..
let visitModulesAndNamespaces modulesOrNss =
for moduleOrNs in modulesOrNss do
let (SynModuleOrNamespace(lid, isMod, decls, xml, attrs, _, m)) = moduleOrNs
printfn "Namespace or module: %A" lid
visitDeclarations decls
(**
Now that we have functions that walk over the elements of the AST (starting from declaration,
down to expressions and patterns), we can get AST of a sample input and run the above function.
Putting things together
-----------------------
As already discussed, the `getUntypedTree` function uses `InteractiveChecker` to run the first
phase (parsing) on the AST and get back the tree. The function requires F# source code together
with location of the file. The location does not have to exist (it is used only for location
information) and it can be in both Unix and Windows formats:
*)
// Sample input for the compiler service
let input = """
let foo() =
let msg = "Hello world"
if true then
printfn "%s" msg """
// File name in Unix format
let file = "/home/user/Test.fsx"
// Get the AST of sample F# code
let tree = getUntypedTree(file, input)
(**
When you run the code in F# interactive, you can enter `tree;;` in the interactive console and
see pretty printed representation of the data structure - the tree contains a lot of information,
so this is not particularly readable, but it gives you good idea about how the tree looks.
The returned `tree` value is again a discriminated union that can be two different cases - one case
is `ParsedInput.SigFile` which represents F# signature file (`*.fsi`) and the other one is
`ParsedInput.ImplFile` representing regular source code (`*.fsx` or `*.fs`). The implementation
file contains a sequence of modules or namespaces that we can pass to the function implemented
in the previous step:
*)
// Extract implementation file details
match tree with
| ParsedInput.ImplFile(implFile) ->
// Extract declarations and walk over them
let (ParsedImplFileInput(fn, script, name, _, _, modules, _)) = implFile
visitModulesAndNamespaces modules
| _ -> failwith "F# Interface file (*.fsi) not supported."
(**
Summary
-------
In this tutorial, we looked at basic of working with the untyped abstract syntax tree. This is a
comprehensive topic, so it is not possible to explain everything in a single article. The
[Fantomas project](https://github.com/dungpa/fantomas) is a good example of tool based on the untyped
AST that can help you understand more. In practice, it is also useful to combine the information here
with some information you can obtain from the [editor services](editor.html) discussed in the next
tutorial.
*)