
Commit fe574a3

add sgf-parsing exercise (#136)
1 parent e9ea7f5 commit fe574a3


15 files changed (+770 -0 lines changed)


config.json

Lines changed: 8 additions & 0 deletions

```diff
@@ -698,6 +698,14 @@
       "prerequisites": [],
       "difficulty": 9
     },
+    {
+      "slug": "sgf-parsing",
+      "name": "SGF Parsing",
+      "uuid": "917f0610-27c0-4b94-9b2e-affbd143ea1f",
+      "practices": [],
+      "prerequisites": [],
+      "difficulty": 9
+    },
     {
       "slug": "zebra-puzzle",
       "name": "Zebra Puzzle",
```

Lines changed: 83 additions & 0 deletions

# Instructions

Parsing a Smart Game Format string.

[SGF][sgf] is a standard format for storing board game files, in particular Go.

SGF is a fairly simple format. An SGF file usually contains a single
tree of nodes where each node is a property list. The property list
contains key-value pairs; each key can occur only once but may have
multiple values.

The exercise will have you parse an SGF string and return a tree structure of properties.

An SGF file may look like this:

```text
(;FF[4]C[root]SZ[19];B[aa];W[ab])
```

This is a tree with three nodes:

- The top-level node has three properties: FF\[4\] (key = "FF", value = "4"),
  C\[root\] (key = "C", value = "root") and SZ\[19\] (key = "SZ", value = "19").
  (FF indicates the version of SGF, C is a comment and SZ is the size of the board.)
- The top-level node has a single child which has a single property:
  B\[aa\]. (Black plays on the point encoded as "aa", which is the
  1-1 point.)
- The B\[aa\] node has a single child which has a single property:
  W\[ab\].
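
As a sketch of how this nesting maps onto the `SgfTree` structure from the exercise stub, the example file above could be written out by hand as the Lean value below. The structure is a standalone copy, and `props` and `exampleTree` are hypothetical names used only for illustration; they are not part of the exercise files.

```lean
import Std

-- Standalone copy of the `SgfTree` structure from the exercise stub.
structure SgfTree where
  properties : Std.TreeMap String (Array String)
  children : Array SgfTree
  deriving Repr

-- Hypothetical helper: build a property map from (key, values) pairs.
def props : List (String × Array String) → Std.TreeMap String (Array String)
  | [] => {}
  | (k, v) :: rest => (props rest).insert k v

-- `(;FF[4]C[root]SZ[19];B[aa];W[ab])`: every node after the first is the
-- single child of the node before it.
def exampleTree : SgfTree :=
  { properties := props [("FF", #["4"]), ("C", #["root"]), ("SZ", #["19"])],
    children := #[
      { properties := props [("B", #["aa"])],
        children := #[
          { properties := props [("W", #["ab"])], children := #[] }] }] }
```

Here `exampleTree.children` holds exactly one node (`B[aa]`), which in turn holds the `W[ab]` node as its only child.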

As you can imagine, an SGF file contains a lot of nodes with a single
child, which is why there's a shorthand for it: consecutive nodes such as
`;B[aa];W[ab]` can share one pair of parentheses, each node being the
single child of the node before it.

SGF can encode variations of play. Go players do a lot of backtracking
in their reviews (let's try this, doesn't work, let's try that) and SGF
supports variations of play sequences. For example:

```text
(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))
```

Here the root node has two variations. The first (which by convention
indicates what's actually played) is where Black plays on 1-1. Black was
sent this file by his teacher, who pointed out a more sensible play in
the second child of the root node: `B[dd]` (4-4 point, a very standard
opening to take the corner).
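
The convention that the first variation is the line actually played can be illustrated by walking a tree and always following the first child. This is a minimal sketch assuming a standalone copy of the stub's `SgfTree`; `node`, `reviewed` and `mainLine` are hypothetical helpers, and `mainLine` is declared `partial` to keep the sketch short rather than proving termination.

```lean
import Std

-- Standalone copy of the `SgfTree` structure from the exercise stub.
structure SgfTree where
  properties : Std.TreeMap String (Array String)
  children : Array SgfTree
  deriving Repr

-- Hypothetical helper: a node with a single property and some children.
def node (key : String) (vals : Array String) (children : Array SgfTree := #[]) : SgfTree :=
  { properties := ({} : Std.TreeMap String (Array String)).insert key vals,
    children := children }

-- `(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))`: the root has two variations.
def reviewed : SgfTree :=
  node "FF" #["4"] #[
    node "B" #["aa"] #[node "W" #["ab"]],
    node "B" #["dd"] #[node "W" #["ee"]]]

-- Follow the first child at every step, i.e. the line actually played.
partial def mainLine (t : SgfTree) : List SgfTree :=
  match t.children[0]? with
  | some c => t :: mainLine c
  | none => [t]
```

`mainLine reviewed` visits the root, `B[aa]` and `W[ab]`, skipping the teacher's `B[dd]` variation.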

A key can have multiple values associated with it. For example:

```text
(;FF[4];AB[aa][ab][ba])
```

Here `AB` (add black) is used to add three black stones to the board.
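
This is why the stub's `SgfTree` maps each key to an `Array String` rather than a single `String`. A minimal hand-built sketch of the expected tree for the input above, using a standalone copy of the structure (`addBlack` is a hypothetical name):

```lean
import Std

-- Standalone copy of the `SgfTree` structure from the exercise stub.
structure SgfTree where
  properties : Std.TreeMap String (Array String)
  children : Array SgfTree
  deriving Repr

-- `(;FF[4];AB[aa][ab][ba])`: one `AB` key holding three values, stored
-- in the single child of the root node.
def addBlack : SgfTree :=
  { properties := ({} : Std.TreeMap String (Array String)).insert "FF" #["4"],
    children := #[
      { properties := ({} : Std.TreeMap String (Array String)).insert "AB" #["aa", "ab", "ba"],
        children := #[] }] }
```

The three coordinates end up under a single `AB` key, in the order in which they appear.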

All property values will be the [SGF Text type][sgf-text].
You don't need to implement any other value type.
Although you can read the [full documentation of the Text type][sgf-text], a summary of the important points is below:

- Newlines are removed if they come immediately after a `\`; otherwise they remain as newlines.
- All whitespace characters other than newline are converted to spaces.
- `\` is the escape character.
  Any non-whitespace character after `\` is inserted as-is.
  Any whitespace character after `\` follows the above rules.
  Note that SGF does **not** have escape sequences for whitespace characters such as `\t` or `\n`.
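
The rules above can be sketched as a Lean function over the raw characters of one property value. `toSpace` and `normalizeText` are hypothetical helpers, not part of the exercise stub; the sketch assumes the value has already been cut out from between its `[` and `]` (so an escaped `\]` is still present as `\` followed by `]`), and it drops a lone trailing `\`, a case the summary above does not cover.

```lean
-- Non-newline whitespace becomes a space; other characters are unchanged.
def toSpace (c : Char) : Char :=
  if c.isWhitespace && c != '\n' then ' ' else c

-- Hypothetical helper: apply the SGF Text rules to a raw property value.
def normalizeText : List Char → String
  -- `\` followed by a newline: both characters are removed.
  | '\\' :: '\n' :: rest => normalizeText rest
  -- `\` followed by any other character: the backslash is dropped and the
  -- character is kept, subject to the whitespace rule.
  | '\\' :: c :: rest => (toSpace c).toString ++ normalizeText rest
  -- A lone trailing `\` is dropped (an assumption of this sketch).
  | ['\\'] => ""
  -- Unescaped characters: only the whitespace rule applies.
  | c :: rest => (toSpace c).toString ++ normalizeText rest
  | [] => ""
```

For example, `normalizeText "a\\\nb".toList` (the characters `a`, `\`, newline, `b`) returns `"ab"`, and `normalizeText "\\t".toList` returns `"t"`, because SGF has no `\t` escape.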

Be careful not to get confused between:

- The string as it is represented in a string literal in the tests
- The string that is passed to the SGF parser

Escape sequences in the string literals may have already been processed by the programming language's parser before they are passed to the SGF parser.
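
In Lean, for instance, a literal in a test file goes through Lean's own escape processing before the SGF parser sees it. A small illustration (the definition names are hypothetical):

```lean
-- The Lean literal below is six characters: `[`, `a`, `\`, `]`, `b`, `]`.
-- Those six characters are what an SGF parser receives; as SGF, the
-- value inside the brackets is `a]b`.
def escapedBracket : String := "[a\\]b]"

-- "\t" is one tab character; "\\t" is a backslash followed by `t`,
-- which SGF reads as a plain `t` because it has no `\t` escape.
def tabLiteral : String := "\t"
def backslashT : String := "\\t"

#eval escapedBracket.length  -- 6
#eval tabLiteral.length      -- 1
#eval backslashT.length      -- 2
```

So to exercise the SGF escape `\]`, a Lean test has to write `\\]` inside its string literal.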

There are a few more complexities to SGF (and parsing in general), which
you can mostly ignore. You should assume that the input is encoded in
UTF-8; the tests won't contain a charset property, so don't worry about
that. Furthermore, you may assume that all newlines are Unix-style (`\n`;
no `\r` or `\r\n` will be in the tests) and that the tests contain no
optional whitespace between properties, nodes, etc.

[sgf]: https://en.wikipedia.org/wiki/Smart_Game_Format
[sgf-text]: https://www.red-bean.com/sgf/sgf4.html#text

Lines changed: 71 additions & 0 deletions

```lean
import Std

namespace SgfParsing

structure SgfTree where
  properties : Std.TreeMap String (Array String)
  children : Array SgfTree
  deriving Repr

def SgfTree.empty : SgfTree := {
  properties := {},
  children := #[]
}

-- The parsed (sub)tree together with the input that was not consumed.
structure Result where
  result : SgfTree
  rest : List Char

-- Parser states: before the opening `(`, just after it, reading a property
-- key (accumulated in the `String`), and reading that key's values.
inductive State where
  | zero : State
  | openTree : State
  | property : SgfTree → String → State
  | values : SgfTree → String → Array String → State

partial def parseHelper : List Char → State → Except String Result
  | '(' :: xs, .zero => parseHelper xs .openTree
  -- A `(` inside a node starts a variation, parsed as a child tree.
  | '(' :: xs, .property m ps =>
    if !ps.isEmpty
    then .error "properties without delimiter"
    else do
      let ⟨result, rest⟩ ← parseHelper ('(' :: xs) .zero
      parseHelper rest (.property { m with children := m.children.push result } "")
  | ';' :: xs, .openTree => parseHelper xs (.property .empty "")
  | '[' :: xs, .property m ps => parseHelper xs (.values m ps #[""])
  -- Escapes: `\\` and `\]` add the escaped character, `\` + newline adds
  -- nothing, and otherwise the backslash is dropped and the next character
  -- is handled as ordinary value text.
  | '\\' :: '\\' :: xs, .values m ps vs => parseHelper xs (.values m ps (vs.modify (vs.size - 1) (·.push '\\')))
  | '\\' :: ']' :: xs, .values m ps vs => parseHelper xs (.values m ps (vs.modify (vs.size - 1) (·.push ']')))
  | '\\' :: '\n' :: xs, .values m ps vs => parseHelper xs (.values m ps vs)
  | '\\' :: xs, .values m ps vs => parseHelper xs (.values m ps vs)
  -- `][` starts another value for the same key.
  | ']' :: '[' :: xs, .values m ps vs => parseHelper xs (.values m ps (vs.push ""))
  -- `];` ends the node; the rest of the sequence is parsed as its child by
  -- wrapping it in a new `(`, then the `)` that the child consumed is put back.
  | ']' :: ';' :: xs, .values m ps vs => do
    let ⟨result, rest⟩ ← parseHelper ('(' :: ';' :: xs) .zero
    parseHelper (')' :: rest) (.property {
      m with
      properties := m.properties.insert ps vs,
      children := m.children.push result
    } "")
  | ']' :: xs, .values m ps vs => parseHelper xs (.property {
      m with
      properties := m.properties.insert ps vs
    } "")
  -- Inside a value, whitespace other than newline becomes a space.
  | x :: xs, .values m ps vs =>
    let modifiedValue := if x.isWhitespace && x != '\n'
      then .values m ps (vs.modify (vs.size - 1) (·.push ' '))
      else .values m ps (vs.modify (vs.size - 1) (·.push x))
    parseHelper xs modifiedValue
  | ')' :: _, .openTree => .error "tree with no nodes"
  | ')' :: xs, .property m k =>
    if k.isEmpty
    then .ok { result := m, rest := xs }
    else .error "properties without delimiter"
  | x :: xs, .property m ps =>
    if x.isUpper
    then parseHelper xs (.property m (ps.push x))
    else .error "property must be in uppercase"
  | _, _ => .error "tree missing"

def parse (encoded : String) : Except String SgfTree := do
  let ⟨result, _⟩ ← parseHelper encoded.toList .zero
  return result

end SgfParsing
```

Lines changed: 17 additions & 0 deletions

```json
{
  "authors": [
    "oxe-i"
  ],
  "files": {
    "solution": [
      "SgfParsing.lean"
    ],
    "test": [
      "SgfParsingTest.lean"
    ],
    "example": [
      ".meta/Example.lean"
    ]
  },
  "blurb": "Parsing a Smart Game Format string."
}
```

Lines changed: 42 additions & 0 deletions

```json
[
  {
    "description": "complex child trees",
    "property": "parse",
    "input": {
      "encoded": "(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))"
    },
    "expected": {
      "properties": {
        "FF": ["4"]
      },
      "children": [
        {
          "properties": {
            "B": ["aa"]
          },
          "children": [
            {
              "properties": {
                "W": ["ab"]
              },
              "children": []
            }
          ]
        },
        {
          "properties": {
            "B": ["dd"]
          },
          "children": [
            {
              "properties": {
                "W": ["ee"]
              },
              "children": []
            }
          ]
        }
      ]
    }
  }
]
```

Lines changed: 84 additions & 0 deletions

```toml
# This is an auto-generated file.
#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.

[2668d5dc-109f-4f71-b9d5-8d06b1d6f1cd]
description = "empty input"

[84ded10a-94df-4a30-9457-b50ccbdca813]
description = "tree with no nodes"

[0a6311b2-c615-4fa7-800e-1b1cbb68833d]
description = "node without tree"

[8c419ed8-28c4-49f6-8f2d-433e706110ef]
description = "node without properties"

[8209645f-32da-48fe-8e8f-b9b562c26b49]
description = "single node tree"

[6c995856-b919-4c75-8fd6-c2c3c31b37dc]
description = "multiple properties"

[a771f518-ec96-48ca-83c7-f8d39975645f]
description = "properties without delimiter"

[6c02a24e-6323-4ed5-9962-187d19e36bc8]
description = "all lowercase property"

[8772d2b1-3c57-405a-93ac-0703b671adc1]
description = "upper and lowercase property"

[a759b652-240e-42ec-a6d2-3a08d834b9e2]
description = "two nodes"

[cc7c02bc-6097-42c4-ab88-a07cb1533d00]
description = "two child trees"

[724eeda6-00db-41b1-8aa9-4d5238ca0130]
description = "multiple property values"

[28092c06-275f-4b9f-a6be-95663e69d4db]
description = "within property values, whitespace characters such as tab are converted to spaces"

[deaecb9d-b6df-4658-aa92-dcd70f4d472a]
description = "within property values, newlines remain as newlines"

[8e4c970e-42d7-440e-bfef-5d7a296868ef]
description = "escaped closing bracket within property value becomes just a closing bracket"

[cf371fa8-ba4a-45ec-82fb-38668edcb15f]
description = "escaped backslash in property value becomes just a backslash"

[dc13ca67-fac0-4b65-b3fe-c584d6a2c523]
description = "opening bracket within property value doesn't need to be escaped"

[a780b97e-8dbb-474e-8f7e-4031902190e8]
description = "semicolon in property value doesn't need to be escaped"

[0b57a79e-8d89-49e5-82b6-2eaaa6b88ed7]
description = "parentheses in property value don't need to be escaped"

[c72a33af-9e04-4cc5-9890-1b92262813ac]
description = "escaped tab in property value is converted to space"

[3a1023d2-7484-4498-8d73-3666bb386e81]
description = "escaped newline in property value is converted to nothing at all"

[25abf1a4-5205-46f1-8c72-53273b94d009]
description = "escaped t and n in property value are just letters, not whitespace"

[08e4b8ba-bb07-4431-a3d9-b1f4cdea6dab]
description = "mixing various kinds of whitespace and escaped characters in property value"
reimplements = "11c36323-93fc-495d-bb23-c88ee5844b8c"

[11c36323-93fc-495d-bb23-c88ee5844b8c]
description = "escaped property"
include = false
```

Lines changed: 13 additions & 0 deletions

```lean
import Std

namespace SgfParsing

structure SgfTree where
  properties : Std.TreeMap String (Array String)
  children : Array SgfTree
  deriving Repr

def parse (encoded : String) : Except String SgfTree :=
  sorry

end SgfParsing
```
