|
| 1 | +=========== |
| 2 | +Treelib API |
| 3 | +=========== |
| 4 | + |
| 5 | +Library for manipulating trees in Python made up of dicts and lists. |
| 6 | + |
| 7 | + |
| 8 | +Goals |
| 9 | +===== |
| 10 | + |
| 11 | +The primary goal of this library is to make it less unwieldy to manipulate trees |
| 12 | +made up of Python dicts and lists. |
| 13 | + |
| 14 | +For example, say we want to get a value deep in the tree. We could do this:: |
| 15 | + |
| 16 | + value = tree['a']['b']['c'] |
| 17 | + |
| 18 | + |
| 19 | +That raises a ``KeyError`` if any of those keys are missing. So you could |
| 20 | +handle that:: |
| 21 | + |
| 22 | + try: |
| 23 | + value = tree['a']['b']['c'] |
| 24 | + except KeyError: |
| 25 | + value = None |
| 26 | + |
| 27 | + |
| 28 | +Alternatively, you could do this:: |
| 29 | + |
| 30 | + value = tree.get('a', {}).get('b', {}).get('c', None) |
| 31 | + |
| 32 | + |
| 33 | +Both work, but they're unwieldy, especially if you're doing this a lot. |
| 34 | + |
| 35 | +Similarly, setting things deep is also unpleasant:: |
| 36 | + |
| 37 | + tree['a']['b']['c'] = 5 |
| 38 | + |
| 39 | + |
| 40 | +That raises a ``KeyError`` if ``a`` or ``b`` is missing. The safer form is this:: |
| 41 | + |
| 42 | + tree.setdefault('a', {}).setdefault('b', {})['c'] = 5 |
| 43 | + |
| 44 | + |
| 45 | +This library aims to make sane use cases for tree manipulation easier to read |
| 46 | +and think about. |
| 47 | + |
| 48 | + |
| 49 | +Paths |
| 50 | +===== |
| 51 | + |
| 52 | +A path is a string specifying a period-delimited list of edges. Edges can be: |
| 53 | + |
| 54 | +1. a key (for a dict) |
| 55 | +2. an index (for a list) |
| 56 | + |
| 57 | +Example paths:: |
| 58 | + |
| 59 | + a |
| 60 | + a.[1].foo_bar.Bar |
| 61 | + a.b.[-1].Bar |
| 62 | + |
| 63 | + |
| 64 | +Paths can be composed using string operations since they're just strings. |
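Here's a small sketch of what composing paths with plain string operations could look like. ``join_path`` is a hypothetical helper made up for this example; it isn't part of the API described in this document.

```python
def join_path(*parts):
    """Join path fragments with periods (hypothetical helper)."""
    return ".".join(parts)


base = "a.b"
# Append an index edge and a key edge to an existing path.
last_bar = join_path(base, "[-1]", "Bar")
print(last_bar)  # a.b.[-1].Bar
```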
| 65 | + |
| 66 | +FIXME(willkg): Add diagram showing a tree with edges specified by a path. |
| 67 | + |
| 68 | + |
| 69 | +Key |
| 70 | +--- |
| 71 | + |
| 72 | +Keys are identifiers that are: |
| 73 | + |
| 74 | +1. composed entirely of ascii alphanumeric characters, hyphens, and underscores |
| 75 | +2. at least one character long |
| 76 | + |
| 77 | +For example, these are all valid keys:: |
| 78 | + |
| 79 | + a |
| 80 | + foo |
| 81 | + FooBar |
| 82 | + Foo-Bar |
| 83 | + foo_bar |
| 84 | + |
| 85 | + |
| 86 | +Index |
| 87 | +----- |
| 88 | + |
| 89 | +Indexes indicate a 0-based list index. They are: |
| 90 | + |
| 91 | +1. integers |
| 92 | +2. wrapped in ``[`` and ``]`` |
| 93 | +3. allowed to be negative |
| 94 | + |
| 95 | +For example, these are all valid indexes:: |
| 96 | + |
| 97 | + [0] |
| 98 | + [1] |
| 99 | + [-50] |
| 100 | + |
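To make the key and index rules concrete, here's a minimal sketch of splitting a path into edges. ``parse_path``, ``KEY_RE``, and ``INDEX_RE`` are names invented for this example, and the regular expressions are one reading of the rules above, not a definitive grammar.

```python
import re

# Assumption: these patterns mirror the key and index rules described above.
KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
INDEX_RE = re.compile(r"\[(-?[0-9]+)\]")


def parse_path(path):
    """Split a period-delimited path into a list of edges.

    Keys come back as str and indexes as int. Raises ValueError if an
    edge is neither a valid key nor a valid index.
    """
    edges = []
    for part in path.split("."):
        index_match = INDEX_RE.fullmatch(part)
        if index_match:
            edges.append(int(index_match.group(1)))
        elif KEY_RE.fullmatch(part):
            edges.append(part)
        else:
            raise ValueError("invalid edge %r in path %r" % (part, path))
    return edges


print(parse_path("a.[1].foo_bar.Bar"))  # ['a', 1, 'foo_bar', 'Bar']
print(parse_path("a.b.[-1].Bar"))       # ['a', 'b', -1, 'Bar']
```

Note that a bare ``1`` is treated as a key (the string ``'1'``), while ``[1]`` is treated as a list index.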
| 101 | + |
| 102 | +API |
| 103 | +=== |
| 104 | + |
| 105 | +.. py:function:: tree_get(tree, path, default=None) |
| 106 | + |
| 107 | + Given a tree consisting of dicts and lists, returns the value specified by |
| 108 | + the path. |
| 109 | + |
| 110 | + Some things to know about ``tree_get()``: |
| 111 | + |
| 112 | + 1. It doesn't alter the tree. |
| 113 | + 2. Once it hits an edge that's missing, it returns the default. |
| 114 | + |
| 115 | + Examples: |
| 116 | + |
| 117 | + >>> tree_get({'a': 1}, 'a') |
| 118 | + 1 |
| 119 | + >>> print(tree_get({'a': 1}, 'b')) |
| 120 | + None |
| 121 | + >>> tree_get({'a': {'b': 2}}, 'a.b') |
| 122 | + 2 |
| 123 | + >>> tree_get({'a': {'b': 2}}, 'a.b.c', default=55) |
| 124 | + 55 |
| 125 | + >>> tree_get({'a': {'1': 2}}, 'a.1') |
| 126 | + 2 |
| 127 | + >>> tree_get({'a': [1, 2, 3]}, 'a.[1]') |
| 128 | + 2 |
| 129 | + >>> tree_get({'a': [{}, {'b': 'foo'}]}, 'a.[1].b') |
| 130 | + 'foo' |
| 131 | + |
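The behavior described for ``tree_get()`` can be sketched roughly like this. This is not the library's implementation; ``tree_get_sketch`` and ``INDEX_RE`` are made-up names, and the sketch assumes an edge of the wrong type for the current node (a key on a list, an index on a dict) counts as a missing edge.

```python
import re

# Assumption: an index edge is an integer wrapped in square brackets.
INDEX_RE = re.compile(r"\[(-?[0-9]+)\]")


def tree_get_sketch(tree, path, default=None):
    """Walk the tree along path; return default at the first missing edge."""
    node = tree
    for part in path.split("."):
        match = INDEX_RE.fullmatch(part)
        try:
            if match:
                # Index edge: only valid on a list.
                if not isinstance(node, list):
                    return default
                node = node[int(match.group(1))]
            else:
                # Key edge: only valid on a dict.
                if not isinstance(node, dict):
                    return default
                node = node[part]
        except (KeyError, IndexError):
            return default
    return node


print(tree_get_sketch({'a': [{}, {'b': 'foo'}]}, 'a.[1].b'))      # foo
print(tree_get_sketch({'a': {'b': 2}}, 'a.b.c', default=55))      # 55
```

The tree is only read, never written, so nothing is created along the way.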
| 132 | + |
| 133 | +.. py:function:: tree_set(tree, path, value, mutate=True, create_missing=False) |
| 134 | + |
| 135 | + Given a tree consisting of dicts and lists, sets the item specified by path |
| 136 | + to the specified value. |
| 137 | + |
| 138 | + If one of the edges doesn't exist, then this raises either a ``KeyError`` |
| 139 | + for dicts or an ``IndexError`` for lists. |
| 140 | + |
| 141 | + :arg boolean mutate: If ``mutate`` is ``True`` (the default), then this |
| 142 | + changes the tree in place and returns the mutated tree. |
| 143 | + |
| 144 | + If ``mutate`` is ``False``, then this does a deepcopy of the tree, |
| 145 | + changes the copy, and returns the copy. This is expensive. |
| 146 | + |
| 147 | + :arg boolean create_missing: If ``create_missing`` is ``False`` (the default), |
| 148 | + then this will raise a ``KeyError`` for failed dict keys and |
| 149 | + ``IndexError`` for failed list indexes. |
| 150 | + |
| 151 | + If ``create_missing`` is ``True`` and a missing edge isn't |
| 152 | + the last edge in the path, then this will create the intermediary |
| 153 | + dict/list. |
| 154 | + |
| 155 | + If the next edge is a key, it'll create a dict. If the next edge is an |
| 156 | + index, then it'll create a list filling in ``None`` for the required |
| 157 | + indices. |
| 158 | + |
| 159 | + Here are some examples. |
| 160 | + |
| 161 | + This sets ``a`` to 5. This isn't affected by ``create_missing``. |
| 162 | + |
| 163 | + >>> tree_set({}, 'a', value=5, create_missing=True) |
| 164 | + {'a': 5} |
| 165 | + >>> tree_set({}, 'a', value=5, create_missing=False) |
| 166 | + {'a': 5} |
| 167 | + |
| 168 | + This tries to traverse ``a``, but it doesn't exist and it's not the last |
| 169 | + edge in the path. The next edge is ``b``, which is a key, so it first sets |
| 170 | + ``a`` to an empty dict, then proceeds. |
| 171 | + |
| 172 | + >>> tree_set({}, 'a.b', value=5, create_missing=True) |
| 173 | + {'a': {'b': 5}} |
| 174 | + |
| 175 | + This tries to traverse ``a``, but it doesn't exist and it's not the last |
| 176 | + edge in the path. The next edge is ``[2]``, which is an index, so it first |
| 177 | + sets ``a`` to a list of 3 ``None`` values, then proceeds. |
| 178 | + |
| 179 | + >>> tree_set({}, 'a.[2]', value=5, create_missing=True) |
| 180 | + {'a': [None, None, 5]} |
| 181 | + |
| 182 | + This is similar, but with a negative index. |
| 183 | + |
| 184 | + >>> tree_set({}, 'a.[-1]', value=5, create_missing=True) |
| 185 | + {'a': [5]} |
| 186 | + |
| 187 | + This creates missing indices in an existing list. |
| 188 | + |
| 189 | + >>> tree_set({'a': []}, 'a.[2]', value=5, create_missing=True) |
| 190 | + {'a': [None, None, 5]} |
| 191 | + |
| 192 | + |
| 193 | + Examples: |
| 194 | + |
| 195 | + These don't mutate the tree: |
| 196 | + |
| 197 | + >>> tree = {'a': {'b': {'c': 1}}} |
| 198 | + >>> tree_set(tree, 'a', value=5, mutate=False) |
| 199 | + {'a': 5} |
| 200 | + >>> tree_set(tree, 'a.b.c', value=[], mutate=False) |
| 201 | + {'a': {'b': {'c': []}}} |
| 202 | + |
| 203 | + These raise errors if an edge is missing: |
| 204 | + |
| 205 | + >>> tree_set({}, 'a.b.c', value=5) |
| 206 | + KeyError ... |
| 207 | + >>> tree_set({}, 'a.[1].b', value=5) |
| 208 | + IndexError ... |
| 209 | + |
| 210 | + These create missing edges and indexes: |
| 211 | + |
| 212 | + >>> tree_set({}, 'a.b.c', value=5, create_missing=True) |
| 213 | + {'a': {'b': {'c': 5}}} |
| 214 | + >>> tree_set({}, 'a.[1].b', value=5, create_missing=True) |
| 215 | + {'a': [None, {'b': 5}]} |
| 216 | + |
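Here's one way the ``mutate`` and ``create_missing`` rules could fit together. It's a sketch under assumptions, not the library's implementation: ``tree_set_sketch``, ``_parse``, and ``_pad`` are invented names, a ``None`` placeholder in a list is treated as missing when ``create_missing`` is on, and error handling is simplified (a missing dict key raises ``KeyError``, an out-of-range index raises ``IndexError``), which may not match every example above.

```python
import copy
import re

INDEX_RE = re.compile(r"\[(-?[0-9]+)\]")


def _parse(path):
    """Turn 'a.[1].b' into ['a', 1, 'b'] (keys as str, indexes as int)."""
    edges = []
    for part in path.split("."):
        match = INDEX_RE.fullmatch(part)
        edges.append(int(match.group(1)) if match else part)
    return edges


def _pad(node, index):
    """Grow a list with None values until index is valid."""
    needed = index + 1 if index >= 0 else -index
    node.extend([None] * (needed - len(node)))


def tree_set_sketch(tree, path, value, mutate=True, create_missing=False):
    if not mutate:
        # Work on a copy so the caller's tree is untouched. This is expensive.
        tree = copy.deepcopy(tree)
    edges = _parse(path)
    node = tree
    for pos, edge in enumerate(edges):
        if isinstance(edge, int) and create_missing:
            _pad(node, edge)
        if pos == len(edges) - 1:
            node[edge] = value
            break
        if isinstance(edge, int):
            # Assumption: a None placeholder counts as a missing child.
            child_missing = create_missing and node[edge] is None
        else:
            child_missing = edge not in node
            if child_missing and not create_missing:
                raise KeyError(edge)
        if child_missing:
            # The next edge decides which kind of container to create.
            node[edge] = [] if isinstance(edges[pos + 1], int) else {}
        node = node[edge]  # raises IndexError for an out-of-range index
    return tree


print(tree_set_sketch({}, 'a.[1].b', value=5, create_missing=True))
# {'a': [None, {'b': 5}]}
```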
| 217 | + |
| 218 | +.. py:function:: tree_flatten(tree) |
| 219 | + |
| 220 | + Flattens a tree into a dict with keys of paths. |
| 221 | + |
| 222 | + >>> tree_flatten({'a': 1}) |
| 223 | + {'a': 1} |
| 224 | + >>> tree_flatten({'a': {'b': 1, 'c': 2}}) |
| 225 | + {'a.b': 1, 'a.c': 2} |
| 226 | + >>> tree_flatten({'a': [{'b': 1}, {'c': 2}]}) |
| 227 | + {'a.[0].b': 1, 'a.[1].c': 2} |
| 228 | + |
| 229 | + .. Note:: |
| 230 | + |
| 231 | + At this point, a flattened tree can't be used with ``tree_get`` and |
| 232 | + ``tree_set``. |
| 233 | + |
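Flattening can be sketched as a short recursive walk. ``tree_flatten_sketch`` is a made-up name and not the library's implementation. It assumes dict keys are already valid key strings, and it drops empty dicts and lists because they have no leaf values, which may or may not be the intended behavior.

```python
def tree_flatten_sketch(tree, prefix=""):
    """Return a dict mapping each leaf's path to its value."""
    flat = {}
    if isinstance(tree, dict):
        items = tree.items()
    else:
        # Lists contribute index edges like "[0]", "[1]", ...
        items = (("[%d]" % i, value) for i, value in enumerate(tree))
    for edge, value in items:
        path = prefix + "." + edge if prefix else edge
        if isinstance(value, (dict, list)):
            flat.update(tree_flatten_sketch(value, path))
        else:
            flat[path] = value
    return flat


print(tree_flatten_sketch({'a': [{'b': 1}, {'c': 2}]}))
# {'a.[0].b': 1, 'a.[1].c': 2}
```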
| 234 | + |
| 235 | +.. py:function:: tree_setdefault(tree, default_tree) |
| 236 | + |
| 237 | + FIXME |
| 238 | + |
| 239 | + |
| 240 | +.. py:function:: tree_validate(tree, schema) |
| 241 | + |
| 242 | + FIXME |
| 243 | + |
| 244 | + |
| 245 | +.. py:function:: tree_traverse(tree, fun) |
| 246 | + |
| 247 | + FIXME |
| 248 | + |
| 249 | + |
| 250 | +Research and Inspirations |
| 251 | +========================= |
| 252 | + |
| 253 | +Python ``defaultdict`` |
| 254 | +---------------------- |
| 255 | + |
| 256 | +Python has a ``defaultdict``: |
| 257 | + |
| 258 | +https://docs.python.org/3/library/collections.html#defaultdict-objects |
| 259 | + |
| 260 | +This doesn't handle lists and dicts well, though. |
| 261 | + |
| 262 | +We'd have to either create the original data structure as a defaultdict, or |
| 263 | +convert it to one. |
| 264 | + |
| 265 | +If you try to get something deep from a defaultdict, it mutates the |
| 266 | +structure. |
| 267 | + |
| 268 | +It doesn't easily support composable paths. |
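The mutate-on-read problem mentioned above is easy to see with a recursive ``defaultdict``. ``make_tree`` is just a helper name for this example.

```python
from collections import defaultdict


def make_tree():
    """Return a defaultdict whose missing keys become nested defaultdicts."""
    return defaultdict(make_tree)


tree = make_tree()
print(len(tree))  # 0

# Only reading a deep value...
value = tree['a']['b']['c']

# ...but every key along the way now exists in the tree.
print('a' in tree)            # True
print('c' in tree['a']['b'])  # True
```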
| 269 | + |
| 270 | + |
| 271 | +jq processor |
| 272 | +------------ |
| 273 | + |
| 274 | +jq has interesting filter syntax. |
| 275 | + |
| 276 | +https://stedolan.github.io/jq/manual/#Basicfilters |
| 277 | + |
| 278 | + |
| 279 | +Creating a new subclass of Python ``dict`` |
| 280 | +------------------------------------------ |
| 281 | + |
| 282 | +We could do that and add ``get_path`` and ``set_path``, but I wonder if we can |
| 283 | +get the utility we want without having to box/unbox data. |
| 284 | + |
| 285 | +If we're just working with dicts and lists and standard Python things, then |
| 286 | +``json.dumps`` and other things just work without us having to do anything about |
| 287 | +them. |