This repository was archived by the owner on Jun 12, 2020. It is now read-only.
90 changes: 90 additions & 0 deletions docs/api.rst
@@ -0,0 +1,90 @@
===========
Treelib API
===========

Library for manipulating trees in Python made up of dicts and lists.


.. py:function:: tree_get(tree, key, default=None)

Given a tree consisting of mappings (like dict) and indexable sequences (like
list), returns the value specified by the key.

The key is a period-delimited sequence of edges to traverse. If one of the
edges doesn't exist, then the default is returned immediately.

Examples:

>>> tree_get({'a': 1}, 'a')
1
>>> print(tree_get({'a': 1}, 'b'))
None
>>> tree_get({'a': {'b': 2}}, 'a.b')
2
>>> tree_get({'a': {'b': 2}}, 'a.b.c', default=55)
55

This supports sequences, too:

>>> tree_get({'a': [1, 2, 3]}, 'a.1')
2
>>> tree_get({'a': {'1': 2}}, 'a.1')
2

Both dict and list support getitem notation, so the ``1`` edge works as a list
index in the first example and as the dict key ``'1'`` in the second.

Some things to know about ``tree_get()``:

1. It doesn't alter the tree at all.
2. Once it hits an edge that's missing, it returns ``None`` or the default.
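
The lookup rules above can be sketched roughly as follows. This is only an illustration of the documented behavior, not the library's implementation; in particular, converting an edge to ``int`` whenever the current node is a list or tuple is an assumption.

```python
from collections.abc import Mapping


def tree_get(tree, key, default=None):
    """Sketch: walk period-delimited edges, returning default on the first miss."""
    node = tree
    for edge in key.split('.'):
        if isinstance(node, Mapping):
            # Mappings are looked up by the edge string itself.
            if edge not in node:
                return default
            node = node[edge]
        elif isinstance(node, (list, tuple)):
            # Assumption: sequences take the edge as an integer index.
            try:
                node = node[int(edge)]
            except (ValueError, IndexError):
                return default
        else:
            # Reached a leaf value but the key still has edges left.
            return default
    return node
```

The tree is only read, never modified, which matches the first point in the list above.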


.. py:function:: tree_set(tree, key, value)

Given a tree consisting of mappings (like dict) and indexable sequences that
support getitem notation, sets the key to the value.

The key is a period-delimited sequence of edges to traverse. If one of the
edges doesn't exist, then the edge is created using these rules:

1. if the next edge is an integer, then it creates a list
2. if the next edge is not an integer, then it creates a dict

This mutates the tree in place and returns it.

>>> tree_set({}, 'a', value=5)
{'a': 5}
>>> tree_set({}, 'a.b.c', value=5)
{'a': {'b': {'c': 5}}}

While ``tree_set`` does create new dicts and lists if they're missing, it
will not create new list indexes. Instead, it'll raise an ``IndexError``. For
example:

>>> tree_set({}, 'a.1', value=5)
IndexError('list index out of range')

This is the same error you'd get if you tried to access an index that doesn't
exist in a list.
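
A minimal sketch of these rules, assuming that "is an integer" means the edge string consists of decimal digits. It is not the actual implementation, and the exact ``IndexError`` message may differ from the one shown above.

```python
def tree_set(tree, key, value):
    """Sketch: set a value at a period-delimited key, creating missing containers."""
    edges = key.split('.')
    node = tree
    # Walk every edge except the last, pairing each with the edge that follows it.
    for edge, next_edge in zip(edges, edges[1:]):
        idx = int(edge) if isinstance(node, list) else edge
        try:
            child = node[idx]  # a missing list index raises IndexError here
        except KeyError:
            # Missing dict edge: the *next* edge decides which container to create.
            child = [] if next_edge.isdecimal() else {}
            node[idx] = child
        node = child
    last = edges[-1]
    if isinstance(node, list):
        node[int(last)] = value  # IndexError if the index does not exist yet
    else:
        node[last] = value
    return tree
```

With this sketch, ``tree_set({}, 'a.1', value=5)`` creates the empty list under ``a`` and then fails on the assignment to index 1, so the partially built tree is left behind in the mutated input.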

Owner Author

Should tree_set support an argument to create indexes? Maybe create_indexes=True could fill in the missing indexes with None?

>>> tree_set({}, 'a.1.b', value=5, create_indexes=True)
{'a': [None, {'b': 5}]}

This might affect creating new lists if they're missing, too.

Do we assume numerical indices are lists? What do we do with dictionaries that have numerical keys?

Owner Author

tree_set and numbers should work fine with both lists and dicts except when we're creating things that don't already exist. In that case, treevert wouldn't know what to do because the syntax is ambiguous.

We could use [] syntax:

tree_set({}, 'a[1].b' ...)

That's not ambiguous if we declare that keys can't have [ and ] in them.
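
A rough sketch of how such a key could be split into edges, where integer edges mean list indexes and string edges mean dict keys. ``parse_key`` is a hypothetical helper name, not something in treevert, and the sketch silently skips malformed input rather than validating it.

```python
import re

# Either a bracketed index like [1] or a run of characters that are not . [ ]
_TOKEN = re.compile(r'\[(\d+)\]|([^.\[\]]+)')


def parse_key(key):
    """Hypothetical helper: split 'a[1].b' into ['a', 1, 'b']."""
    edges = []
    for index, name in _TOKEN.findall(key):
        # Bracketed tokens become ints (list indexes); the rest stay strings.
        edges.append(int(index) if index else name)
    return edges
```

Under this scheme ``parse_key('a[1].b')`` gives ``['a', 1, 'b']`` while ``parse_key('a.1.b')`` gives ``['a', '1', 'b']``, so the two cases can be told apart when creating missing nodes.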

I think configman appends the index to the item like destination.storage0, destination.storage1, and so on. I don't know if it lets you use that to set things, though.

Another thing we could do is assume anything that doesn't already exist is a dict. So then:

>>> tree_set({}, 'a.1.b', value='foo')
{'a': {'1': {'b': 'foo'}}}

I'm not sure that's better and less surprising, but it's interesting.

Owner Author

I had another idea that has two parts.

  1. we make tree_set only build dicts for things that are missing and in this way, it builds "sparse lists"
  2. we add an argument to tree_set for "don't build things that don't exist"

So then we have use cases:

Developer wants to add something deep in the tree that doesn't involve lists. Works as advertised.

>>> tree_set({}, 'a.b.c', value='foo')
{'a': {'b': {'c': 'foo'}}}

Done.

Developer wants to add something deep in the tree that has a list in the middle. If the list can be sparse (this really depends on how the tree is used), then it treats everything as a dict key and creates dicts as it goes along:

>>> tree_set({}, 'a.1.b', value='foo')
{'a': {'1': {'b': 'foo'}}}

Done.

Developer wants lists to be lists. At this point, the developer has to build the list manually. Python dicts have setdefault; maybe we could do something like that which creates something if it doesn't exist already.

tree_setdefault(tree, default_tree)

>>> ret = tree_setdefault({}, {'a': [None, None, None]})
>>> ret
{'a': [None, None, None]}
>>> tree_set(ret, 'a.1.b', value='foo')
{'a': [None, {'b': 'foo'}, None]}

That tree_setdefault would also let us take a tree of data and add default values to it which I think would also be super handy. There are some behavior things there we'd have to figure out like what happens if an edge in the tree leads to a different kind of node than the equivalent edge of the default tree (for example, dict vs. list).
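
One possible shape for this, as a sketch only. The mismatch behavior is exactly the open question above; here the assumption is that an existing value always wins, even when its type differs from the default's, and that only dicts are merged recursively.

```python
import copy


def tree_setdefault(tree, default_tree):
    """Sketch: fill in keys missing from tree using default_tree, in place."""
    for key, default in default_tree.items():
        if key not in tree:
            # Copy so later edits to the tree don't leak into the defaults.
            tree[key] = copy.deepcopy(default)
        elif isinstance(tree[key], dict) and isinstance(default, dict):
            # Both sides are dicts: merge the defaults one level down.
            tree_setdefault(tree[key], default)
        # Otherwise keep the existing value (assumption), including on a
        # dict vs. list mismatch.
    return tree
```

For example, ``tree_setdefault({'a': {'b': 1}}, {'a': {'b': 2, 'c': 3}, 'd': 4})`` would give ``{'a': {'b': 1, 'c': 3}, 'd': 4}`` under these assumptions.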


.. py:function:: tree_flatten(tree)

Flattens a tree into a dict with keys of paths.

>>> tree_flatten({'a': 1})
{'a': 1}
>>> tree_flatten({'a': {'b': 1, 'c': 2}})
{'a.b': 1, 'a.c': 2}
>>> tree_flatten({'a': [{'b': 1}, {'c': 2}]})
{'a.0.b': 1, 'a.1.c': 2}
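
A short recursive sketch that reproduces the examples above. The ``prefix`` parameter is an addition for the recursion and is not part of the documented signature; note also that empty dicts and lists simply disappear from the result in this version.

```python
def tree_flatten(tree, prefix=''):
    """Sketch: flatten nested dicts/lists into {path: leaf value}."""
    flat = {}
    # Dicts contribute their keys as edges; lists contribute their indexes.
    items = tree.items() if isinstance(tree, dict) else enumerate(tree)
    for edge, child in items:
        path = f'{prefix}.{edge}' if prefix else str(edge)
        if isinstance(child, (dict, list)):
            flat.update(tree_flatten(child, path))
        else:
            flat[path] = child
    return flat
```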

Owner Author

If we want flatten to be reversible (via an unflatten), then we need a different syntax here and/or a different mode because otherwise things like 0 and 1 are ambiguous and unflattenable.
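
To make the ambiguity concrete: with period-only paths, ``{'a': [5]}`` and ``{'a': {'0': 5}}`` both flatten to ``{'a.0': 5}``. One option, sketched here under the assumption that keys never contain ``[`` or ``]``, is to write list indexes in brackets. ``tree_flatten_brackets`` is a hypothetical name used only for illustration.

```python
def tree_flatten_brackets(tree, prefix=''):
    """Sketch: like flatten, but list indexes are written as [i]."""
    if isinstance(tree, dict):
        children = [(f'{prefix}.{k}' if prefix else str(k), v)
                    for k, v in tree.items()]
    elif isinstance(tree, list):
        children = [(f'{prefix}[{i}]', v) for i, v in enumerate(tree)]
    else:
        # Leaf value: the accumulated prefix is its full path.
        return {prefix: tree}
    flat = {}
    for path, child in children:
        flat.update(tree_flatten_brackets(child, path))
    return flat
```

With this notation the list case yields the path ``a[0]`` and the dict case yields ``a.0``, so an unflatten could tell them apart.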



.. py:function:: tree_validate(tree, schema)

For your consideration, my favorite Python schema lib is https://pypi.python.org/pypi/schema/, by the docopt guy. I've done some work on it. I use it in DXR. I've made some design proposals that I think would result in it being an elegant solution rather than being a bit slapdash in places.

Owner Author

Schema looks interesting. I'll keep that in mind.


FIXME

Owner Author

I wrote a schema system for dicts for pyvideo a while back.

https://github.com/pyvideo/old-pyvideo-data/blob/master/src/clive/schemalib.py

With an example schema here:

https://github.com/pyvideo/old-pyvideo-data/blob/master/src/clive/pyvideo_schema.py

Pretty sure there are other schema systems out there, too.

Not sure we'd need this in the first version of treevert.

Is there a schema system in use in the data pipeline? Do they already have a tool for it?

Owner Author

As far as I've seen, the bulk of the telemetry data pipeline is not in Python. So I'm not sure we could use anything they've made without switching languages.

I'll ask around, though.



.. py:function:: tree_traverse(tree, fun)

FIXME

Owner Author

Seems like a good idea to have a traversal system, but we wouldn't need this in the first version of treevert.

I would wait on this. Maybe on the flattening API, also, unless we had a clear use case for it up front.