-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose Graph Representation as NetworkX DiGraph #312
Comments
Disclaimer: I know the basics of category theory but am far from an expert. Are you talking about a categorical description of general YAML representation graphs, or about an encoding into (a subset of) YAML representation graphs of some other thing that has an existing categorical description? If the latter, then I imagine that YAML should work fine. Its major advantage over e.g. JSON for representing graphs is that a YAML representation graph is a true graph, not merely a tree, so you can represent information with a graph structure “natively” without needing to create an artificial system of references on top of the YAML data. If you mean the former, then I don't quite understand the motivation. If we consider a symmetric monoidal category of representation graphs, then what would be the tensor product? |
I guess me as well :)
I think the representation graph can be "lifted" to a more general category for free. Researchers and programmers could use YAML as a bridge. The tensor product could be defined by the mapping, where each pair is a parallel arrow (mappings are unordered). This only considers the basic features of YAML. E.g I haven't considered tags. |
Hi again @Thom1729! As I continued researching and implementing, I discovered that in a YAML document these b: a
c: a I haven't been able to implement the representation graph in NetworkX and I think it has to do with this definition. I think a directed graph can't represent this structure because it would lose some information.
Do you think this is a reasonable implementation concern / limitation? I'm hoping I missed something to make this work! WIP at: Thanks for your help. |
Please consider that YAML distinguishes the following two examples -- by choice:
If maps are vertices in a directed graph, then keys can be associated with outgoing edges that point to other vertices. In that understanding, the first is a tree of three vertices (root, a, a') and two edges (root-b->a, root-c->a') and the two "a" vertices are distinct. The second is a diamond-shaped DAG of two vertices (root, a) and two edges (root-b->a, root-c->a) that share start and end, but have different labels (b, c). But, keeping in mind that also keys may be aliases, things get weird quiet quickly. |
From 3.2.1.3 (emphasis added):
In general, a YAML document may contain nodes that are equal to each other, but not identical. For instance, in the following document the two mapping nodes are equal but not identical: - {}
- {} The same applies to scalars, with the caveat that pursuant to the above citation an implementation may consider equal scalars to be identical. So if you're modeling the representation graph of the document in your comment using some other graph modeling system, then that other system should have one node for each scalar, even scalars that are equal to each other. On the other hand, in the following document there is only one scalar node whose content is b: &x a
c: *x If you wanted to model a graph structure in YAML for some ordinary reason, you'd probably use something like the example in your comment, and link things together on the application end. But if I understand correctly, you're trying to embed arbitrary YAML representation graphs in NetworkX. I don't know anything about NetworkX or its semantics, but in order to model a representation graph in full generality, you would need to allow for multiple equal-but-not-identical nodes. If you're currently modeling YAML nodes with some NetworkX type that lacks identity semantics, then you might need to modify that model to ensure that equal nodes can be distinguished. If it's only a problem for scalar nodes (e.g. you're using something like a string type) then under 3.2.1.3 you are allowed to treat equal nodes as identical. |
Thank you both!
Yes. The NetworkX library is a widely used Python library with Graphs, DiGraphs, MultiGraphs. I want to use the Representation Graph but the problem I see is that equal-but-not-identical nodes don't have any difference in the token stream.
I would certainly want to do this, but I think this intermediate representation is still losing information. It seems there are more things to encode than the spec would suggest in order to make this "an implementation detail" in the load/dump processes. |
What do you mean by “the token stream”? Is this a NetworkX concept? |
https://github.com/yaml/pyyaml/blob/main/lib/yaml/tokens.py I think this is a better example than the first one: - a: b
- a: c Since An extended data structure can manage this — but is that still a graph? or is it better modeled as hypergraph, paths graph, etc.? |
It's hard to talk about nodes in terms of tokens because a node in the representation graph may correspond to zero, one, or many tokens. But a node that is identical to another node that occurs earlier will correspond to an alias token. - a: b
- a: c This is a document with seven nodes: a sequence node (with two children), two mapping nodes (each with one pair of children), and four scalar nodes. Two of the scalar nodes are equal to each other; all other pairs of nodes are inequal. So if you consider this document as a representation graph, and encode it into NetworkX, then you will presumably have seven nodes in NetworkX (unless the encoding creates additional nodes). If you instead interpret it as encoding information about a graph with three nodes named I don't quite understand what you mean about recovering the individual connectivity. |
I see what you mean. My goal is to have a representation in terms of the three scalar, with mapping and sequencing encoded within a data structure. I want to do this since the seven-node structure doesn't suit my application needs. This made me wonder if the spec is really defining graph structure. Shouldn't vertices and edges be sets? I'm looking into the categorical framework to extend the representation with a higher level description, preserving the inherent structure of existing documents (i.e preserving connectivity). |
Eh. It's traditional to define formal directed graphs in as something like a 2-tuple, with a set of nodes and a set of edges that are each an ordered pair of nodes. I think of this as an implementation detail; in my mind, the graph is not that 2-tuple of sets, but the abstract thing that we could represent in that fashion. The essential difference between a general directed graph and a YAML representation graph is that YAML nodes each have a kind and a tag, and rather than each node just having a set of edges to other nodes, it has either scalar content, an ordered list of nodes, or an unordered set of key-value pairs (where keys are unique under equality). You could represent a YAML representation graph as a formal directed graph by adding extra structure — coloring nodes by kind, etc, and probably using systems of extra nodes to represent collection contents. I may have lost track of exactly what you're looking to accomplish. I think that there are good solutions for encoding mathematical graphs as YAML representation graphs, and also good solutions for encoding YAML representation graphs as mathematical graphs, but due to the different data models I don't think you're going to find a reasonable bijection, or an encoding either way that never requires adds nodes/vertices — and an encoding that minimizes extra nodes/vertices may be far from the most ergonomic. |
The representation in the spec is one of my favorite parts, but I haven't found such mathematical encodings (at least in Python). Working at the representation level with a graph library like NetworkX would be awesome, since it would be closer to the mathematical description than an object oriented interface.
I think the spatial layout can be modeled as a hypergraph or string diagrams. I'm discovering challenges as I go but I'm doing this with enthusiasm. I have no intention of rushing a contribution. |
Hypergraphs seem like overkill to me. Off the top of my head, a reasonably simple encoding might use labelled directed graphs:
This requires one vertex per YAML node, plus one vertex per mapping entry, and resembles the representation graph closely. You could eliminate the vertex labelling or the edge labelling, or both, by adding auxiliary vertices. |
@Thom1729 I think I found the structure I was looking for: https://en.m.wikipedia.org/wiki/Levi_graph Do you think it's appropriate? I would like to collaborate with the spec in this regard. |
Hi! I'm interested in contributing a categorical definition of the representation graph.
I want to foster collaboration between some Python projects like this one to be the foundation of a programming environment based in YAML.
I believe YAML is meant to be the carrier because it is widely available. Every new language introduces a new syntax, which I would like to avoid. YAML support is mature in all languages and programmers are familiar with it. Human-friendliness, in my opinion, was a great achievement.
So with regards to categorification my intuition is YAML forms a symmetric monoidal category.
I started the following conversations which you might find interesting:
Imagine using YAML in projects like this one, and potentially tons of papers! https://youtu.be/IBESZlBRBzY
Thanks for your time!
The text was updated successfully, but these errors were encountered: