What does implementing ATProto Repos entail? (as part of a PDS) #2644
-
I read the docs around the What really is an ATProto repo, and what does it contain? How do the different parts relate to each other? And how might one go about implementing them? Thanks in advance!
Replies: 3 comments 5 replies
-
Do you mean https://atproto.com/guides/data-repos (and https://atproto.com/specs/repository)? Or something else?
-
I'll give some notes on how I personally think about the atproto repo concept and the MST data structure specifically, but first I'll link to existing resources. Relevant parts of the atproto specs:
The specs link to a scholarly paper describing MSTs specifically. This could be useful for folks "studying" the protocol and understanding design decisions and tradeoffs, but probably isn't the best resource for somebody just trying to implement or debug an implementation. Here is that paper anyways:
There are a couple of implementations to look at:

Bluesky TypeScript Reference Implementation (code): this is our main reference implementation. It is fairly performant, and has been reviewed by multiple folks on the team. There are regression tests for specific bugs and issues found over time; we have copied most of those tests to other implementations, but this was where they were originally encountered and debugged.

Bluesky Go Implementation (code): this is a nearly line-by-line port of the TypeScript implementation. I think why literally copied the source code and translated it to Go. It is not very idiomatic Go, could easily be confusing to work off of, and is probably "bug-compatible" (it has the same bugs and performance characteristics as the TypeScript implementation).

Bryan's Rust Implementation (code): I wrote this from scratch by reading old atproto docs (pre-specs), and doing some experiments loading CAR files. I have since updated it to ensure it works with current CAR files (the rest of the adenosine project isn't much maintained). It is relatively short code, written to be simple and easier to understand; performance isn't great, and I think it only works with complete MSTs and not partial checkouts. The concept/value of this implementation is that it was relatively "clean room" and validates the TypeScript implementation; it probably isn't "bug compatible".

There are some interop test cases in each of these implementations. The two most basic types of validation tests are full repositories (CAR files) and very simple repo layouts (a set of records and paths). The tests basically read the repo (from a CAR file or a description) into an array or some other simple data structure, then reproduce / re-export the repo, and validate that the shape and data hashes all match up.
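The round-trip shape of those tests can be sketched in a few lines. This is a toy illustration only, not any of the real test suites: `ToyRepo`, `load`, `export`, `root_hash`, and `round_trip_matches` are hypothetical names, and the "export" here is just a sorted JSON dump hashed with SHA-256, standing in for a real CAR file and MST root hash.

```python
import hashlib
import json


class ToyRepo:
    """Hypothetical stand-in for the repo implementation under test."""

    def __init__(self):
        self.records = {}  # path -> record (a plain dict)

    @classmethod
    def load(cls, description):
        # "Read in" a simple repo layout: an iterable of (path, record) pairs.
        repo = cls()
        for path, record in description:
            repo.records[path] = record
        return repo

    def export(self) -> bytes:
        # Deterministic re-export: sort by path so insertion order cannot
        # change the output bytes. A real repo would emit MST nodes in a CAR.
        items = sorted(self.records.items())
        return json.dumps(items, sort_keys=True, separators=(",", ":")).encode()

    def root_hash(self) -> str:
        return hashlib.sha256(self.export()).hexdigest()


def round_trip_matches(description) -> bool:
    """Load, export, re-load from the export, and compare the hashes."""
    first = ToyRepo.load(description)
    reloaded = ToyRepo.load(json.loads(first.export()))
    return first.root_hash() == reloaded.root_hash()
```

The property worth copying into a real test is the last function: whatever goes in must come back out with identical hashes, regardless of the order in which records were inserted.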
I very strongly recommend that anybody working on an MST/repo implementation both write their own tests as they work (both unit and integration tests), and then copy in these exact interop tests to cover the known tricky cases.

Some other existing docs:

If other folks have helpful resources (write-ups, blog posts, etc), please comment on this thread to add them.
-
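It can be tricky to implement repos! They can be a bit slippery to think about, and it is pretty easy to make some simplifying assumptions and then learn later that your implementation needed some additional feature or complexity.

The very strong analogy is to `git` repos. `git` is famously complex as well! And it gets harder when you try to actually implement anything hard with it.

I've laid out three different ways of answering your question and thinking about repos. In analogy to git these would roughly break down like:

`git` CLI support? those are analogous to the `com.atproto.sync.*` and `com.atproto.repo.*` lexicons. what functions and API does a git implementati…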
Repos as a Software API: what functionality do you need to support?

One way to think about atproto repos is as a black-box data container. What software API methods are needed for interacting with this data container? The basics:
Not directly related, but needed:
More advanced:
Implementation design decisions:
Keep in mind that there are repos in the network today with millions of records and gigabytes of data.

Repo data formats and encoding: nuts and bolts

There is a whole grab-bag of data formats, serializations, and other incidental design details to work through. These are individually mostly simple and boring, but there are a whole bunch of acronyms which may or may not be familiar, or may come across as idiosyncratic. A lot of these are described in the "Data Model" specs. I'm not really sure how to help here: there are just a bunch of little details to bulldoze through. A partial list:
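As one concrete example of the kind of detail involved: blocks in a repo are referenced by CID. The following is a minimal sketch, assuming the conventions described in the Data Model spec (CIDv1, the `dag-cbor` codec, a SHA-256 multihash, and lowercase base32 for the string form). `cid_for_dag_cbor` is a hypothetical helper name, and the input is assumed to already be canonical DAG-CBOR bytes; producing those bytes is the harder part and is not shown.

```python
import base64
import hashlib


def cid_for_dag_cbor(block: bytes) -> str:
    """Return the string CID for bytes that are already DAG-CBOR encoded."""
    digest = hashlib.sha256(block).digest()
    # Binary CID layout (each value fits in a single-byte varint):
    #   0x01 = CID version 1
    #   0x71 = multicodec code for dag-cbor
    #   0x12 = multihash code for sha2-256
    #   0x20 = digest length in bytes (32)
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest
    # Multibase "b" prefix means lowercase base32 without padding.
    b32 = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + b32
```

For example, `cid_for_dag_cbor(b"\xa0")` gives the CID of an empty CBOR map. Every CID built this way starts with `bafyrei`, which is a handy sanity check when debugging.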
MST as an abstract data structure

This is the specific data structure used to turn a collection of records (with paths) into a tree structure. It is similar to something like a Red-Black Tree: you get the general idea from the abstraction, but the overall concept is parameterized (what hash algorithm to use for hashing? what "prefix length" for counting tree depth?). Those parameters need to be specified (and are for atproto), and then there are a bunch of encoding and data format issues, discussed in the section above.

This one is complex and I've already spent a bunch of time writing this doc, so I'm going to cut things a bit short. I hope that the atproto "Repository" spec roughly documents the details. Realistically, it would be most helpful to have some diagrams and worked examples (specific records and paths resulting in specific hashes) to demonstrate all this, but it is a huge amount of work to generate that kind of documentation and we haven't gotten around to it yet.
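To make the two parameters mentioned above concrete, here is a minimal sketch of how a key's tree depth can be derived. It assumes the rule as described in the Repository spec: hash the key with SHA-256, count the leading zero bits, and divide by two rounding down (that is, count zeros in 2-bit chunks). The function names are hypothetical and this covers only the depth calculation, not node layout or encoding.

```python
import hashlib


def leading_zero_bits(data: bytes) -> int:
    """Count the zero bits before the first set bit in a byte string."""
    count = 0
    for byte in data:
        if byte == 0:
            count += 8
            continue
        # For a non-zero byte, 8 - bit_length() is its leading-zero count.
        count += 8 - byte.bit_length()
        break
    return count


def mst_key_depth(key: str) -> int:
    """Depth (layer) of a repo path in the MST, under the assumed rule."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Two bits per level: integer division by two rounds down.
    return leading_zero_bits(digest) // 2
```

Because the depth depends only on the key, any two implementations given the same set of paths should arrive at the same tree shape, which is what makes the hash-comparison interop tests possible.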