What if stuff after the relay (optionally) also validated records? #2262

goeo- · 2024-03-01T19:12:48Z

I really like a lot of the bsky protocol design, and I think the "billionaire proof" goal can actually be achieved as more people start adopting did:web and/or owning their keys on plc, and as the plc becomes more auditable and there are easy ways for people to switch to a different authority, etc.

One of the things I really like about bsky is how things are signed by your atproto signing key, and stored as merkle trees. This makes it so the transfers are authenticated, and things can't lie to you about a repository without controlling its key.

The app.bsky lexicons served by the app view not exposing signature proofs, and clients/pdses just trusting the app view, is antithetical to this in my opinion. Especially given relays are more expensive to run by design, and people are really only expected to have control over their client and their pds, I really think some verification should also be done by the client and/or the pds.

For now, it's possible to verify specific posts by getting a proof from the relay (also a worry - how do you know what relay your app view uses?) so that you're not doing queries to other people's pdses, but I'd really like to see lexicons like getTimeline and getPosts at least have an option to return cryptographic proof. Implementation in clients would also be appreciated - for now, I've made a wasm library to verify a post as I described, and it can be plugged into web based apps. I'd like to make this into a user script or something, but modern javascript really makes things unnecessarily complex, so for now I've just modified https://github.com/mary-ext/langit. In my implementation I am only verifying when getPostThread is called, that is, when I click on a post. I plan to open source this soon. If the app view were to change, I think verifying everything the client sees could become reasonable.
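As a rough illustration of the "get a proof from the relay" step, the sketch below only builds the request URL for a single post. It assumes the proof is fetched through the `com.atproto.sync.getRecord` XRPC endpoint and that the relay host (`bsky.network` in the usage note, as mentioned later in this thread) serves it; `parse_at_uri` and `proof_url` are hypothetical helper names, and decoding and verifying the returned blocks is deliberately left out.

```python
from urllib.parse import urlencode


def parse_at_uri(uri: str) -> tuple[str, str, str]:
    """Split at://<did>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://did/collection/rkey, got {uri!r}")
    did, collection, rkey = parts
    return did, collection, rkey


def proof_url(host: str, uri: str) -> str:
    """Build the URL that asks `host` for the blocks proving one record.

    Assumption: `host` is a relay (or PDS) that serves
    com.atproto.sync.getRecord without authentication.
    """
    did, collection, rkey = parse_at_uri(uri)
    query = urlencode({"did": did, "collection": collection, "rkey": rkey})
    return f"https://{host}/xrpc/com.atproto.sync.getRecord?{query}"
```

For example, `proof_url("bsky.network", "at://did:plc:abc123/app.bsky.feed.post/3kabc")` points at the relay rather than at the author's pds. The relay host is hardcoded by the caller here, which is exactly the concern raised above.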

bnewbold · 2024-03-01T21:49:03Z · Maintainer

This is a great question to raise. There are a few different aspects to this, which will make this response somewhat sprawling.

What is an AppView, and what is the scope of trust? In the general sense, an AppView can return anything specified in a Lexicon. You could have a weather AppView that returns forecasts which are never persisted in repository records anywhere. That's a bit weird and "not-atproto-y", but there is nothing really wrong with it, and the generality makes "what can be built with atproto" broader in scope. The usual way of thinking about AppViews is that they aggregate repo content, identity metadata, and moderation metadata (labels) from the network ("atmosphere") and present aggregations and transforms, with the expectations and output being relatively deterministic and reproducible. Eg, how blocks work is relatively clear (though we do need to document and formalize the details better), or how follower counts are computed. And sometimes content is just returned basically as-is, eg in an author feed view. All of which is to say: some parts are more-or-less auditable, and most of the app.bsky AppView "should" be, but not everything will be in the general case.

One aspect we could do better on for sure is passed-through content. We have learned that it probably would make sense to pass through full original record documents in most cases, and to identify records by AT-URI, and optionally CID, in responses themselves. For example, in a list of followers, what specific record describes the follow relationship? In a profile view, what was the full profile record (which might contain additional off-spec fields)? Without that info, it is hard to fetch and verify content. As a stretch goal, maybe a generic wrapper type around records (a new Lexicon type or $ field?) would make it possible to extract and verify records in responses even when the Lexicon isn't known; this could allow things like the PDS to (optionally) participate in validation in a generic sense. Unfortunately we didn't design the current app.bsky Lexicons this way. On the other hand, this doesn't impact records themselves, and it might not be too disruptive to do a v2 of the AppView request schemas. My personal opinion is that we should probably get more experience with new app Lexicons before making big changes to the app.bsky Lexicons though.
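To make the "generic wrapper" idea a bit more concrete, here is a minimal sketch of how a PDS or client might pull wrapped records out of a response without knowing its Lexicon. Everything about the wrapper is an assumption for illustration: the `$record` key and its `uri` / `cid` / `value` fields are hypothetical and not part of any atproto specification.

```python
from typing import Any, Iterator

# Hypothetical marker key; no such field exists in the current Lexicons.
WRAPPER_KEY = "$record"


def extract_records(node: Any) -> Iterator[dict]:
    """Walk arbitrary JSON-like data and yield every wrapped record found.

    A wrapper is assumed to look like:
        {"$record": {"uri": "at://...", "cid": "...", "value": {...}}}
    The walk does not need to know which Lexicon the response follows.
    """
    if isinstance(node, dict):
        wrapped = node.get(WRAPPER_KEY)
        if isinstance(wrapped, dict) and "uri" in wrapped and "value" in wrapped:
            yield wrapped
        for key, child in node.items():
            if key != WRAPPER_KEY:  # do not descend into the record body itself
                yield from extract_records(child)
    elif isinstance(node, list):
        for child in node:
            yield from extract_records(child)
```

Each yielded wrapper carries the AT-URI (and optionally the CID) needed to fetch and check the original record separately, which is the piece that the current app.bsky responses often lack.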

Just thinking about it now, another possibly helpful piece of info in responses could be the currently-indexed repo revision (rev) that the AppView has seen, for each piece of content. Not sure that is worth the effort to index/include, but it shouldn't be too-too much.

Another aspect is being able to audit and verify aggregations and things like "follow count". Other platforms (like Twitter) have a lot of allegations of repression and manipulation thrown around. This does get a bit subjective because different people will define spam and bot accounts differently. But it would be nice from a design perspective to be able to audit things like follow counts: one should be able to fetch the full list of followers, confirm that records exist for them all, that no known publicly-distributed follows are missing, and that the count matches. Unfortunately today this isn't very easy, partially because of API decisions described above, and partially because our current implementation is a bit fuzzy about aggregations: if an account is taken down, that is represented immediately in follower lists, but the count isn't updated until a cleanup process happens. Not sure how important this is to you, but it is something we thought about a fair amount back in the early days: how can you efficiently trust things like counts? Should counts be signed? Generated deterministically? Etc.
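The three checks described above (every listed follower is backed by a record, no known follow is missing, and the count matches) could be sketched roughly as follows. This assumes the auditor already holds the follow-record URIs for each listed follower, which the current API does not make easy; `AuditReport` and `audit_followers` are hypothetical names, and fetching or signature-checking the records is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class AuditReport:
    unbacked: list[str] = field(default_factory=list)  # listed, but no verified record
    missing: list[str] = field(default_factory=list)   # known record, but not listed
    count_matches: bool = True

    @property
    def ok(self) -> bool:
        return not self.unbacked and not self.missing and self.count_matches


def audit_followers(
    claimed_count: int,
    listed_uris: list[str],    # follow-record URIs from the AppView's follower list
    verified_uris: set[str],   # URIs whose records the auditor fetched and verified
    known_uris: set[str],      # follows the auditor learned about independently
) -> AuditReport:
    listed = set(listed_uris)
    return AuditReport(
        unbacked=sorted(listed - verified_uris),
        missing=sorted(known_uris - listed),
        count_matches=(claimed_count == len(listed)),
    )
```

With the fuzzy aggregation described above, a takedown would show up here as `count_matches == False` until the cleanup process runs, so a mismatch alone is not evidence of manipulation.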

One area that is hard to audit is full removal and deletion of content. Why might content not be in an appview? Maybe the records failed to validate with one implementation, but not another? Maybe there was a network error or timeout? Or a legitimate non-public takedown of some kind? Hard to distinguish that from redaction or suppression of content with some other motivation, and hard to sort this out just using signatures.

To get specifically at your question: I don't think we are likely to include proof chains (eg, signatures and MST chains) in AppView responses, even as an optional request, as part of our APIs for app.bsky. Folks could certainly design and include this sort of thing in their own Lexicons. The AppView would need to do a lot of additional work to enable this, storing the full MSTs in hot storage, and responses would be large: proof chains are pretty big overhead per-record (as opposed to entire-repo). Requiring folks who want to do this kind of validation to do the additional fetches, or have a hot copy of the repository locally, doesn't seem like too big of a requirement, unless it was expected for everybody to do this all the time, which seems like a lot of work. This is kind of just an opinion and an efficiency/benefit trade-off though; I can see that somebody else might decide they really, really care about signatures and want to include them.
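To give a feel for what a per-record proof chain involves, here is a deliberately simplified sketch. It is not the real MST or commit format: nodes are plain byte strings that embed the SHA-256 hash of the item below them, and the signature check on the root is omitted. `verify_chain` and `proof_size` are hypothetical helpers.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_chain(record: bytes, nodes: list[bytes], root_hash: bytes) -> bool:
    """Check a simplified hash chain from a record up to a trusted root hash.

    `nodes` are ordered leaf-to-root. Each node must contain the hash of the
    item directly below it; the hash of the last node must equal `root_hash`,
    which in a real system would be covered by the repo's signature.
    """
    expected = sha256(record)
    for node in nodes:
        if expected not in node:
            return False
        expected = sha256(node)
    return expected == root_hash


def proof_size(nodes: list[bytes]) -> int:
    """Extra bytes a response would carry for this one record."""
    return sum(len(node) for node in nodes)
```

In this sketch every record in a response needs its own list of nodes, one per tree level, which is the per-record overhead mentioned above; a client holding a full copy of the repository would pay that cost once instead.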

There is a whole other thread of conversation about why we ever include signatures, what the actual motivation and benefits are for making content self-certifying, and whether that applies to aggregations like AppView responses, but this is already a pretty long post.

5 replies

goeo- Mar 2, 2024
Author

hmm counts seems tough, but i think it's very okay to have missing content not be auditable. re:

unless it was expected for everybody to do this all the time, which seems like a lot of work

i did mean that, at least at the pds level. every client doing it for everything may be a bit too much, but i think the pdses doing it could be reasonable

other than that, yeah i think it's okay to hit the relay. however the "(also a worry - how do you know what relay your app view uses?)" is a genuine concern, because for now i've hardcoded bsky.network and if my pds were to switch to a different app view that uses a different relay things may start breaking

goeo- Mar 2, 2024
Author

also

There is a whole other thread of conversation about why we ever include signatures, what the actual motivation and benefits are for making content self-certifying, and whether that applies to aggregations like AppView responses, but this is already a pretty long post.

very curious now actually. why must the relay not trust the pdses if the clients can trust the app view?

bnewbold Mar 3, 2024
Maintainer

I feel like if you are really validating it is a bit more idiomatic to hit the user's PDS directly, instead of the Relay? traffic load shouldn't be bad depending on how many folks are doing this.

The PDS doesn't (or soon won't) determine the appview, client apps will. And, generally, any Relay trying to crawl the whole network is expected to have virtually complete coverage. The fact that the PDS is configured to notify specific Relays is a bit of a red herring, and sort of a hack added to make on-boarding new PDS instances a smooth dev experience. We expect that any Relay will be able to discover all PDS instances in the network (for example, by dumping the full PDS directory, or asking an existing Relay for a full list, which is a public API). We expect big-world Relays to be basically commodity and interchangeable.

goeo- Mar 3, 2024
Author

hmm bsky.network is definitely not "trying to crawl the whole network" as it requires a discord account and a ticket right now - other than that, i'm only aware of one relay hosted by a friend and it doesn't have a good way of discovering new pdses other than relying on bsky.network (but i'm sure things will improve here as things open up)

re:

I feel like if you are really validating it is a bit more idiomatic to hit the user's PDS directly, instead of the Relay? traffic load shouldn't be bad depending on how many folks are doing this.

i would like to make this a common thing, really. if we have self certifying data might as well verify it? doing it on the pds rather than on the client (like i am with https://github.com/goeo-/public-transport/) is probably a better idea for scalability (but that wouldn't be possible if the client starts talking directly to the app view rather than going through the pds), but i really think i should be hitting a relay (or an app view) and not every pds, because like, we could really be using mastodon at that point?

imax9000 Mar 3, 2024

I feel like if you are really validating it is a bit more idiomatic to hit the user's PDS directly, instead of the Relay? traffic load shouldn't be bad depending on how many folks are doing this.

If enough people get serious about carefully selecting who can control their information flows, this will negate your design goal of a PDS being cheap to run, and query load will grow linearly with the number of followers.

We expect big-world Relays to be basically commodity and interchangeable.

This can't be true in the real world. If you reach any kind of meaningful scale, relays will be very expensive to run. Since you also explicitly rely on rich people with political agendas to finance them, each relay will be a special snowflake with its own kind of censorship.

Real-world example: web search engines. Only a few megacorps can afford to run their own independently of others. In the case of China, heavy censorship is applied as well.
