Replies: 6 comments 1 reply
-
Man this is soooooo awesome thanks! To date, I've never had anyone review this code.
I'll switch it to
Ideally we'd compare deviations from mean zapping amounts I think. Before a week ago, we only used zap counts and this was a lazy half step. Surprisingly it didn't seem to make much of a difference empirically. At least for the toy example, C's trust of A and B is turned into a probability distribution, so if I only trust A and B, then I trust each 50% regardless of how much they're zapping. Still, your point stands, the effect of this is that I trust people who zap the same amount as me more - which is maybe not what we want. (The probability distribution thing gives me more confidence in this fudgeness than I probably deserve to give it.)
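For readers following along, the distribution step mentioned above can be sketched like this. This is purely illustrative code on my part, not SN's implementation; the function and data names are hypothetical:

```python
# Minimal sketch of turning one user's raw trust scores into a
# probability distribution. Illustrative only, not SN's actual code;
# all names are hypothetical.

def normalize_trust(raw_trust):
    """Row-normalize one user's raw trust scores so they sum to 1."""
    total = sum(raw_trust.values())
    if total == 0:
        return {user: 0.0 for user in raw_trust}
    return {user: score / total for user, score in raw_trust.items()}

# If C only trusts A and B with equal raw scores, each ends up at 50%,
# regardless of how much either is zapping in absolute terms.
print(normalize_trust({"A": 21.0, "B": 21.0}))  # {'A': 0.5, 'B': 0.5}
```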
There's a control statement that prevents this from being negative. If it's negative, we just score the user at 0. Still, ideally that wouldn't be necessary.
Great suggestion. I've only considered creating a sliding window of time but not number of zaps.
Wooo this is great!
-
Will take a look at coming up with a sensible behavioral model.
-
I'm going to use this space to lay out thoughts on different options for WoT. Consider this open brainstorming. The current algorithm is fine so I'm not suggesting replacing it. But if some ideas come out of this that can improve SN's WoT, then great! I'll lay out different options and discuss what I see as pros and cons as I think about them, one post at a time. I'll focus on the theoretical underpinnings and not the implementation details. IMO, it's better to think about the theoretical aspects first, then determine afterwards whether implementation is feasible (and check for off-the-shelf / open-source implementations).
-
Collaborative Filtering

The main idea is to model the zap amount that user […]

The goal is to use data on zaps to estimate values for […]

How does it relate to Web of Trust?

Pros

Cons
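To make the collaborative-filtering idea concrete, here's a toy matrix-factorization sketch: approximate the (user, item) zap-amount matrix as a product of low-dimensional user and item vectors, learned by SGD, then predict unobserved zaps via dot products. Everything here (names, hyperparameters, data) is my own illustration, not from the post:

```python
import random

# Toy matrix factorization: fit observed zap amounts with dot products
# of learned user/item vectors. Illustrative sketch only.

def factorize(zaps, n_users, n_items, k=2, lr=0.01, epochs=500, seed=0):
    """Learn user/item vectors whose dot products fit observed zaps."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, amount in zaps:
            pred = sum(U[i][f] * V[j][f] for f in range(k))
            err = amount - pred
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] = u + lr * err * v
                V[j][f] = v + lr * err * u
    return U, V

# Observed zaps as (user, item, sats); unobserved pairs are predicted
# by the dot product of the learned vectors.
zaps = [(0, 0, 21), (0, 1, 10), (1, 0, 21), (2, 1, 42)]
U, V = factorize(zaps, n_users=3, n_items=2)
predicted = sum(U[2][f] * V[0][f] for f in range(2))  # user 2 on item 0
```

The appeal for WoT is that similarity falls out of the learned user vectors rather than being hand-designed; whether that maps cleanly onto trust is exactly the open question above.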
-
Reverse Engineering

Instead of starting with a behavioral model, we can start by listing the desired properties of the system and work our way backwards. Desired properties: […]

Based on these requirements, we could calculate the trust that user […]

In words: […]

Pros

You can show a few properties of […]

Cons

Unlike collaborative filtering, this is a bespoke solution and therefore requires more scrutiny. I think I found all the pros and cons, but it's possible there are some I missed.
-
Springing off this discussion, I also wanted to re-evaluate the suggestion of calculating trust off the "last N zaps". I no longer think that makes sense, and I think a time-based rolling window would be much better. The problem with "last N zaps" is that the set of visible items can differ drastically between users depending on the time range of the "last N zaps". If someone takes only one day to zap N times, their zaps will be concentrated on items posted in that day, whereas if someone takes one month to zap N times, their zaps will be distributed across items in that whole month. It would lead to an automatic low trust between these two users, even if their preferences are actually quite similar. Conclusion: rolling time window is more sensible than last N zaps.
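The rolling-window idea can be sketched as follows. The field names and the 30-day window are assumptions for illustration, not SN's actual schema:

```python
from datetime import datetime, timedelta

# Hedged sketch of a time-based rolling window: only zaps within the
# last WINDOW days feed the trust calculation, so users who zap at very
# different rates are still compared over the same set of recent items.
WINDOW = timedelta(days=30)  # hypothetical window length

def recent_zaps(zaps, now):
    """Keep only zaps inside the rolling window ending at `now`."""
    return [z for z in zaps if now - z["at"] <= WINDOW]

def overlap(zaps_a, zaps_b, now):
    """Items both users zapped inside the window (the basis for trust)."""
    items_a = {z["item"] for z in recent_zaps(zaps_a, now)}
    items_b = {z["item"] for z in recent_zaps(zaps_b, now)}
    return items_a & items_b

now = datetime(2024, 6, 1)
a = [{"item": 1, "at": now - timedelta(days=3)},
     {"item": 2, "at": now - timedelta(days=45)}]  # outside the window
b = [{"item": 1, "at": now - timedelta(days=10)}]
print(overlap(a, b, now))  # {1}
```

Unlike "last N zaps", both users' windows always cover the same span of items, which is exactly the property argued for above.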
-
After having done a deep dive on SN's Web of Trust Algorithm, I have some feedback. Please don't view this as criticism: I think the current algorithm does what it's supposed to do and is directionally correct. These thoughts are shared in the spirit of discussing how the existing algorithm can be improved to make it more robust. It's also possible I made some mistakes so let me know if you see anything you disagree with or that I got wrong!
These comments all apply to the calculation of the trust graph, because everything else is a fairly straightforward application of the TrustRank algorithm.
1. The current system is gameable.
I didn't mention this in my SN post, but the current system is actually pretty gameable. That's because other users' trust of me is calculated based on how much they zap posts that I zapped first.
But the issue is that the before/after calculation is based on the timestamp of the first zap. I can easily game the system by zapping 1 sat on an item early, waiting to see if it becomes popular, and if it does, zapping it a lot more. The timestamp associated with my action on the item will be very early, causing all the trust from other people's zaps to accrue to me.
2. Using the ratio of zap amounts leads to unintended scaling behavior.
Imagine three users, A, B, and C. They all agree with each other 100% of the time. However, users A and B always zap 21 sats, but user C always zaps 42 sats. Based on how "before" and "after" are calculated, the trust between C and the other users will be one half of the trust between A and B.
I understand the intent behind using the ratio of the zap amounts, but it seems like a person's baseline zapping behavior should be taken into consideration. I'm not sure what the solution is, but I think specifying an underlying behavioral model would help (see below).
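One possible direction (my assumption, not something proposed in the thread): normalize each zap by the user's own baseline before comparing, so a user's habitual zap size drops out. A minimal sketch:

```python
# Hypothetical fix for the scaling issue above: scale each user's zaps
# by their own mean, so baseline generosity is factored out. This is an
# assumption on my part, not SN's method.

def normalized_zaps(zaps):
    """Scale a user's zap amounts by their mean, so the baseline is 1.0."""
    mean = sum(zaps) / len(zaps)
    return [z / mean for z in zaps]

# A and B always zap 21 sats, C always zaps 42: after normalization
# every zap registers as 1.0, so trust between C and the others no
# longer halves just because C zaps bigger amounts.
print(normalized_zaps([21, 21]))  # [1.0, 1.0]
print(normalized_zaps([42, 42]))  # [1.0, 1.0]
```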
3. It's unclear how to interpret the confidence interval calculation.
I understand the intent behind using the binomial proportion confidence interval, but I don't think it's well specified. The issue here is that the binomial model assumes an integer number of successes and trials, both of which should be non-negative.
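For reference, here's a sketch of a Wilson-style lower bound on a binomial proportion (my own illustration; I haven't reproduced SN's exact calculation), which assumes non-negative successes and trials with successes ≤ trials:

```python
import math

# Wilson score interval lower bound for a binomial proportion. The
# clamping guard mirrors the "score the user at 0 if negative" control
# statement mentioned earlier in the thread; the exact quantities SN
# feeds in are not reproduced here.

def wilson_lower(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval at confidence level z."""
    successes = max(successes, 0)  # guard: successes can't be negative
    if trials <= 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2))
    return max((centre - margin) / denom, 0.0)

print(wilson_lower(8, 10))   # ≈ 0.49
print(wilson_lower(-3, 10))  # clamped to 0
```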
But `before - disagree` can be negative, and `b_total - after` may not be an integer. The calculation still works and is directionally correct, but it's hard to interpret what the resulting number actually means and whether or not it's well scaled.

4. For users with long histories, it'll be hard to overcome historical behavior.
I believe a user's entire history of zaps is currently used to calculate the trust graph. This makes it hard to overcome a long history of zapping behavior. It might make more sense to take the last $N$ item actions in a territory for each user when calculating the trust graph.
5. It's unclear what the underlying behavioral model is.
I think a lot of these problems can be solved by explicitly writing down an underlying behavioral model. It wouldn't have to be super realistic, it would just have to capture some basic features of user behavior. I'm not entirely sure what the model would look like, but I think it would involve at least the following ideas: