Commit 00741a4

rfc: address feedback
1 parent af39409 commit 00741a4

1 file changed: +16 -24 lines changed

rfcs/rework-index-updates.md

@@ -1,20 +1,17 @@
 # Summary

 This RFC proposes moving away from the current model of fetching new releases
-to build, moving from running [crates-index-diff] in a thread to using webhooks
-and the [crates-index] crate.
+to build, moving from using a timer to receiving webhooks.

 # Motivation

 While the current approach has worked well for us so far, it has some problems:

-* Running the update in a cronjob every 2 minutes is wasteful, as there is
+* Running the update in a timer every 2 minutes is wasteful, as there is
 often a greater delay between two publishes.
-* Running the update in a crobjob every 2 minutes adds delay to getting the
+* Running the update in a timer every 2 minutes adds delay to getting the
 documentation built if the queue is empty, as the release might potentially
 have to wait those extra two minutes.
-* The approach doesn't scale, if we want to move to a setup where there is more
-than a single frontend server we'd have to elect which server runs the fetch.
 * The way crates-index-diff stores its state (a branch in the local repo) is
 fragile, as it might become out of sync causing the loss of a publish.
 * The way crates-index-diff stores its state makes it hard to move the server
@@ -27,34 +24,29 @@ endpoint, `/_/index-webhook`, which starts a index sync in the background. The
 payload of the webhook is ignored, but the webhook signature is validated if a
 secret key is provided to the application through an environment variable.

-When an index synchronization starts, the [crates-index] crate is used to load
-in memory a list of all crates, their versions and whether each version is
-yanked. Then, the full list of releases and queued crates is fetched from the
-database, and it's compared with the contents of the index. Finally, idempotent
-queries are sent to the database to update its state (queueing crates and
-changing the yanked status) where needed.
+We also change [crate-index-diff] to store the hash of the last visited commit
+in the database instead of a local branch in the index repository: this will
+allow new instances to catch up immediately without the need of copying over
+the git repository.
+
+For this proposal to work we need to make the updates to the queue idempotent,
+and add a lock on the index repository in each machine to prevent the same
+machine from updating the same repository multiple times.

 # Rationale of the proposal

-This proposal removes the cronjob and implements realtime updates of the index,
+This proposal removes the timer and implements realtime updates of the index,
 which does not have to happen on a specific machine if we ever move to multiple
 frontend servers.

-This proposal also works if multiple index synchronizations start at the
-same time (for example, if two requests are received at the same time) without
-having to implement a job queue: since all the updates to the database are
-idempotent multiple syncs at the same time would not affect each other
-(provided we structure the SQL queries the right way). A single mutex on each
-host to lock `git fetch`es on the index might be needed though.
-
 # Alternatives

-We could implement only the webhook or the index synchronization, keeping the
-old code for the part we don't replace. While it would improve the status quo,
-it wouldn't address all the problems noted in the motivation.
+We could also switch from [crates-index-diff] to doing a full synchronization
+every time a new crate is published. While it would decrease the chances of an
+inconsistency between crates.io and docs.rs, it would impact performance every
+time a new crate is published.

 We could also do nothing: while the current system is not perfect it works
 without much trouble.

 [crates-index-diff]: https://crates.io/crates/crates-index-diff
-[crates-index]: https://crates.io/crates/crates-index
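
The added lines of the RFC combine three mechanisms: the hash of the last visited commit lives in the database, queue updates are idempotent, and a per-machine lock guards the index repository. The following is a minimal sketch of how they could fit together, not the docs.rs implementation. It uses only the Rust standard library, and every name in it (`Db`, `IndexCommit`, `Host`, `sync`) is hypothetical. The database is modelled as an in-memory struct and the index as a slice of commits.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for the shared database.
#[derive(Default)]
struct Db {
    /// Hash of the last index commit that was processed, if any.
    last_seen_commit: Option<String>,
    /// Queued (crate name, version) pairs. A set makes inserts idempotent.
    queue: BTreeSet<(String, String)>,
}

/// Hypothetical view of one index commit: its hash and the releases it published.
struct IndexCommit {
    hash: &'static str,
    releases: &'static [(&'static str, &'static str)],
}

/// Per-machine state: the lock guards the local clone of the index.
struct Host {
    index_lock: Mutex<()>,
}

impl Host {
    /// Runs one sync. Returns the number of newly queued releases, or `None`
    /// when another sync on this machine already holds the repository lock.
    fn sync(&self, history: &[IndexCommit], db: &Mutex<Db>) -> Option<usize> {
        let _guard = self.index_lock.try_lock().ok()?;
        let mut db = db.lock().unwrap();

        // Resume right after the commit recorded in the database. If the hash
        // is unknown (or nothing is recorded), walk the history from the start.
        let start = match &db.last_seen_commit {
            Some(hash) => history
                .iter()
                .position(|c| c.hash == hash.as_str())
                .map_or(0, |i| i + 1),
            None => 0,
        };

        let mut added = 0;
        for commit in &history[start..] {
            for (name, version) in commit.releases {
                // `insert` returns false for an existing entry, so replaying
                // a commit never queues the same release twice.
                if db.queue.insert((name.to_string(), version.to_string())) {
                    added += 1;
                }
            }
            db.last_seen_commit = Some(commit.hash.to_string());
        }
        Some(added)
    }
}
```

Because the position is read from the database rather than from a branch in the local repository, a freshly created `Host` resumes where the previous one stopped. In a real deployment the set insert would presumably become an idempotent SQL statement.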
