
Commit 3269915

discv5: remove topic stuff (for now)

This will be brought back later.

1 parent 21d1e95 commit 3269915

File tree

3 files changed: +11 -342 lines changed


discv5/discv5-rationale.md

Lines changed: 4 additions & 80 deletions
@@ -176,8 +176,7 @@ discovery mechanism must be chosen.
 Another reason for UDP is communication latency: participants in the discovery protocol
 must be able to communicate with a large number of other nodes within a short time frame
 to establish and maintain the neighbor set and must perform regular liveness checks on
-their neighbors. For the topic advertisement system, registrants collect tickets and must
-use them as soon as the ticket expires to place an ad in a topic queue.
+their neighbors.

 These protocol interactions are difficult to implement in a TCP setting where connections
 require multiple round-trips before application data can be sent and the connection
@@ -207,7 +206,7 @@ understandable while providing a distributed database that scales with the numbe
 participants. Our system also relies on the routing table to allow enumeration and random
 traversal of the whole network, i.e. all participants can be found. Most importantly,
 having a structured network with routing enables thinking about DHT 'address space' and
-'regions of address space'. These concepts are used to build the [topic-based node index].
+'regions of address space'.

 Kademlia is often criticized as a naive design with obvious weaknesses. We believe that
 most issues with simple Kademlia can be overcome by careful programming and the benefits
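The 'address space' discussed in this hunk is defined by Kademlia's XOR metric: discv5 measures the distance between two 256-bit node IDs as their bitwise XOR, and the position of the highest differing bit (`logdistance`) selects the k-bucket. A minimal sketch of the metric:

```python
# Kademlia XOR distance over node IDs, as used by the discv5 routing table.
def distance(a: int, b: int) -> int:
    return a ^ b

def logdistance(a: int, b: int) -> int:
    # Bucket index: position of the highest differing bit (0 means a == b).
    return (a ^ b).bit_length()

# Nodes sharing a long ID prefix are 'close': they fall into a low bucket.
a = 0b11010011
b = 0b11010010    # differs only in the last bit
far = 0b01010011  # differs in the first bit
assert distance(a, b) == 1
assert logdistance(a, b) == 1
assert logdistance(a, far) == 8
```

Because the metric is symmetric and unidirectional, a region of address space near a target ID is the same set of nodes for every observer, which is what makes 'regions of address space' a usable concept.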
@@ -219,8 +218,7 @@ The well-known 'sybil attack' is based on the observation that creating node ide
 essentially free. In any system using a measure of proximity among node identities, an
 adversary may place nodes close to a chosen node by generating suitable identities. For
 basic node discovery through network enumeration, the 'sybil attack' poses no significant
-challenge. Sybils are a serious issue for the topic-based node index, especially for
-topics provided by few participants, because the index relies on node distance.
+challenge.

 An 'eclipse attack' is usually based on generating sybil nodes with the goal of polluting
 the victim node's routing table. Once the table is overtaken, the victim has no way to
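To illustrate why identity creation is 'essentially free': placing a sybil that shares a short ID prefix with a victim takes only a geometric number of random tries (in the real protocol each try costs one keypair generation, since the node ID is derived from the public key). A toy sketch, with hypothetical helper names and uniformly random IDs standing in for derived ones:

```python
import random

# Grinding random 256-bit IDs until one shares an 8-bit prefix with the
# victim takes ~256 tries on average -- cheap enough to be 'free'.
def grind_sybil(victim: int, prefix_bits: int, rng: random.Random):
    tries = 0
    shift = 256 - prefix_bits
    while True:
        tries += 1
        candidate = rng.getrandbits(256)
        if candidate >> shift == victim >> shift:
            return candidate, tries

rng = random.Random(1)
victim = rng.getrandbits(256)
sybil, tries = grind_sybil(victim, 8, rng)
assert sybil >> 248 == victim >> 248
```

Each additional prefix bit doubles the expected work, so closeness to a target is exponentially expensive but never gated by anything other than computation.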
@@ -307,7 +305,7 @@ Go implementation shows that the handshake computation takes 500µs on a 2014-er
 using the default secp256k1/keccak256 identity scheme. That's a lot, but note the cost
 amortizes because nodes commonly exchange multiple packets. Subsequent packets in the same
 conversation can be decrypted and authenticated in just 2µs. The most common protocol
-interaction is a FINDNODE or TOPICQUERY request on an unknown node with 4 NODES responses.
+interaction is a FINDNODE request on an unknown node with 4 NODES responses.

 To put things into perspective: encryption and authentication in Discovery v5 is still a
 significant improvement over the authentication scheme used in Discovery v4, which
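The amortization claim in this hunk can be checked with quick arithmetic. The figures come from the surrounding text (roughly 500µs for the handshake computation, 2µs per subsequent packet), counting just the request and its four responses:

```python
# Back-of-the-envelope check of handshake cost amortization.
HANDSHAKE_US = 500  # one-time handshake computation, per the text above
PACKET_US = 2       # decrypt + authenticate each later packet

def avg_cost_per_packet(n_packets: int) -> float:
    # The first packet pays for the handshake; the rest are cheap.
    return (HANDSHAKE_US + PACKET_US * (n_packets - 1)) / n_packets

# FINDNODE to an unknown node answered by 4 NODES responses: 5 packets.
assert avg_cost_per_packet(5) == (500 + 4 * 2) / 5  # about 102µs per packet
assert avg_cost_per_packet(1) == 500.0
```

So even in the worst common case the per-packet crypto cost drops by roughly 5x versus a single-packet exchange, and keeps falling as the session is reused.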
@@ -342,79 +340,6 @@ disturb the operation of the protocol. Session keys per node-ID/IP generally pre
 replay across sessions. The `request-id`, mirrored in response packets, prevents replay of
 responses within a session.

-## The Topic Index
-
-Using FINDNODE queries with appropriately chosen targets, the entire DHT can be sampled by
-a random walk to find all other participants. When building a distributed application, it
-is often desirable to restrict the search to participants which provide a certain service.
-A simple solution to this problem would be to simply split up the network and require
-participation in many smaller application-specific networks. However, such networks are
-hard to bootstrap and also more vulnerable to attacks which could isolate nodes.
-
-The topic index provides discovery by provided service in a different way. Nodes maintain
-a single node table tracking their neighbors and advertise 'topics' on nodes found by
-randomly walking the DHT. While the 'global' topic index can be also spammed, it makes
-complete isolation a lot harder. To prevent nodes interested in a certain topic from
-finding each other, the entire discovery network would have to be overpowered.
-
-To make the index useful, searching for nodes by topic must be efficient regardless of the
-number of advertisers. This is achieved by estimating the topic 'radius', i.e. the
-percentage of all live nodes which are advertising the topic. Advertisement and search
-activities are restricted to a region of DHT address space around the topic's 'center'.
-
-We also want the index to satisfy another property: When a topic advertisement is placed,
-it should last for a well-defined amount of time. This ensures nodes may rely on their
-advertisements staying placed rather than worrying about keeping them alive.
-
-Finally, the index should consume limited resources. Just as the node table is limited in
-number and size of buckets, the size of the index data structure on each node is limited.
-
-### Why should advertisers wait?
-
-Advertisers must wait a certain amount of time before they can be registered. Enforcing
-this time limit prevents misuse of the topic index because any topic must be important
-enough to outweigh the cost of waiting. Imagine a group phone call: announcing the
-participants of the call using topic advertisement isn't a good use of the system because
-the topic exists only for a short time and will have very few participants. The waiting
-time prevents using the index for this purpose because the call might already be over
-before everyone could get registered.
-
-### Dealing with Topic Spam
-
-Our model is based on the following assumptions:
-
-- Anyone can place their own advertisements under any topics and the rate of placing ads
-  is not limited globally. The number of active ads for any node is roughly proportional
-  to the resources (network bandwidth, mostly) spent on advertising.
-- Honest actors whose purpose is to connect to other honest actors will spend an adequate
-  amount of efforts on registering and searching for ads, depending on the rate of newly
-  established connections they are targeting. If the given topic is used only by honest
-  actors, a few registrations per minute will be satisfactory, regardless of the size of
-  the subnetwork.
-- Dishonest actors may want to place an excessive amount of ads just to disrupt the
-  discovery service. This will reduce the effectiveness of honest registration efforts by
-  increasing the topic radius and/or topic queue waiting times. If the attacker(s) can
-  place a comparable amount or more ads than all honest actors combined then the rate of
-  new (useful) connections established throughout the network will reduce proportionally
-  to the `honest / (dishonest + honest)` registration rates.
-
-This adverse effect can be countered by honest actors increasing their registration and
-search efforts. Fortunately, the rate of established connections between them will
-increase proportionally both with increased honest registration and search efforts. If
-both are increased in response to an attack, the required factor of increased efforts from
-honest actors is proportional to the square root of the attacker's efforts.
-
-### Detecting a useless registration attack
-
-In the case of a symmetrical protocol, where nodes are both searching and advertising
-under the same topic, it is easy to detect when most of the found ads turn out to be
-useless and increase both registration and query frequency. It is a bit harder but still
-possible with asymmetrical (client-server) protocols, where only clients can easily detect
-useless registrations, while advertisers (servers) do not have a direct way of detecting
-when they should increase their advertising efforts. One possible solution is for servers
-to also act as clients just to test the server capabilities of other advertisers. It is
-also possible to implement a feedback system between trusted clients and servers.
-
 # References

 - Petar Maymounkov and David Mazières.
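The square-root claim in the removed spam section can be sanity-checked with a toy model (entirely illustrative, not from the spec): take the useful-connection rate as honest search effort times the honest fraction of ads, scale both honest efforts by a factor `f`, and solve for the `f` that restores the baseline rate.

```python
import math

# Toy model: connection rate = search effort * honest fraction of ads.
# 'effort' scales both honest registration and search; 'attack' is the
# spam ad rate in units of the baseline honest registration rate.
def connection_rate(effort: float, attack: float) -> float:
    return effort * (effort / (effort + attack))

def effort_to_restore(attack: float) -> float:
    # Solve f^2 / (f + attack) = 1  ->  f = (1 + sqrt(1 + 4*attack)) / 2
    return (1 + math.sqrt(1 + 4 * attack)) / 2

assert connection_rate(1.0, 0.0) == 1.0   # no attack: baseline rate
f = effort_to_restore(10_000.0)
assert abs(connection_rate(f, 10_000.0) - 1.0) < 1e-9
assert abs(f / math.sqrt(10_000.0) - 1.0) < 0.01  # f ~ sqrt(attack)
```

For large attacks the `f + attack` denominator is dominated by `attack`, so the restoring factor grows like `sqrt(attack)` rather than linearly, which is the asymmetry the removed text relies on.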
@@ -451,5 +376,4 @@ also possible to implement a feedback system between trusted clients and servers
 <https://eprint.iacr.org/2018/236.pdf>

 [wire protocol]: ./discv5-wire.md
-[topic-based node index]: ./discv5-theory.md#topic-advertisement
 [node records]: ../enr.md

discv5/discv5-theory.md

Lines changed: 2 additions & 195 deletions
@@ -191,13 +191,13 @@ pending when WHOAREYOU is received, as in the following example:

     A -> B FINDNODE
     A -> B PING
-    A -> B TOPICQUERY
+    A -> B TALKREQ
     A <- B WHOAREYOU (nonce references PING)

 When this happens, all buffered requests can be considered invalid (the remote end cannot
 decrypt them) and the packet referenced by the WHOAREYOU `nonce` (in this example: PING)
 must be re-sent as a handshake. When the response to the re-sent is received, the new
-session is established and other pending requests (example: FINDNODE, TOPICQUERY) may be
+session is established and other pending requests (example: FINDNODE, TALKREQ) may be
 re-sent.

 Note that WHOAREYOU is only ever valid as a response to a previously sent request. If
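The buffering rule in this hunk can be sketched as follows (an illustrative structure, not the reference implementation): the request referenced by the WHOAREYOU nonce becomes the handshake message, and the other in-flight requests are held for re-sending once the new session exists.

```python
# Sketch of pending-request handling around a WHOAREYOU challenge.
class Session:
    def __init__(self):
        self.pending = {}  # nonce -> message, in send order

    def send(self, nonce: str, message: str):
        self.pending[nonce] = message

    def on_whoareyou(self, nonce: str):
        # The referenced request is re-sent as the handshake message.
        handshake_msg = self.pending.pop(nonce)
        # All other buffered requests are invalid until a session exists;
        # queue them for re-send after the handshake completes.
        resend_later = list(self.pending.values())
        self.pending.clear()
        return handshake_msg, resend_later

s = Session()
s.send("n1", "FINDNODE")
s.send("n2", "PING")
s.send("n3", "TALKREQ")
handshake, queued = s.on_whoareyou("n2")
assert handshake == "PING"
assert queued == ["FINDNODE", "TALKREQ"]
```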
@@ -334,196 +334,6 @@ the distance to retrieve more nodes from adjacent k-buckets on `B`:
 Node `A` now sorts all received nodes by distance to the lookup target and proceeds by
 repeating the lookup procedure on another, closer node.

-## Topic Advertisement
-
-The topic advertisement subsystem indexes participants by their provided services. A
-node's provided services are identified by arbitrary strings called 'topics'. A node
-providing a certain service is said to 'place an ad' for itself when it makes itself
-discoverable under that topic. Depending on the needs of the application, a node can
-advertise multiple topics or no topics at all. Every node participating in the discovery
-protocol acts as an advertisement medium, meaning that it accepts topic ads from other
-nodes and later returns them to nodes searching for the same topic.
-
-### Topic Table
-
-Nodes store ads for any number of topics and a limited number of ads for each topic. The
-data structure holding advertisements is called the 'topic table'. The list of ads for a
-particular topic is called the 'topic queue' because it functions like a FIFO queue of
-limited length. The image below depicts a topic table containing three queues. The queue
-for topic `T₁` is at capacity.
-
-![topic table](./img/topic-queue-diagram.png)
-
-The queue size limit is implementation-defined. Implementations should place a global
-limit on the number of ads in the topic table regardless of the topic queue which contains
-them. Reasonable limits are 100 ads per queue and 50000 ads across all queues. Since ENRs
-are at most 300 bytes in size, these limits ensure that a full topic table consumes
-approximately 15MB of memory.
-
-Any node may appear at most once in any topic queue, that is, registration of a node which
-is already registered for a given topic fails. Implementations may impose other
-restrictions on the table, such as restrictions on the number of IP-addresses in a certain
-range or number of occurrences of the same node across queues.
-
-### Tickets
-
-Ads should remain in the queue for a constant amount of time, the `target-ad-lifetime`. To
-maintain this guarantee, new registrations are throttled and registrants must wait for a
-certain amount of time before they are admitted. When a node attempts to place an ad, it
-receives a 'ticket' which tells them how long they must wait before they will be accepted.
-It is up to the registrant node to keep the ticket and present it to the advertisement
-medium when the waiting time has elapsed.
-
-The waiting time constant is:
-
-    target-ad-lifetime = 15min
-
-The assigned waiting time for any registration attempt is determined according to the
-following rules:
-
-- When the table is full, the waiting time is assigned based on the lifetime of the oldest
-  ad across the whole table, i.e. the registrant must wait for a table slot to become
-  available.
-- When the topic queue is full, the waiting time depends on the lifetime of the oldest ad
-  in the queue. The assigned time is `target-ad-lifetime - oldest-ad-lifetime` in this
-  case.
-- Otherwise the ad may be placed immediately.
-
-Tickets are opaque objects storing arbitrary information determined by the issuing node.
-While details of encoding and ticket validation are up to the implementation, tickets must
-contain enough information to verify that:
-
-- The node attempting to use the ticket is the node which requested it.
-- The ticket is valid for a single topic only.
-- The ticket can only be used within the registration window.
-- The ticket can't be used more than once.
-
-Implementations may choose to include arbitrary other information in the ticket, such as
-the cumulative wait time spent by the advertiser. A practical way to handle tickets is to
-encrypt and authenticate them with a dedicated secret key:
-
-    ticket      = aesgcm_encrypt(ticket-key, ticket-nonce, ticket-pt, '')
-    ticket-pt   = [src-node-id, src-ip, topic, req-time, wait-time, cum-wait-time]
-    src-node-id = node ID that requested the ticket
-    src-ip      = IP address that requested the ticket
-    topic       = the topic that ticket is valid for
-    req-time    = absolute time of REGTOPIC request
-    wait-time   = waiting time assigned when ticket was created
-    cum-wait    = cumulative waiting time of this node
-
-### Registration Window
-
-The image below depicts a single ticket's validity over time. When the ticket is issued,
-the node keeping it must wait until the registration window opens. The length of the
-registration window is 10 seconds. The ticket becomes invalid after the registration
-window has passed.
-
-![ticket validity over time](./img/ticket-validity.png)
-
-Since all ticket waiting times are assigned to expire when a slot in the queue opens, the
-advertisement medium may receive multiple valid tickets during the registration window and
-must choose one of them to be admitted in the topic queue. The winning node is notified
-using a [REGCONFIRMATION] response.
-
-Picking the winner can be achieved by keeping track of a single 'next ticket' per queue
-during the registration window. Whenever a new ticket is submitted, first determine its
-validity and compare it against the current 'next ticket' to determine which of the two is
-better according to an implementation-defined metric such as the cumulative wait time
-stored in the ticket.
-
-### Advertisement Protocol
-
-This section explains how the topic-related protocol messages are used to place an ad.
-
-Let us assume that node `A` provides topic `T`. It selects node `C` as advertisement
-medium and wants to register an ad, so that when node `B` (who is searching for topic `T`)
-asks `C`, `C` can return the registration entry of `A` to `B`.
-
-Node `A` first attempts to register without a ticket by sending [REGTOPIC] to `C`.
-
-    A -> C REGTOPIC [T, ""]
-
-`C` replies with a ticket and waiting time.
-
-    A <- C TICKET [ticket, wait-time]
-
-Node `A` now waits for the duration of the waiting time. When the wait is over, `A` sends
-another registration request including the ticket. `C` does not need to remember its
-issued tickets since the ticket is authenticated and contains enough information for `C`
-to determine its validity.
-
-    A -> C REGTOPIC [T, ticket]
-
-Node `C` replies with another ticket. Node `A` must keep this ticket in place of the
-earlier one, and must also be prepared to handle a confirmation call in case registration
-was successful.
-
-    A <- C TICKET [ticket, wait-time]
-
-Node `C` waits for the registration window to end on the queue and selects `A` as the node
-which is registered. Node `C` places `A` into the topic queue for `T` and sends a
-[REGCONFIRMATION] response.
-
-    A <- C REGCONFIRMATION [T]
-
-### Ad Placement And Topic Radius
-
-Since every node may act as an advertisement medium for any topic, advertisers and nodes
-looking for ads must agree on a scheme by which ads for a topic are distributed. When the
-number of nodes advertising a topic is at least a certain percentage of the whole
-discovery network (rough estimate: at least 1%), ads may simply be placed on random nodes
-because searching for the topic on randomly selected nodes will locate the ads quickly enough.
-
-However, topic search should be fast even when the number of advertisers for a topic is
-much smaller than the number of all live nodes. Advertisers and searchers must agree on a
-subset of nodes to serve as advertisement media for the topic. This subset is simply a
-region of the node ID address space, consisting of nodes whose Kademlia address is within a
-certain distance to the topic hash `sha256(T)`. This distance is called the 'topic
-radius'.
-
-Example: for a topic `f3b2529e...` with a radius of 2^240, the subset covers all nodes
-whose IDs have prefix `f3b2...`. A radius of 2^256 means the entire network, in which case
-advertisements are distributed uniformly among all nodes. The diagram below depicts a
-region of the address space with topic hash `t` in the middle and several nodes close to
-`t` surrounding it. Dots above the nodes represent entries in the node's queue for the
-topic.
-
-![diagram explaining the topic radius concept](./img/topic-radius-diagram.png)
-
-To place their ads, participants simply perform a random walk within the currently
-estimated radius and run the advertisement protocol by collecting tickets from all nodes
-encountered during the walk and using them when their waiting time is over.
-
-### Topic Radius Estimation
-
-Advertisers must estimate the topic radius continuously in order to place their ads on
-nodes where they will be found. The radius mustn't fall below a certain size because
-restricting registration to too few nodes leaves the topic vulnerable to censorship and
-leads to long waiting times. If the radius were too large, searching nodes would take too
-long to find the ads.
-
-Estimating the radius uses the waiting time as an indicator of how many other nodes are
-attempting to place ads in a certain region. This is achieved by keeping track of the
-average time to successful registration within segments of the address space surrounding
-the topic hash. Advertisers initially assume the radius is 2^256, i.e. the entire network.
-As tickets are collected, the advertiser samples the time it takes to place an ad in each
-segment and adjusts the radius such that registration at the chosen distance takes
-approximately `target-ad-lifetime / 2` to complete.
-
-## Topic Search
-
-Finding nodes that provide a certain topic is a continuous process which reads the content
-of topic queues inside the approximated topic radius. This is a much simpler process than
-topic advertisement because collecting tickets and waiting on them is not required.
-
-To find nodes for a topic, the searcher generates random node IDs inside the estimated
-topic radius and performs Kademlia lookups for these IDs. All (intermediate) nodes
-encountered during lookup are asked for topic queue entries using the [TOPICQUERY] packet.
-
-Radius estimation for topic search is similar to the estimation procedure for
-advertisement, but samples the average number of results from TOPICQUERY instead of
-average time to registration. The radius estimation value can be shared with the
-registration algorithm if the same topic is being registered and searched for.

 [EIP-778]: ../enr.md
 [identity scheme]: ../enr.md#record-structure
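The removed waiting-time rules lend themselves to a compact sketch. The constants are the ones from the deleted text (`target-ad-lifetime = 15min`, 100 ads per queue, 50000 across all queues); the structure and names are illustrative:

```python
import time
from collections import deque

TARGET_AD_LIFETIME = 15 * 60  # seconds, per the removed spec text
QUEUE_LIMIT = 100             # suggested per-queue limit
TABLE_LIMIT = 50_000          # suggested global limit

class TopicTable:
    """Sketch of the removed topic-table rules: one FIFO queue per topic,
    waiting times derived from the lifetime of the oldest ad."""
    def __init__(self, now=time.time):
        self.now = now
        self.queues = {}  # topic -> deque of (node_id, registration_time)
        self.total = 0    # ads across all queues

    def waiting_time(self, topic: str) -> float:
        q = self.queues.setdefault(topic, deque())
        if self.total >= TABLE_LIMIT:
            # Table full: wait for a slot anywhere in the table to free up.
            oldest = min(t for queue in self.queues.values() for _, t in queue)
            return max(0.0, TARGET_AD_LIFETIME - (self.now() - oldest))
        if len(q) >= QUEUE_LIMIT:
            # Queue full: target-ad-lifetime minus the oldest ad's lifetime.
            oldest_lifetime = self.now() - q[0][1]
            return max(0.0, TARGET_AD_LIFETIME - oldest_lifetime)
        return 0.0  # a slot is free: register immediately

clock = [0.0]
table = TopicTable(now=lambda: clock[0])
assert table.waiting_time("T1") == 0.0
# Fill the queue for T1; the oldest ad is 5 minutes old at query time.
table.queues["T1"] = deque((f"node{i}", 0.0) for i in range(QUEUE_LIMIT))
table.total = QUEUE_LIMIT
clock[0] = 5 * 60
assert table.waiting_time("T1") == 10 * 60  # target 15min - oldest 5min
```

Because a slot always opens exactly when the oldest ad reaches `target-ad-lifetime`, the assigned waiting times are precisely the times at which registrations can next succeed, which is what makes ticket expiry times meaningful.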
@@ -532,6 +342,3 @@ registration algorithm if the same topic is being registered and searched for.
 [PING]: ./discv5-wire.md#ping-request-0x01
 [PONG]: ./discv5-wire.md#pong-response-0x02
 [FINDNODE]: ./discv5-wire.md#findnode-request-0x03
-[REGTOPIC]: ./discv5-wire.md#regtopic-request-0x07
-[REGCONFIRMATION]: ./discv5-wire.md#regconfirmation-response-0x09
-[TOPICQUERY]: ./discv5-wire.md#topicquery-request-0x0a
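The removed ticket scheme relies on stateless validation by the issuer. The deleted text suggests AES-GCM with a dedicated secret key; to stay dependency-free, this sketch substitutes HMAC-SHA256, which gives the issuer the same stateless validity check but, unlike AES-GCM, does not hide the ticket contents. Field names follow the deleted `ticket-pt` layout; the encoding is illustrative:

```python
import hmac, hashlib, json, base64

# Per-node secret; never shared. Stands in for the spec's ticket-key.
TICKET_KEY = b"issuer-local-secret"

def issue_ticket(fields: dict) -> bytes:
    # Authenticate the encoded fields so the issuer can later verify the
    # ticket without remembering it. (The spec's AES-GCM would also
    # encrypt; HMAC here only authenticates.)
    body = json.dumps(fields, sort_keys=True).encode()
    tag = hmac.new(TICKET_KEY, body, hashlib.sha256).digest()
    return base64.b64encode(tag + body)

def validate_ticket(ticket: bytes) -> dict:
    raw = base64.b64decode(ticket)
    tag, body = raw[:32], raw[32:]
    expected = hmac.new(TICKET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("forged or corrupted ticket")
    return json.loads(body)

t = issue_ticket({"src-node-id": "node-a", "topic": "T",
                  "req-time": 1000, "wait-time": 600, "cum-wait": 600})
assert validate_ticket(t)["wait-time"] == 600
```

Checking the source node ID, topic, and registration window against the authenticated fields is then a pure function of the ticket plus the current request, which is why node `C` in the removed protocol never has to remember issued tickets.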
