Peer Discovery in Harmony Network
Apr 06 2019
Peer discovery is one of the essential mechanisms in a decentralized network for new nodes to join the network. A blockchain network is a decentralized network with thousands of nodes. And it is not uncommon for nodes to join or leave the network at any time. In this post, we review the peer discovery mechanism in Bitcoin and Ethereum blockchain network in details.
Here, we present the peer discovery protocol in Harmony blockchain based on our current design in the whitepaper.
Peers discovery in BTC network
The primary way to discover peers in the bitcoin network is to connect to a list of BTC nodes that are previously connected. However, for the initial connection, the node has to use a publicly known DNS feed to retrieve a list of IP addresses of long-running stable nodes. The fallback mechanism is to use a list of hard-coded IP addresses in the Bitcoin node software. For example, seed.bitcoinstats.com domain name resolves to a whole list of IP addresses. http://bitcoinstats.com/network/dns-servers/ maintains a list of DNS servers run by volunteers.
The bitcoinstats.com domain is registered by Dynadot, LLC on January 15th, 2011. The following example lists a few well-known bitcoin nodes.
Every bitcoin node has to handle a “getaddr” message. The node will respond to the “getaddr” message with a 23% of randomly chosen IP addresses known by the node based on their recentness in the last 3 hours. And the maximum is 2,500 IP addresses. Theoretically, a node may know all of the IP addresses of all the nodes in the bitcoin network by repeated querying its neighbors. Bitcoin already addressed this issue by only sending one “getaddr” response per connection.
New bitcoin node first establishes a TCP connection on port 8333. It starts a “handshake” process by transmitting a “version” message, which contains basic identifying information, including: protocol_version, local service, nTime, addrYou, addrMe, subversion, startingHeight, etc. (Github: https://github.com/bitcoin/bitcoin/blob/e74649e95122c9c61aadf607461cf701c3953f88/src/net_processing.cpp#L334). The peer node responds with “verack” to acknowledge and establish a connection, and optionally sends its own “version” message to reciprocate the connection and connect back as a peer. Once one or more connections are established, the new node will send an addr message containing its own IP address to its neighbors. The neighbors will, in turn, forward the addr message to their neighbors, ensuring that the newly connected node becomes well known and better connected.
Peer discovery in ETH network
Peer discovery in Ethereum network uses a similar mechanism as in the Bitcoin network. There are a few hardcoded bootnodes in the Ethereum node software, for example, https://github.com/ethereum/go-ethereum/blob/master/params/bootnodes.go#L23-L31
The peer discovery algorithm in Ethereum is based on the KademliaDistributed Hash Table (DHT) protocol. Different from the original Kademlia, it uses only the “PING” and “FIND_NODE” RPCs of Kademlia protocol on a 256 bits of IDs. Every node maintains a hash table of neighbor nodes using the XOR metric. Node discovery protocol v4 provides details on the Kademila-like DHT that stores information about Ethereum nodes. The Kademlia structure was chosen because it yields a topology of low diameter. Rchain has a good wiki write-up about the ETH node discovery protocol at here.
Similarly, RapidChain claims to use the Kademlia routing mechanism for inter-committee communication. That is probably useful to avoid leader-based or client-driven cross-shard transaction.
Peer Discovery in Harmony Network
As with Bitcoin and Ethereum, peer discovery is needed when new nodes are trying to connect to the Harmony Network.
There are two scenarios. For initial connection, the new node has to use DNSfeed to get a list of IP addresses of well-known nodes to join the network. For re-join, the node may use a list of previously connected nodes to start.
To connect to peers, nodes have to do a “handshake”. The handshake is a PING message. The ping message includes the protocol version, IP address of the node, public key (256 bits of Hash value). The peer node responds with a PONG message to confirm the connection with requester and send back a list of other peers as well. The peer node may also send a PING message to create a connection with the new node. A question here is how to avoid every new node trying to connect to the same peer node. The answer is peer node may reject the connection after responding with the pong message if the peer connection exceeds a certain number, say, 20. Eventually, each node in the network will maintain 20 concurrent connections to neighbor nodes, and they form the p2p network. Another unanswered question here is how to manage the p2p network within one committee, instead of a global level p2p network.
Since Harmony is a sharded blockchain, new nodes will be placed into an active node list at first after finishing a PoS verification via the beacon chain. The verification process starts when a new node create a transaction to deposit some coins to system accounts. The transaction is routed to the beacon chain after the new node establishes the first connection. Beacon chain nodes run consensus and verified the transaction to approve it. Once the transaction is approved, the new node can join in the network in a waiting list. The beacon chain will run VRF/VDF to generate unbiased randomness in order to do committee configuration. The re-sharding algorithm will place new nodes into different committee based on the randomness and the Bounded Cuckoo Rules. There is a higher probability for new nodes to join the committee with more nodes according to the Cuckoo Rules, thus to avoid the attack on committees with fewer number of nodes.
For testnet launch, since we mainly care about spot instance offline and rejoin, we may implement a simple hash algorithm to place the node into committee. We will simplify the entire process. We will use a separate node to represent the beacon chain. The leader nodes launches and contacts the beacon chain. For the first N nodes contacting the beacon chain, they are used as the leaders in each committee.