Breaking changes to extend support for other networks and address formats

ol · Dec 11, 2018

I decided to continue work on a modular network backend for the Dash Core software. The idea is to provide a modular framework that allows Dash nodes to communicate using different network protocols. New protocols can be added as modules for this framework. The initial goal is to move the current TCP-based protocol into a module and then write a module for I2P, an anonymous overlay network.

I've prepared a pre-proposal for the Dash governance system where I describe the rationale for this project in more detail. But to proceed further with this project, some possibly breaking changes to the Dash protocol are needed to support address formats other than the IPv6 address and port that are currently used. For example, an I2P address is a SHA2-256 hash.

So, I would like to start a discussion about what exactly these changes are and whether the Dash Core team is willing to accept them into the Dash Core software.

I've identified the following places in the Dash protocol/software where a TCP address/port is serialised/deserialised. Please let me know if I missed something.

"version" message.
It contains two addresses:

address of message receiver as seen from message sender; this is used to adjust scores of externally visible node addresses to decide which ones to advertise;

address of sender; this is not used anymore because it can leak private IP addresses; zero address (::) is always sent.

The format of this message can easily be extended in a compatible way. As this message is used to advertise the address of an endpoint of a connection that was already established using some particular network backend, we can use the address format specific to that backend. If the connection is established using the existing TCP-based protocol, addresses have the same TCP address/port format as they have now. For I2P, addresses will be 256-bit SHA2-256 hashes.

"addr" message.
This message is used to advertise addresses of other known nodes.
It contains the number of addresses (in compact size format) followed by the address records.
Each address record contains:

timestamp;
bitmask of flags describing services provided by the node;
128-bit IPv6 address;
16-bit TCP port number.
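
For reference, the current record layout can be sketched as below. This is only an illustration: the field widths (32-bit timestamp, 64-bit service flags) and byte orders are those of the Bitcoin-derived wire format and should be checked against the Dash Core source; the function names are hypothetical.

```python
import ipaddress
import struct

def pack_addr_record(timestamp, services, ip, port):
    """Serialise one legacy "addr" record: 4 + 8 + 16 + 2 = 30 octets."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # IPv4 addresses travel in IPv6-mapped form (::ffff:a.b.c.d).
        addr = ipaddress.IPv6Address("::ffff:" + str(addr))
    # Timestamp and service flags are little-endian; the port is big-endian.
    return (struct.pack("<IQ", timestamp, services)
            + addr.packed
            + struct.pack(">H", port))

def unpack_addr_record(data):
    """Parse one 30-octet legacy record back into its four fields."""
    timestamp, services = struct.unpack_from("<IQ", data, 0)
    addr = ipaddress.IPv6Address(data[12:28])
    (port,) = struct.unpack_from(">H", data, 28)
    return timestamp, services, addr, port
```

The fixed 18-octet address/port tail is the part that leaves no room for a backend label or for addresses of another size.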

The format of this message needs to be changed to accommodate a network backend label and a backend-specific address. See below for discussion of the universal address serialisation format.
I see two possibilities here:

just change address format to universal one starting with some protocol version and read address in old format from nodes that advertise older protocol version;
add a new message type for advertising extended addresses.

I'd prefer the first solution, but I'd like to know if there are other opinions.

Masternode addresses.

Currently there is a requirement for a masternode to have a public routable IPv4 address. There are multiple places in the protocol where this address is sent (though still in IPv6-mapped format). I don't expect this requirement to be lifted right now, but it would be beneficial to be prepared for further extension. Masternodes that have multiple addresses in different networks would significantly improve the resilience of the Dash network.

It makes no sense to change the address format in the legacy code that implements pre-deterministic masternode functionality, because it will be removed soon. Hence, I'd like to concentrate on the functionality that implements deterministic masternodes. And there is one big place where the masternode address is used: the TRANSACTION_PROVIDER_REGISTER transaction type.

Currently a TRANSACTION_PROVIDER_REGISTER transaction contains a single IPv6(-mapped) address and TCP port. This can be extended to contain a vector of addresses for further extension. For now, a check can be added that enforces that this vector contains a single IPv6-mapped IPv4 address.
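
A minimal sketch of such a transitional check, assuming the vector is represented as a list of (16-octet address, port) pairs. The function name and the representation are hypothetical, and the routability check that masternodes also need is left out.

```python
import ipaddress

def check_masternode_addresses(addresses):
    """Accept only a vector holding exactly one IPv6-mapped IPv4 address.

    addresses: list of (ipv6_bytes, port) tuples (hypothetical representation).
    """
    if len(addresses) != 1:
        return False
    raw, port = addresses[0]
    if len(raw) != 16 or not 0 < port < 65536:
        return False
    # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
    return ipaddress.IPv6Address(raw).ipv4_mapped is not None
```

Lifting the restriction later would then mean relaxing this one check rather than changing the transaction format again.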

I see the following possibilities for introducing an extended address vector into the TRANSACTION_PROVIDER_REGISTER transaction.

switch format with a bump of transaction version;
introduce another transaction type.

I'd prefer the first solution, but I'd like to know if there are other opinions.

File formats.

There are several data files that contain addresses. The format of these files has to be extended in an incompatible way. Files with serialised data have a version in their header, so a new version has to be introduced. After a software upgrade, a file of an old version will still be read in a compatible way, but it will be saved using the new version and extended addresses.

The files are the following.
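
The read-old, write-new behaviour could look roughly like this. The layout here is purely illustrative (a 32-bit version header followed by address entries) and is not the real format of any of the files in question, which also carry other data such as checksums; the version numbers and the backend label are hypothetical.

```python
import struct

OLD_VERSION, NEW_VERSION = 1, 2   # hypothetical version numbers
TCP_BACKEND = 1                   # hypothetical backend label

def load_addresses(blob):
    """Read either file version into a list of (label, value) pairs."""
    (version,) = struct.unpack_from("<I", blob, 0)
    pos, out = 4, []
    if version == OLD_VERSION:
        # Old layout: fixed 18-octet entries (IPv6 address + port).
        while pos < len(blob):
            out.append((TCP_BACKEND, blob[pos:pos + 18]))
            pos += 18
    elif version == NEW_VERSION:
        # New layout: 32-bit label, 1-octet length, raw value.
        while pos < len(blob):
            label, length = struct.unpack_from("<IB", blob, pos)
            pos += 5
            out.append((label, blob[pos:pos + length]))
            pos += length
    else:
        raise ValueError("unsupported file version %d" % version)
    return out

def save_addresses(entries):
    """Always write the new version, whatever version was loaded."""
    blob = struct.pack("<I", NEW_VERSION)
    for label, value in entries:
        blob += struct.pack("<IB", label, len(value)) + value
    return blob
```

An old node would reject the rewritten file because of the unknown version, which is why the change is incompatible in the downgrade direction only.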

banlist.dat — banned addresses and subnets;
peers.dat — known nodes;
mncache.dat — known masternodes;
netfulfilled.dat — fulfilled synchronisation requests; this file contains no version; it can be probably just removed on upgrade.

There is also the evodb file, but it's in LevelDB format. Probably, new keys should be added to it on upgrade.

Universal address serialisation format.

The whole idea of the modular network backend is to be able to have multiple backends and to add new backends without bumping the protocol version number. Hence, it's possible that some nodes will announce addresses that other nodes don't understand. In this case, nodes should just ignore addresses they don't know how to handle (but they may still relay these addresses in "addr" messages).

I propose that the universal address serialisation use TLV (type, length, value) format.

Type is a numerical label that is assigned to a network backend by a central registry (the Dash Core team). The size of this label is up for discussion, but I think that 32 bits should be enough. There should also be a range allocated for experimental (unstable) extensions.
Length is necessary because a network backend label (and its address size) can be unknown to a node, but the node should still be able to handle it. Again, the size of this field is up for discussion. Probably, 16 bits will be enough. Or, even better, the compact size format can be used: it occupies 1 octet for sizes up to 252 octets (2016 bits).
Value is raw backend-specific data. For the TCP-based backend it will be 18 octets containing a 128-bit IPv6 address and a 16-bit TCP port. For the I2P backend it will be 32 octets containing a SHA2-256 hash.
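
The TLV layout described above can be sketched as follows, with a 32-bit type and a compact size length. The label values, the little-endian byte order of the label and all function names are assumptions made for illustration; nothing here is an agreed format.

```python
import struct

TCP_BACKEND = 1   # hypothetical labels; real values would come from the registry
I2P_BACKEND = 2
KNOWN_SIZES = {TCP_BACKEND: 18, I2P_BACKEND: 32}

def write_compact_size(n):
    """Compact size: 1 octet below 253, otherwise a marker plus 2, 4 or 8 octets."""
    if n < 253:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def read_compact_size(data, pos):
    """Return (value, position just after the encoded size)."""
    first = data[pos]
    if first < 253:
        return first, pos + 1
    fmt, width = {253: ("<H", 2), 254: ("<I", 4), 255: ("<Q", 8)}[first]
    (n,) = struct.unpack_from(fmt, data, pos + 1)
    return n, pos + 1 + width

def encode_address(label, value):
    return struct.pack("<I", label) + write_compact_size(len(value)) + value

def decode_addresses(data):
    """Split a run of TLV records into (known, unknown) lists of (label, value)."""
    pos, known, unknown = 0, [], []
    while pos < len(data):
        (label,) = struct.unpack_from("<I", data, pos)
        length, pos = read_compact_size(data, pos + 4)
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated address record")
        pos += length
        expected = KNOWN_SIZES.get(label)
        if expected is None:
            # Unknown backend: skipped safely thanks to the length field;
            # kept separately so that it can still be relayed.
            unknown.append((label, value))
        elif expected != length:
            raise ValueError("wrong address size for backend %d" % label)
        else:
            known.append((label, value))
    return known, unknown
```

With these sizes a TCP address occupies 4 + 1 + 18 = 23 octets and an I2P address 4 + 1 + 32 = 37 octets, and a node that knows neither label can still step over both records.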

Conclusion.

As you can see, the changes required for the modular network backend are not so dramatic, and they can be introduced in a compatible way so as not to break old nodes.

I would like to hear from the Dash Core team whether there are any objections to introducing these protocol changes and to accepting pull requests implementing them.

You can watch the current state of my work here:
https://github.com/OlegGirko/dash/commits/modular_net_backend
But be careful if you check out this branch: I'm going to rebase it a lot before submitting pull requests.

Update 1. Corrected a stupid mistake about the estimated size of the length field of the universal address serialisation format.

ol · Dec 14, 2018

Following up on questions on Discord regarding using libp2p as a network backend and multiaddr as the universal address serialisation format.

I see the following problems with using multiaddr as the address serialisation format.

It was not designed as thoroughly as necessary. For example, an extensible data exchange format should make it possible to find out the sizes of unknown pieces of tagged data in order to skip them safely, whereas multiaddr tags provide no length information. The only way to know the length of the data following each tag is prior knowledge from reading the description of all tags registered in a CSV file in the GitHub repo. Hence, multiaddr binary data cannot be used without an additional length field, bringing us back to my proposal of TLV (type, length, value) format for addresses. That format allows a specific type to be allocated for a multiaddr binary address if necessary.
It has a different purpose from the one we need. The main use of the universal address serialisation format is to announce endpoint addresses of known peers, whereas multiaddr specifies the whole path used to connect. A node endpoint address is just an IPv6 address with a TCP port number (or an I2P hash), and it remains the same whether we connect there directly or through an SSH tunnel over Tor.

I see the following problems with libp2p.

There is no viable C++ implementation of libp2p, so we can't use it for the Dash Core software right now. Implementing libp2p would require much more effort than writing network backends as in my proposal. However, the framework I'm proposing allows writing a backend for libp2p in the future, once libp2p is implemented in C++.
From what I have learned about libp2p so far, it's more like a generic interface to different protocols for transport, routing, peer discovery, tunneling etc. Dash doesn't need peer discovery and routing at all: all messages are broadcast, so libp2p is largely redundant for Dash's purposes. The purpose of "addr" messages is not peer discovery, but rather simulating multicast inside a unicast network. If the IPv4/IPv6 network had working multicast, there would be no need to announce peers using the "addr" message at all.
