Decentralized Torrent storage in DHT / Habr

DHT has existed for many years, and so have the torrents that rely on it, which we routinely use to get whatever information we want.

DHT comes with a small set of commands for interacting with it. Only two of them are needed to build a decentralized database: put and get. That is what this article is about.

The names speak for themselves: put stores data, and get retrieves it. With put you can reliably store up to 1000 bytes of arbitrary information, and that information will reliably remain in the DHT for about an hour, so it has to be republished periodically to stay available. Get returns whatever was put. It is that simple.

There are two kinds of put. The first stores immutable data that cannot be changed afterwards. The second stores mutable data, which means you can put 1000 bytes into the DHT and later change them as you wish.

Putting or changing mutable data requires an ed25519 key pair: a public key and a private key. Anyone who holds both keys can change the data however they like.
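As a rough illustration of the immutable kind of put, here is a minimal sketch. It assumes the BEP 44 convention that an immutable item is addressed by the SHA-1 of its bencoded value. The helper names bencode and immutable_target are made up for this example and are not part of any library, and the size check simply reuses the 1000-byte figure from the text.

```python
import hashlib

MAX_VALUE_BYTES = 1000  # size limit quoted in the article


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def immutable_target(value):
    """Return the 20-byte DHT key under which an immutable value is stored."""
    encoded = bencode(value)
    if len(encoded) > MAX_VALUE_BYTES:
        raise ValueError("value is too large for a DHT item")
    return hashlib.sha1(encoded).digest()
```

Because the key is derived from the content, changing the value changes the key, which is why this kind of item cannot be modified in place.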

A reminder of how private and public keys work

Data is signed with the private key, and anyone can verify the signature with the public key. That is why the keys come as a pair. It is computationally infeasible to derive the private key from the public key.
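For mutable items, the sketch below shows how the DHT key and the byte string that the private key signs are formed, as far as I understand BEP 44. The sequence number seq and the optional salt come from that specification, not from this article, and the function names are made up for the example. The ed25519 signature itself is left out, since it needs a third-party library.

```python
import hashlib


def mutable_target(public_key: bytes, salt: bytes = b"") -> bytes:
    """DHT key of a mutable item: SHA-1 of the public key plus optional salt."""
    if len(public_key) != 32:
        raise ValueError("an ed25519 public key is 32 bytes long")
    return hashlib.sha1(public_key + salt).digest()


def signing_buffer(encoded_value: bytes, seq: int, salt: bytes = b"") -> bytes:
    """Bytes that the private key signs for a mutable put.

    encoded_value must already be bencoded, e.g. b"12:Hello World!".
    """
    prefix = b""
    if salt:
        prefix = b"4:salt%d:%s" % (len(salt), salt)
    return prefix + b"3:seqi%de1:v" % seq + encoded_value
```

Since the key depends only on the public key and the salt, the value stored under it can be replaced by anyone who can produce a valid signature, which is what makes the item mutable.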

A decentralized database can be built on this

There are many possible designs, limited only by your imagination. Consider, for example, the following.

Variant 1.

Every user (peer) has the public key and the private key.

User 1 wants to change the database. His program fetches the contents of the DHT item using Get and the public key. The received data can be anything. In our case it is the SHA-1 hash (info-hash) of a torrent, which is 20 bytes long and is enough to download that torrent. The program uses it to download the torrent that contains the database, modifies the database, generates a new torrent (obtaining a new SHA-1 hash) and seeds it. Finally it publishes the new hash with Put, so that others can download the new, modified database.

User 2 wants to change the database and does so in the same way.

To keep up with the data in the DHT, peers need to synchronize regularly. Synchronization is a Get with the public key over UDP, which has very low overhead. If the data has changed, the peer downloads it. This is somewhat more expensive, but the download speed can be limited, for example.

If the data is missing from the DHT, the peer that wants to sync simply does a Put with the latest data it has, or with empty data.
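The read, modify, publish cycle described above can be sketched with in-memory stand-ins. FakeDHT and FakeSwarm are hypothetical classes invented for this example, not a real DHT or torrent client, and the "info-hash" here is simply the SHA-1 of the raw data rather than of a real torrent's info dictionary.

```python
import hashlib


class FakeDHT:
    """In-memory stand-in for the DHT: one small value per public key."""

    def __init__(self):
        self._items = {}

    def put(self, public_key, value):
        if len(value) > 1000:
            raise ValueError("value is too large for a DHT item")
        self._items[public_key] = value

    def get(self, public_key):
        return self._items.get(public_key)


class FakeSwarm:
    """In-memory stand-in for torrent seeding and downloading."""

    def __init__(self):
        self._torrents = {}

    def seed(self, data):
        info_hash = hashlib.sha1(data).digest()  # simplification, see lead-in
        self._torrents[info_hash] = data
        return info_hash

    def download(self, info_hash):
        return self._torrents.get(info_hash)


def update_database(dht, swarm, public_key, modify):
    """Fetch the current database, apply modify() and publish the result."""
    current_hash = dht.get(public_key)
    current = swarm.download(current_hash) if current_hash else b""
    if current is None:
        raise LookupError("published hash exists but nobody is seeding it")
    new_hash = swarm.seed(modify(current))
    dht.put(public_key, new_hash)  # only 20 bytes go into the DHT
    return new_hash
```

A call such as update_database(dht, swarm, key, lambda db: db + b"row\n") appends a row and republishes the pointer. The sketch ignores concurrent writers: two users who modify the same version at the same time would overwrite each other.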
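One synchronization step might look like the sketch below. The DHT and the torrent swarm are modelled as plain dictionaries, and LocalCopy and sync_once are names made up for the example, so this only shows the decision logic, not real networking.

```python
from dataclasses import dataclass


@dataclass
class LocalCopy:
    """What a peer currently has on disk."""
    info_hash: bytes = b""
    data: bytes = b""


def sync_once(dht, swarm, public_key, local):
    """Run one sync step and report what happened.

    dht maps public keys to published hashes and swarm maps hashes to data.
    """
    remote_hash = dht.get(public_key)
    if remote_hash is None:
        # Nothing in the DHT: republish the latest (or empty) data we have.
        dht[public_key] = local.info_hash
        swarm[local.info_hash] = local.data
        return "republished"
    if remote_hash == local.info_hash:
        return "up to date"  # the cheap case: one lookup, no download
    data = swarm.get(remote_hash)
    if data is None:
        return "unavailable"  # published, but nobody is seeding it
    local.info_hash, local.data = remote_hash, data
    return "downloaded"
```

A real peer would call something like this on a timer and could throttle the download branch.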

Now a little about why this is a database and not file sharing

Those 1000 bytes can contain anything you want, and the database transferred through the torrent can hold an unlimited amount of information. Since the user does not know the public and private keys, he cannot publish anything himself; only the program can. The program, depending on the data in those 1000 bytes, can do anything: for example, see that the user is blocked and switch him to read-only mode. The result is a database with access rights for changing particular information. It looks even more like a database if the downloaded data is encrypted and stored in a single file with no extension.

Of course, there may be some problems here, but all of them can be solved.

Now to the disadvantages and problems

For example, user 1 might publish the hash of the modified database but not seed the data. To handle this, the DHT item can hold 5 SHA-1 hashes, which is a little over 100 bytes in total. If a user cannot download the newest version, he downloads the next of the 5 and then republishes the one he managed to download. As a result, other users will not try to download something that is unavailable and will get the database faster. The more users there are, the faster and better the database works.

The main problem of this system is latency. Publication (Put) takes 20 to 60 seconds, plus the time it takes someone to download the data. If the publishing user does not close the program, the delay is only those 20 to 60 seconds. On the other hand, since the data is delivered as a torrent, it is downloaded at the maximum speed the device allows, using the whole range of torrent technologies and protocols. Downloading only the modified data? I think it is possible, but don't ask how.

The second important problem is that every user has a program in which the public and private keys are stored. If you crack the program, you can extract these keys and use them to publish false data. There are many ways to deal with this, and perhaps the comments will suggest more. For example, the key can be stored in encrypted form and decrypted only when it is used. The database can be encrypted with a separate key, so that an attacker has to work out that key too. The keys can also be stored in the DHT or on a website and changed periodically. There are as many ways as your imagination allows, so it cannot be said for certain that this is a vulnerability.
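The fallback over 5 hashes can be sketched as follows. The layout, 20-byte hashes concatenated newest first, is an assumption made for this example, and all function names are hypothetical.

```python
HASH_LEN = 20    # size of one SHA-1 hash
MAX_HASHES = 5   # 5 * 20 = 100 bytes, well within the 1000-byte limit


def pack_hashes(hashes):
    """Concatenate up to MAX_HASHES hashes, newest first."""
    hashes = list(hashes)[:MAX_HASHES]
    if any(len(h) != HASH_LEN for h in hashes):
        raise ValueError("every hash must be 20 bytes long")
    return b"".join(hashes)


def unpack_hashes(value):
    """Split a DHT value back into its 20-byte hashes."""
    if len(value) % HASH_LEN:
        raise ValueError("value is not a whole number of hashes")
    return [value[i:i + HASH_LEN] for i in range(0, len(value), HASH_LEN)]


def push_hash(value, new_hash):
    """Put new_hash first, keep the rest in order, drop anything past five."""
    rest = [h for h in unpack_hashes(value) if h != new_hash]
    return pack_hashes([new_hash] + rest)


def first_available(value, download):
    """Return (hash, data) for the first hash that can be downloaded.

    download is any callable that returns the data, or None on failure.
    """
    for info_hash in unpack_hashes(value):
        data = download(info_hash)
        if data is not None:
            return info_hash, data
    return None
```

After first_available finds a working version, the peer can republish push_hash(value, good_hash), so the next reader tries the downloadable version first.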

Variant 2.

A user generates a key pair: a private key and a public key. He contacts the manager (a human) and exchanges public keys with him (e.g. via Bluetooth or Wi-Fi). The user now has the public key for getting the database and his own key pair for publishing changes to it. The user publishes his changes with his own key pair. The manager polls the public keys of all users, receives their changes, adds them to the database and publishes it. The users get the new database, and the cycle repeats.

This scheme cannot be compromised the way the previous one can, because the keys are not stored in the program. There is a small disadvantage here too: the number of users the manager has to poll affects how fast the database is updated, although not significantly. You can send requests to about 1000 users per second and get the answers within 30 seconds, as usual. Realistically, I think polling 1000 users (1000 UDP requests) will take about a minute. To increase the speed, the database can be distributed. For example, each manager has 1000 users, and managers exchange public keys with each other (to exchange information between databases) and publish changes for another manager under a special public key. This avoids the loss of performance caused by the number of users.
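One round of the manager's work could look like the sketch below. The DHT is modelled as a plain dictionary, the database as a list of changes, and the published value as a SHA-1 of a snapshot of that list. All of these are simplifying assumptions, and manager_cycle is a made-up name.

```python
import hashlib


def manager_cycle(dht, user_keys, manager_key, database, last_seen):
    """One polling round: collect new user changes, then publish the result.

    dht maps public keys to published values, and last_seen remembers the
    value already merged for each user so it is not applied twice.
    """
    for user_key in user_keys:
        change = dht.get(user_key)
        if change is None or last_seen.get(user_key) == change:
            continue  # nothing published, or nothing new since last round
        database.append((user_key, change))
        last_seen[user_key] = change
    snapshot = repr(database).encode("utf-8")
    dht[manager_key] = hashlib.sha1(snapshot).digest()
    return dht[manager_key]
```

Users never write under the manager's key. They only publish under their own keys, and the manager decides what ends up in the database.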
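Splitting users between managers is a simple partition. The sketch below uses the figure of 1000 users per manager from the text, and assign_managers is a hypothetical helper name.

```python
def assign_managers(user_keys, per_manager=1000):
    """Split the user list into groups, one group per manager."""
    if per_manager <= 0:
        raise ValueError("per_manager must be positive")
    return [
        user_keys[start:start + per_manager]
        for start in range(0, len(user_keys), per_manager)
    ]
```

With this split each manager polls at most per_manager keys per round, so the polling time stays roughly constant as the total number of users grows.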

This results in a decentralized, but manager-managed database.

Such a system can even be used for billing. The difference from blockchain systems is that the manager can manage the contents of the database. That is, if users use the database to store money, they must trust the manager to keep transactions secret. To be safe, there can be more than one manager. The more managers there are, the faster the database is updated and the more reliable its data is, but also the more people know about the transactions. However, the manager could just as well be a computer that everyone trusts :)

Since all published data is public, the same public and private keys can be used to encrypt the text of transactions sent from users to the manager.

To increase users' trust in the manager, a closed economy can be used, in which each transaction reduces one user's balance and increases another's by the same amount, so the sum of all money always equals zero. This would be a great substitute for unmanaged blockchain systems, where money can be lost forever and a transaction cannot be undone, even if it was a robbery.
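The closed economy can be illustrated with a tiny ledger in which every balance starts at zero and every transfer is symmetric. ZeroSumLedger is a hypothetical class written for this example, and the undo method shows how a manager could reverse a transaction.

```python
from collections import defaultdict


class ZeroSumLedger:
    """Balances that always add up to zero."""

    def __init__(self):
        self.balances = defaultdict(int)
        self.history = []

    def transfer(self, sender, receiver, amount):
        """Move amount from sender to receiver and return the entry index."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.history.append((sender, receiver, amount))
        return len(self.history) - 1

    def undo(self, index):
        """Reverse an earlier transfer by recording the opposite one."""
        sender, receiver, amount = self.history[index]
        return self.transfer(receiver, sender, amount)

    def total(self):
        return sum(self.balances.values())
```

Anyone who downloads the database can check that total() is zero, which gives users a simple way to verify that the manager has not created or destroyed money.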

Variant 3.

It is possible to make a decentralized internet.

For example, managers could store a database not of billing data but of DNS names and public keys. Users can then add their own DNS names and either download the database of DNS names or access it via a public IP.

Then, when opening a site, a user looks up the site's public key and uses it to get the information needed to download the site's contents. The browser downloads the site and can receive site updates about once every half a minute. At the same time, the browser can send information to the site's server using the public and private keys that come with the site. This gives decentralized two-way communication whose speed grows with the number of site servers and users, because all of them distribute the site's content. Changes to the site can also be received file by file.

For such an Internet to work, a special torrent browser and a server part need to be developed.

Technically this can be built on any torrent library, for example libtorrent. It is only about 2.5 MB after compilation, is written in C++ and works as fast as possible. There is some technical information about Put.
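The name lookup can be sketched as two dictionary lookups: name to public key in the managers' database, then public key to the site's current torrent hash in the DHT. Both are modelled as plain dictionaries here, and register_name and resolve_site are hypothetical names.

```python
def register_name(registry, name, public_key):
    """Bind a DNS-like name to a site's public key (first come, first served)."""
    name = name.lower()
    if registry.get(name, public_key) != public_key:
        raise ValueError(f"{name} is already registered to another key")
    registry[name] = public_key


def resolve_site(registry, dht, name):
    """Return the hash currently published for the site, or None."""
    public_key = registry.get(name.lower())
    if public_key is None:
        return None  # unknown name
    return dht.get(public_key)  # None if the site has published nothing
```

A torrent browser would then download the returned hash and repeat the lookup periodically to pick up site updates.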

A similar system is used in my "Media Library" app, for publishing playlists. I already have even an admin interface for moderation. Everything works successfully. Enjoy it!

Because the karma system on the site does not work properly, I can't comment on my own articles. Therefore, you may consider that comments on the article are disabled. Write your questions in personal messages. There, perhaps, I will answer.
