Have you ever wanted to share files at light speed but could not find a service that does it? No!? I mean... really!? Is your bandwidth too low? Well, too bad! Sharing files at light speed is what MP2P aims to do.
MP2P is an online object storage service. It lets you upload and download your files by harnessing all the bandwidth you have, so you share your files faster.
Objectives of MP2P are as follows:
- Faster transfers
- Being redundant
- Being scalable
- Implementing a custom protocol
- Providing easy-to-use binaries
How does it work?
MP2P is divided into 3 applications:
- The "Storage application" which stores some chunks of each file
- The "Master application" which handles download/upload requests and stores metadata in a database
- The "Client application" which is used to upload and download files
On MP2P, each file is stored as multiple chunks. As we want to be redundant and blazingly fast, we put multiple replicas of each chunk on different storages. Hence, when the client app. uploads or downloads, it opens as many connections as it needs to store or get the file's chunks, and consequently uses its whole bandwidth. We assume that MP2P has enough servers to handle high-bandwidth connections.
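The placement idea above can be sketched in a few lines. This is only an illustration, not MP2P's actual placement policy: the function name `place_chunks` and the round-robin rotation are assumptions chosen to show that the replicas of one chunk end up on different storages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the MP2P code base): choose `replication`
// distinct storages for every chunk, rotating the starting storage so the
// load spreads over all of them.
// Returns placement[chunk] = addresses of the storages holding that chunk.
std::vector<std::vector<std::string>> place_chunks(
    std::size_t chunk_count,
    const std::vector<std::string>& storages,
    std::size_t replication)
{
    // Replicas of one chunk must land on different storages.
    assert(replication >= 1 && replication <= storages.size());

    std::vector<std::vector<std::string>> placement(chunk_count);
    for (std::size_t chunk = 0; chunk < chunk_count; ++chunk) {
        for (std::size_t r = 0; r < replication; ++r) {
            placement[chunk].push_back(storages[(chunk + r) % storages.size()]);
        }
    }
    return placement;
}
```

With 5 storages and a replication of 3, consecutive chunks start on consecutive storages, so a client fetching them in parallel talks to several machines at once.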
When a client wants to upload a file to MP2P, the client app. asks the master app. how many chunks the file must be split into and where to upload each of them. Those chunks are then uploaded to multiple storage apps. When a storage app. has successfully received a chunk, it notifies the master app., which updates the metadata about this file and notifies the client that the transfer succeeded.
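Once the master app. has answered with a chunk count, the client has to cut the file accordingly. Here is a minimal sketch of that step, assuming near-equal chunk sizes; `ChunkRange` and `split_into_chunks` are hypothetical names, and the real client may split differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte range of one chunk inside the original file (hypothetical type).
struct ChunkRange {
    std::uint64_t offset;
    std::uint64_t length;
};

// Split `file_size` bytes into `chunk_count` contiguous ranges.
// The first (file_size % chunk_count) chunks get one extra byte, so the
// ranges cover the whole file without gaps or overlap.
std::vector<ChunkRange> split_into_chunks(std::uint64_t file_size,
                                          std::uint64_t chunk_count)
{
    assert(chunk_count >= 1);

    const std::uint64_t base = file_size / chunk_count;
    const std::uint64_t remainder = file_size % chunk_count;

    std::vector<ChunkRange> ranges;
    std::uint64_t offset = 0;
    for (std::uint64_t i = 0; i < chunk_count; ++i) {
        const std::uint64_t length = base + (i < remainder ? 1 : 0);
        ranges.push_back(ChunkRange{offset, length});
        offset += length;
    }
    return ranges;
}
```

Each range can then be read from disk and sent to the storage apps. chosen by the master app. over its own connection.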
When a client wants to download a file from MP2P, the client app. asks the master app. for the locations of the file's chunks and downloads them from the corresponding storage apps.
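Because chunks come from several storage apps. over separate connections, they can arrive in any order. The sketch below shows one way the client could put them back together; `DownloadedChunk` and `reassemble` are hypothetical names, and the networking itself is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One chunk as received from a storage app. (hypothetical type).
struct DownloadedChunk {
    std::size_t index;  // position of the chunk within the file
    std::string data;   // raw bytes of the chunk
};

// Rebuild the file from chunks received in arbitrary order.
// Throws if a chunk index is missing or appears twice.
std::string reassemble(std::vector<DownloadedChunk> chunks)
{
    std::sort(chunks.begin(), chunks.end(),
              [](const DownloadedChunk& a, const DownloadedChunk& b) {
                  return a.index < b.index;
              });

    std::string file;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (chunks[i].index != i) {
            throw std::runtime_error("missing or duplicate chunk");
        }
        file += chunks[i].data;
    }
    return file;
}
```

A real client would more likely write each chunk straight to its offset in the output file instead of buffering everything in memory; the in-memory version just keeps the example short.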
To build this project, we needed to create our own protocol. We also needed to store metadata about each file, consisting of chunk locations, the hash of each chunk (SHA-1), the file name and the file size: we needed a database! We chose Couchbase to try out a NoSQL database and because it provides a really simple way to do database replication. We chose C++ for network performance (with Boost.Asio) and hashing performance (using OpenSSL/sha), but also because we were all comfortable with it.
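To make the metadata concrete, here is a sketch of what one record could look like in C++. The field list follows the paragraph above, but the type names, the layout and the `is_fully_replicated` helper are assumptions; the actual documents stored in Couchbase may be shaped differently.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Metadata for one chunk (hypothetical layout).
struct ChunkMetadata {
    std::array<std::uint8_t, 20> sha1;   // a SHA-1 digest is 20 bytes long
    std::vector<std::string> locations;  // storages holding a replica
};

// Metadata for one file (hypothetical layout).
struct FileMetadata {
    std::string name;
    std::uint64_t size;                  // file size in bytes
    std::vector<ChunkMetadata> chunks;   // in file order
};

// True once every chunk is known to live on at least `replication` storages.
bool is_fully_replicated(const FileMetadata& file, std::size_t replication)
{
    return std::all_of(file.chunks.begin(), file.chunks.end(),
                       [replication](const ChunkMetadata& chunk) {
                           return chunk.locations.size() >= replication;
                       });
}
```

A check like `is_fully_replicated` is one way the master app. could decide when to tell the client that an upload is complete.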
Here are the details about the protocol and database usage depending on file size (with replication = 3):
How did we test?
As simple students without access to powerful machines with Gigabit Ethernet, we thought of an easier solution that only uses Raspberry Pis as storage servers. On a simple Raspberry Pi v1 or v2, the maximum bandwidth you can get is limited to approx. 45 Mbps. So let's say we have 5 of them: if MP2P is working well, we should get about 5 × 45 = 225 Mbps. We also needed a Gigabit switch, a master server with the database (we used one laptop) and a client (we used another laptop). We used one laptop as the network router.
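The estimate above is a simple sum capped by the client's own link. As a sketch (the function name is hypothetical, and the 1000 Mbps client link in the usage note is an assumption derived from the Gigabit switch):

```cpp
#include <algorithm>
#include <cstddef>

// Best-case aggregate throughput when reading from `storage_count` storages
// in parallel: the sum of the per-storage limits, capped by what the client's
// own link can carry.
double expected_throughput_mbps(std::size_t storage_count,
                                double per_storage_mbps,
                                double client_link_mbps)
{
    const double aggregate =
        static_cast<double>(storage_count) * per_storage_mbps;
    return std::min(aggregate, client_link_mbps);
}
```

`expected_throughput_mbps(5, 45.0, 1000.0)` gives the 225 Mbps target from the text. Past roughly 22 such storages, the client's Gigabit link would become the bottleneck instead.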
As expected, results showed a real improvement in transfer speed:
We also monitored the client app's CPU & RAM usage; the laptop (i7-3632QM) was running it inside an Arch Linux virtual machine:
We also discovered that MP2P is not as fast with a small file as with a larger one. This is simply due to the multiple requests needed to create and retrieve metadata from the master app.:
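One way to reason about this effect is a simple model in which every transfer pays a fixed metadata cost before any data moves. The sketch below is an assumption for illustration, not something we measured: the function name and the single fixed overhead term are hypothetical.

```cpp
#include <cassert>

// Effective throughput of one transfer when a fixed amount of time is spent
// talking to the master app. before and after the data transfer.
//   file_megabits       : size of the file, in megabits
//   link_mbps           : bandwidth available for the chunks, in Mbps
//   metadata_overhead_s : time spent on metadata requests, in seconds
double effective_throughput_mbps(double file_megabits,
                                 double link_mbps,
                                 double metadata_overhead_s)
{
    assert(link_mbps > 0.0 && metadata_overhead_s >= 0.0);

    const double transfer_time_s = file_megabits / link_mbps;
    return file_megabits / (metadata_overhead_s + transfer_time_s);
}
```

In this model the overhead is paid once per file, so it dominates for small files and becomes negligible as the file grows, which matches the behaviour described above.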
What improvements could be made?
The development is currently stopped, but here are a few things we thought could be implemented to improve our project:
- Compress transfers and stored chunks
- Use a Distributed Hash Table (used by torrents)
- Use SSL for transfers
- Add authentication
- Replace our handmade protocol with Google Protocol Buffers
The programming language used is C++14. We like C++ for its performance and we’re always looking forward to learning how to write clean C++.
The Boost.Asio library is the second most-used tool for MP2P. Boost.Asio allows us to use a C++ approach for networking. This means writing clean code while keeping the performance we could have achieved using C.
Couchbase: a NoSQL database, featuring master-master replication.
Catch: our unit-test framework.
CMake: our build system.