Moritz, Benni and I attended the Telematics lecture at KIT in the winter semester 2012/2013. Students could take part in labs in which they define a network protocol and then program a client that implements it. We chose the P2P file sharing lab.
The basic idea is that multiple teams of three people each come up with a draft for a protocol. After that, the teams have to agree on a common protocol. Each team then develops its own client based on that protocol. The clients have to work together in an interop test at the very end of the lab.
All teams agreed on a scenario in which all clients are in the same broadcast/multicast domain. The set of available files is likely to change frequently, like at a LAN party or a lecture, where users want to exchange files without any configuration. We called the protocol Buschtrommel (German for bush drum) and our team implemented the Bongo client.
I want to describe the protocol itself as well as the discussion with the other teams. The protocol specification can be found in the Git repository as xml and here as txt. There are quite a few errors in it, which no one wanted to fix after the project was closed :-(
We decided to use the simplest approach to discover hosts: a dedicated UDP port (4747) for HI and YO messages. These messages include the nick and the TCP port on which the client handles file requests. Each new client sends one HI message to the multicast group. The other clients respond with a YO message, typically via unicast. It is also possible to send the YO message to all other clients (multicast) as a kind of “I am alive” signal. Finally, there is a BYE message, which invalidates all files offered by a client.
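The discovery flow can be sketched as a small peer table. This is only an illustration: the class name `PeerTable`, the method names and the way a message is passed in as plain arguments are my assumptions here, and the actual wire format is the one defined in the protocol document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the discovery state; not the actual Bongo code.
class PeerTable {
    // Sender address -> TCP port announced in that peer's HI or YO message.
    private final Map<String, Integer> tcpPorts = new HashMap<>();
    // Sender address -> nick announced in that peer's HI or YO message.
    private final Map<String, String> nicks = new HashMap<>();

    /**
     * Processes one discovery message received on the UDP port.
     * Returns "YO" if a reply should be sent to the sender, otherwise null.
     */
    String onMessage(String type, String sender, String nick, int tcpPort) {
        if (type.equals("HI") || type.equals("YO")) {
            nicks.put(sender, nick);
            tcpPorts.put(sender, tcpPort);
            // Only a HI is answered; answering a YO would ping-pong forever.
            return type.equals("HI") ? "YO" : null;
        }
        if (type.equals("BYE")) {
            // Assumption: forgetting the peer also invalidates its offered files.
            nicks.remove(sender);
            tcpPorts.remove(sender);
        }
        return null;
    }

    boolean isOnline(String sender) {
        return tcpPorts.containsKey(sender);
    }

    // Returns the announced TCP port, or -1 if the peer is unknown.
    int tcpPortOf(String sender) {
        Integer port = tcpPorts.get(sender);
        return port == null ? -1 : port;
    }
}
```

Keeping the TCP port next to the sender address is the important part: without it, a later file request has nowhere to connect to.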
There were some heated discussions (shit storm included) about whether we should go the push or the pull way.
Alternative A - Filelists
Most teams in our lab proposed the idea of filelists. A filelist describes all files that are offered by one client. Clients can request the filelists of other clients in order to discover available files. This requires every client to frequently pull the filelists of all other clients in order to know the most recent files. Different mechanisms were proposed to reduce the load, such as pulling a filelist only if it has changed since a certain timestamp, or transferring diffs between filelist versions. Most students were not very keen to implement (so-called) complicated features :-/ They just wanted to get some credit points for the lab without investing too much time.
Alternative B - File Announcements
Our team proposed a different way, which scales better once all clients are online: as soon as a new file is available, the offering client sends a UDP multicast message (file announcement) to the group. Thus only one message is needed in the whole network to inform all clients of the update, in contrast to |number of clients| filelist pulls. One problem is that a client would blast out a lot of announcements when it comes online. To avoid this, we suppress these announcements and only announce changes to the initial set. Other clients have to pull this initial set via TCP (filelist). Clients should therefore send a filelist request to every client that comes online (indicated by a HI message). Files can be announced for a specific time (e.g. 3 minutes), and this time can also be set to “infinity”. A file announcement includes the SHA1 hash of the file, an alias so that the user can identify the desired file, the size of the file, and a meta field (not used).
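To illustrate the announcement lifetime, here is a minimal sketch of how a receiving client could track which announced files are still valid. Only the hash and the TTL are modelled; the class name `AnnouncementCache`, the `-1` marker for “infinity” and the explicit `nowMillis` parameter are assumptions made for this example and are not taken from the protocol document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of announcement bookkeeping; not the actual Bongo code.
class AnnouncementCache {
    // Marker for a TTL of "infinity" (assumed representation).
    static final long INFINITE_TTL = -1;

    // SHA1 hash -> point in time (milliseconds) at which the announcement expires.
    private final Map<String, Long> expiresAt = new HashMap<>();

    // Records a received file announcement with the given TTL in seconds.
    void onAnnouncement(String sha1, long ttlSeconds, long nowMillis) {
        long expiry = (ttlSeconds == INFINITE_TTL)
                ? Long.MAX_VALUE
                : nowMillis + ttlSeconds * 1000L;
        // A repeated announcement simply refreshes the expiry time.
        expiresAt.put(sha1, expiry);
    }

    // A file counts as available while its announcement has not expired.
    boolean isValid(String sha1, long nowMillis) {
        Long expiry = expiresAt.get(sha1);
        return expiry != null && nowMillis < expiry;
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test; a real client would probably read the system clock instead.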
Filelists as well as file transfers are handled via TCP. Every client chooses a (random) port at startup and announces this port in its HI message. Other clients open a TCP connection to this port and request either a file or a filelist (the filelist is just a concatenation of all file announcements that are currently valid). A file request contains the SHA1 hash of the requested file as well as the offset and the length of the requested range. This makes it easy to implement multi-source transfers: the requesting client can identify multiple sources of a file through its hash and request different chunks from different clients. Every host (a client that offers the file) responds with a status code (OK, temporary failure, permanent failure), followed by the number of bytes that will follow, and then the requested content. Client and host can (but should not) abort the transfer at any time. The host is stateless and does not have to store any data to support resuming, because the client can simply request a different offset to resume the transfer.
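The offset/length idea can be sketched as a small planner that splits a file into chunks and spreads them over the known sources. The round-robin assignment, the fixed chunk size and all class and method names are assumptions for illustration only; the protocol only defines that a request carries hash, offset and length.

```java
import java.util.ArrayList;
import java.util.List;

// One file request: which host to ask, and for which byte range of which file.
class ChunkRequest {
    final String source;
    final String sha1;
    final long offset;
    final long length;

    ChunkRequest(String source, String sha1, long offset, long length) {
        this.source = source;
        this.sha1 = sha1;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical sketch of multi-source planning; not the actual Bongo code.
class ChunkPlanner {
    // Splits the file into chunks and assigns them to the sources round-robin.
    static List<ChunkRequest> plan(String sha1, long fileSize,
                                   List<String> sources, long chunkSize) {
        if (sources.isEmpty() || chunkSize <= 0) {
            throw new IllegalArgumentException("need a source and a positive chunk size");
        }
        List<ChunkRequest> requests = new ArrayList<>();
        int next = 0;
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            long length = Math.min(chunkSize, fileSize - offset);
            String source = sources.get(next % sources.size());
            requests.add(new ChunkRequest(source, sha1, offset, length));
            next++;
        }
        return requests;
    }

    // Resume: ask (possibly another host) for the bytes that are still missing.
    static ChunkRequest resume(ChunkRequest interrupted, long bytesReceived, String newSource) {
        return new ChunkRequest(newSource, interrupted.sha1,
                interrupted.offset + bytesReceived,
                interrupted.length - bytesReceived);
    }
}
```

Because a resume is just another request with a shifted offset, the host never needs to remember anything about earlier transfers.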
- IPv4 / IPv6 dual mode: We are unable to detect which IPv4 address belongs to which IPv6 address, so hosts appear twice if they use both IP versions.
- No checksums for partial transfers: The host should send a hash of the chunk that is sent, which differs from the file hash if only a part of the file is requested. If a multi-source download fails, the downloading client has no clue which part contains the error.
- A lost HI message makes the files of that client unavailable, because no TCP port is known to request them from.
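The missing chunk checksum from the second point could look roughly like this. It is a sketch of a possible fix, not something the Buschtrommel protocol defines: the host would hash exactly the bytes it sends, and the client would compare that value after receiving the chunk. The class and method names are made up for this example; `MessageDigest` is the standard Java API for SHA-1.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of per-chunk checksums; not part of the actual protocol.
class ChunkHasher {
    // Returns the SHA-1 of data[offset .. offset+length) as lowercase hex.
    static String sha1Hex(byte[] data, int offset, int length) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            digest.update(data, offset, length);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Client side: checks a received chunk against the hash sent by the host.
    static boolean verify(byte[] chunk, String expectedHex) {
        return sha1Hex(chunk, 0, chunk.length).equals(expectedHex);
    }
}
```

With such a check, a failed multi-source download could be repaired by re-requesting only the chunk whose hash does not match.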
The Bongo Client
We developed the client in Java because we were used to it (except for the GUI, as you may be able to tell from the desperate commits in our repository).
We decided to strictly separate the implementation of the Buschtrommel protocol from our client. So we created the Buschtrommel class, which serves as a facade for all calls from the GUI to the protocol layer (like creating a new share, starting a download, …). The protocol layer accesses the GUI through callback methods that are defined in an interface.
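The separation can be sketched as follows. The names `GuiCallbacks` and `ProtocolFacade` and their methods are invented for this example and do not match the real Buschtrommel class; the point is only the direction of the calls: the GUI calls the facade, and the protocol layer calls back through an interface.

```java
import java.util.ArrayList;
import java.util.List;

// Callback interface implemented by the GUI (hypothetical method names).
interface GuiCallbacks {
    void shareCreated(String alias);
    void downloadStarted(String sha1);
}

// Hypothetical facade in the spirit of the Buschtrommel class.
class ProtocolFacade {
    private final GuiCallbacks gui;
    private final List<String> shares = new ArrayList<>();

    ProtocolFacade(GuiCallbacks gui) {
        this.gui = gui;
    }

    // Called by the GUI; the protocol layer reports back via the callback.
    void createShare(String alias) {
        shares.add(alias);
        gui.shareCreated(alias);
    }

    // Called by the GUI; a real client would open the TCP connection here.
    void startDownload(String sha1) {
        gui.downloadStarted(sha1);
    }

    int shareCount() {
        return shares.size();
    }
}
```

Since the protocol layer only knows the interface, the GUI can be swapped for something else without touching the protocol code.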
The whole project follows the MVC pattern, where the models and controllers are provided by the protocol layer.
The Buschtrommel class creates instances of NetCache and ShareCache. Both represent state of the whole system: NetCache knows all clients that are online and which file is offered by whom (different file offers can have different TTLs and different display names or metadata). ShareCache stores all shares that are offered by the client itself. We have two different network adapters, one for UDP and one for TCP. The TCP network adapter handles incoming file requests independently, because it has a reference to the ShareCache to check whether a requested hash can be provided. The UDPAdapter creates message objects that are handed to different observers (Observer pattern). This way we can create a chain of callbacks to the GUI to update the interface in real time.
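A stripped-down version of that observer chain might look like this. `MessageDispatcher` stands in for the UDP adapter and `KnownHashes` for a tiny part of NetCache; these names, the `"FILE"` message type and the string payload are assumptions for illustration, not the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Anything that wants to be told about incoming messages.
interface MessageObserver {
    void onMessage(String type, String sender, String payload);
}

// Hypothetical stand-in for the UDP adapter: hands messages to all observers.
class MessageDispatcher {
    private final List<MessageObserver> observers = new ArrayList<>();

    void addObserver(MessageObserver observer) {
        observers.add(observer);
    }

    void dispatch(String type, String sender, String payload) {
        for (MessageObserver observer : observers) {
            observer.onMessage(type, sender, payload);
        }
    }
}

// Hypothetical stand-in for NetCache: stores new hashes and notifies the GUI.
class KnownHashes implements MessageObserver {
    interface Listener {
        void hashAdded(String sha1);
    }

    private final List<String> hashes = new ArrayList<>();
    private final Listener listener;

    KnownHashes(Listener listener) {
        this.listener = listener;
    }

    @Override
    public void onMessage(String type, String sender, String payload) {
        // Only file announcements with a not yet known hash are passed on.
        if (type.equals("FILE") && !hashes.contains(payload)) {
            hashes.add(payload);
            listener.hashAdded(payload);
        }
    }
}
```

The second link of the chain (`Listener`) is where a GUI would refresh its list of available files.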
We have a class for every kind of message. Every message object can serialize itself to its string representation to be sent over the network interface. A static class, the MessageDeserializer, takes these strings and reconstructs message objects from them. All encoding decisions, as well as the mapping from objects to byte streams, can be changed by replacing these serialize/deserialize methods.
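As a rough sketch of that split, here is a HI message with a made-up text encoding. The format `HI;<port>;<nick>` is purely an assumption for this example and is not the real Buschtrommel encoding; the class names are hypothetical as well.

```java
// Hypothetical HI message with an invented "HI;<port>;<nick>" encoding.
class HiMessage {
    final int tcpPort;
    final String nick;

    HiMessage(int tcpPort, String nick) {
        this.tcpPort = tcpPort;
        this.nick = nick;
    }

    // Every message knows how to turn itself into its string representation.
    String serialize() {
        return "HI;" + tcpPort + ";" + nick;
    }
}

// Hypothetical counterpart of the MessageDeserializer for this one message type.
final class SimpleDeserializer {
    private SimpleDeserializer() {
    }

    static HiMessage parseHi(String line) {
        // Limit of 3 keeps any further ';' inside the nick.
        String[] parts = line.split(";", 3);
        if (parts.length != 3 || !parts[0].equals("HI")) {
            throw new IllegalArgumentException("not a HI message: " + line);
        }
        return new HiMessage(Integer.parseInt(parts[1]), parts[2]);
    }
}
```

Changing the encoding then only touches `serialize` and `parseHi`, while the rest of the client keeps working with message objects.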
A Share instance represents a file that is offered either by the client itself or by other clients. One Share can be offered by multiple hosts with different TTLs and different displayNames, so we created the ShareAvailability class, which holds these fields for every host-share tuple. We store all sources (a.k.a. hosts) for every file, and we store a list of all offered files for every host. Thus the GUI designer can choose whether he wants a file-centric or a host-centric approach.
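The two lookup directions can be sketched with a pair of maps. `ShareIndex` and `Availability` are simplified, hypothetical stand-ins for Share and ShareAvailability, and hosts and files are identified by plain strings here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Per host-share tuple: how long the offer is valid and how the host names it.
class Availability {
    final long ttlSeconds;
    final String displayName;

    Availability(long ttlSeconds, String displayName) {
        this.ttlSeconds = ttlSeconds;
        this.displayName = displayName;
    }
}

// Hypothetical sketch of a file-centric and host-centric index.
class ShareIndex {
    // SHA1 hash -> (host -> availability of that file on that host).
    private final Map<String, Map<String, Availability>> hostsByFile = new HashMap<>();
    // Host -> hashes of all files it offers.
    private final Map<String, Set<String>> filesByHost = new HashMap<>();

    void offer(String host, String sha1, Availability availability) {
        hostsByFile.computeIfAbsent(sha1, k -> new HashMap<>()).put(host, availability);
        filesByHost.computeIfAbsent(host, k -> new TreeSet<>()).add(sha1);
    }

    // File-centric view: who offers this file?
    Set<String> hostsOf(String sha1) {
        Set<String> result = new TreeSet<>();
        Map<String, Availability> hosts = hostsByFile.get(sha1);
        if (hosts != null) {
            result.addAll(hosts.keySet());
        }
        return result;
    }

    // Host-centric view: what does this host offer?
    Set<String> filesOf(String host) {
        Set<String> result = new TreeSet<>();
        Set<String> files = filesByHost.get(host);
        if (files != null) {
            result.addAll(files);
        }
        return result;
    }

    // Returns the offer details, or null if the host does not offer the file.
    Availability availability(String host, String sha1) {
        Map<String, Availability> hosts = hostsByFile.get(sha1);
        return hosts == null ? null : hosts.get(host);
    }

    // E.g. after a BYE: drop the host from both directions of the index.
    void removeHost(String host) {
        Set<String> files = filesByHost.remove(host);
        if (files == null) {
            return;
        }
        for (String sha1 : files) {
            Map<String, Availability> hosts = hostsByFile.get(sha1);
            hosts.remove(host);
            if (hosts.isEmpty()) {
                hostsByFile.remove(sha1);
            }
        }
    }
}
```

Storing both directions costs a bit of bookkeeping on every update, but each view becomes a single lookup.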
Benjamin developed a nice GUI with Swing, which caused a lot of pain and frustration, because none of us had ever built a real GUI before ^_^
I want to add that the DMG file is a trial version of Final Cut Pro :-P
Users can choose between IPv4 and IPv6 (or both) and change some other settings, like the download path for incoming downloads.
Users can create shares to offer files to other clients. After the user clicks the activate button, a FileAnnouncementMessage is sent.
Clients can see all activated shares of other clients and start a download.
File transfers can be aborted by the user at any time.
The host of a file can abort the file transfer as well.
We replaced the GUI with a simple class that starts a new download for every unknown hash that flies into the NetCache. After a successful download, the file is re-shared. With a few lines of code, we have a mirror bot :)
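The idea behind the mirror bot fits into a few lines. This is a sketch and not our actual class: `Downloader` and `Resharer` are hypothetical interfaces that stand in for the calls into the protocol layer, and retrying a failed download on the next announcement is an assumption of this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical hooks into the protocol layer.
interface Downloader {
    // Returns true if the file was downloaded successfully.
    boolean download(String sha1);
}

interface Resharer {
    void share(String sha1);
}

// Hypothetical sketch of the mirror bot; not the actual Bongo code.
class MirrorBot {
    private final Set<String> handled = new HashSet<>();
    private final Downloader downloader;
    private final Resharer resharer;

    MirrorBot(Downloader downloader, Resharer resharer) {
        this.downloader = downloader;
        this.resharer = resharer;
    }

    // Called for every hash that shows up in the cache of known files.
    void onHashSeen(String sha1) {
        if (!handled.add(sha1)) {
            return; // Already mirrored (this also covers our own re-shares).
        }
        if (downloader.download(sha1)) {
            resharer.share(sha1);
        } else {
            // Assumption: forget failed downloads so a later announcement retries.
            handled.remove(sha1);
        }
    }
}
```

Remembering the handled hashes matters, because otherwise the bot would react to its own re-share announcements.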