CS580 File Sharing Project Protocol Description
Version 1.0
March 15, 2004
Contents
File Meta-Information
Messages
Sample Interactions
Modification History
This protocol description is loosely based on the BitTorrent protocol. BitTorrent was modified to produce a smaller system suitable for course assignments. The modifications should not be considered improvements to BitTorrent.
The protocol uses a client and a server. The server stores files of interest. A client can upload files to and download files from the server. Unlike in BitTorrent, clients do not talk to each other, which limits the usefulness of this system.
File Meta-Information
Information about files is bencoded. The information is a bencoded dictionary, containing the keys listed below. All string values are UTF-8 encoded.
length: length of the file in bytes (integer)
name: the filename of the file (string)
pieceLength: number of bytes in each piece (integer). This should be a power of 2.
pieces: a list consisting of all 20-byte SHA1 hash values, one per piece (a list of bencoded raw binary strings)
md5Sum: a 32-character hexadecimal string corresponding to the MD5 sum of the file
keywords: list of keywords for the file. Keywords are provided by a client when it submits a file to the server. (list of strings)
id: string given to the file by the server, which is used to identify the file. (string)
Note that this is not the same as the BitTorrent file meta-information. When a file is transmitted between client and server (or between peers in BitTorrent) it is broken into small pieces. To ensure that the data is transmitted without error and has not been altered, the file meta-information contains both the MD5 sum of the whole file and the SHA1 hash of each piece. The SHA1 hashes allow one to validate individual pieces of the file. If one piece is invalid, one does not have to download the entire file again, only that piece.
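As an illustration of the dictionary described above, the following Python sketch builds the meta-information for a file held in memory and bencodes it. The function names bencode and build_meta_info are hypothetical, the id key is left out on the assumption that only the server assigns it, and dictionary keys are sorted as in standard bencoding.

```python
import hashlib


def bencode(value):
    """Bencode an int, str, bytes, list or dict (dictionary keys sorted)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # strings travel as UTF-8 bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = sorted((key.encode("utf-8"), item) for key, item in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_meta_info(name, data, piece_length, keywords):
    """Build the meta-information dictionary for `data` (hypothetical helper).

    The "id" key is omitted: it is assumed to be filled in by the server.
    """
    pieces = [
        hashlib.sha1(data[start:start + piece_length]).digest()  # 20 raw bytes
        for start in range(0, len(data), piece_length)
    ]
    return {
        "length": len(data),
        "name": name,
        "pieceLength": piece_length,
        "pieces": pieces,
        "md5Sum": hashlib.md5(data).hexdigest(),  # 32 hexadecimal characters
        "keywords": list(keywords),
    }
```

A 600-byte file with a pieceLength of 256 yields three SHA1 entries, the last one covering the final 88 bytes.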
Messages
In general, messages in this protocol have the structure <length><id><payload>. The length is a four-byte big-endian integer giving the total number of bytes in the id and payload, that is, the number of bytes in the rest of the message. The id is a one-byte integer indicating which message it is. The payload is message specific.
The protocol consists of sequences of bytes, which can be difficult to represent as text. This document uses #[ ] to represent a sequence of bytes, so #[5 11] represents a two-byte sequence, the first byte being 5 and the second byte 11. When a byte represents a printable ASCII character, the character is used rather than the numerical value of the byte. A $ precedes the character to avoid confusion. So #[5 11 $a] represents a three-byte sequence with the last byte being 97, which is the ASCII value of the character a.
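The general <length><id><payload> structure can be sketched in Python as follows. The names encode_message, decode_message and to_notation are hypothetical. A frame whose length is zero is treated as having no id, and to_notation prints numeric byte values only, without the $ shorthand.

```python
import struct


def encode_message(msg_id=None, payload=b""):
    """Frame a message as <length><id><payload>.

    With msg_id=None the frame has length 0 and carries no id or payload.
    """
    if msg_id is None:
        return struct.pack(">I", 0)
    body = bytes([msg_id]) + payload
    return struct.pack(">I", len(body)) + body  # four-byte big-endian length


def decode_message(buffer):
    """Split one frame off the front of `buffer`.

    Returns (msg_id, payload, rest), or None if the frame is incomplete.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return (None, b"", buffer[4:])
    return (buffer[4], buffer[5:4 + length], buffer[4 + length:])


def to_notation(data):
    """Render bytes in the #[ ] notation using numeric values only."""
    return "#[" + " ".join(str(byte) for byte in data) + "]"
```

Because decode_message returns the unread remainder, it can be called repeatedly on a receive buffer that holds several messages or a partial one.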
<length> number of bytes in this message. The value is 1 (for the id) plus the length of the version string.
<id> is 1
<version> A string representing the version of the protocol, for example: 1.0 or 2.34. This document describes version 1.0 of the protocol.
When a client connects to a server it first sends this message (the handshake) to the server to indicate the highest version of the protocol the client supports. The server responds with a message of the same structure, indicating the highest version of the protocol that is equal to or less than the version the client sent. This is the version of the protocol that the client and server will use. If the client does not support that version of the protocol it will send the end-connection message (see below) and close the connection. If the client sends any other message before the handshake, the server will respond with an error message and close the connection.
Example. Given the version number “1.0”, a handshake message is really #[0 0 0 4 1 49 46 48], but for readability we will write it as #[0 0 0 4 1 $1 $. $0]
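A minimal Python sketch of the handshake follows. The names encode_handshake and choose_version are hypothetical. The sketch assumes version strings compare as dot-separated integers, so 2.34 is newer than 2.4. The protocol text does not define that ordering, nor what the server sends when it supports no suitable version, so choose_version simply returns None in that case.

```python
import struct

HANDSHAKE_ID = 1


def parse_version(text):
    """Assumption: versions compare as tuples of integers, e.g. (2, 34)."""
    return tuple(int(part) for part in text.split("."))


def encode_handshake(version):
    """Build <length><id><version> for a version string such as "1.0"."""
    payload = version.encode("utf-8")
    return struct.pack(">IB", 1 + len(payload), HANDSHAKE_ID) + payload


def choose_version(client_version, server_versions):
    """Server side: highest supported version not above the client's."""
    limit = parse_version(client_version)
    candidates = [v for v in server_versions if parse_version(v) <= limit]
    return max(candidates, key=parse_version) if candidates else None
```

For example, a server supporting 1.0, 2.0 and 3.0 that receives a handshake for 2.34 would answer with 2.0 under this ordering.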
The keep-alive message is sent to keep the connection open. The server can close a connection if it is inactive for a certain period of time. The client can send this message to maintain the connection. A keep-alive message is generally sent once every two minutes. There is no id or payload. The server responds with a keep-alive message.
Example. Using the notation described above the keep-alive message is #[0 0 0 0]
The end-connection message is sent by the client to politely end the connection. A server should respond with an end-connection message and then close the connection.
Example. The end-connection message is #[0 0 0 1 2]
The search message is sent to the server when a client wishes to search for a file. <searchDictionary> is a bencoded dictionary with two possible keys: fileName and keywords. The value at the key fileName is used to find files whose name matches the value. The value may contain the wildcard character *, which matches any number of characters. The value at the key keywords is a list of keywords. Files that have all the listed keywords match. The wildcard character * can also be used in the keywords. If both keywords and fileName are given, both are used to identify a file, that is, the file has to have the given name and keywords. X is the length of searchDictionary in bytes. The server responds with a searchResult message.
Example. A search request with fileName of “foo” is:
#[0 0 0 18 3 $d $8 $: $f $i $l $e $N $a $m $e $3 $: $f $o $o $e]
Replacing the characters with their byte value gives:
#[0 0 0 18 3 100 56 58 102 105 108 101 78 97 109 101 51 58 102 111 111 101]
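The byte sequence above can be reproduced with a short Python sketch. The names bencode and encode_search are hypothetical, and the bencoder covers only the strings, lists and dictionaries that a search dictionary needs.

```python
import struct

SEARCH_ID = 3


def bencode(value):
    """Bencode a str, list or dict; enough for a search dictionary."""
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return str(len(raw)).encode("ascii") + b":" + raw
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-8"))
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def encode_search(file_name=None, keywords=None):
    """Build a search message from an optional name and keyword list."""
    query = {}
    if file_name is not None:
        query["fileName"] = file_name
    if keywords:
        query["keywords"] = list(keywords)
    payload = bencode(query)
    # The length counts the id byte plus the X bytes of the dictionary.
    return struct.pack(">IB", 1 + len(payload), SEARCH_ID) + payload
```

Calling encode_search(file_name="foo") produces a 17-byte dictionary, so the length field is 18, matching the example.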
Server response to a search request. <resultList> is a bencoded list of the Meta-Information for all files that matched the client’s search criteria. If no files match an empty list is returned. X is the length of resultList in bytes.
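One way a server might select the entries of <resultList> is sketched below in Python. The names wildcard_match, matches and search are hypothetical. The sketch assumes matching is case sensitive and that * is the only special character, neither of which the protocol text states explicitly.

```python
import re


def wildcard_match(pattern, text):
    """True if `text` matches `pattern`, where * matches any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def matches(meta, query):
    """Check one file's meta-information against a search dictionary."""
    name_pattern = query.get("fileName")
    if name_pattern is not None and not wildcard_match(name_pattern, meta["name"]):
        return False
    # Every requested keyword must match at least one keyword of the file.
    for wanted in query.get("keywords", []):
        if not any(wildcard_match(wanted, kw) for kw in meta["keywords"]):
            return False
    return True


def search(files, query):
    """Return the meta-information of all matching files (possibly empty)."""
    return [meta for meta in files if matches(meta, query)]
```

With files named foo and foobar, the pattern foo matches only the first, while foo* matches both.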
The request message is used by a client to request a piece of a file from the server. <fileId> is eight bytes long, interpreted as a UTF-8 string. If the fileId is not eight bytes long it is padded on the left with space characters before being added to this message. <pieceIndex> is a four-byte big-endian integer that indicates the piece of the file requested. The first piece of a file has index 0. The server responds with a piece message if the requested file and piece exist, otherwise it responds with an error message. It is the client's responsibility to request the pieces of the file one at a time until it has all the pieces it needs.
Example. The following byte sequence represents a request for the first piece of a file with id “aaaaaaaa”.
#[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 0 ]
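The request message can be assembled as in the following Python sketch. The names pad_file_id, encode_request and piece_count are hypothetical. The sketch assumes an id longer than eight bytes is an error, and that the number of pieces is the file length divided by pieceLength, rounded up.

```python
import struct

REQUEST_ID = 5


def pad_file_id(file_id):
    """Encode the id as UTF-8 and left-pad it with spaces to eight bytes."""
    raw = file_id.encode("utf-8")
    if len(raw) > 8:
        raise ValueError("fileId longer than eight bytes")  # assumption
    return raw.rjust(8, b" ")


def encode_request(file_id, piece_index):
    """Build <length><id><fileId><pieceIndex>; length is 1 + 8 + 4 = 13."""
    return struct.pack(">IB8sI", 13, REQUEST_ID, pad_file_id(file_id), piece_index)


def piece_count(length, piece_length):
    """Number of pieces in a file: length / pieceLength, rounded up."""
    return (length + piece_length - 1) // piece_length
```

A client would call encode_request once for each index from 0 to piece_count(length, pieceLength) - 1, using the values from the file meta-information.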
The piece message is used to transfer a piece from the server to the client. It is sent to the client in response to a request message. <fileId> is eight bytes long, interpreted as a UTF-8 string. If the fileId is not eight bytes long it is padded on the left with space characters before being added to this message. <pieceIndex> is a four-byte big-endian integer that indicates the piece of the file being sent. The first piece of a file has index 0. <block> is X bytes long and is the piece requested. If the SHA1 hash of the piece does not match the SHA1 hash given for that piece in the meta-information of the file, the piece should be rejected. If the piece cannot be obtained with the correct SHA1 hash, the entire file should be discarded.
It can also be used to transfer a piece to the server as we will see below.
Example. Below is the byte sequence of a piece message for the first piece of a file with id aaaaaaaa. The actual piece is 256 bytes long and is shown as block.
#[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 0 <block> ]
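The following Python sketch encodes and decodes a piece message and checks the block against the SHA1 list from the meta-information. The names encode_piece, decode_piece and piece_is_valid are hypothetical. Stripping leading spaces when decoding assumes that real ids never begin with a space.

```python
import hashlib
import struct

PIECE_ID = 6


def encode_piece(file_id, piece_index, block):
    """Build <length><id><fileId><pieceIndex><block>."""
    padded_id = file_id.encode("utf-8").rjust(8, b" ")
    header = struct.pack(">B8sI", PIECE_ID, padded_id, piece_index)
    return struct.pack(">I", len(header) + len(block)) + header + block


def decode_piece(message):
    """Return (file_id, piece_index, block) from a complete piece message."""
    length, msg_id, raw_id, piece_index = struct.unpack(">IB8sI", message[:17])
    if msg_id != PIECE_ID or length != len(message) - 4:
        raise ValueError("not a well-formed piece message")
    # Assumption: ids never start with a space, so padding can be stripped.
    return raw_id.decode("utf-8").lstrip(" "), piece_index, message[17:]


def piece_is_valid(block, piece_index, pieces):
    """Compare the block's SHA1 with the entry from the 'pieces' list."""
    return hashlib.sha1(block).digest() == pieces[piece_index]
```

For a 256-byte block the length field is 1 + 8 + 4 + 256 = 269, which is the #[0 0 1 13] prefix shown in the example.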
The uploadMetaData message is sent to the server by a client when the client wishes to add a file to the server's file repository. <metaData> is the bencoding of the dictionary containing the meta-information of the file to be added to the server. X is the length in bytes of <metaData>. The server responds with the have message, which contains the file id of the new file. A client cannot add a file with the same name and keywords as an existing file on the server. The server indicates that the metaData information is a duplicate by returning a have message with its bitString set to all 1's.
The have message is sent to the client in response to the uploadMetaData and piece messages. <fileId> is eight bytes long, interpreted as a UTF-8 string. <status> indicates the status of the file on the server. A value of 0 indicates it is a new file on the server. A value of 1 indicates it is a pending file, that is, the file does not have all its pieces. As a result it will not appear in any search result. A value of 2 indicates that the file is complete. This means that the file has all its pieces and is available for downloading by clients. <bitString> is a string representing the pieces of the file that the server already has. The string contains one character for each piece in the file. If the n'th character in the string is 0, the server does not have the n'th piece of the file. If the n'th character in the string is 1, the server does have the n'th piece of the file. So the string 010 indicates that the server has one piece (index 1) of the three pieces of the file.
Example. Below is the byte sequence of a have message for a file with id aaaaaaaa with three pieces, of which the server has only the second piece. The status is 1 (pending) and occupies four bytes, so the length is 1 + 8 + 4 + 3 = 16.
#[0 0 0 16 8 $a $a $a $a $a $a $a $a 0 0 0 1 $0 $1 $0]
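A client could interpret the payload of a have message (the bytes after the id) as in this Python sketch. The names decode_have_payload and missing_pieces are hypothetical. The four-byte big-endian status is an assumption read off the example byte sequence, since the text does not state the width of <status>, and stripping leading spaces from the id assumes the same padding rule as in the request message.

```python
import struct

STATUS_NAMES = {0: "new", 1: "pending", 2: "complete"}


def decode_have_payload(payload):
    """Return (file_id, status, bit_string) from a have payload.

    Assumption: status is a four-byte big-endian integer, as in the example.
    """
    raw_id, status = struct.unpack(">8sI", payload[:12])
    bit_string = payload[12:].decode("ascii")
    if set(bit_string) - {"0", "1"}:
        raise ValueError("bitString may contain only the characters 0 and 1")
    return raw_id.decode("utf-8").lstrip(" "), status, bit_string


def missing_pieces(bit_string):
    """Indices of the pieces the server does not have yet."""
    return [index for index, flag in enumerate(bit_string) if flag == "0"]
```

During an upload, missing_pieces tells the client which piece messages it still has to send.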
The error message is sent to the client on an error. The <errorMessage> is a UTF-8 encoded string that attempts to describe the error.
Example. The following represents the byte sequence of an error message with the <errorMessage> “A foobar occured”. The space character is written as its byte value, 32.
#[0 0 0 17 9 $A 32 $f $o $o $b $a $r 32 $o $c $c $u $r $e $d]
The actual sequence of bytes is given by:
#[0 0 0 17 9 65 32 102 111 111 98 97 114 32 111 99 99 117 114 101 100]
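For completeness, a short Python sketch of the error message, using the hypothetical names encode_error and decode_error:

```python
import struct

ERROR_ID = 9


def encode_error(text):
    """Build <length><id><errorMessage> with a UTF-8 encoded description."""
    payload = text.encode("utf-8")
    return struct.pack(">IB", 1 + len(payload), ERROR_ID) + payload


def decode_error(message):
    """Return the description carried by a complete error message."""
    length, msg_id = struct.unpack(">IB", message[:5])
    if msg_id != ERROR_ID or length != len(message) - 4:
        raise ValueError("not a well-formed error message")
    return message[5:].decode("utf-8")
```

The 16-character description in the example gives a length field of 17 once the id byte is counted.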
Sample Interactions
Single session download. The client requests a file named foo with id aaaaaaaa and downloads it in one session. The file has three pieces.
Sender | Message
Client | #[0 0 0 18 3 $d $8 $: $f $i $l $e $N $a $m $e $3 $: $f $o $o $e]
Server | #[0 0 X X 4 <resultList>]
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 0]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 0 <block>]
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 1]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 1 <block>]
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 2]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 2 <block>]
Multiple session download. The client requests a file named foo with id aaaaaaaa and downloads it in two sessions. The first session could end for a number of reasons: the user could quit the client application or the internet connection could be broken. In the latter case the client could not send the end-connection message to the server.
Sender | Message
Client | #[0 0 0 18 3 $d $8 $: $f $i $l $e $N $a $m $e $3 $: $f $o $o $e]
Server | #[0 0 X X 4 <resultList>]
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 0]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 0 <block>]
(The first session ends here. The remaining pieces are requested in the second session.)
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 1]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 1 <block>]
Client | #[0 0 0 13 5 $a $a $a $a $a $a $a $a 0 0 0 2]
Server | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 2 <block>]
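A client that wants to continue a download in a later session has to remember which pieces it already holds. The Python sketch below shows one possible approach. The name download_missing is hypothetical, fetch_piece stands in for whatever function sends a request message and returns the block from the matching piece message, and the retry limit of three attempts is an arbitrary choice rather than part of the protocol.

```python
import hashlib


def download_missing(meta, stored, fetch_piece, max_attempts=3):
    """Fetch every piece that is not yet in `stored`, then assemble the file.

    `stored` maps piece index to verified block and may be kept between
    sessions. `fetch_piece(file_id, index)` is a hypothetical transport
    callable; any exception it raises (such as a dropped connection)
    propagates, leaving `stored` intact for the next session.
    """
    for index, expected in enumerate(meta["pieces"]):
        if index in stored:
            continue  # already downloaded in an earlier session
        for _ in range(max_attempts):
            block = fetch_piece(meta["id"], index)
            if hashlib.sha1(block).digest() == expected:
                stored[index] = block
                break
        else:
            raise ValueError(f"piece {index} failed SHA1 verification")
    data = b"".join(stored[i] for i in range(len(meta["pieces"])))
    if hashlib.md5(data).hexdigest() != meta["md5Sum"]:
        raise ValueError("assembled file does not match md5Sum")
    return data
```

If the connection drops after the first piece, calling download_missing again with the same stored dictionary requests only pieces 1 and 2, as in the interaction above.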
Client Upload. The client uploads a file named foo, to which the server assigns the id aaaaaaaa, in one session. The file has three pieces.
Sender | Message
Server | #[0 0 0 16 8 $a $a $a $a $a $a $a $a 0 0 0 1 $0 $0 $0]
Client | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 0 <block>]
Server | #[0 0 0 16 8 $a $a $a $a $a $a $a $a 0 0 0 1 $1 $0 $0]
Client | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 1 <block>]
Server | #[0 0 0 16 8 $a $a $a $a $a $a $a $a 0 0 0 1 $1 $1 $0]
Client | #[0 0 1 13 6 $a $a $a $a $a $a $a $a 0 0 0 2 <block>]
Server | #[0 0 0 16 8 $a $a $a $a $a $a $a $a 0 0 0 2 $1 $1 $1]
Modification History
March 15. Modified the format of the Have message. Added a status field.