Evidence 2 Evidence
Technical post ahoy. In February this year the Razorback ed2k (eDonkey) server was seized by Belgian police. Here’s a quote from Slyck:
Razorback2 was an eDonkey2000 indexing server - very different in nature from an indexing site such as ShareReactor. Unlike indexing sites, Razorback2’s index was only available through an eDonkey2000 client such as eMule. While it does not host any actual files or multimedia material, it does index the location of such files on the eDonkey2000 network. The legality of such indexing remains questionable, however this has not deterred copyright enforcement actions.
The important point here is the “indexing server” part, and that’s what this blog entry is about.
On a peer to peer system you need a handy way to find the files you want to download. Using filenames is a bad idea because anyone can name a file anything they want, so when your download of Patagonian Bell Ringing Monks finishes you might be rudely surprised to find it’s actually Madonna warbling on about some rubbish. That’s a surprise nobody wants, so the people who coded the P2P systems used a much better technique.
Encryption and authentication security systems have long used a technique called checksumming (or hashing) to manage data. When you hash (verb) some data - a password, a file, whatever it may be - you create a block of new data that uniquely represents the data you’re hashing. This new data is also called a hash (noun) or checksum, and it’s usually only a few bytes long. If you hash (verb) the same block of data twice, you will always come up with the same hash (noun). Change one byte of the original data and a different hash (noun) will be created. But most importantly you can never recreate the original data with only the hash (noun).
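For the curious, here is a minimal sketch of those properties in Python. It uses SHA-256 from the standard library purely for illustration; as far as I know the eDonkey network used its own MD4-based scheme, which isn't reliably available in Python's hashlib. The function name and sample data are made up for this example.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hash of a block of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data standing in for the contents of two files.
original = b"Patagonian Bell Ringing Monks"
altered = b"Patagonian Bell Ringing Monkz"  # one byte changed

# Hashing the same data twice gives the same hash.
print(file_hash(original) == file_hash(original))

# Changing a single byte gives a different hash.
print(file_hash(original) == file_hash(altered))

# The hash is a fixed, small size no matter how big the input is.
print(len(file_hash(original)), len(file_hash(original * 100000)))
```

Note that the hash stays the same length whether the input is a few bytes or a few megabytes, which is exactly why it makes a handy identifier.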
The best analogy is the ISBN number of a book - it uniquely identifies the book, it’s much smaller than the book, and you can’t create the original book with only its ISBN.
For the technically minded: Hashing actually uses the target data itself to generate the hash, whereas an ISBN number is just a number generated by the ISBN Agency based on metadata.
Here’s where things get interesting.
Law enforcement agencies are shutting down servers and prosecuting people for sharing copyrighted files on the P2P networks. How do they know people are sharing copyrighted files? Simple, because they can see the hashes of the files people are sharing. As I explained above, the hashes uniquely represent the files - if a file with hash ABCDEFG is being shared, and I know that ABCDEFG corresponds to the hash of the latest King Kong movie, I know that something illegal is being shared. I don’t even have to download the file because the hash is proof.
Except it’s not. The entire premise is legally flawed.
As far as I know, every Western country has the principle of “reasonable doubt” as the legal test. If you can raise reasonable doubt that the crime occurred then you are found not guilty. At least that’s what happens on Law and Order.
Why does this matter? Forget for a moment (the important-in-its-own-right issue) that the sites are only offering hashes of files and not the files themselves. The legal issue is with the actual hashes.
It has been known for a long, long time that hashes suffer from something called collisions. This is when two different sets of data, when hashed, produce exactly the same resulting hash. In other words hashes are not actually unique. It’s like two books that are given the same ISBN number.
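Collisions are easy to see for yourself if the hash is made small enough. The sketch below is an illustration only: it assumes a deliberately crippled hash (the first two bytes of SHA-256, so only 65,536 possible values) and made-up inputs named file-0, file-1 and so on. Real P2P hashes are far longer, so this shows the principle and not an attack on any real network.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, size: int = 2) -> bytes:
    """Deliberately weak hash: only the first `size` bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:size]


def find_collision(size: int = 2):
    """Hash distinct inputs until two of them share the same tiny hash."""
    seen = {}  # maps each tiny hash to the first input that produced it
    for i in count():
        data = f"file-{i}".encode()  # hypothetical, distinct "files"
        h = tiny_hash(data, size)
        if h in seen:
            return seen[h], data, h
        seen[h] = data


first, second, shared = find_collision()
print(first, second, shared.hex())
```

With a two-byte hash there are more possible inputs than possible hashes, so the loop has to find two different inputs with the same hash within 65,537 tries, and in practice it finds one much sooner.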
So when you’re in court for sharing copyrighted junk on P2P networks, you can request that the prosecution provide irrefutable proof that you were sharing copyrighted files. They can’t, because it’s impossible to distinguish between a collision hash and the real hash of a file. You can’t recreate the original file from a hash, remember.
In the real world it takes a lot of computing power to find collisions for hashes - the more “comprehensive” (in practice, the longer) the hash, the more computing power it takes. Hashes used on P2P networks are generally at the lower end of the scale (security applications are at the higher end), but the principle still holds.
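To put rough numbers on "a lot of computing power", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes an idealised hash whose outputs behave randomly, and it estimates how many inputs you would need to hash before any two of them are about 50% likely to collide. The function name and the list of bit lengths are just for illustration. Finding a different file that matches one specific, given hash is a harder problem still, and isn't modelled here.

```python
import math


def attempts_for_even_odds(bits: int) -> float:
    """Approximate number of random inputs to hash before a collision
    between any two of them is about 50% likely (birthday bound)."""
    possible_hashes = 2.0 ** bits
    return math.sqrt(2 * math.log(2) * possible_hashes)


for bits in (16, 32, 128, 160, 256):
    print(f"{bits:3d}-bit hash: ~{attempts_for_even_odds(bits):.3g} attempts")
```

Every extra bit of hash length multiplies the work by roughly 1.4, so the effort grows very quickly as hashes get longer.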
There is always reasonable doubt when the storing or sharing of hashes is used as a basis for prosecution. Unfortunately these legally dubious prosecutions will continue because their targets cannot afford the proper expert defence they need.
Adverse Camber: Blog Of Mystery








