Wednesday, January 25, 2012

A crazy probably not original idea--distributed assurance of service

Now that hosting is free, we can consider non-peer-to-peer ad hoc content distribution networks.

Facts on the table:
a) It is trivially easy to have a CONTENT HOST (blog or image or video host, or DNS, or Google Docs) with a programmable interface.

b) Each CONTENT HOST may store tens of GB of data, if not more, for free.

c) Bazillions of people operate or have named or anonymous accounts on bazillions of CONTENT HOSTs.

d) Forcing removal of content through regulatory means (like a DMCA takedown notice) requires cost and effort at least linearly proportional to the number of sites over which a piece of content is distributed.

e) Any ORIGINAL CONTENT FILE may be losslessly divided into smaller files, or encapsulated into any other kind of file, with or without redundancy.

f) Any ORIGINAL CONTENT FILE may encode any other file of the same or smaller size--say, by XORing against a one-time pad, a known file, or another similar two-way mathematical device--to create ENCODED PIECES.

g) A bit, byte, word, dword, ... is not copyrightable material.

h) Collections of facts are not copyrightable material.

i) Algorithms are not copyrightable material.
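Facts (e) and (f) can be illustrated with a minimal Python sketch. The function names `xor_bytes` and `split`, the 8-byte piece size, and the sample text are all made up for illustration, and the pad here is random bytes rather than another content file.

```python
import os

def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR data against a pad that is at least as long as the data."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))

def split(data: bytes, piece_size: int) -> list[bytes]:
    """Losslessly divide data into consecutive pieces of piece_size bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

original = b"We the People of the United States"
pad = os.urandom(len(original))      # stand-in for a one-time pad or known file
encoded = xor_bytes(original, pad)   # fact (f): two-way encoding
pieces = split(encoded, 8)           # fact (e): lossless division

# XOR is its own inverse, so joining the pieces and applying the pad again
# gives back the original bytes.
recovered = xor_bytes(b"".join(pieces), pad)
assert recovered == original
```

Without the pad, each piece on its own is just a run of bytes.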

Approach (probably not original):
1) Encode and divide an ORIGINAL CONTENT FILE into many smaller ENCODED PIECES. Spread the ENCODED PIECES over many CONTENT HOSTs in many jurisdictions.

2) In an automated manner, keep an INDEX of which ENCODED PIECES belong to which ORIGINAL CONTENT FILE and how the ENCODED PIECES have been distributed to which CONTENT HOSTs.

Spreading a 704 MB ORIGINAL CONTENT FILE, say a high-resolution image of the Constitution, over 1,000 images, blogs, story comments, etc. would prevent the ORIGINAL CONTENT FILE from being lost due to the failure of any small set of hosts storing ENCODED PIECES.

3) To retrieve the ORIGINAL CONTENT FILE, simply retrieve and assemble all the ENCODED PIECES using the INDEX.

4) Use ENCODED PIECES as one-time pads to encode a NEW CONTENT FILE--say, a video of mom making soup--recording in an INDEX how those new ENCODED PIECES mathematically relate to previous ENCODED PIECES.

5) Then distribute those new ENCODED PIECES in the same manner as above. This provides a limited ability to recover both the ORIGINAL CONTENT FILE and NEW CONTENT FILE should many ENCODED PIECES of the ORIGINAL CONTENT FILE become lost.
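Steps 1 through 3 might look something like the following sketch. It assumes each CONTENT HOST can be modelled as a plain Python dict, that pieces are named by their SHA-256 hash, and that every piece is stored on two hosts for redundancy. The names `publish` and `retrieve` and the `replicas` parameter are hypothetical, not any real host's API.

```python
import hashlib

def publish(data: bytes, piece_size: int, hosts: list[dict], replicas: int = 2) -> list[dict]:
    """Split already-encoded data into pieces, spread them over hosts, return the INDEX."""
    index = []
    for n, start in enumerate(range(0, len(data), piece_size)):
        piece = data[start:start + piece_size]
        key = hashlib.sha256(piece).hexdigest()
        # Place each piece on `replicas` consecutive hosts, wrapping around.
        host_ids = [(n + r) % len(hosts) for r in range(replicas)]
        for h in host_ids:
            hosts[h][key] = piece
        index.append({"key": key, "hosts": host_ids})
    return index

def retrieve(index: list[dict], hosts: list[dict]) -> bytes:
    """Reassemble the data by fetching each piece from the first host that still has it."""
    out = []
    for entry in index:
        for h in entry["hosts"]:
            piece = hosts[h].get(entry["key"])
            if piece is not None:
                out.append(piece)
                break
        else:
            raise LookupError("piece " + entry["key"] + " is gone from every host")
    return b"".join(out)

hosts = [{} for _ in range(4)]          # four pretend CONTENT HOSTs
data = bytes(range(40))                 # pretend ENCODED content
index = publish(data, piece_size=8, hosts=hosts)

hosts[0].clear()                        # one host fails or is taken down
assert retrieve(index, hosts) == data   # the file is still recoverable
```

With two replicas on distinct hosts, losing any single host loses no piece; a real deployment would need more redundancy than this toy.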

This requires keeping one or more INDEXes to track which ENCODED PIECES have been used to encode which other CONTENT FILES. It also requires modestly more storage than the CONTENT FILE itself occupies: beyond the existing ENCODED PIECES already in the network, there are the new ENCODED PIECES created, the INDEX tracking their relationships, and some extra for redundancy.
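Here is one way steps 4 and 5 could work, as a sketch rather than a definitive design. It assumes each new piece is simply a chunk of the NEW CONTENT FILE XORed with one existing piece of the same length, and that the INDEX records the pairing. The piece ids (`p0`, `n0`, ...) and the byte values are invented.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

# Existing ENCODED PIECES already out on the network (made-up values).
old_pieces = {"p0": b"\x13\x37\xc0\xde", "p1": b"\xca\xfe\xba\xbe"}

new_file = b"soup!!!!"                    # the NEW CONTENT FILE, 8 bytes
chunks = [new_file[0:4], new_file[4:8]]

new_pieces = {}
relations = []                            # the INDEX of how new pieces relate to old ones
for i, (pad_id, chunk) in enumerate(zip(old_pieces, chunks)):
    new_id = f"n{i}"
    new_pieces[new_id] = xor(chunk, old_pieces[pad_id])
    relations.append({"new": new_id, "pad": pad_id})

# Normal retrieval: new piece XOR old piece gives back the new file's chunk.
decoded = b"".join(xor(new_pieces[r["new"]], old_pieces[r["pad"]]) for r in relations)
assert decoded == new_file

# Limited recovery: if old piece p0 is lost but someone still holds the
# matching chunk of the new file, p0 can be rebuilt from new piece n0.
rebuilt_p0 = xor(new_pieces["n0"], chunks[0])
assert rebuilt_p0 == old_pieces["p0"]
```

The recovery is "limited" in the sense that rebuilding a lost old piece needs both the new piece and a surviving copy of the corresponding new content.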

This creates a situation with the following properties:
i) No ENCODED PIECE can be independently determined to be illegal to possess or distribute, because the meaning of each ENCODED PIECE depends on which of possibly several INDEXes refers to it. (An ENCODED PIECE may be simultaneously authorised and not authorised.)

ii) As long as the total amount of unique information in the system keeps expanding by (mostly) uniformly using existing ENCODED PIECES to create new ENCODED PIECES, information cannot be actively removed from the system. (Unlike most torrents, which have no seeders six months after release...)

iii) Deleting all ENCODED PIECEs of a CONTENT FILE is prohibitively expensive in the general case. (Removing the ability to regenerate a target CONTENT FILE would require deleting all other CONTENT FILEs whose pieces depend on the target CONTENT FILE's pieces.)

iv) If INDEXes of ENCODED PIECES were themselves encoded and distributed as above, INDEXes would be similarly resilient.

v) Free CONTENT HOSTs are individually relatively low bandwidth, but many CONTENT HOSTs in parallel have far greater bandwidth than any one CONTENT HOST alone. As a consequence, such a system could fall into general use to host content files that are not vulnerable to disappearance.
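Property (iii) can be made concrete with a small sketch that treats the INDEXes as a graph at the file level, which is a simplification (the real relationships are between pieces). The mapping `used_as_pad_for` and the file names are hypothetical. The function collects every CONTENT FILE whose pieces depend, directly or through a chain, on the target's pieces.

```python
from collections import deque

def dependents(target: str, used_as_pad_for: dict[str, list[str]]) -> set[str]:
    """Return all files encoded, directly or transitively, using the target's pieces."""
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for nxt in used_as_pad_for.get(current, ()):
            if nxt != target and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Made-up example: which files had their pieces used as pads for which others.
used_as_pad_for = {
    "constitution": ["soup_video", "cat_photo"],
    "soup_video": ["index_v2"],
    "cat_photo": [],
}

# Everything that would also have to go to make "constitution" unrecoverable.
must_also_delete = dependents("constitution", used_as_pad_for)
assert must_also_delete == {"soup_video", "cat_photo", "index_v2"}
```

The more the network reuses existing pieces, the larger this set grows for any given target.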

There are other potentially beneficial and harmful properties to this approach as well. For example, using ENCODED PIECEs as an input to encrypt data required to be stored under HIPAA or SarbOx or other legislation could result in interesting comparative red-taping...

Comments please.
