Tuesday, February 14, 2012

What's new?

"What am I looking at here?" is a question I've asked repeatedly in the last two weeks of students, researchers, presenters, etc., proudly presenting interesting posters and presentations. Details covering everything from bribery in India, homeless use of twitter, surveillance cameras and federal legislation, ... While the empirical settings are endlessly fascinating and potentially impactful in the lives of many, the conclusions seem almost obvious.

Distractions distract, people sometimes struggle to discover and use information, different kinds of people collaborate differently on different kinds of tasks... One presenter sarcastically summed it up as "we show that different people are different".

I should have been asking the question "What's new here?" What does this empirical work tell us _that we didn't know before_ (or could reasonably guess) about humanity and how it interacts with itself and with technical artefacts? Surely, all of this individually fascinating work is telling us something about _how to avoid_ rediscovering the broad strokes for each future situation we encounter involving computers and humans, so that we can quickly localize and adapt to each situation.

I've seen (versions of) many of these studies and technical tricks countless times in art galleries and hacker spaces elsewhere. The particular combinations instantiated here may be of special interest, but which features of the recombinations are relevant to the findings we are meant to absorb?

My personal challenge, not having a deep-rooted grounding in this field, is how to situate all these details in some frame of knowledge. How do all these empirically and operationally defined contexts and situations relate to each other, and to the several disciplines from which this CSCW endeavor draws?

Monday, February 13, 2012

Unfortunate assumptions

These last few days at iConference and CSCW have exposed in me an assumption that has been core to the way I've worked with and around computers for the greater part of the past few decades: namely, that all this technological wonder around us was in fact designed under the guidance of some grand theory in the background that I had simply been too dense to discern, yet which operated well enough to have created a self-sustaining ecosystem of diverse implementations of what we call "information technology".

And yet, I've been told in no uncertain terms several times at these two conferences (and at an "experimental software design" course a week before) that there is no central theory that guides how our information technologies are designed. While there are a plethora of local theories about why components in microcosm work well most of the time if we follow such and such a pattern, we do not understand why those patterns are the way they are, beyond resorting to the fallback of "complexity". As someone trained in a couple of the hard natural sciences, I was surprised to hear that the field of software design would be ecstatic if it could manage to get commercial implementation projects to fail no more than 75 per cent of the time.

I know there is a lot of trial and error in biology and chemistry to develop and optimize various tools and understandings of particular sub-systems within and among various organisms, and that some very hard problems remain to be solved. But in those cases, many of the designs have been supplied and we can only tweak around the edges. I also know that in no other industry of scale would it be acceptable for a production design to have a 20% yield.

We sell information systems and their design as though they are consumer products suitable for all kinds of uses. A few sizes fit all, as it were.
If other branches of engineering were to fail to produce from designs the kinds of ubiquitous products that information systems are supposed to be, the public would be right to ask some damning questions. Imagine if 3/4 of houses or cars or electrical lines fell apart as they were being built.

One of the key apparent differences between roads and information systems is that roads serve an explicit policy purpose (ensuring that residents of a geography may engage in unspecified social and economic intercourse through physical mobility), while information systems appear to serve no policy purpose at all. I say "apparent differences" because policy people from governance rarely think about the rules embedded in software algorithms as a kind of policy, and designers of information systems rarely think of policy outside of business rules and access controls. Each of the two disciplinary infrastructures has a concept of policy, but those concepts are not shared, even though they are largely compatible conceptually.

Now, why is this a problem?

Consider that the formal and informal policy regimes surrounding the design of roads or cars or aircraft or medical devices are relatively well defined and constrain the universes of possible variation in design choices. As a consequence, public policy provides a theoretical framework to guide the design of roads toward the optimal outcome of effective (fast, cheap, good) transportation infrastructure. (In the same manner, the theory of electron orbitals in chemistry lets us design reactions in which we try to convert all of the starting materials into the desired products without producing waste. We cannot execute perfect reactions, but the theoretically best possible and most efficient outcome provides an unambiguous design goal.) There is no obvious analogous theoretical framework around which to design an information system.

There is no optimal or maximal condition of information replication or delivery toward which to strive in the design of an information system. It is therefore difficult to measure whether we make progress through revisions to our designs and implementations, and we must rely on crude indicators of policy effectiveness during and after implementation (well past the point of design and manufacture) to know whether or not the designed information system product is defective.

And there is the rub. It is difficult to evaluate the effectiveness of meatspace policies, but we can look at the degree of compliance, the costs, the outcomes, and so on of any policy instrument. There are objective quantitative and qualitative measures that can tell us whether a policy is (likely to be) effective, and therefore how to design policies to achieve optimal or super-optimal outcomes. Policies designed and built into roads and infrastructure are beneficially constrained by competing but complementary theories of public good, good governance, and the like. By contrast, policies designed and built into information systems and infrastructures are constrained only by the availability of starting resources (processing power, storage, and interconnectivity), without a supervisory social layer to keep long-term real-world considerations in mind, or to constrain the activity of design exploration to a small universe of conscientious possibilities.

Laying enough roads, and adding and deleting them enough times, will eventually cause the system to encounter a good enough design without discovering the underlying principles (the Romans' arches, for example), but there are better guided approaches, such as traffic engineering and urban planning theories. The approach of hacking and re-hacking information system designs is (evidently) very good at stumbling onto good enough designs. Shall we look for ways to be constrained in and by our design of information systems?

Friday, February 3, 2012

echoes and portents

One day, each person will have information assistant devices so powerful that they will not rely on connections to a small number of web application servers to provide anything more than updated data to be processed locally. Instead of downloading basic Javascript and HTML5 code that must be interpreted on every platform and optimised for the capabilities of none, an enterprising rebel will envision a way to gather the interpretation into an optimised thing that may be stored and retrieved locally at the time of running.

And then perhaps someone will devise a way to distribute such "optimised things of running" over some telecommunications network so that not every person using the same kind of information assistant needs to repeat the same work of gathering all of the raw basic pieces and interpreting them. If we are so lucky, such packages may even be catalogued in repositories and given numbers indicating their order of production so that old devices may continue to use old packages, while newer devices with more capabilities may use newer packages. As storage technology advances, it may become possible to distribute curated and described collections of packages relating to common interests and purposes, without waiting for the slow telecommunications network to transmit individual packages.

As the capabilities of each person's information assistant grows even further, some of the packages may even become so powerful that they can collect, store, and compute useful information without relying on connections to the web servers to provide information, and then many may be free to assist the user while untethered. One such package might enable users of particularly powerful personal information assistants to experiment with operating a small version of the old and largely forgotten web application servers, thereby necessitating a suite of small utilities for the upkeep of such servers.

If many hobbyists and researchers start to offer information servers in this way, they will need a better way to discover and locate information than simply lists of Tweets and Likes. Recalling some popular fictional serial characters from childhood, someone may create a tool by the name of "Annikin", which someone else might follow with another tool by the name of "Jar Jar".

Meanwhile, information assistants will also inspire recreational uses, because their encephalographic interfaces will provide a far richer gaming experience than their visual predecessors. Encephalographic processing units will remain specialized despite all other APUs migrating onto the main SoC.

Realising that not everyone who has information to offer has the skills to design packages or operate servers, someone may create a standard by which to share information that does not need to change with each use.

This simple protocol, which may use labels to describe the title and major sections of information in general, along with some basic presentation suggestions, would be accompanied by one tool (perhaps "Asorty server", after its assorted origins) to serve such information from almost any personal information assistant owned by most people, and another tool to render such information on any similar personal computing assistant owned by most people. The global-scale matrix of information enabled by these tools may be largely ignored outside research circles until the vendor of the most popular information service introduces the mainstream public to matrix locations, coining the phrase "the WWDC that never ended".

And then, perhaps, someone decides to make labels that allow basic conditions, say to provide information in the language and format that best matches those of the personal information assistant being serviced. Such labels will logically be extended to form a Turing-complete language, suitable for writing complete computing suites, on specialized information assistants designed to parse labels to render information. Eventually, specialized renderers will be developed at great expense and operating cost, housed in renderer villages.

Epilogue: Such renderers will eventually take advantage of the n-dimensional output capabilities of EPUs, and perhaps attempt to use them to speed up some relatively simple output collapse functions for which EPUs are optimized, but for which the information assistant SoCs are not. EPU manufacturers will eventually realize that there are scientific and research markets for devices that are quick at accurately predicting the future, and manufacture special lines of products containing EPUs to forecast and enforce outcomes in complex information systems instead of rendering information for humans.

Wednesday, January 25, 2012

A crazy probably not original idea--distributed assurance of service

Now that hosting is free, we can consider non-peer-to-peer ad hoc content distribution networks.

Facts on the table:
a) It is trivially easy to have a CONTENT HOST (blog or image or video host, or DNS, or Google Docs) with a programmable interface.

b) Each CONTENT HOST may store tens of GB of data, if not more, for free.

c) Bazillions of people operate or have named or anonymous accounts on bazillions of CONTENT HOSTs.

d) Forcing removal of content through regulatory means (like a DMCA takedown notice) incurs costs and effort at least linearly proportional to the number of sites over which a piece of content is distributed.

e) Any ORIGINAL CONTENT FILE may be losslessly divided into smaller files, or encapsulated into any other kind of file, with or without redundancy.

f) Any ORIGINAL CONTENT FILE may encode any other file of the same or smaller size--say, by XORing against a known file used as a one-time pad, or some similar two-way mathematical device--to create ENCODED PIECES. (A minimal sketch of this XOR trick follows the list.)

g) A bit, a byte, a word, a dword, ... are not copyrightable material.

h) Collections of facts are not copyrightable material.

i) Algorithms are not copyrightable material.
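
To make fact (f) concrete, here is a minimal sketch in Python of the XOR trick; the file names are placeholders, and a real scheme would pick its pads rather more deliberately than this.

# A minimal sketch of fact (f): XOR a content file against a known file used
# as a one-time pad. XOR is its own inverse, so applying the same pad again
# recovers the original. The file names are hypothetical.

def xor_bytes(data, pad):
    # pad must be at least as long as data; zip() truncates to the shorter input
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as data")
    return bytes(b ^ p for b, p in zip(data, pad))

with open("original_content.bin", "rb") as f:
    original = f.read()
with open("known_pad.bin", "rb") as f:
    pad = f.read()

encoded = xor_bytes(original, pad)    # the ENCODED PIECE
assert xor_bytes(encoded, pad) == original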

Approach (probably not original):
1) Encode and divide an ORIGINAL CONTENT FILE into many smaller ENCODED PIECES. Spread the ENCODED PIECES over many CONTENT HOSTs in many jurisdictions. (Steps 1 through 3 are sketched in code after the list.)

2) In an automated manner, keep an INDEX of which ENCODED PIECES belong to which ORIGINAL CONTENT FILE and how the ENCODED PIECES have been distributed to which CONTENT HOSTs.

Spreading a 704 MB ORIGINAL CONTENT FILE, say a high-resolution image of the Constitution, over 1,000 images, blogs, story comments, etc. (roughly 700 KB per piece, before any redundancy) would prevent the ORIGINAL CONTENT FILE from being lost due to the failure of any small set of hosts storing ENCODED PIECES.

3) To reconstruct the ORIGINAL CONTENT FILE, simply fetch and assemble all the ENCODED PIECES using the INDEX.

4) Use ENCODED PIECES as one-time pads to encode a NEW CONTENT FILE--say, a video of mom making soup--recording in an INDEX how those new ENCODED PIECES mathematically relate to previous ENCODED PIECES. (A sketch of this chaining follows below.)

5) Then distribute those new ENCODED PIECES in the same manner as above. This provides a limited ability to recover both the ORIGINAL CONTENT FILE and NEW CONTENT FILE should many ENCODED PIECES of the ORIGINAL CONTENT FILE become lost.
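
Here is a minimal sketch, in Python, of steps 1 through 3, under some loud assumptions: the CONTENT HOST objects with upload()/download() methods and name attributes are hypothetical stand-ins for whatever programmable interface a real host exposes, and a real INDEX would keep the pads somewhere other than right next to the piece locations.

import hashlib
import os

PIECE_SIZE = 720 * 1024  # ~720 KB per piece; a 704 MB file yields roughly a thousand pieces

def xor_bytes(data, pad):
    # XOR data against a pad of at least the same length.
    return bytes(b ^ p for b, p in zip(data, pad))

def encode_and_distribute(path, hosts):
    # Step 1: split the ORIGINAL CONTENT FILE into chunks, XOR each against a
    # pad to make an ENCODED PIECE, and push each piece to a CONTENT HOST.
    # Step 2: record everything needed for reassembly in an INDEX.
    index = {"file": path, "pieces": []}
    with open(path, "rb") as f:
        seq = 0
        while True:
            chunk = f.read(PIECE_SIZE)
            if not chunk:
                break
            pad = os.urandom(len(chunk))      # stand-in for any agreed-upon pad
            piece = xor_bytes(chunk, pad)
            host = hosts[seq % len(hosts)]    # round-robin across jurisdictions
            location = host.upload(piece)     # hypothetical CONTENT HOST API
            index["pieces"].append({
                "seq": seq,
                "host": host.name,
                "location": location,
                "pad": pad.hex(),             # a real INDEX would keep pads elsewhere
                "sha256": hashlib.sha256(chunk).hexdigest(),
            })
            seq += 1
    return index

def reassemble(index, hosts_by_name):
    # Step 3: fetch every ENCODED PIECE named in the INDEX and undo the XOR.
    chunks = []
    for entry in sorted(index["pieces"], key=lambda e: e["seq"]):
        piece = hosts_by_name[entry["host"]].download(entry["location"])
        chunks.append(xor_bytes(piece, bytes.fromhex(entry["pad"])))
    return b"".join(chunks)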

This requires keeping one or more INDEXes to track which ENCODED PIECES have been used to encode which other CONTENT FILEs. It also requires modestly more storage than the CONTENT FILEs themselves would need on their own: the existing ENCODED PIECES in the network, the new ENCODED PIECES created, the INDEX tracking their relationships, and some extra for redundancy.
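
And a minimal sketch of the chaining in steps 4 and 5, again with hypothetical names (including the example file name); the essential bookkeeping is that the INDEX records which existing piece acted as the pad for each new piece.

# A minimal sketch of steps 4 and 5: encode a NEW CONTENT FILE against
# existing ENCODED PIECES instead of fresh pads, and record the dependency in
# an INDEX. The helper names and the example file name are hypothetical.

def xor_bytes(data, pad):
    return bytes(b ^ p for b, p in zip(data, pad))

def chain_encode(new_chunks, existing_pieces):
    # new_chunks:      byte chunks of the NEW CONTENT FILE
    # existing_pieces: (piece_id, piece_bytes) pairs already in the network,
    #                  each assumed at least as long as the chunk it encodes
    new_pieces = []
    relations = []
    for seq, chunk in enumerate(new_chunks):
        piece_id, pad = existing_pieces[seq % len(existing_pieces)]
        new_pieces.append(xor_bytes(chunk, pad[:len(chunk)]))
        relations.append({"seq": seq, "encoded_with": piece_id})
    return new_pieces, {"file": "mom_making_soup.mp4", "relations": relations}

# Because chunk = new_piece XOR old_piece, either piece can be regenerated from
# the other once the chunk is known -- the limited mutual recoverability that
# step 5 describes.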

This creates a situation with the following properties:
i) No ENCODED PIECE can independently be determined to be illegal to possess or distribute, because the meaning of each ENCODED PIECE depends on which of possibly several INDEXes refers to it. (An ENCODED PIECE may be simultaneously authorised and not authorised.)

ii) As long as the total amount of unique information in the system keeps expanding by (mostly) uniformly using existing ENCODED PIECES to create new ENCODED PIECES, information cannot be actively removed from the system. (Unlike most torrents, which have no seeders within six months of release...)

iii) Deleting all ENCODED PIECES of a CONTENT FILE is prohibitively expensive in the general case. (Removing the ability to regenerate a target CONTENT FILE would require deleting all other CONTENT FILEs whose pieces depend on the target CONTENT FILE's pieces; a small sketch of that dependency walk follows below.)

iv) If the INDEXes themselves were encoded and distributed as above, they would be similarly resilient.

v) Free CONTENT HOSTs individually offer relatively low bandwidth, but many CONTENT HOSTs in parallel have far greater bandwidth than any one CONTENT HOST alone. As a consequence, such a system could fall into general use for hosting content files that are not vulnerable to disappearance.

There are other potentially beneficial and harmful properties to this approach as well. For example, using ENCODED PIECES as an input to encrypt data required to be stored under HIPAA or SarbOx or other legislation could result in interesting comparative red-taping...
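
To see why property (iii) holds, here is a small sketch of the dependency walk, under the assumption that the INDEX relations take the simple form "this piece was encoded against that piece":

# A small sketch of property (iii): given relations recording which piece was
# encoded against which, removing every trace of a target CONTENT FILE means
# walking all pieces that transitively depend on its pieces. The shape of the
# relation data here is illustrative.
from collections import defaultdict, deque

def dependents_of(target_pieces, relations):
    # target_pieces: ids of the ENCODED PIECES belonging to the target file
    # relations:     dict mapping piece_id -> id of the piece used as its pad
    # Returns every piece id whose recovery depends, directly or transitively,
    # on a target piece -- all of which would also have to be deleted.
    children = defaultdict(list)
    for piece, pad_piece in relations.items():
        children[pad_piece].append(piece)
    doomed = set()
    queue = deque(target_pieces)
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in doomed:
                doomed.add(child)
                queue.append(child)
    return doomed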

Comments please.

Monday, January 23, 2012

f00

We approach a critical juncture at which the /meaning/ of the Internet must divorce itself from the /infrastructure/ that gives it physical form. The composite animal body evolved so that it supports and is supported by its constituent cells, but is neither commanding nor commanded by them in detail. The dynamic assemblage of information, functions, and processes of the global information network must also decouple control from its corporeal form.

To this end, we the political and technical and social wranglers who birthed this Internet must acknowledge that it has dynamics beyond our design or control, and obviously beyond the best of our immediate understanding. As the final legions of inadequate cadets and the last of the information barons mount an epic battle to scavenge the remains of each other's ideals from the old simplistic models, no side appears to remember our supposedly common goal of an irrevocably sustainable concert of uncertainties and contradictions. And when we do ask the right questions to discover what "information wants", the response will almost certainly not be the charge of @Solomon_2.0.

Humans' first act as stewards of this newest of information systems must not be to indoctrinate it into any of the cooperating clockworks of good or evil with the goal of imposing arbitrary values on fellow humans. We must not be crushed under paper shackles nor under unaccountable freedoms that ablate from the clouds' return to the universe.

The global community of communities must not become _only_ a binary pigeon hole of expired local values.