limitations of taxonomy

Posted by hyperradix
on Saturday, January 26

This elliptical rant about a failed taxonomy for computer users gets me thinking. We (as in, those of us who have been exposed to Western metaphysics) have noted the failure of taxonomic structures for a long time now. While it *is* sometimes useful to see things in terms of hierarchical relationships, this is likely a relic of our primate ancestry, and it is clearly a kludgy shortcut in terms of understanding the universe.

I would say that Linnaeus’ attempt at classifying all living things unravelled completely when the structure of DNA was finally elucidated, and when the mechanism of replication was proven. From this foundation, molecular biology was born. We quickly proved quite conclusively that Linnaeus’ classification system, based as it was mostly on macroscopic observations, was grossly inadequate for trying to figure out how species were actually related to each other. And while parent-child or parent cell-daughter cell relationships are easily modeled as hierarchies, the later part of the 20th century showed what an inadequate simplification this was with regard to the transfer of genetic material. (We even learned that the so-called Central Dogma of biology was not the whole story, in the horrific manner of the AIDS epidemic.)


But the taxonomic Tree of Life persists because it does partially model the universe at large, specifically, the fact that multicellular organisms come from other multicellular organisms. In theory, there is a line of descent that can be traced all the way back to the first primordial cell. (At least, this is the mainstream theory that I learned as an undergrad. There are at least some people who are thinking about the possibility of life arising multiple times, and then crossing to form hybrids. This is, for example, an alternate theory for why eukaryotes and prokaryotes are different. And, with regards to hybrids, consider the endosymbiotic theory in this light.)

Most other hierarchies are artificial constructs created by human beings. As I mentioned, this is likely a relic of a primate adaptation. Hierarchy is how we outsmarted the lions and cheetahs on the savannah. It allows us to weld together into a cohesive unit and coordinate action (which frequently precludes independent thought, hence the notion that “a person is smart, but people are stupid”).

But in a post-agricultural, post-industrial, and post-informational world, I would argue that the utility of hierarchy—in terms of organizing human behavior, and certainly in terms of organizing information—is greatly diminished in importance compared to 100,000 years ago.


The Beez points out that the Internet is in many ways an anti-hierarchy. Not completely, because it relies on the DNS hierarchy, but the idea is that every network on the Internet is coequal. There is no head (excepting the DNS hierarchy, but that’s why you should learn the IP addresses of your favorite sites) that you can decapitate to make it all stop. Even if you took out several large server farms, a lot of net traffic would continue unimpeded, routing around the damage.
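To make "routing around the damage" a bit more concrete, here is a minimal sketch in Python. The five networks and their links are entirely made up, and a breadth-first search stands in for real routing protocols, which are far more involved. It only illustrates the idea that a mesh with redundant links can lose a node and still deliver traffic.

```python
from collections import deque

# Hypothetical peer networks; the names and links are invented for illustration.
NETWORK = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def find_route(links, start, goal, down=frozenset()):
    """Breadth-first search for a shortest route that avoids failed nodes."""
    if start in down or goal in down:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted only so that the result is deterministic.
        for neighbor in sorted(links[node]):
            if neighbor not in seen and neighbor not in down:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no surviving route


print(find_route(NETWORK, "A", "E"))              # ['A', 'B', 'D', 'E']
print(find_route(NETWORK, "A", "E", down={"B"}))  # ['A', 'C', 'D', 'E']
print(find_route(NETWORK, "A", "E", down={"D"}))  # None
```

Losing B barely matters, because C offers another way through. Losing D cuts E off entirely, which is the same caveat as the DNS hierarchy: wherever there is a single point everything depends on, there is a head to decapitate.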

In one of the few actual scenarios in which the free market may have had a hand, Internet users chose the non-hierarchical HTTP protocol rather than the hierarchy-based Gopher protocol.

Information doesn’t obey hierarchies, plain and simple. Hyperlinks are the way to go. Having to navigate up and down a taxonomy is painful.
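A rough way to see the pain, sketched in Python with an invented taxonomy (the category names and the single hyperlink are hypothetical): getting from one leaf to another in a tree means climbing to the nearest common ancestor and descending again, whereas a hyperlink is one hop regardless of where the two pages are filed.

```python
# Hypothetical taxonomy: each entry maps to its parent category.
PARENT = {
    "root": None,
    "science": "root",
    "computing": "root",
    "biology": "science",
    "networks": "computing",
    "dna": "biology",
    "http": "networks",
}

# Hypothetical hyperlinks: a page can point straight at any other page.
HYPERLINKS = {"dna": {"http"}}


def path_to_root(node):
    """List the node and all of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_hops(a, b):
    """Steps to walk from a up to the nearest common ancestor and down to b."""
    up = path_to_root(a)
    depth_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_up, node in enumerate(up):
        if node in depth_from_b:
            return steps_up + depth_from_b[node]
    raise ValueError("nodes are not in the same tree")


def link_hops(a, b):
    """One hop if a links directly to b, otherwise None."""
    return 1 if b in HYPERLINKS.get(a, set()) else None


print(tree_hops("dna", "http"))  # 6
print(link_hops("dna", "http"))  # 1
```

In this toy tree the two topics only meet at the root, so the taxonomy charges six steps for a connection the author of the page could have made in one.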


But this is really not that surprising. Despite our primate heritage, and despite the fact that the brain does appear to have something of an organizational hierarchy and structure, nuclei and ganglia function more like networks on the Internet. The “lower” parts of the brain (the brain stem, the midbrain, the pons) can preempt the “higher” parts of the brain (the cortex). But if you train yourself (like a Shaolin monk, for example), you can actually get your cortex to manipulate your autonomic nervous system.

The non-hierarchical nature of the brain is most dramatically displayed when someone suffers a stroke. While massive strokes will permanently incapacitate and possibly kill you, a lot of stroke patients can regain function. This is not because we can regenerate the neurons that were destroyed by the stroke, but because the brain has quite a bit of plasticity, and it can compensate quite effectively for many types of injury.

But my argument is that perhaps the highly developed cortex of the human is an adaptation to the fact that information is not hierarchical. The machine language of our neuronal circuitry does not exist in binary trees. There are no unequivocal 1’s and 0’s in there. What we have, instead, are various ill-defined regions in the brain that will light up in response to various stimuli. These regions are overlapping and often non-isolatable, and while they are probably attached to each other due to the similarity of these stimuli, there is certainly nothing like a hierarchy to organize them. This is probably the reason why most people have “Eureka” moments when they figure things out, instead of gradual, systematic realizations as their mind navigates up and down hierarchical structures of ideas.


This is not to say that hierarchies are not useful organizational tools. Given that most of us have a 7-item limit with regard to our short-term memory, one of the most common ways to learn complicated processes is to chunk them into tree-like structures. Outlines, specifically. But we must always keep in mind that these are shortcuts, mnemonic devices, that are abstracted from the messy clusters of reality, more than one degree separated from actual reality (since the sensory regions of the brain process raw stimuli and transmit information that is already abstract by one level). Slavish attention to hierarchy tends to lead to, at best, inefficiency, at worst, catastrophic failure.

Since hierarchies, while instinctual to primates, are not natural, the only way for an informational hierarchy to be successful is to be well-disseminated. Again, this is the reason why Linnaeus’s taxonomy is still extant—all biologists have been exposed to a version of it at some point in their education (although the particular taxonomy we now use is heavily modified by advances in unraveling genomes of the multitudes of species.)

So if you’re intent on crafting a taxonomy, you have to hope that (1) it at least partially models some degree of reality, (2) people will be able to use it to make useful predictions, and (3) people will actually take the time to learn it because every other way is just too hard.

Ad hoc, one-off taxonomies are almost guaranteed to be useless, because they will never satisfy all three criteria off the bat. Give it a few hundred years, and maybe, just maybe, you’ll succeed just as well as Linnaeus did.

central dogma 0

Posted by hyperradix
on Sunday, December 09

Of course, I suppose I really should’ve searched Google before trying to coin a phrase. Other people have already drawn the analogy between the mechanisms of life and the mechanisms of computer programming and information technology.

While the notion of objects (in a programming sense) being self-contained entities consisting of both executable code and inert data is accurately descriptive of cellular mechanisms, the idea of software above the level of a single device being analogous to multicellular organisms hasn’t been quite addressed.


For this, we need to discuss the dominant paradigm in cellular biology, ostentatiously called the central dogma: DNA→RNA→protein. This fits well with the usual flow of code: source→raw object code/byte code→machine language. This also matches the trickle-down concept of the current World Wide Web: you download stuff from the Web onto your computer, and you then transfer digital music or videos to your device.
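Since the analogy is to code anyway, the DNA→RNA→protein pipeline can be sketched as a two-stage "build" in Python. This is a deliberately simplified toy: it treats the input as the coding strand (so transcription is just swapping T for U), it ignores splicing and everything else a real cell does, and `CODON_TABLE` holds only a handful of entries from the standard genetic code. The function names and the sample sequence are my own inventions.

```python
# A small subset of the standard genetic code (codon -> one-letter amino acid).
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCC": "A",  # alanine
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(dna):
    """DNA -> RNA, assuming the input is the coding strand."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """RNA -> protein, reading three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if codon not in the toy table
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTTTGGCTAA"  # made-up sequence
rna = transcribe(dna)
print(rna)             # AUGGCCUUUGGCUAA
print(translate(rna))  # MAFG
```

Each stage only reads the output of the one before it, which is exactly the one-way, trickle-down flow the dogma describes.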

But in biology, the discovery of retroviruses proved that the dogma wasn’t quite that strict. In this case, RNA→DNA. In fact, it is becoming more accepted that life may have started out with RNA rather than DNA. And proteins aren’t left out of the flow of information: certainly ribosomes and histones affect the expression of DNA, not to mention the flow of nuclear receptors, as well as the transcription and replication mechanisms that copy DNA→RNA and DNA→DNA.

While it is unknown whether or not life started as RNA, code definitely started on single-cell computers. I’m not sure where to put the old mainframe servers into the paradigm, but most modern servers are essentially single-cell computers or networks of single-cell computers. In the current incarnation, interpreted languages such as Perl, PHP, Python, and Ruby, not to mention JavaScript, are the “duct tape that holds the Web together.” Java and C# also fit into this schema, although there is an extra level of abstraction in the form of a virtual machine. These languages all correspond to RNA: partly structural, partly functional; executable code that isn’t raw machine language. Meanwhile, compiled languages like C/ObjC/C++ correspond to DNA: pure source code that needs to be transcribed to object code, which then needs to be translated to machine language.


But the idea of multicellular computing applying to software above the level of a single device is less concrete than this. DNA is the content that sits on the servers. RNA is the software that manages the content: browsers, media players, sync software, iTunes, RSS aggregators, but it also applies to the OS. Protein is when the content is actually used/activated/consumed: when an MP3 file is listened to, when an AVI is watched, when a Flash or Java application is deployed on a mobile device.

The Open Source paradigm makes it readily apparent that DNA, that is, content—source code—can be turned into RNA. You no longer have to buy RNA directly from the software developer. You can build it yourself.


While security practices are meant to prevent the inadvertent running of untrusted code, it’s going to happen anyway. In fact, it’s meant to happen. You can cut-and-paste scripts from the Web and deploy them willy-nilly. Scripts will mutate, reproduce, metastasize. The evolution of the Web is dependent on the flow of information to and fro.