Tuesday, September 23, 2008

Consciousness as a hierarchical clustering of models

This is a simple way to think about consciousness.

Most people behave like animals. Animalistic behavior is the default model of a human, unless a more specific model has been learned.

More specific mental models of the world can be learned, however. These models tend to form hierarchies — models tend to build on each other, and more sophisticated models are usually less general, suitable only for restricted environments.

Because it takes a huge amount of cognitive effort to develop a new mental model for part of the world, and people are cognitive misers, people almost exclusively borrow cognitive models rather than develop new ones, and almost exclusively take actions that are implied by some cognitive model. The limited cognitive energy that people have available thus immediately implies that people use a hierarchy of cognitive models, as it's easier to modify an existing model than to develop a new one. Due to the same cognitive limitations, elements of the hierarchy are largely shared among people. Unique models are rare, because they "cost" a lot.

When pressed for an action, a human performs a nearest-neighbor style of matching to find the model most appropriate to the current situation, and takes action based on logical or propositional inference within that model.
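
The matching step can be made concrete with a toy example. This is only a sketch under loud assumptions: the three feature dimensions (threat, social pressure, planning horizon), the prototype vectors, and the canned actions are all hypothetical and chosen for illustration, and Euclidean distance is just one possible notion of "nearest."

```python
import math

# Hypothetical models: each has a prototype situation, expressed as
# (threat, social pressure, planning horizon), and a canned action.
# None of these numbers come from data; they are illustrative only.
MODELS = {
    "animal": ((1.0, 0.0, 0.0), "flee or feed"),
    "social human": ((0.2, 1.0, 0.1), "do what the group does"),
    "long-term planner": ((0.0, 0.3, 1.0), "save and invest"),
}


def nearest_model(situation):
    """Return the name of the model whose prototype is closest (Euclidean)."""
    return min(MODELS, key=lambda name: math.dist(MODELS[name][0], situation))


def act(situation):
    """Pick the nearest model, then return its name and the action it implies."""
    name = nearest_model(situation)
    return name, MODELS[name][1]
```

Calling `act((0.9, 0.1, 0.0))` selects the "animal" model, while `act((0.1, 0.9, 0.2))` selects the "social human" model. The point is only that action follows from whichever stored model happens to sit closest to the situation, not from fresh reasoning.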

This explains a lot of people's irrationalities. For example, most people don't pursue their rational long-term interests. But this makes sense, because in a hierarchical sense consciousness looks like:
- Animal
- Basic human, interacting socially, short-term goals
- Human pursuing long-term self-interest
- Human pursuing intellectual topics, at the cost of long-term self-interest (e.g. decreased pay in academia)

Analogous examples can be found throughout the cognitive space. For example, religions exhibit this tree-like structure. When models are transmitted between people in the religion, the more general concepts — the root node — of the religious model must be transferred first, or else the general right actions won't be taken by adherents to the religion. The more specific theological issues and distinctions are learned later on as sub-nodes, down to the hair-splitting details.

For other examples, consider any academic course, where general introductory material is taught first. More specific material comes later in the course, as more of the hierarchy has been communicated. The specific material is less generally applicable and was discovered later in time, and the people who discover more specific material build upon the seminal discovery and relate their work to it. Prime examples can be found in music, physics, mathematics, biology, art, philosophy, ethics, politics, and, I argue, all human cultural and intellectual movements.

This meta-model also implies that many of the cognitive errors people make come from selecting the right part of the tree but choosing a model that is slightly too specific or too general. This follows from a non-overlapping property of models. Because it costs a lot to develop a new model, and it is cheapest to develop one by specifying a well-articulated domain in which it differs from its parent model, models as they develop historically exhibit only small, well-specified regions of contradiction, have a clear tree structure, and have contradiction resolution rules. It would seem possible for a human to hold a forest of mutually contradictory models in his head, but in order to act he must create a resolution model that specifies what to do when models contradict. So a wholly disconnected forest structure is impossible. By cognitive miserliness, it's extremely difficult for individuals and societies to knit together trees of knowledge, due to the interaction terms. If two branches of models contradict, a human finds it easiest to reject one branch (see, for example, the debates of religion vs. science), or else, more rarely because it takes much more energy, to learn a contradiction resolution model for all the interactions between the two branches (the easiest solution being NOMA, non-overlapping magisteria). Because teaching proceeds from the root of the tree, it's also easy for an individual to simply be unaware of, or intentionally ignore, any tree of knowledge that contradicts what he currently knows.
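
To illustrate the tree structure and the "too general or too specific" error, here is a minimal sketch. The `Model` class, the `select` function, the situation flags (`social`, `planning`, `curious`), and the rule "the first child whose domain applies overrides its parent" are all assumptions made for this example; the last one stands in for a contradiction resolution rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Model:
    name: str
    applies: Callable[[dict], bool]  # domain where this model overrides its parent
    action: str
    children: List["Model"] = field(default_factory=list)


def select(model, situation):
    """Descend to the most specific model whose domain covers the situation."""
    for child in model.children:
        if child.applies(situation):
            return select(child, situation)
    return model  # no child applies, so the more general model is used


# Hypothetical hierarchy mirroring the list earlier in the post.
root = Model("animal", lambda s: True, "follow instinct", [
    Model("basic human", lambda s: s.get("social", False),
          "pursue short-term social goals", [
        Model("long-term self-interest", lambda s: s.get("planning", False),
              "pursue long-term payoff", [
            Model("intellectual", lambda s: s.get("curious", False),
                  "pursue ideas, accept lower pay"),
        ]),
    ]),
])
```

With these definitions, `select(root, {"social": True})` stops at "basic human", and adding `"planning": True` descends one level further. A person who never registers the `planning` flag stays one node too high in the right branch, which is the kind of slightly-too-general error described above.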

One last issue remains in this theory. Are models parametric, or non-parametric? Clearly, a non-parametric model held in a brain stores far too much information for the model, together with the data it depends on, to be communicated effectively. Thus heavy amounts of learning are necessary when using non-parametric models. Non-parametric models also have a disadvantage among social creatures: consequences of the model cannot be used in argumentation, as the model depends on the specific training stimuli. Thus non-parametric models probably find application where fast inference is needed but correctness of thinking is less important, for example when linking mental concepts, social bonding, use of language, or performative activities such as sports or music. Non-parametric models are heavily subject to individual experience, and thus have less use in the hard sciences.
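
The communication argument can be seen in a small example. This is a sketch, not a claim about how brains work: the data are invented, a least-squares line stands in for "a parametric model," and a one-nearest-neighbor lookup stands in for "a non-parametric model."

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(params, x):
    """Parametric prediction: needs only the two fitted numbers."""
    slope, intercept = params
    return slope * x + intercept


def predict_nearest(xs, ys, x):
    """Non-parametric prediction: needs every stored experience (xs, ys)."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]


# Invented "experiences" that happen to follow y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
params = fit_line(xs, ys)  # (2.0, 1.0): the whole model is two numbers
```

To hand the parametric model to someone else, it is enough to pass along `params`. To hand over the nearest-neighbor model, the entire list of experiences has to be transferred, and its predictions (for instance `predict_nearest(xs, ys, 10)` returns 9, while `predict_line(params, 10)` returns 21.0) depend on exactly which experiences were stored.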

Great novelists tend to write their best works at an old age, because novel writing uses non-parametric, highly experiential models. Great physicists make contributions at a young age, because they use parametric models, which can be learned more quickly. Hence the common belief that "mathematics and physics is a young man's game": young people learning parametric models are much more likely to see contradictions (or in the case of mathematics, inconsistencies that bar unified theories) between the models as they are learned during youth. By resolving these contradictions, young people cause math or physics to progress substantially. Older people have learned the subject already, and so tend to assume what they "already know," and don't see inconsistencies. Cognitive decline may play a role too, but there are certainly mathematicians and physicists who have made substantial contributions at an old age.

The extent to which theories can be parametrized and formalized helps make a distinction between the humanities and the sciences. It also gives a slightly different perspective on the question of why mathematics is unreasonably effective for describing the physical universe. The answer is that if humans try to describe the physical universe using non-parametric, or experiential, models, then there will be no hope of precision and falsifiability. Non-experiential models must be used; however, there's no reason that mathematics (a maximally parametric model, with few assumptions and infinite precision) should be favored for physical theories over, say, the intentional reasoning and hierarchies of special cases in biology.

In conclusion, instead of consciousness being profound, mystical, and hard to grasp, it is clear that people are very orderly about their decision processes. They simply apply hierarchical mental models. Almost always, a human's models have been copied from his fellows. Because of intellectual limitations, large groups of people share the same models and even the same tree-structured hierarchy of models. Novel models are very rare: they happen about as frequently as intellectual revolutions. This is because most of the effort is in building a new model, and once this happens there are social benefits for communicating the model.

Consciousness clearly consists of parameter fitting for widely disseminated models.

---

Postscript: Viewing consciousness in terms of parametric and non-parametric models also helps explain why people feel like they're terribly unique and have very unique experiences, while not having any startlingly unique thoughts or intellectual insights. This is because thoughts are communicated via parametric models. People are unique, but individuation can't occur in a parametric setting, because the models aren't flexible enough to encode individual experience. Experiences can be encoded in non-parametric models. Communicating this "gut" learning either involves telling stories (hence human affinity for stories) or transforming the non-parametric knowledge into a parametric model, which takes an immense amount of intellectual effort, so it's rather rare.

For example, most people simply experience their lives, but occasionally one encounters a person like Freud, who encoded a large number of human experiences into a parametric model. Freud's models are rather bizarre, yet much of psychology relates back to Freud, because he was a root node in the model-space of psychology.

This theory also makes predictions on the possible total number of parametric models. Because people are cognitive misers, any theory that has been widely communicated, by whatever means, will tend to be "baked in" to the tree of knowledge, while other theories will be ignored, or described as perturbations of the commonly accepted theory. Thus while many people before the time of Freud may have communicated theories similar to Freud's (e.g. the inventors of Gestalt psychology, the Würzburg School, or philosophers such as Nietzsche), Freud communicated his theories more effectively, so his theories are remembered, and other theories are considered perturbations on Freud's model. Another example is the Cooley-Tukey FFT, discovered by Gauss in 1805 and re-discovered by Cooley and Tukey in 1965. The FFT remains attributed to Cooley and Tukey, and is considered a milestone in the development of computer science, despite its invention by Gauss one and a half centuries earlier, when computers didn't even exist! (The FFT was also discovered by Lanczos, prior to the re-discovery by Cooley and Tukey; Lanczos also discovered van Stockum dust before van Stockum!)

A global limit on the number of parametric models can be found based on limits on the depth of the tree, and the speed at which any given model can be learned. For example, it's difficult to have a working knowledge of quantum field theory before the age of 20, because simpler theories have to be learned first: arithmetic, algebra, calculus, classical physics, relativity, and quantum mechanics. If it takes 20 years to learn 7 paradigms, then naive extrapolation and a lifespan of 80 years gives a maximum tree depth of 28. After exceeding this maximum tree depth, it won't be possible for humans to do any further learning in the depthwise direction, and all future theories will have to create new branches in the horizontal direction. This is primarily a limitation due to lifespan and learning rate, not total cognitive capacity.
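
The extrapolation above is simple enough to write out. The sketch below just restates the arithmetic from the paragraph; the function name `max_tree_depth` is made up for this example, and the constant learning rate is the same naive assumption the paragraph makes.

```python
# The seven paradigms named above, from root to leaf.
PARADIGMS = [
    "arithmetic",
    "algebra",
    "calculus",
    "classical physics",
    "relativity",
    "quantum mechanics",
    "quantum field theory",
]


def max_tree_depth(paradigms_learned, years_taken, lifespan_years):
    """Naive linear extrapolation: same learning rate for a whole lifespan."""
    # Integer arithmetic avoids floating-point rounding in the division.
    return paradigms_learned * lifespan_years // years_taken


depth = max_tree_depth(len(PARADIGMS), years_taken=20, lifespan_years=80)
# depth == 28, i.e. 7 paradigms per 20 years, sustained for 80 years
```

Changing either the lifespan or the learning rate moves the bound proportionally, which is why the limit is one of lifespan and learning rate rather than of total cognitive capacity.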