von Neumann’s Nightmare

Via the always excellent Stephen Hsu – apparently the concept of Technological Singularity can be traced back to von Neumann’s nightmare:
One night in early 1945, just back from Los Alamos, von Neumann woke in a state of alarm in the middle of the night and told his wife Klari:

“… we are creating … a monster whose influence is going to change history … this is only the beginning! The energy source which is now being made available will make scientists the most hated and most wanted citizens in any country.

The world could be conquered, but this nation of puritans will not grab its chance; we will be able to go into space way beyond the moon if only people could keep pace with what they create …”

He then predicted the future indispensable role of automation, becoming so agitated that he had to be put to sleep with a strong drink and sleeping pills. Source: Von Neumann, Morgenstern, and the Creation of Game Theory: From Chess to Social Science, 1900-1960.

In his obituary for John von Neumann, Ulam recalled a conversation with von Neumann:

[about the] “ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.”

It is only fitting that the (co-)father of both game theory and computation also “discovered” their common endpoint.

As you may recall, the concept of the intelligence explosion is attributed to I. J. Good, and the term technological singularity to Vernor Vinge.


Against Beauty II

This blog just recently celebrated its first anniversary!

In my first post, Against Beauty, I argued that beauty is likely not a good criterion for scientific theories.

It is telling that now, a year later, I came across a quote from one of my heroes, Ludwig Boltzmann:

If you are out to describe the truth, leave elegance to the tailor.

— Ludwig Boltzmann

Incidentally, Lisa Randall talks along similar lines in a recent episode of the On Being podcast:

[…] you can frame things so that they seem more beautiful than they are, or less beautiful than they are. For science to be meaningful, you want to have as few ingredients as possible to make as many predictions as possible with which you can test your ideas. So I think that’s more the sense — I think that’s what people are thinking of. And simplicity, by the way, isn’t always beauty.

While I agree with the beauty part, I’m of a different opinion on the ultimate role of simplicity in evaluating scientific theories (with simplicity understood here as the Kolmogorov-Chaitin complexity of the given model).

As we learn more about the universe, we will necessarily have to abandon effective theories that are “human readable”. The world is too complex to be describable by human-mind-sized models.

For every complex problem there is an answer that is clear, simple, and wrong.

— H. L. Mencken

(Beauty and simplicity are often conflated – I’m also not always clear on which one I mean.)

My current thinking is that if we take the beauty of a theory to be the ratio of the model’s explanatory (predictive) power to its Kolmogorov length, then it will remain a relevant model selection criterion.

But I think that, ultimately, we will have to say goodbye to the notion that the Kolmogorov complexity of our models must not cause a stack overflow in human brains.
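The ratio criterion above can be made concrete in the spirit of minimum description length – a computable stand-in for the uncomputable Kolmogorov complexity. A minimal sketch, with all bit counts hypothetical:

```python
def description_length(model_bits, residual_errors, bits_per_error=8):
    """Two-part code: bits to state the model + bits to encode what it misses."""
    return model_bits + residual_errors * bits_per_error

# A short, "beautiful" model that leaves a lot unexplained...
simple = description_length(model_bits=50, residual_errors=40)
# ...versus a bulky model that explains almost everything.
complex_ = description_length(model_bits=300, residual_errors=2)

print(simple, complex_)  # → 370 316: the bulky model compresses better overall
```

Under this accounting, a model far too large to fit in a human head can still win the selection, which is exactly the point.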

We have stumbled into the era of machine psychology

Emergent science in an emergent world

When describing complex emergent systems, science has to switch from lower to higher level descriptions. Here is a typical example of such transitions:

  1. We go from physics to chemistry when we encounter complex arrangements of large numbers of atoms/molecules.
  2. Complex chemistry in living systems is then described in terms of biology.
  3. Complex (neuro-)biology in human brains finally gives rise to the field of psychology.

Of course, the world is not discrete and the transitions between the fields are fuzzy (think of the chemistry–biology shoreline of bio-macromolecules and cytology). And yes, the (mostly infertile) philosophical wars on the ontology of emergence are still being waged. Yet nobody would deny the epistemic usefulness of higher-level descriptions. Every transition to a higher-order description brings its own ‘language’ for describing the object, as well as a suite of research methods.

In this game, it is however very easy to miss the forest (high-level) for the trees (low-level). One interesting example I’ve noticed recently is in the field of machine learning. When studying deep neural networks (DNNs), we have already unknowingly stumbled into such a transition. Historically, most of the research has been done on the “biology” of the DNNs – the architecture of the networks, activation functions, training algorithms etc. (and yes, saying “biology” is bio-chauvinistic on my part. We should find a better word!)

Recently, however, we have been tapping more and more into the “psychology” of the neural networks.

Machine psychology

The deep architectures now in use aren’t reaching anywhere near the complexity of human brains, yet. However, with connections in the billions (here is an early example), they are too complex, too opaque, for a low-level description to be sufficient for their understanding. This has led to a steady influx of research strategies that shift the approach from the bottom-up understanding of “machine biology” to a more top-down, “input-output” strategy typical of psychology.

Of course, neural networks are commonly, though not quite deservedly, described as “black boxes”. And historically, parts of psychology had their flirtations with cybernetics. But it is only recently that we see a curious methodological convergence between these two fields, as machine learning is starting to adopt the methods of psychology.

The interesting distinction between machine and human psychology is that we have direct access to the “brain” states of the network (the inputs and activation of each neuron). With machine psychology, we are now shifting attention to their “mental” states, something that is accessible only with higher-order, indirect methods.

Psychology of machine perception

A first example of the convergence comes from the psychology of perception.

Deep neural networks have revolutionized the field of computer vision by crushing competing approaches in all benchmarks (see e.g. last year’s ImageNet competition). Yet a deeper intuition for how the DNNs are actually solving image classification requires techniques similar to those used in the psychology of perception.

As an example: recently, an “input-output” strategy yielded an attack on neural network image classification, developed by Szegedy et al. 2013. In this work, they took correctly classified images and modified them imperceptibly, so that the trained network became completely confused (see Fig 1a. below). While on the surface such confusion seems alarming, one should just remind oneself of the many quirks of the human visual cortex (Fig 1b.)


Fig 1a: Example from Szegedy et al. 2013: The image on the left is correctly classified by a neural net as a school bus. The imperceptibly modified image on the right is, however, classified as an ostrich. The middle panel shows the pixel difference of the two images, magnified 10x.


Fig 1b: Your visual cortex classifies the colors of fields A and B as distinct. They are the same.
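The mechanics behind such an attack can be sketched on a toy model. The snippet below uses the gradient-sign trick (from the later Goodfellow et al. follow-up work, a simplification of the optimization-based attack of Szegedy et al.) against a hypothetical one-layer “network”; the weights and inputs are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, x):
    """Toy one-layer 'network': a logistic classifier with fixed weights w."""
    return sigmoid(np.dot(w, x))

def adversarial_example(w, x, y, epsilon):
    """Shift x by epsilon along the sign of the loss gradient w.r.t. the input.

    For cross-entropy loss on a logistic unit, that gradient is (p - y) * w.
    """
    p = predict(w, x)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

w = np.array([2.0, -3.0, 1.0])   # hypothetical trained weights
x = np.array([0.5, -0.5, 0.2])   # input correctly classified as class 1
x_adv = adversarial_example(w, x, y=1, epsilon=0.5)

print(predict(w, x) > 0.5)       # True: the original is classified as class 1
print(predict(w, x_adv) > 0.5)   # False: the perturbed input flips the decision
```

In three dimensions the perturbation must be large to flip the decision; for real images the gradient accumulates over millions of pixels, so a per-pixel change far below perceptual threshold suffices – hence the imperceptibility in Fig 1a.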

Nguyen et al. 2014 then turned this game around, using genetic algorithms to purposely evolve abstract images that well-trained neural networks confound with real objects. Again, examples for a DNN and the human visual cortex below (Figs. 2a and 2b).

Fig 2a: Image evolved so that a neural network misclassifies it as a guitar.

Fig 2b: An image of a lamp misclassified by your dirty, dirty mind.

Gestalt psychology for hierarchical feature extraction?

These confounding attacks on classifiers are very important, since deep neural nets are increasingly being employed in the real world. A better understanding of machine perception is required to make the algorithms more robust and to prevent fraud (some examples here).

The reason image classification works so well with deep architectures is their ability to automatically extract hierarchies of features from images. Making them more robust to attacks requires better integration of these hierarchies into “global wholes”, well summarized by Kurt Koffka’s mantra of gestalt psychology: “The whole is other than the sum of the parts” (not “The whole is greater than the sum of its parts”).

Psychometrics of neural networks

The cross-fertilization of machine learning by psychology doesn’t end with perception theory.

The measurement of psychological traits is the bread and butter of psychometrics, and its crown jewel is of course intelligence testing. This is even more salient for the field of artificial intelligence. In an early example, Wang et al. 2015 just recently made headlines (e.g. here) by claiming to beat the average Amazon Mechanical Turk performance on a verbal IQ test.

Oddly enough, I haven’t yet found a reference using deep nets on Raven’s progressive matrices. This seems like a very obvious application for deep networks, as Raven’s matrices are small, high-contrast images and a successful solution requires the extraction of multi-level hierarchies of features. I expect that DNNs should very soon blow humans out of the water on this test.

Raven’s matrices are the go-to test for human intelligence, with a g-loading around 0.8 and virtually no cultural bias. Such an experiment would likely show the nets achieving IQ 200+ – a very vivid illustration of the relationship between proxies for g and the actual “general intelligence”, the holy grail of artificial general intelligence (AGI) research.

Here, then, is a nice summer project: put together a DNN for solving Raven’s matrices. I even recall a paper on the machine generation of test examples, so sufficient training data will not be a problem!

Deep nets and Raven’s progressive matrices are made for each other.
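To make the project concrete, here is the logical core of the task with the vision part stripped away (the cell encodings are made up for illustration): once features are extracted from the pixels – the part the DNN would do – solving a simple progressive-matrix rule reduces to inferring a constant increment and extrapolating:

```python
import numpy as np

# Each cell encoded as hypothetical features, e.g. (shape_count, rotation_steps).
# The rule of this toy item: a constant increment along each row.
matrix = [
    [(1, 0), (2, 1), (3, 2)],   # complete row
    [(2, 1), (3, 2), (4, 3)],   # complete row
    [(3, 2), (4, 3), None],     # row with the missing cell to predict
]

def solve(matrix):
    """Infer the per-step increment from the complete rows and extrapolate."""
    deltas = [np.subtract(b, a)
              for row in matrix[:2]
              for a, b in zip(row[:-1], row[1:])]
    increment = np.mean(deltas, axis=0)
    return np.array(matrix[2][-2]) + increment  # last known cell + one step

print(solve(matrix))  # → [5. 4.]
```

Real Raven’s items combine several such rules (progressions, figure addition, distribution of values), but each is a similar induce-and-extrapolate step over extracted features.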

Machine psychotherapy, creativity and aesthetics

On a joking note – if there is machine psychology, could there also be machine psychotherapy? How could a venerable Freudian help his DNN clients?

There are some very playful examples done with generative models (based on recurrent deep networks), e.g. text generation à la Shakespeare/Graham/Wikipedia. A machine therapist will definitely be able to use the good old tools of word association games and automatic writing to diagnose whatever will be the digital equivalent of the Oedipus complex in his machine patients.


Did you dream about electric sheep again, Mr. Android?

Even the good old cliché of dream interpretation can be brought out of retirement.
Geoffrey Hinton spoke about machine dreams a long time ago. And psychologists are already picking up on this:

One of the areas that I’ve been looking at recently is machine dreaming, the question whether AI systems are already dreaming. There’s little question that they meet our criteria for what a dream is, they meet all our definitional criteria. There’s better evidence really that machines, AI systems, are dreaming, than there is that animals are dreaming that are not human.

— Associate Professor of Psychology, James Pagel on the “All in the Mind” podcast.

The excellent paper by Google researchers, Inceptionism: Going Deeper into Neural Networks, shows beautiful demonstrations of DNN fantasies, dreams and pareidolia. The psychology of digital psychedelic experience is close too.

What deep neural nets actually dream about.

This section is of course tongue-in-cheek, but its aim is to illustrate that, already now, state-of-the-art DNNs can achieve very rich “mental” states.

Sidenote: speaking of machine therapy, the other way around, i.e. machines being therapists to humans, is a promising field of research. Indeed, they seem to have come a long way since the command-line therapist and `M-x doctor` (for the Emacs fans out there).

Machine ethology. Machine sociology. Machine etiquette. Machine politics.

Machines are already talking to each other a great deal: think of the internet, communication networks, or the budding Internet of Things. For now, the conversation is only between agents of low sophistication using simple, rigid protocols. We could perhaps already talk about machine ethology, maybe even a nascent sociology. TCP/IP is an example of simple machine etiquette.

But the real deal will come when the artificial agents get more sophisticated (e.g. DNNs) and their communication bandwidth increases.

The final step will be achieved when the agents start to create mental self-models, as well as models of the other agents they are communicating with. The gates of social psychology, sociology and politics will then be pried wide open for our machine comrades.

Future of hard science is soft science?

Will your AI team soon have to hire a machine psychologist? Maybe so.
It is fascinating that the hardest of hard fields – mathematics/statistics/AI research/software engineering – converges, in the area of AI, on methods from the soft sciences.

Soft sciences, mind you, not humanities.

The uncertainty around Knightian uncertainty

Definitions are due

Knightian uncertainty is the proposition that an agent can have completely unknowable and incalculable uncertainty about an event. This type of uncertainty goes far beyond the colloquial meaning of “uncertainty”, i.e. an event with subjective probability 0 < p < 1, by refusing to ascribe any probability distribution to a given proposition.

While the little devil of common sense sitting on your shoulder might wisely nod in approval, the bayesian angel on the other shoulder screams: “Impossible!”. A proper bayesian agent is infinitely opinionated and can serve you a probability distribution for any proposition. Anything short of that leads to an exploitable flaw in your decision theory.
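The angel’s “Impossible!” has teeth: the Dutch book argument. A minimal sketch (stakes and probabilities hypothetical) – if an agent’s betting prices for an event and its complement don’t sum to one, a bookie selling both bets to the agent locks in a riskless profit:

```python
def dutch_book_profit(p_event, p_complement, stake=1.0):
    """Bookie's guaranteed profit from selling both bets to the agent.

    The agent pays p * stake for each bet that returns `stake` if it wins.
    Exactly one of 'event' / 'not event' wins, so the bookie pays out `stake`
    once, no matter what happens.
    """
    collected = (p_event + p_complement) * stake
    paid_out = stake
    return collected - paid_out

# Incoherent agent: prices rain at 0.75 AND no-rain at 0.5.
print(dutch_book_profit(0.75, 0.5))    # → 0.25, a sure profit for the bookie
# Coherent agent: prices sum to one, no sure profit exists.
print(dutch_book_profit(0.75, 0.25))   # → 0.0
```

Refusing to state any probability at all avoids this particular trap only by refusing to bet – itself a decision with costs, as discussed below.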

So are there fundamentally unknowable events, or is this just sloppy thinking? Are non-bayesian decision theories leaving money on the table, or are bayesians setting themselves up for ruin via a black swan?

Knightian uncertainty in humans

Let’s start with something uncontroversial: humans, even at their best, are only very weak approximations of a bayesian reasoner, and therefore it might not surprise us that they could legitimately exhibit fundamental uncertainty. A good summary, as usual, can be found in Yudkowsky’s When (not) To Use Probabilities – humans are inherently bad at reasoning with probabilities and thus open to Dutch book exploits due to inconsistencies. While some see this as a failure, others say a prudent thinker can rightfully be stubborn and refuse to stick out his neck.

As a side note, we don’t have to require a bounded reasoner to literally have a distribution for every event. But shouldn’t they be able to compute one when pushed hard enough?

For humans, claiming Knightian uncertainty can be a crude but useful heuristic to avoid playing games where we might be easily exploited. Does the concept generalize beyond the quirks of human psychology?

The luxury of a posterior

The role of an optimizing agent’s decision theory is to help it maximize its utility function. The utility at any given time also depends on the environment, and therefore it might not be surprising that, under certain conditions, it can be beneficial to tailor the agent’s decision theory to the specifics of a given environment.

And some environments might be more hostile to cognition than others. Evolutionary game theory simulations often have bayesian reasoners getting beaten by simpler agents that dedicate resources to aggressive expansion instead of careful deliberation (I’m quite sure I have this from Artem Kaznacheev, but for the life of me can’t find the link). A similar situation also occurs in iterated prisoner’s dilemma tournaments.

While these simulations are somewhat artificial, we might approach such harsh-for-cognition situations in, e.g., high-frequency trading, where constructing careful posteriors might be a luxury and a less sophisticated but faster algorithm might win out. As an example, here is an (unsourced) quote from Noah Smith:

Actually, there are deep mathematical (information-theoretical) reasons to suspect that lots of HFT opportunities can only be exploited by those who are willing to remain forever ignorant about the reason those opportunities exist.

Interestingly, a sort of “race to the cognitive bottom” might play out in a multipolar artificial intelligence take-off. While a singleton artificial intelligence might near-optimally allocate part of its resources to improving its decision theory, in a multipolar scenario (fragile as it might be) the winning strategy can be slimming down the cognitive modules to the bare minimum necessary to beat the competition. A biological mirror image of such a scenario is the breakdown of the Spiegelman Monster discovered by Eigen and Oehlenschlager.

Apart from these concerns, another motivation for Knightian uncertainty in algorithmic trading can be the split between internal and actionable probabilities in some market-making algorithms, as a protection from adverse selection (more here).

In summary, not constructing a posterior for a proposition could be a reasonable strategy for a much wider class of reasoners than quirky humans, especially in resource- or computation-time-bounded scenarios. After all, there are no free lunches – not even for bayesians.

While this all sounds reasonable, it still leaves me unclear about a general framework for selecting decision theories when a bayesian approach is too expensive.

Substrate level Knightian uncertainty

There is still one more possible step – moving the uncertainty out of the cranium of agents into the wild world, into physical reality itself. Scott Aaronson’s fascinating paper The Ghost in the Quantum Turing Machine is built on the thesis of “Knightian freedom”: an in-principle physical unpredictability, inherent to the quantum nature of physics, that goes beyond probabilistic unpredictability. As a poor bounded cogitor, I’ll proclaim here my own Knightian uncertainty and refuse to fabricate opinions on this thesis [1].

[1] Ok, I found the paper very interesting, but I don’t agree with most of it. Nonetheless, I don’t feel anywhere near knowledgeable enough to go into a much deeper critique.

MOOCs in an hourglass economy

Last time we discussed Alexandre Borovik‘s analysis of the crisis of mathematical education and its socio-economic roots and impacts.

While I fully agree with Borovik’s analysis, I do miss one factor that can be important for the future of education – massive open online courses (MOOCs). They have several limitations in their current iterations, and it is almost comic to see the awkward monetization schemes that many providers are currently experimenting with. However, I think they do have disruptive potential for higher education, and it is only a question of time before we figure out how to do them better.

The promise of MOOCs

MOOCs can address several of Borovik’s requirements for a better education. They provide wide access to the best mentors, opening some elements of “deep education” to a larger audience of pupils. Some personalization of the content is possible, though they can’t be as deeply personal as, say, a Zunft system. In my opinion, however, they provide a very acceptable trade-off point on the availability–personalization axis.

While I was lucky enough to have several excellent teachers during my education, it was never tutoring on an individual level. I think the impact of individual mentorship might generally be overestimated, with an exception at the extreme high end of the achievement spectrum. There, a MOOC could at least give the mentor access to the highest performers, to spin off deeper, smaller-circle education.

Cognitive inequality and the hourglass economy

Borovik talks about an hourglass economy: in a technologically advanced society, there is no market demand for “middle”-level mathematical skills. The largest fraction of the population requires only rudimentary arithmetic for everyday life (using a calculator or a spreadsheet at best). On the high end, there is a very small group of high-skill workers required to develop and implement the newest technological advances.

MOOCs do not solve the disappearing-middle problem. In fact, they might be driving an even larger wedge between the high- and low-ability ends of the distribution. This is because they rely more on self-motivation and therefore strongly favor those who have not only a high aptitude, but also a high “appetite” for knowledge. This contributes strongly to the growing cognitive inequality, but by tapping into a wider population, it might be sufficient to fill the upper bulb of the hourglass economy.

The cognitive inequality gap will be a very important factor in the near future (it is already showing). It runs very deep into our cores – indeed into our genetic essence – and we do not have simple mechanisms to alleviate it, like taxation in the case of wealth inequality.

MOOCs, assuming they stay free and internet access continues to spread in developing countries, at least have the upside that they rely purely on self-selection. The burgeoning cognitive elite doesn’t receive its status from an “entitled” institution or similar, but is self-selected by virtue of putting time and effort into self-education. The system can also be more meritocratic than most of its alternatives.

Late end-game?

Lots of questions and not many solutions. Ultimately, however, it might simply be too late to worry. If the development of artificial general intelligence is at most a few decades in the future, human knowledge, however deep, will soon be completely left behind. The ultimate limit is clear and independent of the exact timing – as illustrated by Greenspan’s quote in Borovik’s paper:

While there is an upside limit to the average intellectual capabilities of population, there is no upper limit to the complexity of technology.

Biological boot loaders

A few months back, Elon Musk summarized Nick Bostrom’s book Superintelligence: Paths, Dangers, Strategies:

While I find this issue indeed existentially important, I was a bit disappointed by the book, because Bostrom turned out to be a biological boot loader himself – this time for Yudkowsky’s and Hanson’s memeplexes.

While these are by far not the worst memeplexes to be infected by, in this important problem space one would still like as many independent high-caliber search paths as possible.

Against beauty

I’ve come across a quote by the great Buckminster Fuller:

When I’m working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is beautiful, I know it is wrong.
— Richard Buckminster Fuller

I felt satisfied – I had always thought that requiring beauty of our solutions (in mathematics, physics, design or elsewhere) is very parochial. If your solution is beautiful, it only means the problem was too easy.

This is not to diminish the aesthetic pleasure of a beautiful proof, a particularly clever experimental setup or an elegant line of reasoning. Even after so many years, I still remember when I first heard Cantor’s diagonal argument [1], and I can re-live the sheer excitement of it.

Buckminsterfullerene, a particularly beautiful configuration of 60 carbon atoms.

On the other end of the beauty spectrum we can put the proof of the Four-color theorem. It was derived in the 1970s using a computer, and because of that it was (and still is) considered inelegant and problematic. Similarly, in physics, solutions derived from simulations are often deemed intellectually unsatisfactory.

However, beauty is just a heuristic criterion telling us that a description of the system has been found which not only has a high compression factor (the sign of a good theory), but whose compression is in fact high enough to make the model of the system conveniently mind-sized, i.e. one that fits into the very limited memory and processing capacity of a human brain.

This also means that a more capable cogitor (presumably an artificial intelligence) would have an aesthetic sense that extends far beyond the reach of human minds, and most of its creations would be deemed ugly by our standards. For such a system, the difference between Cantor’s diagonal argument and the reduction to 1936 submaps in the Four-color theorem might be only a tiny step down in the “beauty” department.

In the general problem space, problems that have solutions we deem beautiful occupy only a vanishingly small subvolume. What worries me is that for many of our most important problems (scientific, technological, societal, ecological) there might be no solutions that we would consider beautiful. If we’re looking for beauty, we might miss the correct (or at least satisficing) solutions.

One final lesson, too: to echo the original quote – if you think you have found a citation that beautifully demonstrates your idea, you know it is wrong.

Why? It turns out that I originally misread the quote, and it should in fact read:

When I’m working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.
— Richard Buckminster Fuller

Emphasis mine.

So while Fuller doesn’t call for seeking out beautiful solutions, he still does use beauty as a correctness criterion. Let’s just hope we are not dismissing satisfactory solutions by chasing the mirage of a (non-existent) beautiful solution.

[1] If you haven’t seen it yet, I almost envy you that you have the experience ahead of you. Do yourself a favor and check it out!