About

I’m an extrophysicist and datanomer. An ex-astrophysicist and data scientist. The purpose of this blog is to quietly converse with myself, because a pathology becomes a charming eccentricity when done publicly. Right?

But really I just enjoy learning about the Universe. Here is a non-exhaustive list of things I’m interested in and might discuss in one form or another on the blog:

  • Artificial Intelligence – both narrow and general
  • Probability theory, statistics, Bayesian epistemology
  • Machine learning and data science
  • Physics & Cosmology
  • Rationality
  • Decision and game theory
  • Effective altruism
  • Economics, finance, econometrics, econophysics
  • Cognitive sciences, neuropsychology
  • Genetics
  • Philosophy and history of science – particularly physics and statistics
  • Linguistics
  • Distributed systems
  • Complex systems, chaos
  • Literature, poetry, music… and their relationship to science
  • Mindfulness and meditation
  • Meta-ethics – particularly of the consequentialist/utilitarian kind
  • Learning and education
  • and too many other things….

And I’m trying to learn about writing. By doing it.

Albrecht Dürer, Melancholia

Synapses are cheap, experiences are expensive

Here is a simple back-of-the-envelope calculation based on Hinton’s talk on deep learning:

You live about 10^2 years. A year is about π × 10^7 seconds, so your life is ~10^9 seconds. Let’s say you receive 10 to 100 “experiences” (impressions) per second (brain activity ranges from a few tens to a few hundreds of Hz).

Thus, your life is about 10^10 to 10^11 experiences. Blink an eye and it’s gone.

Your brain has 10^11 neurons, with average connectivity of almost 10^4. That is a total of 10^15 synapses.

You thus have about 10^4 to 10^5 synapses per experience. There is no way the brain could fit a proper model (in the statistical sense), since N_parameters >> N_data. Instead it has to rely strongly on regularization and sparsity.
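The arithmetic above can be redone in a few lines of Python. This is only a sketch: the function name and its parameters are mine, and the inputs are the rough figures quoted in this post, not measured values. Note that if you keep the factor of π instead of rounding a lifetime down to 10^9 seconds, the ratio comes out a bit lower, roughly 3 × 10^3 to 3 × 10^4 synapses per experience. The conclusion is unchanged: parameters outnumber data points by orders of magnitude.

```python
import math

# The post's approximation: one year is about pi * 10^7 seconds.
SECONDS_PER_YEAR = math.pi * 1e7


def synapses_per_experience(
    lifespan_years=100.0,          # "about 10^2 years"
    experiences_per_second=(10.0, 100.0),  # assumed range of "impressions" per second
    neurons=1e11,                  # rough neuron count quoted in the post
    synapses_per_neuron=1e4,       # rough average connectivity quoted in the post
):
    """Return (min_ratio, max_ratio) of synapses per lifetime experience."""
    life_seconds = lifespan_years * SECONDS_PER_YEAR
    total_synapses = neurons * synapses_per_neuron

    slow_rate, fast_rate = experiences_per_second
    fewest_experiences = life_seconds * slow_rate
    most_experiences = life_seconds * fast_rate

    # More experiences means fewer synapses available per experience.
    return (total_synapses / most_experiences,
            total_synapses / fewest_experiences)


if __name__ == "__main__":
    low, high = synapses_per_experience()
    print(f"synapses per experience: {low:.1e} to {high:.1e}")
```

Changing any of the inputs by a factor of a few moves the result by the same factor, so the estimate is only good to an order of magnitude or so.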

Experiences are exceedingly expensive, a synapse is very cheap.

A synapse. Ugh, look at it… so cheap!

This puts all the tired blank slate arguments to rest – brains have to come somewhat pre-trained/regularized via genetics.

Also relevant – the argument is actually just a reformulation and generalization of the poverty of the stimulus argument by Noam Chomsky in the field of linguistics (a field now incidentally totally dominated by deep learning).

The Inconvenient Truth About Data Science

Posted by Kamil Bartocha, a few excellent points. I agree with every single one of them:

  1. Data is never clean.
  2. You will spend most of your time cleaning and preparing data.
  3. 95% of tasks do not require deep learning.
  4. In 90% of cases generalized linear regression will do the trick.
  5. Big Data is just a tool.
  6. You should embrace the Bayesian approach.
  7. No one cares how you did it.
  8. Academia and business are two different worlds.
  9. Presentation is key – be a master of PowerPoint.
  10. All models are false, but some are useful.
  11. There is no fully automated Data Science. You need to get your hands dirty.

What I learned this week: 18/2015

This was a very busy week, co-organising a workshop, travelling and then an extended weekend.

Statistics, Probability, Machine Learning, Data Science

Xgboost

I played with Xgboost, a parallelized gradient boosting machine implementation. I managed to install it on both a Windows and a Linux machine and it really is fast. I haven’t yet tested it directly against the standard GBM implementation, so I can’t say if the advantage is purely speed (the claim is up to 20x) or if you can also get extra predictive power per computational cycle.

Kaggle

After following the fora for quite some time I decided to actually try it (I made just 2 submissions to 1 competition). I don’t expect to use it competitively, and it is definitely a somewhat stylized/artificial approach to data analysis/machine learning, but I think it is an interesting endeavor.

It is a martial arts kata exercise compared to the dirty, non-linear street fighting of day-to-day data science. Where, as you know, street fighting is 50% knowing when to avoid a fight, 10% actual fighting and 40% administration and sitting in meetings. Wait, somehow my metaphor broke.

Kaggle Higgs boson search post-mortem

I mostly read up on Motl’s point of view (1, 2, and the xgboost solution 3) and the Kaggle forum. It turned out to be a bit less interesting and enlightening than I thought, but it fit with the xgboost theme of the week.

General Science / Misc.

ER = EPR? 

Quanta has a series on recent developments here: 1, 2, 3. I’m a total dilettante, but this just feels so very right. God knows, in the past I’ve been excited about many results that then just went away. But this time it is different (I always say that…). The firewall problem was just a precursor, this is the real deal.

The quantum entanglement wormhole octopus is my new favourite animal.

So you want to be a consultant…?

Excellent article. It focuses on freelance IT, but it is also interesting for other areas. And even if you don’t want to be a freelance consultant. Or a consultant.

The days are long but the decades are short

Sam Altman turns 30. Here is the wisdom from the elder. (The article is alright, it just strikes me as funny to get life advice from a 30-year-old).

Prescriptions, Paradoxes, and Perversities

An alarming analysis by Scott Alexander on the state of pharmaceuticals.

Management Myth

An entertaining piece on MBA education and management.

Or: we hope this article will compensate you with a smug feeling of superiority, because although you have the hard science doctorate, we’ll pay far more for the fresh MBA graduate manager :).

Videos / Lectures

The Knowledge – Lewis Dartnell | Authors at Google

  • I have his book on my “to read” pile. The talk makes me want to move it up in the queue.

Podcasts

Phil Rosenzweig on Leadership, Decisions, and Behavioral Economics

  • Strongly recommended – lots of new ways to look at familiar experimental results and their (non) implications in practice. Highlight of the week.

Triple H on Pre-Fight Rituals, Injury Avoidance, and Floyd Mayweather, Jr.

  • After enjoying the Schwarzenegger episode and even (gasp!) the Glenn Beck one, I’m hardly surprised by a wrestler coming across as a very reasonable, driven and articulate man. Ferriss is an excellent interviewer and his podcast is a very good time filler when I’m too tired for other stuff.