What I learned this week: 18/2015

This was a very busy week: co-organising a workshop, travelling, and then an extended weekend.

Statistics, Probability, Machine Learning, Data Science

Xgboost

I played with Xgboost, a parallelized gradient boosting machine implementation. I managed to install it on both a Windows and a Linux machine, and it really is fast. I haven’t yet tested it directly against the standard GBM implementation, so I can’t say whether the advantage is purely speed (the claim is up to 20x) or whether you also get extra predictive power per computational cycle.
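For context, GBM and xgboost share one core idea. You fit a sequence of small trees, each to the negative gradient of the loss (for squared error, simply the current residuals), and add them up with a small learning rate. The sketch below is a toy, single-feature, pure-Python illustration of that idea. It is not the API of xgboost or gbm, and `fit_stump` and `gradient_boost` are hypothetical names. On top of this idea, xgboost also uses second-order gradient information, a regularized objective and parallelized split finding.

```python
def fit_stump(xs, residuals):
    """Least-squares depth-1 regression tree ("stump") on a single feature."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all xs identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Toy gradient boosting for squared loss; returns a prediction function."""
    base = sum(ys) / len(ys)  # best constant under squared loss is the mean
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared loss the negative gradient is just the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Made-up example: a step function from 0 to 5 at x = 10.
xs = [float(x) for x in range(20)]
ys = [0.0 if x < 10 else 5.0 for x in xs]
model = gradient_boost(xs, ys)
print(round(model(0.0), 3), round(model(19.0), 3))
```

On this step function every round shrinks the residuals by a factor of 0.9 (1 minus the learning rate), so the fit creeps towards 0 and 5 without overshooting. A smaller learning rate needs more rounds but makes each round's mistakes cheaper.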

Kaggle

After following the fora for quite some time, I decided to actually try it (made just 2 submissions in 1 competition). I don’t expect to use it competitively, and it is definitely a somewhat stylized/artificial approach to data analysis/machine learning, but I think it is an interesting endeavor.

It is a martial arts kata exercise compared to the dirty, non-linear street fighting of day-to-day data science. Where, as you know, street fighting is 50% knowing when to avoid a fight, 10% actual fighting and 40% administration and sitting in meetings. Wait, somehow my metaphor broke.

Kaggle Higgs boson search post-mortem

Mostly read up on Motl’s point of view (1, 2, and the xgboost solution 3) and the Kaggle forum. It turned out to be a bit less interesting and enlightening than I thought, but it fit the xgboost theme of the week.

General Science / Misc.

ER = EPR? 

Quanta has a series on recent developments here: 1, 2, 3. I’m a total dilettante, but this just feels so very right. God knows I’ve been excited in the past about many results that then just went away. But this time it is different (I always say that…). The firewall problem was just a precursor; this is the real deal.

The quantum entanglement wormhole octopus is my new favourite animal.

So you want to be a consultant…?

Excellent article. It focuses on freelance IT, but it is also interesting for other areas. And even if you don’t want to be a freelance consultant. Or a consultant.

The days are long but the decades are short

Sam Altman turns 30. Here is the wisdom of the elder. (The article is alright; it just strikes me as funny to get life advice from a 30-year-old.)

Prescriptions, Paradoxes, and Perversities

An alarming analysis by Scott Alexander of the state of the pharmaceutical industry.

Management Myth

An entertaining piece on MBA education and management.

Or: we hope this article will compensate you with a smug feeling of superiority, because although you have a hard-science doctorate, we’ll pay far more for a fresh MBA graduate manager :).

Videos / Lectures

The Knowledge – Lewis Dartnell | Authors at Google

  • I have his book on my “to read” pile. The talk makes me want to move it up the queue.

Podcasts

Phil Rosenzweig on Leadership, Decisions, and Behavioral Economics

  • Strongly recommended – lots of new ways to look at familiar experimental results and their (non-)implications in practice. Highlight of the week.

Triple H on Pre-Fight Rituals, Injury Avoidance, and Floyd Mayweather, Jr.

  • After enjoying the Schwarzenegger episode and even (gasp!) the Glenn Beck one, I’m hardly surprised by a wrestler coming across as a very reasonable, driven and articulate man. Ferriss is an excellent interviewer, and his podcast is a very good time filler when I’m too tired for other stuff.

What I learned this week: 17/2015

Statistics, Probability, Machine Learning, Data Science

1. Correlation coefficients beyond Pearson/Spearman/Kendall

  • I keep switching from Pearson to Spearman during exploratory data analysis (it is more robust to outliers and captures any monotonic relationship, not only linear ones). This week I decided to look around for further options and, who knew, there is indeed work being done in the area. In particular, the Maximal Information Coefficient seems very promising (although it is computationally intensive and does have some problematic properties). Still, I’m just looking for something to help me quickly orient myself in data sets with many predictors, and this looks up to the task for non-linearities. Will definitely try it next time. Good overview here (pdf).
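Here is a concrete look at the Pearson/Spearman difference. Spearman is just Pearson computed on ranks, so any strictly monotonic relationship, linear or not, scores 1. Below is a minimal pure-Python sketch. In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` do this for you. The helper names are my own, and MIC is not covered here.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = list(range(1, 11))
cubes = [x ** 3 for x in xs]  # monotonic, but far from linear
print(round(pearson(xs, cubes), 3), round(spearman(xs, cubes), 3))
```

On x = 1..10 against x³, Pearson comes out at roughly 0.93, while Spearman is 1 (up to floating point) because the ranks match perfectly.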

2. Generalized additive models

  • Played a little with GAMs this week. In the one use case I tried, performance didn’t improve over my other models. Plus, they are too complicated to implement on a SQL server, so for the moment I’ll keep them on the back burner. Good practical review here (pdf).

3. Thinking about prediction intervals and metrics

  • Lots of things here. I’ll need to digest it a bit more.

General Science / Misc.

1. CRISPR/Cas9-edited human embryos

The big news of this week. When CRISPR/Cas9 first came out, some theorized that the Chinese would go forward with applications to the human germ line. Turns out they did, but it’s less alarming than it might sound. The results call for caution, but are still very promising. The West cannot win anything by carpet-banning research in this area.

2. Shift in the String wars

3. Where Are The Big Ideas in Neuroscience?

4. The Wolf of Wall Tweet

  • Algorithmic trading based on news items is not new, but apparently somebody made a killing in recent weeks on nearly expired stock options, trading within one second of the newswire publication. Read all about it in a badly written article with an annoying “I have a friend…” structure (and the friend is annoying too), which, despite its title, actually has nothing to do with Twitter.

Papers

Smil, V. 2015. The visionary energy solution: Triple windows. IEEE Spectrum, March 2015, p. 28.

  • Yeah, short ditty, but I like Smil.

Generalized Additive Models

Comparing Measures of Dependency

  • I accidentally stumbled on Michael Clark’s page twice this week, on two different topics. Very nice, practical reviews.

Videos / Lectures

Model Thinking @ Coursera

  • Gruellingly long, but finally over. Pros: many of the models are definitely worth knowing. Cons: too long; large chunks of the videos are spent working out simple arithmetic. If you can, go at your own pace and skip a lot of the content. For me, being formally signed up helps me finish courses, so I had to grin and bear it.

Robin Hanson – Attitudes to the Future

  • Not much new, but I always enjoy Robin Hanson. The part on the hype cycle around artificial general intelligence was particularly interesting. As somebody who was interested in AGI before it was cool, I sympathize (and this is a half joke, since the field predates me by a few decades 🙂).

Podcasts

1. What is Transhumanism – Review the Future
2. Duggan on Strategic Intuition – Econtalk
3. Sunstein on Infotopia – Econtalk
4. Moynihan: What I’ve learned losing a million dollars – Tim Ferriss Podcast

Books: Non-Fiction

Books: Fiction

What I learned this week: 16/2015

Statistics, Probability, Machine Learning, Data Science

1. Multiple comparisons

Finally had a look at multiple comparison corrections beyond Bonferroni. Haven’t yet got around to reading Gelman’s Why We (Usually) Don’t Have to Worry About Multiple Comparisons (pdf).
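Below is a minimal sketch of the two most common steps beyond Bonferroni, written in pure Python for illustration. Holm's step-down procedure controls the same family-wise error rate as Bonferroni but is never less powerful. Benjamini-Hochberg controls the false discovery rate instead. The ready-made version is statsmodels' `multipletests` (methods `'bonferroni'`, `'holm'`, `'fdr_bh'`). The function names here are my own, and the p-values are made up.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i if p_i <= alpha / m (controls family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p (0-based) to alpha / (m - k)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # stop at the first hypothesis that cannot be rejected
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up: find the largest k with p_(k) <= k/m * alpha, reject 1..k."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvalues[i] <= k / m * alpha:
            k_max = k
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


p_values = [0.001, 0.011, 0.02, 0.03, 0.2]  # made-up example
for method in (bonferroni, holm, benjamini_hochberg):
    print(method.__name__, method(p_values))
```

On these made-up p-values, Bonferroni rejects one hypothesis, Holm two and Benjamini-Hochberg four. That is the trade-off between guarding against any false positive at all and tolerating a controlled fraction of them.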

2. Robust regression, Quantile Regression

Do you know Warren Buffett’s adage “You get what you incentivize for”? Well, in machine learning:

You get what you optimize for.

After weeks of arguing for MAE instead of RMSE for model evaluation in a project, I finally had to eat my own dog food: not only evaluating a model on MAE, but actually optimizing for it. This opened up a new world for me: robust regression (Huber loss functions) and quantile regression.

There is a ton to learn here, looking forward to it.
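The “you get what you optimize for” point already shows up with the simplest possible model, a single constant prediction. Minimizing squared error (RMSE) gives the mean. Minimizing absolute error (MAE) gives the median, which makes MAE optimization the τ = 0.5 case of quantile regression. The Huber loss sits in between, and the pinball (quantile) loss with other τ gives other quantiles. The sketch below uses those textbook loss definitions and brute-forces the best constant on a made-up sample with one outlier. The helper names and the δ and τ values are illustrative choices, not from any particular library.

```python
def squared(r):
    return r * r


def absolute(r):
    return abs(r)


def huber(r, delta=1.0):
    """Quadratic near zero, linear beyond |r| > delta (robust to outliers)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def pinball(r, tau):
    """Quantile loss for residual r = y - prediction: under-predictions
    cost tau per unit, over-predictions cost (1 - tau) per unit."""
    return tau * r if r >= 0 else (tau - 1) * r


def best_constant(ys, loss, lo=-10.0, hi=110.0, step=0.01):
    """Grid search for the constant c minimizing sum(loss(y - c))."""
    n_steps = int(round((hi - lo) / step))
    candidates = (lo + k * step for k in range(n_steps + 1))
    return min(candidates, key=lambda c: sum(loss(y - c) for y in ys))


ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one outlier
fits = {
    "squared (RMSE)": best_constant(ys, squared),
    "absolute (MAE)": best_constant(ys, absolute),
    "huber, delta=5": best_constant(ys, lambda r: huber(r, delta=5.0)),
    "pinball, tau=0.75": best_constant(ys, lambda r: pinball(r, tau=0.75)),
}
for name, c in fits.items():
    print(f"{name:>18}: {c:.2f}")
```

The outlier drags the squared-loss answer all the way to the mean (22), while the absolute loss stays at the median (3). Huber with δ = 5 lands at 3.75, and pinball with τ = 0.75 picks 4.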

3. Sorting through my thoughts on Knightian uncertainty

Here. Still some work left to do on extensions of expectation maximization.

General Stuff

Kerfuffle around Growth mindset

The Shortest-Known Paper Published in a Serious Math Journal

Anti-market opportunities

  • Good formulation – something to live by, if I ever return to academia.

Holacracy

  • Oh, the sweet, sweet naivety. I’d enjoy working like this, but one should read up on suboptimisation.

The button

Productivity, Life Advice

Summary of Ikigai by Sebastian Marshall 1

Summary of Ikigai by Sebastian Marshall 2

  • Not sure if I’ll read the book, but the summaries have good formulations of some well known thoughts.

What You’ll Wish You’d Known by Paul Graham 

  • Indeed.

How to Stick With Good Habits Every Day by Using the “Paper Clip Strategy”

  • Habit-building strategies are a dime a dozen, but I hadn’t heard of this one before. It sounds cute, and I can absolutely imagine it working for ugh-mine-fields.

Papers

Evaluating Trading Strategies

  • Pretty good overview of multiple comparison corrections.

Videos / Lectures

Randy Pausch Last Lecture: Achieving Your Childhood Dreams

  • No, I just got a mote in my eye.

Conversations with Tyler: Peter Thiel on the Future of Innovation

  • Cowen did a good job with the questions and steering the conversation. Highly recommended.

Books: Non-Fiction

The Undercover Economist Strikes Back: How to Run–or Ruin–an Economy

Books: Fiction

Schild’s Ladder: A Novel