Thoughts on data science, statistics and machine learning.

Open World Games and the Myth of Sisyphus

To the memory of Kevin Conroy. There was only ever one true Batman.


You have been playing for months. Slowly and steadily, you have harvested every collectable - making yourself stronger and stronger until you can kill the toughest enemies. Every enemy defeated, every monster slain. No side quest worth doing remains. Those not worth doing are also done because you are a completionist (which is a dignified way of saying that you have no life). No part of the open world remains unexplored - you have climbed the tallest mountain and dived the deepest sea. All of this, you have done on the highest difficulty level. All the DLC is exhausted, too. The strongest creature in the world is you.


Effective Train/Test Stratification for Object Detection

TL;DR: Here’s a talk based on this post:


There’s an unavoidable, inherent difficulty in fine-tuning deep neural networks, which stems from the scarcity of task-specific training data. It would seem ridiculous to a layperson that a pretrained vision model (containing millions of parameters, trained on millions of images) could learn to solve a highly specific problem from comparatively little data. So when fine-tuned models do perform well, they seem all the more miraculous. On the other hand, we also know that it is easier to move from the general to the specific than the reverse: specializing a generalist is easier than generalizing a specialist.


A Process for Readable Code

I took a course on data structures and algorithms over the last few months. The course, taught by Prof Madhavan Mukund, is offered as part of IIT Madras’ Online Degree Program in Data Science and Programming. The program is a MOOC in the true sense, with tens of thousands of students enrolling each year. The DSA course itself is offered every trimester and averages about 700 enrollments per offering. It is easy to see how communication becomes critical in running a MOOC at this scale. There is, of course, the operational and logistical communication that keeps the course running smoothly. But communicating the content of the course is what matters more here. The course is also highly interactive: in addition to weekly office hours, there are Discourse forums where students, TAs and faculty are active.


Book Review: A World Without Email - Cal Newport

This book is a good refresher on Cal Newport’s central thesis, which shows up in both Deep Work and Digital Minimalism, this time with email as the central device. The same essential theorem, but a lot of new stories to go with it as corollaries. Of course, it’s not email technology that the book contests, but the hyperactive hive mind that is enabled by people’s email habits.

But here’s the one thing I want to note: I was mildly annoyed by Newport’s invocation (or perhaps misappropriation) of Claude Shannon’s information theory. He gives four “principles” for a world without email, the third of which he calls the Protocol Principle. It reads as follows:

