Thoughts on data science, statistics and machine learning.

Effective Train/Test Stratification for Object Detection

TL;DR: Here’s a talk based on this post:


There’s an unavoidable, inherent difficulty in fine-tuning deep neural networks, which stems from the lack of training data. It would seem ridiculous to a layperson that a pretrained vision model (containing millions of parameters, trained on millions of images) could learn to solve highly specific problems. Therefore, when fine-tuned models do perform well, they seem all the more miraculous. But on the other hand, we also know that it is easier to move from the general to the specific than the reverse.

Read more...

A Process for Readable Code

I took a course on data structures and algorithms over the last few months. It is offered as part of IIT Madras’ Online Degree Program in Data Science and Programming, and is taught by Prof Madhavan Mukund. The program is a MOOC in the true sense, with tens of thousands of students enrolling each year. The DSA course itself is offered every trimester, and averages ~700 enrollments each time. It is easy to see how communication becomes critical in running a MOOC at this scale. There is, of course, the operational and logistical communication that goes into the smooth running of the course. But communicating the content of the course is more relevant (the course is also highly interactive: in addition to weekly office hours, there are Discourse forums where students, TAs and faculty are active).

Read more...

Book Review: A World Without Email - Cal Newport

This book is a good refresher on Cal Newport’s central thesis, which shows up in both Deep Work and Digital Minimalism, but with email as the central device. The same essential theorem, but a lot of new stories to go with it as corollaries. Of course, it’s not email technology that the book contests, but the hyperactive hive-mind that is enabled by people’s email habits.

But here’s the only thing I want to leave a note of: I was mildly annoyed by Newport’s invocation (or perhaps, misappropriation) of Claude Shannon’s information theory. He gives four “principles” for a world without email, the third of which he calls The Protocol Principle, which is as follows:

Read more...

Book Review: Travels with Charley - John Steinbeck

It’s the end of 2020, and when you’ve been stuck at home for a year, with only your dog as your constant companion, Travels with Charley is a good book to read.

But this book is a lot more about Steinbeck’s road trip than about the dog.

Steinbeck romanticises everything. If so much as a tree sheds a leaf in front of him, he bursts forth with pages of ideas, thoughts and memories. Scholars have mentioned that Travels with Charley is clearly not non-fiction. And Steinbeck himself doesn’t pretend that it is non-fiction. They say he knew he was dying, and was hit with an irresistible wanderlust. With almost everything he encounters - places, people and politics alike - he stresses that these were memories that were uniquely his. And he admits that any of his opinions could be cancelled out by a single counterpoint - and of those, as many could be found as there are travellers. He never took any notes. He mulled over his memories of the road trip well before he wrote the book.

Read more...
