Thoughts on data science, statistics and machine learning.
The PlotCaptions Dataset: Automating the Narration of Visual Analytics
I’ve always been interested in how we narrate visual analytics. The hardest task in dataviz is not analysis or visualization, but figuring out what to say about it. I used to believe that a well-designed chart does not need a narration. That may be valid, but over the years I’ve realized that it is the narrative that turns something that is merely pretty and insightful into something that is viral.
Notes on Optimizing Torch Models
ML researchers are from Mars, and the ML engineers responsible for deploying models are from Venus. The two have vastly different motivations. The ML researcher’s job, given a dataset and some compute, is to find the lowest possible loss on a task. In this pursuit, no engineering cost is too high and no tech debt is too large. Worse still, if they get published, they must release their code along with their paper.
The Bridge of Asses: Learning Coding with Novices
Over the last few years, I have been deeply involved with the IIT-M Programme in Data Science & Applications - as a student, a mentor and an analytics consultant. The programme provides diplomas and bachelor’s degrees in data science and applications. I’m often asked why I’m so invested in the programme - especially since I’m already an experienced data scientist. At least three people are mad at me for being in the programme.
Book Review: Invisible Women by Caroline Criado Perez
I learnt long ago that throwing data at people doesn’t change their opinions. After reading this book, I’m inclined to think that I might have been wrong. I wish I could carry around several hardbound editions of this book and throw them at anyone who says or does anything sexist. A well-aimed hardback to the bridge of the nose could work magic. And it would count as throwing data, too. Of this book’s 400 pages, 70 are endnotes.
Bayesian Storytelling
I launched a newsletter yesterday. So far, the feedback has been good. A few readers said that they felt drawn in by the writing. In any case, the purpose of the first few posts is simply to get myself warmed up. Any extra flutter the posts generate is a bonus. Amit Varma recommends not looking at the stats for a couple of years. It taught me quite a few things, particularly that I need to be smart about the data analysis.
More Pixels
Someone must have thought, likely with good reason, that more pixels on larger screens is a good idea. Then, someone else must have thought that if it’s a good idea somewhere, it must be a good idea everywhere. It’s probably for the same reason that I can’t find a new car with tactile switches anymore. Everything’s got touch buttons. In my car, nothing other than the steering wheel has any physical feedback.
Beer on the Mekong - Growth, Discovery and Creativity in 2025
It’s easy to dismiss New Year’s Day as just another revolution around the sun - an arbitrary checkpoint which carries no inherent meaning. But then all checkpoints and milestones are arbitrary. Who’s to say that a 17-year-and-11-month-old teenager is significantly less mature than an 18-year-old adult? We have milestones because they’re convenient. So in the spirit of benchmarking convenience, perhaps it is not such a bad idea to stick to resolutions, goals and plans.
Do Feed the Trolls
I’m trying to build a habit of writing about things that trigger me. For one, I’m sensitive, so there’s an infinite supply of things to write about. Moreover, writing helps clarify what you’re really triggered by. People who have a regular journaling habit say that it’s revelatory and therapeutic. This is about the recent Kamra-Aggarwal spat. It’s not about who’s in the wrong - we all know the answer to that.
Misconceptions about OCR Bounding Boxes
Over the last year, I have been working on an application that auto-translates documents while maintaining the layout and formatting. It has many bells and whistles, from simple geometric tricks to sophisticated gen-AI algorithms and microservices. But basically, the app performs the simple task of identifying text in documents, machine-translating it, and reinserting it such that the output document “looks” like the input. Most documents that my app has to process are PDFs.
Essays of Revolt - Jack London
My company uses an e-HRM system. The system is why my colleagues and I never forget to wish each other on birthdays and anniversaries. Systems like these save us from the embarrassment of appearing indifferent. Other systems, like smartphones, ensure that the most halfhearted birthday greeting appears sincere and colourful. All you have to do is type “Happy” and the autocomplete does the rest - it composes the shortest message needed to show how much you care.