Thoughts on data science, statistics and machine learning.
Resting on Your Laurels
He was short and chubby, only 13 or 14 years old. He was preparing tandoori rotis. He’d tear a handful of dough from a large white mass in front of him, and start rolling and pressing it between his palms. After a few seconds of rhythmic kneading, he’d slap the ball of dough on a pan. Then he’d press it flat with his knuckles, roll it up, knead it some more and slap it down again. After doing this a few times, he would put the roti on a cloth pad and stick it to the inside of the tandoor.
Remembering Peru
Knowing as I arrived that this would be the last time I would see her, I had reminded myself that a dying Carmen Callil was still more Carmen Callil than she was dying.
—Julian Barnes, Departure(s)
It is tempting to interpret this deceptively simple sentence as a description of a good death. If dying does not erode your identity, it is perhaps a good death. Barnes’ last (i.e. his latest, and also literally his last) novel constantly circles the themes of identity, memory and death. He keeps saying that memory is identity. And, in Carmen Callil’s case, he also seems to claim a small victory over death on her behalf, as long as people remember her for who she really was.
The PlotCaptions Dataset: Automating the Narration of Visual Analytics
I’ve always been interested in how we narrate visual analytics. The hardest task in dataviz is not analysis or visualization, but figuring out what to say about it. I used to believe that a well-designed chart does not need narration. That may be valid, but over the years I’ve realized that it is the narrative that turns something that is merely pretty and insightful into something that is viral. Imagine one of Hans Rosling’s talks without him in the picture.
Notes on Optimizing Torch Models
ML researchers are from Mars and the ML engineers responsible for deploying models are from Venus. The two have vastly different motivations. The ML researcher’s job, given a dataset and some compute, is to find the lowest possible loss on a task. In this pursuit, no engineering cost is too high. No tech debt is too large. Worse still, if they get published, they must include their code with their paper. Reproducible research only means that everyone should be able to reproduce the benchmarks and charts from the original paper. It usually has very little to do with production.