There’s an unavoidable, inherent difficulty in fine-tuning deep neural networks for specific tasks. Primarily, it stems from the lack of training data. To a layperson it would seem ridiculous that a pretrained vision model (containing millions of parameters, trained on millions of images) could learn to solve highly specific problems. Therefore, when fine-tuned models do perform well, they seem all the more miraculous. But on the other hand, we also know that it is easier to move from the general or the abstract to the specific, than the reverse.
This post is a follow up to my talk, Practical Image Classification & Object Detection at PyData Delhi 2018. You can watch the talk here: and see the slides here. I spoke at length about the different kinds of problems in computer vision and how they are interpreted in deep learning architectures. I spent a fair bit of time on instance and semantic segmentation (for an introduction to these problems, watch Justin Johnson’s lecture from the Stanford CS231 course here).