Neural networks (NN) and computer vision models in particular are known to perform well in specific tasks, but often fail to generalize to tasks they have not been trained on. A model that performs well on a food data may perform poorly on satellite images.
A new model from OpenAI named CLIP claims to close this gap by a large margin. The paper Open AI wrote presenting CLIP demonstrates how the model may be used on a various classification datasets in a zero-shot manner.
In this article, I will explain the key ideas of the model they proposed and show you the code to use it.