🤗🖼️ HuggingPics
Fine-tune Vision Transformers for anything using images found on the web.
Check out the video below for a walkthrough of this project! 
Usage
Click on the link below to try it out:
How does it work?
1. You define your search terms
2. We download ~150 images for each search term and use them to fine-tune a Vision Transformer (ViT)
3. You push your model to the Hugging Face Hub to share your results with the world
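
As a rough illustration of what happens between steps 1 and 2, the sketch below turns a list of search terms into the class-label mappings an image classifier needs, and splits the downloaded examples into training and validation sets. The helper names `build_label_maps` and `split_indices`, the alphabetical class ordering, and the 15% validation fraction are assumptions made for this example, not details taken from the project.

```python
import random


def build_label_maps(search_terms):
    """Map each search term to an integer class id and back (hypothetical helper)."""
    # Deduplicate and sort so the class ordering is stable across runs.
    classes = sorted({term.strip() for term in search_terms if term.strip()})
    label2id = {name: idx for idx, name in enumerate(classes)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def split_indices(num_examples, val_fraction=0.15, seed=0):
    """Shuffle example indices and split them into train and validation lists.

    The 15% validation fraction is an assumption for illustration.
    """
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_examples * val_fraction)
    return indices[num_val:], indices[:num_val]


# Step 1: define the search terms (taken from the rare-puppers example).
search_terms = ["samoyed", "shiba inu", "corgi"]
label2id, id2label = build_label_maps(search_terms)

# Step 2 (data side only): roughly 150 images per term, split for training.
train_idx, val_idx = split_indices(num_examples=150 * len(search_terms))

# With the transformers library installed, mappings like these could be passed
# to ViTForImageClassification.from_pretrained(..., num_labels=len(label2id),
# label2id=label2id, id2label=id2label) before fine-tuning.
```

Keeping the label mappings alongside the model means the shared repo can report predictions by search term rather than by class index.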
Your auto-generated model repo will look something like this. Pretty cool, eh? 😎
Examples
| | nateraw/rare-puppers | nateraw/pasta-pizza-ravioli | nateraw/baseball-stadium-foods | nateraw/denver-nyc-paris |
|---|---|---|---|---|
| term_1 | samoyed | pizza | cotton candy | denver |
| term_2 | shiba inu | pasta | hamburger | new york city |
| term_3 | corgi | ravioli | hot dog | paris |
| term_4 | | | nachos | |
| term_5 | | | popcorn | |
You can see a full list of model repos created using this tool by clicking here





 
 
 
 

 
