OpenAI's CLIP model using a ViT-L/14 image encoder, trained contrastively on 400 million image-text pairs from the internet. It aligns image and text in a shared embedding space, enabling zero-shot image classification by comparing image embeddings against text label embeddings. The ViT-L/14 variant offers higher accuracy than the smaller ViT-B/32 at greater compute cost.
25,187,308 ↓ · 2,000 ♡
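The zero-shot mechanism described above — comparing an image embedding against one text embedding per candidate label in the shared space — can be sketched in a few lines. The vectors below are random stand-ins for real encoder outputs, and the label prompts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # CLIP embeddings are L2-normalized before comparison
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_emb = normalize(rng.normal(size=(len(labels), 512)))  # stand-in for text encoder output
image_emb = normalize(rng.normal(size=(512,)))             # stand-in for image encoder output

# Cosine similarity of the image against each label prompt, scaled by
# CLIP's learned temperature (logit_scale ends up near 100 in practice),
# then softmaxed into class probabilities.
logit_scale = 100.0
logits = logit_scale * (text_emb @ image_emb)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

prediction = labels[int(np.argmax(probs))]
```

In a real pipeline the two `normalize(...)` calls are replaced by the model's text and image encoders; everything after that point is exactly this similarity-plus-softmax step.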
OpenAI's CLIP model using a ViT-B/32 image encoder, the smaller of the two widely deployed CLIP variants. Trained contrastively on 400 million image-text pairs, it aligns image and text representations in a shared embedding space for zero-shot classification and retrieval. The B/32 variant sacrifices accuracy versus ViT-L/14 for faster inference.
21,261,234 ↓ · 931 ♡
OpenAI CLIP ViT-L/14 at 336×336px input resolution, a higher-resolution variant of the standard ViT-L/14 CLIP model. The patch size is unchanged at 14, so the larger input yields more image tokens and preserves fine-grained visual detail, improving performance on classification tasks that depend on it. Otherwise shares the same contrastive training on 400M image-text pairs as the base ViT-L/14.
14,075,831 ↓ · 304 ♡
CLIP-ViT-B-32-laion2B-s34B-b79K is LAION's OpenCLIP reproduction of CLIP ViT-B/32, trained on the LAION-2B English subset for 34 billion samples seen at a global batch size of roughly 79K (as the name encodes). It serves as an open-data drop-in alternative to OpenAI's ViT-B/32 for zero-shot classification and retrieval.
3,115,049 ↓ · 139 ♡
fashion-clip is a CLIP model fine-tuned on fashion product images and their descriptions, specializing the shared embedding space for apparel and e-commerce catalog data. It supports zero-shot classification and text-to-image retrieval over fashion items.
2,707,371 ↓ · 279 ♡
siglip-so400m-patch14-384 is Google's SigLIP model with a shape-optimized 400M-parameter vision transformer at 384×384 input. SigLIP replaces CLIP's softmax contrastive objective with a pairwise sigmoid loss, which removes the need for batch-wide normalization and scales well to large batch sizes.
2,153,425 ↓ · 673 ♡
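The difference between CLIP's softmax contrastive loss and SigLIP's pairwise sigmoid loss is easy to show on a toy batch. The embeddings below are random stand-ins, and the sketch omits SigLIP's learned temperature and bias term.

```python
import numpy as np

rng = np.random.default_rng(1)
B, D = 4, 32  # batch size, embedding dim (toy values)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = normalize(rng.normal(size=(B, D)))
txt = normalize(rng.normal(size=(B, D)))
logits = img @ txt.T  # (B, B) similarity matrix; diagonal = matched pairs

# CLIP: softmax cross-entropy over each row ("which text matches this image?")
# — requires normalizing over the whole batch.
row_max = logits.max(axis=1, keepdims=True)
log_softmax = logits - row_max - np.log(
    np.exp(logits - row_max).sum(axis=1, keepdims=True))
clip_loss = -np.mean(np.diag(log_softmax))

# SigLIP: an independent binary decision per (image, text) pair,
# +1 on the diagonal (match), -1 off-diagonal (non-match). The loss
# decomposes over pairs, with no batch-wide normalization.
targets = 2.0 * np.eye(B) - 1.0
siglip_loss = np.mean(np.log1p(np.exp(-targets * logits)))
```

Because each pair contributes independently, the sigmoid loss can be computed in chunks, which is part of why SigLIP trains stably at very large batch sizes.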
siglip-base-patch16-224 is Google's base SigLIP model: a ViT-B/16 image encoder at 224×224 input trained with the pairwise sigmoid loss on the WebLI dataset. A lighter-weight option than the shape-optimized so400m variant for zero-shot classification and retrieval.
2,009,831 ↓ · 83 ♡
CLIP-ViT-B-16-laion2B-s34B-b88K is LAION's OpenCLIP ViT-B/16, trained on LAION-2B for 34 billion samples seen at a batch size of roughly 88K. It trades some inference speed relative to the B/32 variant for higher zero-shot accuracy.
2,003,198 ↓ · 38 ♡
clip-vit-base-patch16 is OpenAI's CLIP with a ViT-B/16 image encoder, trained contrastively on the same 400 million image-text pairs as the other OpenAI variants. Its 16×16 patches give finer spatial granularity than ViT-B/32 at higher compute cost.
1,660,532 ↓ · 160 ♡
marqo-fashionSigLIP is a SigLIP model fine-tuned by Marqo for fashion search, adapting the embedding space to apparel product images and descriptions for zero-shot classification and retrieval in e-commerce settings.
915,516 ↓ · 74 ♡
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 is Microsoft's BiomedCLIP: a CLIP-style model pairing a ViT-B/16 image encoder with a PubMedBERT text encoder, trained contrastively on figure-caption pairs from biomedical literature. It targets zero-shot classification and retrieval on medical and scientific images.
873,113 ↓ · 402 ♡
siglip2-so400m-patch14-384 is the SigLIP 2 successor to the shape-optimized so400m model at 384×384 input, extending the original sigmoid-loss pretraining with additional objectives (including captioning-based and self-supervised losses) for stronger zero-shot and dense-feature performance.
608,901 ↓ · 80 ♡
CLIP-ViT-L-14-laion2B-s32B-b82K is LAION's OpenCLIP ViT-L/14, trained on LAION-2B for 32 billion samples seen, an open-data counterpart to OpenAI's ViT-L/14.
591,315 ↓ · 63 ♡
siglip2-base-patch16-224 is the base SigLIP 2 model at 224×224 input, a drop-in upgrade of siglip-base-patch16-224 trained with the revised SigLIP 2 recipe.
568,902 ↓ · 96 ♡
CLIP-convnext_base_w-laion2B-s13B-b82K-augreg is an OpenCLIP model using a widened ConvNeXt-Base image encoder in place of a ViT, trained on LAION-2B with additional augmentation and regularization (the "augreg" suffix).
507,265 ↓ · 8 ♡
PE-Core-L14-336 is the large Perception Encoder model from Meta, a CLIP-style contrastively trained vision-language model at 336×336 input intended as a general-purpose image encoder with strong zero-shot classification and retrieval performance.
495,943 ↓ · 52 ♡
siglip2-base-patch16-naflex is a base SigLIP 2 model using the NaFlex scheme, which supports native aspect ratios and variable sequence lengths rather than a fixed square input resolution.
458,924 ↓ · 27 ♡
CLIP-ViT-H-14-laion2B-s32B-b79K is LAION's OpenCLIP ViT-H/14, one of the largest open CLIP models, trained on LAION-2B for 32 billion samples seen. It offers higher zero-shot accuracy than the L/14 variants at substantially higher compute cost.
453,740 ↓ · 454 ♡
siglip2-so400m-patch16-naflex is the shape-optimized 400M SigLIP 2 model with NaFlex input handling, supporting native aspect ratios and variable resolution instead of a fixed square input.
363,111 ↓ · 67 ♡
vit_base_patch16_plus_clip_240.laion400m_e31 is a timm-hosted CLIP image tower: a widened ViT-B/16+ at 240×240 input, trained on LAION-400M (checkpoint from epoch 31, as the name encodes).
329,695 ↓ · 1 ♡
TinyCLIP-ViT-8M-16-Text-3M-YFCC15M is Microsoft's TinyCLIP, a compact CLIP obtained by distillation from larger CLIP models: an 8M-parameter ViT image encoder paired with a 3M-parameter text encoder, trained on YFCC-15M and aimed at low-latency deployment.
318,508 ↓ · 12 ♡
siglip2-base-patch16-512 is the base SigLIP 2 model at 512×512 input, trading extra compute for higher-resolution detail relative to the 224px variant.
292,194 ↓ · 39 ♡