
Zero-shot image classification models

22 models · ranked by HuggingFace downloads

clip-vit-large-patch14

OpenAI's CLIP model using a ViT-L/14 image encoder, trained contrastively on 400 million image-text pairs from the internet. It aligns image and text in a shared embedding space, enabling zero-shot image classification by comparing image embeddings against text label embeddings. The ViT-L/14 variant offers higher accuracy than the smaller ViT-B/32 at greater compute cost.

25,187,308 ↓ · 2,000 ♡
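
The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's implementation: the three-dimensional vectors are made-up stand-ins for encoder outputs, `zero_shot_classify` is a hypothetical helper, and the logit scale of 100 is an assumed temperature. Both embeddings are L2-normalized so their dot product is a cosine similarity, and a softmax turns the scaled similarities into label probabilities.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    """Return softmax probabilities over labels from image-text cosine similarities.

    label_embs maps each label to the (hypothetical, precomputed) text embedding
    of a prompt such as "a photo of a {label}". logit_scale is an assumed value.
    """
    img = normalize(image_emb)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, normalize(emb)))
        for label, emb in label_embs.items()
    }
    # Numerically stable softmax over the label logits.
    peak = max(logits.values())
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


# Toy embeddings standing in for real image and text encoder outputs.
image = [0.9, 0.1, 0.0]
labels = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
probs = zero_shot_classify(image, labels)
print(max(probs, key=probs.get))  # cat
```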

clip-vit-base-patch32

OpenAI's CLIP model using a ViT-B/32 image encoder, the most compute-efficient of OpenAI's ViT-based CLIP checkpoints. Trained contrastively on the same 400 million image-text pairs, it aligns image and text representations in a shared embedding space for zero-shot classification and retrieval. The B/32 variant sacrifices accuracy relative to ViT-L/14 for faster inference.

21,261,234 ↓ · 931 ♡
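
Retrieval uses the same shared embedding space in the other direction: embed a text query once, then rank stored image embeddings by cosine similarity to it. The sketch below assumes the embeddings were computed beforehand; `retrieve`, the file names, and the vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_emb, image_embs, top_k=2):
    """Return the ids of the top_k images most similar to a text query embedding."""
    scored = [(img_id, cosine(query_emb, emb)) for img_id, emb in image_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [img_id for img_id, _ in scored[:top_k]]


# Hypothetical precomputed image embeddings; a real index would hold encoder outputs.
gallery = {
    "beach.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.8, 0.1, 0.1],
    "sunset.jpg": [0.2, 0.7, 0.6],
}
query = [0.0, 1.0, 0.3]  # stand-in for the text embedding of "a sunny beach"
print(retrieve(query, gallery))  # ['beach.jpg', 'sunset.jpg']
```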

clip-vit-large-patch14-336

OpenAI CLIP ViT-L/14 at 336×336px input resolution, a higher-resolution variant of the standard ViT-L/14 CLIP model. The patch size stays at 14 pixels, so the larger input yields a 24×24 patch grid instead of 16×16, which preserves finer visual detail and improves performance on classification tasks that depend on it. It is otherwise trained on the same 400M image-text pairs as the 224px ViT-L/14.

14,075,831 ↓ · 304 ♡
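
The effect of input resolution and patch size can be checked with simple arithmetic. A ViT splits a square image into non-overlapping patches, so the number of image tokens is (resolution / patch size)². The helper below is a hypothetical illustration that ignores the extra class token.

```python
def num_patch_tokens(resolution, patch_size):
    """Patch tokens a ViT produces for a square image (class token excluded)."""
    if resolution % patch_size:
        raise ValueError("resolution must be a multiple of patch_size")
    side = resolution // patch_size
    return side * side


for name, res, patch in [
    ("ViT-B/32 @ 224", 224, 32),
    ("ViT-L/14 @ 224", 224, 14),
    ("ViT-L/14 @ 336", 336, 14),
]:
    print(name, num_patch_tokens(res, patch))
```

This gives 49, 256, and 576 tokens respectively. ViT-L/14 at 336 pixels therefore processes 2.25 times as many image tokens as at 224, and the self-attention cost grows roughly quadratically with that count; ViT-B/32's 49 tokens explain its speed advantage.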

CLIP-ViT-B-32-laion2B-s34B-b79K

LAION's open reproduction of CLIP ViT-B/32, trained with OpenCLIP on LAION-2B, the English subset of LAION-5B. The name encodes the training run: 34 billion samples seen (s34B) at a global batch size of about 79K (b79K).

3,115,049 ↓ · 139 ♡
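
The suffixes in LAION's OpenCLIP checkpoint names encode the training run: the dataset, the number of samples seen in billions (`s`), the global batch size in thousands (`b`), and optional tags such as `augreg`. The parser below is a hypothetical helper; its regular expression is an assumption that only covers the LAION-2B names in this list and does not come from OpenCLIP itself. Because the names round these figures, the parsed values are approximate.

```python
import re

# Assumed pattern (not from OpenCLIP) for names such as
# "CLIP-ViT-B-32-laion2B-s34B-b79K": architecture, dataset,
# samples seen in billions, global batch size in thousands, optional tag.
NAME_RE = re.compile(
    r"^CLIP-(?P<arch>.+)-(?P<dataset>laion\d+[BM])"
    r"-s(?P<samples>\d+)B-b(?P<batch>\d+)K(?:-(?P<tag>.+))?$"
)


def parse_checkpoint_name(name):
    """Split a LAION OpenCLIP checkpoint name into its training settings."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "arch": m["arch"],
        "dataset": m["dataset"],
        # Both counts are rounded in the name, so these are approximate.
        "samples_seen": int(m["samples"]) * 10**9,
        "global_batch": int(m["batch"]) * 1_000,
        "tag": m["tag"],  # e.g. "augreg", or None when absent
    }


print(parse_checkpoint_name("CLIP-convnext_base_w-laion2B-s13B-b82K-augreg"))
```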

fashion-clip

A CLIP model fine-tuned on fashion product images and their text descriptions, adapting general-purpose CLIP embeddings to apparel retrieval and zero-shot product classification.

2,707,371 ↓ · 279 ♡

siglip-so400m-patch14-384

Google's SigLIP with the shape-optimized SoViT-400m image encoder, 14-pixel patches, and 384×384 input resolution. SigLIP replaces CLIP's softmax-based contrastive loss with a pairwise sigmoid loss that scores each image-text pair independently.

2,153,425 ↓ · 673 ♡
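
SigLIP's pairwise sigmoid loss treats every image-text combination in a batch as an independent binary decision: matching pairs on the diagonal are positives, all other combinations are negatives, and a learnable temperature t and bias b shift the logits. A minimal pure-Python sketch of that loss follows; `siglip_loss` is a hypothetical name, the loops favor readability over speed, and the defaults t = 10, b = -10 are assumed illustrative values.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def siglip_loss(img_embs, txt_embs, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over a batch of L2-normalized embeddings.

    loss = -(1/n) * sum_i sum_j log sigmoid(z_ij * (t * x_i . y_j + b)),
    where z_ij = +1 for the matching pair (i == j) and -1 otherwise.
    """
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0
            logit = t * sum(a * c for a, c in zip(img_embs[i], txt_embs[j])) + b
            total += log_sigmoid(z * logit)
    return -total / n


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = siglip_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = siglip_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Because each pair gets its own sigmoid rather than a softmax across the batch, SigLIP-style label scores at inference need not sum to 1.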

siglip-base-patch16-224

The base-size SigLIP model, using 16-pixel patches at 224×224 input resolution. It is trained with the same pairwise sigmoid loss as the larger SoViT-400m variant at a lower compute cost.

2,009,831 ↓ · 83 ♡

CLIP-ViT-B-16-laion2B-s34B-b88K

LAION's OpenCLIP ViT-B/16, trained on LAION-2B for 34 billion samples seen at a global batch size of about 88K. Its 16-pixel patches yield four times as many image tokens as ViT-B/32 at the same resolution.

2,003,198 ↓ · 38 ♡

clip-vit-base-patch16

OpenAI's CLIP model using a ViT-B/16 image encoder, trained on the same 400 million image-text pairs as the other OpenAI checkpoints. Its 16-pixel patches make it more accurate than ViT-B/32 and cheaper to run than ViT-L/14.

1,660,532 ↓ · 160 ♡

marqo-fashionSigLIP

Marqo's SigLIP-based model fine-tuned for fashion, intended for product search and zero-shot classification of apparel images.

915,516 ↓ · 74 ♡

BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

Microsoft's biomedical CLIP model, pairing a PubMedBERT text encoder (256-token context) with a ViT-B/16 image encoder at 224×224 input resolution. It is trained on about 15 million figure-caption pairs from PubMed Central (PMC-15M).

873,113 ↓ · 402 ♡

siglip2-so400m-patch14-384

SigLIP 2 with the SoViT-400m image encoder, 14-pixel patches, and 384×384 input resolution. SigLIP 2 extends the SigLIP sigmoid-loss recipe with additional training objectives and multilingual training data.

608,901 ↓ · 80 ♡

CLIP-ViT-L-14-laion2B-s32B-b82K

LAION's OpenCLIP ViT-L/14, trained on LAION-2B for 32 billion samples seen at a global batch size of about 82K.

591,315 ↓ · 63 ♡

siglip2-base-patch16-224

The base-size SigLIP 2 model, using 16-pixel patches at 224×224 input resolution.

568,902 ↓ · 96 ♡

CLIP-convnext_base_w-laion2B-s13B-b82K-augreg

An OpenCLIP model with a wide ConvNeXt-Base image encoder in place of a ViT, trained on LAION-2B for 13 billion samples seen at a global batch size of about 82K. The augreg suffix marks a run with stronger data augmentation and regularization.

507,265 ↓ · 8 ♡

PE-Core-L14-336

Meta's Perception Encoder (PE) Core model, a CLIP-style contrastive vision-language encoder using a ViT-L/14 image tower at 336×336 input resolution.

495,943 ↓ · 52 ♡

siglip2-base-patch16-naflex

A base-size SigLIP 2 variant with 16-pixel patches that uses NaFlex, processing images at variable resolution while preserving their native aspect ratio instead of resizing them to a fixed square.

458,924 ↓ · 27 ♡
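
NaFlex keeps an image's native aspect ratio and varies the number of patches rather than resizing every input to a fixed square. The function below is a simplified, hypothetical sketch of that idea, not the resizing routine used by the SigLIP 2 processor: it rescales the image so its patch grid fits an assumed budget of 256 patches, then rounds each side down to whole patches.

```python
import math


def naflex_grid(height, width, patch_size=16, max_patches=256):
    """Choose a (rows, cols) patch grid that roughly keeps the image's
    aspect ratio and uses at most max_patches patches.

    Simplified sketch: rescale so the patch area matches the budget, round
    each side down to whole patches, and keep at least one patch per side.
    """
    scale = math.sqrt(max_patches * patch_size**2 / (height * width))
    rows = min(max(1, math.floor(height * scale / patch_size)), max_patches)
    # Clamp cols so extreme aspect ratios cannot overshoot the budget.
    cols = min(max(1, math.floor(width * scale / patch_size)), max_patches // rows)
    return rows, cols


print(naflex_grid(512, 1024))  # (11, 22): a 2:1 grid of 242 patches
```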

CLIP-ViT-H-14-laion2B-s32B-b79K

LAION's OpenCLIP ViT-H/14, the largest of the LAION-2B OpenCLIP checkpoints listed here, trained on LAION-2B for 32 billion samples seen at a global batch size of about 79K.

453,740 ↓ · 454 ♡

siglip2-so400m-patch16-naflex

The SoViT-400m SigLIP 2 model with 16-pixel patches and NaFlex input handling, which processes images at variable resolution and native aspect ratio.

363,111 ↓ · 67 ♡

vit_base_patch16_plus_clip_240.laion400m_e31

An OpenCLIP ViT-B/16+ image-text model at 240×240 input resolution, packaged under timm naming and trained on LAION-400M; the e31 suffix denotes the checkpoint from epoch 31.

329,695 ↓ · 1 ♡

TinyCLIP-ViT-8M-16-Text-3M-YFCC15M

Microsoft's TinyCLIP, a compact CLIP distilled from a larger CLIP model, with an 8M-parameter ViT image encoder (16-pixel patches) and a 3M-parameter text encoder, trained on YFCC-15M.

318,508 ↓ · 12 ♡

siglip2-base-patch16-512

The base-size SigLIP 2 model with 16-pixel patches at 512×512 input resolution, giving a 32×32 patch grid for finer detail than the 224×224 variant.

292,194 ↓ · 39 ♡