GLM-OCR is a multilingual OCR and document understanding model from ZhipuAI, built on the GLM architecture and supporting text recognition across Chinese, English, French, Spanish, Russian, German, Japanese, and Korean. It treats OCR as a sequence generation task, enabling structured text extraction from document images and screenshots. MIT licensed.
8,366,555 ↓ · 1,700 ♡
blip-image-captioning-base is Salesforce's open-source BLIP image-captioning model (base size, ViT-B backbone) available on Hugging Face, fine-tuned for captioning on COCO. Details are sourced from the public model registry.
2,260,611 ↓ · 851 ♡
blip-image-captioning-large is the large variant (ViT-L backbone) of Salesforce's open-source BLIP image-captioning model, available on Hugging Face.
804,981 ↓ · 1,473 ♡
trocr-base-printed is Microsoft's open-source TrOCR model (base size) available on Hugging Face, fine-tuned on the SROIE receipt dataset for recognizing printed text in single-line images.
619,561 ↓ · 206 ♡
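PP-OCRv5_server_det is an open-source PaddleOCR text-detection model (PP-OCRv5, server variant) available on Hugging Face. Although the registry lists it under image-to-text, it locates text regions in an image rather than transcribing them.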
602,114 ↓ · 61 ♡
blip2-opt-2.7b-coco is an open-source BLIP-2 model available on Hugging Face that pairs an image encoder with the OPT-2.7b language model and is fine-tuned for COCO image captioning.
585,916 ↓ · 11 ♡
pix2text-mfr is an open-source mathematical formula recognition (MFR) model from the Pix2Text project, available on Hugging Face, that converts images of formulas into LaTeX.
451,632 ↓ · 54 ♡
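UVDoc is an open-source document-image unwarping model available on Hugging Face. Although the registry lists it under image-to-text, it corrects geometric distortion such as curled or folded pages, typically as a preprocessing step before OCR.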
417,047 ↓ · 8 ♡
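PP-LCNet_x1_0_doc_ori is an open-source PaddleOCR document orientation classification model (PP-LCNet x1.0 backbone) available on Hugging Face. Rather than producing text, it predicts whether a page image is rotated by 0°, 90°, 180°, or 270°.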
366,785 ↓ · 12 ♡
en_PP-OCRv5_mobile_rec is an open-source English text-recognition model from PaddleOCR (PP-OCRv5, mobile variant) available on Hugging Face.
342,559 ↓ · 2 ♡
nougat-base is Meta's open-source Nougat model available on Hugging Face, which converts images of academic document pages into markup (Markdown with LaTeX math).
308,646 ↓ · 189 ♡