paraphrase-multilingual-mpnet-base-v2

Multilingual sentence-embedding model from the sentence-transformers library that produces 768-dimensional vectors for 50+ languages. Despite the name, the encoder is an XLM-RoBERTa network (hence the xlm-roberta tag), trained by multilingual knowledge distillation to reproduce the embeddings of the English paraphrase-mpnet-base-v2 model. It generally yields higher-quality multilingual embeddings than the lighter MiniLM variant, so it is a reasonable choice when the 384-dimensional paraphrase-multilingual-MiniLM-L12-v2 is not accurate enough.

Use cases

  • Multilingual semantic search where 768-dimensional embeddings justify the extra storage
  • Cross-lingual similarity scoring across 50+ language pairs
  • Multilingual clustering where embedding quality matters more than size
  • Cross-lingual paraphrase detection in translation quality workflows
  • Embedding step of a multilingual RAG pipeline where BGE-M3 would be heavier than the workload needs

Pros

  • Base-size XLM-RoBERTa encoder, distilled from an MPNet teacher, produces higher-quality embeddings than the MiniLM variant at the same multilingual coverage
  • 768-dim outputs over 50+ languages in a single model
  • Apache 2.0 license; sentence-transformers library compatible
  • Better accuracy than paraphrase-multilingual-MiniLM-L12-v2 on STS benchmarks

Cons

  • 768-dim doubles storage cost vs. 384-dim MiniLM multilingual models
  • Slower inference than MiniLM variants at equivalent hardware
  • 50+ language coverage, not 100+ like BGE-M3 or multilingual-e5
  • No instruction-prefix support; asymmetric retrieval (short query against long passage) may underperform
  • Accuracy on English remains higher than on low-resource languages despite multilingual training
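
The storage trade-off in the first point above can be estimated with simple arithmetic. The sketch below assumes embeddings are stored as uncompressed float32 values (4 bytes per dimension) and ignores any index overhead; the corpus size of one million documents is a hypothetical figure chosen for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def index_size_bytes(num_vectors: int, dim: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of a dense embedding matrix, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value


NUM_DOCS = 1_000_000  # hypothetical corpus size

mpnet_bytes = index_size_bytes(NUM_DOCS, 768)   # this model
minilm_bytes = index_size_bytes(NUM_DOCS, 384)  # MiniLM multilingual variant

print(f"768-dim: {mpnet_bytes / 1e9:.2f} GB")          # 3.07 GB
print(f"384-dim: {minilm_bytes / 1e9:.2f} GB")         # 1.54 GB
print(f"ratio:   {mpnet_bytes / minilm_bytes:.1f}x")   # 2.0x
```

Quantized or compressed storage would shrink both figures, but the 2x ratio between the two models stays the same as long as both use the same encoding.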

FAQ

What is paraphrase-multilingual-mpnet-base-v2 used for?

It is used for multilingual semantic search, cross-lingual similarity scoring across 50+ languages, multilingual clustering where embedding quality matters more than model size, cross-lingual paraphrase detection in translation-quality workflows, and as the embedding step of multilingual RAG pipelines where BGE-M3 would be heavier than needed.

Is paraphrase-multilingual-mpnet-base-v2 free to use?

Yes. paraphrase-multilingual-mpnet-base-v2 is an open-source model published on Hugging Face under the Apache 2.0 license. Check the model card for the current license text before redistributing it.

How do I run paraphrase-multilingual-mpnet-base-v2 locally?

The model loads directly with the sentence-transformers library. Its tags also list PyTorch, TensorFlow, ONNX, and OpenVINO weights, plus support for text-embeddings-inference. See the model card for framework-specific instructions and hardware requirements.
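
As an illustration of local use, here is a minimal sketch of cross-lingual similarity scoring with the sentence-transformers library. It assumes the package is installed (`pip install sentence-transformers`) and that the model is available under the Hub ID shown. The helper names `load_model`, `pairwise_similarity`, and `demo`, and the example sentences, are hypothetical and not part of the library.

```python
from typing import List, Sequence

MODEL_NAME = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def load_model(name: str = MODEL_NAME):
    # Imported lazily so the helper below can be used without the library installed.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)


def pairwise_similarity(model, sources: Sequence[str],
                        targets: Sequence[str]) -> List[float]:
    """Cosine similarity between sources[i] and targets[i]."""
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    # normalize_embeddings=True yields unit-length vectors,
    # so a plain dot product equals cosine similarity.
    src = model.encode(list(sources), normalize_embeddings=True)
    tgt = model.encode(list(targets), normalize_embeddings=True)
    return [float(sum(a * b for a, b in zip(s, t))) for s, t in zip(src, tgt)]


def demo() -> None:
    model = load_model()  # downloads the weights on first call
    english = ["The cat sits on the mat.", "I would like a coffee."]
    german = ["Die Katze sitzt auf der Matte.", "Wie spät ist es?"]
    for en, de, score in zip(english, german,
                             pairwise_similarity(model, english, german)):
        print(f"{score:.3f}  {en!r} vs {de!r}")
```

Calling `demo()` should score the first pair (a translation) noticeably higher than the second (unrelated sentences); exact values depend on the library and model version, so none are shown here.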

Tags

sentence-transformers, pytorch, tf, onnx, safetensors, openvino, xlm-roberta, feature-extraction, sentence-similarity, transformers, text-embeddings-inference, multilingual, ar, bg, ca, cs, da, de, el, en