MultiSent-E5-Pro: Thai Sentiment Analysis
MultiSent-E5-Pro is a fine-tuned sentiment analysis model based on intfloat/multilingual-e5-large, optimized for Thai with support for multilingual contexts. The model classifies text into four categories: Positive, Negative, Neutral, and Question.

How to use ZombitX64/MultiSent-E5-Pro with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="ZombitX64/MultiSent-E5-Pro")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ZombitX64/MultiSent-E5-Pro")
model = AutoModelForSequenceClassification.from_pretrained("ZombitX64/MultiSent-E5-Pro")
```
Benchmark comparison:

| Rank | Model | Accuracy | F1-Macro | Notes |
|---|---|---|---|---|
| 🥇 1st | MultiSent-E5-Pro | 84.61% | 84.61% | Best overall |
| 2nd | MultiSent-E5 | 80.62% | 80.62% | Baseline model |
| 3rd | sentiment-103 | 57.40% | 49.87% | Moderate baseline |
Overall metrics:

| Metric | Score |
|---|---|
| Accuracy | 84.61% |
| F1-Macro | 84.61% |
| F1-Weighted | 84.75% |
| Avg Confidence | 98.53% |
| Low Confidence Rate (<60%) | 0.96% |
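The Avg Confidence and Low Confidence Rate rows can be reproduced from the model's softmax outputs. A minimal sketch on a toy batch (the `confidence_stats` helper is illustrative, not the evaluation script behind the numbers above):

```python
import torch

def confidence_stats(probs: torch.Tensor, threshold: float = 0.60):
    """Average top-class confidence and the share of predictions
    whose confidence falls below `threshold`."""
    top_conf = probs.max(dim=-1).values          # confidence of the predicted class
    avg_conf = top_conf.mean().item()
    low_rate = (top_conf < threshold).float().mean().item()
    return avg_conf, low_rate

# Toy batch of 4 softmax distributions over the 4 classes
probs = torch.tensor([
    [0.97, 0.01, 0.01, 0.01],
    [0.02, 0.95, 0.02, 0.01],
    [0.40, 0.30, 0.20, 0.10],   # a low-confidence prediction
    [0.01, 0.01, 0.01, 0.97],
])
avg_conf, low_rate = confidence_stats(probs)
print(f"Avg confidence: {avg_conf:.2%}, low-confidence rate: {low_rate:.2%}")
```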
Per-class performance:

| Class | Precision | Recall | F1 | Notes |
|---|---|---|---|---|
| Negative | 91.0% | 84.6% | 87.7% | Excellent |
| Positive | 83.0% | 94.3% | 88.3% | Excellent |
| Neutral | 71.9% | 81.6% | 76.4% | Moderate |
| Question | 94.4% | 79.0% | 86.0% | Good |
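The per-class numbers above follow from a standard confusion-matrix computation. A minimal pure-Python sketch on toy labels (the `per_class_f1` helper is illustrative, not the card's evaluation code):

```python
from collections import Counter

def per_class_f1(y_true, y_pred, classes):
    """Per-class (precision, recall, F1) plus macro-F1 from label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # true class t was missed
    report = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1)
    macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(classes)
    return report, macro_f1

classes = ["Question", "Negative", "Neutral", "Positive"]
y_true = ["Positive", "Negative", "Neutral", "Positive", "Question", "Negative"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Question", "Neutral"]
report, macro_f1 = per_class_f1(y_true, y_pred, classes)
```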
Full example with explicit probabilities:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ZombitX64/MultiSent-E5-Pro"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "ผลิตภัณฑ์นี้ดีมาก ใช้งานง่าย"  # "This product is great and easy to use"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted = torch.argmax(probs, dim=-1)

labels = ["Question", "Negative", "Neutral", "Positive"]
print(f"Sentiment: {labels[predicted.item()]} (Confidence: {probs[0][predicted].item():.2%})")
```
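Given the 0.96% low-confidence rate reported above, one common deployment pattern is to flag predictions below a confidence threshold for manual review. A minimal sketch on toy logits (the `route` helper and the 60% cutoff are assumptions, not part of this model card):

```python
import torch

LABELS = ["Question", "Negative", "Neutral", "Positive"]

def route(logits: torch.Tensor, threshold: float = 0.60):
    """Map a batch of logits to (label, confidence, needs_review) triples."""
    probs = torch.nn.functional.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return [
        (LABELS[i], c, c < threshold)
        for i, c in zip(idx.tolist(), conf.tolist())
    ]

# Toy logits for two inputs: one confident, one ambiguous
logits = torch.tensor([
    [0.1, 0.2, 0.3, 5.0],    # clearly Positive
    [1.0, 1.1, 0.9, 1.0],    # nearly uniform -> send to review
])
decisions = route(logits)
```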
| Application | Suitability |
|---|---|
| Product Reviews | 🟢 Excellent |
| Social Media | 🟢 Excellent |
| Customer Support | 🟢 Excellent |
| Content Moderation | 🟡 Good |
| Research Analysis | 🟡 Good |
Training configuration:

| Config | Value |
|---|---|
| Base Model | multilingual-e5-large |
| Params | ~1.02B |
| Classes | 4 |
| Max Length | 512 |
| Training Time | ~27 min |
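With Max Length at 512 tokens, longer inputs must be truncated (as in the examples above) or chunked. A minimal sliding-window sketch over pre-tokenized ids (the `chunk_ids` helper and stride value are illustrative assumptions, not a documented preprocessing step):

```python
def chunk_ids(ids, max_len=512, stride=128):
    """Split a token-id sequence into overlapping windows of at most max_len."""
    if len(ids) <= max_len:
        return [ids]
    chunks = []
    step = max_len - stride   # overlap consecutive windows by `stride` tokens
    for start in range(0, len(ids), step):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
    return chunks

# Toy sequence of 1000 "token ids"
chunks = chunk_ids(list(range(1000)), max_len=512, stride=128)
```

Each chunk can then be classified separately and the per-chunk probabilities averaged.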
Citation:

```bibtex
@misc{MultiSent-E5-Pro-2024,
  title={MultiSent-E5-Pro: Advanced Thai Sentiment Analysis},
  author={ZombitX64 and Janutsaha, Krittanut and Saengwichain, Chanyut},
  year={2024},
  url={https://huggingface.co/ZombitX64/MultiSent-E5-Pro},
  note={Hugging Face Model Card}
}

@article{wang2024multilingual,
  title={Multilingual E5 Text Embeddings: A Technical Report},
  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu},
  journal={arXiv preprint arXiv:2402.05672},
  year={2024}
}
```
Team:

| Role | Name |
|---|---|
| Lead Dev | ZombitX64 |
| Data Scientist | Krittanut Janutsaha |
| Engineer | Chanyut Saengwichain |