Usage
Misraj/Baseer__Nakba can be used with libraries, inference providers, and local apps. The snippets below cover Transformers, vLLM, SGLang, and Docker Model Runner.

Transformers
How to use Misraj/Baseer__Nakba with Transformers, via the high-level pipeline helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Misraj/Baseer__Nakba")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

Or by loading the processor and model directly:

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Misraj/Baseer__Nakba")
model = AutoModelForImageTextToText.from_pretrained("Misraj/Baseer__Nakba")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
vLLM
How to use Misraj/Baseer__Nakba with vLLM:

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Misraj/Baseer__Nakba"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Misraj/Baseer__Nakba",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Or use Docker:

```shell
docker model run hf.co/Misraj/Baseer__Nakba
```
SGLang
How to use Misraj/Baseer__Nakba with SGLang:

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Misraj/Baseer__Nakba" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Misraj/Baseer__Nakba",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Or use the Docker images, then call the server with the same curl request:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Misraj/Baseer__Nakba" \
  --host 0.0.0.0 \
  --port 30000
```
Docker Model Runner
How to use Misraj/Baseer__Nakba with Docker Model Runner:

```shell
docker model run hf.co/Misraj/Baseer__Nakba
```
Baseer-Nakba HTR: A State-of-the-Art VLM for Arabic Handwritten Text Recognition
Overview
This repository contains the model weights and inference pipeline for our submission to the NAKBA NLP 2026 Arabic Handwritten Text Recognition (HTR) competition.
Our approach adapts the 3B-parameter Baseer Vision-Language Model (VLM) to effectively parse and recognize highly cursive, historical Arabic manuscripts. Through a progressive training pipeline, domain-matched data augmentation, and advanced checkpoint merging, this unified model mitigates the challenges of varying writer styles, age-related document degradation, and morphological complexity.
To try our Baseer model for document extraction, please visit Baseer, the state-of-the-art model for Arabic document extraction.
Competition Results
Our final model (Misraj AI) secured 1st place on the official Nakba hidden test set leaderboard.
| Rank | Team | CER | WER |
|---|---|---|---|
| 1st | Misraj AI | 0.0790 | 0.2440 |
| 2nd | Oblevit | 0.0925 | 0.3268 |
| 3rd | 3reeq | 0.0938 | 0.2996 |
| 4th | Latent Narratives | 0.1050 | 0.3106 |
| 5th | Al-Warraq | 0.1142 | 0.3780 |
| 6th | Not Gemma | 0.1217 | 0.3063 |
| 7th | NAMAA-Qari | 0.1950 | 0.5194 |
| 8th | Fahras | 0.2269 | 0.5223 |
| – | Baseline | 0.3683 | 0.6905 |
Training Methodology
Our model was trained using a multi-stage Supervised Fine-Tuning (SFT) curriculum.
- Data Augmentation: The Muharaf enhancement dataset was converted to grayscale to match the visual complexity and tonal distribution of the Nakba competition data.
- Decoder-Only SFT: We first trained the text decoder autoregressively on the structurally similar Muharaf dataset to condition the language modeling head.
- Full Encoder-Decoder Tuning: We subsequently unfroze the vision encoder and trained the full architecture on the Nakba dataset using differential learning rates, a key step that yielded a >5% improvement in WER over decoder-only tuning.
- Checkpoint Merging: To stabilize predictions and maximize generalization, we merged our top-performing checkpoints (Epoch 1 and Epoch 5) using SLERP interpolation.
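The grayscale conversion in the augmentation step can be sketched as follows. This is a minimal illustration using the standard ITU-R BT.601 luma weights; the actual augmentation pipeline is not published, so the exact conversion used is an assumption:

```python
def rgb_to_grayscale(pixel):
    """Convert one (R, G, B) pixel to a single luma value
    using the standard ITU-R BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def grayscale_image(img):
    """Apply the per-pixel conversion to a nested list of RGB pixels."""
    return [[rgb_to_grayscale(p) for p in row] for row in img]

# A 1x2 test image: pure white and pure red.
print(grayscale_image([[(255, 255, 255), (255, 0, 0)]]))  # [[255, 76]]
```

In practice an image library (e.g. Pillow's `Image.convert("L")`) applies the same weighting; the point of the step is to match the tonal distribution of the grayscale Nakba scans.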
Training Hyperparameters
All supervised experiments were conducted with standardized hyperparameters across configurations.
| Parameter | Value |
|---|---|
| Hardware | 2× NVIDIA H100 GPUs |
| Base Model | 3B-parameter Baseer |
| Epochs | 5 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Learning Rate Schedule | Cosine |
| Batch Size | 128 |
| Max Sequence Length | 1200 tokens |
| Input Image Resolution | 644 × 644 pixels |
| Decoder-Only Learning Rate | 1e-4 |
| Encoder Learning Rate | 9e-6 |
| Decoder Learning Rate (Full Tuning) | 1e-4 |
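The differential learning rates from the table, combined with the cosine schedule, can be sketched in pure Python. This is an illustration only, not the training code: the total step count, zero minimum LR, and absence of warmup are assumptions; the base rates (9e-6 encoder, 1e-4 decoder) come from the table above.

```python
import math

ENCODER_LR = 9e-6   # vision encoder base rate (from the table)
DECODER_LR = 1e-4   # text decoder base rate (from the table)

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at a given step:
    starts at base_lr, decays to min_lr at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Each parameter group follows the same cosine curve,
# but decays from its own base learning rate.
total = 1000
for step in (0, 500, 1000):
    enc = cosine_lr(step, total, ENCODER_LR)
    dec = cosine_lr(step, total, DECODER_LR)
    print(f"step {step}: encoder={enc:.2e}, decoder={dec:.2e}")
```

In a framework like PyTorch this maps onto two optimizer parameter groups (encoder vs. decoder) sharing one cosine scheduler.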
Image Examples
The model works reliably on images from the Nakba dataset and visually similar historical manuscripts.
Merge Method
This model was merged using the SLERP merge method.
Models Merged
- Baseer_Nakba_ep_1
- Baseer_Nakba_ep_5
Configuration
```yaml
merge_method: slerp
base_model: Baseer_Nakba_ep_1
models:
  - model: Baseer_Nakba_ep_1
  - model: Baseer_Nakba_ep_5
parameters:
  t:
    - value: 0.50
dtype: bfloat16
```
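The SLERP merge at t = 0.50 can be illustrated on a pair of weight vectors. This is a minimal sketch of the interpolation formula itself; real checkpoint-merging tools apply it per tensor with additional handling (e.g. for bfloat16 casting), which is omitted here:

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two vectors.

    Moves along the great-circle arc from a (t=0) to b (t=1),
    preserving angular geometry instead of averaging linearly.
    Falls back to linear interpolation when the vectors are
    nearly parallel and the spherical formula degenerates.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)      # angle between the vectors
    if omega < 1e-6:                  # nearly parallel: LERP fallback
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    w_a = math.sin((1 - t) * omega) / s
    w_b = math.sin(t * omega) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]

# Merging two orthogonal unit "checkpoints" at t = 0.50
# lands halfway along the arc between them.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # ≈ [0.7071, 0.7071]
```

With t = 0.50 the merge weights both checkpoints equally, which matches the configuration above.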
Citation
If you use this model or find our work helpful, please consider citing our paper:
```bibtex
@inproceedings{misrajai2026nakba,
  title     = {Adapting Vision-Language Models for Historical Arabic Handwritten Text Recognition},
  author    = {Misraj AI},
  booktitle = {Nakba OCR Competition, NLP 2026},
  year      = {2026}
}
```
Links
- Model weights: Misraj/Baseer__Nakba
- Inference pipeline: misraj-ai/Nakba-pipeline
- Live demo: baseerocr.com
- Competition: Nakba Codabench