1 1 10

Oussema Harbi

Harbous

oharbi

AI & ML interests

None yet

Recent Activity

reacted to philipp-zettl's post with 👍 about 1 month ago

I've been cooking something neat over the past weeks 👨‍🍳 We all know that training LLMs requires a lot of resources and especially a lot of compute in form of GPUs, or is super slow and inefficient when done on CPUs. The big players use giant clusters of Nvidia H100s. But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX's. If you're lucky you got yourself a 5080 with 16GB VRAM or something. To be frank, I don't have that 1.3k disposable cash laying around ¯\_(ツ)_/¯ But I can write rust and like building ML libraries. So I asked myself the question(s): - can I train SMLs at home on my hardware? - How hard can it be to build a ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]? - how hard can it be to implement bf16 support? The answers are wild, trust me! Image 1: Metrics form last nights build on my "tiny" RTX 2060 (6 GB VRAM) Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8GB VRAM) The majority of my time went into the shared memory, but it's stable and I'm very excited! Here some debug logs, a la "trust me bro" ``` ---- Currently available: 1112735744, attempting to reclaim: 1073741824 --- VRAM STATE [backward pass] --- Driver Used: 6744 MB / 7805 MB Data on GPU: 1641 MB Grads on GPU: 3459 MB CPU Offloaded: 18230 MB --------------------------------- Currently available: 1079181312, attempting to reclaim: 1073741824 --- VRAM STATE [backward pass] --- Driver Used: 6776 MB / 7805 MB Data on GPU: 1561 MB Grads on GPU: 3279 MB CPU Offloaded: 18590 MB ----------------------------- ``` Final models get exported in `safetensors` format and are compatible with PyTorch and `transformers`, for accessibility. - [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory

reacted to RakshitAralimatti's post with 👍 4 months ago

Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into https://huggingface.co/moonshotai Kimi 2.5. That's it. That's the workflow. Zero coding. Zero iteration. Zero "make the button bigger." See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/ The model: ✅ Scraped my GitHub repos automatically ✅ Pulled my experience from LinkedIn ✅ Designed an Aurora Glass theme ✅ Mapped every skill to projects ✅ Added animations I'd never code myself

reacted to kanaria007's post with 👍 4 months ago

✅ New Article: *Post-Transformer Decision Cores* (v0.1) Title: 🚀 Post-Transformer Decision Cores: Goal-Native Engines Beyond LLMs 🔗 https://huggingface.co/blog/kanaria007/post-tranformer-decision-cores --- Summary: Transformers are powerful—but in SI-Core they’re *not the essence of intelligence*. A *Decision Core* is anything that satisfies the *Jump contracts* (OBS/ETH/MEM/ID/EVAL + RML), and those contracts don’t require next-token prediction. This article sketches what “post-Transformer” looks like in practice: *goal-native, structure-aware controllers* that may use LLMs as tools—but don’t depend on them as the runtime brain. > Don’t relax the contracts. > Replace the engine behind them. --- Why It Matters: • Makes LLMs *optional*: shift them to “genesis / exploration / explanation,” while routine high-stakes Jumps run on structured cores • Improves boring-but-critical properties: *determinism (CAS), fewer inconsistencies (SCI), fewer ETH violations (EAI), better rollback (RBL/RIR)* • Enables gradual adoption via *pluggable Jump engines* and domain-by-domain “primary vs fallback” switching --- What’s Inside: • The architectural inversion: *World → OBS → SIM/SIS → Jump (Decision Core) → RML → Effects* (LLM is just one engine) • Three compatible post-Transformer directions: 1. *World-model + search controllers* (MPC/MCTS/anytime search with explicit GCS + ETH constraints) 2. *Genius-distilled specialized controllers* (distill structure from GeniusTraces; LLM becomes a “genesis tool”) 3. *SIL-compiled Decision Programs* (typed Jump entrypoints, compiler-checked invariants, DPIR/GSPU targeting) • A realistic migration path: LLM-wrapped → Genius library → shadow dual-run → flip primary by domain → SIL-compiled cores • How this connects to “reproducing genius”: GRP provides trace selection/format; this article provides the engine architectures --- 📖 Structured Intelligence Engineering Series

View all activity

Organizations

reacted to philipp-zettl's post with 👍 about 1 month ago

Post

2629

----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6744 MB / 7805 MB
Data on GPU:    1641 MB
Grads on GPU:   3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:    6776 MB / 7805 MB
Data on GPU:    1561 MB
Grads on GPU:   3279 MB
CPU Offloaded: 18590 MB
-----------------------------

Final models get exported in safetensors format and are compatible with PyTorch and transformers, for accessibility.

- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory

1 reply

reacted to RakshitAralimatti's post with 👍 4 months ago

Post

3064

Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into

moonshotai Kimi 2.5.
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/

The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself

4 replies

reacted to kanaria007's post with 👍 4 months ago

Post

1718

✅ New Article: *Post-Transformer Decision Cores* (v0.1)

Title:
🚀 Post-Transformer Decision Cores: Goal-Native Engines Beyond LLMs
🔗 https://huggingface.co/blog/kanaria007/post-tranformer-decision-cores

---

Summary:
Transformers are powerful—but in SI-Core they’re *not the essence of intelligence*. A *Decision Core* is anything that satisfies the *Jump contracts* (OBS/ETH/MEM/ID/EVAL + RML), and those contracts don’t require next-token prediction.

This article sketches what “post-Transformer” looks like in practice: *goal-native, structure-aware controllers* that may use LLMs as tools—but don’t depend on them as the runtime brain.

> Don’t relax the contracts.
> Replace the engine behind them.

---

Why It Matters:
• Makes LLMs *optional*: shift them to “genesis / exploration / explanation,” while routine high-stakes Jumps run on structured cores
• Improves boring-but-critical properties: *determinism (CAS), fewer inconsistencies (SCI), fewer ETH violations (EAI), better rollback (RBL/RIR)*
• Enables gradual adoption via *pluggable Jump engines* and domain-by-domain “primary vs fallback” switching

---

What’s Inside:
• The architectural inversion: *World → OBS → SIM/SIS → Jump (Decision Core) → RML → Effects* (LLM is just one engine)
• Three compatible post-Transformer directions:

1. *World-model + search controllers* (MPC/MCTS/anytime search with explicit GCS + ETH constraints)
2. *Genius-distilled specialized controllers* (distill structure from GeniusTraces; LLM becomes a “genesis tool”)
3. *SIL-compiled Decision Programs* (typed Jump entrypoints, compiler-checked invariants, DPIR/GSPU targeting)
• A realistic migration path: LLM-wrapped → Genius library → shadow dual-run → flip primary by domain → SIL-compiled cores
• How this connects to “reproducing genius”: GRP provides trace selection/format; this article provides the engine architectures

---

📖 Structured Intelligence Engineering Series

reacted to Hellohal2064's post with 🔥 4 months ago

Post

1710

🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!

I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? 2.5x faster inference compared to llama.cpp!

📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)

🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround

📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• HuggingFace: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10

The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!

4 replies

reacted to martinsu's post with 🔥 5 months ago

Post

2052

I wasted days on a GPU node on a bug that shouldn't exist

So I was fine-tuning TildeOPEN-30B and the outputs were... weird. Token ID 179 (<0x00>) kept appearing between almost every token pair. Took me a bit to figure out what was going on.

Turns out I used the fast tokenizer for training, but the model was trained on the slow one. Silent failure.

Well... long story short—TGI uses (forces) the fast tokenizer, no questions asked. And you'll have agile's kryptonite: silent failure. If the model was trained on slow, it's a silent disaster.

I got curious and wrote a quick script to check how common this is. Ran it on 6,014 LLM HF models overnight.

Roughly 10% of HF model downloads have mismatched tokenizers. Not all mismatches are catastrophic, but some are brutal — like chat template markers inflating from 1 token to 3, silently wrecking context windows and causing model act weird.

This wasn't rigorous research, but the drift is real. And the worst part? 968 models(out of 500+ downloads) have both fast and slow tokenizers present, but they still produce different outputs. No missing files, no errors — just silent degradation.

TGI defaults to the fast tokenizer, as does AutoTokenizer.from_pretrained(). If a fast tokenizer doesn't exist, it auto-generates one. If your model was trained on slow, you get silent degradation. Output looks fine; the model just performs worse. Sometimes really worse. You'd never know.

If model was trained on fast tokenizer, its fine, but how do You know?

The root cause? Either model authors run HF conversion and upload both without verifying, or users run TGI, which always forces(converts to) fast .

The result of this fight with tokenizers is martinsu/tildeopen-30b-mu-instruct

It's based on TildeOPEN-30B (a solid EU HPC multilingual base). Nothing fancy—just a proper instruction fine-tune where I didn't mess up the tokenizer this time.

Full article: https://github.com/martins-u/tokenmagedon

1 reply

liked a model about 1 year ago

XiaomiMiMo/MiMo-7B-RL

Text Generation • Updated Jun 5, 2025 • 85.5k • 276

reacted to ImranzamanML's post with 👍 about 1 year ago

Post

2921

🚀 New paper out: "Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function"
Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function (2410.03979)

In this work, we tackle some major challenges in Arabic multi-label emotion classification especially the issues of class imbalance and label correlation that often hurt model performance, particularly for minority emotions.

Our approach:

Stacked contextual embeddings from fine-tuned ArabicBERT, MarBERT, and AraBERT models.

A meta-learning strategy that builds richer representations.

A hybrid loss function combining class weighting, label correlation matrices, and contrastive learning to better handle class imbalances.

🧠 Model pipeline: stacked embeddings → meta-learner → Bi-LSTM → fully connected network → multi-label classification.

🔍 Extensive experiments show significant improvements across Precision, Recall, F1-Score, Jaccard Accuracy, and Hamming Loss.
🌟 The hybrid loss function in particular helped close the gap between majority and minority classes!

We also performed ablation studies to break down each component’s contribution and the results consistently validated our design choices.

This framework isn't just for Arabic it offers a generalizable path for improving multi-label emotion classification in other low-resource languages and domains.

Big thanks to my co-authors: Muhammad Azeem Aslam, Wang Jun, Nisar Ahmed, Li Yanan, Hu Hongfei, Wang Shiyu, and Xin Liu!

Would love to hear your thoughts on this work! 👇

replied to orasul's post about 1 year ago

Thanks for a great work so far.
It is nice to see someone using more basic ML models to work more efficiently instead of just relying on big models.
I believe one good use-case would be the automatic conversion of diagram images (for engineering or SW) to mermaid diagrams for examples, if the raw text and json outputs are both provided to good coding LLM
Let me know if you are interested in something like that.
This can be a good project - with possible business application - to allow enterprises to make their existing documentation AI ready by doing the conversion.

reacted to orasul's post with 👍 about 1 year ago

Post

2197

hi, it is deki, and now I am open sourced.

An Android AI agent powered by open-source ML model, 𝗱𝗲𝗸𝗶, was fully open-sourced.

It understands what’s on your screen and can perform tasks based on your voice or text commands.

Some examples:
* "Write my friend "some_name" in WhatsApp that I'll be 15 minutes late"
* "Open Twitter in the browser and write a post about something"
* "Read my latest notifications"
* "Write a linkedin post about something"

Currently, it works only on Android — but support for other OS is planned.

The ML and backend codes were also fully open-sourced.

Video prompt example:

"Open linkedin, tap post and write: hi, it is deki, and now I am open sourced. But don't send, just return"

License: GPLv3

You can find other AI agent demos or usage examples, like, code generation or object detection in github.

Github: https://github.com/RasulOs/deki

2 replies

upvoted a collection about 1 year ago

DIRA – Diraya Arabic Reasoning AI

Collection

This is an Arabic Reasoning LLM Collection designed for advanced logical inference and instruction-based reasoning in Arabic via datasets and models. • 7 items • Updated Nov 28, 2025 • 5

reacted to chansung's post with 👍 over 1 year ago

Post

1747

New look for AI powered paper reviews from the list by Hugging Face Daily Papers ( managed by the @akhaliq )

Bookmark the webpage along, check comprehensive reviews by Google DeepMind Gemini 1.5, and listen to audio podcast made by the same tech used in NotebookLM.

Link: https://deep-diver.github.io/ai-paper-reviewer/

This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)

liked a model over 1 year ago

openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5, 2025 • 341k • 1.29k

liked a Space over 1 year ago

Open Universal Arabic Asr Leaderboard

🥇

A benchmark for open-source multi-dialect Arabic ASR models

reacted to singhsidhukuldeep's post with 👍 over 1 year ago

Post

3284

Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.