Dev Mode Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

1aurent authored a paper 22 days ago

Voxtral TTS

DongfuJiang authored a paper 27 days ago

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

DongfuJiang authored a paper 27 days ago

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation


victor

posted an update 7 days ago

Post

4737

Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open source Claude Code = GLM-5.1 + Pi (https://pi.dev/) - Built a Three.js racing game to eval and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
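For readers curious what those two checks look like, here is a minimal sketch in Python. It is not the code the model wrote (the post does not show it); the function names, the Y-up convention, and the discrete curvature estimate are all assumptions made for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def sub(a, b):
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def triangle_normal(p0, p1, p2):
    """Unnormalized face normal; its direction flips with the vertex winding order."""
    return cross(sub(p1, p0), sub(p2, p0))

def normal_points_up(p0, p1, p2):
    """True if the normal has a positive Y component (assumes a Y-up scene)."""
    return triangle_normal(p0, p1, p2)[1] > 0

def curvature_per_arc_length(points):
    """Signed turning angle at each interior point of a 2D polyline,
    divided by the local arc length (hypothetical estimator)."""
    result = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        ax, ay = cur[0] - prev[0], cur[1] - prev[1]
        bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
        # Signed angle between the incoming and outgoing segments
        angle = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        # Mean of the two adjacent segment lengths approximates arc length at cur
        arc = 0.5 * (math.hypot(ax, ay) + math.hypot(bx, by))
        result.append(angle / arc)
    return result
```

On a flat road segment, swapping two vertices of a triangle flips the sign of the normal's Y component, which is how a winding bug can be detected from numbers alone. On a circular arc of radius R the curvature estimate comes out close to 1/R, so an AI driver could plausibly scale its cornering speed from it.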

You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai🚀🚀

4 replies

fffiloni

posted an update 11 days ago

Post

3097

✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
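A minimal sketch of the "patch before import" idea, assuming the downloaded code lives in a local folder and the fixes can be expressed as literal string substitutions. This is not the Space's actual code; the helper names and arguments are hypothetical.

```python
import importlib
import sys
from pathlib import Path

def patch_source(path, replacements):
    """Apply literal (old, new) substitutions to a source file.
    Returns how many of the patterns were found and replaced."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    applied = 0
    for old, new in replacements:
        if old in text:
            text = text.replace(old, new)
            applied += 1
    path.write_text(text, encoding="utf-8")
    return applied

def patch_then_import(package_dir, module_name, relative_file, replacements):
    """Patch a file on disk, then import the module, so Python only ever
    executes the patched version."""
    if module_name in sys.modules:
        raise RuntimeError(f"{module_name} is already imported; patch before the first import")
    patch_source(Path(package_dir) / relative_file, replacements)
    if str(package_dir) not in sys.path:
        sys.path.insert(0, str(package_dir))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

The important design point is ordering: the substitution has to happen before the first import, because a module that has already been loaded will not pick up changes made to its file.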

If you’ve used it before (or are curious), feel free to give it another try.

fffiloni

posted an update 19 days ago

Post

2852

✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction

fffiloni

posted an update 20 days ago

Post

2246

AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc

Severian

posted an update 26 days ago

Post

4424

I’ve been working on a new mathematical approach to real-time video compositing and background removal, and I wanted to share a live demo.

Traditionally, real-time keyers either use 3D color-space bounding boxes (which struggle with semi-transparent hair and motion blur) or heavy Machine Learning models (which require massive GPU compute and often suffer from temporal "jitter" on the edges).

I wanted to see if I could solve this using purely deterministic math so it could run client-side in a standard browser.

The engine uses a custom mathematical framework I call CMT SRL SEFA. Instead of looking at raw color values or guessing semantics like an AI, it treats the video feed as complex-encoded sequences. It uses harmonic frequencies to map phase geometry and applies a "Stability Cost Function" to find the global minimum stability. In short: it isolates the foreground from the background by measuring signal complexity and structural contradictions.

Give it a try using your own messy plates and such. I am not a VFX artist, so I am curious to hear your thoughts and what should be improved.

https://severian-cmt-sefa-realtime-vfx-keyer.hf.space/

2 replies

DongfuJiang

authored 3 papers 27 days ago

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Paper • 2603.12698 • Published Mar 13 • 1

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published Mar 19 • 66

OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Paper • 2603.20278 • Published Mar 17 • 94

fffiloni

posted an update about 1 month ago

Post

4128

I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄

4 replies

Nymbo

posted an update about 1 month ago

Post

6652

We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.

3 replies

fffiloni

posted an update about 1 month ago

Post

494

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

xianbao

submitted a paper to Daily Papers about 1 month ago

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Paper • 2603.10444 • Published Mar 11 • 12

GeorgeBredis

authored a paper about 2 months ago

Next Embedding Prediction Makes World Models Stronger

Paper • 2603.02765 • Published Mar 3 • 20

GeorgeBredis

submitted a paper to Daily Papers about 2 months ago

Next Embedding Prediction Makes World Models Stronger

Paper • 2603.02765 • Published Mar 3 • 20

victor

submitted a paper to Daily Papers about 2 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 50

mariagrandury

authored 2 papers 2 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

victor

posted an update 3 months ago

Post

2728

Interesting article: use Claude Code to help open models write CUDA kernels (for example) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill

Nymbo

posted an update 3 months ago

Post

2875

Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience over the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

Nymbo

posted an update 4 months ago

Post

2815

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, so no new infrastructure is required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets
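
To make the three actions above concrete, here is a rough sketch of what they could do on disk. It assumes each skill is a folder under the Skills root containing a SKILL.md file; the function names are hypothetical and this is not the Nymbo/Tools implementation.

```python
from pathlib import Path

def discover_skills(skills_root):
    """Return sorted names of subfolders that contain a SKILL.md file."""
    root = Path(skills_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "SKILL.md").is_file())

def skill_info(skills_root, skill_name):
    """Return the full text of a skill's SKILL.md."""
    path = Path(skills_root) / skill_name / "SKILL.md"
    if not path.is_file():
        raise FileNotFoundError(f"No SKILL.md for skill {skill_name!r}")
    return path.read_text(encoding="utf-8")

def skill_resources(skills_root, skill_name):
    """List bundled files other than the top-level SKILL.md,
    as POSIX paths relative to the skill folder."""
    base = Path(skills_root) / skill_name
    manifest = base / "SKILL.md"
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p != manifest
    )
```

Folders without a SKILL.md are simply ignored by the discovery step in this sketch, which keeps stray directories from showing up as skills.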

I've included a music-downloader skill as a working demo. It wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot


Team members 145

dev-mode-explorers's activity