This repository contains the model presented in the paper UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface.

UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks.
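To make the embedding retrieval idea more concrete, here is a minimal sketch. It assumes that the language model emits a special mask token, that the token's hidden state is compared by dot product against per-pixel image features, and that pixels scoring above a sigmoid threshold of 0.5 form the mask. The function `retrieve_mask`, the list-of-lists shapes, and the 0.5 threshold are hypothetical choices for illustration. This is not the code from the UFO repository.

```python
import math
from typing import List

Vector = List[float]


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def retrieve_mask(
    mask_token: Vector,
    pixel_features: List[List[Vector]],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Hypothetical sketch of segmentation as embedding retrieval.

    mask_token:     embedding produced by the language model for a mask token (assumed).
    pixel_features: H x W grid of image feature vectors with the same dimension (assumed).
    Returns an H x W binary mask: 1 where the sigmoid of the dot-product score exceeds threshold.
    """
    dim = len(mask_token)
    mask: List[List[int]] = []
    for row in pixel_features:
        mask_row: List[int] = []
        for feat in row:
            if len(feat) != dim:
                raise ValueError("feature dimension does not match mask token")
            # Similarity between the mask-token embedding and this pixel's feature.
            score = sum(a * b for a, b in zip(mask_token, feat))
            mask_row.append(1 if stable_sigmoid(score) > threshold else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    token = [1.0, -1.0]
    features = [
        [[2.0, 0.0], [0.0, 2.0]],
        [[1.0, 1.0], [3.0, -1.0]],
    ]
    print(retrieve_mask(token, features))
```

With a threshold of 0.5, the sigmoid step is equivalent to keeping pixels whose dot-product score is strictly positive. In this toy grid the scores are 2, -2, 0 and 4, so the sketch prints `[[1, 0], [0, 1]]`.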

For more details, please refer to the original paper and the GitHub repository:

Paper for kanashi6/UFO: UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface (arXiv 2503.01342, published Mar 3, 2025).