CU-Benchmarks
visualwebbench/VisualWebBench
• Updated • 1.54k • 1.11k
• 18
rootsautomation/RICO-ScreenQA
• Updated • 86k • 1.42k
• 11
rootsautomation/ScreenSpot
• Updated • 1.27k • 3.09k
• 48
osunlp/Multimodal-Mind2Web
• Updated • 14.2k • 6.18k
• 94
xlangai/ubuntu_osworld_file_cache
Updated • 1M
• 16
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper
• 2409.08264
• Published • 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Paper
• 2405.14573
• Published