AI & ML interests

Evaluating AI Agents on Continuous Tasks

Recent Activity

hyd2apse published a dataset 1 day ago
EvoClaw-Bench/EvoClaw-log
hyd2apse updated a dataset 19 days ago
EvoClaw-Bench/EvoClaw-log
hyd2apse updated a Space about 1 month ago
EvoClaw-Bench/README

Organization Card

EvoClaw Banner

Evaluate AI on Continuous Tasks

Website · arXiv · License: MIT

EvoClaw is a general-purpose evaluation harness for AI agents on continuous tasks, where milestones build on each other, dependencies interleave, and context accumulates over a long session. Unlike one-shot benchmarks, EvoClaw challenges agents to complete ordered sequences of tasks within a persistent environment, enabling fine-grained, per-milestone analysis.
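The per-milestone framing above can be illustrated with a small sketch. The milestone names, log format, and `milestone_report` helper below are illustrative assumptions, not EvoClaw's actual schema or API:

```python
# Hypothetical sketch of per-milestone analysis on a continuous-task session.
# The log format and milestone names are invented for illustration.

def milestone_report(log):
    """Given an ordered list of (milestone, passed) pairs, return per-milestone
    results plus the longest prefix of consecutive passes -- a quantity that
    matters when later milestones depend on earlier ones."""
    results = {name: passed for name, passed in log}
    prefix = 0
    for _, passed in log:
        if not passed:
            break  # a failure breaks the dependency chain
        prefix += 1
    return {"per_milestone": results, "consecutive_passes": prefix}

# An ordered session: each milestone builds on the previous ones.
session = [("setup_env", True), ("ingest_data", True),
           ("train_model", False), ("report_metrics", True)]
report = milestone_report(session)
# "report_metrics" passed in isolation, but only the first two milestones
# form an unbroken chain, so consecutive_passes is 2.
```

Distinguishing "passed in isolation" from "passed with all dependencies satisfied" is what makes per-milestone analysis finer-grained than a single end-of-session score.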
