- Monthly-SWEBench Collection — a continuously updated benchmark evaluating AI coding agents on real-world software engineering tasks drawn from GitHub issues. 2 items. Updated 4 days ago.
- MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation — Paper 2407.00468. Published Jun 29, 2024.