Scale AI

company

Verified

https://scale.com/

Recent Activity

tu-trinh-scale authored a paper about 14 hours ago

A StrongREJECT for Empty Jailbreaks

tu-trinh-scale authored a paper about 14 hours ago

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

tu-trinh-scale authored a paper about 14 hours ago

Learning to Coordinate with Experts

Papers

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

authored 3 papers about 14 hours ago

A StrongREJECT for Empty Jailbreaks

Paper • 2402.10260 • Published Feb 15, 2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

Paper • 2410.13886 • Published Oct 11, 2024

Learning to Coordinate with Experts

Paper • 2502.09583 • Published Feb 13, 2025

submitted a paper to Daily Papers 2 days ago

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

Paper • 2604.09408 • Published 9 days ago • 3

authored a paper 2 days ago

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

Paper • 2604.09408 • Published 9 days ago • 3

updated a bucket 15 days ago

ScaleAI/hil-bench-swe-images

in ScaleAI/audiomc 22 days ago

Judge prompt for ARS

#4 opened 23 days ago by

submitted a paper to Daily Papers 24 days ago

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

Paper • 2604.10718 • Published 26 days ago • 4

published a bucket about 1 month ago

ScaleAI/hil-bench-swe-databases

updated a bucket about 1 month ago

ScaleAI/hil-bench-sql-artifacts

updated a dataset about 1 month ago

ScaleAI/hil-bench

Viewer • Updated Mar 31 • 200 • 71 • 1

published a dataset about 1 month ago

ScaleAI/MultiChallenge

Viewer • Updated Mar 31 • 266 • 398 • 1

updated a dataset about 1 month ago

ScaleAI/MultiChallenge

Viewer • Updated Mar 31 • 266 • 398 • 1

mohit-raghavendra

in ScaleAI/SWE-Atlas-QnA about 1 month ago

Criterion Granularity Mismatch in Rubric (Example: Task 6905333b74f22949d97ba9f1, Criterion 1.11)

#2 opened about 2 months ago by andrewpark-scaleai

updated a dataset about 1 month ago

ScaleAI/SWE-Atlas-QnA

Viewer • Updated Mar 31 • 124 • 273 • 15

andrewpark-scaleai

in ScaleAI/SWE-Atlas-QnA about 1 month ago

cannot pull images from today

#4 opened about 1 month ago by mohit-raghavendra

updated a bucket about 1 month ago

ScaleAI/hil-bench-swe-databases

published a bucket about 1 month ago

ScaleAI/hil-bench-sql-artifacts

published a dataset about 1 month ago

ScaleAI/hil-bench

Viewer • Updated Mar 31 • 200 • 71 • 1