kedar kolluri

kktw

AI & ML interests

None yet

Recent Activity

updated a dataset 3 days ago

thoughtworks/agentic-coding-trajectories

published a dataset 3 days ago

thoughtworks/agentic-coding-trajectories

published an article 16 days ago

SpecJAX: A Speculative Decoding Library for TPUs


Organizations

updated a dataset 3 days ago

thoughtworks/agentic-coding-trajectories

Viewer • Updated 3 days ago • 15k • 17

published a dataset 3 days ago

thoughtworks/agentic-coding-trajectories

Viewer • Updated 3 days ago • 15k • 17

published an article 16 days ago

Article

SpecJAX: A Speculative Decoding Library for TPUs

16 days ago

published an article 19 days ago

Article

We Pitted the Cheapest TPU Against an NVIDIA L4. Here's What 6 Experiments Revealed.

19 days ago


published an article 21 days ago

Article

1.37x Faster on Alibaba's 80B Code Model: EAGLE3 for Qwen3-Coder-Next

21 days ago

published an article 22 days ago

Article

1.7x Faster on a 218B Model: EAGLE3 Speculative Decoding for GLM-4.7

22 days ago


published an article 27 days ago

Article

2x Faster on a 229B MoE: EAGLE3 Speculative Decoding for MiniMax-M2.5

27 days ago


published an article 30 days ago

Article

Google Released Gemma-4 Four Days Ago. We Already Made It 1.72× Faster.

30 days ago


upvoted an article about 1 month ago

Article

Speculative Decoding in Practice: How EAGLE3 Makes LLMs Faster Without Changing Their Outputs

Apr 3


published an article about 1 month ago

Article

Speculative Decoding in Practice: How EAGLE3 Makes LLMs Faster Without Changing Their Outputs

Apr 3

