Deqing Fu PRO
deqing
AI & ML interests
None yet
Recent Activity
updated a model about 8 hours ago
deqing/vanilla-llama-3.2-1B-dclm-100BT-v1
upvoted a collection about 12 hours ago
Convergent Evolution
upvoted a paper about 13 hours ago
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Organizations
Convergent Evolution (Addition)
- deqing/convergent-llama-300M-muon-addition_3digit
  Text Generation • 0.3B • Updated • 407
- deqing/convergent-llama-300M-muon-addition_3digit_seed123
  0.3B • Updated • 334
- deqing/convergent-llama-300M-muon-addition
  Text Generation • 0.3B • Updated • 449
- deqing/convergent-llama-300M-adamw-addition_3digit
  Text Generation • 0.3B • Updated • 429
Convergent Evolution (Data)
- deqing/convergent-llama-300M-muon-original
  Text Generation • 0.3B • Updated • 643
- deqing/convergent-llama-300M-muon-unigram
  Text Generation • 0.3B • Updated • 76
- deqing/convergent-llama-300M-muon-isolate-1
  Text Generation • 0.3B • Updated • 1.48k
- deqing/convergent-llama-300M-muon-swap_numbers
  Text Generation • 0.3B • Updated • 71
Convergent Evolution
Convergent Evolution (Architecture and Optimizer)
- deqing/convergent-llama-300M-muon-original
  Text Generation • 0.3B • Updated • 643
- deqing/convergent-gdn-300M-muon-original
  Text Generation • 0.3B • Updated • 172
- deqing/convergent-mamba2-300M-muon-original
  Text Generation • 0.3B • Updated • 109
- deqing/convergent-lstm-4layer-muon-original
  Text Generation • 0.2B • Updated • 223
Fourier Language Model