# DensingLaw-ScalingModels

This repository contains a series of reference models of varying sizes, released as part of our paper, *Densing Law of LLMs*. These models were trained to establish a robust scaling law, which serves as a foundational component for calculating the "density" of other Large Language Models (LLMs).

[Paper](https://arxiv.org/abs/2412.04315) | [Hugging Face Models](https://huggingface.co/openbmb/DensingLaw-ScalingModels)

## Usage

Instructions for using openbmb/DensingLaw-ScalingModels with libraries and local apps.

### Transformers

How to use openbmb/DensingLaw-ScalingModels with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/DensingLaw-ScalingModels")
```

```python
# Load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/DensingLaw-ScalingModels", dtype="auto")
```
### vLLM

How to use openbmb/DensingLaw-ScalingModels with vLLM:

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openbmb/DensingLaw-ScalingModels"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/DensingLaw-ScalingModels",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
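Since the endpoint is OpenAI-compatible, you can also call the server from Python. A minimal sketch using the `openai` client (the `api_key` value is a dummy placeholder; vLLM does not require authentication by default):

```python
# Query the local vLLM server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="openbmb/DensingLaw-ScalingModels",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)
```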
### SGLang

How to use openbmb/DensingLaw-ScalingModels with SGLang:

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/DensingLaw-ScalingModels" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/DensingLaw-ScalingModels",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images:

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/DensingLaw-ScalingModels" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openbmb/DensingLaw-ScalingModels",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

### Docker Model Runner
How to use openbmb/DensingLaw-ScalingModels with Docker Model Runner:

```shell
docker model run hf.co/openbmb/DensingLaw-ScalingModels
```
## Overview
The core contribution of our paper is the concept of **LLM Density**, defined as the ratio of a model's *effective* parameter size to its *actual* parameter size. To accurately determine a model's effective size, we must first establish a reliable "ruler": a scaling law that maps training compute to performance on downstream tasks.
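In symbols (the notation here is ours, simply formalizing the definition above):

$$
\text{density}(\mathcal{M}) \;=\; \frac{N_{\text{eff}}(\mathcal{M})}{N(\mathcal{M})}
$$

where $N(\mathcal{M})$ is the model's actual parameter count and $N_{\text{eff}}(\mathcal{M})$ is the effective parameter size inferred from the scaling-law "ruler" described next.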
The models in this repository serve as that "ruler". We trained a series of six models, ranging from 5 million to 800 million parameters, on a consistent dataset. By measuring their loss on various benchmarks, we fitted a precise scaling function. This function allows us to take any other LLM, measure its performance, and infer its effective parameter size by seeing where it lands on our reference scale.
These models are released to allow researchers to verify our results, build upon our work, and use this established scale for their own model evaluations.
## The Models
We trained six models with architectures designed for scaling. The detailed hyperparameters are listed below.
**Table 1: Detailed Hyper-parameters of Models for Loss Estimation**
| Name | # Params | Batch Size | n_layer | d_model | d_ffn | n_head | n_kv |
|---|---|---|---|---|---|---|---|
| 0.005B (s1) | 5,247,232 | 32 | 8 | 256 | 640 | 4 | 1 |
| 0.03B (s2) | 31,470,080 | 32 | 12 | 512 | 1,280 | 8 | 2 |
| 0.1B (s3) | 106,196,736 | 64 | 18 | 768 | 1,920 | 12 | 3 |
| 0.2B (s4) | 245,416,960 | 128 | 24 | 1,024 | 2,560 | 16 | 2 |
| 0.4B (s5) | 476,852,480 | 256 | 30 | 1,280 | 3,200 | 20 | 2 |
| 0.8B (s6) | 828,225,024 | 512 | 36 | 1,536 | 3,840 | 24 | 3 |
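To sanity-check Table 1, you can load a reference checkpoint and count its parameters. This is a sketch that assumes the checkpoints are stored as subfolders named after the model ids in Table 1 (`s1` … `s6`); adjust `subfolder` to the repository's actual layout:

```python
# Load one reference model and verify its parameter count against Table 1.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openbmb/DensingLaw-ScalingModels",
    subfolder="s3",          # hypothetical path for the 0.1B (s3) checkpoint
    trust_remote_code=True,  # in case the repo ships custom modeling code
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # Table 1 lists 106,196,736 for s3
```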
### Training Data
As stated in our paper, all reference models were trained on the training corpus of MiniCPM-3-4B (Hu et al., 2024) to ensure consistency.
## Research Context: The Densing Law
Our framework for calculating LLM density involves a two-step estimation process:
- **Loss Estimation**: We first establish the relationship between training compute (approximated as $C \approx 6ND$, where $N$ is the number of parameters and $D$ is the number of training tokens) and conditional loss on downstream tasks. The models released in this repository are the data points used to fit this curve.
- **Performance Estimation**: We then map the relationship between this loss and a more intuitive performance metric, such as accuracy.
By combining these, we can determine the effective compute, and therefore the effective parameter size, for any model based on its performance.
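As a concrete illustration, here is a toy version of that pipeline. The functional forms and every number below are illustrative placeholders, not the paper's fitted values:

```python
# Toy two-step estimation: fit a loss-compute curve on the reference
# models, then invert it at a new model's measured loss.
import numpy as np

# Step 1: loss estimation. Training compute C is approximated as 6*N*D
# (FLOPs); the (C, L) pairs below are made-up placeholders.
C = np.array([1e18, 6e18, 2e19, 5e19, 1e20, 2e20])  # training compute
L = np.array([3.2, 2.9, 2.7, 2.55, 2.45, 2.38])     # downstream loss

L_inf = 2.2  # assumed irreducible loss (placeholder)
slope, intercept = np.polyfit(np.log(C), np.log(L - L_inf), 1)
alpha, a = -slope, np.exp(intercept)  # L ~= a * C**(-alpha) + L_inf

# Step 2 (performance estimation) would map this loss onto a metric such
# as accuracy, e.g. with a fitted sigmoid; omitted here for brevity.

# Inverting step 1 at a new model's measured loss yields its *effective*
# compute, from which an effective parameter size (and density) follows.
L_measured = 2.5
C_eff = ((L_measured - L_inf) / a) ** (-1.0 / alpha)
print(f"effective compute ~ {C_eff:.3g} FLOPs")
```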
## License
This work is released under the Apache 2.0 license.
## Citation
If you use our models or the Densing Law concept in your research, please cite our paper:
```bibtex
@misc{xiao2024densinglawllms,
      title={Densing Law of LLMs},
      author={Chaojun Xiao and Jie Cai and Weilin Zhao and Guoyang Zeng and Biyuan Lin and Jie Zhou and Zhi Zheng and Xu Han and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2412.04315},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.04315},
}
```