Instructions to use seamoke111/HTL-CodeLlama-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use seamoke111/HTL-CodeLlama-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="seamoke111/HTL-CodeLlama-7B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("seamoke111/HTL-CodeLlama-7B")
model = AutoModelForCausalLM.from_pretrained("seamoke111/HTL-CodeLlama-7B")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use seamoke111/HTL-CodeLlama-7B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "seamoke111/HTL-CodeLlama-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "seamoke111/HTL-CodeLlama-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/seamoke111/HTL-CodeLlama-7B
```
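Because the vLLM server exposes an OpenAI-compatible completions endpoint, it can also be called from Python with only the standard library. A minimal sketch, assuming a server started as above is listening on localhost:8000 (the helper name is illustrative; the request is built separately so it can be inspected without a running server):

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             model: str = "seamoke111/HTL-CodeLlama-7B",
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible /v1/completions."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("Once upon a time,")
print(req.full_url)

# Only attempt the network call when the server is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```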
- SGLang
How to use seamoke111/HTL-CodeLlama-7B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "seamoke111/HTL-CodeLlama-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "seamoke111/HTL-CodeLlama-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "seamoke111/HTL-CodeLlama-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "seamoke111/HTL-CodeLlama-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use seamoke111/HTL-CodeLlama-7B with Docker Model Runner:
```shell
docker model run hf.co/seamoke111/HTL-CodeLlama-7B
```
How Do Humans Write Code? Large Models Do It the Same Way Too
Paper: https://arxiv.org/pdf/2402.15729
Code: https://github.com/seamoke/Human-Think-Language
Introduction
To use this model, please make sure you have transformers>=4.39.2.
We introduce HTL (Human-Think-Language), a model that utilizes the complete reasoning process of CoT (chain of thought) to enhance PoT (program of thought). It was further fine-tuned from MAmmoTH-Coder-7B.
Evaluation
The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
| Model | GSM8K | GSM-Hard | NumGLUE | MATH | SimulEq | SVAMP | MAWPS | ASDiv |
|---|---|---|---|---|---|---|---|---|
| MAmmoTH-Coder-7B | 59.4 | 56.3 | 66.4 | 33.4 | 45.9 | 70.7 | 91.9 | 69.3 |
| ToRA | 72.6 | 56.0 | 46.2 | 44.6 | 48.5 | 70.4 | 91.3 | 78.7 |
| HTL-CodeLlama-7B (ours) | 65.7 | 58.3 | 75.1 | 34.9 | 50.8 | 74.4 | 94.2 | 73.1 |
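Averaging the per-benchmark accuracies gives a quick overall comparison between the base model and the fine-tuned one. A small sketch over the numbers reported in the table above (the macro-average is my own aggregation for illustration, not a metric from the paper):

```python
# Per-benchmark accuracies copied from the evaluation table above.
base = [59.4, 56.3, 66.4, 33.4, 45.9, 70.7, 91.9, 69.3]  # MAmmoTH-Coder-7B
htl = [65.7, 58.3, 75.1, 34.9, 50.8, 74.4, 94.2, 73.1]   # fine-tuned model

# Macro-average across the eight benchmarks.
base_avg = sum(base) / len(base)
htl_avg = sum(htl) / len(htl)
print(f"base: {base_avg:.2f}, HTL: {htl_avg:.2f}, delta: {htl_avg - base_avg:+.2f}")
```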
Prompt Format
To use the HTL prompt format:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
I'd like you to solve this problem in 3 steps:
1. Answer the question in plain language without writing any code.
2. Output one line of *.
3. Write program code based on the solution process in step 1 to solve the problem.

### Instruction:
{query}
Let's write a program.

### Response:
```
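The template above can be filled in programmatically, and since step 2 asks the model to emit a separator line of `*` between the plain-language solution and the program, the code portion of a completion can be recovered by splitting on that line. A minimal sketch (the function names are illustrative, not from the repository):

```python
HTL_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "I'd like you to solve this problem in 3 steps:\n"
    "1. Answer the question in plain language without writing any code.\n"
    "2. Output one line of *.\n"
    "3. Write program code based on the solution process in step 1 "
    "to solve the problem.\n"
    "### Instruction:\n{query}\n"
    "Let's write a program.\n"
    "### Response:"
)


def build_htl_prompt(query: str) -> str:
    """Fill the HTL template with a concrete question."""
    return HTL_TEMPLATE.format(query=query)


def extract_program(generation: str) -> str:
    """Return everything after the first line consisting only of '*' characters."""
    lines = generation.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and set(stripped) == {"*"}:
            return "\n".join(lines[i + 1:]).strip()
    return ""  # no separator found: no code section


prompt = build_htl_prompt("Tom has 3 apples and buys 2 more. How many apples does he have?")
print(prompt)
```

The prompt string can then be passed to the Transformers pipeline or to a serving endpoint shown earlier on this page.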
Citation
If you use the models, data, or code from this project, please cite the original paper:
```
@article{li2024humans,
  title={How Do Humans Write Code? Large Models Do It the Same Way Too},
  author={Li, Long},
  journal={arXiv preprint arXiv:2402.15729},
  year={2024}
}
```