
🤗 Models on Hugging Face  | Blog  | Website  | Get Started  | Llama Cookbook 


Llama Models

Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. A few key aspects:

  1. Open access: Easy access to cutting-edge large language models, fostering collaboration and advancement among developers, researchers, and organizations.
  2. Broad ecosystem: Llama models have been downloaded hundreds of millions of times, thousands of community projects are built on Llama, and platform support is broad, from cloud providers to startups. The world is building with Llama!
  3. Trust & safety: Llama models are part of a comprehensive approach to trust and safety. We release models and tools designed to enable community collaboration and to encourage standardization in the development and use of trust and safety tools for generative AI.

Our mission is to empower individuals and industry through this opportunity while fostering an environment of discovery and ethical AI advancements. The model weights are licensed for researchers and commercial entities, upholding the principles of openness.

Model Releases

The following table summarizes each Llama release:

| Model | Launch date | Model sizes | Context length | Tokenizer | Acceptable use policy | License | Model card |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 2 | 7/18/2023 | 7B, 13B, 70B | 4K | SentencePiece | Use Policy | License | Model Card |
| Llama 3 | 4/18/2024 | 8B, 70B | 8K | TikToken-based | Use Policy | License | Model Card |
| Llama 3.1 | 7/23/2024 | 8B, 70B, 405B | 128K | TikToken-based | Use Policy | License | Model Card |
| Llama 3.2 | 9/25/2024 | 1B, 3B | 128K | TikToken-based | Use Policy | License | Model Card |
| Llama 3.2-Vision | 9/25/2024 | 11B, 90B | 128K | TikToken-based | Use Policy | License | Model Card |
| Llama 3.3 | 12/04/2024 | 70B | 128K | TikToken-based | Use Policy | License | Model Card |
| Llama 4 | 4/5/2025 | Scout-17B-16E, Maverick-17B-128E | 10M, 1M | TikToken-based | Use Policy | License | Model Card |

Download

To download the model weights and tokenizer:

  1. Visit the Meta Llama website.

  2. Read and accept the license.

  3. Once your request is approved, you will receive a signed URL via email.

  4. Install the Llama Models CLI: pip install llama-models. (Start here if you have already received an email.)

  5. Run llama-model list to show the latest available models and determine the model ID you wish to download. NOTE: If you want older versions of models, run llama-model list --show-all to show all the available Llama models.

  6. Run: llama-model download --source meta --model-id CHOSEN_MODEL_ID

  7. When prompted, pass the signed URL from your email to start the download.

Remember that the links expire after 24 hours and a certain number of downloads. You can always re-request a link if you start seeing errors such as 403: Forbidden.
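
Taken together, steps 4-7 look roughly like the following shell session. This is a sketch: the model ID shown is one example that appears on this page, so substitute whichever ID llama-model list reports for the model you want.

pip install llama-models   # step 4: install the CLI
llama-model list           # step 5: pick a model ID from the listing
llama-model download --source meta --model-id Llama-4-Scout-17B-16E-Instruct
# step 7: when prompted, paste the signed URL from the approval email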

CLI Commands Reference

Once installed, the llama-model CLI provides the following commands:

llama-model list              # List available models
llama-model list --show-all   # List all models (including older versions)
llama-model describe -m MODEL_ID     # Show detailed information about a model
llama-model download          # Download models from Meta or Hugging Face
llama-model verify-download   # Verify integrity of downloaded models
llama-model remove -m MODEL_ID       # Remove a downloaded model
llama-model prompt-format -m MODEL_ID  # Show the prompt format for a model

For detailed help on any command, run llama-model COMMAND --help.
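
As a quick sanity check after a download, the commands above can be chained, for example as below. This is a sketch using only commands and flags listed in the reference; exact options may differ per command, so consult each command's --help.

llama-model verify-download                                  # confirm checkpoint integrity
llama-model describe -m Llama-4-Scout-17B-16E-Instruct       # inspect model metadata
llama-model prompt-format -m Llama-4-Scout-17B-16E-Instruct  # see how to format prompts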

Running the models

To run the models, check out the repository and install its dependencies:

# Run this within a suitable Python environment (uv, conda, or virtualenv)
pip install .[torch]

Example scripts are available in the models/{llama3,llama4}/scripts/ sub-directories. Note that the Llama 4 series of models requires at least 4 GPUs to run inference at full (bf16) precision.

#!/bin/bash

NGPUS=4
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-4-Scout-17B-16E-Instruct
PYTHONPATH=$(git rev-parse --show-toplevel) \
  torchrun --nproc_per_node=$NGPUS \
  -m models.llama4.scripts.chat_completion $CHECKPOINT_DIR \
  --world_size $NGPUS

The above script should be used with an Instruct (Chat) model. For a Base model, update the CHECKPOINT_DIR path and use the script models.llama4.scripts.completion, as sketched below.
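
For example, a Base-model run might look like the following. This is a sketch: the checkpoint directory name is an assumption based on the repo names used on this page, so point it at whatever directory your download actually created.

#!/bin/bash

NGPUS=4
# Assumed Base checkpoint directory; adjust to match your local download
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-4-Scout-17B-16E
PYTHONPATH=$(git rev-parse --show-toplevel) \
  torchrun --nproc_per_node=$NGPUS \
  -m models.llama4.scripts.completion $CHECKPOINT_DIR \
  --world_size $NGPUS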

Running inference with FP8 and Int4 Quantization

You can reduce the memory footprint of the models, with minimal loss in accuracy, by running inference with FP8 or Int4 quantization. Use the --quantization-mode flag to specify the quantization mode. There are two modes:

  • fp8_mixed: Mixed precision inference with FP8 for some weights and bfloat16 for activations.
  • int4_mixed: Mixed precision inference with Int4 for some weights and bfloat16 for activations.

Using FP8, running Llama-4-Scout-17B-16E-Instruct requires two GPUs with 80GB of memory each. Using Int4, a single 80GB GPU suffices.

MODE=fp8_mixed  # or int4_mixed
if [ "$MODE" = "fp8_mixed" ]; then
  NGPUS=2
else
  NGPUS=1
fi
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-4-Scout-17B-16E-Instruct
PYTHONPATH=$(git rev-parse --show-toplevel) \
  torchrun --nproc_per_node=$NGPUS \
  -m models.llama4.scripts.chat_completion $CHECKPOINT_DIR \
  --world_size $NGPUS \
  --quantization-mode $MODE

For more flexibility in running inference (including using other providers), please see the Llama Stack toolset.

Access to Hugging Face

We also provide downloads on Hugging Face, in both transformers and native llama4 formats. To download the weights from Hugging Face, please follow these steps:

  • Visit one of the repos, for example meta-llama/Llama-4-Scout-17B-16E.
  • Read and accept the license. Once your request is approved, you'll be granted access to all Llama 3.1 models as well as previous versions. Note that requests can take up to one hour to be processed.
  • To download the original native weights for use with this repo, click on the "Files and versions" tab and download the contents of the original folder. You can also download them from the command line after you pip install huggingface-hub (a Python-API alternative is sketched after the transformers example below):
huggingface-cli download meta-llama/Llama-4-Scout-17B-16E-Instruct-Original --local-dir meta-llama/Llama-4-Scout-17B-16E-Instruct-Original
  • To use with transformers, the following snippet will download and cache the weights:

    # inference.py
    from transformers import AutoTokenizer, Llama4ForConditionalGeneration
    import torch

    model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Build a chat-formatted prompt from a list of messages
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    )

    # Download (and cache) the weights, sharding them across available devices
    model = Llama4ForConditionalGeneration.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.bfloat16
    )

    # Generate, then decode only the newly generated tokens
    outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
    outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1] :])
    print(outputs[0])

Run the script with torchrun, for example:

    torchrun --nnodes=1 --nproc_per_node=8 inference.py
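
As an alternative to the huggingface-cli command above, the same native weights can be fetched from Python via the huggingface_hub library. A minimal sketch, assuming the repo ID from the CLI example; the local directory is your choice:

# download_original.py
from huggingface_hub import snapshot_download

# Fetch every file in the repo into a local directory (resumable, cached)
snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct-Original",
    local_dir="meta-llama/Llama-4-Scout-17B-16E-Instruct-Original",
)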
    

Installation

You can install this repository as a package with pip install llama-models.

Responsible Use

Llama models are a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. To help developers address these risks, we have created the Responsible Use Guide.

Issues

Please report any software bugs or other problems with the models through the issue tracker on the project's GitHub repository.

Questions

Common questions are answered in the project FAQ, which will be updated over time as new questions arise.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_models-0.3.0.tar.gz (6.6 MB, source distribution)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llama_models-0.3.0-py3-none-any.whl (6.6 MB, Python 3 wheel)

File details

Details for the file llama_models-0.3.0.tar.gz.

File metadata

  • Download URL: llama_models-0.3.0.tar.gz
  • Upload date:
  • Size: 6.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for llama_models-0.3.0.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6b30bc025ea69021778bc5b5acba1e4b05b47baf4235be284d26b9ea06911f79 |
| MD5 | ef7995e9c007a5a9937618c312472fbe |
| BLAKE2b-256 | aa63a96295a62a6a299ff9574bb4316a767f83fe7da009de936541ac5191e910 |

See the Python Packaging documentation for more details on using hashes.

File details

Details for the file llama_models-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: llama_models-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for llama_models-0.3.0-py3-none-any.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7f77f78ff13fca09f70d76a376aff6414cd901623fb9d57e69c2f8367a73032f |
| MD5 | 4d95733a2362e1c5301522e3735f26c7 |
| BLAKE2b-256 | 3eda36302a2846e4123dc43cd913228ea3a5efd2a56e2f8942f4cf54114745ed |

See the Python Packaging documentation for more details on using hashes.
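
If you want to enforce these hashes at install time, pip supports hash-checking mode. A minimal sketch using the SHA256 digests listed above:

# Pin the release to its published SHA256 digests (wheel and sdist from above)
cat > requirements.txt <<'EOF'
llama_models==0.3.0 \
    --hash=sha256:7f77f78ff13fca09f70d76a376aff6414cd901623fb9d57e69c2f8367a73032f \
    --hash=sha256:6b30bc025ea69021778bc5b5acba1e4b05b47baf4235be284d26b9ea06911f79
EOF
# Install with hash checking enforced
pip install --require-hashes -r requirements.txt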
