Samples from 'Hands-On Large Language Models', converted to run on macOS.
- All scripts, such as ch01.py, are in the root directory.
- The /note directory contains Q&A on points I found confusing, with answers from Gemini 2.5 Flash/Pro.
- The /result directory contains the output of running each chapter's script.
- macOS (Apple Silicon M series recommended)
- Python 3.12
- uv package manager
- At least 20GB of free disk space (for model storage)
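
The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper names `free_disk_gb` and `check_prerequisites` are hypothetical, and the `arm64` check assumes a native (non-Rosetta) Python build.

```python
import platform
import shutil
import sys

REQUIRED_FREE_GB = 20  # taken from the requirements list


def free_disk_gb(path: str = ".") -> float:
    """Return free space on the volume containing `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def check_prerequisites(path: str = ".") -> list[str]:
    """Collect human-readable warnings; an empty list means every check passed."""
    warnings = []
    if sys.version_info[:2] != (3, 12):
        warnings.append(f"Python 3.12 expected, found {platform.python_version()}")
    if platform.system() != "Darwin":
        warnings.append("Not running on macOS")
    elif platform.machine() != "arm64":
        # A native Python on Apple Silicon reports arm64.
        warnings.append("Apple Silicon recommended; CPU inference will be slower")
    if shutil.which("uv") is None:
        warnings.append("uv not found on PATH")
    if free_disk_gb(path) < REQUIRED_FREE_GB:
        warnings.append(f"Less than {REQUIRED_FREE_GB}GB of free disk space")
    return warnings


if __name__ == "__main__":
    for warning in check_prerequisites():
        print("WARNING:", warning)
```
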

Install uv, then change to the project directory:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    cd /path/to/project

The project uses uv to manage dependencies. Running uv sync installs the packages defined in pyproject.toml:

    uv sync

First-time setup requires downloading the Phi-3 model (~7.6GB):

    uv run python download_model.py

This downloads the model to the ./model/Phi-3-mini-4k-instruct/ directory.
Note: Download may take several minutes depending on your network speed.
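
The contents of download_model.py are not shown here, so the following is only a sketch of how such a script might work. It assumes the model comes from the Hugging Face Hub under the repo id `microsoft/Phi-3-mini-4k-instruct` and uses `huggingface_hub.snapshot_download`; the helper names `is_downloaded` and `download` are hypothetical.

```python
from pathlib import Path

REPO_ID = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hub repo id
MODEL_DIR = Path("./model/Phi-3-mini-4k-instruct")


def is_downloaded(model_dir: Path = MODEL_DIR) -> bool:
    """Heuristic: treat the model as present if its config.json exists."""
    return (model_dir / "config.json").is_file()


def download(model_dir: Path = MODEL_DIR) -> Path:
    """Fetch all files of the model repository into `model_dir`."""
    # Imported lazily so is_downloaded() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=str(model_dir))
    return model_dir


if __name__ == "__main__":
    if is_downloaded():
        print(f"Model already present in {MODEL_DIR}")
    else:
        print(f"Model saved to {download()}")
```

Checking for config.json first avoids starting a multi-gigabyte download when the model is already in place.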
After downloading the model, run an example script such as ch01.py:

    uv run python ch01.py

This script will:
- Load the Phi-3 model and tokenizer from local storage
- Create a text generation pipeline
- Generate a funny joke about chickens
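
The three steps above could look roughly like the sketch below. It is not a copy of ch01.py: the function names, the prompt wording, and the generation settings (`max_new_tokens=500`, greedy decoding) are assumptions, and it presumes a transformers version with built-in Phi-3 support.

```python
MODEL_DIR = "./model/Phi-3-mini-4k-instruct"


def build_messages(prompt: str) -> list[dict]:
    """Wrap a prompt in the chat format expected by the pipeline."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, device: str = "mps", max_new_tokens: int = 500) -> str:
    """Load the local model and return the generated reply for `prompt`."""
    # Imported lazily: loading transformers is slow and needs the project deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Step 1: load the model and tokenizer from local storage.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_DIR, device_map=device, torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

    # Step 2: create a text generation pipeline.
    generator = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the reply, not the prompt
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for repeatable output
    )

    # Step 3: generate text from the chat messages.
    output = generator(build_messages(prompt))
    return output[0]["generated_text"]


if __name__ == "__main__":
    print(generate("Create a funny joke about chickens."))
```
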
Edit the messages variable in ch01.py to change the input:

    messages = [
        {"role": "user", "content": "Your custom prompt here"}
    ]

The project is laid out as follows:

    README.md                    # Project documentation
    pyproject.toml               # Project configuration and dependencies
    download_model.py            # Model download script
    ch01.py                      # Example ch01: Basic text generation
    note/                        # Q&A notes
    result/                      # Result of running the script per chapter
    model/                       # Model storage directory (created after first run)
        Phi-3-mini-4k-instruct/
            ...
    .gitignore                   # Git ignore configuration

The script is configured by default to use MPS (Metal Performance Shaders):

    device_map="mps"

If you're using an Intel Mac, change the device configuration in ch01.py to:

    device_map="cpu"

If the download fails, try:
- Check your network connection
- Use a proxy or VPN
- Re-run download_model.py
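
Re-running after a failure can also be automated. The sketch below is a generic retry helper with exponential backoff; it is not part of the project, and the name `retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Any callable that raises on failure can be passed as `action`, for example a function that performs the model download.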
Phi-3-mini requires approximately 8GB RAM. If you encounter memory issues:
- Close other applications
- Consider using a smaller model

If inference is slow:
- Apple Silicon Mac: Ensure you're using device_map="mps"
- Intel Mac: CPU inference will be slower, which is expected
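
Instead of editing the device by hand, the choice between "mps" and "cpu" can be made at runtime. This is a small sketch assuming PyTorch is installed; `pick_device` and `detect_device` are hypothetical helper names, not functions from ch01.py.

```python
def pick_device(mps_available: bool) -> str:
    """Prefer Apple's Metal backend when it is available, else fall back to CPU."""
    return "mps" if mps_available else "cpu"


def detect_device() -> str:
    """Ask PyTorch whether the MPS backend can be used on this machine."""
    import torch  # imported lazily so pick_device() has no dependencies

    return pick_device(torch.backends.mps.is_available())


if __name__ == "__main__":
    print(f'device_map="{detect_device()}"')
```

The returned string can then be passed as the device_map value when loading the model.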
This project is for learning and demonstration purposes only.