Llama 4 Local Setup Guide

Get Meta's strongest open model running on your PC. This guide covers the two major methods: the one-command Ollama and the GUI-friendly LM Studio.


Introduction: Which model should you run?

Llama 4 comes in several sizes, but only the first two below are realistically manageable on consumer-grade PCs; the 405B is listed for reference. First, decide on your target based on your hardware.

Model               | VRAM Required (4-bit) | Recommended GPU / Environment
Llama 4 12B (Scout) | 8GB-10GB              | RTX 3060 / 4060 Ti (12GB/16GB) recommended. *Can barely run on 8GB VRAM, but with no headroom.
Llama 4 70B         | 24GB-48GB             | 1x RTX 3090 / 4090 (24GB, 4-bit GGUF) or Mac Studio (M2/M3 Max, 64GB+)
Llama 4 405B        | 250GB+                | Not runnable on consumer PCs (requires 4-8x H100)
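The VRAM figures above follow a simple rule of thumb: at 4-bit quantization each parameter takes roughly half a byte of weights, plus extra room for the KV cache and activations. A minimal sketch of that arithmetic (the 1.2x overhead factor is an assumption for illustration, not an official figure):

```python
def estimate_vram_gb(params_billions: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at `bits` per parameter, scaled by a
    fudge factor for KV cache and activations (the factor is an assumption)."""
    weight_gb = params_billions * bits / 8  # e.g. 12B at 4-bit -> 6 GB of weights
    return weight_gb * overhead

print(round(estimate_vram_gb(12), 1))  # ~7.2 GB, consistent with the 8GB-10GB row
print(round(estimate_vram_gb(70), 1))  # ~42 GB, consistent with the 24GB-48GB row
```

This also shows why 8GB cards can "barely" run the 12B model: the weights fit, but there is little headroom left for context.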

Method A: Ollama (Recommended)

Method A: Fastest Setup with Ollama

While it uses a terminal (the "black screen"), this is actually the simplest method. A Web UI can be added later.

Step 1

Install Ollama

Go to the official site (ollama.com), click "Download for Windows," and run the installer.

Once installed, confirm the 🦙 icon is present in your system tray (bottom right).

Step 2

Run Llama 4

Open PowerShell or Command Prompt and simply enter the following command; the model will be downloaded and started automatically.

12B Model (Mainstream)

ollama run llama4

* If you have a high-spec PC capable of running 70B:

70B Model (High-end)

ollama run llama4:70b

💡 Hint: The initial download will be several gigabytes and may take some time. Once complete, a >>> prompt will appear. Try talking to it!

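Beyond the interactive prompt, Ollama also serves a local HTTP API (port 11434 by default), so you can script against the model. A minimal sketch using only the Python standard library; the model tag must match whatever you pulled in Step 2:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama4") -> bytes:
    # stream=False asks Ollama to return the full answer as one JSON object
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(prompt: str, model: str = "llama4") -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires the Ollama app from Step 1 to be running:
# print(ask("Why is the sky blue?"))
```

The same endpoint works from any language that can POST JSON, which is handy if you later add a Web UI on top.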
Method B: LM Studio (GUI)

Method B: Visual Interaction with LM Studio

Recommended for those who prefer not to use command lines or want to fine-tune GPU settings visually.

Step 1

Install LM Studio

Go to the official site (lmstudio.ai), click "Download LM Studio for Windows," and install it.

Step 2

Search and Download Models

Click the magnifying glass icon (Search) on the left and enter llama 4.

From the search results, pick a model with good "Compatibility" (marked in green) using the filters on the left.

  • The Q4_K_M quantization format is recommended for its good balance.
  • Click the download button, and progress will be shown at the bottom of the screen.
Step 3

Start Chatting

Click the speech bubble icon (AI Chat) on the left, and select the Llama 4 model you just downloaded from the dropdown in the top center.

You're ready to chat. Maxing out GPU Offload in the right-hand settings panel will fully utilize your GPU for faster responses.
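LM Studio can also expose the loaded model through an OpenAI-compatible local server (started from its server tab; port 1234 by default), so the same chat works from code. A minimal sketch, assuming that server is running with a Llama 4 model loaded:

```python
import json
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(user_msg: str,
                       system_msg: str = "You are a helpful AI assistant.") -> bytes:
    # OpenAI-style chat payload: a system message followed by the user turn
    return json.dumps({
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
    }).encode()

def chat(user_msg: str) -> str:
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=build_chat_request(user_msg),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Requires LM Studio's local server to be running with a model loaded:
# print(chat("Hello!"))
```

Because the API shape matches OpenAI's, existing OpenAI client code can usually be pointed at this local URL unchanged.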


Troubleshooting

Q. It's very slow / crashes

Most likely a lack of VRAM. Try a smaller model (12B instead of 70B), a lighter quantization, or lower the GPU Offload value in LM Studio so part of the model runs in CPU RAM.

Q. The language output is strange

Llama 4 is an English-centric model, but its multilingual capability is strong. If you encounter odd phrasing, try adding something like the following to the system prompt:

You are a helpful AI assistant. Answer in detailed and natural Japanese (or your preferred language).
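With Ollama, you can bake this system prompt into the model permanently via a Modelfile instead of retyping it each session. A minimal sketch (the name my-llama4 is just an example):

```shell
# Write a Modelfile that sets a permanent system prompt
cat > Modelfile <<'EOF'
FROM llama4
SYSTEM """You are a helpful AI assistant. Answer in detailed and natural Japanese (or your preferred language)."""
EOF

# Build a custom model from it, then run it as usual
ollama create my-llama4 -f Modelfile
ollama run my-llama4
```

In LM Studio, the equivalent is pasting the prompt into the System Prompt field of the chat settings panel.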
