Qwen 3 (Qwen 2.5) Local Setup Guide

Bring Alibaba Cloud's "Mathematical Genius" to your PC. Build the ultimate sidekick for coding and STEM tasks.

Introduction: A "Scientific" Model Strong in Math and Code

Qwen 3 (successor to the Qwen 2.5 lineage) delivers performance that rivals or even surpasses GPT-4 in mathematics, programming, and logical reasoning.
With model sizes ranging from small to large, you can pick the version that best fits your GPU.

| Model | VRAM Required (4-bit) | Recommended GPU / Use Case |
|---|---|---|
| Qwen 3 14B | 10–12 GB | RTX 3060/4070 (12 GB) recommended. Balanced type; sufficient for general coding and translation. |
| Qwen 3 32B (Best) | 20–24 GB | RTX 3090/4090 (24 GB) recommended. Highly recommended: matches 70B-class intelligence while running on a single high-end consumer GPU. |
| Qwen 3 72B | 40–48 GB | Mac Studio (64 GB+) or dual GPUs. Overwhelming performance, but harder to set up. |
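The table above maps VRAM to a model size. As a minimal sketch, the same decision logic can be scripted; the helper name `suggest_model` and the 7B fallback tag are my own additions, and the thresholds simply mirror the table:

```shell
#!/bin/sh
# Hypothetical helper: suggest an Ollama model tag from available VRAM in GB.
# Thresholds follow the 4-bit figures in the table above.
suggest_model() {
  vram="$1"
  if [ "$vram" -ge 40 ]; then echo "qwen2.5:72b"
  elif [ "$vram" -ge 20 ]; then echo "qwen2.5:32b"
  elif [ "$vram" -ge 10 ]; then echo "qwen2.5:14b"
  else echo "qwen2.5:7b"    # fallback for cards under 10 GB
  fi
}

suggest_model 24   # RTX 4090/3090 class
suggest_model 12   # RTX 3060/4070 class
```

On a 24 GB card this prints `qwen2.5:32b`, matching the "Best" row of the table.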

Method A: Fastest Setup with Ollama

Set up the environment with a single command. Qwen updates frequently, and Ollama ensures you always pull the latest version.

Step 1

Install Ollama

Download and install the Windows version from the official site (ollama.com).

Step 2

Run Qwen

Open PowerShell and enter the following command corresponding to your target model size.

32B Model (Recommended for RTX 4090/3090 users)

ollama run qwen2.5:32b

* If your GPU memory is 16GB or less:

14B Model (General/Lightweight)

ollama run qwen2.5:14b

πŸ’‘ Note: At the time of writing, the Ollama library lists the latest stable release under the "qwen2.5" tag rather than "qwen3". The commands above fetch the latest stable build.
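Besides the interactive prompt, Ollama serves a local REST API on port 11434 once a model is pulled. A minimal sketch (the prompt text is just an example):

```shell
#!/bin/sh
# Build a JSON request body for Ollama's /api/generate endpoint.
MODEL="qwen2.5:32b"
PROMPT="Write a Python function that reverses a string."
BODY=$(printf '{"model": "%s", "prompt": "%s", "stream": false}' "$MODEL" "$PROMPT")
echo "$BODY"

# Send it (requires the Ollama server to be running locally):
# curl http://localhost:11434/api/generate -d "$BODY"
```

Setting `"stream": false` returns the full response as a single JSON object instead of a token-by-token stream.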
Method B: Fine-Tuning with LM Studio

Useful for adjusting context length or setting a permanent system prompt.

Step 1

Search for Models

Enter qwen 2.5 or qwen 3 in the LM Studio search bar.

Choose models from reliable uploaders like the official Qwen account or Bartowski.

Step 2

Select Quantization Level

Choose based on your VRAM capacity.

  • Q4_K_M (Recommended): Good balance of quality and speed.
  • Q6_K: Improved precision if you have VRAM to spare.
  • IQ3_M: Emergency option if VRAM is very tight.
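As a rough back-of-the-envelope sketch, a quantized GGUF file's size is approximately parameters Γ— bits-per-weight Γ· 8. The bits-per-weight figures below are approximations for llama.cpp quant formats and vary slightly by model, so treat the results as estimates only:

```shell
#!/bin/sh
# Rough GGUF size estimate in GB: billions of params * bits-per-weight / 8.
# Bits-per-weight values are approximate for llama.cpp K-quants.
estimate_gb() {
  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'
}

estimate_gb 32 4.85   # Q4_K_M -> ~19.4 GB
estimate_gb 32 6.56   # Q6_K   -> ~26.2 GB
estimate_gb 32 3.66   # IQ3_M  -> ~14.6 GB
```

This shows why Q4_K_M is the sweet spot for a 24 GB card, while Q6_K of a 32B model already spills past consumer VRAM.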

Troubleshooting

Q. Why does it sometimes respond in Chinese?

Because Qwen's training data contains a large amount of Chinese, it may occasionally begin responding in Chinese even when you prompt it in English.

Solution: Explicitly set your preference in the System Prompt.

You are a highly capable AI assistant. Please always respond in natural English. Conduct your thought process in English as well.
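If you use Ollama, the system prompt above can be baked in permanently via a Modelfile. A minimal sketch; the model name `qwen-en` is just an example I chose:

```shell
#!/bin/sh
# Write a Modelfile that pins the English-only system prompt.
cat > Modelfile <<'EOF'
FROM qwen2.5:32b
SYSTEM """You are a highly capable AI assistant. Please always respond in natural English. Conduct your thought process in English as well."""
EOF

# Register it as a new local model tag (requires Ollama installed):
# ollama create qwen-en -f Modelfile
# ollama run qwen-en
```

After `ollama create`, every session with `qwen-en` starts with that system prompt, so you no longer need to set it by hand.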

Q. How do I maximize coding performance?

Qwen's "Coder" models are particularly excellent. For purely programming purposes, try using the code-specialized version instead of the general model.

ollama run qwen2.5-coder:32b
