Gemmaclaw - Self-Hosting Guide

Gemma 4 Self-Hosting Guide

Find the best Gemma configuration for your hardware. Search by GPU, CPU, or RAM to see what works, how fast, and what quality to expect.

CPU: AMD Ryzen 9 5900X 12-Core Processor

RAM: 121GB

GPU: unspecified (via Ollama host)

Best: gemma4:31b (87%, speed N/A)

Model	Params	Quant	Backend	Score	Speed
gemma4:31b	31.3B	Q4_K_M	ollama	87%	N/A
functiongemma-270m:latest	268.10M	Q4_K_M	ollama	0%	N/A
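The Speed column reads N/A for the Ollama rows. Below is a minimal sketch for measuring decode speed yourself. It assumes a local Ollama server on its default port 11434 and a model tag that has already been pulled. Ollama's non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and tokens per second follows from those two fields. The helper names `tokens_per_second` and `measure` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def measure(model: str, prompt: str = "Explain KV caching in two sentences.") -> float:
    # stream=False returns a single JSON object that includes the timing fields.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

For example, `measure("gemma4:31b")` returns the generation speed of one request. Run it a few times and discard the first result, because that call may include model load time in the overall request. Model load time is not included in `eval_duration`.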

CPU: AMD Ryzen 9 5900X 12-Core Processor

RAM: 121GB

GPU: NVIDIA GeForce RTX 3090

Best: gemma4:e4b (14%, speed N/A)

Model	Params	Quant	Backend	Score	Speed
gemma4:e4b	8.0B	Q4_K_M	ollama	14%	N/A
gemma-4-12B-it-Q4_K_M	12B	Q4_K_M	llama-cpp (b9496)	10%	77 tok/s
gemma-4-12B-it-Q4_K_M	12B	Q4_K_M	llama-cpp (b9496)	4%	77 tok/s

CPU: AMD Ryzen 9 5900X 12-Core Processor

RAM: 121GB

GPU: NVIDIA GeForce RTX 3090 (~8.4 GB VRAM used)

Best: gemma-4-12B-it-Q4_K_M (18% at 71 tok/s)

Model	Params	Quant	Backend	Score	Speed
gemma-4-12B-it-Q4_K_M	12B	Q4_K_M	llama-cpp (b9496)	18%	71 tok/s
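The ~8.4 GB of VRAM reported for this 12B Q4_K_M run is close to what a back-of-the-envelope estimate predicts. The sketch below assumes Q4_K_M averages about 4.8 bits per weight, which is an approximation because real GGUF files vary by architecture. The weights take roughly params × bits / 8 bytes. What remains of the reported figure goes to the KV cache and runtime buffers.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Assumed average for Q4_K_M; not an exact figure for any particular file.
Q4_K_M_BITS = 4.8

weights = weight_gb(12e9, Q4_K_M_BITS)  # 12B-parameter model
overhead = 8.4 - weights                # reported VRAM use minus weights
print(f"weights ~ {weights:.1f} GB, KV cache + buffers ~ {overhead:.1f} GB")
```

The overhead grows with context length, so a longer context window will push VRAM use above this figure.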

CPU: AMD Ryzen 9 5900X 12-Core Processor

RAM: 121GB

GPU: CPU only

Best: gemma-4-26B-A4B-it-Q4_K_M (73%, speed N/A)

Model	Params	Quant	Backend	Score	Speed
gemma-4-26B-A4B-it-Q4_K_M	26B	Q4_K_M	llama-cpp	73%	N/A

CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (8 cores)

RAM: 31.9 GB

GPU: CPU only

Best: gemma3:1b (94%, speed N/A)

Model	Params	Quant	Backend	Score	Speed
gemma3:1b	not reported	not reported	ollama	94%	N/A
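In every card above, the Best line names the highest-scoring row. This holds even when a lower-scoring row has a measured speed. For example, the first RTX 3090 card picks gemma4:e4b at 14% over the 77 tok/s llama-cpp runs. The sketch below expresses that rule. It is inferred from the cards rather than taken from the site's code, and the `Result` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    model: str
    backend: str
    score: int              # quality score in percent
    speed: Optional[float]  # tok/s, None when not reported


def best(results: list[Result]) -> Result:
    # Rank by score alone; speed does not influence the pick.
    return max(results, key=lambda r: r.score)


rtx3090_card = [
    Result("gemma4:e4b", "ollama", 14, None),
    Result("gemma-4-12B-it-Q4_K_M", "llama-cpp (b9496)", 10, 77.0),
    Result("gemma-4-12B-it-Q4_K_M", "llama-cpp (b9496)", 4, 77.0),
]
print(best(rtx3090_card).model)  # gemma4:e4b
```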

Backend	Best For	GPU Support	Notes
Ollama	Most users, GPU setups	CUDA, Metal, ROCm	Easiest setup, automatic model management
llama.cpp	Flexible quantization	CUDA, Metal, Vulkan	More quant options, manual model files
gemma.cpp	CPU-first setups	CPU only (for now)	Google-native, Gemma 2/3 only currently

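High-end GPU (24+ GB VRAM, e.g. RTX 3090/4090, A100): Gemma 4 31B Dense or 26B MoE with 4-bit quantization such as Q4_K_M. Full 16-bit precision needs roughly 63 GB for the 31B weights alone (31.3B parameters × 2 bytes).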
Mid-range GPU (8-16 GB VRAM): Gemma 4 E4B, or Gemma 4 26B MoE with 4-bit quantization. At Q4 the MoE weights alone are roughly 16 GB, so expect to offload some layers to the CPU.
Apple Silicon (32+ GB unified memory): Gemma 4 26B MoE via Ollama (Metal). Machines with 48+ GB can try 31B Dense.
CPU only (16+ GB RAM): Gemma 4 E4B or Gemma 3 4B via Ollama. Viable for interactive use, though generation speed is bounded by memory bandwidth and will be well below GPU figures.
CPU only (8-16 GB RAM): Gemma 3 4B or Gemma 2 via gemma.cpp. Smaller but functional.
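On CPU, token generation is usually limited by memory bandwidth. Each new token reads roughly every active weight once, so bandwidth divided by the bytes read per token gives an upper bound on decode speed. The sketch below rests on several assumptions:

- 51.2 GB/s is the theoretical peak of dual-channel DDR4-3200, and the actual host may differ.
- Q4_K_M is taken as about 4.8 bits per weight.
- The 26B-A4B MoE model is treated as reading only its ~4B active parameters per token.

Real throughput lands below these bounds.

```python
def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at the given average bit width."""
    return params * bits_per_weight / 8


def max_decode_tok_s(bandwidth_gb_s: float, active_params: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s, assuming every active weight is read once per token."""
    return bandwidth_gb_s * 1e9 / weight_bytes(active_params, bits_per_weight)


DDR4_3200_DUAL_CHANNEL = 51.2  # GB/s theoretical peak; assumption about the host's memory
Q4_K_M_BITS = 4.8              # approximate average bits per weight for Q4_K_M

models = [
    ("4B dense", 4.0e9),
    ("26B-A4B MoE (~4B active per token)", 4.0e9),
    ("31B dense", 31.3e9),
]
for name, active_params in models:
    bound = max_decode_tok_s(DDR4_3200_DUAL_CHANNEL, active_params, Q4_K_M_BITS)
    print(f"{name}: at most ~{bound:.0f} tok/s")
```

Under these assumptions, even a 4B model tops out around 20 tok/s. The MoE model's small active set is why it can be usable on CPU despite its 26B total size, as long as all of its parameters fit in RAM.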