A colleague asked which models I’ve tested for local GenAI coding, and which might fit on an MBP configured with enough RAM. TL;DR:
These are the models I’ve downloaded to the DGX Spark for testing:
codellama:7b 8fdf8f752f6e 3.8 GB # Decent codegen, mixed bag planning; bigger is better
codellama:13b 9f438cb9cd58 7.4 GB # ^
codellama:34b 685be00e1532 19 GB # ^
devstral:24b 9bd74193e939 14 GB # Haven't tried yet
devstral-2:123b 524a6607f0f5 74 GB # ^
devstral-small-2:24b 24277f07f62d 15 GB # ^
glm-4.7-flash:bf16 69c2c86b80aa 59 GB # Recommended by Opencode, never got much out of it
gpt-oss:20b 17052f91a42e 13 GB # Mixed bag; decent for planning, wasn't great at coding
gpt-oss:120b a951a23b46a1 65 GB # ^
qwen3-coder:30b 06c1097efce0 18 GB # Second best I've run locally; was using this before "next"
qwen3-coder-next:q4_K_M ca06e9e4087c 51 GB # Best I've run locally so far w Claude Code and Opencode
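For reference, the names, digests, and sizes above are `ollama list` output. A model's supported context length (relevant to the context-window note below) can be checked with `ollama show`; a sketch, assuming ollama is installed and the model already pulled:

```shell
# Print model details: architecture, parameter count, quantization,
# and the context length the model supports.
ollama show qwen3-coder:30b
```

Note that the context length reported there is what the model supports, not what ollama actually allocates per request; the latter is governed by the server setting discussed next.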
Also, be aware that ollama ships with a small default context window (a global setting), which is too small to do anything meaningful in a codegen scenario. Sample overrides below:
$ cat /etc/systemd/system/ollama.service.d/custom.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Environment="OLLAMA_CONTEXT_LENGTH=202752" # glm-4.7-flash:bf16
Environment="OLLAMA_CONTEXT_LENGTH=262144" # qwen3-coder-next:latest
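Since these are systemd drop-in overrides, they only take effect after systemd reloads its unit files and the service restarts. A sketch of the steps, assuming the standard Linux/systemd ollama install:

```shell
# Reload unit files so systemd picks up the drop-in, then restart ollama.
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Confirm the environment overrides were applied to the unit.
systemctl show ollama --property=Environment
```

Alternatively, the context window can be overridden per request via the API's `options.num_ctx`, if your client exposes it, rather than globally for the whole server.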