GenAI coding assistant updates

A colleague asked which models I've tested for local GenAI coding, and what might fit on an MBP configured with enough RAM. TL;DR:

These are the models I’ve downloaded to the DGX Spark for testing:

codellama:7b               8fdf8f752f6e    3.8 GB    # Decent codegen, mixed bag planning; bigger is better
codellama:34b              685be00e1532    19 GB     # ^
codellama:13b              9f438cb9cd58    7.4 GB    # ^
devstral:24b               9bd74193e939    14 GB     # Haven't tried yet
devstral-2:123b            524a6607f0f5    74 GB     # ^
devstral-small-2:24b       24277f07f62d    15 GB     # ^
glm-4.7-flash:bf16         69c2c86b80aa    59 GB     # Recommended by Opencode, never got much out of it
gpt-oss:20b                17052f91a42e    13 GB     # Mixed bag; decent for planning, wasn't great at coding
gpt-oss:120b               a951a23b46a1    65 GB     # ^
qwen3-coder:30b            06c1097efce0    18 GB     # Second best I've run locally; was using this before "next"
qwen3-coder-next:q4_K_M    ca06e9e4087c    51 GB     # Best I've run locally so far w Claude Code and Opencode
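For a quick smoke test of any of these once pulled, you can hit the ollama HTTP API directly (this assumes the default port 11434; the model name and prompt here are just examples):

```shell
# One-shot, non-streaming generation request against a locally pulled model
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b",
  "prompt": "Write a hello world in Python",
  "stream": false
}'
```

Handy for confirming a model actually loads and responds before wiring it into Claude Code or Opencode.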

Also, be aware that ollama ships with a small default context window (a global setting), which is too small to do anything meaningful in a codegen scenario. Sample overrides below:

$ cat /etc/systemd/system/ollama.service.d/custom.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Environment="OLLAMA_CONTEXT_LENGTH=202752"  # glm-4.7-flash:bf16
Environment="OLLAMA_CONTEXT_LENGTH=262144"  # qwen3-coder-next:latest
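After editing the drop-in, systemd needs to re-read unit files and the service needs a restart before the new environment takes effect; a minimal sketch:

```shell
# Re-read unit files so the drop-in is picked up, then restart the server
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Confirm the environment the service is actually running with
systemctl show ollama --property=Environment
```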