I’m limited to 24GB of VRAM, and I need pretty large context for my use-case (20k+). I tried “Qwen3-14B-GGUF:Q6_K_XL,” but it doesn’t seem to like calling tools more than a couple times, no matter how I prompt it.
I also tried "SuperThoughts-CoT-14B-16k-o1-QwQ-i1-GGUF:Q6_K" and "DeepSeek-R1-Distill-Qwen-14B-GGUF:Q6_K_L," but Ollama (or LangGraph, I'm not sure which) throws an error saying those models don't support tool calling.
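For anyone who wants to rule out the LangGraph layer, here's a minimal sketch of calling Ollama's `/api/chat` endpoint directly with a `tools` array (OpenAI-style function schema, which is what Ollama expects). The `get_weather` tool is a made-up example, not something from this thread; a model whose template actually supports tool calling should come back with a `tool_calls` list in the reply message.

```python
import json
import urllib.request

# Example tool definition (hypothetical) -- Ollama's /api/chat accepts
# OpenAI-style function schemas in a "tools" array.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

def build_payload(model: str, prompt: str) -> dict:
    """Assemble the /api/chat request body with tools attached."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,
    }

def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """POST to a local Ollama server and return the parsed reply.

    If the model supports tool calling, reply["message"] should contain
    a "tool_calls" list; if not, Ollama errors out or the model just
    answers in plain "content".
    """
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (needs a running Ollama server):
# reply = chat("qwen3:14b", "What's the weather in Oslo?")
# print(reply["message"].get("tool_calls"))
```

If the model emits `tool_calls` here but not through LangGraph, the problem is in the framework wiring rather than the model or Ollama.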
Devstral was released recently, trained specifically with tool calling in mind. I haven't personally tried it yet, but people say it works well with VS Code + Roo.
Hmm, Devstral doesn't call any tools for me on either the current stable Ollama version or the current release candidate. Wonder if it's a bug in Ollama or LangChain. I've since tried "QwQ-32B-GGUF:Q3_K_XL", and it's a little better than Qwen3-14B:Q6, but still not quite satisfactory, and it's much slower and "thinks" too much.