I’m limited to 24GB of VRAM, and I need pretty large context for my use-case (20k+). I tried “Qwen3-14B-GGUF:Q6_K_XL,” but it doesn’t seem to like calling tools more than a couple times, no matter how I prompt it.

Tried using “SuperThoughts-CoT-14B-16k-o1-QwQ-i1-GGUF:Q6_K” and “DeepSeek-R1-Distill-Qwen-14B-GGUF:Q6_K_L,” but Ollama or LangGraph gives me an error saying these don’t support tool calling.
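For context, the "doesn't support tool calling" error happens at request time: tool definitions are sent alongside the chat messages, and Ollama rejects them for models whose chat template has no tool support. Below is a minimal sketch of what such a request body looks like, assuming Ollama's OpenAI-style `tools` schema on its `/api/chat` endpoint; the tool name, schema, and model tag are placeholders of my own, not from any of the models mentioned here:

```python
import json

# Hypothetical example tool -- the name and schema are illustrative only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body for POST http://localhost:11434/api/chat (Ollama's chat API).
# Sending this to a model without tool support in its template is what
# triggers the "does not support tools" style error.
payload = {
    "model": "qwen3:14b",  # substitute whatever model tag you pulled
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [weather_tool],
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

Frameworks like LangChain/LangGraph build this payload for you via `bind_tools`, so the same capability check (and the same error) applies underneath.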

  • SmokeyDope@lemmy.worldM · 9 days ago

    Devstral was released recently, trained specifically with tool calling in mind. I haven’t personally tried it out yet, but people say it works well with VSCode + Roo.

    • 10001110101@lemm.eeOP · 9 days ago

      Hmm, Devstral doesn’t call any tools for me on the current stable Ollama version or the current release candidate. I wonder if it’s a bug in Ollama or LangChain. I’ve since tried “QwQ-32B-GGUF:Q3_K_XL”, and it’s a little better than Qwen3-14B:Q6, but still not quite satisfactory, and it’s much slower and “thinks” too much.