Tl;dr

I have no idea what I’m doing, and the desire for a NAS and local LLM has spun me down a rabbit hole. Pls send help.

Failed Attempt at a Tl;dr

Sorry for the long post! I’m brand new to home servers, but am thinking about building out the setup below (Machine 1 to be on 24/7, Machine 2 to be spun up only when needed for energy efficiency), with a target budget cap of ~USD 4,000. I’d appreciate any tips, suggestions, pitfalls, or flags for where I’m being a total idiot and have missed something basic:

Machine 1: TrueNAS Scale with Jellyfin, Syncthing/Nextcloud + Immich, Collabora Office, SearXNG if possible, and potentially the *arr apps

On the drive front, I’m considering 6x Seagate IronWolf 8TB in RAIDz2 for 32TB usable space (waaay more than I think I’ll need, but I know it’s a PITA to upgrade a vdev, so I’m trying to future-proof), and I’m thinking I also want to add an L2ARC cache (which I think should be something like a 500GB-1TB M.2 NVMe SSD). I’d read somewhere that the back-of-the-envelope RAM requirement was 1GB of RAM per 1TB of storage (the TrueNAS Scale hardware guide definitely does not say this, but with the L2ARC cache and all of the other things I’m trying to run I probably get to the same number), so I’d be looking at around 48GB; though I’m under the impression that an odd number of DIMMs isn’t great for performance, so that might bump up to 64GB across 4x16GB? I’m ambivalent on DDR4 vs. DDR5 (and unless there’s a good reason not to, I’d be inclined to just use DDR4 for cost), but I’m leaning toward ECC, even though it may not be strictly necessary.
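To show my work, here’s the back-of-the-envelope arithmetic I’m running on. This is just a sketch based on the rules of thumb above; the ~80 bytes/record L2ARC header figure and the default 128KiB recordsize are my own assumptions, so please correct me if I’ve botched it:

```python
# Rough sanity-check math for the proposed pool (rules of thumb only,
# not a substitute for the TrueNAS hardware guide).

drives = 6
drive_tb = 8
parity = 2  # RAIDz2 survives any two drive failures

usable_tb = (drives - parity) * drive_tb
print(f"Usable capacity (before ZFS overhead): {usable_tb} TB")  # 32 TB

# The oft-quoted "1 GB of RAM per 1 TB of raw storage" rule of thumb:
ram_rule_gb = drives * drive_tb
print(f"RAM by the 1GB/TB rule: {ram_rule_gb} GB")  # 48 GB

# L2ARC isn't free: each cached record keeps a small header (~80 bytes,
# as I understand it) in ARC, i.e. in RAM. Assuming the default 128 KiB
# recordsize, a 1 TB L2ARC would cost roughly:
l2arc_bytes = 1e12
recordsize_bytes = 128 * 1024
header_bytes = 80
arc_overhead_gb = l2arc_bytes / recordsize_bytes * header_bytes / 1e9
print(f"ARC overhead for a 1 TB L2ARC: ~{arc_overhead_gb:.1f} GB")  # ~0.6 GB
# (Much bigger with small recordsizes, which is one reason people warn
# against oversizing L2ARC relative to RAM.)
```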

Machine 2: Proxmox with LXC for Llama 3.3, Stable Diffusion, Whisper, OpenWebUI; I’d also like to be able to host a heavily modded Minecraft server (something like All The Mods 9 for 4 to 5 players) likely using Pterodactyl

I am struggling with what to do about GPUs here. I’d love to be able to run the 70b Llama 3.3; it seems like that will require something like 40-50GB of VRAM at a minimum to run comfortably, but I’m not sure of the best way to get there. I’ve seen some folks suggest 2x3090s as the right balance of value and performance, but plenty of other folks seem to advocate for sticking with the newer 40-series architecture (especially with the 50 series around the corner and the expectation that prices might finally come down); on the other end of the spectrum, I’ve also seen people advocate going back to P40s.
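For what it’s worth, here’s the rough arithmetic that got me to the 40-50GB number. The bits-per-weight and overhead values are my own guesses rather than benchmarks, so treat this as a sketch:

```python
# Rough VRAM estimate for a 70B model (assumed figures, not measurements).

params_billion = 70
bits_per_weight = 4.5  # ~Q4-class quantization, some layers at higher precision

weights_gb = params_billion * bits_per_weight / 8
overhead_gb = 5.0  # KV cache, activations, CUDA context (a guess)

total_gb = weights_gb + overhead_gb
print(f"Weights: ~{weights_gb:.0f} GB, estimated total: ~{total_gb:.0f} GB")
# ~39 GB of weights plus overhead lands in the low-to-mid 40s, which is
# why 2x3090 (2x24GB = 48GB) keeps coming up as the value pick.
```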

Am I overcomplicating this? Making any dumb rookie mistakes? Do two machines seem right for my use cases vs. one (or more than two)? Any glaring issues with the hardware I mentioned, or suggestions for a better setup? Ways to better prioritize energy efficiency (even at the risk of more cost up front)? I was targeting something like USD 4,000 as a soft price cap across both machines; does that seem reasonable? How much of a headache is all of this going to be to manage? Is there a light at the end of the tunnel?

Very grateful for any advice or tips you all have!


Hi all,

So sorry again for the long post. Just including a little extra context here in case it’s useful about what I am trying to do (I feel like this is the annoying part of an online recipe where you get a life story instead of the actual ingredient list; at least I tried to put the list first in this post). Essentially I am a total noob, but I have spent the past several months lurking on forums and old Reddit and Lemmy threads, and have watched many hours of YouTube videos just to wrap my head around some of the basics of home networking, and I still feel like I know basically nothing. But I finally got to the point where I felt I could articulate what I am trying to do with enough specificity to not be completely wasting all of your time (I’m very cognizant of Help Vampires and definitely do not want to be one!).

Basically my motivation is to move away from non-privacy respecting services and bring as much in-house as possible, but (as is frequently the case), my ambition has far outpaced my skill. So I am hopeful that I can tap into all of your collective knowledge to make sure I can avoid any catastrophic mistakes I am likely to blithely walk myself into.

Here are the basic things I am trying to accomplish with this setup:

• A NAS with a built in media server and associated apps
• Phone backups (including photos) 
• Collaborative document editing
• A local ChatGPT 4 replacement 
• Locally hosted metasearch
• A place to run a modded Minecraft server for myself and a few friends

The list in the tl;dr represents my best guesses at the right software and (partial) hardware to get all of these done. Based on some of my reading, it seemed that a number of folks recommend running TrueNAS bare metal, as opposed to inside Proxmox, for when the inevitable stability issue hits, and that got me thinking about how it might be valuable to split these functions across two machines: one to handle heavier workloads when needed but be turned off when not (e.g. game server, all local AI), and a second machine to function as a NAS with all the associated apps that would hopefully be more power efficient and run 24/7.

There are three things that I think would be very helpful to me at this point:

  1. High level feedback on whether this strategy sounds right given what I am trying to accomplish. I feel like I am breaking the fundamental Keep It Simple Stupid rule and will likely come to regret it.
  2. Any specific feedback on the right hardware for this setup.
  3. Any thoughts about how to best select hardware to maximize energy efficiency/minimize ongoing costs while still accomplishing these goals.

Also, above I mentioned that I am targeting around USD 4,000, but I am willing to be flexible on that if spending more up front will help keep ongoing costs down, or if spending a bit more will lead to markedly better performance.

Ultimately, I feel like I just need to get my hands on something and start screwing things up to learn, but I’d love to avoid any major, costly screw-ups before I just start ordering parts; hence this post as a reality check before I do just that.

Thanks so much if you read this far down the post, and to all of you who share any thoughts you might have. I don’t really have folks IRL I can talk to about these sorts of things, so I am extremely grateful to be able to reach out to this community.

Edit: Just wanted to say a huge thank you to everyone who shared their thoughts! I posted this fully expecting to get no responses and figured it was still worth doing just to write out my plan as it stood. I am so grateful for all of your thoughtful and generous responses sharing your experience and advice. I have to hop offline now, but look forward to responding tomorrow to any comments I haven’t had a chance to get to. Thanks again! :)

  • Estebiu@lemmy.dbzer0.com · 2 days ago

    For Llama 70B I’m using an RTX A6000; slightly older, but it does the job magnificently with its 48GB of VRAM.

    • libretech@reddthat.com (OP) · 2 days ago

      Wow, that sounds amazing! I think that GPU alone would probably exceed my budget for the whole build lol. Thanks for sharing!

      • Estebiu@lemmy.dbzer0.com · 3 hours ago

        You can still run smaller models on cheaper GPUs; no need for the greatest GPU ever. Btw, I use it for other things too, not only LLMs.

    • sntx@lemm.ee · 2 days ago

      I’m also on p2p 2x3090 with 48GB of VRAM. Honestly it’s a nice experience, but still somewhat limiting…

      I’m currently running deepseek-r1-distill-llama-70b-awq with the aphrodite engine, though the same applies to llama-3.3-70b. It works great and is way faster than ollama, for example. But my max context is around 22k tokens. More VRAM would allow me more context, and even more VRAM would allow for speculative decoding, CUDA graphs, …
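      If anyone’s curious where that ~22k ceiling comes from, the KV-cache math works out roughly like this (approximate architecture numbers for a llama-70b-class model, so take it as a sketch):

      ```python
      # Approximate KV-cache cost per token for a llama-70b-class model:
      # 80 layers, GQA with 8 KV heads, head_dim 128, fp16 cache.

      layers, kv_heads, head_dim = 80, 8, 128
      bytes_per_value = 2  # fp16

      # The factor of 2 covers both the K and the V tensors.
      kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
      print(f"KV cache per token: ~{kv_per_token / 1024:.0f} KiB")  # ~320 KiB

      context = 22_000
      print(f"{context} tokens: ~{context * kv_per_token / 1024**3:.1f} GiB")  # ~6.7 GiB
      # With ~35-40 GB of 4-bit weights on 48 GB total, that ~7 GiB of KV
      # cache is about all the headroom that's left.
      ```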

      Maybe I’ll drop down to a 35b model to get more context and a bit of speed, but I don’t think I can justify the possible decrease in answer quality.

      • Estebiu@lemmy.dbzer0.com · 2 hours ago

        Uhh, a lot of big words here. I mostly just play around with it… I’ve never used LLMs for anything more serious than a couple of tests, so I don’t even know how many tokens my setup can generate…

      • libretech@reddthat.com (OP) · 2 days ago

        This is exactly the sort of tradeoff I was wondering about, thank you so much for mentioning this. I think ultimately I would align with you in prioritizing answer quality over context length (but it sure would be nice to have both!!). My plan for now, based on some of the other comments, is to go ahead with the NAS build and keep my eyes peeled for any GPU deals in the meantime (though honestly I am not holding my breath). Once I’ve proved to myself I can run something stable without burning the house down, I’ll move on to something more powerful for the local LLM. Thanks again for sharing!

    • bradd@lemmy.world · 2 days ago

      I’m running 70b on two used 3090s with an A6000 NVLink bridge. I think I got the cards for $900 each, and maybe $200 for the NVLink. Also works great.

      • libretech@reddthat.com (OP) · 2 days ago

        Thanks for sharing! I’ll probably try to go this route once I get the NAS squared away and turn back to local LLMs. Out of curiosity, are you using the Q4_K_M quantization type?
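        (For reference, my rough understanding is that Q4_K_M averages out to about 4.85 bits per weight, since it mixes 4- and 6-bit blocks; if that’s right, the back-of-the-envelope size is:)

        ```python
        # Rough on-disk/in-VRAM size of a 70B model at Q4_K_M.
        # ~4.85 bits/weight is an approximation, not an exact figure.
        params_billion = 70
        bits_per_weight = 4.85
        print(f"~{params_billion * bits_per_weight / 8:.0f} GB")  # ~42 GB
        ```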