I’ve been looking into self-hosting LLMs or stable diffusion models using something like LocalAI and / or Ollama and LibreChat.
Some questions to get a nice discussion going:
- Any of you have experience with this?
- What are your motivations?
- What are you using in terms of hardware?
- Considerations regarding energy efficiency and associated costs?
- What about renting a GPU? Privacy implications?
Picked up an AMD instinct mi25 to try and do just that. Can get easy-diffusion working after some cussing and voodoo. Cannot get rocm to do ANY llm of any kind, feels like a waste of video ram
Also have a tesla p4 that runs most text-to-image models rather well, but have been unsuccessful at any llm either, even oobabooga can’t seem to run on it.
Have given up because the software stack keeps advancing and leaving my hardware behind. I don’t have $3000 for an a100 or $1300 for an mi100 sooo… until the models can run on older/less powerful hardware, I’m probably sitting out of this game. Even though I’d love to be elbow deep in this one.
Sounds like a rather frustrating journey for you.
It has been. I started in this because I liked picking up kick ass enterprise hardware really cheap and playing around with what it can do. Used enterprise hardware is so damn expensive now, it’s cheaper and easier to do everything with consumer products and use the rx6700 in my gaming rig. Just don’t want that running llms and always on.