cross-posted from: https://lemmy.dbzer0.com/post/36841328

Hello, everyone! I wanted to share my experience of successfully running LLaMA on an Android device. The model that performed the best for me was llama3.2:1b on a mid-range phone with around 8 GB of RAM. I was also able to get it up and running on a lower-end phone with 4 GB RAM. However, I also tested several other models that worked quite well, including qwen2.5:0.5b , qwen2.5:1.5b , qwen2.5:3b , smallthinker , tinyllama , deepseek-r1:1.5b , and gemma2:2b. I hope this helps anyone looking to experiment with these models on mobile devices!


Step 1: Install Termux

  1. Download and install Termux from the Google Play Store or F-Droid

Step 2: Set Up proot-distro and Install Debian

  1. Open Termux and update the package list:

    pkg update && pkg upgrade
    
  2. Install proot-distro

    pkg install proot-distro
    
  3. Install Debian using proot-distro:

    proot-distro install debian
    
  4. Log in to the Debian environment:

    proot-distro login debian
    

    You will need to log-in every time you want to run Ollama. You will need to repeat this step and all the steps below every time you want to run a model (excluding step 3 and the first half of step 4).


Step 3: Install Dependencies

  1. Update the package list in Debian:

    apt update && apt upgrade
    
  2. Install curl:

    apt install curl
    

Step 4: Install Ollama

  1. Run the following command to download and install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start the Ollama server:

    ollama serve &
    

    After you run this command, do ctrl + c and the server will continue to run in the background.


Step 5: Download and run the Llama3.2:1B Model

  1. Use the following command to download the Llama3.2:1B model:
    ollama run llama3.2:1b
    
    This step fetches and runs the lightweight 1-billion-parameter version of the Llama 3.2 model .

Running LLaMA and other similar models on Android devices is definitely achievable, even with mid-range hardware. The performance varies depending on the model size and your device’s specifications, but with some experimentation, you can find a setup that works well for your needs. I’ll make sure to keep this post updated if there are any new developments or additional tips that could help improve the experience. If you have any questions or suggestions, feel free to share them below!

– llama

    • llama@lemmy.dbzer0.comOP
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      1
      ·
      10 hours ago

      Not true. If you load a model that is below your phone’s hardware capabilities it simply won’t open. Stop spreading fud.

      • projectmoon@forum.agnos.is
        link
        fedilink
        arrow-up
        1
        ·
        5 hours ago

        @llama@lemmy.dbzer0.com Depends on the inference engine. Some of them will try to load the model until it blows up and runs out of memory. Which can cause its own problems. But it won’t overheat the phone, no. But if you DO use a model that the phone can run, like any intense computation, it can cause the phone to heat up. Best not run a long inference prompt while the phone is in your pocket, I think.

        • llama@lemmy.dbzer0.comOP
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          5 hours ago

          Thanks for your comment. That for sure is something to look out for. It is really important to know what you’re running and what possible limitations there could be. Not what the original comment said, though.

      • kekmacska@lemmy.zip
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        3
        ·
        5 hours ago

        that’s not how it works. Your phone can easily overheat if you use it too much, even if your device can handle it. Smartphones don’t have cooling like pcs and laptops (except some rog phone and stuff). If you don’t want to fry your processor, only run LLMs on high-end gaming pcs with All in one water cooling

        • llama@lemmy.dbzer0.comOP
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          4 hours ago

          This is all very nuanced and there isn’t a clear cut answer. It really depends on what you’re running, for how long you’re running, your device specs, etc. The LLMs I mentioned in the post did just fine and did not cause any overheating if not used for extended periods of time. You absolutely can run a SMALL LLM and not fry your processor if you don’t overdo it.

          Of course that is something to be mindful of, but that’s not what the person in the original comment said. It does run, but you need to be aware of the limitations and potential consequences. That goes without saying, though.

          Don’t overdo it and your phone will be just fine.