cross-posted from: https://lemmy.dbzer0.com/post/36841328

Hello, everyone! I wanted to share my experience of successfully running LLaMA on an Android device. The model that performed best for me was llama3.2:1b on a mid-range phone with around 8 GB of RAM, and I was also able to get it running on a lower-end phone with 4 GB of RAM. I also tested several other models that worked quite well, including qwen2.5:0.5b, qwen2.5:1.5b, qwen2.5:3b, smallthinker, tinyllama, deepseek-r1:1.5b, and gemma2:2b. I hope this helps anyone looking to experiment with these models on mobile devices!


Step 1: Install Termux

  1. Download and install Termux from the Google Play Store or F-Droid.

Step 2: Set Up proot-distro and Install Debian

  1. Open Termux, then update the package list and upgrade the installed packages:

    pkg update && pkg upgrade
    
  2. Install proot-distro:

    pkg install proot-distro
    
  3. Install Debian using proot-distro:

    proot-distro install debian
    
  4. Log in to the Debian environment:

    proot-distro login debian
    

    You will need to log in again every time you want to run Ollama. Each time you want to run a model, repeat this step and the steps below, except Step 3 and item 1 of Step 4 (installing Ollama), which only need to be done once.
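To cut down on typing, that per-session routine (log in, start the server, run a model) can be wrapped in one function that you call from Termux. This is a minimal sketch under stated assumptions, not part of the original guide: it assumes Steps 3 and 4 have already been done once, relies on `proot-distro login debian -- <command>` to run a command inside Debian, and waits by retrying `ollama list`, which fails until the server answers. The name `run_llama` and the log path `~/ollama.log` (inside Debian) are hypothetical.

```shell
# Hypothetical Termux-side helper (not part of Termux, proot-distro or Ollama).
# Usage: run_llama [model]   (defaults to llama3.2:1b)
run_llama() {
  model="${1:-llama3.2:1b}"
  # Everything after "--" runs inside the Debian environment.
  # Keep the quoted script free of single quotes.
  proot-distro login debian -- sh -c '
    # Start the server in the background; its log goes to a file.
    ollama serve > "$HOME/ollama.log" 2>&1 &
    # Wait up to 30 seconds: ollama list fails until the server answers.
    n=0
    until ollama list > /dev/null 2>&1; do
      n=$((n + 1))
      if [ "$n" -ge 30 ]; then
        echo "Ollama server did not start; see ~/ollama.log" >&2
        exit 1
      fi
      sleep 1
    done
    # Replace this shell with the interactive chat for the chosen model.
    exec ollama run "$1"
  ' sh "$model"
}
```

Paste the function into `~/.bashrc` in Termux (not inside Debian), open a new session, and run `run_llama`, or `run_llama qwen2.5:1.5b` for another model.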


Step 3: Install Dependencies

  1. Update the package list and upgrade the installed packages in Debian:

    apt update && apt upgrade
    
  2. Install curl:

    apt install curl
    

Step 4: Install Ollama

  1. Run the following command to download and install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start the Ollama server:

    ollama serve &
    

    After you run this command, press Ctrl+C to get your prompt back; because of the trailing &, the server keeps running in the background.
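If the server's log lines clutter the terminal, or you want to be sure the server is up before running a model, you can send its output to a file and poll it. A minimal sketch, assuming Ollama's default address `http://127.0.0.1:11434`, whose root URL responds once the server is ready; the function names `wait_for_ollama` and `start_ollama` and the log path `~/ollama.log` are hypothetical.

```shell
# Hypothetical helpers to run inside Debian (not part of Ollama itself).
# Assumes the server listens on Ollama's default address.
ollama_url="http://127.0.0.1:11434"

# Poll the server once per second for up to $1 seconds (default 30).
# Prints "ready" and returns 0 as soon as it answers; returns 1 on timeout.
wait_for_ollama() {
  _left="${1:-30}"
  while [ "$_left" -gt 0 ]; do
    if curl -fsS "$ollama_url/" > /dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    _left=$((_left - 1))
    sleep 1
  done
  echo "Ollama did not answer at $ollama_url" >&2
  return 1
}

# Start the server in the background with its log written to a file instead
# of the terminal (so no Ctrl+C is needed), then wait until it is reachable.
start_ollama() {
  ollama serve > "$HOME/ollama.log" 2>&1 &
  wait_for_ollama 30
}
```

For example, `start_ollama && ollama run llama3.2:1b`; if something goes wrong, `tail ~/ollama.log` shows the end of the server log.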


Step 5: Download and Run the Llama 3.2 1B Model

  1. Use the following command to download and run the Llama 3.2 1B model:

    ollama run llama3.2:1b
    
    This step fetches and runs the lightweight 1-billion-parameter version of the Llama 3.2 model. The first run downloads the model; later runs reuse the local copy.
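Besides the interactive chat, the running server also answers single prompts over HTTP, which is handy for scripts. A minimal sketch, assuming the server from Step 4 is listening on Ollama's default address `127.0.0.1:11434`; it posts to the `/api/generate` endpoint with `"stream": false` so the reply arrives as one JSON object. The name `ask_ollama` is hypothetical, and its JSON escaping only covers backslashes and double quotes, so keep prompts on one line.

```shell
# Hypothetical helper (not part of Ollama): send one prompt to a local model
# through Ollama's REST API and print the raw JSON reply.
# Usage: ask_ollama MODEL "PROMPT"
ask_ollama() {
  _model="$1"
  # Minimal JSON escaping: backslashes first, then double quotes.
  # Newlines and other control characters are NOT handled.
  _prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -fsS http://127.0.0.1:11434/api/generate \
    -d "{\"model\": \"$_model\", \"prompt\": \"$_prompt\", \"stream\": false}"
}
```

For example, `ask_ollama llama3.2:1b "Why is the sky blue?"` prints a JSON object whose `response` field holds the answer; after `apt install jq`, piping it into `jq -r .response` prints just the text. For a quick one-off without the API, `ollama run llama3.2:1b "Why is the sky blue?"` also works.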

Running LLaMA and other similar models on Android devices is definitely achievable, even with mid-range hardware. The performance varies depending on the model size and your device’s specifications, but with some experimentation, you can find a setup that works well for your needs. I’ll make sure to keep this post updated if there are any new developments or additional tips that could help improve the experience. If you have any questions or suggestions, feel free to share them below!
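To experiment with the other models listed at the top of the post, you can download several in one go and then try each with `ollama run`. A minimal sketch, assuming the Ollama server is already running; `pull_models` is a hypothetical name, while the tags are the ones from the post. Each model takes up storage, so trim the list to what your phone can hold.

```shell
# Hypothetical helper (not part of Ollama): download several models, stopping
# at the first failure so a mistyped tag is easy to spot.
pull_models() {
  for _m in "$@"; do
    echo "Pulling $_m ..."
    ollama pull "$_m" || { echo "Failed to pull $_m" >&2; return 1; }
  done
  # Show what is now stored locally.
  ollama list
}

# Models from the post; trim the list to what fits on your phone's storage.
# pull_models qwen2.5:0.5b qwen2.5:1.5b qwen2.5:3b smallthinker tinyllama deepseek-r1:1.5b gemma2:2b
```

`ollama rm <model>` deletes a model you no longer need.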

– llama

  • Cris16228 · 6 hours ago

    And what’s the purpose of running it locally? Just curious. Is there anything really libre or better about it?

    Is there any difference between LLaMA (or any other libre model) and ChatGPT (the first and most popular one I know)?

    • projectmoon@lemm.ee · 6 hours ago

      Most open/local models require a fraction of the resources of ChatGPT. They are usually not AS good in a general sense, but they are often good enough, and can sometimes surpass ChatGPT in specific domains.

      • Cris16228 · 5 hours ago

        Do you know of anything libre? I’m curious to try something, ideally self-hosted.

        According to a YouTuber, DeepSeek (I think that’s the name, the Chinese open-source one) is better than ChatGPT: he gave both one simple request, to make a Tetris game, and ChatGPT produced a broken game while DeepSeek’s worked.

        Idk why lol

        • projectmoon@lemm.ee · 5 hours ago

          They’re probably referring to the 671B-parameter version of DeepSeek. You can indeed self-host it. But unless you’ve got a server rack full of data center class GPUs, you’ll probably set your house on fire before it generates a single token.
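That claim can be checked with back-of-the-envelope arithmetic: the weights alone need roughly parameters × bits per weight ÷ 8 bytes, before counting the KV cache and runtime overhead. A minimal sketch; the function names are hypothetical, and 80 GB per GPU is an assumed figure for a large data-center card.

```shell
# Back-of-the-envelope memory math (weights only; ignores KV cache/overhead).
# weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT -> gigabytes, one decimal
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# gpus_needed PARAMS_IN_BILLIONS BITS_PER_WEIGHT GB_PER_GPU -> whole GPUs
gpus_needed() {
  awk -v p="$1" -v b="$2" -v g="$3" 'BEGIN {
    n = p * b / 8 / g
    c = int(n)
    if (c < n) c++   # round up: a partial GPU still means one more card
    print c
  }'
}

for bits in 16 8 4; do
  echo "671B at ${bits}-bit: $(weights_gb 671 "$bits") GB, $(gpus_needed 671 "$bits" 80) x 80 GB GPUs"
done
echo "1B at 8-bit: $(weights_gb 1 8) GB"
```

This prints about 1342 GB at 16-bit, 671 GB at 8-bit and 335.5 GB at 4-bit (17, 9 and 5 such GPUs), versus about 1 GB for a 1B model at 8-bit, which is why the small models fit on a phone and the full DeepSeek does not fit on home hardware.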

          If you want a fully open-source model, I recommend Qwen 2.5 or maybe DeepSeek V2. There’s also OLMo 2, but I haven’t really tested it.

          Mistral Small 24B also just came out and is Apache-licensed. That is something I’m testing now.

          • Cris16228 · 59 minutes ago

            > But unless you’ve got a server rack full of data center class GPUs, you’ll probably set your house on fire before it generates a single token.

            It’s cold outside and I don’t want to spend money on heating my house, so I could… try.

            I’ll check them out! Thank you