Are there any free/open-source TTS options out there that are on the same level as Google Cloud’s? I tried a lot of free ones, but they are absolutely awful and still sound like my Amiga did 30 years ago. With LLMs being available as open source, I am hoping there’s also a good TTS offering I just haven’t found yet.

  • @tal
    8 months ago
    Festival – not cutting edge – will definitely be better than your Amiga, and can handle long text. Last time I set it up, IIRC I wanted some voices generated by Tokyo University or something, which took some setting up. It’ll probably be packaged in your Linux distro.

    You can listen to a demo here.

    https://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

    It’s not LLM-based.
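If you want a quick way to hear it once the distro package is installed, the standard Festival install ships a CLI and a `text2wave` helper. A minimal sketch (file names here are placeholders):

```shell
# Speak a sentence directly through the default voice
echo "Hello from Festival." | festival --tts

# Or render a longer text file to a WAV to listen to later
text2wave -o chapter.wav chapter.txt
```

Swapping voices is a matter of installing extra voice packages and selecting them in Festival's config.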

For short snippets, offline, one can use Tortoise TTS – which is LLM-based. But it’s slow and can only generate clips of limited length, so whether it’s reasonable for you will depend a lot on your application. It will let you clone a voice – or at least produce one that sounds more-or-less similar – from some audio samples of the person speaking.

    https://github.com/neonbjb/tortoise-tts

    Examples at:

    https://nonint.com/static/tortoise_v2_examples.html

I haven’t used Google’s, but given that Google is paying people to work on it full time, I’d assume whatever they’ve done probably sounds nicer. But then it’s not open source, so… shrugs

    • @state_electrician@discuss.tchncs.de OP
      8 months ago
      Ah, I looked at Tortoise, but I don’t have an Nvidia GPU, so I couldn’t try it. I did try Festival, and the results were bad – not so much the voice itself as the intonation and pronunciation.

      • @tal
        8 months ago

        Ah, I looked at Tortoise, but I don’t have an Nvidia GPU, so I couldn’t try it.

        I use it on an AMD GPU.

        EDIT: Wait, let me make sure. I was using an Nvidia GPU for a while and switched to AMD.

        EDIT2: Oh, yeah, it uses transformers, and that doesn’t currently work on ROCm, IIRC.