• tal
    5 hours ago

    AI voice synth is pretty solidly useful in comparison to, say, video generation from scratch. I think that there are good uses for voice synth (e.g. filling in for an aging actor/actress who can’t do a voice any more, video game mods, procedurally-generated speech, etc.), but audiobooks don’t really play to those strengths. I’m a little skeptical that in 2025, it’s at the point where it’s a good drop-in replacement for a human audiobook narrator. What I’ve heard still doesn’t have emphasis on par with a human.

    I don’t know what it costs to have a human read an audiobook, but I can’t imagine that it’s that expensive; I doubt that there’s all that much editing involved.

    kagis

    https://www.reddit.com/r/litrpg/comments/1426xav/whats_the_average_narrator_cost/

    > So I produced my own audiobooks for my Nova Roma series so I know the exact numbers for you:
    >
    > $250 per finished hour for the narrator. Books ranged from about 200k words-270k words, which came out to 22 hours, 20 hours, and 25 hours.
    >
    > So books 1-3 cost me $5,500, $5,000, and $6,250. I’m contracted for two more books with my narrator, so I expect to spend another 5k-6k for each of those.
    >
    > So for a five book series, each one 200k+ words, the total cost out of pocket for me will be about $27,000 give or take to make the series into audiobooks.

    That’s actually lower than I expected. Like, if a book sells at any kind of volume, it can’t be that hard to make that back.
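For anyone who wants to check the arithmetic in that quote, here's a quick sketch. The rate, hours, and 5k-6k estimate come from the Reddit post; the `break_even_copies` helper and the per-copy figure passed to it are my own placeholders, not real royalty numbers.

```python
import math

# Figures from the quoted post: $250 per finished hour, books 1-3.
RATE_PER_FINISHED_HOUR = 250
FINISHED_HOURS = [22, 20, 25]

costs = [RATE_PER_FINISHED_HOUR * hours for hours in FINISHED_HOURS]
spent_so_far = sum(costs)

# The author expects another 5k-6k for each of the two remaining books.
REMAINING_BOOKS = 2
low_total = spent_so_far + REMAINING_BOOKS * 5_000
high_total = spent_so_far + REMAINING_BOOKS * 6_000


def break_even_copies(total_cost: float, net_per_copy: float) -> int:
    """Copies needed to recoup total_cost (hypothetical helper).

    net_per_copy is whatever the author actually nets per audiobook sale;
    the post doesn't say, so any value passed in is an assumption.
    """
    return math.ceil(total_cost / net_per_copy)


print(costs)                  # [5500, 5000, 6250]
print(spent_so_far)           # 16750
print(low_total, high_total)  # 26750 28750

# 5.0 is a placeholder, not a real royalty figure.
print(break_even_copies(low_total, 5.0))  # 5350
```

So the "about $27,000" figure sits at the low end of the 26,750 to 28,750 range implied by the post's own numbers.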

    EDIT: I can believe that it’s possible to build a speech synth system that does do better, mind; I certainly don’t think that there are any fundamental limitations on this. I’d guess that there’s also room for human-assisted stuff, where you have some system that annotates the text with emphasis markers, and the annotated text gets fed into a speech synth engine trained to convert annotated text to voice. There, someone listens to the output and just tweaks the annotated text where the annotation system doesn’t get it quite right. But I don’t think that we’re really there yet.
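To make the human-in-the-loop idea above a bit more concrete, here's a rough sketch of the "annotated text" half. The `*word*` / `**word**` marker convention and the `annotated_to_ssml` function are made up for illustration, and I'm swapping in SSML's `<emphasis>` element as the target format rather than a purpose-trained engine; whether a given synth engine accepts SSML, or honors emphasis well, is an assumption.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical marker convention (not any real tool's format):
#   *word*   -> moderate emphasis
#   **word** -> strong emphasis
_MARKER = re.compile(r"(\*\*|\*)(.+?)\1")


def annotated_to_ssml(text: str) -> str:
    """Convert marker-annotated text into an SSML <speak> document."""
    parts = []
    pos = 0
    for match in _MARKER.finditer(text):
        # Plain text before the marker, XML-escaped.
        parts.append(escape(text[pos:match.start()]))
        level = "strong" if match.group(1) == "**" else "moderate"
        parts.append(
            f'<emphasis level="{level}">{escape(match.group(2))}</emphasis>'
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"


# What an automatic annotator might produce (assumed output):
auto = "I never said *that*."
# After a human listens and moves the emphasis:
corrected = "I **never** said that."

print(annotated_to_ssml(auto))
print(annotated_to_ssml(corrected))
```

The point of keeping the annotation as plain text is that the correction pass is just editing a string and re-running synthesis, with no re-recording involved.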