• tal
    7 months ago

    On the other hand, there are things that a human artist is utterly awful at that diffusion-based generative AIs like Stable Diffusion are amazing at. I mentioned that these models are great at producing works in a given style, and can switch styles virtually effortlessly. I’m gonna do a couple of Spiderman renditions in different styles; each takes about ten seconds on my system:

    Spiderman as done by Neal Adams:

    Spiderman as done by Alex Toth:

    Spiderman in a noir style done by Darwyn Cooke:

    Spiderman as done by Roy Lichtenstein:

    Spiderman as painted by early-19th-century English landscape artist J. M. W. Turner:

    And yes, I know, fingers, but I’m not generating a huge batch to try to get an ideal image, just doing a quick run to illustrate the point.

    Note that none of the above were actually Spiderman artists, other than Adams, and that briefly.

    That’s something that’s really hard for a human to do, given how a human works, because for a human, the style is a function of the workflow and a whole collection of techniques used to arrive at the final image. Stable Diffusion doesn’t care about techniques, how the image got the way it is – it only looks at the output of those workflows in its training corpus. So for Stable Diffusion, creating an image in a variety of styles or mediums – even ones that are normally very time-consuming to work in – is easy as pie, whereas for a single human artist, it’d be very difficult.
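
    To make the "switching styles is nearly free" point concrete, here's a minimal sketch of what changes between renditions from the generator's side: basically one string. The helper name, the prompt wording, and the `extra` parameter are all my own illustrative assumptions, not the actual prompts behind the images above, and the call into the image model itself is left out.

```python
# Hypothetical helper: the function name and prompt wording are illustrative
# assumptions, not the prompts actually used for the images above.
SUBJECT = "Spiderman"

STYLES = [
    "Neal Adams",
    "Alex Toth",
    "Darwyn Cooke",
    "Roy Lichtenstein",
    "J. M. W. Turner",
]


def build_prompt(subject: str, artist: str, extra: str = "") -> str:
    """Return a text prompt asking for `subject` in the style of `artist`."""
    prompt = f"{subject}, in the style of {artist}"
    if extra:
        # Optional modifier such as a genre or medium.
        prompt += f", {extra}"
    return prompt


# One prompt per artist; only the style part of the string differs.
prompts = [build_prompt(SUBJECT, artist) for artist in STYLES]

# The Cooke rendition also asks for a noir look.
prompts[2] = build_prompt(SUBJECT, "Darwyn Cooke", extra="noir")

for p in prompts:
    print(p)
```

    Each of those strings would be handed to the model as-is. Nothing about the "workflow" changes between a comic-book look and an oil landscape, which is the asymmetry with a human artist.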

    I think that particular aspect is what gets a lot of artists concerned. Because it’s (relatively) difficult for humans to replicate artistic styles, artists have treated their “style” as something of their stock-in-trade: they can sell someone a work in their particular style, resulting from the particular workflow and techniques that they’ve developed. Something for which switching styles is little to no barrier, like diffusion-based generative AIs, upends that business model.

    Both of those are things that a human viewer might want. I might want to say “take that image, but do it in watercolor” or “make that image look more like style X, blend those two styles”. Generative AIs are great at that. But I equally might want to say “show this scene from another angle with the characters doing something else”, and that’s something that human artists are great at.