It’s been a while since I’ve updated my Stable Diffusion kit, and the technology moves so fast that I should probably figure out what new tech is out there.
Is most everyone still using AUTOMATIC1111's interface? Any cool plugins people are playing with? Good models?
What’s the latest in video generation? I’ve seen a lot of animated images that seem to retain frame-to-frame adherence very well. Kling 1.6 is out there, but it doesn’t appear to be free or local.
I'm using InvokeAI now. Still on SDXL-based models. I've been meaning to try Flux.
My own main irritation with Flux is that it's more limited when it comes to generating pornographic material, which is one thing that I'd like to be able to do. Pony Diffusion was trained on danbooru tags, so Pony-based models recognize prompt terms like these, for which there is a vast library of tagged pornographic material (including some pretty exotic tags) that Pony models have learned from:
https://danbooru.donmai.us/wiki_pages/tag_groups
There are Flux-derived models that have been trained on pornographic material, but they don't really have the scope of knowledge that Pony models do.
If one isn’t generating pornography, that’s not really a concern, though.
Flux also doesn’t use negative prompts, which is something to get used to.
It also doesn't have numeric prompt term weighting (though as best I can tell, some adjectives, like "very", have a limited, somewhat similar effect).
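If it helps to see the difference side by side, here's a rough sketch using the Hugging Face diffusers library. The model IDs and settings are just assumptions for illustration, and the `(term:1.3)` weighting syntax is something front ends like AUTOMATIC1111/InvokeAI layer on top of SDXL prompts rather than part of the model itself:

```python
# Illustrative sketch only: model names and parameters are assumptions, not recommendations.
import torch
from diffusers import StableDiffusionXLPipeline, FluxPipeline

# SDXL-style: negative prompts are supported, and front ends add numeric
# weighting syntax like "(rim lighting:1.3)" via their own prompt parsers.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
sdxl_image = sdxl(
    prompt="portrait of a woman, rim lighting, film grain",
    negative_prompt="blurry, extra fingers, watermark",  # steers generation away from these
    guidance_scale=7.0,
).images[0]

# Flux-style: no negative prompt; you lean on plain natural-language description
# (and words like "very") rather than numeric weights.
flux = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
flux_image = flux(
    prompt="A portrait of a woman with very soft rim lighting and subtle film grain.",
    guidance_scale=3.5,
).images[0]
```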
However, Flux can do some stuff that left me kinda slack-jawed the first time I saw it, like sticking objects in scenes that cast soft shadows or have faint reflections. I still don't know how all of this happens internally; I assume there has to be at least some limited degree of computer-vision pre-processing on the training corpus to detect light sources, at a bare minimum. Like, here's an image I generated a while back, around when I started using Flux:
https://lemmy.today/post/18453614
Like, when that first popped out, I was just sitting there staring at it, trying to figure out how the hell the software was able to do that: incorporate light sources in the scene with backlighting and reflections and such. The only way I can think of to do that (and I've written at least a little computer vision software before, so I'm not completely out of the loop on this) is to pre-process your training corpus, identify light sources, and then separate their contributions from the image. And just to forestall one suggestion: no, this isn't simply me happening to generate something very close to a particular image that the thing was trained on. I've generated plenty of other images that have placed light sources that affect nearby objects.
Here's a (NSFW) image from a Tarot deck I generated, with The Devil trump containing light sources. Same thing.
Another example: an image (Progression) I created in Flux using only the prompt, containing a series of panels with a boy transforming into a girl:
https://lemmy.today/post/18460312
Unlike with the Turn of the Seasons image I generated, also linked in another comment here, I did not explicitly specify the content of each panel. I know how to accomplish a somewhat-similar effect with a Stable Diffusion model and plugins, where basically you divide the image into regions and procedurally alter the prompt weighting in each region, and I assume that Flux must be doing something akin to that internally…but Flux figured out how to do all of this from a simple natural-language description in the prompt alone, which left me pretty boggled.
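Just to illustrate what I mean by the region-based approach (not what Flux is doing internally), here's a rough sketch of faking per-panel control with SDXL inpainting in diffusers, where each panel gets its own prompt pass. The model ID, prompts, and grid layout are assumptions for illustration; regional-prompting plugins do this more cleverly, inside a single denoising pass:

```python
# Illustrative sketch: inpaint each panel region of one canvas with its own prompt.
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionXLInpaintPipeline

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

# Start from a blank (or pre-generated) 1024x1024 canvas split into a 2x2 panel grid.
canvas = Image.new("RGB", (1024, 1024), "white")
panel_prompts = [
    "comic panel, a young boy standing in a bedroom",
    "comic panel, the same character, hair growing longer",
    "comic panel, the same character, androgynous, feminine clothing",
    "comic panel, a young girl standing in a bedroom, smiling",
]

for i, prompt in enumerate(panel_prompts):
    # Mask covering only panel i, so each pass only repaints that region.
    mask = Image.new("L", canvas.size, 0)
    x0, y0 = (i % 2) * 512, (i // 2) * 512
    ImageDraw.Draw(mask).rectangle([x0, y0, x0 + 512, y0 + 512], fill=255)
    canvas = pipe(
        prompt=prompt,
        image=canvas,
        mask_image=mask,
        strength=0.99,       # near-full repaint inside the masked panel
        guidance_scale=7.0,
    ).images[0]
```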
Nice. I’ve heard it’s better with fingers, too.
Yes, though I’ve seen it also make errors.
The really bad days, in my experience, were with Stable Diffusion 1.5. I mean, at that point, trying to get anything reasonable finger-wise was just horrendous.
After hitting Stable Diffusion XL, it might take a couple of goes or some inpainting, but I could usually get something reasonable. Maybe cut out some prompt terms, or see which nonessential prompt terms I could reduce the weighting on to give SD more freedom.