It’s been a while since I’ve updated my Stable Diffusion kit, and the technology moves so fast that I should probably figure out what new tech is out there.
Is most everyone still using AUTOMATIC’s interface? Any cool plugins people are playing with? Good models?
What’s the latest in video generation? I’ve seen a lot of animated images that seem to retain frame-to-frame adherence very well. Kling 1.6 is out there, but it doesn’t appear to be free or local.
I don’t do video generation.
I’ve mostly moved away from Automatic1111 to ComfyUI. If you’ve ever used an image-processing program that uses a flowchart-style node graph to modify images, it looks kinda like that. Comfy is more work to learn (you need to understand some of the things that Automatic1111 is doing internally for you), but:
It’s much better at building up complex images and series of dependent processes that get re-generated when you make a change to a workflow.
It can run Flux. Last I looked, Automatic1111 could not. I understand that Forge can, and is a little more like Automatic1111, but I haven’t spent time with it. I’d say that Flux and derived models are quite impressive from a natural-language standpoint. With SD- and Pony-based models, most of the prompts I wrote were basically sequences of keywords. With Flux, prompts read far more like natural language, and it can do some particularly neat stuff from the prompt alone (“The image is a blended composite with a progression from left to right showing winter to spring to summer to autumn.”).
It has queuing. Automatic1111 may have since picked that up, but I found its absence a serious shortcoming back when I was using it.
ComfyUI scales up better if you’re using a lot of plugins. In Automatic1111, a plugin adds buttons and UI elements to each page. In Comfy, a plugin almost always just adds more nodes to the node library and doesn’t go fiddling with the existing UI.
That being said, I’m out-of-date on Automatic1111. But last I looked, its major selling point for me was the SD Ultimate Upscale plugin, and that’s since been ported to ComfyUI.
For me, one major early selling point was a workflow I frequently wanted: (a) generate an image, then (b) perform an SD Ultimate Upscale. In Automatic1111, that required setting up txt2img and SD Ultimate Upscale in img2img, running a txt2img operation to generate an image, waiting until it finished, manually clicking the button to send the image to img2img, and then manually running the upscale operation. If I changed the prompt, I had to go through all of that again, sitting and watching progress bars and clicking the appropriate buttons. With ComfyUI, I just save a workflow that does all of that, and Comfy reruns everything necessary based on any changes I make (and doesn’t rerun things that aren’t affected). I can simply disable the upscale portion of the workflow if I don’t need that bit. ComfyUI was a higher barrier to entry, but it made more-complex tasks much less time-consuming and cut way down on the manual nursemaiding required from me.
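Saved workflows can also be driven without the GUI at all: ComfyUI exposes an HTTP API where you POST the workflow graph to its `/prompt` endpoint. Here's a minimal sketch, assuming the default local server address; the node fragment is a toy illustration (a real graph needs loader, conditioning, and decode nodes wired together), and the node IDs and wiring shown are hypothetical:

```python
import json
import urllib.request

# Toy fragment of an API-format workflow graph. Node IDs are arbitrary
# strings; connections between nodes are written as ["node_id", output_slot].
# This single node is NOT a complete, runnable graph -- it just shows the shape.
workflow = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 20,
            "cfg": 7.0,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,
            "model": ["1", 0],         # hypothetical checkpoint-loader node
            "positive": ["2", 0],      # hypothetical prompt-encode node
            "negative": ["2", 1],
            "latent_image": ["4", 0],  # hypothetical empty-latent node
        },
    },
}

def queue_prompt(graph, host="127.0.0.1:8188"):
    """POST a workflow graph to a locally running ComfyUI's /prompt endpoint."""
    payload = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The easy way to get a correct graph in this format is to build it in the ComfyUI editor and export it via the “Save (API Format)” option, then re-queue it from a script whenever you change a prompt or parameter.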
Automatic1111 felt to me like a good, simple first pass at getting from a prompt to an image while permitting some level of extensibility. I think that it (and maybe Forge, which I haven’t poked at) may be a better introduction to local AI image generation, because the barrier to entry is lower. But ComfyUI feels a lot more like a serious image-manipulation program, something you’d use to construct elaborate projects.
EDIT: Not exactly what you asked, but since you say that you’re trying to come back up to speed, I’d mention !imageai@sh.itjust.works, which this community does not have in the sidebar. I haven’t been very active there recently, but you can at least see what sorts of images the small community of users on the Threadiverse are generating, though it’s not specific to local generation. For a bigger-picture view, look at new stuff on civitai.com. I originally used that to see what prompts and plugins and such were used to generate images that I thought were impressive. By default, ComfyUI saves the entire JSON workflow used to generate an image in the image’s metadata, and ComfyUI will recreate the workflow from that metadata if you drop the image onto it (IIRC it can also auto-download missing nodes). I believe you can just grab an image that impresses you on civitai.com and start working from the point the artist was at.
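For the curious, the metadata trick is just a text chunk in the PNG: ComfyUI writes the graph JSON under a `workflow` key, and Pillow surfaces PNG text chunks in `Image.info`. A small sketch of writing and reading that back (the key name follows ComfyUI’s convention; confirm against your own outputs, since some sites strip metadata on upload):

```python
import io
import json

from PIL import Image
from PIL.PngImagePlugin import PngInfo

def embed_workflow(img, graph):
    """Save a PNG with a 'workflow' text chunk, mimicking ComfyUI's convention."""
    meta = PngInfo()
    meta.add_text("workflow", json.dumps(graph))
    buf = io.BytesIO()
    img.save(buf, format="PNG", pnginfo=meta)
    buf.seek(0)
    return buf

def extract_workflow(png_file):
    """Read the embedded workflow JSON back out; PNG text chunks land in .info."""
    info = Image.open(png_file).info
    return json.loads(info["workflow"]) if "workflow" in info else None
```

Dropping such a PNG onto the ComfyUI canvas does the same extraction for you; the script form is handy if you want to catalog or diff workflows from a folder of generated images.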