If you use humans to fine-tune and judge the quality of the output, then in some sense, whatever pleases the human judges is pretty much all the AI can possibly do.
Any rater can see the output "I don't know" and mark it zero (dude). But the meat bags will definitely end up rewarding the model if it instead generates some plausible nonsense.
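Back-of-the-envelope version of that incentive (a toy sketch; every number here is made up, just to show the shape of it):

```python
# Toy expected-reward model of human grading. All numbers are
# hypothetical; the point is the incentive, not the values.

P_CORRECT = 0.30  # assumed: chance a confident guess is actually right
P_FOOLED = 0.50   # assumed: chance a rater rewards plausible nonsense

# "I don't know" is visibly useless, so it scores 0 every time.
reward_abstain = 0.0

# A guess scores 1 when it's right, and still scores 1 when it's
# wrong but plausible enough to slip past the meat bag.
reward_guess = P_CORRECT + (1 - P_CORRECT) * P_FOOLED

print(f"E[reward | 'I don't know']  = {reward_abstain:.2f}")  # 0.00
print(f"E[reward | confident guess] = {reward_guess:.2f}")    # 0.65
```

As long as the rater can ever be fooled, abstaining loses. The tuned model learns to never say "I don't know."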
What if the AI is lying? What if it's lazy and like, "fuck it, this'll do to keep the meat bag happy"?
What if?!
I’m pretty sure that’s how AI image generation works.
Which would explain a lot, at least.