• cordlesslamp
    7 months ago

    Too bad those posts are mostly screenshots. I think they only use text-based posts and comments to train the “AI”.

    • Mixel@feddit.de
      7 months ago

      They probably also run OCR on those and then have another model check whether the extracted text makes sense (basically letting another AI grade the output, which is commonly done to judge what’s a good dataset and what isn’t), and then feed the result back into the AI. There’s a shortage of training data these days since the internet is too small (yes, I know it sounds crazy), so I wouldn’t be surprised if they actually used pictures and OCR to gather a bit more usable data.
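
To make the "OCR, then grade, then keep the good stuff" idea concrete, here is a rough sketch of just the filtering step. It assumes the screenshots have already been turned into strings by some OCR tool (that part is left out), and it swaps the "other AI" grader for a crude word-shape heuristic. The names `quality_score` and `filter_ocr_outputs` and the 0.8 threshold are made up for illustration, not anything a real pipeline is known to use.

```python
def quality_score(text: str) -> float:
    """Return the fraction of tokens that look like ordinary words.

    Crude stand-in for a learned quality grader: a token counts as
    word-like if, after stripping surrounding punctuation, it is purely
    alphabetic and contains at least one vowel.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    good = 0
    for tok in tokens:
        word = tok.strip(".,!?;:'\"()")
        if word.isalpha() and any(c in "aeiouy" for c in word.lower()):
            good += 1
    return good / len(tokens)


def filter_ocr_outputs(texts, threshold=0.8):
    """Keep only the OCR outputs whose score reaches the (assumed) threshold."""
    return [t for t in texts if quality_score(t) >= threshold]


if __name__ == "__main__":
    samples = [
        "Too bad those posts are mostly screenshots.",  # clean OCR result
        "l|1 ~~ x#q 0O0 ]][",                           # typical OCR garbage
    ]
    for kept in filter_ocr_outputs(samples):
        print(kept)
```

A real setup would presumably replace the heuristic with a model that scores the text, but the shape stays the same: score each OCR result, then drop whatever falls below a cutoff.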