You know how Google’s new feature called AI Overviews is prone to spitting out wildly incorrect answers to search queries? In one instance, AI Overviews told a user to use glue on pizza to make sure the cheese won’t slide off (pssst…please don’t do this.)

Well, according to an interview at The Verge with Google CEO Sundar Pichai, published earlier this week just before criticism of the outputs really took off, these “hallucinations” are an “inherent feature” of AI large language models (LLMs), which are what drive AI Overviews, and this feature “is still an unsolved problem.”

  • Aceticon@lemmy.world
    6 months ago

    The problem is that, given the way they combine things is determined by probability, even if you train it on the greatest, bestest data, the LLM is still going to hallucinate, because it’s combining multiple sources word by word (roughly), guided only by probabilities derived from language, not logic.

    • MacN'Cheezus
      6 months ago

      Yes, I understand that. But I’m fairly certain the quality of the data will still have a massive influence over how much and how egregiously that happens.

      Basically, what I’m saying is, training your AI on a corpus of shitposts instead of factual information seems like a good way to increase the frequency and magnitude of such hallucinations.

      • Aceticon@lemmy.world
        6 months ago

        Yeah, true.

        If you train your LLM exclusively on Nazi literature (to pick a wild example), don’t expect it to end up, by chance, making points similar to Marx’s Das Kapital.

        (Personally I think what might be really funny - in the sense of laughter-inducing - would be to purposefully train an LLM exclusively on a specific kind of weird material.)

        • MacN'Cheezus
          6 months ago

          Yeah, I mean that’s basically what GPT4Chan did, which someone else already mentioned ITT.

          Basically, this guy took a dataset of several gigabytes worth of archived posts from /pol/ and trained a model on that, then hooked it up to a chatbot and let it loose on the board. You can see the results in this video.
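
The "word by word, guided only by probabilities" point made earlier in the thread can be illustrated with a toy sketch. This is not how a real LLM works internally; it is a minimal bigram model, assuming a made-up two-sentence corpus, with hypothetical helper names `train_bigrams` and `generate`. It only shows how stitching sources together one word at a time can yield a fluent sentence that appears in neither source.

```python
import random
from collections import defaultdict


def train_bigrams(sentences):
    """Record, for every word, the words that follow it anywhere in the corpus."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, start, max_words=6, seed=None):
    """Build a sentence word by word, picking each next word at random
    from the words seen after the current one (no logic, only counts)."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words and followers.get(words[-1]):
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)


# Hypothetical toy corpus: both sentences are individually sensible.
corpus = [
    "glue keeps paper stuck",
    "sauce keeps cheese on pizza",
]
model = train_bigrams(corpus)

# Both sources share the word "keeps", so the model can hop from one
# sentence into the other at that point.
for seed in range(5):
    print(generate(model, "glue", seed=seed))
```

Starting from "glue", the sketch can only ever produce "glue keeps paper stuck" or "glue keeps cheese on pizza". The second sentence is grammatical, is in neither source, and is wrong. Swapping in a worse corpus would make such outputs more common, which fits the data-quality argument above, but better data alone does not remove the mechanism.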