• @tehciolo@lemm.ee
      15 points · 2 months ago

      I think you missed the part where it was strongly suggested that you “not” use copyrighted text.

      The point is not to get rid of the original text. It’s to “poison” the training data.

      • FaceDeer
        -2 points · 2 months ago

        If the AI trainers have the original text then “poisoning” the live site’s content isn’t going to do anything at all.

        You can’t touch the original text. It’s already been archived.

        • @tehciolo@lemm.ee
          7 points · 2 months ago

          If they scrape the updated comments again and ingest copyrighted text, you are poisoning the data.

    • Th4tGuyII
      7 points · 2 months ago

      Yeah, this is what I was thinking. We all heard about people being unable to delete comments, or Reddit keeping comments even after account deletions, back during the first migration. So what stops them from holding onto comment history, and what stops them from using that to teach LLMs to discern poisoned data from real data, as @pixxelkick said?

    • @pixxelkick@lemmy.world
      4 points · 2 months ago

      Yeah, in fact you’re giving the LLM additional data to train on what poisoned data looks like so it can avoid it better, as they can clearly see the before vs. after.

      • @InternetPerson@lemmings.world
        3 points · 2 months ago

        It is necessary to employ a method which enables the training procedure to distinguish copyrighted material. In the “dumbest” case, some humans will have to label it.

        Just because you’ve edited a comment doesn’t mean the edit can be read as “oh, this is under copyright now”.

        I’m not saying it’s technically impossible. On the contrary, it very much is possible. It’s just more work. This drives development costs up and can give some form of satisfaction to angered ex-Reddit users like me. However, those costs will be peanuts for giants like Google / Alphabet.