caption

a screenshot of the text:

Tech companies argued in comments on the website that the way their models ingested creative content was innovative and legal. The venture capital firm Andreessen Horowitz, which has several investments in A.I. start-ups, warned in its comments that any slowdown for A.I. companies in consuming content “would upset at least a decade’s worth of investment-backed expectations that were premised on the current understanding of the scope of copyright protection in this country.”

underneath the screenshot is the “Oh no! Anyway” meme, featuring two pictures of Jeremy Clarkson saying “Oh no!” and “Anyway”

The screenshot (copied from this Mastodon post) is of a paragraph from the NYT article “The Sleepy Copyright Office in the Middle of a High-Stakes Clash Over A.I.”

  • vzq@lemmy.blahaj.zone
    arrow-up
    99
    arrow-down
    2
    ·
    edit-2
    10 months ago

    We need copyright reform. Life of author plus 70 years for everything is just nuts.

    This is not an AI problem. This is a companies literally owning our culture problem.

    • grue@lemmy.world
      English
      arrow-up
      44
      arrow-down
      4
      ·
      edit-2
      10 months ago

      We do need copyright reform, but also fuck “AI.” I couldn’t care less about them infringing on proprietary works, but they’re also infringing on copyleft works and for that they deserve to be shut the fuck down.

      Either that, or all the output of their “AI” needs to be copyleft.

      • SirQuackTheDuck@lemmy.world
        arrow-up
        24
        ·
        edit-2
        10 months ago

        Not just the output. One could argue that training your model on GPL content, which would have it create GPL content, means that the model itself is now also GPL.

        It’s why my company calls GPL parasitic, use it once and it’s everywhere.

        This is something I consider to be one of the main benefits of this license.

        • grue@lemmy.world
          English
          arrow-up
          14
          arrow-down
          1
          ·
          10 months ago

          It already is.

          If you mean that the output of AI is already copyleft, then sure, I completely agree! What I meant to write is that what we “need” is legal acknowledgement of that factual reality.

          The companies running these services certainly don’t seem to think so, however, so they need to be disabused of their misconception.

          I apologize if that was unclear. (Not sure the vitriol was necessary, but whatever.)

    • MustrumR@kbin.social
      arrow-up
      37
      arrow-down
      1
      ·
      10 months ago

      Going one step deeper, at the source, it’s oligarchy and companies owning the law and in consequence also its enforcement.

    • OmnipotentEntity@beehaw.org
      arrow-up
      7
      arrow-down
      1
      ·
      10 months ago

      If this is what it takes to get copyright reform, just granting tech companies unlimited power to hoover up whatever they want and put it in their models, it’s not going to be the egalitarian sort of copyright reform that we need. Instead, we will just be getting a carve-out for this alone, which is ridiculous.

      There are small creators who do need at least some sort of copyright control, because ultimately people should be paid for the work they do. Artists who work on commission are the people in the direct firing line of generative AI, both in commissions and in their day jobs. This will harm them more than any particular company. I don’t think models will suffer if they can only include works in the public domain, if the public domain starts in 2003, but that’s not the kind of copyright protection that Amazon, Google, Facebook, etc. want, and that’s not what they’re going to ask for.

      • Rivalarrival
        arrow-up
        1
        ·
        10 months ago

        Copyright protects against creating and distributing copies. Copyright does not protect against reading and understanding a work.

        What LLMs and other models are doing is analogous to reading a book and writing a book report. They are not regurgitating a copy of the book to users. They are not creating or distributing a copy.

        The purpose of copyright law is to promote the progress of Science and the Useful Arts. The purpose is to expand the depth and breadth of human knowledge and technology. “Fair Use” is not an exception: “Fair Use” is the purpose. “Copyright” is the exception.

        If technology is fundamentally incompatible with copyright law, that technology has the right-of-way, and copyright must yield.

        • OmnipotentEntity@beehaw.org
          arrow-up
          1
          ·
          10 months ago

          What LLMs and other models are doing is analogous to reading a book and writing a book report.

          It is purported to be analogous to that. But given that in actuality it can also simply reproduce nearly entire articles word for word from a short prompt, it’s clear that the analogy you are attempting to draw is flawed. Inside the LLM, encoded in the weights and biases of the network, is that article and many others. It has been copied into the network, encoded, and can be referenced.

          The Pile is 825GiB of text. ChatGPT-4 is about 400 billion parameters, and each of those parameters is 2 bytes, which is 800GB (roughly 745GiB) of data. There’s certainly enough redundancy in whatever corpus they’re using to just memorize the entire thing and still have sufficient network space left over to actually make some sense of it.
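
          The size comparison above is simple arithmetic and can be checked with a short sketch. It assumes the figures quoted in this comment (about 400 billion parameters at 2 bytes each, and an 825 GiB corpus); the parameter count is an estimate, not a published number, and the names below are made up for illustration. Note that decimal gigabytes (GB) and binary gibibytes (GiB) differ by about 7%.

```python
# Back-of-the-envelope check of model size versus corpus size.
# All inputs are the figures quoted in the comment, not confirmed values.

GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte


def model_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Raw storage needed for the weights alone."""
    return num_params * bytes_per_param


assumed_params = 400 * 10**9   # assumption: "about 400 billion parameters"
assumed_bytes_per_param = 2    # assumption: 16-bit weights
corpus_bytes = 825 * GIB       # The Pile, as quoted: 825 GiB

weights_bytes = model_size_bytes(assumed_params, assumed_bytes_per_param)
ratio = weights_bytes / corpus_bytes

print(f"weights: {weights_bytes / GB:.0f} GB = {weights_bytes / GIB:.0f} GiB")
print(f"weights / corpus: {ratio:.2f}")
```

          Under these assumptions the weights come out at roughly nine tenths of the corpus size, so the two are the same order of magnitude. That says nothing by itself about how much text is actually memorized; it only shows the raw capacity is comparable.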

  • LavaPlanet@lemmy.world
    arrow-up
    63
    arrow-down
    3
    ·
    10 months ago

    Piracy / stealing content is ok for big corps.

    Piracy / stealing content is punishable by life in prison for us proletarians.

    • Dkarma@lemmy.world
      arrow-up
      22
      arrow-down
      9
      ·
      10 months ago

      This is simply not stealing. Viewing content has never ever ever been stealing.

      There is no view right.

      • 🦄🦄🦄@feddit.de
        arrow-up
        13
        arrow-down
        1
        ·
        10 months ago

        They are downloading the data so their LLM can “view” it. How is that different from downloading movies to view them?

        • Dkarma@lemmy.world
          arrow-up
          3
          arrow-down
          10
          ·
          edit-2
          10 months ago

          They’re not downloading anything tho. That’s the point. At no point are they possessing the content that the AI is viewing.

          This is LESS intrusive than a Google web scraper. No one is trying to sue Google for copyright over Google searches.

          • 🦄🦄🦄@feddit.de
            arrow-up
            6
            arrow-down
            1
            ·
            10 months ago

            What? Of course they are downloading; the content still has to reach their networks and computers.

            • Dkarma@lemmy.world
              arrow-up
              2
              arrow-down
              4
              ·
              edit-2
              10 months ago

              Go look up how AI works. There is no download lol. It’s the exact same principle as web scrapers, which have been around for literally decades.

      • Jamyang@lemmy.world
        English
        arrow-up
        10
        arrow-down
        1
        ·
        10 months ago

        Tech illiterate guy here. All these ML models require training data, right? So all these AI companies that develop new ML-based chat/video/image apps require data. So where exactly do they get it? It can’t be that their entire dataset is licensed, can it?

        If so, are there any firms that are suing these orgs for data theft? How can you know if a model has been trained on your data? Sorry if this is not the right place to ask.

        • Dkarma@lemmy.world
          arrow-up
          13
          arrow-down
          3
          ·
          edit-2
          10 months ago

          You know how you look at a pic on the internet and don’t pay? The AI is basically doing the same thing, only it’s collecting the effect of the data points (like pixels in a picture) more accurately. The input, no matter what it is, only moves a set of weights. That’s all. It does not copy anything it is trained on.

          Yes, it can reproduce any work with some level of accuracy, just like a painter or musician could replay a piece they see or hear.

          Again, this is not theft any more than you hearing a song or viewing a selfie.
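
          For readers unfamiliar with what “moves a set of weights” means mechanically, here is a minimal sketch of a single training update. It is a toy two-weight linear model trained with plain gradient descent, not how any particular commercial model is built; the function name, data, and learning rate are all made up for illustration.

```python
# Toy illustration: each training example nudges a fixed-size list of weights.
# The examples themselves are not stored in the model state.


def sgd_step(weights, x, y, lr=0.1):
    """One gradient-descent update for a linear model with squared error."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    # Gradient of 0.5 * error**2 with respect to each weight is error * xi.
    return [w - lr * error * xi for w, xi in zip(weights, x)]


# Hypothetical training data: (input features, target value).
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

weights = [0.0, 0.0]
for _ in range(200):          # repeated passes over the data
    for x, y in examples:
        weights = sgd_step(weights, x, y)

print(weights)                # still just two numbers after all that training
```

          The model state stays two numbers no matter how many examples pass through it. In this toy case those two numbers also end up fitting all three training targets almost exactly, so “only the weights change” and “it can reproduce what it saw” are not mutually exclusive claims.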

          • BellaDonna@mujico.org
            arrow-up
            5
            arrow-down
            1
            ·
            10 months ago

            I make the exact argument all the time and it gets ignored. I think people fundamentally don’t understand the tech and can’t conceptualize that AI models train the same way we get ideas and schemas from our own observations.

            • LarmyOfLone@lemm.ee
              arrow-up
              2
              arrow-down
              1
              ·
              10 months ago

              People even deny that AI can “learn” and claim that it just copies and manipulates data…

          • Jamyang@lemmy.world
            English
            arrow-up
            2
            arrow-down
            1
            ·
            10 months ago

            only it’s collecting the effect of the data points ( like pixels in a picture) more accurately

            Isn’t that the entire point of creativity, though? What separates an artist from a bad painter is the positioning of pixels on a 2-dimensional plane? If the model collects the positions of pixels together with the pixel RGB (color? Don’t know the technical term for it), then the model is effectively stealing the “pixel configuration and makeup” of that artist, which can be reproduced by said model anywhere if similar prompts are passed to it?

      • Katana314@lemmy.world
        English
        arrow-up
        2
        arrow-down
        1
        ·
        10 months ago

        Could say piracy is just running a program that “views” the content, and then regurgitates its own interpretation of it into local data stores.

        It’s just not very creative, so it’s usually very close.

        • Dkarma@lemmy.world
          arrow-up
          2
          arrow-down
          1
          ·
          edit-2
          10 months ago

          You could say that, but you’d be wrong. Downloading is a bitwise copy. Training isn’t even close to the same thing.

        • ruination@discuss.tchncs.de
          arrow-up
          1
          ·
          10 months ago

          Also, I’m pretty sure the argument is more about the unequal enforcement of the law. Copyright should be either enforced fairly or not at all. If AI is allowed to scrape content and regurgitate it, piracy should also be legal.

  • far_university1990@feddit.de
    arrow-up
    24
    arrow-down
    2
    ·
    10 months ago

    Either this kills large AI models (at least commercial ones), or it kills some copyright BS in some way. Whatever happens, society wins.

    The second option could also hurt small creators, though.

    • LarmyOfLone@lemm.ee
      arrow-up
      10
      arrow-down
      1
      ·
      10 months ago

      I fear this is a giant power grab. What this will lead to is that IP holders, those that own the content that AI needs to train will dictate prices. So all the social media content you kindly gave reddit, facebook, twitter, pictures, all that stuff means you won’t be able to have any free AI software.

      No free / open source AI software means there is a massive power imbalance, because now only those who can afford to buy this training data can do it, and they are forced to maximize profits (and are naturally inclined to anyway).

      Basically they will own the “means of generation” while we won’t.

      • far_university1990@feddit.de
        arrow-up
        6
        arrow-down
        1
        ·
        10 months ago

        Current large models would all be sued to death: there is no license with the IP owners yet, so this would kill all existing commercial large models. Unless all IP owners are named and licenses are granted retroactively, but that sounds unlikely.

        Hundreds of IP-owning companies and billions of individual IP owners setting prices will probably behave like streaming: price increases and endless fragmentation. A license is needed for every IP owner, so the paperwork will be massive. Licenses might change or expire, the same problem as streaming, but every time a license expires the entire model needs to be retrained (or you infringe, because the model keeps using the data).

        And in the EU you have the right to be forgotten, so to be excluded from models (because in this case it is not transformative enough; IANAL, but it sounds like it counts as storing), so every time someone wants to be excluded, the entire model has to be retrained.

        I do not see how it is possible to create a large model like this with any amount of money, time, or electricity. Maybe some smaller models. Maybe just more specific ones for one task.

        Also, piracy exists: people who do not care about copyright will just train anyway and maybe even open-source the result (torrent). They might get caught, might not; it might become a dark market, idk. It will exist though, like deepfakes.

        • LarmyOfLone@lemm.ee
          arrow-up
          2
          ·
          10 months ago

          Yeah, those are the myriad complications this will cause. People are worried about AI, and I am too, but we need smart regulation, not the use of IP laws that only increase the power of the ultra-rich. Because if AI continues to exist, that will severely distort and limit the market to very specific powerful entities. And that is almost certainly going to be worse than leaving it completely unregulated.

    • Honytawk@lemmy.zip
      arrow-up
      11
      arrow-down
      5
      ·
      10 months ago

      I know plenty of small creators who urge me to pirate their content.

      Because all they want is people to enjoy their content, and piracy helps spread their art.

      So even small creators are against copyright.

  • peak_dunning_krueger@feddit.de
    arrow-up
    19
    arrow-down
    1
    ·
    10 months ago

    I mean, I won’t deny that small bit of skill it took to construct a plausible sounding explanation for why the public should support your investment, because it’s “not illegal (yet)”.

  • pelespirit@sh.itjust.works
    arrow-up
    17
    arrow-down
    4
    ·
    10 months ago

    They have chosen to think that if it runs through AI, it is then a derivative; it is not. If I put Disney and Amazon together as a prompt, things come out very similar to their logos and it’s obviously copyright infringement. The worst part of this is that they’ll still steal from all of the small artists and avoid the larger ones.

    https://imgur.com/a/Rhgi0OC

    • OpenStars@startrek.website
      English
      arrow-up
      16
      arrow-down
      1
      ·
      10 months ago

      Don’t forget that they will then switch sides and try to copyright “their work”, preventing others from even thinking about their work without paying the toll.

      Hey, what if I were to draw two circles…

      COPYRIGHT INFRINGEMENT!

  • mindbleach@sh.itjust.works
    arrow-up
    13
    arrow-down
    1
    ·
    10 months ago

    I don’t give a shit about copyright for training AI.

    But I don’t give a shit about investors, either.

    • webghost0101@sopuli.xyz
      arrow-up
      11
      ·
      10 months ago

      Copyright should cease to exist and sharing digital copies of any content should be a protected right. The best software is FOSS anyway.

      But if I can’t have that, I will settle for tech bros going to jail for mass theft. Either the law is equal or it is unlawful.

      • mindbleach@sh.itjust.works
        arrow-up
        3
        arrow-down
        1
        ·
        10 months ago

        Nah. Even in its current stupid state, copyright has to recognize that sifting through the entire internet to get a gigabyte of linear algebra is pretty goddamn transformative.

        No kidding the machine that speaks English read every book in the library. Fuck else was it gonna do?

        • webghost0101@sopuli.xyz
          arrow-up
          1
          arrow-down
          1
          ·
          10 months ago

          For some licensing it’s not about how transformed it is but whether or not it was used.

          Many of the books it read were not supposed to be in this library. The datasets used contain heaps of pirated content.

          I repeat: I am for abolishing copyright and legalizing digital piracy, including to train intelligence, but if that won’t be the case and piracy remains illegal, then I want to see the “criminals” punished. Nothing is worse in law than double standards that punish the small and leave the giants alone.

          Remember this? He downloaded and shared a grand total of 30 songs. https://abcnews.go.com/Business/story?id=8226751&page=1

          • mindbleach@sh.itjust.works
            arrow-up
            2
            ·
            10 months ago

            “This law is immoral but also twist these criminals’ balls off” is a double standard. Mercilessly enforcing shite laws is never a sane position.

            Especially when I am telling you - this isn’t illegal. The overwhelming majority of AI training is plainly transformative, and based on readily-available public materials. A torrented version of a published work is still a published work. I don’t care how they got it, and obviously neither should you. But you want to act infuriated that this cutting-edge technology used dubiously-sourced… text files? Shush.

            • webghost0101@sopuli.xyz
              arrow-up
              1
              ·
              edit-2
              10 months ago

              That’s not quite what I am saying.

              Personally, I believe the law is immoral and should be changed, and no one should be punished for using, creating or distributing digital copies. Including tech companies.

              But my personal opinions don’t seem to matter. If we as a society choose to enforce these laws and the consequences for people breaking them are harsh, then logically those same consequences should apply to the rich and powerful.

              My hope is that once these laws threaten the powerful, they will finally lobby to get fully rid of them so everyone can be free.

              I am not upset they use pirated materials; I am a pirate myself because I believe piracy is morally in the right.

              I am upset that admitting that in this comment could be used as evidence in a police investigation, leading to heavy fines and jail for me personally, while if you’re rich enough it won’t, and if they try you just fly off to wherever you’re not prosecuted.

              I fail to realize how it matters how much it is transformed. CopyLEFT works are about any use where something is derived from x. Well, they were “used” and the result is therefore a derived work. From that perspective, talking about how it is transformed until completely unrecognizable is a strawman, because copyleft doesn’t care about that. It’s designed to destroy copyright by forcing free access on derived works.

              On a side note, I am near certain that Meta trained Llama on personal profiles and messengers from Facebook, Instagram and WhatsApp, and that is why it is such a powerful model for its size. That has nothing to do with copyright and is probably fully legal using some twisted legal words. In this I see a form of exploitation that should be punished, but Zuckerberg will never see jail for that, so yeah, any reason to show tech companies that they don’t rule the world I am happy to see, even if it uses a law I would rather not have exist, especially when that law will keep existing for the rest of us.

              • mindbleach@sh.itjust.works
                arrow-up
                2
                ·
                10 months ago

                I fail to realize how it matters how much it is transformed.

                And ignoring when people explain that it does.

                You want copyleft to be super duper copyright, where even quoting a sentence of a Cory Doctorow novel demands an entire newspaper gets GPL’d forever. We don’t care what fair use says! This anti-copyright goal demands more protection than mere copyright!

                This is silly.

                Transformation is where copyright does not apply, because you did something new. No matter what works you referenced - your thing is different. It’s why Disney can’t sue Wikipedia for articles describing their movies. It’s also why Wikipedia can’t sue OpenAI for models describing their articles.

                • webghost0101@sopuli.xyz
                  arrow-up
                  1
                  ·
                  edit-2
                  10 months ago

                  I want copyleft to destroy copyright by using the law against itself because I believe that to be the only way.

                  I am all about destabilizing a system that should not exist. I would go as far as to say that if you read a copyleft book then all your future ideas should be barred from holding patents and become public domain instantly. That is also how I treat my own ideas, because my ultimate stance is that:

                  “The highest reward for any intelligent or creative thought is to see everyone adopt it.” Copying is truly a form of flattery; it’s tangible proof that you contributed to the world, that others perceive you as good.

                  But that is only half my point, and it’s easily refutable as unrealistic and extremist. I feel like it’s my second stance that is getting ignored.

                  Which is: “if a law is enforced, it should be enforced equally towards all social classes.”

                  That is because I recognize my main stance is an ideal that no politician would take seriously. This second one though, how can anyone disagree?

                  Again, it’s about destabilizing the system. Rich people don’t like going to jail, so if we enforce the strictest versions of laws we can help motivate lobbyists to get rid of copyright altogether.

  • ShortN0te@lemmy.ml
    arrow-up
    8
    arrow-down
    1
    ·
    10 months ago

    That’s the point about money, if you have enough you can simply sue or bribe in order to not lose money.

  • Harbinger01173430@lemmy.world
    arrow-up
    5
    arrow-down
    2
    ·
    10 months ago

    A’ight. Time to self-host the entirety of the internet on a server and do machine learning with the content I stored. :)