caption

a screenshot of the text:

Tech companies argued in comments on the website that the way their models ingested creative content was innovative and legal. The venture capital firm Andreessen Horowitz, which has several investments in A.I. start-ups, warned in its comments that any slowdown for A.I. companies in consuming content “would upset at least a decade’s worth of investment-backed expectations that were premised on the current understanding of the scope of copyright protection in this country.”

underneath the screenshot is the “Oh no! Anyway” meme, featuring two pictures of Jeremy Clarkson saying “Oh no!” and “Anyway”

screenshot (copied from this mastodon post) is of a paragraph from the NYT article “The Sleepy Copyright Office in the Middle of a High-Stakes Clash Over A.I.”

  • @vzq@lemmy.blahaj.zone (97 points, 5 months ago)

    We need copyright reform. Life of author plus 70 for everything is just nuts.

    This is not an AI problem. This is a companies-literally-owning-our-culture problem.

    • @grue@lemmy.world (40 points, 5 months ago)

      We do need copyright reform, but also fuck “AI.” I couldn’t care less about them infringing on proprietary works, but they’re also infringing on copyleft works and for that they deserve to be shut the fuck down.

      Either that, or all the output of their “AI” needs to be copyleft.

      • @SirQuackTheDuck@lemmy.world (24 points, 5 months ago)

        Not just the output. One could argue that training your model on GPL content, which would have it create GPL content, means that the model itself is now also GPL.

        It’s why my company calls the GPL parasitic: use it once and it’s everywhere.

        This is something I consider to be one of the main benefits of this license.

        • @grue@lemmy.world (13 points, 5 months ago)

          It already is.

          If you mean that the output of AI is already copyleft, then sure, I completely agree! What I meant by writing that we “need” it is legal acknowledgement of that factual reality.

          The companies running these services certainly don’t seem to think so, however, so they need to be disabused of their misconception.

          I apologize if that was unclear. (Not sure the vitriol was necessary, but whatever.)

    • MustrumR (36 points, 5 months ago)

      Going one step deeper, at the source it’s oligarchy and companies owning the law and, in consequence, its enforcement as well.

    • @OmnipotentEntity@beehaw.org (6 points, 5 months ago)

      If this is what it takes to get copyright reform, just granting tech companies unlimited power to hoover up whatever they want and put it in their models, it’s not going to be the egalitarian sort of copyright reform that we need. Instead, we will just get a carve-out for this, which is ridiculous.

      There are small creators who do need at least some sort of copyright control, because ultimately people should be paid for the work they do. Artists who work on commission are the people in the direct firing line of generative AI, both in their commissions and in their day jobs. This will harm them more than it will harm any particular company. I don’t think models would suffer if they could only include works in the public domain, if the public domain started in 2003, but that’s not the kind of copyright protection that Amazon, Google, Facebook, etc. want, and that’s not what they’re going to ask for.

      • @Rivalarrival (1 point, 5 months ago)

        Copyright protects against creating and distributing copies. Copyright does not protect against reading and understanding a work.

        What LLMs and other models are doing is analogous to reading a book and writing a book report. They are not regurgitating a copy of the book to users. They are not creating or distributing a copy.

        The purpose of copyright law is to promote the progress of Science and the Useful Arts. The purpose is to expand the depth and breadth of human knowledge and technology. “Fair Use” is not an exception: “Fair Use” is the purpose. “Copyright” is the exception.

        If technology is fundamentally incompatible with copyright law, that technology has the right-of-way, and copyright must yield.

        • @OmnipotentEntity@beehaw.org (1 point, 5 months ago)

          What LLMs and other models are doing is analogous to reading a book and writing a book report.

          It is purported to be analogous to that. But given that in actuality it can also simply reproduce nearly entire articles word for word from a short prompt, it’s clear that the analogy you are attempting to draw is flawed. Inside the LLM, encoded in the weights and biases of the network, is that article and many others; they have been copied into the network, encoded, and can be referenced.

          The Pile is 825 GiB of text. GPT-4 has on the order of 400 billion parameters, and at 2 bytes per parameter that is roughly 800 GB of data. There’s certainly enough redundancy in whatever corpus they’re using to just memorize the entire thing and still have sufficient network space left over to actually make some sense of it.
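          To put rough numbers on that comparison, here is a quick back-of-the-envelope sketch in Python. The parameter count, the 2 bytes per parameter (fp16 storage), and the 825 GiB figure for The Pile are the assumptions stated in this comment, not published specifications:

```python
# Back-of-the-envelope: model weight size vs. training-corpus size.
# Assumed (not published) figures from the comment above:
#   ~400 billion parameters, 2 bytes per parameter (fp16/bf16),
#   and The Pile at 825 GiB of text.
PARAMS = 400e9
BYTES_PER_PARAM = 2
PILE_GIB = 825

weights_gib = PARAMS * BYTES_PER_PARAM / 2**30  # bytes -> GiB

print(f"Model weights: ~{weights_gib:.0f} GiB")                   # ~745 GiB
print(f"Training corpus (The Pile): {PILE_GIB} GiB")
print(f"Weights / corpus ratio: {weights_gib / PILE_GIB:.2f}")    # ~0.90
```

          Under those assumptions the weights come to roughly 0.9 bytes per byte of corpus text, which is the capacity point the comment is making about memorization being plausible.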