Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, according to people familiar with the matter.
Are they worried that DeepSeek took stuff written by others, mixed it up, and repackaged it as its own?
Well, yeah, that’s all AI is: an expensive weighted pachinko machine that takes human-made content and remixes it.
The question isn’t whether they’ve used the same information. It’s whether they’ve faked the process to achieve that 20x efficiency.
Look at it like a dictionary. Writing one from scratch is a huge task, no matter how many other books exist. How do you even go about finding all of the words?
But if other people have already written dictionaries, you can just use their word lists and go from there.
It’s more efficient, but only because it’s a completely different task.
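The “use their word lists” shortcut has a name in machine learning: knowledge distillation, where a cheap “student” model is trained to match an expensive “teacher” model’s output probabilities instead of learning from raw data itself. A minimal toy sketch of the idea (the models, weights, and numbers here are invented for illustration, not any lab’s actual pipeline):

```python
# Toy knowledge distillation: a "student" never sees ground-truth labels,
# only the "teacher's" soft output probabilities. All values illustrative.
import math
import random

random.seed(0)

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Pretend "teacher": a fixed linear scorer over 2 features, 3 classes.
TEACHER_W = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]

def teacher_logits(x):
    return [w[0] * x[0] + w[1] * x[1] for w in TEACHER_W]

# Student: same shape, random start; trained only on the teacher's outputs.
student_w = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)]

def student_logits(x):
    return [w[0] * x[0] + w[1] * x[1] for w in student_w]

def distill_step(x, lr=0.5, temperature=2.0):
    target = softmax(teacher_logits(x), temperature)  # soft labels
    pred = softmax(student_logits(x), temperature)
    # Gradient of cross-entropy(target, pred) w.r.t. the student's logits
    for k in range(3):
        grad = pred[k] - target[k]
        for j in range(2):
            student_w[k][j] -= lr * grad * x[j]

# Unlabeled inputs are enough -- the teacher provides all the supervision.
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
for _ in range(50):
    for x in data:
        distill_step(x)

# After training, the student mimics the teacher's decisions on new inputs.
probe = (0.8, -0.3)
t_choice = max(range(3), key=lambda k: teacher_logits(probe)[k])
s_choice = max(range(3), key=lambda k: student_logits(probe)[k])
print(t_choice, s_choice)
```

The point of the sketch: the student skips the expensive part entirely (collecting and labeling the data) and only has to match answers the teacher already paid to produce, which is why it looks so much more efficient.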
No AI company has ever made its own content to train its models; they all took what others created, remixed it, and presented it as something new.
This AI model did the same thing.
AI lost its job to AI.
Yes, but that doesn’t mean it is more efficient, which is what the whole thing is about.
Let’s pretend we’re not talking about AI, but tuna fishing. OpenTuna is sending hundreds of ships to the ocean to go fishing. It’s extremely expensive, but it gets results.
If another fish distributor shows up out of nowhere selling tuna for 1/10 the price, it would be amazing. But if you found out that they could sell them cheap because they were stealing the fish from OpenTuna warehouses, you wouldn’t argue that the secret to catching fish going forward is theft and stop building boats.
Yes, I would.
So what happens when OpenTuna runs out of fish to steal and there are no more boats?
Information doesn’t stop being created. AI models need to be constantly retrained and updated with new information. One of the biggest issues with GPT-3 was its 2021 knowledge cutoff.
Let’s pretend you’re building a legal analysis AI tool that scrapes the web for information on local, state, and federal law in the US. If your model was from January 2008 and was never updated, then gay marriage wouldn’t be legal in the US, the ACA wouldn’t exist, Super PACs would be illegal, the Consumer Financial Protection Bureau wouldn’t exist, zoning ordinances in pretty much every city would be out of date, and openly carrying a handgun in Texas would get you jail time.
It would essentially be a useless tool, and copying that old training data wouldn’t make a better product no matter how cheap it was to do.
Once the tuna runs out and we run out of boats?
Maybe we then stop destroying the tuna population?
Or, to bring this back to point: the environment will be better off once the AI bubble collapses.
That’s a very important, but entirely separate conversation.
It’s actually very much the conversation. The quicker the race to the bottom happens, the quicker this entire bubble bursts, and the quicker we stop torching the planet for imaginary profits.
Is it worth it? Let me work it. I put my thing down, flip it, and reverse it.