• @tal
    English
    26
    1 month ago

    Well, archive.org has a timestamped copy of much of the Web as it existed up until latent-diffusion models appeared. That may not give you access to newer information, but it’s a pretty whopping big chunk of data to work with.

    • palordrolap
      21
      1 month ago

      Hopefully archive.org have measures in place to stop people from yanking all their data too quickly. At least not without a hefty donation or something. As a user it can chug a bit, and I’m hoping that’s the rate-limiting I’m talking about and not that they’re swamped.

      • @Grimy@lemmy.world
        English
        7
        edit-2
        1 month ago

        That would go against the principle of the archive, imo, but regardless, if you take away all means of acquiring data freely, you are just giving companies like OpenAI and Google, who already have copies of it, an insane advantage.

        AI isn’t going away; we need to make sure we have free access to it so as not to give our whole economy to a handful of companies.