Office Space meme:

“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”

  • Prunebutt@slrpnk.net (OP) · 1 day ago

    The difference is that the dataset is baked into the weights of the model. Your emulation analogy simply doesn’t have a leg to stand on. I don’t think you know how neural networks work.

    The standards are literally the basis of open source.

    • WraithGear@lemmy.world · 1 day ago

      I was kinda open about my level of understanding at the start. You say it’s not open source, most say it is; they explained why, and when I checked, all their points held up, and I tried to understand as best I could. The bottom line is that the disagreement comes down to this: you say the training data and the weights together are inseparable parts of the whole, and if any part of that isn’t open, then the project as a whole isn’t open. I don’t see how that tracks when the weights are open, and both they and the training data can be removed and switched out for something else. But I’ve come to believe the response would just boil down to “you can’t separate them.” There’s really nowhere else to go at this point.

      • Prunebutt@slrpnk.net (OP) · 1 day ago

        You can read all the other comments, which explain why it is not open source. You can’t realistically retrain the model without petabytes of data. Even if you “train” it on your own dataset, that’s fine-tuning: you’re tweaking the existing model weights a bit, not building the model from scratch.
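
        Roughly, in code (a toy PyTorch sketch with a stand-in one-layer “model”, not any real LLM or released checkpoint): fine-tuning starts from the published weights and nudges them with a few small gradient steps, while rebuilding from source would need fresh random weights plus the original, unpublished corpus.

        ```python
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        # Toy stand-in for an LLM; the argument is about weights, not architecture.
        model = nn.Linear(16, 16)

        # An "open weights" release hands you a trained state_dict. Here we just
        # reuse the model's own parameters as a stand-in for a real checkpoint.
        published_weights = model.state_dict()
        model.load_state_dict(published_weights)

        # Fine-tuning: a few low-learning-rate steps on your own small dataset
        # only nudge the existing parameters.
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
        x, y = torch.randn(8, 16), torch.randn(8, 16)
        loss = F.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()

        # "Building from source" would instead mean re-initialising the weights
        # and re-running training on the original corpus, which isn't published:
        # model = nn.Linear(16, 16)  # fresh random init; now you need the data
        ```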

        “Open source” is PR talk by Meta and DeepSeek.