• FiniteBanjo
    link
    fedilink
    arrow-up
    11
    ·
    9 months ago

    Why would that need to be proven? We’re the sample data. It’s implied.

      • FiniteBanjo
        link
        fedilink
        arrow-up
        7
        ·
        9 months ago

        What you’ve described would be like looking at a chart of various fluid boiling points at atmospheric pressure and being like “Wow, water boils at 100 C!” It would only be interesting if that somehow weren’t the case.

        • jarfil@beehaw.org
          link
          fedilink
          arrow-up
          5
          arrow-down
          1
          ·
          9 months ago

          Where is the “Wow!” in this post? It states a fact, like “Water boils at 100C under 1 atm”, and shows that the student (ChatGPT) has correctly reproduced the experiment.

          Why do you think schools keep teaching that “Water boils at 100C under 1 atm”? If it’s so obvious, should they stop putting it on the test and failing those who say it boils at “69C, giggity”?

          • FiniteBanjo
            link
            fedilink
            arrow-up
            4
            ·
            9 months ago

            Derek feeling the need to comment that the bias in the training data correlates with the bias of the corrected output of a commercial product just seemed really bizarre to me. Maybe it’s got the same appeal as a zoo or something, I never really got into watching animals be animals in a zoo.

            • jarfil@beehaw.org
              link
              fedilink
              arrow-up
              3
              arrow-down
              1
              ·
              9 months ago

              Hm? Watching animals be animals at a zoo, is a way better sampling of how animals are animals, than for example watching that wildlife “documentary” where they’d throw lemmings of a cliff “for dramatic effect” (a “commercially corrected bias”?).

              In this case, the “corrected output” is just 42, not 37, but as the temperature increases on the Y axis, we get a glimpse of internal biases, which actually let through other patterns of the training data, like the 37.

    • EatATaco@lemm.ee
      link
      fedilink
      English
      arrow-up
      6
      arrow-down
      1
      ·
      9 months ago

      “we don’t need to prove the 2020 election was stolen, it’s implied because trump had bigger crowds at his rallies!” -90% of trump supporters

      Another good example is the Monty Hall “paradox” where 99% of people are going to incorrectly tell you the chance is 50% because they took math and that’s how it works.

      Just because something seems obvious to you doesn’t mean it is correct. Always a good idea to test your hypothesis.

      • FiniteBanjo
        link
        fedilink
        arrow-up
        1
        ·
        8 months ago

        Trump Rallies would be a really stupid sample data set for American voters. A crowd of 10,000 people means fuck all compared to 158,429,631. If OpenAI has been training their models on such a small pool then I’d call them absolute morons.

        • EatATaco@lemm.ee
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          1
          ·
          8 months ago

          A crowd of 10,000 people means fuck all compared to 158,429,631.

          I agree that it would be a bad data set, but not because it is too small. That size would actually give you a pretty good result if it was sufficiently random. Which is, of course, the problem.

          But you’re missing the point: just because something is obvious to you does not mean it’s actually true. The model could be trained in a way to not be biased by our number choice, but to actually be pseudo-random. Is it surprising that it would turn out this way? No. But to think your assumption doesn’t need to be proven, in such a case, is almost equivalent to thinking a Trump rally is a good data sample for determining the opinion of the general public.