• V0ldek@awful.systems
    11 hours ago

    Okay, I mean, I hate to somehow come to the defense of a slop company, but WSJ saying nonsense is really not their fault. Even that particular quote clearly says “DeepSeek said training one” cost $5.6M, and that’s just a true statement. No one in their right mind includes capital expenditure in that, the same way that saying “it took us 100h to train a model” doesn’t count the time spent building the data center.

    Regardless of whether they actually lied, it’s still immensely funny to me that they could’ve just told a blatant lie that nobody fact-checked, and it shook the market to the fucking core, wiping off like billions in valuation. Very real market based on very real fundamentals run by very serious adults.

    • ebu@awful.systems
      9 hours ago

      i can admit it’s possible i’m being overly cynical here and it is just sloppy journalism on Raffaele Huang/his editor/the WSJ’s part. but i still think that it’s a little suspect on the grounds that we have no idea how many times they had to restart training due to the model borking, or what other experiments and hidden costs there were, even before things like the necessary capex (which goes unmentioned in the original paper – though they note using a 2048-GPU cluster of H800s, which would put them at around $40m). i’m thinking in the mode of “the whitepaper exists to serve the company’s bottom line”

      btw announcing my new V7 model that i trained for the $0.26 i found on the street just to watch the stock markets burn
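
The figures argued over above can be put side by side with some quick arithmetic. This is only a rough sketch: the $5.6M run cost, the 2048-GPU cluster size and the ~$40m hardware estimate come from the thread, while the $2 per GPU-hour rental rate and the `wasted_fraction` parameter are assumptions made up for illustration, not numbers anyone here has confirmed.

```python
# Back-of-the-envelope comparison: reported training-run cost vs. cluster capex.

# Figures quoted in the thread.
HEADLINE_RUN_USD = 5.6e6     # reported cost of one training run
GPU_COUNT = 2048             # H800 cluster size
CAPEX_ESTIMATE_USD = 40e6    # rough hardware estimate

# Assumption (not from the thread): a flat rental price per GPU-hour.
ASSUMED_USD_PER_GPU_HOUR = 2.0


def implied_gpu_hours(run_cost_usd: float, usd_per_gpu_hour: float) -> float:
    """GPU-hours that the run cost would buy at the given rental rate."""
    return run_cost_usd / usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Days of wall-clock time if every GPU in the cluster is busy throughout."""
    return gpu_hours / gpu_count / 24


def cost_with_overhead(run_cost_usd: float, wasted_fraction: float) -> float:
    """Run cost inflated by a hypothetical fraction of extra compute spent on
    restarts and side experiments (0.5 means 50% extra)."""
    return run_cost_usd * (1 + wasted_fraction)


if __name__ == "__main__":
    hours = implied_gpu_hours(HEADLINE_RUN_USD, ASSUMED_USD_PER_GPU_HOUR)
    print(f"implied GPU-hours:       {hours:,.0f}")
    print(f"wall-clock days:         {wall_clock_days(hours, GPU_COUNT):.1f}")
    print(f"capex per GPU (USD):     {CAPEX_ESTIMATE_USD / GPU_COUNT:,.2f}")
    print(f"capex / headline run:    {CAPEX_ESTIMATE_USD / HEADLINE_RUN_USD:.1f}x")
    print(f"run + 50% overhead (USD): {cost_with_overhead(HEADLINE_RUN_USD, 0.5):,.0f}")
```

Under those assumptions, the headline figure is a rental-style price for the final run only. Restarts and side experiments scale it by whatever `wasted_fraction` really was, and the hardware estimate sits on a separate line entirely.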

      • V0ldek@awful.systems
        1 hour ago

        “but i still think that it’s a little suspect on the grounds that we have no idea how many times they had to restart training due to the model borking, other experiments and hidden cost”

        Oh ye, I totally agree on this one. This entire genAI enterprise insults me on a fundamental level as a CS researcher: there’s zero transparency or reproducibility, no one reviews these claims, and it’s a complete shitshow, from terrible, terrible benchmarks, through shoddy methodology, up to untestable and bonkers claims.

        I have zero good faith for the press, though. They’re experts in painting any and all tech claims in the best light possible, like their lives fucking depend on it. We wouldn’t be where we are right now if anyone at any “reputable” newspaper like WSJ had asked Sam Altman one (1) question like 3 years ago.