tal

> If this is you, then build your own home server.

While I don’t disagree, there’s also a very considerable cost difference here between running locally and remotely.

If a user sets up an AI chatbot and their compute card averages a 24/7 load of 1% (say, one hour-long session per day with the card averaging 25% of its compute capacity during that session), then the hardware cost of a local setup is roughly 100x that of a remote setup that spreads load evenly across users, since one fully-utilized shared card can serve on the order of 100 such users.
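A minimal back-of-the-envelope sketch of that utilization arithmetic, using only the illustrative numbers above (one hour a day at 25% load; nothing here is measured):

```python
# Rough utilization arithmetic for the local-vs-shared comparison above.
# The numbers are the illustrative ones from the comment, not measurements.

session_hours_per_day = 1.0    # one chat session per day
session_utilization = 0.25     # card averages 25% of capacity during it

avg_load = (session_hours_per_day / 24.0) * session_utilization
print(f"average 24/7 load: {avg_load:.2%}")  # ~1%

# A provider keeping a card near full load can spread its cost across
# roughly 1 / avg_load users with this usage pattern.
users_per_shared_card = 1.0 / avg_load
print(f"users per shared card: {users_per_shared_card:.0f}")  # ~96, i.e. ~100x
```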

That is, if someone can find a commercial service that they can trust not to log the contents, the economics definitely leave room for that service to cost less.

That becomes particularly significant if one wants to run a model that requires a substantial amount of on-card memory. I haven’t been following closely, but it looks like the compute card vendors intend to use the amount of on-card memory to price-discriminate between the “commercial AI” and “consumer gaming” markets. That lets them charge a relatively large amount for a relatively small amount of additional on-card memory.

So an Nvidia H100 with 80GB onboard runs about (checks) $30k, and a consumer GeForce RTX 4090 with 24GB is about $2k.

An AMD MI300 with 128GB onboard runs about (checks) $20k, and a consumer Radeon RX 7900 XTX with 24GB is about $1k.
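Putting rough per-GB numbers on that memory premium, just dividing the list prices above by the on-card memory (prices as quoted above; street prices vary):

```python
# Rough $/GB of on-card memory from the prices quoted above.
cards = {
    "Nvidia H100 (80GB)":        (30_000, 80),
    "GeForce RTX 4090 (24GB)":   (2_000, 24),
    "AMD MI300 (128GB)":         (20_000, 128),
    "Radeon RX 7900 XTX (24GB)": (1_000, 24),
}

for name, (price_usd, mem_gb) in cards.items():
    print(f"{name}: ~${price_usd / mem_gb:,.0f} per GB")

# The datacenter cards come out several times more expensive per GB of
# memory, which is the price-discrimination-by-memory point above.
```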

That is, at current hardware pricing, it makes a lot of economic sense to time-share the hardware across multiple users.