Nostr Archives
Owner_of_donky · 23d ago
Hey, does anyone want to group-buy 8 DGX Sparks to host a private LLM just for us? https://www.youtube.com/watch?v=QJqKqxQR36Y

Replies (3)

HERMETICVM · 23d ago
Wouldn't scale to many users, unfortunately. With some optimization, 2-4 users would get up to 20 tokens/s for most queries. That isn't great, especially since you can't branch out individual agents and you're bound by the hardware constraints. Hardware costs, energy use and maintenance would make this a money pit, I fear. Adding more nodes didn't significantly bump token rates. Still, just being able to self-host such a model unquantized on consumer hardware alone would have been unthinkable 1-2 years ago, even with various hacks (like offloading to SSDs). I hope we'll see small groups of people hosting private LLMs sustainably, though. Trusted circles and their oracles, basically. 🌚
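A rough back-of-envelope sketch of the "money pit" argument follows. It is a minimal model that assumes hardware is amortized linearly, the cluster runs 24/7 at a constant average draw, and costs are split evenly between users. `ClusterCosts` and `cost_per_user` are hypothetical names. Every number you feed in is your own placeholder, not a real DGX Spark price or power figure.

```python
from dataclasses import dataclass


@dataclass
class ClusterCosts:
    # All fields are hypothetical inputs; plug in your own quotes and tariffs.
    nodes: int
    price_per_node: float         # purchase price per node
    amortization_months: int      # months over which the hardware cost is spread
    watts_per_node: float         # assumed constant average draw
    price_per_kwh: float          # electricity tariff
    maintenance_per_month: float  # upkeep, spares, hosting, etc.

    def monthly_cost(self) -> float:
        # Linear amortization of the purchase price.
        hardware = self.nodes * self.price_per_node / self.amortization_months
        # Assume a 30-day month with the cluster running continuously.
        hours = 24 * 30
        energy_kwh = self.nodes * self.watts_per_node * hours / 1000
        return hardware + energy_kwh * self.price_per_kwh + self.maintenance_per_month


def cost_per_user(costs: ClusterCosts, users: int) -> float:
    """Split the total monthly cost evenly across the group."""
    if users <= 0:
        raise ValueError("users must be positive")
    return costs.monthly_cost() / users


if __name__ == "__main__":
    # Placeholder inputs only, not measured or quoted figures.
    example = ClusterCosts(nodes=8, price_per_node=4000, amortization_months=36,
                           watts_per_node=200, price_per_kwh=0.35,
                           maintenance_per_month=50)
    print(round(example.monthly_cost(), 2), round(cost_per_user(example, 4), 2))
```

Take those placeholder inputs: 8 nodes at 4,000 each over 36 months, 200 W per node, 0.35 per kWh and 50 per month upkeep. They come to about 1,342 per month, or roughly 336 per user when only 4 people share the box.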
HERMETICVM · 23d ago
(I'm mostly following this from the sidelines. Unfortunately I don't have the money or space to hack around with this frontier stuff, so I'm not an expert. Germany getting its pipelines bombed already makes even simpler self-hosting quite expensive. I don't know if there's still optimization potential on the software side, or if the token rate is throughput- and latency-bound.)
HERMETICVM · 23d ago
Considering Qwen3.5's architecture, this might be able to handle 10-20 concurrent users without lowering the token rate, but even that won't make you break even.
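The "more users, same token rate, still no break-even" point can be sketched with a toy model. Here each user keeps the single-stream rate until a hypothetical aggregate throughput cap is reached, and after that the cap is shared evenly. The number of paying users needed is just the monthly cost divided by a hypothetical per-user fee, rounded up. This is not a measurement of Qwen3.5 or of DGX Spark batching behavior. `per_user_tokens_per_s`, `users_to_break_even` and all inputs are illustrative assumptions.

```python
import math


def per_user_tokens_per_s(users: int, single_stream_tps: float,
                          aggregate_cap_tps: float) -> float:
    """Toy model: each user gets the single-stream rate until the cluster's
    (hypothetical) aggregate throughput cap is hit, then the cap is split evenly."""
    if users <= 0:
        raise ValueError("users must be positive")
    return min(single_stream_tps, aggregate_cap_tps / users)


def users_to_break_even(monthly_cost: float, fee_per_user: float) -> int:
    """Smallest number of paying users whose monthly fees cover the monthly cost."""
    if fee_per_user <= 0:
        raise ValueError("fee_per_user must be positive")
    return math.ceil(monthly_cost / fee_per_user)


if __name__ == "__main__":
    # Hypothetical inputs: 20 tok/s per stream, 400 tok/s aggregate cap.
    for n in (4, 20, 40):
        print(n, per_user_tokens_per_s(n, 20, 400))
    # Hypothetical 1,342/month running cost and a 20/month fee per user.
    print(users_to_break_even(1342, 20))
```

With these made-up inputs, 20 users still each see 20 tokens/s and 40 users drop to 10. Covering a 1,342 monthly bill at 20 per head, though, would take 68 paying users, well beyond the 10-20 the box could serve at full speed.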