Dojo D1 sucks based on published numbers...
Did some rough ballpark math, and while I probably got some things wrong, Dojo D1 looks super uncompetitive.
It's got a ton of cache with a ton of local bandwidth, which is nice, and a ton of interconnect bandwidth... But it has no DRAM, so every DRAM access has to go out over that interconnect with horrible latency.
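For intuition on why that hurts, here's a toy streaming model. Every number in it is an illustrative placeholder, not a real Dojo (or Nvidia) spec — the point is just that once the working set spills out of on-die SRAM, per-request interconnect latency dominates:

```python
# Toy model: time to stream a tensor from local SRAM vs. from remote
# memory reached over the interconnect. All numbers are illustrative
# placeholders -- substitute real figures from the spec sheets.

def stream_time_us(bytes_needed, bandwidth_gbs, latency_us, chunk_bytes):
    """Time (us) to fetch `bytes_needed` in `chunk_bytes`-sized requests
    over a link with the given bandwidth (GB/s) and per-request latency."""
    n_chunks = -(-bytes_needed // chunk_bytes)          # ceiling division
    transfer_us = bytes_needed / (bandwidth_gbs * 1e3)  # GB/s -> bytes/us
    return transfer_us + n_chunks * latency_us

working_set = 2 * 1024**3        # 2 GiB of weights/activations (placeholder)
sram_capacity = 440 * 1024**2    # on-die SRAM capacity (placeholder)

# Local SRAM: huge bandwidth, tiny latency (placeholder values).
local = stream_time_us(sram_capacity, bandwidth_gbs=10_000,
                       latency_us=0.01, chunk_bytes=64 * 1024)

# The spillover crosses the interconnect: less bandwidth, far more latency.
spill = working_set - sram_capacity
remote = stream_time_us(spill, bandwidth_gbs=900,
                        latency_us=2.0, chunk_bytes=64 * 1024)

print(f"fits in SRAM: {working_set <= sram_capacity}")
print(f"local stream:  {local:10.1f} us")
print(f"remote stream: {remote:10.1f} us  (latency term dominates)")
```

With these made-up numbers the remote stream is hundreds of times slower than the local one, almost entirely from the per-request latency term — which is the structural problem with having no local DRAM at all.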
Cost isn't too bad at first glance, but performance per dollar is worse than the most expensive Nvidia GPUs while being far less capable. Performance per watt is worse than just buying consumer cards like the 4090 or 7900 XTX.
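In the spirit of the ballpark math above, here's the kind of back-of-the-envelope script I mean. The TFLOPS/TDP figures are roughly the published ones (the Dojo tile numbers are from Tesla's AI Day presentation as I recall them), but every price — and especially the Dojo price — is my own guess, so treat the output as illustrative only:

```python
# Back-of-the-envelope perf-per-watt and perf-per-dollar comparison.
# TFLOPS and watts are approximate published figures; prices are
# guesses (the Dojo tile price is pure speculation).

accelerators = {
    # name:               (dense BF16 TFLOPS, watts, est. price USD)
    "RTX 4090":           (330,    450,   1_600),   # MSRP
    "RX 7900 XTX":        (123,    355,   1_000),   # MSRP
    "H100 SXM":           (989,    700,  30_000),   # street-price guess
    "Dojo tile (25x D1)": (9_000, 15_000, 200_000), # price pure speculation
}

def perf_ratios(specs):
    """Return {name: (TFLOPS per watt, TFLOPS per $1000)}."""
    return {name: (tflops / watts, tflops / (price / 1_000))
            for name, (tflops, watts, price) in specs.items()}

for name, (per_watt, per_kusd) in sorted(
        perf_ratios(accelerators).items(),
        key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:20s} {per_watt:5.2f} TFLOPS/W  {per_kusd:7.1f} TFLOPS/$1k")
```

Even with generous guesses, the tile-level numbers put Dojo below a 4090 on TFLOPS per watt and nowhere near consumer cards on TFLOPS per dollar — swap in your own figures and see.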
- If you want the best possible performance or the most RAM: AMD MI300X or Nvidia H100/H200, with the most capable hardware and software.
- If you want the cheapest hardware per int16 performance: 4090 or 7900 XTX.
- If you want the most cost-effective compute GPU: the Intel Max 1550.
- If you want the most power-efficient compute GPU: the H100 or H200.
- If you want the most cost-effective and power-efficient AI accelerator: the Intel Gaudi 2, and probably the Intel Gaudi 3 soon.
- If you want the most cost-effective and scalable cloud AI accelerator: some form of Google TPU.
But just based on published specs, Dojo gives you the performance per dollar of the most expensive Nvidia GPUs with the performance per watt of consumer GPUs that are handicapped for machine learning. And you don't have any DRAM, the programming model is that of an AI accelerator (possibly worse than Intel Gaudi or Google TPU) and not a GPU, and its functionality is that of an AI accelerator and not a GPU.
Just completely uncompetitive with a weird architecture.
All of which leaves aside the fact that data and training throughput are so obviously not the issue holding up self driving in the first place. Machine learning as we know it today just isn't capable of replicating all the judgements and conceptual generalizations that even a teenage driver brings to bear on each moment on the road, and no amount of matrix multiplications will change that given today's algorithms. Some of these things eke through in vague, ghost-like form given enough data, but none even come close to the precision and reliability required to do the job in the complete absence of human supervision, which is exactly what "robotaxi" means.
It's why Waymo is nowhere close to losing their human oversight teams, why Cruise is a hot mess, and why, of course, even the latest FSD build does absolutely bonkers shit in my own Tesla and requires vigilant observation, years and years and years and years into its development. This will be the inescapable reality of "self driving" until at least a couple more research-level breakthroughs come down the pike. This is not an engineering problem.
I’m not arguing that it is a direct engineering problem; that kind of solutionist thinking drives me nuts. But the way companies like Waymo have gotten as far as they have (farther than Tesla) is that they started end to end in the simplest environments, letting their autonomous cars go as far as they could before needing intervention. That has let them push farther and farther through the red-alert moments and eventually expand into more and more complex environments.
By contrast, Tesla has never let their training data reach those red alert areas. They have tons of training data on what happens before intervention is needed… and none of the important data they need. They aren’t training self-driving, they’re training driver assist.
Apologies if I was unclear; my response was really just venting about the Tesla bros who know nothing about AI and think this is the kind of problem you just "keep working on" until reliable self-driving magically emerges, and who think stockpiling H1000's or whatever is some kind of representation of how close a company is to autonomous nirvana.
No prob. There is a lot of really frustrating belief that emergent solutions will solve things that might take… you know, more work than tweeting about how cool Elon is. It’s basically just believing in magic.
As a scientist, this is exactly how I feel about Neuralink…
The first time I ever heard about Neuralink was some jackass telling me about a presentation where they showed feedback from a pig’s brain as it ate… and then like, a dozen things Elon promised that were obviously (heh) hogwash. That was years ago and honestly an early indication I got that Elon is not to be trusted.
Not to mention that Teslas don't have enough sensors. You can't do this with only vision. You can't. You have to add radar, lidar, or both. Maybe infrared. And incorporate GPS and maps.
And have they thought of incorporating "common knowledge" from different areas of the country? For instance, when I drive to work, I get to one location where there's a Y. I turn on my right turn signal so the oncoming traffic (which has a stop sign) knows I'm going to the right fork of the Y and not the left, so they can go.
That's what many do here, because the roads are so winding and so narrow. We're helping our neighbors.
I've never done that in any other part of the country.
It's a tough gig. Chips will always get superseded by better, more cost-effective chips. Maintenance alone will be a challenge. I think Amazon, Microsoft, and Google are years ahead in this game.
Chips are just one part of it, too. They also have to own the entire toolchain for these custom chips.
It's not exactly an easy task to build and maintain custom PyTorch compilers for totally custom hardware, especially ones that are actually performant. You're rebuilding a significant part of CUDA, and for what?
How about an order of magnitude greater energy efficiency?
Probably a lot more than CUDA too. Just looking at Dojo's design, there are the DIPs (Dojo Interface Processors), custom NICs, and a custom Ethernet protocol too. There's going to be a whole mess of drivers and potentially a customized higher-level topological scheduler on top of all of it. It's an absolutely massive amount of work. It also isn't some startup website where you can just churn out questionable code, accrue tech debt, and hope it all works out. It needs to be lean and performant from the get-go to even work in a remotely effective manner.
One of the major reasons NVIDIA has been on top for as long as it has is that CUDA and its adjacent libraries are of such high quality, well documented, and fairly approachable for developers. NVIDIA also does a lot of primary research on GPGPU-friendly algorithms for a variety of tasks on top of that. Now, granted, Dojo's overall scope is a lot more limited, but Tesla's high turnover and smaller development teams for its AI projects are going to make building and maintaining all of this very difficult. I also wouldn't be surprised if they're in for another massive rewrite with those rumored D2 chips.
"You neglect the fact that we use special electrons that rotate at a speed of 420 rotations per picosecond with the axis at an angle of exactly 69 degrees from the Earth's meridians.
While the rest works with legacy quantum mechanics, we deal with a hardcore beta version we extracted from the simulation we live in."
Elon Musk (probably).
#nice
That’s a good analysis but Morgan Stanley says Dojo is worth $500 billion and it will be the next AWS https://www.bloomberg.com/news/articles/2023-09-11/tesla-to-surge-thanks-to-dojo-supercomputer-morgan-stanley-says
Morgan Stanley knows nothing about GPUs but a lot about speculation.
They know how to make money; they probably held call options when they made that statement.
They are the ones who have lent Elmo the majority of his lines of credit underwritten by (should be) worthless stock.
Apple on wheels, self-driving trillion-dollar market, robots, solar, now Dojo. The goalposts keep moving.
Morgan Stanley once said that reselling bits of shredded mortgages was the best way to remove risk from your balance sheet and (with the assistance of then-giants like Goldman Sachs, JP Morgan, Lehman Brothers, and AIG) kneecapped the entire world's economy.
Their judgment is less "good" and more "Oooh shiny!"
Dojo is a project that gets worse and worse over time as they are unable to keep up with Nvidia (and AMD and Google and Amazon).
Tesla spends a pittance on these projects and then shockingly they're a mess and fail to keep up.
Kind of the same way they work against CATL et al.
Yeah, Dojo would've been good in like 2020. In 2023/2024 it's just painfully behind the times, and that's if the weird architectural decision to ship no DRAM doesn't sink the chip. On raw specs alone it isn't terribly competitive against 2023/2024 hardware, and on top of that it chose a weird programming model more akin to the Cell processor than a GPU.
They will never keep up with those companies
Does it even really exist? Is there any record of TSMC actually producing the Dojo chips?