What Is Going On With Tesla's Dojo? - CleanTechnica
My guess based on his extensive track record:
NOT MUCH...
As dead as a dojo.
Finally, some people are waking up to the fact that this was bullshit from the very beginning. A bigger computer was never going to solve FSD. Computation is not the problem, and it is certainly not a problem if you can throw billions of dollars at it and rent cloud computing at pretty much any scale you want. From a cost perspective it is almost always cheaper to rent your computing resources unless you require your cluster to run 24/7/365.
Doing something sensible with all that computing power at your disposal, that is the hard part. And this is where the problem comes in. Like many machine learning problems, FSD is fundamentally an ill-defined problem. At the end of the day you are minimizing a loss function. The question is: what loss function is a good one to minimize for FSD? Is your dataset reasonably complete? How do you deal with the imbalance in your dataset? Often you have instances in your dataset that are rare but important to handle correctly. And of course "handle correctly" is not exactly a well-defined term in the context of self-driving cars.
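Since the thread never pins down what "dealing with imbalance" looks like in practice, here is a minimal hypothetical sketch of one standard technique: weighting rare-but-critical classes more heavily in the loss. The class names, weights, and toy numbers are all made up for illustration.

```python
import numpy as np

# Hypothetical sketch: weighted cross-entropy to counter class imbalance.
# A rare-but-critical class (say, "obstacle on road") gets a higher weight
# than a common one (say, "clear road"), so the optimizer can't ignore it.
def weighted_cross_entropy(probs, labels, class_weights):
    """probs: (N, C) predicted probabilities; labels: (N,) integer class ids."""
    eps = 1e-12  # avoid log(0)
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    weights = class_weights[labels]
    return np.sum(weights * per_sample) / np.sum(weights)

probs = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])
labels = np.array([0, 0, 1])
class_weights = np.array([1.0, 10.0])  # rare class 1 weighted 10x
loss = weighted_cross_entropy(probs, labels, class_weights)
```

The open question the comment raises still stands, of course: the hard part is choosing the weights and the loss itself, not computing them.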
Full self-driving has to operate on practical hardware in a car, processing real-time streams of video to understand the world around it. That sounds fairly computationally intensive to me. I take your point that the dataset is the key challenge, but the computation problem isn't trivial. Unsurprisingly, Tesla is not going to be better than Nvidia at creating the processors to do it. There are only so many problems their engineers can take on, and it's very evident they haven't mastered the core mission of producing a reliable lineup of new models and vehicles that can maintain the public's interest.
The problem for me is that, no matter how much video input there is, a ML model still won't know some things. For instance, when I'm driving on a road where I crest a hill and see there's traffic stopped near me, I stop at the crest of the hill. Why? Because then I'm warning others that traffic is stopped, and I'm protecting myself from getting slammed into the rear by an over-eager person cresting the hill.
Where I live, there are very windy roads, some of which have Y intersections. If I'm coming up on the Y, I'll let people know which way I'm going by turning on my turn signal (I have the right of way and go either path).
What happens in the snow? With a front wheel drive car, I "attack" hills when it's snowing. I start in a reasonable gear that I think gives me the high and low range so I don't have to switch gears mid-hill. I then "blast" up the hill so that I have confidence I can make it the whole way up.
Then, there's turn into the spin for wet roads/snow, slowing down when it's poured outside because of water on the roads, etc.
No matter how much video they analyze, they aren't going to decipher those rules.
The problem for me is that, no matter how much video input there is, a ML model still won't know some things. For instance, when I'm driving on a road where I crest a hill and see there's traffic stopped near me, I stop at the crest of the hill. Why? Because then I'm warning others that traffic is stopped, and I'm protecting myself from getting slammed into the rear by an over-eager person cresting the hill.
One thing that I noticed Autopilot struggled with was communication with pedestrians. Autopilot would see a human being jaywalk into the road and panic brake, but as another human being I can lock eyes with that person, see that they see me, see that they've stopped walking and are now looking behind me, and understand that they're waiting for me to drive past first. That communication can't happen with a "self-driving" car.
Then, there's turn into the spin for wet roads/snow
Agree with you on everything else, but this is one thing that it should be possible for software to do a pretty decent job of, if set up correctly.
I designed an autonomous control system for a racing car at uni a decade ago, and it didn't require any vast amount of computing power, just an absolute buttload of sensors. That thing could drift figure 8s pretty reliably when we tried that, and there was no ML model involved. Whether Teslas have the sensor suite they'd need to make that work I have no idea, but it's certainly very achievable.
Now, predicting upcoming grip levels visually? That's a much harder challenge.
To be clear, there are more sensors and feedback in even a Tesla than Musk's commentary indicates. There are all sorts of kinematics sensors, there's feedback on wheel speed, and so on. Calling it "vision only" is a misnomer; even after all its removals, Tesla doesn't rely on a single type of sensor either. Similarly, all vehicles have map data of some kind despite what Musk tends to imply, and it's more a matter of fidelity than anything else. In at least some cases the vehicle is also sampling driver inputs to try to build a rule set around them to understand the problem.
Another big thing is that in truth not all the rules and connections in these networks are completely free form either. Data scientist and ML engineers will design a wet surface detection network or a snowy road detection network and explicitly feed its results into the control and planning system. Similarly map data and expectations from prior state flow into a lot of planning and prediction components.
There's still a lot of human-designed structure in neural networks, and that's another big thing that impacts their performance and the abstractions they come up with internally. Which is another thing worth mentioning: the only reason you use a neural network is to abstract and estimate a problem that would otherwise be computationally intractable. They aren't necessarily going to arrive at the same abstractions and concepts that humans use by simply analyzing data, and that's one of the reasons hand-coded algorithms, relevant sensor inputs, and explicit inputs are preferable to just making a really big NN with a lot of hidden state. A lot of this is basic KISS-principle stuff that Musk hates, but it's way easier to just plug in data from an acceleration sensor than to go through the computational complexity of training a system to realize, from a bunch of images and video, that acceleration is an important concept that can be used in solving the problem.
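To make the "just plug in the accelerometer" point concrete, here is a hypothetical toy sketch of fusing an explicit sensor reading with learned vision features instead of forcing a network to rediscover acceleration from pixels. Every name and shape here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_backbone(frame):
    # Stand-in for a vision network: collapse a frame to per-channel features.
    return frame.mean(axis=(0, 1))

frame = rng.random((64, 64, 3))            # fake camera frame
imu_accel = np.array([0.1, 0.0, -9.8])     # direct IMU reading, "for free"

vision_features = image_backbone(frame)               # learned abstraction
fused = np.concatenate([vision_features, imu_accel])  # explicit + learned input
# A downstream planner would consume `fused`; the IMU term costs nothing to
# obtain, whereas learning acceleration from video alone is expensive and
# indirect.
```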
Where things get really difficult is in communicating or coordinating with actors that are external to the system, dealing with partial or absent information, larger amounts of uncertainty, and the reality that you'll have to deal with adversarial, indifferent, or outright malicious actors who ignore the rule set that's supposed to apply to the problem and implement their own.
There's just a ton of misrepresentation out there of how these systems work, what engineers do, how far you can get by just stuffing data into them, the overall value of certain kinds of data, and how much data it takes to make a marked improvement past a certain point. As an example, you could have a billion miles' worth of data traveling down a straight highway and it would be of no value in dealing with a crowded urban area with lots of pedestrians and cyclists. A lot of work goes into building a representative data set, or one that weights harder situations higher.
Overall there's just a lot of actual work beyond collecting data that goes into these systems and that's a big part of why data scientists and ML engineers are paid so well.
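As an illustration of that re-weighting work (all scenario labels, counts, and difficulty weights invented here), a sketch of sampling training examples by difficulty rather than raw fleet frequency:

```python
import numpy as np

rng = np.random.default_rng(42)

# Raw fleet data is heavily skewed toward easy highway miles; hand-assigned
# difficulty weights (made up for this example) boost the rare hard cases.
scenarios = np.array(["highway", "urban_pedestrians", "cyclist_merge"])
raw_counts = np.array([1_000_000, 5_000, 2_000])
difficulty = np.array([1.0, 200.0, 400.0])

sample_weights = raw_counts * difficulty
sample_probs = sample_weights / sample_weights.sum()

# Draw a training batch: rare urban scenarios now dominate far beyond their
# raw share of the fleet data.
batch = rng.choice(scenarios, size=10_000, p=sample_probs)
urban_share = float(np.mean(batch == "urban_pedestrians"))
```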
There's just a ton of misrepresentation out there of how these systems work, what engineers do, how far you can get by just stuffing data into them, the overall value of certain kinds of data, and how much data it takes to make a marked improvement past a certain point. As an example, you could have a billion miles' worth of data traveling down a straight highway and it would be of no value in dealing with a crowded urban area with lots of pedestrians and cyclists. A lot of work goes into building a representative data set, or one that weights harder situations higher.
As a former data wrangler (in neuroscience research), this is very true and something I think Elon fundamentally does not understand due to his extremely surface-level knowledge of, well, everything. In his mind, more computing power, more data, and more time will somehow magically arrive at the singularity of autonomous driving. I'm of two minds about whether he still believes it's actually possible or it's just a hustle now ("v13 is the one guys, you're gonna be blown away!" - Whole Mars Catalogue, probably), but he was extremely well aware we were nowhere near that point in 2016 when he presented it as a solved problem.
I really just think he doesn't understand ML and AI at all, and despite all his bluster has put pretty much no effort into learning even the basics. You'll see him making weird comments in interviews, both past and present, where he's not only incapable of answering technical questions but will sometimes try anyway and say something either nonsensical or that belies the fact he doesn't really understand how it differs from traditional software coding. It's been especially funny to me recently because they keep going on about how FSD V12 removed a lot of lines of C++ code, but you can literally see him telling Lex Fridman not too terribly long ago that the solution to FSD is going to involve "a lot of lines of code, that's for sure." So we have a literal contradiction coming from him at this point, and one that anyone who understands the code involved in training and utilizing NNs knew was nonsense the second he uttered it. There's a ton of these little things, like him claiming "Tesla is the best at inference" in the most recent conference call without even attempting to quantify or qualify the statement in any way. My best guess is that at some point the actual engineers tried to explain that HW3 is bottlenecked and that as a result Tesla puts a lot of effort into optimizing their networks through stuff like supernet optimization (they literally acquired a company that focused on that type of optimization, after all). But Musk himself basically said "we're the best at math" because he doesn't understand any of the details involved.
With Dojo and HW4, I also think part of it is just to deliver something tangible. I mean, we don't have a car that can drive itself, but hey, here's a chip we spent some money on, and obviously more processing power couldn't hurt, right? Same goes for the frequent major version bumps and voluminous release notes. It's all there to give the illusion of more substantial progress than is actually being made. At this point they're just in a prison of their own design, without any choice other than to hope some breakthrough comes along or that HW4 or 5 will be able to fill in the gaps in some way that can be backported to older models with minimal expense. Until then it's just a matter of keeping everyone distracted with the idea that things are indeed getting better every day.
Your whole first para, that's been his MO for his entire career. That he's got this far with it is pretty astounding. Whenever he speaks on something that you have some knowledge in, it's clear that he doesn't have even a surface-level understanding of a lot of it. He boldly gave himself the title of Chief Engineer (title, not job description) at SpaceX when he doesn't understand the first fucking thing about rocketry and aeronautics. Whenever he does manage to get in control and make decisions, overriding the objections of people who know what they're doing, we get shit like the Cybertruck and the insistence on not having a flame trench or deluge system for Starship launches because 'we won't have those on Mars' (fucking what? Seriously??). He keeps getting away with flagrantly lying about future features with 'we know he makes overly optimistic claims/timings, it's just what he does', with some trying to make the claim it's because 'he sees so much more of the bigger picture and doesn't think in normal timescales.' No mate, it's called being a lying huckster.
Yeah, that seems to be the sentiment whenever he talks about anything technical. What's just crazy is that no one really pushes back against it publicly, and everyone still treats him like some kind of universal expert on everything. Obviously CEOs don't need to know all the technical details either, but when he openly appoints himself Chief Engineer as you mentioned, and constantly sits around talking about how what he does all day is engineering and how he hates the corporate side, it's just insulting.
Ask anyone who was around the PayPal era before he got the boot, he was this annoying even then.
His whole thing with "Linux servers are stupid, I'm CEO, and I say we use Windows" is hilarious. Imagine that conversation? We're not talking someone who didn't want to pay a RHEL license and just use CentOS or something, we're talking a "super hardcore technical" guy who couldn't just admit he hadn't learned what Linux is.
Now apply that to EVERYTHING in his life.
Why would he constantly be getting divorced, I wonder?
He also doesn't know how to use GitHub, which was why he was asking people at Twitter to print their fucking code out for him for the weekly code review. Now, I'm not a coder, but even I know that is beyond parody. Silicon Valley was a TV show, not a documentary lol
Every time I hear about machine learning solving self-driving, I think about the time during Big Dig construction, in heavy snowfall, when a cop directed me to go the wrong way down a one-way street because of rerouted traffic.
Sure, inference has to be relatively efficient. But Dojo has absolutely nothing to do with it. It would only be useful for training.
Which incidentally is another reason why the whole premise of Dojo was stupid from the start. You cannot make the model arbitrarily big. As you said, the model needs to be fast enough to be executed many times per second on relatively cheap hardware. And that is also mainly an algorithmic problem, not a compute problem.
The current architecture might be computationally demanding now, but the problem is that even with more power and data, it won't be solved. A human easily transitions from walking to riding a bicycle to driving a car or a boat. We have broader concepts because we have general intelligence.
Those are all real problems but there is one which is more fundamental yet.
The entire field (not just Karpathy and the remaining FSD team) has convinced itself it is replicating the human brain when, at best, it has built something inspired by it.
Sure. I agree. But I think in relation to driving - which you and I and all of us do without thinking much - the problem is more bounded. That said, it's clear we bring biases based on our understanding of context that a model has yet to replicate. Does it need the complexity of a human brain? I don't think so. Machines have the advantage over us of incredible processing speed, but their perception is inferior to ours. It can't just be 5 fixed cameras. It's something that hasn't been replicated yet. Why? Because our eyes are optically imperfect instruments set close together, and yet they give us perception of depth. Our understanding of a highway, what should be on it, and whether something is moving or fixed is perceptually superior. So far. Our read of a mother with a pram crossing a road, or how a cyclist is erratically weaving, is superior. And Elon insists that standard cameras can do it because we can do it. They're assuming it's solvable with vastly less complex instruments behind the wheel. I guess what I'm saying is they need lidar - something we can't do, an advantage we don't have - to solve the problem. Because we're actually really good at driving.
They think it's solvable because they believe they've recreated how the human brain works, which they haven't. They believe they are on the path and that they either need more compute and/or more data.
I do take one exception to your general point that we're pretty good at driving. Perhaps a young, attentive person is good at it compared to a machine. But as a whole (everyone who drives and interacts with each other, as well as non-driving road users), we're not. It's one of the leading causes of preventable injury and death, which is the whole reason big money is going into replacing us as drivers.
What bothers me, though, is that all this is under the guise of road safety, when self-driving in the US has actually set road safety back. Other countries continue to implement road design changes that actually reduce accidents and are seeing real results, while the US continues to double down on bad design generally and tries to bandage over its deficiencies with tech in new cars.
The comment on humans being good at driving is derived from stats per million miles driven (12 accidents, 1.38 fatalities) that appear to be superior to some AI driving - although I base that on articles that are 3 or 4 years old, so AI may generate better stats these days. Obviously humans drive billions of miles in all conditions and on all roads, whereas the dataset for AI is much smaller and less representative of all conditions/scenarios. But I think it's notable that some of the self-driving startups like Aurora still emphasize that under certain conditions humans must take over from their software.
I agree with your general point that better roads and infrastructure are a better way to reduce fatalities than self-driving AI. But self-driving AI is not government-mandated as a way of papering over the cracks. It's our tech industry looking to make a buck.
Science, of all sorts, generally disregards edge cases and removes them from the data pool to avoid messing up analyses and such. That's not really something you can do in this circumstance. So. Then what?
I would argue that their technical approach to the problem is fundamentally flawed and rests on assumptions with no basis in reality, and that FSD is not solvable the way they’re trying to solve it.
I like how these guys are shocked when he admits he lies, despite him doing it time and time again over the years. Keep in mind Musk and Tesla management pulled this exact same stunt with the 4680 too. It went from being industry-leading technology that was going to blow everyone out of the water to just being a template and a hedge against getting cut out by suppliers - it wouldn't matter in the end so long as they got the cells. Of course even that statement was bullshit, as we found out when the Cybertruck was released with its heavily nerfed range because of the assumptions that were made around cell weight and energy density.
Dojo was the same thing. A hugely exotic architecture built with all these cutting-edge fabrication techniques that did nothing to address the overall challenges of the problem. Tesla thought they could duplicate the success they had with their initial FSD chips in switching from a more traditional and dated MIMD GPU architecture to a dedicated tensor accelerator, but completely missed the fact that NVIDIA had already changed their designs to do that, and that by the time they would actually have chips fabricated and delivered, NVIDIA and AMD would be mass-producing alternatives that were far more performant.
I'm sure in 5 years these same guys are going to be shocked that large castings, structural batteries and their huge panel proposal for their unboxed fabrication are all things that ruin the serviceability and repairability of their vehicles in a lot of situations and make them far more expensive to insure even if it makes them a bit cheaper to produce.
because of the assumptions that were made around cell weight and energy density.
I would go a step further and say that they did it intentionally through misleading graphics. I still remember their PowerPoint slide when they announced the new batteries, saying "50% more power" when comparing the old battery and 4680.
To anyone not familiar with battery sizes, it sounds like 50% more energy density. But if you calculate the volume of the batteries, it clearly shows they're just 50% bigger in size.
They intentionally use vague or misleading data, that's just correct enough that they can legally claim it is true, and let all the non-technical Tesla/Musk fans make assumptions and promote it via YouTube and other social media.
50% bigger or 150% of the size…
They did both; early on they had a slide saying the 4680 cells have 5X the energy of the prior 2170 cells, but they're literally about 5X the volume, so that's to be expected.
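The geometry backs that up: the cell names encode diameter and height in millimeters (46 mm x 80 mm vs 21 mm x 70 mm), so the volume ratio alone is about 5.5x.

```python
import math

# Cylinder volume from the cell naming convention: diameter (mm) x height (mm).
def cell_volume(diameter_mm, height_mm):
    return math.pi * (diameter_mm / 2) ** 2 * height_mm

ratio = cell_volume(46, 80) / cell_volume(21, 70)
# ratio comes out near 5.5, so "5X the energy" is roughly what the extra
# volume alone predicts, with no energy-density improvement required.
```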
But they also dedicated a lot of slides to showing how they were going to increase vehicle range and reduce cost per kWh and capex by over 50% in total through all these technologies they were hoping to deploy. Drew Baglino went so far as to literally say "This isn't stuff that will never make it out of the lab, these are things we're putting into batteries today," yet here we are almost 4 years later and Tesla is barely making any cells, and the cells they are making are still slightly worse than the same 2170 Panasonic cells they've been using for years.
I've heard one of the regular Tesla folks say that the 5X stock split increased Tesla's value five-fold, so the bar on understanding numbers can be very low.
I'm sure in 5 years these same guys are going to be shocked that large castings, structural batteries and their huge panel proposal for their unboxed fabrication are all things that ruin the serviceability and repairability of their vehicles in a lot of situations and make them far more expensive to insure even if it makes them a bit cheaper to produce.
Man, the logic pretzels they'll tie themselves into to defend whatever it is Tesla/Musk happen to be doing at the time as the good and correct way to do things, especially crediting Musk with being forward-thinking if it's a thing that other companies Do Not Do That Way For A Reason... only to seamlessly switch when the company line changes. They'll parrot whatever meaningless word-salad nonsense Elon has to say about it, and slam anyone who points out the emperor's lack of clothes.
But it's not a cult /s
It uses off-the-shelf Nvidia chips and currently sits at an unimpressive level, nowhere near the competition in performance and software. On the yearly call they basically reported "meh, it's there."
"Say what?!?!? This is not how Elon Musk was talking about Dojo a couple years ago! But even more than what was said, how Elon Musk talked about this — seemingly resigned to less than amazing results"
I'm going to help Zach out here. You see, when Musk was bragging about Dojo, that was before he dumped $39 billion in TSLA stock.
Damn that was a bunch of word salad
So at the 4Q meeting Dojo was a long shot and probably not happening, and 2+ weeks later it's a thing.
The Musk cult doesn't listen to earnings calls, nor read the investor reports. They rely exclusively on Musk tweets. If he says Dojo is the best AI computer in the world, it is, no question...
It wasn't long ago people were talking about Dojo being faster than Frontier, as if it already existed.
Facing the fact that it's all BS and they are just buying Nvidia cards, at an order of magnitude lower quantities than the competition, is too much for them...
Boy, I listened to the earnings call and seeing it all written out is somehow worse.
Another failed Musk project like the semi, bot, robotaxi, hyperloop etc. The radio silence is confirmation.
Also:
- saving Puerto Rico with Tesla batteries
- saving Flint's water supply with magical engineer thinking
- saving trapped kids in a cave with a submarine that can't fit in a cave
- marriage number one
- marriage number two/three
- marriage number four... etc etc
Tesla Dojo was going to change the world. It was Tesla’s secret sauce to help solve Full Self Driving, and general AI.
None of that makes any sense unless you know literally nothing about ML hardware. Dojo is the most overpumped TSLA pump by far (compared to its potential pay off). Way more than 4680, FSD, Bot, etc.
I feel like I'm taking crazy pills when people talk about Dojo. Nobody would ever suggest Google's TPUs are Google's secret sauce to domination... but somehow Dojo is worth endless speculation because a moron pumped it to a bunch of bleating laymen morons.
And at most they know about TPUs... I'm not so sure many of the Dojo bulls have heard of Amazon's Inferentia and Trainium, or that Microsoft and Facebook are also developing custom silicon...
Let's separate out the two systems. First, you need to train vision models - the models that read in still frames and label them for the ground truth system, the pathing system, etc. Nvidia is the right tool for this job. The V100, A100, and H100 are all great, and have been out for a long time. I knew Tesla was a fraud when Cruise had racked up almost $1 billion of V100 chips, and Tesla had invested only about $30 million in a much smaller cluster. This was around 2017.
Next, you need to run a large-scale simulation model of the real world, and ideally you would do 10,000 years of testing on a build. During this time, if there's a fatal accident the build fails. Dojo is a genius solution to this problem. Attach a RISC-V computer (a different architecture than Intel/AMD or ARM) to an FSD computer, and then trick the computer into thinking it's in a real car. Assemble 100,000 of these, and now you can do 10,000 years of testing in under a month. Your "gold release candidate" builds go through that month of testing before you put them on a real car. Then you roll it out slowly to the whole fleet.
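The arithmetic in that claim roughly checks out, just restating the comment's own numbers:

```python
# 100,000 simulator nodes each log 1/12 of a year of simulated driving
# per real month of wall-clock time.
nodes = 100_000
years_per_month = nodes / 12
# About 8,300 years per month; hitting 10,000 exactly would need roughly
# 120,000 nodes or modestly faster-than-real-time simulation.
```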
The problem is, it's really, really hard to make a new computer system. From the operating system, to the drivers, to the I/O bottlenecks. Not only do you have revisions of the FSD computer, but you also have revisions of the Dojo system architecture. Last time I checked, they were on their 3rd major revision of the Dojo system. The thing Nvidia solved is removing every bottleneck in the system so data flies through like shit through a goose. In particular, a lot of training relies on 8-bit floating point operations and a technique called quantization, which attempts to scale down a larger 16-bit calculation without losing meaningful precision in the data.
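For readers unfamiliar with the quantization step mentioned above, here is a minimal generic sketch (not Tesla's or Nvidia's actual scheme): map floats onto int8 with a single scale factor, then map back, with the round-trip error bounded by half the scale.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric quantization: one scale factor for the whole tensor.
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.3, -0.01], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(x - x_hat)))  # bounded by scale / 2
```

In real training stacks the scale is usually chosen per-tensor or per-channel from calibration data, but the core idea is this small.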
If all 50,000 people working at Tesla were just focused on building this system, it would still take them 8-10 years to put a robotaxi on the roads for testing. Then the requisite 5 years of permits and testing in a city like San Francisco. So basically it would be ready for rides in 2040. But they're not doing this.
Elon's style is to put a team of 10 on it, make them work 120 hours a week, and see if they can make a novel computer. Even the deployment pipeline automation system would take more people than that. Not to mention building a network fabric that can feed data to those cards at thousands of times normal network speed - Nvidia's answer there is InfiniBand.
Was going to reply but you nailed it perfectly.
What the Dojo team built is really interesting. But even they are clear about the decisions they made for throughput sake. I wish they were a whole other company, as their cred from previous roles, and interesting design choices, make for a compelling project.
But it's a project built on a flawed idea of magical vision processing. And it gets lapped up by Morgan Stanley and co who claimed it will unlock "$500B in value" (???).
Imagine being an actual ML engineer used to using CUDA and a badass A100/H100 rig in a standard HPC environment, and you start at Tesla and are told you can't use any of that, and have to learn a new system, job queues, etc., without ever saying "hey, this is dumb" out of risk of being fired if overheard by your weirdo boss's boss's boss.
I read the MS report (someone forwarded me the PDF), and the problem is they had a semiconductor stock analyst write it. So he just assumed "number of chips * price * throughput" and claimed TSLA would trump NVDA, overlooking Elon's famous fallacy: "all we need (now) is the software."
Mojo dojo casa AI
Wait till Tesla combines Dojo + Optimus + FSD and opens a portal in space-time to the dimension of infinite orgasms!
Tesla intern: "Umm... s-sir, isn't it about time we upgraded Dojo's OS?"
Musk: "But Windows 95 is the only thing I know how to use"
Intern: "It's just that we have this whole crate of 6 year old NvIdia GPUs..."
Musk: "No."
the actual truth
Without even reading the article I'll save everyone else a click: It doesn't exist. And even if it did, it's not going to fix the problem.
Elon is definitely the kind of CompSci 101 C-student who thinks throwing more computing power at an issue will surely fix it, but as a biomed graduate I can tell him that no matter how much money he wants to waste on it, we cannot at this point in time build a computer equal in processing power and complexity to the human brain. This is like the one time squishy science people get to feel smug over the hard science ones lol
I don't believe it was anything but hype
This is all vaporware, and Tesla never plans on doing anything with any of this. It is all just hype to boost the stock price. This is why you tie a CEO's compensation to fundamentals instead of stock price, as that encourages follow-through and seeing something to market. For example: the lowest amount that the top 5 AUTOMAKERS spend on R&D is GM with 5.02%. (https://www.nasdaq.com/articles/automobile-companies-and-rd%3A-top-5-spenders-2021-07-14) This is what it costs to be innovative, as A PURE AUTOMAKER, in the top 5 and bring new products to market. Tesla spent 4% in 2023. (https://fourweekmba.com/tesla-research-and-development-strategy/#:~:text=Tesla%20R%26D's%20costs%20have%20doubled,4%25%20in%202022%20and%202023.) Tesla is supposed to be A TECH COMPANY. Tech companies spend anywhere from 10-15% on R&D, because that is what it costs to be innovative and bring products to market in the tech space. Tesla is supposedly working on FSD, AI, robotics, EVs, AND is supposed to be a tech company, but they only spend a fraction of what the 5th highest automaker spends on R&D. It's because they are not innovating anything. It's all just hype.
I think I heard it’s gonna be the best thing ever.
many people have said it
Forward computation of a neural network isn't all that intensive; a Raspberry Pi 3 can do a decent job on a 30 fps video stream from a PiCam. It's the backpropagation, or learning, that is intensive (there are other learning algorithms, but that is the main one). You don't need to do that in the car though: do it in the cloud and pass the pretrained network to the cars OTA. The main issue is that it is supervised learning, so you need an external teacher (a human) to provide the training and verification data (the latter used to avoid overtraining). All kinds of possible objects you need the car to react to, manually labeled. Training data needs to be of decent quality too, with enough variety, or you get AI bias issues, like how AI struggled with recognizing the faces of Black people, etc.
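A rough back-of-envelope sketch of why that split works (toy layer sizes, and the 3x backprop factor is just the usual rule of thumb, not a measured figure): inference is a few matrix multiplies, while a training step costs roughly three forward passes' worth of FLOPs.

```python
import numpy as np

layers = [(3 * 224 * 224, 512), (512, 512), (512, 10)]  # toy network shapes

# Each dense layer costs ~2 * n_in * n_out FLOPs in the forward pass.
forward_flops = sum(2 * n_in * n_out for n_in, n_out in layers)
training_flops = 3 * forward_flops  # rule of thumb: backprop ~doubles the work

def forward(x, weights):
    # Inference itself: just matmuls plus ReLU, cheap enough for small devices.
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

rng = np.random.default_rng(1)
weights = [rng.standard_normal((i, o)) * 0.01 for i, o in layers]
out = forward(rng.standard_normal((1, layers[0][0])), weights)
```

Hence training in the cloud on big clusters and shipping only the frozen weights to the car's inference hardware.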