How do you create fully autonomous vehicles? Researchers and companies are split on this question, and the approaches range from pure cameras and computer vision to a combination of computer vision and advanced sensors.

Tesla has long been a strong advocate of pure vision-based autonomous driving, and Andrej Karpathy, its chief AI scientist, explained why at this year's Conference on Computer Vision and Pattern Recognition (CVPR).

Karpathy, who has led Tesla's self-driving efforts over the past years, spoke at the CVPR 2021 Workshop on Autonomous Driving. He explained how the company is developing deep learning systems that need only video input to make sense of the car's surroundings, and why Tesla is in the best position to make vision-based self-driving cars a reality.

I gave a talk at CVPR this weekend about our recent work at Tesla Autopilot, which uses neural nets to predict depth, velocity, and acceleration. The necessary ingredients: a 1M car fleet data engine, a strong AI team, and a supercomputer. https://t.co/osmEEgkgtL pic.twitter.com/A3F4i948pD (Andrej Karpathy, @karpathy, June 21, 2021)

General computer vision system

Deep neural networks are one of the key components of the self-driving technology stack. They analyze the car's camera feeds to detect roads, signs, cars, and obstacles.

But deep learning can also make mistakes when detecting objects in images. That is why companies such as Alphabet subsidiary Waymo use lidars, devices that create 3D maps of the car's surroundings with laser beams. Lidars provide additional information that can fill the gaps in the neural networks.

However, adding lidars to the self-driving stack presents its own challenges. Karpathy explained that you first need to map the environment with the lidar, then create a high-definition map and insert all the lanes and traffic lights. At test time, the car simply uses that map to navigate around.

It is very difficult to map every place the self-driving vehicle will travel. Karpathy said that building and maintaining high-definition lidar maps at that scale is not workable, and that keeping this infrastructure up to date would be extremely difficult.

Tesla does not use lidars or high-definition maps in its self-driving stack. Karpathy said that everything that happens, happens for the first time, in the car, based on the footage from the eight cameras that surround the vehicle.

The self-driving technology must figure out where the lanes are, where the traffic lights are, what their status is, and which ones are relevant to the vehicle, and it must do so without any predefined information about the roads it is navigating.

Karpathy acknowledged that vision-based autonomous driving is technically more challenging because it requires neural networks that work extremely well on video feeds alone. But once it works, he said, it is a general vision system that can be deployed anywhere in the world.

With a general vision system, you no longer need any additional gear on the car. And Tesla is already making progress in this direction, Karpathy said. The company's self-driving cars previously used a combination of radar and cameras, but it has now started shipping cars without radars. Karpathy said Tesla has removed the radar from these cars and is now driving on vision alone.
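To make the idea of a general, vision-only system more concrete, below is a minimal PyTorch sketch of a network that ingests frames from eight surround cameras and predicts per-pixel depth and velocity. This is an illustrative assumption, not Tesla's actual architecture; the module names, layer sizes, and output heads are invented for the example.

```python
# Minimal sketch (NOT Tesla's architecture): a vision-only network that fuses
# frames from eight surround cameras and predicts dense depth and velocity.
# All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CameraBackbone(nn.Module):
    """Small convolutional feature extractor shared across all eight cameras."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):          # x: (batch, 3, H, W)
        return self.net(x)         # (batch, out_channels, H/4, W/4)

class VisionOnlyPerception(nn.Module):
    """Fuses per-camera features and predicts depth and 2D velocity per pixel."""
    def __init__(self, num_cameras: int = 8, feat: int = 64):
        super().__init__()
        self.backbone = CameraBackbone(feat)
        self.fuse = nn.Conv2d(num_cameras * feat, feat, kernel_size=1)
        self.depth_head = nn.Conv2d(feat, 1, kernel_size=1)     # depth per pixel
        self.velocity_head = nn.Conv2d(feat, 2, kernel_size=1)  # (vx, vy) per pixel

    def forward(self, cameras):    # cameras: (batch, num_cameras, 3, H, W)
        b, n, c, h, w = cameras.shape
        feats = self.backbone(cameras.view(b * n, c, h, w))
        # Concatenate the eight cameras' feature maps along the channel axis.
        feats = feats.view(b, -1, feats.shape[-2], feats.shape[-1])
        fused = torch.relu(self.fuse(feats))
        return self.depth_head(fused), self.velocity_head(fused)

if __name__ == "__main__":
    model = VisionOnlyPerception()
    frames = torch.randn(1, 8, 3, 128, 256)    # one timestep from 8 cameras
    depth, velocity = model(frames)
    print(depth.shape, velocity.shape)         # (1, 1, 32, 64), (1, 2, 32, 64)
```

In a real stack the backbone would be far larger and the heads would operate on tracked objects rather than raw pixels, but the sketch shows the core point: every prediction comes from camera video alone, with no lidar or pre-built map as input.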
He added that Tesla's deep learning system has become 100 times better than the radar, and that the radar is now beginning to slow things down and contribute noise.

Supervised learning

The main argument against pure computer vision is the uncertainty over whether neural networks can do range-finding and depth estimation without help from lidar depth maps.

Karpathy noted that humans drive with vision, so our own neural network can process visual input to determine the depth and velocity of objects around us. The big question is whether synthetic neural networks can do the same. Based on the past few months of work, he said, the answer is a clear yes.

Tesla's engineers wanted a deep learning system that could detect the depth, velocity, and acceleration of objects. They decided to treat the challenge as a supervised learning problem, in which a neural network is trained on annotated data to learn to detect objects and their properties.

To train their deep learning architecture, the Tesla team needed a huge dataset of millions of videos, each carefully annotated with the objects they contain and their properties. Creating datasets for self-driving cars is especially difficult: engineers must include diverse road settings and edge cases that don't happen very often.

Karpathy said that when you have a large, clean, diverse dataset and train a large neural network on it, success is practically guaranteed.

Auto-labeled dataset

With millions of camera-equipped cars around the globe, Tesla is in a formidable position to gather the data needed to train its car vision deep learning model. The Tesla self-driving team accumulated 1.5 petabytes of data, consisting of one million 10-second videos and 6 billion objects annotated with bounding boxes, depth, and velocity.

Labeling such a dataset is a great challenge. One option is to have the data annotated manually through data-labeling firms or online platforms such as Amazon Mechanical Turk, but that would require a huge amount of manual work, could be very expensive, and would be slow.

Instead, the Tesla team used an auto-labeling technique that combines neural networks, radar data, and human review. Because the dataset is annotated offline, the neural networks can run the videos back and forth, compare their predictions with the ground truth, and adjust their parameters. This contrasts with test-time inference, where everything happens in real time and the deep learning models have no recourse.

Offline labeling also allowed the engineers to use powerful, compute-intensive object detection networks that cannot be deployed on cars or used in real-time, low-latency applications. And they used the radar sensor data to further verify the neural network's inferences. All of this improved the precision and accuracy of the labeling network.

Karpathy said that offline you have the benefit of hindsight, so you can calmly fuse the different sensor data, and you can also involve humans to do cleaning, verification, and editing.

According to the videos Karpathy showed at CVPR, the object detection network remains consistent through debris, dust, and snow clouds.
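To illustrate the kind of offline, hindsight-based fusion described above, here is a deliberately simplified Python sketch that smooths a noisy per-frame vision depth estimate over a whole clip and weights in sparse radar returns to produce a cleaner training label. The function, weights, and synthetic data are hypothetical and are not Tesla's labeling pipeline.

```python
# Illustrative sketch only (not Tesla's pipeline): offline "hindsight" auto-labeling
# of depth for one tracked object in a 10-second clip. A noisy per-frame vision
# estimate is smoothed over the full clip and fused with sparse radar ranges to
# produce a cleaner depth label for training. All numbers and weights are assumptions.
import numpy as np

def auto_label_depth(vision_depth, radar_depth, radar_weight=4.0, window=9):
    """vision_depth: (T,) per-frame depth from an offline vision model.
    radar_depth:  (T,) radar range, NaN where no radar return exists."""
    T = len(vision_depth)
    labels = np.empty(T)
    half = window // 2
    for t in range(T):
        # Hindsight: an offline labeler can look at past AND future frames.
        lo, hi = max(0, t - half), min(T, t + half + 1)
        vis = vision_depth[lo:hi]
        rad = radar_depth[lo:hi]
        rad = rad[~np.isnan(rad)]
        values = np.concatenate([vis, rad])
        # Radar returns, when present, count more than the vision estimates.
        weights = np.concatenate([np.ones(len(vis)), np.full(len(rad), radar_weight)])
        labels[t] = np.average(values, weights=weights)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_depth = np.linspace(40.0, 20.0, 100)                # object approaching the car
    vision = true_depth + rng.normal(0, 1.5, 100)            # noisy vision estimate
    radar = np.full(100, np.nan)
    radar[::10] = true_depth[::10] + rng.normal(0, 0.2, 10)  # sparse, precise radar hits
    print(np.round(auto_label_depth(vision, radar)[:5], 2))
```

The point of the sketch is the asymmetry Karpathy highlights: at labeling time the system can look forward and backward through the clip and lean on radar for verification, whereas the deployed network must make each prediction in real time from camera frames alone.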