One of the most significant recent advances in transportation technology is computer vision enabled by deep learning. It helps autonomous vehicles map their surroundings, identify objects precisely, and navigate the real world. This requires massive onboard computing power and is made possible by neural networks. The same technology also powers AI-driven apps on high-end consumer phones such as the iPhone. In this article, we will look at how the convolutional neural networks used in autonomous vehicles work.
What Are Neural Networks?
Neural networks are circuits designed to function like the neurons in a nervous system, and their basic unit is the neuron. A biological neuron comprises dendrites, a cell nucleus, and an axon: the dendrites take input, the cell nucleus processes it, and the axon transmits the processed output. Likewise, a node in a neural network takes multiple inputs, assigns a weight to each, and sums them to produce the net input. During training, the node adjusts the weights assigned to its inputs to move closer to the desired output; this is how the network learns. In a fully connected network, every neuron in one layer is connected to every neuron in the next layer.
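The behavior of a single node can be sketched in a few lines: a weighted sum of the inputs plus a bias, passed through a nonlinearity. This is a minimal illustration (the weights and ReLU activation here are illustrative choices, not a specific published model):

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Net input: weighted sum of the inputs plus a bias term
    net = float(np.dot(inputs, weights)) + bias
    # ReLU activation: pass positive net input through, clamp negatives to zero
    return max(0.0, net)

# Example: two inputs with weights the network would adjust during training
output = neuron([0.5, 0.3], [0.8, -0.2], 0.1)
```

During training, an optimizer would nudge `weights` and `bias` so that `output` moves toward the desired value.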
What Is A Convolutional Neural Network?
A neural network becomes a convolutional neural network when convolutional layers are added to it. In a convolutional neural network, low-level features such as edges are detected by layers close to the input, and deeper layers combine these to detect complex physical features such as a face. In short, convolutional layers give a neural network the ability to detect useful features. However, this can also leave the network tracking a very large number of features.
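To make edge detection concrete, here is a bare-bones 2D convolution sliding a small kernel over an image. The vertical-edge kernel and the toy image are illustrative; a real CNN learns its kernel values during training rather than having them hand-written:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every valid position and take the element-wise
    # product summed over the window (no padding, stride 1)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy 5x5 image with a vertical edge between its dark left and bright right half
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-written vertical-edge kernel: responds where brightness rises left-to-right
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)
out = conv2d(img, kernel)
```

The output is strongly positive where the window straddles the edge and zero in the flat regions, which is exactly the "feature map" a convolutional layer produces.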
A pooling layer typically follows a convolutional layer in a convolutional neural network; it reduces the size of the representation. It divides its input into regions and keeps only the strongest activation in each region. To see why shrinking the representation matters, consider fully connecting two layers: if the first layer has 1,000 nodes and the second has 500 nodes, there are 500,000 connections in total. Without pooling, the network tracks too many features and its computation slows down. Pooling also makes the detected features more robust to small shifts in the image. Finally, the fully connected layers at the end of the network use the features detected by the convolutional layers to classify the input into categories based on the dataset.
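The "keep the strongest activation per region" step is max pooling, and it can be sketched directly. This is a minimal NumPy version assuming non-overlapping square regions:

```python
import numpy as np

def max_pool(feature_map, size=2):
    # Split the map into non-overlapping size x size regions and keep
    # the maximum activation in each one, shrinking each dimension by `size`
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = region.max()
    return out

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 2],
               [7, 2, 9, 1],
               [3, 1, 4, 8]], dtype=float)
pooled = max_pool(fm)  # 4x4 map reduced to 2x2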
How Do Computers See An Image?
Computers view an image as three matrices, one for each RGB channel, with pixel values ranging from 0 to 255. The convolution process in a convolutional neural network exploits this representation by sliding small weight matrices, called kernels or filters, across the image. The logic is that if a kernel can detect a feature in one image, it can probably detect similar features in other images, as well as in different parts of the same image. This efficiency is particularly important for the functioning of driverless transport technologies.
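The three-matrix representation is easy to demonstrate with NumPy. The image here is randomly generated purely for illustration:

```python
import numpy as np

# A tiny 4x4 RGB image: a height x width x 3 array of pixel values in 0-255
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Slicing out each channel yields the three matrices a computer "sees"
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]
```

A convolutional layer's kernels operate across all three of these matrices at once, which is why its filters are really small 3D weight volumes rather than flat grids.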
Training A Convolutional Neural Network
The first step in training a convolutional neural network (CNN) for autonomous driving is selecting the images for the dataset. They are labeled by weather conditions, road type, and the driver's activity, i.e. switching lanes, staying in lane, turning, and so on. For instance, to train the CNN for lane following, the frames in which the driver stays in the lane are selected, while the others are discarded. Next, the video is sampled at a rate that avoids including highly similar images. A bias toward driving straight is also corrected during training by including more frames that represent road curves.
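The label-based selection step can be sketched as a simple filter over frame metadata. The field names and label values below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical frame metadata; keys and label values are illustrative,
# not taken from any real dataset
frames = [
    {"file": "frame_001.png", "weather": "clear", "road": "highway", "activity": "lane_keeping"},
    {"file": "frame_002.png", "weather": "rain",  "road": "urban",   "activity": "lane_change"},
    {"file": "frame_003.png", "weather": "clear", "road": "highway", "activity": "lane_keeping"},
    {"file": "frame_004.png", "weather": "fog",   "road": "rural",   "activity": "turning"},
]

# For lane-following training, keep only lane-keeping frames; discard the rest
lane_following = [f for f in frames if f["activity"] == "lane_keeping"]
```

The same pattern extends to the other labels, e.g. oversampling frames whose road type is a curve to counter the straight-driving bias.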
The final set of frames selected for training the CNN is augmented with artificial rotations and shifts. This teaches the network to recover from a poor orientation or position. These perturbations and their magnitudes are chosen at random, with magnitudes roughly twice those encountered by human drivers.
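A minimal sketch of the shift part of this augmentation is shown below. The horizontal-shift range and the steering-correction gain are assumed, illustrative values, not figures from any published system:

```python
import random
import numpy as np

def augment(image, steering, max_shift=10, correction_per_px=0.004):
    """Randomly shift an image sideways and correct the steering label.

    max_shift and correction_per_px are illustrative assumptions; a real
    pipeline would calibrate them against the camera geometry.
    """
    shift = random.randint(-max_shift, max_shift)
    # Shift the image horizontally (np.roll wraps pixels around; a real
    # pipeline would crop or pad instead)
    shifted = np.roll(image, shift, axis=1)
    # Nudge the steering label so the network learns to steer back
    return shifted, steering + shift * correction_per_px

img = np.zeros((3, 8, 3))          # tiny placeholder "camera frame"
new_img, new_steer = augment(img, 0.0)
```

Pairing each perturbed image with a corrected steering label is what teaches the network the recovery behavior described above.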
For a driverless transport system to work, it must process and identify images in a dynamic, 360-degree environment. This makes multi-view image processing a basic requirement, since frames collected from different directions must be combined and interpreted in context with one another. This can be done with a rotating camera or with several smaller cameras positioned around the autonomous vehicle. Another approach is LIDAR, which, combined with other sensor types, can map the surrounding world more accurately.
The CNN must extract three kinds of knowledge from images: metric, symbolic, and conceptual. Metric knowledge is the identification of static and dynamic objects; it is required to keep the autonomous vehicle in its lane and at a safe distance from other vehicles. Symbolic knowledge helps the vehicle classify lanes and stick to the road rules. Conceptual knowledge is the most important of the three, as it helps the vehicle anticipate how a driving scene will evolve.
The most effective way to improve an autonomous vehicle is to train it on real-world data, and performance continues to rise with advances in AI.