CHALLENGE: Writing algorithms to automate complex decision making is difficult and time-consuming.
Leverage deep learning with open source libraries, Nvidia hardware, and FLIR cameras.
What is Deep Learning?
Deep learning is a form of machine learning that uses neural networks with many layers (hence "deep") between the input and output nodes. By training a network on a large data set, a model is created that can make accurate predictions from new input data. In the neural networks used for deep learning, each layer's output is fed forward as the input of the next layer. The model is optimized iteratively by adjusting the weights of the connections between layers: on each cycle, feedback on the accuracy of the model's predictions guides the changes to the connection weights.
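The feed-forward and weight-update cycle described above can be sketched in a few lines of plain Python. This is a toy illustration, not production code: a tiny two-layer network learning the XOR function, with the prediction error feeding back into the connection weights on every cycle.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy data set: XOR, a problem that needs at least one hidden layer.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# A 2-2-1 network: two hidden neurons and one output neuron,
# each weight row ending in a bias term.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    # Each layer's output is fed forward as the next layer's input.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

def train(epochs=5000, lr=0.5):
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Feedback on prediction accuracy drives the weight updates.
            d_y = (y - t) * y * (1 - y)
            for j in range(2):
                d_h = d_y * w_o[j] * h[j] * (1 - h[j])
                w_h[j][0] -= lr * d_h * x[0]
                w_h[j][1] -= lr * d_h * x[1]
                w_h[j][2] -= lr * d_h
            w_o[0] -= lr * d_y * h[0]
            w_o[1] -= lr * d_y * h[1]
            w_o[2] -= lr * d_y

before = loss()
train()
after = loss()
print(before, after)  # loss drops as the weights are optimized
```

Real deep networks stack many more layers and learn millions of weights, but the optimization loop is the same: predict, measure the error, and adjust the connection weights.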
Deep learning is transforming traffic systems by automating processes that were too complex for traditional vision applications. Easy-to-use frameworks, affordable, accelerated Graphics Processing Unit (GPU) hardware, and cloud computing platforms make deep learning accessible to everyone.
Why is Deep Learning taking off now?
GPU accelerated hardware: more power, less cost
The architecture of GPUs, which uses a large number of processors to perform a set of coordinated computations in parallel (known as a “massively parallel” architecture), is ideal for deep learning systems. Ongoing development from Nvidia has resulted in large increases in the power, efficiency, and affordability of GPU-accelerated computing platforms. This technology is available in a range of form factors such as compact embedded systems based on the Jetson TX1 and TX2, PC GPUs like the GTX 1080, and dedicated AI platforms like the Nvidia DGX-1 and Drive PX 2.
Democratization of deep learning frameworks
In addition to the development of easy-to-use frameworks, the widespread availability of tutorials and online courses has contributed to deep learning accessibility. Open-source frameworks, including Google's TensorFlow as well as Caffe, Torch, and Theano, enable users to quickly build and train their own Deep Neural Networks (DNNs). The general-purpose TensorFlow is a great starting point, while Caffe's GPU optimization makes it an excellent choice for deployment on the Jetson TX1 and TX2. The Nvidia CUDA Deep Neural Network library (cuDNN) provides developers with highly optimized implementations of common deep learning functions, further streamlining development for these platforms.
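To give a flavor of how approachable these frameworks are, the fragment below sketches the start of a network definition in Caffe's prototxt format. The network and layer names are illustrative only; a real deployment would define many more layers and a training configuration.

```protobuf
name: "TrafficNet"   # illustrative name, not a shipped model
layer {
  name: "data"
  type: "Input"
  top: "data"
  # One 3-channel 224x224 image per batch.
  input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32   # number of learned filters
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```

Because the whole network is declared in a plain-text file like this, experimenting with architectures requires no code changes, and Caffe's GPU path (backed by cuDNN) handles the heavy lifting.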
Deep Learning for Traffic Systems
While the development of autonomous vehicles attracts a lot of media attention, deep learning has many other traffic applications. Deep learning is used to solve problems on smaller scale systems, like detecting pedestrians and emergency vehicles for traffic signal control, parking management, high-occupancy vehicle lane enforcement, and high-accuracy vehicle and license plate recognition. It is also applied on larger scale systems to solve problems like optimizing traffic flows across cities.
Better prices, shorter lead times
The availability of discrete, off-the-shelf cameras and embedded platforms gives traffic system designers the flexibility to tailor systems to fit their projects. Separate cameras and processing hardware enable a simple, independent upgrade path for each component. This ecosystem results in better prices and shorter lead times versus dedicated smart cameras.
How to Implement a System
Training data acquisition
Designers must train a deep learning model before deploying it, and high-quality training data is essential to achieving accurate results. High-performance cameras provide the best possible training imagery for systems that make decisions based on visual input.
On-camera image processing simplifies the data normalization required prior to training. Camera features like precise control over auto-algorithms, sharpening, pixel format conversion, and FLIR’s advanced debayering and Color Correction Matrix optimize images. FLIR’s strict quality control during manufacturing minimizes variation in camera performance, reducing the need for pre-training normalization.
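Where on-camera processing does not cover everything, a small normalization step is still common before training. The sketch below shows one typical operation, scaling pixel values to zero mean and unit variance; the nested-list "image" is a stand-in for real pixel data delivered by a camera SDK.

```python
def normalize(image):
    """Scale pixel values to zero mean and unit variance."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # guard against division by zero on flat images
    return [[(p - mean) / std for p in row] for row in image]

img = [[10, 20], [30, 40]]            # toy 2x2 grayscale image
norm = normalize(img)
flat = [p for row in norm for p in row]
print(sum(flat))                      # ~0: zero mean after normalization
```

Consistent camera output shrinks exactly this kind of preprocessing: the less frame-to-frame and unit-to-unit variation, the less correction the training pipeline has to apply.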
For applications that image moving vehicles, global shutter sensors read all pixels simultaneously, eliminating the distortion caused by subject motion during readout. Many of FLIR's machine vision cameras use Sony Pregius global shutter CMOS sensors. These sensors offer 72 dB of dynamic range and less than 3e- read noise, enabling them to simultaneously capture detail in brightly lit and shaded areas while providing excellent low-light performance.
Low-light applications like indoor parking management benefit from the pixel structure of Back-Side-Illuminated (BSI) Sony Exmor R and Starvis sensors. These devices trade readout speed for greater quantum efficiency, making them small, inexpensive sensors with great low-light performance.
Train on specialized hardware
Once enough training data has been collected, it's time to train your model. To expedite this process, use a PC with one or more CUDA-enabled GPUs, or specialized AI training hardware such as the Nvidia DGX-1, a computing platform purpose-built for deep learning.
Once the training of your deep learning model is complete, it's time to deploy it to the field. Compact and powerful GPU-accelerated embedded platforms enable applications where space and power requirements preclude a traditional PC, and where limited internet connectivity necessitates computing at the edge. These systems are based on the ARM processor architecture and typically run a Linux-based OS. Information on how to use FLIR's FlyCapture SDK on an ARM device in a Linux environment is found here.
Many traffic applications rely on systems with more than one camera. With FLIR machine vision cameras, system designers have the freedom to accurately trigger multiple cameras over GPIO or software. The IEEE 1588 Precision Time Protocol (PTP) supported by FLIR's Blackfly S camera family enables camera clock synchronization to a common time base or a GPS time signal with no user oversight. The MTBF of multi-camera systems decreases with every additional camera, making highly reliable cameras critical to building robust systems. The design and testing of FLIR machine vision cameras ensures 24/7 reliability that minimizes downtime and maintenance.
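The reliability point above can be made concrete. Assuming independent, exponentially distributed failures and identical cameras (a standard series-system simplification, not a FLIR-published model), failure rates add, so system MTBF shrinks as cameras are added:

```python
def system_mtbf(mtbf_hours, n_cameras):
    """MTBF of n identical cameras in series (any failure downs the system)."""
    failure_rate = n_cameras / mtbf_hours  # individual failure rates sum
    return 1.0 / failure_rate

print(system_mtbf(200_000, 1))  # ~200000 hours for a single camera
print(system_mtbf(200_000, 8))  # ~25000 hours: 8x the cameras, 1/8 the MTBF
```

The 200,000-hour figure is purely illustrative, but the scaling holds regardless of the number: an eight-camera rig fails roughly eight times as often as a single camera, which is why per-camera reliability compounds in multi-camera systems.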
The Nvidia Jetson TX1 and TX2 are powerful and efficient GPU-accelerated embedded platforms that support USB 3.1 Gen 1 and GigE Vision cameras. Specialized Jetson carrier boards provide I/O connectivity and application-specific features. The SmartCow TERA+ natively supports up to 8 GigE cameras with the use of a managed switch, as well as RS-232 and RS-485 serial communication. SmartCow also provides a Caffe wrapper which streamlines the design and deployment of deep-learning-powered vision applications on the TERA+ hardware. The Connect Tech Cogswell Carrier supports both USB 3.1 Gen 1 and Power over Ethernet (PoE) GigE cameras. Information on getting started with FLIR cameras on the Nvidia Jetson TX1 and TX2 is available here.
The Nvidia Drive PX 2 is an open automotive AI platform built around two Pascal GPU cores. Capable of eight TFLOPS, the Drive PX 2 has the equivalent computing power of 150 MacBook Pros. The Drive PX 2 is designed to support deep learning applications for autonomous vehicle guidance. In addition to USB 3.1 Gen 1 and GigE Vision cameras, it has inputs for cameras using the automotive GMSL camera interface. For information on getting started with the Drive PX 2, click here.