Vehicle movement detection with Deep Learning Image Classification

Marcin Budny
March 10, 2020  | 6 min read

In this article, we show how we started detecting the direction of vehicle movement from still images using deep learning image classification, based on a case study of a Norwegian parking management solution.

Authors: Marcin Budny, Mateusz Walo

Deep learning: the new level of image classification

Computers can nowadays perform tasks that just a decade ago could only be handled by humans. One such task is accurate image recognition. We previously had algorithms that served this purpose, but deep learning took image classification to a whole new level.

This machine learning approach has an advantage over traditional computer vision techniques because it does not require manual feature engineering. When recognizing images of cats, you don't have to explicitly tell the model to look for cat ears or whiskers in the image. You let the model figure out the important visual features on its own.

It is possible to get better results faster, with less human work involved, provided that enough training data is available. That is why this approach has gained so much popularity in recent years.

Will deep learning image classification help detect vehicle movement?

We faced an interesting problem in the parking solution we have been developing for one of our Norwegian partners. In this solution, ALPR (automatic license plate recognition) cameras take pictures and read the license plate numbers of vehicles entering and leaving parking facilities.

In some deployments, we also relied on the camera to report the direction in which a vehicle was moving (towards the camera or away from it).

But cameras have limited capabilities, and in some cases they are not able to determine the direction reliably. Since we only receive still pictures and not a video stream, we cannot observe the actual direction of movement ourselves.

So the question is: will deep learning image classification help us?

A solution for 99% of the problematic cases

What if we had a component in our system that looks at the problematic pictures and automatically determines whether the front or the rear of a vehicle is visible? That would solve an estimated 99% of the problematic cases: a visible front means the vehicle is approaching the camera, and a visible rear means it is driving away. The remaining 1% would be vehicles reversing past the camera, which does not happen that often.
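
The reasoning above boils down to a simple mapping from what the classifier sees to the direction of travel. Below is a minimal Python sketch of that mapping; the label strings and direction names are hypothetical, and it assumes vehicles drive forwards, so a reversing vehicle (the remaining ~1%) would be mapped the wrong way.

```python
from typing import Optional

# Hypothetical labels and direction names. Assumes the vehicle drives forwards:
# a car reversing past the camera (the rare ~1% case) would be mapped incorrectly.
VIEW_TO_DIRECTION = {
    "front": "towards_camera",
    "rear": "away_from_camera",
}


def direction_from_view(view_label: str) -> Optional[str]:
    """Return the inferred direction, or None when the view does not decide it."""
    return VIEW_TO_DIRECTION.get(view_label)


print(direction_from_view("front"))  # towards_camera
print(direction_from_view("rear"))   # away_from_camera
print(direction_from_view("other"))  # None
```

Returning `None` instead of guessing leaves the decision to whatever fallback the system already has for undetermined directions.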

Our problem seemed to be a fairly straightforward image classification task. We were pretty sure that we could get good results with a modern deep learning model and transfer learning. There was one caveat though: the picture quality.

What about low image quality?

The ALPR camera highlights the part of the image it is most interested in: the license plate. It uses an infrared flash for this purpose, because license plates have a special reflective surface. This is also why the pictures are monochrome. Apart from the license plate number (LPN), the only parts of the vehicle visible to the naked eye are elements emitting light and other reflective surfaces.

So in the first image, you can see the front of a car, judging by the headlights.


The second image may also show the front of the car, but it is not obvious.


The last one most probably shows the rear, judging by the reflective elements on the bumper.


These images are a bit challenging to label, right? Fortunately, it turns out that the important visual features of the vehicle, such as the shape of the lamps and the lines of the car body, are actually in the picture. We just can't see them right away.

Human eye vs deep learning

Let's try the last picture again, this time with brightness enhanced 10x and contrast 2x. Now we can definitely confirm that this is a picture of the rear of a car.
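
For illustration, here is a minimal NumPy sketch of such an enhancement for a grayscale image stored as a uint8 array. It is an assumption about how the effect can be reproduced, not the exact tool used for labeling: brightness is applied as a plain multiplication and contrast as a stretch around the mean pixel value. PIL's `ImageEnhance.Brightness` and `ImageEnhance.Contrast` offer comparable operations.

```python
import numpy as np


def enhance_for_labeling(img: np.ndarray, brightness: float = 10.0,
                         contrast: float = 2.0) -> np.ndarray:
    """Brighten a grayscale uint8 image, then stretch its contrast around the mean."""
    x = img.astype(np.float64) * brightness  # brightness: scale every pixel value
    x = np.clip(x, 0.0, 255.0)               # saturate instead of wrapping around
    mean = x.mean()
    x = mean + contrast * (x - mean)         # contrast: push values away from the mean
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)


dark = np.array([[2, 4], [6, 8]], dtype=np.uint8)
print(enhance_for_labeling(dark))  # [[  0  30] [ 70 110]]
```

With PIL installed, a picture could be loaded with `np.asarray(Image.open(path).convert("L"))` and the result viewed via `Image.fromarray(...)`. As noted below, such enhancement only helps human labelers; the model is fed the raw images.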


As it later turned out, these brightness and contrast enhancements are only needed for humans to label the data. The deep learning model can do just fine on the raw images.

Training the image classification model

Knowing that the important visual features of vehicles are present in the pictures, we could plan experiments with different deep learning models.

We also experimented with random image augmentations (horizontal flip, rotation, skew), the size of the images fed to the network, and the training hyperparameters.
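
In PyTorch, augmentations like these are typically composed from `torchvision.transforms` classes such as `RandomHorizontalFlip`, `RandomRotation` and `RandomAffine` (whose `shear` argument produces a skew). To show what those operations do without depending on torchvision, here is a NumPy-only sketch for single-channel images. The ±10° ranges are our assumption, not the values used in the project, and nearest-neighbour sampling is a simplification.

```python
import numpy as np


def affine_nearest(img: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 2x2 linear transform about the image centre (nearest neighbour, zero fill)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up the source pixel it comes from.
    coords = np.stack([xs.ravel() - cx, ys.ravel() - cy])  # shape (2, h*w): x row, y row
    src = np.linalg.inv(matrix) @ coords
    sx = np.rint(src[0] + cx).astype(int)
    sy = np.rint(src[1] + cy).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=img.dtype)  # pixels mapped from outside the image stay black
    out[inside] = img[sy[inside], sx[inside]]
    return out.reshape(h, w)


def random_augment(img: np.ndarray, rng: np.random.Generator,
                   max_rotate_deg: float = 10.0, max_shear_deg: float = 10.0) -> np.ndarray:
    """Random horizontal flip, then a random rotation combined with a horizontal skew."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # mirror left-right; a front stays a front, a rear stays a rear
    theta = np.deg2rad(rng.uniform(-max_rotate_deg, max_rotate_deg))
    rotate = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta), np.cos(theta)]])
    shear = np.tan(np.deg2rad(rng.uniform(-max_shear_deg, max_shear_deg)))
    skew = np.array([[1.0, shear],
                     [0.0, 1.0]])
    return affine_nearest(img, skew @ rotate)


rng = np.random.default_rng(0)
image = (rng.random((64, 128)) * 255).astype(np.uint8)  # stand-in for an ALPR picture
augmented = random_augment(image, rng)
print(augmented.shape)  # (64, 128)
```

Because a new random variant is drawn every time an image is loaded, the network rarely sees exactly the same picture twice, which acts as a cheap way of enlarging the dataset.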

Proper dataset

We needed a dataset large enough to fine-tune an image classification model. We also knew that having just the front and rear classes would not be enough, because a kind of "negative" class was also needed to represent situations where:

  • there is no vehicle in the picture (ALPR camera was triggered incorrectly)
  • the vehicle is in the picture, but it is impossible to tell whether the front or rear is visible
And this is how we introduced a class called "unknown".

We iterated several times on the dataset, correcting labels and extending its size. It turned out that pictures coming from different facilities have different characteristics, so we needed to:

  • balance the number of pictures coming from different facilities,
  • balance the number of pictures in each of the classes.
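
One simple way to achieve both kinds of balance is down-sampling: cap the number of images taken from each facility, then trim every class to the size of the smallest one. The sketch below assumes a hypothetical `(image_id, facility, label)` tuple per image. The article does not say which balancing method was used; weighted sampling during training (for example PyTorch's `WeightedRandomSampler`) is an alternative that keeps all the data.

```python
import random
from collections import Counter, defaultdict


def balance(samples, cap_per_facility, seed=0):
    """Down-sample (image_id, facility, label) tuples so that no facility dominates
    and every class ends up with the same number of images."""
    rng = random.Random(seed)

    # 1) Keep at most `cap_per_facility` randomly chosen images per facility.
    by_facility = defaultdict(list)
    for sample in samples:
        by_facility[sample[1]].append(sample)
    capped = []
    for items in by_facility.values():
        rng.shuffle(items)
        capped.extend(items[:cap_per_facility])

    # 2) Trim every class to the size of the smallest class.
    by_label = defaultdict(list)
    for sample in capped:
        by_label[sample[2]].append(sample)
    smallest = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        rng.shuffle(items)
        balanced.extend(items[:smallest])
    return balanced


labels = ["front", "rear", "unknown"]
data = [(f"a{i}", "A", labels[i % 3]) for i in range(300)]   # one large facility
data += [(f"b{i}", "B", labels[i % 3]) for i in range(30)]   # one small facility
result = balance(data, cap_per_facility=60)
print(Counter(s[2] for s in result))
print(Counter(s[1] for s in result))
```

The price of down-sampling is discarding images from the largest facility, which is acceptable when it would otherwise dominate training.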

Here's an overview of the dataset size in subsequent versions. On each dataset, an 80% / 20% train/validation split was applied.
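
An 80% / 20% split can be made per class (stratified), so that the validation part keeps the same class proportions as the training part. The sketch below is one way to do it with the standard library only; stratification is our assumption, as the article does not say how the split was drawn, and `label_of` is a hypothetical accessor for a sample's class.

```python
import random
from collections import defaultdict


def stratified_split(samples, label_of, train_fraction=0.8, seed=0):
    """Split samples into train/validation lists, keeping class proportions equal."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[label_of(sample)].append(sample)
    train, val = [], []
    for items in groups.values():
        items = items[:]                             # copy, do not reorder caller's data
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)     # e.g. 50 images -> 40 train, 10 val
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val


data = [(i, "front" if i < 50 else "rear") for i in range(100)]
train, val = stratified_split(data, label_of=lambda s: s[1])
print(len(train), len(val))  # 80 20
```

Fixing the seed makes the split reproducible, so metrics from different experiments on the same dataset version stay comparable.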


What tools did we use for deep learning image classification?


PyTorch offered numerous advantages for deep learning image classification. First of all, the framework provides a very good developer experience. It offers many pre-trained models and makes using them really easy. It provides a set of configurable image transformations, which make it easy to augment the dataset with randomly modified images. Integration with CUDA and running models on multiple GPUs are also straightforward.

Having previously worked a little with TensorFlow 1.x, we noticed a tremendous difference in ease of use, although TensorFlow 2.0 changes that.

One issue we had with PyTorch was the need to convert between torch tensors and NumPy arrays, and between PIL images and NumPy arrays, because many existing libraries and code samples assume NumPy.
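
Part of the friction is the memory layout: PIL and NumPy code usually works with height x width x channels uint8 arrays, while PyTorch models expect channels x height x width floats, which is what torchvision's `ToTensor` produces (values scaled to [0, 1]). The NumPy sketch below shows both directions; the function names are hypothetical. Moving between the resulting array and a tensor is then a matter of `torch.from_numpy(...)` and `tensor.cpu().numpy()`.

```python
import numpy as np


def to_model_layout(img: np.ndarray) -> np.ndarray:
    """HxW or HxWxC uint8 image -> CxHxW float32 in [0, 1], like torchvision's ToTensor."""
    if img.ndim == 2:                  # monochrome ALPR pictures have no channel axis
        img = img[:, :, np.newaxis]
    return np.transpose(img, (2, 0, 1)).astype(np.float32) / 255.0


def to_image_layout(chw: np.ndarray) -> np.ndarray:
    """CxHxW float in [0, 1] -> HxWxC uint8, e.g. to display or save a model input."""
    hwc = np.transpose(chw, (1, 2, 0))
    return np.clip(np.rint(hwc * 255.0), 0, 255).astype(np.uint8)


rgb = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
chw = to_model_layout(rgb)
print(chw.shape, chw.dtype)                      # (3, 2, 2) float32
print(np.array_equal(to_image_layout(chw), rgb)) # True
```

Rounding in `to_image_layout` makes the round trip exact despite float32 precision.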


For training, we used an Azure NC12 instance, which comes with one Tesla K80 card. The K80 is a dual-GPU board, so it is essentially a single card that the OS sees as two GPUs.

At the time we worked on the project, Microsoft offered promotional prices for their Tesla K80 machines, and we were able to use an NC12 instance for as little as €0.87/hour (including 12 cores and 112 GiB of RAM).

Other tools

Of course, JupyterLab was very useful, but another solution for remote work on the notebooks blew our minds: VS Code with remote SSH access.

The Python extension for VS Code supports notebooks, so you can use the computing power of a cloud instance from the comfort of your development machine.

The results

In order to know how well the model performed, we kept track of several metrics.


  • 95% Accuracy

Accuracy tells us how often the model's prediction is correct. Most of the time the results were very satisfying: we reached a 95% accuracy rate.

Note the drop in accuracy for dataset 2. This dataset was created as a side experiment and contained around 200 images for which the camera was not able to detect the direction of movement.

An important factor was that these images came from multiple different parking facilities, while the main dataset 1.9 was dominated by images from one or two large facilities. Poor accuracy on dataset 2 prompted us to work on a better balance of image sources in later versions of the main dataset.

  • 92-94% Precision

Precision is the ratio of true positives to all positive predictions for a given class. In other words, it tells us how many of the predictions for that class were correct. We achieved a high precision of around 92 to 94 percent.

Note that precision is calculated for each class separately; the chart shows the average across all classes.

  • 92-95% Recall

Recall is the ratio of true positives to all images that actually belong to a given class. In other words, it tells us how many of the images of that class the model found. Here we also got very high results: from 92 up to 95 percent.
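
All three metrics can be read off a confusion matrix. The NumPy sketch below computes accuracy, per-class precision and recall, and their plain (macro) averages across classes; the function name is hypothetical, and scikit-learn's `precision_recall_fscore_support(..., average="macro")` gives the same averages if that library is available.

```python
import numpy as np


def classification_metrics(y_true, y_pred, classes):
    """Accuracy plus per-class and macro-averaged precision/recall."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)  # rows: true, columns: predicted
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how many images truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": dict(zip(classes, precision)),
        "recall": dict(zip(classes, recall)),
        "macro_precision": precision.mean(),
        "macro_recall": recall.mean(),
    }


classes = ["front", "rear", "unknown"]
y_true = ["front", "front", "rear", "rear", "unknown"]
y_pred = ["front", "rear", "rear", "rear", "unknown"]
m = classification_metrics(y_true, y_pred, classes)
print(m["accuracy"])                # 0.8
print(m["precision"]["rear"])       # 0.666... (2 of 3 "rear" predictions were right)
print(m["recall"]["front"])         # 0.5 (1 of 2 front images was found)
```

The `where=` guard avoids division by zero for a class that was never predicted or never occurs in a small validation set.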

Deep learning image classification techniques

One of the challenges in deep learning image classification is to understand why the model assigned an image to a given class. Grad-CAM and Guided Grad-CAM are techniques that allowed us to create a heatmap over the image.

Regions that were crucial for the model's decision are highlighted, which makes it easy to explain on what basis the decision was made. This gives us more confidence in the model's ability to make correct predictions outside of the training dataset.
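
The core of Grad-CAM is short: weight each feature map of the last convolutional layer by the average gradient of the class score with respect to that map, sum the weighted maps, and keep only the positive part. The NumPy sketch below assumes the activations and gradients have already been captured for one image, for example with `register_forward_hook` and `register_full_backward_hook` on a PyTorch module; that plumbing is omitted. Guided Grad-CAM additionally multiplies the upsampled map by a guided-backpropagation saliency map, which is not shown, and the nearest-neighbour upsampling here is a simplification of the usual bilinear resize.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (K, H, W) feature maps of the last conv layer and
    d(class score)/d(feature maps) for one image. Returns an (H, W) map in [0, 1]."""
    weights = gradients.mean(axis=(1, 2))               # one weight per feature map
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum of maps -> (H, W)
    cam = np.maximum(cam, 0.0)                          # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, height: int, width: int) -> np.ndarray:
    """Stretch the coarse heatmap to the input image size (nearest neighbour)."""
    rows = (np.arange(height) * cam.shape[0]) // height
    cols = (np.arange(width) * cam.shape[1]) // width
    return cam[rows[:, None], cols[None, :]]


acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0  # map 0 fires top-left
acts[1, 1, 1] = 1.0  # map 1 fires bottom-right
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # map 0 supports the class, map 1 opposes it
heatmap = grad_cam(acts, grads)
print(heatmap)  # [[1. 0.] [0. 0.]]
print(upsample_nearest(heatmap, 4, 4).shape)  # (4, 4)
```

The resulting map is usually overlaid on the input picture as a colour heatmap, which is how images like the ones below are produced.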

As you can see in the following pictures, the model made its predictions based on the most relevant features: the lamps, bumpers and license plates. The last image is especially interesting because we can learn which car the model is looking at.


Also, we now know that the model is not paying attention to the horizontal road signs showing the lane direction.

We can also visualize the situations where the model failed to classify correctly. In these cases, the model failed to capture important features of the vehicle, which caused invalid predictions.


The deployment concerns

While it is possible to build a simple Python service in order to expose the deep learning model with a REST API, there are concerns that need to be addressed.

Things like:

  • parallelizing model usage according to underlying machine capabilities,
  • queueing requests,
  • packaging and versioning the model.

A project we found useful was Amazon's Multi Model Server. Its main focus is models produced by AWS machine learning services, but it is generic enough to support other use cases as well. In particular, the project includes a PyTorch example (although it is a bit outdated). With a little bit of work, we were able to get it up and running.
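
To make the first two concerns concrete, here is a minimal standard-library sketch of what a model server takes care of: a fixed number of workers sized to the machine, and a cap on pending requests so that bursts wait in line instead of piling up without limit. The class and its `predict` callable are hypothetical stand-ins, not part of Multi Model Server's API. Threads only run inference in parallel if the model code releases Python's GIL while computing; otherwise separate processes would be needed.

```python
import os
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BoundedInferencePool:
    """Runs predict() on a fixed worker pool and limits how many requests are in flight."""

    def __init__(self, predict, workers=None, max_pending=100):
        self._predict = predict
        # Parallelism sized to the machine unless told otherwise.
        self._pool = ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1)
        # Queueing with back-pressure: submit() blocks once max_pending requests are in flight.
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, image) -> Future:
        self._slots.acquire()
        try:
            future = self._pool.submit(self._predict, image)
        except BaseException:
            self._slots.release()  # do not leak a slot if submission fails
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self):
        self._pool.shutdown(wait=True)


pool = BoundedInferencePool(lambda image: image * 2, workers=2, max_pending=4)
futures = [pool.submit(i) for i in range(10)]
print([f.result() for f in futures])  # [0, 2, 4, ..., 18]
pool.shutdown()
```

Packaging and versioning the model, the third concern, is outside the scope of this sketch and is where a dedicated model server pays off most.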

Inference time

To find out how fast the model can do its job, we benchmarked the classification of 100 images. Tests were performed on both CPU (12 cores) and GPU (half of a Tesla K80).

As shown in the chart, the GPU (graphics processing unit) outperforms the CPU (central processing unit) in the image classification task (ResNet34 model). However, CPU inference remains a viable option if the cost of the cloud instance becomes a deciding factor.
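
A benchmark like this can be scripted with a small timing harness. The sketch below is an assumption about how one might do it, not the script behind the chart: `classify` is a hypothetical placeholder for a function that runs the model on one image. A few warm-up calls are excluded from the timings, since first calls often include one-off setup costs. When timing on a GPU with PyTorch, `torch.cuda.synchronize()` should be called inside `classify` before it returns, because CUDA kernels run asynchronously and the timer would otherwise stop too early.

```python
import statistics
import time


def benchmark(classify, images, warmup=5):
    """Time classify() on each image; returns (mean_ms, p95_ms). Needs >= 2 images."""
    for image in images[:warmup]:          # warm-up calls are not measured
        classify(image)
    timings_ms = []
    for image in images:
        start = time.perf_counter()
        classify(image)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    p95 = statistics.quantiles(timings_ms, n=20)[-1]  # 95th percentile
    return statistics.mean(timings_ms), p95


# Usage with a dummy "model" that just sleeps for about 1 ms per image.
mean_ms, p95_ms = benchmark(lambda image: time.sleep(0.001), list(range(100)))
print(f"mean {mean_ms:.2f} ms/image, p95 {p95_ms:.2f} ms")
```

Reporting a high percentile next to the mean shows whether occasional slow requests hide behind a good average, which matters for a service answering cameras in real time.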


Future challenges

The main future challenge is to maintain the model running in production. Pictures may change their characteristics over time with the change of seasons, facility lighting conditions and new models of cars.

The focus is to develop techniques to continuously monitor the quality of the results and improve the model.

If you have questions concerning deep learning image classification, machine learning or if you are interested in similar projects, contact us.