Computer Vision (CV) is a fast-developing branch of Machine Learning that uses images and videos to extract knowledge about the world. Because sight is so important to humans and so many of our actions depend on it, Computer Vision will become crucial to the future automation of visually intensive work such as X-ray luggage inspection, finding criminals with public cameras or preventing financial fraud using face recognition. This domain will open new areas of development and help create new industries. Below we explain what real Computer Vision applications look like.
Computer Vision applications:
Object Detection – the most popular computer vision application
Object Detection is a part of Computer Vision which focuses on detecting various objects in photos, such as cats, dogs, cars, bikes or humans, by extracting features from pixels and using deep learning to recognize patterns. One of its best-known uses is in face recognition, where faces must first be located in the image.
Image and Video Pre-Processing
Advanced CV based on neural networks can perform image transformations that traditional image processing algorithms cannot. For example, we can artificially add trees to a photo or remove them without the change being noticeable. It is possible to generate a missing part of a photo or to change the appearance of the sky from that of Earth to that of Mars. The possibilities for image enhancement and transformation are vast; each task mainly requires a specialized model trained for it.
Traditionally, to detect an object in an image it was sufficient to mark its position with a rectangle. An improvement on this technique, known as image segmentation, is to outline the object itself (for example by slightly changing its color) and in that way split the image into separate objects, producing a result that resembles stained glass. This technology will be used extensively in autonomous navigation and radiology (outlining cancerous changes in tissue).
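The difference between a rectangle and an outline can be illustrated with a minimal sketch. It assumes a toy binary mask in which 1 marks the pixels of the object; the function name `mask_to_bbox` and the data are hypothetical and not taken from any particular library.

```python
def mask_to_bbox(mask):
    """Return (top, left, bottom, right) of the smallest rectangle that
    encloses all non-zero pixels, or None if the mask contains no object."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


# Toy 5x6 segmentation mask (hypothetical data): 1 = pixel belongs to the object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

bbox = mask_to_bbox(mask)
top, left, bottom, right = bbox

# Number of pixels the outline marks versus the number the rectangle covers.
object_pixels = sum(sum(row) for row in mask)
box_pixels = (bottom - top + 1) * (right - left + 1)

print(bbox, object_pixels, box_pixels)
```

The rectangle covers more pixels than the object occupies, which is the imprecision that a per-pixel outline removes.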
Video and Image Content Indexing
A model trained to detect objects in photos can extract their content and prepare tags automatically. Nowadays inference is fast enough that videos can be processed in real time. This technology can be used in personalized advertising (for example on screens in public spaces), where ads are chosen based on your clothes and the objects you carry.
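As a rough sketch of how detections might be turned into tags, the snippet below assumes a detector has already produced `(label, confidence)` pairs for each video frame. The function name, the thresholds and the sample detections are all illustrative assumptions.

```python
from collections import Counter


def tags_from_detections(frames, min_confidence=0.5, min_frames=2):
    """Keep labels that are detected confidently in at least `min_frames` frames."""
    counts = Counter()
    for detections in frames:
        # Count each label at most once per frame.
        labels = {label for label, conf in detections if conf >= min_confidence}
        counts.update(labels)
    return sorted(label for label, n in counts.items() if n >= min_frames)


# Hypothetical detector output for three consecutive frames.
frames = [
    [("person", 0.92), ("backpack", 0.81)],
    [("person", 0.88), ("dog", 0.35)],
    [("person", 0.95), ("backpack", 0.77), ("bicycle", 0.64)],
]

tags = tags_from_detections(frames)
print(tags)
```

Requiring a label to appear in several frames is one simple way to suppress spurious single-frame detections; a production system would likely use something more elaborate.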
3D Scene Reconstruction
Computer Vision algorithms are able to reconstruct a 3D object from 2D images taken from different angles. For example, we can build a city model from images gathered by drones, or create a model of a cave from a video recorded inside it.
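The simplest special case of this idea is a pair of identical cameras placed side by side and pointing in the same direction. The depth of a point then follows from how far it shifts between the two images (its disparity). The sketch below assumes this idealized setup; the function name and the numbers are hypothetical, and real reconstruction pipelines combine many views with unknown camera positions.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Depth in metres of a point seen by two identical, parallel cameras.

    x_left, x_right: horizontal pixel coordinate of the same point in each image.
    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two cameras in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("the point must have a positive disparity")
    # Similar triangles: depth = focal length * baseline / disparity.
    return focal_length_px * baseline_m / disparity


# Hypothetical measurement: the point shifts by 35 pixels between the two images.
depth = depth_from_disparity(x_left=420, x_right=385,
                             focal_length_px=700, baseline_m=0.5)
print(depth)  # 10.0
```

Points that shift less between the two views are farther away, which is why distant objects are harder to reconstruct accurately.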
Deep Learning in building Computer Vision applications
Deep Learning (DL) takes its name from the large number of layers in its neural networks. Thanks to the constant growth of computing power in recent years, we are able to train more and more complex neural networks with an increasing number of layers. Such sophisticated models generalize the patterns hidden in data better than “shallow” neural networks.
Computer Vision uses a special type of neural network called a Convolutional Neural Network (CNN). CNNs use convolutional layers, which slide small 2D filters over the image and learn from correlations between neighboring pixels. During training a CNN sees the images multiple times, constantly adjusting its parameters to improve the outcome.
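To make the idea of a convolutional layer more concrete, here is a minimal pure-Python sketch of a single 2D filter sliding over a tiny image. The filter weights are hand-picked for illustration, whereas in a CNN they are learned during training. The function name `conv2d` is our own, and like most deep learning frameworks the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    resulting feature map as a list of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the pixels currently under the filter.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The filter responds only in the column where the image changes from dark to bright. A convolutional layer holds many such filters, each learning to respond to a different local pattern.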
Real-life examples of Computer Vision applications
Retail Shelf analysis
Automatic product detection makes it possible to recognize missing and misplaced products on shelves by comparing them with the planogram. Aggregated information about the condition of the shop gives an opportunity to improve the quality of customer service.
Computer Vision can also automate the discovery of illicit items in luggage during customs inspections at border crossings or airports. Such a mundane task is well suited to Convolutional Neural Networks, given the huge data sets available.
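The comparison with the planogram can be sketched as follows, assuming a detection model has already assigned a product name (or nothing) to each shelf slot. The slot identifiers, product names and function name are hypothetical.

```python
def compare_with_planogram(planogram, detected):
    """Both arguments map a shelf slot to a product name.
    In `detected`, a slot that is absent or None means the slot is empty."""
    missing, misplaced = [], []
    for slot, expected in planogram.items():
        found = detected.get(slot)
        if found is None:
            missing.append((slot, expected))
        elif found != expected:
            misplaced.append((slot, expected, found))
    return missing, misplaced


# Hypothetical planogram and detection result for one shelf.
planogram = {"A1": "cola", "A2": "water", "A3": "juice"}
detected = {"A1": "cola", "A2": "juice", "A3": None}

missing, misplaced = compare_with_planogram(planogram, detected)
print("missing:", missing)
print("misplaced:", misplaced)
```

Aggregating such reports over many shelves and days is what would give the overall picture of shop condition described above.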
Automatic video tagging for real-time marketing
This technology will improve the advertising industry by making it more personalized. For example, after tagging a customer’s favorite brands and gaining insight into their preferences, we can recommend products that are more likely to be chosen. It is a win-win situation for both the customer (more relevant ads) and e-commerce (higher income).
Real estate valuation
Given real estate imagery together with property values, we can create a model that predicts value from photos of new properties. This allows a fast comparison of the asking price and the predicted price to find investment gems or undervalued rental opportunities.
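Once a model produces a predicted price, the comparison step is straightforward. The sketch below assumes the predictions already exist; the listing data, the 10% threshold and the function name are illustrative assumptions only.

```python
def find_undervalued(listings, min_discount=0.10):
    """Return (id, discount) pairs for listings whose asking price is at least
    `min_discount` below the predicted price, largest discount first."""
    results = []
    for listing in listings:
        discount = (listing["predicted"] - listing["asking"]) / listing["predicted"]
        if discount >= min_discount:
            results.append((listing["id"], round(discount, 3)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical listings with asking prices and model predictions.
listings = [
    {"id": "flat-1", "asking": 200_000, "predicted": 250_000},
    {"id": "flat-2", "asking": 310_000, "predicted": 300_000},
    {"id": "flat-3", "asking": 170_000, "predicted": 200_000},
]

candidates = find_undervalued(listings)
print(candidates)  # [('flat-1', 0.2), ('flat-3', 0.15)]
```

A large gap may also mean the model is wrong about a property, so such a list is better treated as a shortlist for manual review than as a final verdict.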
Recognizing faces in security systems
Face recognition makes identification easier for security officers and ordinary people, as there is no need for additional cards or keys. It is also possible to determine whether somebody is a wanted criminal.
Automatic reading of personal information from identity cards
This technique protects against misspellings and is much faster than entering the information manually. It has the potential to simplify the maintenance of a customer database and improve data quality.
CV techniques use data from cameras to visually check the condition of assets, for example valves and pipes, and compare it with their optimal condition. This information can be passed to a remote maintenance crew, who check any anomalies.
In self-driving cars, Computer Vision uses data gathered from sensors to drive safely from point A to B. It can automate our commuting and make life a lot easier, especially for elderly and disabled people. Although this technology may increase car usage (and hence traffic), it can also prevent accidents, and it may reduce the number of cars by automating taxi systems with self-driving cars, so that there is no need to own a car.
Python – the best open source tool for Computer Vision applications
Training Convolutional Neural Networks in Python has become easier thanks to an abundance of libraries to choose from. Below we present the most popular ones:
Caffe is a framework built specifically for deep learning. Developed at Berkeley, it is one of the best libraries for CV. Its models are defined not in code but in configuration files, which can be a drawback for some of us. It was not developed in Python but it provides Python bindings. Caffe is known to be fast: it can run inference on an image in 1 ms and learn from it in 4 ms when used, for example, on an Nvidia K40 GPU.
Theano is one of the oldest Python libraries built for operating on multi-dimensional arrays, and thus it allows training neural networks. It is integrated with NumPy, offers efficient symbolic differentiation, evaluates expressions faster thanks to dynamic C code generation and can automatically diagnose many types of errors. Its development appears to have ended in late 2017, but it is still a decent library to use for your project.
TensorFlow was designed by the Google Brain team and released as an open source library for abstract (tensor-based) numerical computation. It is a low-level library, mature enough to have many sophisticated projects using it as a backbone, decent documentation and a vast community. TensorFlow's main advantage (over Theano) is multi-GPU support. It has two APIs: the original low-level one and the high-level Keras.
Lasagne is built on top of Theano with the intention of being simple to understand and use, and it directly processes and returns Theano expressions or NumPy data types. Lasagne allows defining Convolutional Neural Networks, Recurrent Neural Networks and combinations of them. It supports CPU and GPU thanks to Theano’s compiler. In terms of abstraction level, it sits between low-level libraries like TensorFlow or Theano and high-level libraries like Keras.
Keras is a high-level library which uses TensorFlow, CNTK or Theano as a back end. It is officially supported by Google (TensorFlow), which has taken over its development. Keras positions itself as an API for “human beings”: it focuses on simplicity, so creating networks is fast and intuitive. Model architecture is divided into fully configurable modules such as neural layers, optimizers (Adam, RMSProp), cost functions etc. It also includes built-in models like ResNet50, InceptionV3 or MobileNet. Keras can be used on multi-GPU systems, but this takes more time to configure because it involves both the Keras and TensorFlow APIs.
MXNet allows using many GPUs in distributed systems, and it makes it easy to manage where every piece of data should be stored in such systems. The library also has built-in methods for fast derivative calculations. Every layer has been optimized, and MXNet is now one of the fastest CV libraries available; however, it takes more time to start modeling with it than with Keras.
Real Life Computer Vision Applications
Check our Case Studies to better understand real-world use cases of Computer Vision and Deep Learning, or contact us for a consultation on how Computer Vision is implemented in real business cases.