Computer Vision — How does it work?

Hami Ismail
Apr 18, 2021

Image Source: https://www.pcmag.com/news/what-is-computer-vision

As introduced in the previous article, Artificial Intelligence can be broken down into several subtopics, and Computer Vision is one of them. The term Computer Vision, or CV, is made up of two words: Computer and Vision. The meaning of each word is obvious, so the combination simply refers to computers perceiving visual information, or how computers “see” the world.

At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world.

— Page 83, Computer Vision: Models, Learning, and Inference, 2012.

Computer Vision is not Image Processing

The goal of Image Processing is to produce an output image from a given input image. The input image is “processed” so that it yields a desirable output image. The “process” can be smoothing, sharpening, contrast adjustment, or stretching.
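To make the image-in, image-out idea concrete, here is a minimal sketch in plain Python. The 5x5 pixel values and the function names `smooth` and `stretch_contrast` are assumptions made for illustration; real applications would normally rely on an image library instead of hand-written loops.

```python
# A tiny 5x5 grayscale "image": each number is a pixel brightness (0-255).
# The values are made up for illustration: one bright dot on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 200, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]


def smooth(img):
    """Return a new image where each pixel is the mean of its 3x3 neighbourhood.

    Near the border, only the neighbours that exist are averaged.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = sum(neighbours) // len(neighbours)
    return out


def stretch_contrast(img):
    """Linearly rescale pixel values so they span the full 0-255 range."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


smoothed = smooth(image)            # the bright dot is spread over its neighbours
stretched = stretch_contrast(smoothed)  # darkest pixel -> 0, brightest -> 255

for row in stretched:
    print(row)
```

Both functions take an image and return another image of the same size, which is the defining trait of Image Processing described above.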

On the other hand, the goal of Computer Vision is to “have a look” at the input image and perform human-like tasks such as classifying the image, finding objects, recognizing written characters, and so on.

Below is a diagram that shows the differences between these closely related terms:

Image Source: https://www.geeksforgeeks.org/digital-image-processing-basics/

Computer Vision Popular Tasks

Computer Vision is now more widely used than ever before, and you probably already have a couple of its applications on your smartphone. If you have set up Face Recognition login on your phone, you have been using it already! Remember how your phone sometimes groups your pictures by person? That is another one!

Below are popular CV subtasks that are used in many applications:

  • Object Classification: What broad category of object is in this photograph?
  • Object Identification: Which type of a given object is in this photograph?
  • Object Verification: Is the object in the photograph?
  • Object Detection: Where are the objects in the photograph?
  • Object Landmark Detection: What are the key points for the object in the photograph?
  • Object Segmentation: What pixels belong to the object in the image?
  • Object Recognition: What objects are in this photograph and where are they?
Image Source: https://indatalabs.com/blog/how-does-computer-vision-work#:~:text=By%20definition%2C%20computer%20vision%20mimics,tasks%2C%20replicate%20natural%20neural%20networks.
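The tasks listed above differ mainly in the kind of answer they return. The sketch below shows three of them on a toy image, using a fixed brightness threshold as a stand-in for a trained model. The pixel values, the `THRESHOLD` constant, and the function names are assumptions for illustration only.

```python
# Toy grayscale image: 0 is background, bright values form one "object".
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 220, 230, 0, 0],
    [0, 0, 210, 240, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

THRESHOLD = 128  # assumed cut-off between background and object


def segment(img):
    """Segmentation: a mask marking which pixels belong to the object."""
    return [[1 if p > THRESHOLD else 0 for p in row] for row in img]


def detect(img):
    """Detection: a bounding box (left, top, right, bottom), or None."""
    coords = [
        (x, y)
        for y, row in enumerate(segment(img))
        for x, is_object in enumerate(row)
        if is_object
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def verify(img):
    """Verification: is there an object in the image at all?"""
    return detect(img) is not None


mask = segment(image)   # answers "what pixels belong to the object?"
box = detect(image)     # answers "where is the object?"
present = verify(image) # answers "is the object in the photograph?"

print(mask)
print(box)
print(present)
```

A real system would replace the threshold with a trained model, but the shape of each answer (a mask, a box, a yes or no) stays the same.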

How Do Computers Really See Images?

Computers do not see images the way humans do. Computers work with numbers and logic, and this fact will not change. Therefore, the very first thing a computer does with an image is represent it as numbers, which are ultimately stored in binary, as illustrated below:

Image Source: https://indatalabs.com/blog/how-does-computer-vision-work#:~:text=By%20definition%2C%20computer%20vision%20mimics,tasks%2C%20replicate%20natural%20neural%20networks.
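As a small sketch of this idea, the snippet below treats a 3x3 grayscale image as a grid of numbers between 0 (black) and 255 (white) and prints each pixel in 8-bit binary. The pixel values and the helper name `to_binary_rows` are made up for illustration.

```python
# A 3x3 grayscale image: 0 is black, 255 is white.
image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 16],
]


def to_binary_rows(img):
    """Format every pixel value as an 8-bit binary string."""
    return [[format(p, "08b") for p in row] for row in img]


binary = to_binary_rows(image)
for row in binary:
    print(" ".join(row))

# A colour pixel is simply three such numbers: red, green, and blue.
orange_pixel = (255, 165, 0)
```

Everything a vision model does afterwards is arithmetic on grids of numbers like this one.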

That is merely the input part. To interpret those images, a trained model is usually used at the backend. In the AI world, a model is a file that has been trained to recognize certain types of patterns.

To train the model, large numbers of properly labelled images are supplied to it. Deep learning is applied to train such models, but we will not go into that in detail now.
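The idea of learning from labelled images can be sketched without deep learning. The example below uses a much simpler technique, a nearest-centroid classifier, as a stand-in: it averages the training images for each label and then assigns a new image to the closest average. The data, the labels, and the function names `train` and `predict` are all assumptions for illustration.

```python
# Labelled training data: flattened 2x2 grayscale images (hypothetical values).
training_images = [
    ([250, 240, 245, 255], "bright"),
    ([230, 220, 235, 225], "bright"),
    ([10, 20, 15, 5], "dark"),
    ([30, 25, 35, 40], "dark"),
]


def train(examples):
    """Learn one average image (centroid) per label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }


def predict(model, pixels):
    """Return the label whose centroid is closest to the given image."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_images)
prediction = predict(model, [200, 210, 190, 205])
print(prediction)
```

Here the "model" is just a dictionary of averaged pixel values. A deep learning model holds far more numbers and learns them differently, but the workflow of labelled images in, trained model out, then prediction is the same.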

After training, the final model is produced and is ready for its task. Models are often stored in Hierarchical Data Format version 5 (HDF5), with the file extension “.h5”; this is the format Keras has commonly used, and other formats such as ONNX exist as well. The computer then uses the trained model to perform whatever task the model was designed for. Note that each model is unique to the purpose it was designed for. Popular models and architectures designed for Computer Vision include YOLO, CNN, R-CNN, and R-FCN.
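The notion of a model as a file can be sketched as follows. This example writes a dictionary of learned numbers to disk and reads it back. It uses JSON from the Python standard library purely as a stand-in for HDF5; the parameter values, the file name `model.json`, and the helper names are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical trained parameters; a real model holds far more numbers.
model = {
    "bright": [240.0, 230.0, 240.0, 240.0],
    "dark": [20.0, 22.5, 25.0, 22.5],
}


def save_model(params, path):
    """Write the learned parameters to a file."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_model(path):
    """Read the learned parameters back from a file."""
    with open(path) as f:
        return json.load(f)


# Save into a temporary directory so the sketch leaves the working folder clean.
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, path)
restored = load_model(path)

print(restored == model)
```

Once saved, the file can be copied to another machine and loaded there, with no need to repeat the training.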

In the next article, we will dive a little deeper into Machine Learning and Deep Learning, as they are fundamental to understanding the concepts underlying Computer Vision models.

See you in the next one!

References:

https://indatalabs.com/blog/how-does-computer-vision-work#:~:text=By%20definition%2C%20computer%20vision%20mimics,tasks%2C%20replicate%20natural%20neural%20networks.

https://docs.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model#:~:text=A%20machine%20learning%20model%20is,and%20learn%20from%20those%20data.&text=Windows%20Machine%20Learning%20uses%20the,ONNX)%20format%20for%20its%20models.

Hami Ismail

A working professional of two worlds: Engineering Asset Management on the right hand, Artificial Intelligence on the left hand