Introduction to Using OpenCV With Unity
If you’ve been working with Unity for a while, you’ve probably realized how powerful the game engine is. From simple 2D and 3D mobile games to full-fledged virtual reality applications, you can do it all with Unity.
However, if you’re a fan of motion-based games like Kinect Table Tennis or Motion Sports, you might have wondered, as a developer, whether it’s possible to make these kinds of games with Unity.
This tutorial is the perfect starting point. You’ll learn about OpenCV (Open Source Computer Vision), one of the world’s most popular and widely used computer vision libraries, and use it with your webcam for real-time movement detection. You’ll set up hand gesture recognition, then send the data it generates to Unity in order to control a player in your game.
Specifically, you’ll learn:
- The basics of networking communication protocols.
- What OpenCV is and some of its applications.
- How to install, set up and run the Python version of OpenCV on your system.
- How to send data to a specific port via Sockets in Python.
- How to read data from a socket with the help of a UdpClient in Unity.
You’ll need the following for this tutorial:
- The latest stable copy of Unity installed on your machine — currently 2018.2.
- A code editor of your choice.
- Python and OpenCV installed (installation instructions are in the appropriate section).
For now, assuming you have Unity set up, download the project materials using the “Download Materials” link at the top or bottom of this tutorial.
Unzip the contents. Inside the Starter folder, you will find two more folders:
- Unity3D, which contains the Unity Project.
- Python, which contains Recognition.py and requirements.txt.
Open the Unity3D folder as a Project with Unity.
Exploring the Starter Project
Take a look at the folder structure in the Project window. Here’s what each folder contains:
- Animation Controllers: Contains the Animation Controller for the Player.
- Animations: Contains the Idle and Jumping animations for the Player.
- Materials: Contains the necessary materials required for this tutorial.
- Prefabs: Contains the Models for the Gym and the Player.
- Scenes: Contains the Main scene.
- Scripts: Contains the PlayerController script, which will store the OpenCV logic for controlling player movement.
- Sounds: Music and sound files for the project are kept here.
- Textures: Contains the main textures for the project.
Setting Up Your Scene
Open the Main scene inside the Scenes folder.
You’ll see a gym, some nice motivational posters (all gyms need them) and the star of the show: Mr. Jumpy McJumper.
To save you the hassle of fiddling with the Transform values of the GameObjects in the scene, everything is already in the right place.
If you click the Play button now, you’ll hear background music and see the player in his Idle animation, but not much else going on yet.
You should also see the PlayerController under Managers, with a PlayerController script attached to it. This is the file you’ll add all your code to later on.
In this tutorial, you’ll use Python to create a virtual server, use OpenCV to detect the number of fingers your hand shows the webcam, and use sockets to send that information to a predefined port.
Once that’s working, you’ll use a UdpClient, which is part of the System.Net.Sockets namespace, to receive that data in the Unity project.
Understanding Key Concepts
Before moving on to the actual coding, take a look at the underlying concepts that will be applied throughout the rest of this tutorial.
Communication Protocols and UDP
In order to understand how the Python-OpenCV script and your Unity instance will communicate with each other, it’s important to know about the underlying principles of how data is shared between two or more applications in computer systems.
Communication protocols are formal descriptions of digital message formats and rules. They’re what allow two or more computing systems to exchange messages, and they’re fundamental to telecommunications.
Two commonly used Protocols are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
TCP is the dominant protocol for the bulk of internet connectivity, such as sending and receiving email (SMTP) and browsing the web, owing to the services it provides: breaking large data sets into individual packets, checking for and resending lost packets, and reassembling packets in the correct sequence.
In contrast, applications that communicate using the UDP protocol just send the packets and don’t check or wait for an acknowledgement before sending the next packet, which means that they have much lower bandwidth overhead and latency.
Some common real-world applications of UDP are tunneling/VPNs, live media streaming services (where it’s OK if some frames are lost) and, most commonly, online multiplayer games like Counter-Strike: Global Offensive or Dota 2.
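To see the fire-and-forget behavior in practice, here’s a minimal Python sketch using the standard socket module. The sender performs no handshake and waits for no acknowledgement; it simply fires the datagram at the receiver’s address. (The loopback address and OS-assigned port here are just for the demo.)

```python
import socket

# A receiver bound to an OS-assigned port (0 means "pick any free port").
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
port = receiver.getsockname()[1]

# The sender needs no connection handshake: it just fires the datagram.
# If the packet were lost in transit, the sender would never know --
# that's the trade-off that buys UDP its low overhead and latency.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", port))

data, addr = receiver.recvfrom(1024)
print(data)

sender.close()
receiver.close()
```

Compare this with TCP, where you would have to `connect()` before sending and the operating system would handle retransmission and ordering for you.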
Now that you understand a little bit about how the data will be communicated between Unity and your Python script, here’s a little background on OpenCV and how it will help in detecting the fingers of your hand.
OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage, then Itseez (which was later acquired by Intel). The library is cross-platform and free for use under the open-source BSD license. OpenCV also supports the deep-learning frameworks TensorFlow, Torch/PyTorch and Caffe.
In this tutorial, you will use the Python API for OpenCV to detect the number of fingers your hand displays when it is open as opposed to when you make a fist (zero fingers). It will then send a message to a predefined port using sockets, which will be used to trigger an action in the Unity project.
A broad overview of the steps performed by OpenCV to detect the number of fingers:
- Frames are captured from the camera (webcam) as images.
- A Gaussian blur is applied to reduce noise in the image and soften the outlines of the fingers for the later processing stages. (Left: normal image. Right: blurred image.)
- The blurred image is thresholded into a binary image (one containing only two colors), in which skin color becomes white and everything else becomes black.
- Contour detection is applied to find the outline of the hand, along with its convexity defects (the valleys between the fingers).
- Based on the number of defects detected, the number of fingers is calculated.
You can read through the Recognition.py script to get an idea of what each line of code is doing.
Once the number of fingers is recognized, Python’s socket module is used to send the relevant data via UDP to port number 5065 (line 128 in Recognition.py).
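Before wiring up the Unity side, you can sanity-check that the data is actually arriving with a small stand-in receiver. The helper below is hypothetical (it’s not part of the tutorial’s materials); it listens on the same port and returns the first datagram it receives:

```python
import socket

def wait_for_finger_count(port=5065, timeout=5.0):
    """Wait for one UDP datagram on the given port.

    Hypothetical helper for verifying the data flow from Recognition.py:
    returns the decoded payload, or None if nothing arrives in time.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", port))
    listener.settimeout(timeout)  # don't block forever if nothing arrives
    try:
        data, _addr = listener.recvfrom(1024)
        return data.decode()
    except socket.timeout:
        return None
    finally:
        listener.close()
```

Run Recognition.py in one terminal, then call `wait_for_finger_count()` from a Python shell in another: you should get back the finger count as a string, or None if the script isn’t sending.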