Gesture Detection Using a Webcam


Problem Definition

The goal of this assignment is to design and implement algorithms that delineate gestures in video images and analyze their movements. The vision-based techniques we use rely largely on the color of the target objects and the background.

Method and Implementation

We chose three gestures: waving with one hand, rapidly closing and opening a fist, and performing a gesture at different speeds.

To approach this problem, the program must first be able to pick out the hand (skin). We do this by testing each pixel's RGB values against a set of thresholds: if the color values fall within the thresholds, the program treats that pixel as a skin pixel. The program recolors all skin pixels and sets everything else to black. This algorithm works best against a solid-colored background that differs strongly from the skin color.
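The per-pixel test described above can be sketched in plain Python as follows. This is only an illustration: the report does not state the threshold values the program used, so the numbers in is_skin are assumed stand-ins, and the names is_skin, highlight_skin, and SKIN_COLOR are hypothetical.

```python
# Illustrative thresholds only; the report does not give the values it used.
def is_skin(r, g, b):
    """Return True if an (r, g, b) pixel falls inside the assumed skin range."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15   # reject gray/white pixels
        and abs(r - g) > 15
        and r > g and r > b                    # skin is assumed to be red-dominant
    )

SKIN_COLOR = (0, 255, 0)  # hypothetical highlight color for skin pixels
BLACK = (0, 0, 0)

def highlight_skin(frame):
    """Recolor skin pixels and set every other pixel to black.

    `frame` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    return [[SKIN_COLOR if is_skin(*px) else BLACK for px in row]
            for row in frame]

# A tiny 2x2 "frame": white wall, two skin-like pixels, one dark pixel.
frame = [
    [(255, 255, 255), (220, 170, 140)],
    [(210, 160, 130), (30, 30, 30)],
]
result = highlight_skin(frame)
```

Here the white and dark pixels become black and the two skin-like pixels are recolored. A white background fails the test because its three channels are nearly equal, which matches the observation that a solid, non-skin-colored background helps.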

Once we have isolated the skin area, we calculate its centroid and draw it on the screen. Because a moving hand moves the centroid with it, the distance the centroid travels between consecutive frames gives the speed of the hand. We color the hand according to its speed (stationary = green, slow = yellow, fast = red).

To detect the closing and opening of a fist, we use the size of the skin area. An open palm covers a larger area than a closed fist, so a large change in skin area between frames implies that the fist is opening or closing.

At first we printed a score in the console rating the probability of each gesture being performed. We later dropped this idea and instead colored the hand according to the gesture being performed.
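A minimal sketch of the centroid, speed, and area-change logic is given below. It assumes the skin area is available as a binary mask (rows of 0/1 values). The threshold constants, the function names, and the choice to test for the fist gesture before testing speed are all assumptions made for illustration; the report does not specify them.

```python
import math

def centroid_and_area(mask):
    """Return ((cx, cy), area) for a binary mask given as rows of 0/1 values."""
    sum_x = sum_y = area = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                area += 1
    if area == 0:
        return None, 0  # no skin pixels in this frame
    return (sum_x / area, sum_y / area), area

# Hypothetical thresholds; real values would need tuning per camera and frame rate.
SLOW_PIXELS = 2.0   # centroid shift (pixels per frame) below this: stationary
FAST_PIXELS = 10.0  # centroid shift at or above this: fast
AREA_RATIO = 0.3    # relative area change above this: fist opening/closing

def classify(prev_mask, curr_mask):
    """Label the motion between two consecutive skin masks."""
    prev_c, prev_area = centroid_and_area(prev_mask)
    curr_c, curr_area = centroid_and_area(curr_mask)
    if prev_c is None or curr_c is None:
        return "no hand"
    # Assumed ordering: a large area change takes priority over speed.
    if abs(curr_area - prev_area) / prev_area > AREA_RATIO:
        return "fist"
    distance = math.hypot(curr_c[0] - prev_c[0], curr_c[1] - prev_c[1])
    if distance < SLOW_PIXELS:
        return "stationary"
    if distance < FAST_PIXELS:
        return "slow"
    return "fast"

def rect_mask(width, height, left, top, w, h):
    """Build a test mask containing one filled rectangle (a stand-in for a hand)."""
    return [[1 if left <= x < left + w and top <= y < top + h else 0
             for x in range(width)] for y in range(height)]

base = rect_mask(40, 20, 5, 5, 4, 4)
labels = [
    classify(base, rect_mask(40, 20, 5, 5, 4, 4)),    # same position
    classify(base, rect_mask(40, 20, 10, 5, 4, 4)),   # shifted 5 pixels
    classify(base, rect_mask(40, 20, 25, 5, 4, 4)),   # shifted 20 pixels
    classify(base, rect_mask(40, 20, 5, 5, 2, 2)),    # area shrinks from 16 to 4
]
```

With these assumed thresholds, the four comparisons are labeled stationary, slow, fast, and fist in that order. Each label would then select the color used to draw the hand.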


The hand is placed roughly 1 foot in front of the camera. Our program can tell skin apart from objects of other colors, but it cannot tell skin apart from skin-colored objects. We run the experiments in front of a solid white wall to increase detection accuracy.



A green hand means it is stationary.

An orange hand means it is moving slowly.

A red hand means it is moving fast.

A purple hand means it is changing rapidly in size (making a fist).



Our old method detected skin color in 3 steps.
1. Set a relatively broad RGB threshold to narrow down the target area.
2. Project the thresholded image onto the x-axis and y-axis by counting the number of skin pixels along each axis. Apply thresholds to these counts to separate skin from background. Color the skin area white and the background black. Noisy white blobs and dots may still remain.
3. Repeat step 2 to remove the remaining white noise points.
However, we eventually gave up on this method because it used too much CPU and slowed down our system. Instead, we added more constraints for filtering out background colors and narrowed the RGB color range in the original step 1. We did not use steps 2 and 3 of the original method.
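Steps 2 and 3 of the abandoned method can be sketched roughly as follows, assuming step 1 has already produced a binary mask (1 = candidate skin pixel). The function names, the minimum-count parameters, and the rule of keeping a pixel only when both its row count and its column count pass the threshold are assumptions made for illustration; the report does not give these details.

```python
def project(mask):
    """Count candidate skin pixels in every row and every column of the mask."""
    row_counts = [sum(row) for row in mask]
    col_counts = [sum(col) for col in zip(*mask)]
    return row_counts, col_counts

def filter_by_projection(mask, min_row, min_col):
    """Keep a pixel only if its row and column both hold enough skin pixels."""
    row_counts, col_counts = project(mask)
    return [[1 if value and row_counts[y] >= min_row and col_counts[x] >= min_col
             else 0
             for x, value in enumerate(row)]
            for y, row in enumerate(mask)]

def clean(mask, min_row=2, min_col=2, passes=2):
    """Apply the projection filter repeatedly (step 2, then step 3)."""
    for _ in range(passes):
        mask = filter_by_projection(mask, min_row, min_col)
    return mask

# A 3x3 "hand" blob plus two noise pixels in column 0.
noisy = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
one_pass = clean(noisy, passes=1)
two_pass = clean(noisy, passes=2)
```

In this example the first pass removes the bottom noise pixel but keeps the one that shares a row with the blob; the second pass removes it because its column count has dropped to 1. Every pass rescans the whole frame, which is consistent with the CPU cost that led to dropping the method.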
The new method changes the color of the hand based on the gesture being performed. It is not 100% accurate, however: a slow-moving hand can sometimes be mistaken for a fist opening and closing, because the detected hand size is inconsistent from frame to frame.


Our gesture algorithms work well, but the program does not detect skin very well. Our first skin detection algorithm worked better, at the cost of slow computation. The second algorithm runs much faster but requires a solid-colored background and might not recognize other people's hands as well as it did in our demonstration.


Geng Haoqing (Panda)