HAND GESTURE FOR HUMAN MACHINE INTERACTION
Applied Electronics and Instrumentation
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
COLLEGE OF ENGINEERING
This paper presents a new approach to Human Machine Interaction (HMI) in which the user merely shows hand gestures in front of a camera. With this technique, one can pose a hand gesture within the vision range of a robot, and the machine performs the operation corresponding to that gesture. A simple video camera is used for computer vision and monitors the gesture presentation. The approach consists of four modules: (a) real-time monitoring of hand gesture formation and gesture capture, (b) feature extraction, (c) pattern matching for gesture recognition, and (d) determination of the command corresponding to the shown gesture and execution of the operation by the machine. A real-time hand tracking technique is used for object detection within the range of vision. If a hand gesture is shown for one second, the camera captures the gesture. The object of interest is extracted from the background, and the portion of the hand representing the gesture is cropped out using the statistical properties of the hand. The extracted hand gesture is matched against a stored database of hand gestures using pattern matching, and the machine performs the action corresponding to the matched gesture. The system can be used for controlling robots, for example their movements and operations.
Interpretation of human gestures by a computer is used for human-machine interaction in the area of computer vision. The main purpose of gesture recognition research is to identify a particular human gesture and convey to the user the information pertaining to that gesture. From a corpus of gestures, a specific gesture of interest can be identified, and on that basis a specific command can be given to the machine for execution. The overall aim is to make the computer understand human body language, thereby bridging the gap between machine and human. Hand gesture recognition can be used to enhance human-computer interaction without depending on traditional input devices such as keyboard and mouse. Hand gestures are extensively used for telerobotic control applications. Robotic systems can be controlled naturally and intuitively with such telerobotic communication. A prominent benefit of such a system is that it presents a natural way to send geometrical information to the robot, such as left, right, etc. A robotic hand can be controlled remotely by hand gestures. Research in this area has been carried out for a long time, and several approaches have been developed for sensing hand movements and controlling a robotic hand accordingly.
The glove-based technique is a well-known means of recognizing hand gestures. It utilizes mechanical glove devices with attached sensors that directly measure hand and/or arm joint angles and spatial position. Although glove-based gestural interfaces give more precision, they limit freedom because they require users to wear a cumbersome set of devices. Jae-Ho Shin used entropy analysis to extract the hand region from a complex background for a hand gesture recognition system. Robot control has also been done by fusing hand positioning and arm gestures using a data glove; it too gives more precision but limits freedom because gloves must be worn. For capturing hand gestures correctly, proper lighting and camera angle are required. The problem of visual hand recognition and tracking is quite challenging. Many early approaches used position markers or colored bands to make hand recognition easier, but because of their inconvenience they cannot be considered a natural interface for robot control. We propose a fast and automatic hand gesture detection and recognition system. This gesture identification approach can be used with any robotic system or machine, with a set of specific commands suited to that system issued on the basis of the recognized hand gesture.
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current focuses in the field include emotion recognition from the face and hand gesture recognition. Many approaches use cameras and computer vision algorithms to interpret sign language. However, the identification and recognition of posture, gait, and human behaviors are also subjects of gesture recognition techniques.
Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse.
Gesture recognition enables humans to interface with the machine (HMI) and interact naturally without any mechanical devices. Using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the cursor will move accordingly. This could potentially make conventional input devices such as mouse, keyboards and even touch-screens redundant.
Gesture recognition can be conducted with techniques from computer vision and image processing.
A gesture recognition system could be used in any of the following areas:
• Man-machine interface: using hand gestures to control the computer mouse and/or keyboard functions.
• 3D animation: Rapid and simple conversion of hand movements into 3D computer space for the purposes of computer animation.
• Visualisation: Just as objects can be visually examined by rotating them with the hand, so it would be advantageous if virtual 3D objects (displayed on the computer screen) could be manipulated by rotating the hand in space
• Computer games: Using the hand to interact with computer games would be more natural for many applications.
• Control of mechanical systems (such as robotics): Using the hand to remotely control a manipulator.
There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition there are limitations on the equipment used and image noise. Images or video may not be under consistent lighting, or in the same location. Items in the background or distinct features of the users may make recognition more difficult.
The variety of implementations for image-based gesture recognition may also limit the viability of the technology for general use. For example, an algorithm calibrated for one camera may not work for a different camera. The amount of background noise also causes tracking and recognition difficulties, especially when partial or full occlusions occur. Furthermore, the distance from the camera, and the camera's resolution and quality, also cause variations in recognition accuracy.
The proposed methodology uses a live video camera for gesture identification. It samples frames from the live video stream at a fixed time interval; in our case the frame capture rate for gesture search is 3 frames per second. The proposed technique for controlling a robotic system by displaying hand gestures is divided into four subparts:
• Capture a frame containing a gesture presentation.
• Extract the hand gesture area from the captured frame.
• Determine the gesture by pattern matching using the PCA algorithm.
• Determine the control instruction corresponding to the matched gesture, and give that instruction to the specified robotic system.
The block diagram above shows the flow of the whole system, i.e. hand gesture identification and robot control. A gesture is captured by taking a snapshot from continuous video. The captured image is searched for a valid hand gesture. The region showing the gesture is then cropped out, and the image is resized to match the gestures in the database. On the basis of the gesture identified by pattern matching, the control instruction is determined from the stored instruction set. The selected instruction set, corresponding to the recognized hand gesture, is given to the robot to carry out the control action.
Hand Gesture Recognition
Human hand gestures are a set of movements of the hand and arm which range from the simple action of pointing at something to the complex ones used to communicate with other people. Understanding and interpreting these movements requires modeling them in both the spatial and temporal domains. The static configuration of the human hand, which is called hand posture, and its dynamic activities are vital for human computer interaction.
Psychological studies show that a hand gesture consists of three phases: preparation, nucleus, and retraction. The preparatory phase brings the hand from its resting state to the starting posture of the gesture. This phase is sometimes very short, and sometimes it is combined with the retraction phase of the previous gesture. The nucleus contains the main concept and has a definite form. The retraction phase is the movement of the hand back to rest after completing the gesture. Retraction may be very short or absent if the gesture is followed by another gesture. The preparatory and retraction phases are generally short, and the hand movements are faster than in the nucleus phase.
Several classifications have been considered for hand gestures in the literature. One taxonomy which is more suitable for human computer interaction applications divides hand gestures into three groups: communicative gestures, manipulative gestures, and controlling gestures. Communicative gestures are intended to express an idea or a concept. These gestures are either used together with speech or are a substitute for verbal communication, which in turn requires a highly structured set of gestures such as those defined in sign languages. Manipulative gestures are used for interaction with objects in an environment. These gestures are mostly used for interaction in virtual environments such as teleoperation or virtual assembly systems; however, physical objects can also be manipulated through gesture-controlled robots. Controlling gestures are the group of gestures used to control a system or to point at and locate an object. Finger Mouse is a sample application which detects 2D finger movements and controls mouse movements on the computer desktop. Analyzing hand gestures is completely application dependent and involves analyzing the hand motion, modeling the hand and arm, mapping the motion features to the model, and interpreting the gesture in a time interval.
Hand gestures can be divided into two categories. Static gestures utilize only spatial information, and dynamic gestures utilize both spatial and temporal information. With static gestures, as the number of predefined gestures increases, the differences between gestures become harder to distinguish. Dynamic gestures are easier and more comfortable to express, and a larger number of gestures can be predefined, but it is difficult to extract the relevant data from a large amount of meaningless information.
Glove-based techniques and computer vision techniques are the two well-known means of recognizing hand gestures. The first utilizes mechanical glove devices with attached sensors that directly measure hand and/or arm joint angles and spatial position, but glove-based gestural interfaces require users to wear a cumbersome set of devices. The latter approach uses a set of video cameras and computer vision techniques to interpret gestures, providing a more natural way of interacting. However, since it is troublesome to analyze hand movements and recognize postures from complex images, many methods rely on colored markers on the hands, special types of gloves, or a restricted set of backgrounds, which are widely acknowledged limitations. In this paper, we propose a method of hand gesture recognition based on computer vision techniques, without restricting backgrounds or using any markers.
4.1 Finding A Hand Gesture
Gesture finding from the online camera: One frame is captured from the video stream every 1/3 second. The target is to identify the frame that contains the hand gesture shown by the user. For this we search for a frame in which there is no movement. The required frame is identified by comparing three consecutively captured frames. A motion parameter is determined for each frame by counting the total number of mismatched pixels. If the motion parameter is less than the specified threshold value, the frame is considered to have little object movement, i.e. it contains a hand gesture that the user wants to show. To find the frame of interest, each captured frame is first converted into a binary frame. The newly captured frame is then compared with each of the two previously captured frames: mismatches between corresponding pixels are located for both comparisons and combined to find the motion parameter. Since a binary image has values of one or zero, the XOR function gives the locations where mismatches occur. If frame1, frame2, and frame3 are three matrices containing three frames captured in three consecutive time slots respectively, then:
fr1 = frame1 XOR frame3
fr2 = frame2 XOR frame3
mismatch_matrix = fr1 OR fr2
sum = total of mismatch_matrix(i, j) over i = 1..r and j = 1..c
motion_parameter = sum / (r × c)
Here r and c are the number of rows and columns in the image frames. The threshold value is set to 0.01, i.e. if the total number of mismatched pixels is less than 1% of the total pixels in a frame, the frame is considered the frame of interest. The required frame is forwarded for further processing.
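The frame comparison above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the frames are already binarized, and the function names `motion_parameter` and `is_frame_of_interest` are hypothetical, not taken from the original system.

```python
import numpy as np

def motion_parameter(frame1, frame2, frame3):
    """Fraction of pixels where the newest frame (frame3) differs from
    either of the two earlier frames. Inputs are binary images."""
    f1 = np.asarray(frame1, dtype=bool)
    f2 = np.asarray(frame2, dtype=bool)
    f3 = np.asarray(frame3, dtype=bool)
    fr1 = np.logical_xor(f1, f3)          # mismatches frame1 vs frame3
    fr2 = np.logical_xor(f2, f3)          # mismatches frame2 vs frame3
    mismatch = np.logical_or(fr1, fr2)    # mismatch_matrix
    return float(mismatch.sum()) / mismatch.size   # sum / (r * c)

def is_frame_of_interest(frame1, frame2, frame3, threshold=0.01):
    """True when fewer than `threshold` of the pixels changed."""
    return motion_parameter(frame1, frame2, frame3) < threshold
```

With 3 frames per second, three consecutive still frames correspond to a gesture held for roughly one second, which matches the capture condition described earlier.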
4.2 Hand Gesture Extraction
The frame with a gesture contains extra parts along with the required part of the hand, i.e. background objects, blank spaces etc. For better results in pattern matching, the unused parts must be removed. Therefore the hand gesture is cropped out from the obtained frame. Cropping the hand gesture from the obtained frame involves three steps. The first step is to convert the selected frame into a black-and-white image using global thresholding. The second step is to extract the object of interest from the frame. In our case, the object of interest is the part of the human hand showing the gesture. For this, everything other than the hand is cropped out so that pattern matching can give more accurate results. To crop the extra parts, the row and column numbers at which the object of interest begins and ends are determined. This is done by searching from each side of the binary image and moving inward until the number of white pixels encountered exceeds the offset value. Experimental results show that an offset value of 1% of the total width gives better results for noise compensation. If the size of the selected image is m×n then:
• Min_col = minimum column number where the total number of white pixels is more than Hor_offset.
• Max_col = maximum column number where the total number of white pixels is more than Hor_offset.
• Min_row = minimum row number where the total number of white pixels is more than Ver_offset.
• Max_row = maximum row number where the total number of white pixels is more than Ver_offset.
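The bounding-box search above could look like the following sketch. The text gives only one offset value (1% of the total width), so the defaults here are an assumption: Hor_offset is taken as 1% of the image width and Ver_offset as 1% of the image height. The function name `crop_to_gesture` is hypothetical.

```python
import numpy as np

def crop_to_gesture(binary, hor_offset=None, ver_offset=None):
    """Crop a binary image to the rows and columns whose white-pixel
    count exceeds the offsets. Returns None if nothing qualifies."""
    img = np.asarray(binary, dtype=bool)
    m, n = img.shape                      # m rows, n columns
    if hor_offset is None:
        hor_offset = 0.01 * n             # assumed default, see lead-in
    if ver_offset is None:
        ver_offset = 0.01 * m             # assumed default, see lead-in
    col_counts = img.sum(axis=0)          # white pixels in each column
    row_counts = img.sum(axis=1)          # white pixels in each row
    cols = np.flatnonzero(col_counts > hor_offset)
    rows = np.flatnonzero(row_counts > ver_offset)
    if cols.size == 0 or rows.size == 0:
        return None
    min_col, max_col = cols[0], cols[-1]
    min_row, max_row = rows[0], rows[-1]
    return img[min_row:max_row + 1, min_col:max_col + 1]
```

Because isolated noise pixels contribute only one or two white pixels to a row or column, they stay below the offset and do not widen the crop.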
The third step is to remove the parts of the hand not used in gesture presentation, i.e. the wrist, arm etc. Because these extra parts are of variable length in the image frame, pattern matching against the gesture database gives unwanted results, owing to the limitations of the gesture database. Therefore, the parts of the hand before the wrist need to be cropped out.
Statistical analysis of hand shape shows that, whether we pose a palm or a fist, the width is lowest at the wrist and highest at the middle of the palm. Therefore the extra part of the hand can be cropped off at the wrist by finding the location where the vertical histogram has its minimum height. Figure 3.c and 3.d show the global maxima and cropping points for the hand gestures in figure 3.a and 3.b respectively.
Cropping point is calculated as:
Global Maxima = column number where height of histogram is highest (i.e. column number for global maxima as shown in figure 3).
Cropping point = column number where height of histogram is lowest in such a way that cropping point is in between first column and column of Global Maxima
If the gesture is shown from the opposite side (i.e. with the other hand), the cropping point is searched for between the column of the Global Maxima and the last column. The direction of the hand is detected using continuity analysis of the object during hand gesture area determination. Continuity analysis shows whether the object continues from the column of the Global Maxima to the first column or to the last column, i.e. whether the extra part of the hand is on the left side or the right side of the palm.
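A minimal sketch of the cropping-point rule follows. It assumes that the "height of histogram" for a column is the number of white pixels in that column, and it takes the arm direction as a flag (`arm_on_left`) rather than performing the continuity analysis; both the flag and the function name are hypothetical.

```python
import numpy as np

def crop_at_wrist(binary, arm_on_left=True):
    """Cut a cropped binary hand image at the narrowest column between
    the image edge on the arm side and the widest (palm) column.
    Returns the cropped image and the cropping column."""
    img = np.asarray(binary, dtype=bool)
    hist = img.sum(axis=0)                    # column-wise histogram
    global_max = int(np.argmax(hist))         # column of Global Maxima
    if arm_on_left:
        # search between the first column and the Global Maxima
        crop = int(np.argmin(hist[:global_max + 1]))
        return img[:, crop:], crop
    # search between the Global Maxima and the last column
    crop = global_max + int(np.argmin(hist[global_max:]))
    return img[:, :crop + 1], crop
```

When several columns share the minimum height, `np.argmin` returns the first of them; the source does not say how such ties should be resolved.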
4.3 Using 3-D Images
The precise hand region can be obtained in real time through the fusion of the geometric features of the hand and 3D depth information. Gesture analysis can generally be classified into two categories: one based on 2D image data, the other on 3D depth data. With a 2D camera, many vision-based algorithms have been developed for hand tracking and gesture recognition. However, their performance decreases considerably in complicated environments, such as complex backgrounds or variable illumination. A more intuitive way is therefore to use 3D depth data to eliminate the noise. One way to obtain useful 3D depth information is to use more than one camera, i.e. the stereo vision methods, such as [8–11]. An alternative is to use a 3D sensor, which can provide a dense range image.
With the depth data captured by 3D camera, the first step is to detect coarse hand region. For this purpose, some candidate regions can be acquired by pursuing the most frontal connected regions in depth range image. The obtained hand region usually contains not only hand, but also forearm part.
In most HCI environments, the assumption that the depth of the hand is smaller than that of the forearm usually holds. Therefore, the rough hand location can be estimated from the statistics of the pixels with smaller (shallower) depth. Here, we use the mean position of these pixels as the rough hand center C:
$$C = \frac{1}{N} \sum_{i:\, d_{p_i} \le d_T} p_i$$
where $d_{p_i}$ is the depth of the pixel $p_i$ and $d_T$ is the depth corresponding to the N nearest pixels.
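As an illustration, the rough hand center can be computed by averaging the positions of the N nearest pixels of the coarse hand region. The sketch below selects exactly the N smallest depths by sorting, which is equivalent to thresholding at d_T when there are no depth ties; the function name and the (x, y) return order are assumptions.

```python
import numpy as np

def rough_hand_center(depth, region_mask, n_nearest):
    """Mean (x, y) position of the n_nearest pixels with the smallest
    depth inside the coarse hand region given by region_mask."""
    depth = np.asarray(depth, dtype=float)
    mask = np.asarray(region_mask, dtype=bool)
    ys, xs = np.nonzero(mask)                 # pixel coordinates in region
    if ys.size == 0:
        raise ValueError("region_mask selects no pixels")
    d = depth[ys, xs]
    n = min(n_nearest, d.size)
    nearest = np.argsort(d, kind="stable")[:n]   # indices of N nearest pixels
    return float(xs[nearest].mean()), float(ys[nearest].mean())
```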
The palm location in the coarse hand region is determined by extracting circle features. Here, scale-space feature detection is adopted. The scale-space representation of image I is
$$S(X; t) = \int G(X - \xi; t)\, I(\xi)\, d\xi, \qquad G(X; t) = \frac{1}{2\pi t}\, e^{-(x^2 + y^2)/(2t)}$$
where G is a Gaussian kernel with scale t, and X = (x, y) and ξ are coordinates of pixels in the image. The circle features are detected as local maxima in scale space of the square of the normalized Laplacian operator, $B_{norm} S = t(\partial_{xx} + \partial_{yy}) S$, where S is the scale-space representation. The radius r of a detected circle feature is proportional to its scale. For a planar hand shape, scale-space feature detection is effective for finding the palm area. In practice, the detection is executed over a wide range of scales to find a palm of any size, so many circle features are always detected in the coarse hand region. To find the exact circle feature for the palm area, several steps are taken as follows:
(1) Among all the extracted circles, only the circle features having a strong response to the detector are kept, forming a cluster ball.
(2) Select the circle feature bmax with maximum scale tmax in the cluster ball.
(3) Denote the center of the circle feature bmax by P. If the distance between P and C is below a threshold (set empirically to 0.5r, where r is the radius of the circle corresponding to bmax), then bmax is the desired circle feature for the palm. Otherwise, delete bmax from the cluster ball and return to step (2).
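Steps (1) to (3) amount to a simple selection loop, sketched below. The `Circle` record, the `min_response` cut-off for a "strong response", and the function name are assumptions for illustration; the source does not specify how detector responses are thresholded.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple

class Circle(NamedTuple):
    x: float          # center P, x coordinate
    y: float          # center P, y coordinate
    radius: float     # r, proportional to the scale
    scale: float      # t
    response: float   # detector response strength

def select_palm_circle(circles: Sequence[Circle],
                       hand_center: Tuple[float, float],
                       min_response: float,
                       ratio: float = 0.5) -> Optional[Circle]:
    """Return the largest-scale strong circle whose center lies within
    ratio * radius of the rough hand center C, or None."""
    # Step (1): keep only strong responses (the "cluster ball").
    cluster = [c for c in circles if c.response >= min_response]
    # Steps (2)-(3): visiting circles by decreasing scale is equivalent
    # to repeatedly taking bmax and deleting it when the test fails.
    for c in sorted(cluster, key=lambda c: c.scale, reverse=True):
        dist = math.hypot(c.x - hand_center[0], c.y - hand_center[1])
        if dist < ratio * c.radius:
            return c
    return None
```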
Forearm cutting: Since the palm has been located in the coarse hand region, the remaining problem is to determine the cutting direction and cutting position from the results of the above steps. The forearm cutting can be implemented in the following steps:
1) Determine whether the cutting direction is horizontal or vertical according to the aspect ratio of the coarse hand region, r = w/h, where w and h are the width and height of the bounding rectangle of this region respectively. If r < T, the forearm is cut along a horizontal direction; otherwise, it is cut along a vertical direction. Here, T is a pre-defined threshold, and in our case T = 1.2.
2) The spatial relationship between P and C, as given below, is used to determine the cutting position.
(a) If r < T and C is above P, then the forearm is cut horizontally at the bottom of the palm circle.
(b) If r < T and C is below P, then the forearm is cut horizontally at the top of the palm circle.
(c) If r ≥ T and C is to the left of P, then the forearm is cut vertically at the right end of the palm circle.
(d) If r ≥ T and C is to the right of P, then the forearm is cut vertically at the left end of the palm circle.
Finally, a morphological close-open operation is applied to the resulting binary image to obtain a more refined hand region.
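The cutting rules (a) to (d) can be written as a small decision function. The sketch assumes image coordinates in which y grows downward, so "above" means a smaller y. The source does not say what happens when C and P are level, so ties fall into the "below" or "right" branch here; the function name and return format are hypothetical.

```python
def forearm_cut(width, height, palm_center, palm_radius, hand_center, T=1.2):
    """Return ("horizontal", y) or ("vertical", x): the direction and
    position of the cut line that separates the forearm from the palm."""
    px, py = palm_center      # P, center of the palm circle
    cx, cy = hand_center      # C, rough hand center
    r = width / height        # aspect ratio of the coarse hand region
    if r < T:
        if cy < py:                              # (a) C above P
            return ("horizontal", py + palm_radius)   # bottom of circle
        return ("horizontal", py - palm_radius)       # (b) top of circle
    if cx < px:                                  # (c) C left of P
        return ("vertical", px + palm_radius)         # right end of circle
    return ("vertical", px - palm_radius)             # (d) left end of circle
```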
Many applications in image processing and computer vision require finding a particular pattern in an image. This task is referred to as pattern matching and may appear in various forms. Some applications require detection of a set of patterns in a single image, for instance, when a pattern may appear under various transformations or when several distinct patterns are sought in the image. Other applications require finding a particular pattern in several images. The pattern is usually a 2D image fragment, much smaller than the image. In video applications, a pattern may also take the form of a 3D spatio-temporal fragment, representing a collection of 2D patterns.
The pattern matching task can be regarded as a degenerate classification problem in which a non-pattern class is to be distinguished from a single point in the pattern class, or in which the probability distribution of the pattern class is Gaussian. Nevertheless, this task is also required in applications beyond the scope of classification. In recently popular patch-based texture synthesis methods, a massive search for specified patterns is applied to generate a new texture patch which is perceptually similar to an example texture. Because of the high complexity and time requirements of this task, several approximate search methods have been suggested. In other studies, pattern matching schemes are used in image denoising, image sharpening and resolution enhancement, texture transfer, image compression, and image-based rendering.
Finding a given pattern in an image is typically performed by scanning the entire image and evaluating the similarity between the pattern and a local 2D window about each pixel.
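A brute-force version of this window scan is sketched below. The similarity measure is an assumption: the sketch uses the sum of squared differences (SSD), where a lower score means a better match, and the function name `find_pattern` is hypothetical.

```python
import numpy as np

def find_pattern(image, pattern):
    """Scan every window of the image and return the top-left position
    (row, col) of the best match together with its SSD score."""
    image = np.asarray(image, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    H, W = image.shape
    h, w = pattern.shape
    best_score, best_pos = np.inf, None
    for row in range(H - h + 1):
        for col in range(W - w + 1):
            window = image[row:row + h, col:col + w]
            score = float(np.sum((window - pattern) ** 2))  # SSD
            if score < best_score:
                best_score, best_pos = score, (row, col)
    return best_pos, best_score
```

The cost grows with the product of the image size and the pattern size, which is why faster schemes are attractive for real-time use.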
5.1 Principal Component Analysis
Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD).
With minimal additional effort, PCA provides a roadmap for reducing a complex data set to a lower dimension to reveal the sometimes hidden, simplified structure that often underlies it. Our approach treats gesture recognition as a two-dimensional recognition problem, taking advantage of the fact that gestures are normally upright and thus may be described by a small set of 2-D characteristic values.
Algorithms based on principal component analysis were the starting point of numerous studies and lines of investigation. PCA-based algorithms are popular because they are easy to implement and perform reasonably well. PCA is a statistical method for reducing the dimensionality of a data set while retaining the majority of the variation present in it.
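To make the procedure concrete, here is a minimal PCA sketch using the singular value decomposition of the mean-centered data. It is one common way to compute principal components, not necessarily the one used in the original system; the function names are hypothetical, and at least two samples are assumed.

```python
import numpy as np

def pca(data, n_components):
    """Rows of `data` are samples. Returns the mean, the first
    n_components principal directions, and their variances."""
    X = np.asarray(data, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are orthonormal directions ordered by decreasing
    # singular value, i.e. by decreasing explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = (s ** 2) / (X.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]

def project(data, mean, components):
    """Coordinates of the samples in the reduced principal space."""
    return (np.asarray(data, dtype=float) - mean) @ components.T
```

For points lying on a straight line, for example, the first component points along the line and the variance of the second component is zero, so one coordinate is enough to describe the data.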
5.2 Gesture Matching
After the gesture is cropped out, it is scaled to fit in a frame of 60×80 pixels so that it can be matched against the database without distorting the original shape of the hand gesture. Figure 4.b shows the hand gesture cropped from figure 4.a. Figure 4.c shows the hand gesture after scaling to fit within 60×80 pixels without changing the aspect ratio of the cropped input gesture of figure 4.a. Figure 4.d shows the final cropped and resized gesture of the desired dimension.
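The aspect-preserving fit can be sketched as follows. Several details are assumptions not stated in the text: 60×80 is read as 60 rows by 80 columns, nearest-neighbour sampling is used for resizing, and the scaled gesture is centered on a black background. The function name `fit_to_frame` is hypothetical.

```python
import numpy as np

def fit_to_frame(gesture, frame_shape=(60, 80)):
    """Scale a 2D image uniformly so it fits inside frame_shape,
    then center it on a zero (black) background."""
    img = np.asarray(gesture)
    rows, cols = img.shape
    out_rows, out_cols = frame_shape
    scale = min(out_rows / rows, out_cols / cols)   # same factor on both axes
    new_rows = min(out_rows, max(1, int(round(rows * scale))))
    new_cols = min(out_cols, max(1, int(round(cols * scale))))
    # Nearest-neighbour sampling: map each output index to a source index.
    row_idx = np.minimum((np.arange(new_rows) / scale).astype(int), rows - 1)
    col_idx = np.minimum((np.arange(new_cols) / scale).astype(int), cols - 1)
    resized = img[row_idx][:, col_idx]
    frame = np.zeros(frame_shape, dtype=img.dtype)
    top = (out_rows - new_rows) // 2
    left = (out_cols - new_cols) // 2
    frame[top:top + new_rows, left:left + new_cols] = resized
    return frame
```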
Principal Component Analysis (PCA) method is used for pattern matching. There are two reasons behind using PCA:
(i) PCA method is suitable for pattern matching as human hand is used for gesture expression and components present in hand (e.g. fingers, palm, fist, etc.) are large enough, when compared to noise.
(ii) PCA method is very fast compared to neural network method which requires training database and more time along with high computation power.
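The text does not spell out how a gesture is compared with the database once PCA has been applied. One plausible reading, sketched here as an assumption, is to project the stored gestures and the input gesture into the principal subspace and pick the stored gesture at the smallest Euclidean distance. All names below are hypothetical.

```python
import numpy as np

def build_gesture_space(templates, n_components):
    """Flatten the database images and compute their PCA weights."""
    X = np.stack([np.asarray(t, dtype=float).ravel() for t in templates])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    weights = (X - mean) @ components.T      # one weight row per template
    return mean, components, weights

def match_gesture(image, labels, mean, components, weights):
    """Return the label of the nearest database gesture in the
    reduced space, together with the distance to it."""
    w = (np.asarray(image, dtype=float).ravel() - mean) @ components.T
    distances = np.linalg.norm(weights - w, axis=1)
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])
```

In practice a maximum distance would also be needed so that a frame containing no valid gesture is rejected rather than matched to its nearest neighbour; the source gives no value for such a limit.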
CONTROL INSTRUCTION GENERATION
Functions corresponding to each meaningful hand gesture are written and stored in a database for controlling the robot or machine. Whenever a gesture is matched with a meaningful gesture from the database, the instruction set corresponding to that gesture is identified and passed to the robot for execution. In this way a robotic system can be controlled by hand gestures, using a live camera as its eyes. Figure 5 shows movements of the robotic hand corresponding to specific hand gestures.
Figure 7: Movement of PUMA 762 corresponding to some specific gesture
This system can be extended to different applications by making corresponding changes in the control instructions generated for each meaningful gesture recognized.
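The stored mapping from gestures to instructions can be as simple as a lookup table. In the sketch below the gesture names, the command strings, and the `send` callback are all hypothetical placeholders; the source does not list the actual gestures or robot commands.

```python
# Hypothetical instruction table: gesture label -> list of robot commands.
INSTRUCTIONS = {
    "open_palm": ["stop"],
    "fist": ["close_gripper"],
    "point_left": ["rotate_base -10"],
}

def dispatch(gesture, send, table=INSTRUCTIONS):
    """Send every instruction stored for `gesture` through the `send`
    callback. Returns the number of instructions sent (0 if the
    gesture is not a meaningful one)."""
    commands = table.get(gesture, [])
    for command in commands:
        send(command)
    return len(commands)
```

Retargeting the system to another machine then only requires replacing the table, for example `dispatch("fist", print)` would print the command while testing without a robot attached.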
USES OF GESTURE RECOGNITION
Gesture recognition is useful for processing information from humans which is not conveyed through speech or type. As well, there are various types of gestures which can be identified by computers.
Sign language recognition. Just as speech recognition can transcribe speech to text, certain types of gesture recognition software can transcribe the symbols represented through sign language into text.
For socially assistive robotics. By using proper sensors (accelerometers and gyros) worn on the body of a patient and by reading the values from those sensors, robots can assist in patient rehabilitation. The best example can be stroke rehabilitation.
Directional indication through pointing. Pointing has a very specific purpose in our society, to reference an object or location based on its position relative to ourselves. The use of gesture recognition to determine where a person is pointing is useful for identifying the context of statements or instructions. This application is of particular interest in the field of robotics.
Control through facial gestures. Controlling a computer through facial gestures is a useful application of gesture recognition for users who may not physically be able to use a mouse or keyboard. Eye tracking in particular may be of use for controlling cursor motion or focusing on elements of a display.
Alternative computer interfaces. Foregoing the traditional keyboard and mouse setup to interact with a computer, strong gesture recognition could allow users to accomplish frequent or common tasks using hand or face gestures to a camera.
Immersive game technology. Gestures can be used to control interactions within video games to try and make the game player's experience more interactive or immersive.
Virtual controllers. For systems where the act of finding or acquiring a physical controller could require too much time, gestures can be used as an alternative control mechanism. Controlling secondary devices in a car, or controlling a television set are examples of such usage.
Affective computing. In affective computing, gesture recognition is used in the process of identifying emotional expression through computer systems.
Remote control. Through the use of gesture recognition, "remote control with the wave of a hand" of various devices is possible. The signal must not only indicate the desired response, but also which device to be controlled.
In today's digitized world, processing speeds have increased dramatically, with computers advanced to the level where they can assist humans in complex tasks. Yet input technologies still cause a major bottleneck in performing some of these tasks, under-utilizing the available resources and restricting the expressiveness of application use. Hand gesture recognition comes to the rescue here. Robotic control depends on accurate hand gesture detection, and hand gesture detection in turn depends directly on lighting quality. In addition to being robust, the proposed method for controlling a robot using hand gestures is very fast. This methodology can be extended to more complex robots in the fields of computer vision and robotics.
• Jagdish Lal Raheja, Radhey Shyam, Umesh Kumar, P. Bhanu Prasad, Real-Time Robotic Hand Control using Hand Gestures, Second International Conference on Machine Learning and Computing, 2010
• Xiujuan Chai, Yikai Fang, Kongqiao Wang, Robust Hand Gesture Analysis and Application in Gallery Browsing, School of Information Engineering, Beijing University of Posts and Telecommunication, Beijing, China; System Research Center, Nokia Research Center, Beijing, China
• Yacov Hel-Or and Hagit Hel-Or, Real-Time Pattern Matching Using Projection Kernels, Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 9, September 2005