Asim Shankar

Face Detection & 3D Trajectory Generation

For my final-year undergraduate research project, Priyendra Singh Deshwal and I, working with Dr. Amitabha Mukherjee, developed a system that generates 3D trajectories of actors moving in a video sequence, along with textual commentary on these trajectories.

As a first step, we implemented a face detection system for still images, based on the system described by Henry Rowley in his doctoral thesis (see CMU Face Group in references). We supplemented the base detector described in that thesis with a "clustering" technique that works both on still images and across the frames of a video; it helps improve accuracy and groups multiple detections of the same face together. We later came across a much faster detector (the Haar detector proposed by Viola and Jones in 2001). For tracking we use the Continuously-Adaptive Mean-Shift algorithm. For more details, read on!

System Description
References & Links
Some results
Tools & Resources used
More Details

System Description

Face Detection

We use two base face detectors in our system. One uses a neural network as a classifier and is based on Henry Rowley's thesis (see CMU Face Group in references). The other is based on Haar-like features and is implemented in Intel's open source computer vision library (OpenCV). We observed that our implementations produced many false positives and detected the same face multiple times. To improve accuracy and to distinguish between different faces, we use our own clustering technique.

Clustering Faces

A face detected by the network is a rectangular region. We describe each detected face by a 4-tuple (top-left x-coordinate of the rectangle, top-left y-coordinate, size, frame), where size is the size of the rectangular region in pixels and frame is the frame number of the image in the video sequence. In this 4-dimensional space we cluster faces according to the Euclidean distance between their 4-tuples. The strength of a cluster is the sum of the strengths of the individual faces in it (i.e., the network output values for those faces). We accept only clusters whose strength is above a certain threshold (the value of the threshold depends on the number of frames in the sequence), so each accepted cluster corresponds to a distinct face in the video. The clustering works as follows:

F = the set of all faces (4-tuples) detected by the base detector (say N faces were detected).
We construct "edges" between pairs of these N faces whenever the Euclidean distance between the corresponding 4-tuples is less than a certain threshold.
Each connected component of the graph formed by the vertex set F and the edges described above forms a single cluster, corresponding to a single face.
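
The clustering steps above can be sketched in a few lines of Python. This is only an illustration under assumptions, not our original implementation: the function name, the 5-tuple layout (the 4-tuple plus the network output as a fifth element) and the threshold values are all hypothetical, and the four coordinates are compared unscaled.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_faces(detections, distance_threshold, strength_threshold):
    """Group detections into clusters and keep only the strong ones.

    detections: list of (x, y, size, frame, strength) tuples, where
    strength is the detector output for that face (assumed layout).
    """
    n = len(detections)
    parent = list(range(n))  # union-find forest over detection indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an "edge" between every pair of 4-tuples closer than the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(detections[i][:4], detections[j][:4]) < distance_threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Each connected component is one cluster.
    clusters = {}
    for i, det in enumerate(detections):
        clusters.setdefault(find(i), []).append(det)

    # Cluster strength is the sum of member strengths; weak clusters are dropped.
    return [members for members in clusters.values()
            if sum(d[4] for d in members) > strength_threshold]


# Three nearby detections over consecutive frames and one isolated weak one.
detections = [
    (10, 10, 20, 0, 0.9),
    (12, 11, 20, 1, 0.8),
    (11, 9, 21, 2, 0.7),
    (100, 50, 20, 0, 0.3),
]
faces = cluster_faces(detections, distance_threshold=5.0, strength_threshold=1.0)
```

The pairwise comparison is quadratic in the number of detections, which is acceptable for a sketch. In this example the isolated detection forms its own cluster and is rejected because its strength is below the threshold.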

Tracking across frames

Applying the detection system to every frame of the sequence as a means of tracking is enticing. However, the detectors can find only frontal, upright faces (i.e., they cannot detect people who are not looking straight into the camera), so doing this would not really help. Instead, we use a tracking algorithm that, once initialized with the detected region, keeps track of it through the video even as the face rotates or turns.

The algorithm we use is the Continuously-Adaptive Mean-Shift or CamShift algorithm. An implementation of this can also be found in Intel's OpenCV library.
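
To give a feel for the idea behind CamShift, here is a much simplified pure-Python sketch. It is not OpenCV's implementation and not the code we used. It assumes a 2D map of per-pixel "face colour" probabilities in [0, 1] is already available (in practice such a map usually comes from a colour histogram back-projection), it ignores orientation, and it uses a square window. The function names and the size rule (side proportional to the square root of the probability mass) are assumptions made for illustration.

```python
from math import sqrt


def mean_shift_step(prob, window):
    """Recentre the window on the centroid of the probability mass inside it.

    prob: 2D list of probabilities, prob[row][col].
    window: (x, y, w, h) with (x, y) the top-left pixel.
    Returns the moved window and the mass (zeroth moment) it contained.
    """
    x, y, w, h = window
    m00 = m10 = m01 = 0.0
    for row in range(max(y, 0), min(y + h, len(prob))):
        for col in range(max(x, 0), min(x + w, len(prob[0]))):
            p = prob[row][col]
            m00 += p
            m10 += p * col
            m01 += p * row
    if m00 == 0.0:
        return window, 0.0  # nothing to follow, leave the window in place
    cx, cy = m10 / m00, m01 / m00
    new_x = int(round(cx - (w - 1) / 2))
    new_y = int(round(cy - (h - 1) / 2))
    return (new_x, new_y, w, h), m00


def camshift_sketch(prob, window, max_iter=20):
    """Iterate mean shift until the window stops moving, then adapt its size."""
    m00 = 0.0
    for _ in range(max_iter):
        new_window, m00 = mean_shift_step(prob, window)
        if new_window == window:
            break
        window = new_window
    if m00 > 0.0:
        x, y, w, h = window
        cx, cy = x + (w - 1) / 2, y + (h - 1) / 2
        side = max(1, int(round(2 * sqrt(m00))))  # assumed size heuristic
        window = (int(round(cx - (side - 1) / 2)),
                  int(round(cy - (side - 1) / 2)), side, side)
    return window


# A 20x20 map with a 4x4 blob of high probability, and a window that
# initially overlaps only one corner of the blob.
prob = [[0.0] * 20 for _ in range(20)]
for row in range(10, 14):
    for col in range(12, 16):
        prob[row][col] = 1.0
tracked = camshift_sketch(prob, (6, 4, 8, 8))
```

For video, the window returned for one frame would be used as the starting window for the next frame, which is what lets the tracker follow a face after the detector has initialized it.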

3D Trajectory Generation

Detection gives us the initial face; tracking gives us the (x, y, scale) coordinates of the detected face as it moves through the image, where (x, y) are the coordinates of the center of the rectangular region being tracked and scale is the size of the rectangular box. We then convert these coordinates into real-world (x, y, z) coordinates using simple transformations and some calibration information from the camera. This calibration information includes factors such as the true distance of a face from the camera together with the scale of its detection, the aspect ratio of the camera image, etc.
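
One plausible form of such a transformation is sketched below under a simple pinhole-camera assumption: the tracked box size is taken to be inversely proportional to the distance from the camera, so a single reference measurement (a face at a known distance with a known box size) fixes the depth. The class, its field names, the focal length in pixels and the square-pixel assumption are all hypothetical; our actual calibration may have differed.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    ref_distance: float  # true camera-to-face distance for the reference shot
    ref_scale: float     # tracked box size (pixels) at that distance
    focal_px: float      # focal length in pixels (assumed known)
    image_width: int
    image_height: int


def to_world(x, y, scale, cal):
    """Convert a tracked (x, y, scale) into camera-centred (X, Y, Z).

    Assumes square pixels and the principal point at the image centre.
    """
    # Apparent size falls off as 1/distance, so depth follows from the ratio.
    z = cal.ref_distance * cal.ref_scale / scale
    cx, cy = cal.image_width / 2, cal.image_height / 2
    world_x = (x - cx) * z / cal.focal_px
    world_y = (cy - y) * z / cal.focal_px  # image y grows downward
    return world_x, world_y, z


cal = Calibration(ref_distance=1.0, ref_scale=100.0, focal_px=500.0,
                  image_width=640, image_height=480)
# One (x, y, scale) triple per frame, as produced by the tracker.
track = [(320, 240, 50.0), (420, 240, 100.0)]
trajectory = [to_world(x, y, s, cal) for (x, y, s) in track]
```

A face whose box is half the reference size is placed at twice the reference distance, which is how the scale reported by the tracker turns into the z coordinate of the trajectory.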

References & Links

More information on face detection:

Face Detection Homepage
Face Group at Carnegie-Mellon University
CiteSeer - A pretty good source of many papers in various fields. Search for "Face Detection" to get some of the latest works in the field.

Of course, Google and other search engines will always be a good source of information!

Results

Some sample results:

Four of four faces detected by our neural network detector.


The images above show three of the many frames in the video. The person detected in the first frame was tracked through the others, and the generated trajectory is shown in the figure on the left. The pyramid in the figure shows the field of view of the camera. The commentary generated was "Actor 0 moves from left to right. Actor 0 moves from right to near the camera".

Resources

Some of the tools we used:

Intel Image Processing Library (IPL) - A high-performance library for image processing. This seems to have been taken off the Intel website now and replaced with Intel Integrated Performance Primitives (IPP).
Intel Math Kernel Library (MKL) - Used for some matrix functions. Information on this can be found at http://www.intel.com/software/products/perflib/
Intel's Open Source Computer Vision Library (OpenCV) - http://www.sourceforge.net/projects/opencvlibrary
annie - A neural network library that I developed, used for the base face detector. Free to download and open source.

More Details

This single web page doesn't do justice to the amount of work that went into creating this system or to how things were achieved. I will be putting up our final report on this work soon, so you can read that for the finer details.

Last modified: Mon Apr 21 01:48:12 India Standard Time 2003