My research areas are computer vision and pattern recognition. Specifically, my recent focus is computer recognition of human action and
interaction in video imagery.
Video has become an increasingly useful medium of communication in everyday life, and the volume of video archives is growing rapidly.
Unfortunately, our ability to computationally model and process video information lags far behind. Several key problems have slowed our
advance in automated video processing: (1) our lack of any robust method of tracking human activity in video, (2) our lack of a
human-activity model that dynamically learns from video, and (3) our lack of an efficient method to represent video events at a semantic
level. My research addresses the above and related problem areas, as described in my recent projects below.
Tracking objects and persons in video is a crucial step in automated video processing. Detailed video computation requires the
segmentation of the body into parts and then the tracking of those multiple body parts. I have worked on simultaneously
tracking the multiple body parts of interacting humans in color videos. Moreover, in the process of modeling human activity from video
images, many ambiguities arise from occlusions and shadows. To resolve such ambiguities, I have developed a method using optimization
and statistical inference techniques.
Human activity is goal-oriented behavior in specific contexts. Understanding the intention and the visual attention of a person provides
useful contextual information about the person's behavior. A person's intention may be signaled by the direction of visual attention, and
visual attention in turn may be indexed by head orientation. I have developed a method that automatically detects multiple heads in 3D
space and estimates the individual head orientations in that space. From each head orientation, the direction of visual attention is
estimated, and the intention of the person is inferred as an explanation of the person's behavior.
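As a rough illustration of the last step only (going from an estimated head orientation to a direction of visual attention), the sketch below converts yaw and pitch angles into a unit direction vector and picks the candidate target that lies closest to that direction. It is not the method used in this project: the coordinate convention, the 20-degree tolerance, and the names `gaze_direction` and `attended_target` are all assumptions made for the example.

```python
import math


def gaze_direction(yaw_deg, pitch_deg):
    """Unit vector for a head orientation given as yaw and pitch in degrees.

    Assumed convention: x points right, y up, z forward;
    yaw = 0 and pitch = 0 means the head faces along +z.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def attended_target(head_pos, yaw_deg, pitch_deg, targets, max_angle_deg=20.0):
    """Return the name of the target nearest the gaze direction, or None.

    `targets` maps a name to a 3D position. A target counts as attended
    only if it lies within `max_angle_deg` of the gaze direction.
    """
    gaze = gaze_direction(yaw_deg, pitch_deg)
    best_name, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        to_target = tuple(p - h for p, h in zip(pos, head_pos))
        norm = math.sqrt(sum(c * c for c in to_target))
        if norm == 0.0:
            continue  # target coincides with the head position
        cos_a = sum(g * t for g, t in zip(gaze, to_target)) / norm
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


# Hypothetical scene: one head at the origin and two candidate targets.
scene = {"door": (2.0, 0.0, 0.0), "person B": (0.0, 0.0, 2.0)}
looking_at = attended_target((0.0, 0.0, 0.0), 90.0, 0.0, scene)
```

Here a head turned 90 degrees to the right is assigned the target "door"; a head turned halfway between the two targets is assigned no target, because neither falls inside the tolerance cone.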
Patterns of human action and interaction are very diverse, including positive behaviors such as "hugging" and "hand shaking" and
negative behaviors such as "punching" and "kicking." A dynamic-learning model of human activity depicted in video data is essential to
building an adaptive computer recognition system for human actions and interactions. To develop such a system, I have worked on
statistical learning methods using a hierarchical Bayesian network for recognizing human activity patterns. The current system can learn
and recognize human interaction patterns between two persons and can distinguish among positive behaviors such as "hand shaking,"
"standing hand-in-hand," and "hugging"; neutral behaviors such as "approaching," "departing," and "pointing"; and negative behaviors
such as "pushing," "punching," and "kicking."
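To make the idea of learning interaction categories from observed features concrete, here is a deliberately simplified sketch. It uses a naive Bayes classifier, which is the simplest kind of Bayesian network (one class node with independent feature nodes), rather than the hierarchical network described above. The feature names (`contact`, `speed`), their values, and the toy training observations are invented for illustration and do not come from the actual system.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.feature_values = defaultdict(set)      # feature -> values seen in training

    def fit(self, samples):
        """Learn counts from (features, label) pairs."""
        for features, label in samples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_posterior(self, features):
        """Unnormalized log posterior for every known label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[(label, name)][value]
                k = len(self.feature_values[name])
                score += math.log((seen + self.alpha) / (count + self.alpha * k))
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_posterior(features)
        return max(scores, key=scores.get)


# Invented toy observations: where the bodies touch and how fast the limb moves.
training = [
    ({"contact": "hand", "speed": "slow"}, "hand shaking"),
    ({"contact": "hand", "speed": "slow"}, "hand shaking"),
    ({"contact": "torso", "speed": "slow"}, "hugging"),
    ({"contact": "torso", "speed": "slow"}, "hugging"),
    ({"contact": "torso", "speed": "fast"}, "punching"),
    ({"contact": "torso", "speed": "fast"}, "punching"),
]
model = NaiveBayes().fit(training)
label = model.predict({"contact": "torso", "speed": "fast"})
```

A hierarchical network would add intermediate nodes (for example, body-part poses feeding single-person actions, which in turn feed two-person interactions); the counting-and-scoring pattern shown here would then apply at each layer.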
One of the goals in video computation is to provide a user-friendly interface between a computer system and ordinary users. A syntactic
representation, such as a natural-language-based verbal description, is desirable for an efficient human-computer interface. I have worked on
developing an event-description methodology that provides syntactic event structure and event semantics. My approach is to represent
human action as an intentional operation made toward a target and to represent human interaction as a pair of individual actions. In
this framework, human action is automatically represented as a verbal description following "subject + verb + object" syntax,
and human interaction is represented in terms of "cause + effect" semantics between the actions.
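The representation described above can be sketched as a pair of small data structures: one for an action in "subject + verb + object" form and one for an interaction that links two actions as cause and effect. The class names, field names, and the wording of the generated sentence are assumptions for this example, not the notation used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One person's action in "subject + verb + object" form."""
    subject: str
    verb: str
    obj: str

    def describe(self) -> str:
        return f"{self.subject} {self.verb} {self.obj}"


@dataclass(frozen=True)
class Interaction:
    """A pair of actions related by "cause + effect" semantics."""
    cause: Action
    effect: Action

    def describe(self) -> str:
        return f"{self.cause.describe()}, and as a result {self.effect.describe()}."


# Hypothetical example of a "pushing" interaction between two persons.
push = Action("person A", "pushes", "person B")
retreat = Action("person B", "steps back from", "person A")
sentence = Interaction(cause=push, effect=retreat).describe()
```

With these inputs, `sentence` is "person A pushes person B, and as a result person B steps back from person A."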
My broad research interests include the following:
1. Image & Video Processing
Image segmentation. Tracking of deformable objects in video. Color processing.
2. Pattern Recognition and Computer Vision
Statistical and structural pattern recognition frameworks.
Graphical models. Neural networks.
Human body modeling. Motion tracking and understanding. Human activity recognition.
3. Human Vision in the Context of Sensory Neuroscience and Psychophysics
Perceptual organization in biological vision.
Psychophysics of visual search and eye movement.
4. Computational Modeling of Visual Processing
Computational modeling of biological visual processing.
Comparative vision in evolution and its application to artificial vision.
-
Appearance-based method:
Simultaneous Segmentation and Tracking of Multiple Deformable Body Parts
Each image links to the corresponding video clip, which is compressed with the Microsoft MPEG-4 v2 encoder at 15 fps.
-
Model-based method:
Model-based Human Motion Capture
A 3D cylinder model is projected onto the 2D image plane and fitted to monocular video sequences.
-
Model-based method:
Video Retrieval of Human Interactions using Model-based Motion Tracking and Multi-Layer Finite State Automata
Poster (PDF) (big: 6.9M)
-
Syntactic pattern-recognition method:
Event Semantics for High-level Understanding of Two-Person Interactions
Poster (PDF)
-
Head Detection and Pose Estimation in 3D Space
View-based detection and estimation of multiple heads in grayscale video imagery
(Preliminary study)
Poster (PDF)
Discrimination Enhancement by Perceptual Organization
Psychological Disturbance caused by Letters in Double Image