Research of Sangho Park

Research Description

My research areas are computer vision and pattern recognition. Specifically, my recent focus is computer recognition of human action and interaction in video imagery. Video has become an increasingly useful medium of communication in everyday life, and the volume of video archives is growing rapidly. Unfortunately, our ability to computationally model and process video information lags far behind. Three key problems have slowed progress in automated video processing: (1) the lack of a robust method for tracking human activity in video, (2) the lack of a human-activity model that learns dynamically from video, and (3) the lack of an efficient method for representing video events at a semantic level. My research addresses these and related problems, as described in the recent projects below.

Recent Research Projects

Tracking Human Activity

Tracking objects and persons in video is a crucial step in automated video processing. Detailed video computation requires segmenting the body into parts and then tracking those parts. I have worked on simultaneously tracking multiple body parts of interacting humans in color video. In modeling human activity from video images, many ambiguities arise from occlusions and shadows. To resolve such ambiguities, I have developed a method using optimization and statistical inference techniques.

Pose Estimation Using Visual Context

Human activity is goal-oriented behavior in specific contexts. Understanding a person's intention and visual attention provides useful contextual information about that person's behavior. Intention may be signaled by the direction of visual attention, and visual attention in turn may be indexed by head orientation. I have developed a method that automatically detects multiple heads in 3D space and estimates the orientation of each head in that space. From each head orientation, the direction of visual attention is estimated, and the person's intention is inferred as an explanation of the person's behavior.
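The step from head orientation to direction of visual attention can be illustrated with a minimal sketch. This is not the method developed in the project; it assumes the head orientation is already available as yaw and pitch angles, uses a hypothetical coordinate convention (x right, y up, z forward), and picks the attended target with a hypothetical 15-degree cone threshold.

```python
import math


def gaze_direction(yaw_deg, pitch_deg):
    """Unit vector for a head orientation given as yaw and pitch in degrees.

    Assumed convention: x points right, y up, z forward; yaw 0 looks along +z.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def angle_to_target(head_pos, direction, target_pos):
    """Angle in degrees between the gaze direction and the head-to-target line."""
    to_target = tuple(t - h for t, h in zip(target_pos, head_pos))
    norm = math.sqrt(sum(c * c for c in to_target))
    if norm == 0.0:
        raise ValueError("target coincides with head position")
    cos_angle = sum(d * c for d, c in zip(direction, to_target)) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def attended_target(head_pos, yaw_deg, pitch_deg, targets, max_angle_deg=15.0):
    """Name of the target nearest the gaze direction, or None if none lies
    within max_angle_deg (a hypothetical threshold) of it."""
    direction = gaze_direction(yaw_deg, pitch_deg)
    best_name, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        angle = angle_to_target(head_pos, direction, pos)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


# Hypothetical scene: a head at the origin and two candidate targets.
scene = {"door": (5.0, 0.0, 0.0), "desk": (0.0, 0.0, 5.0)}
print(attended_target((0.0, 0.0, 0.0), 90.0, 0.0, scene))
```

The angular cone is only one simple way to associate a gaze direction with a target; a real system would also need to account for estimation uncertainty in the head pose.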

Dynamic-Learning Models of Human Actions and Interactions

Patterns of human action and interaction are very diverse, including positive behaviors such as "hugging" and "hand shaking" and negative behaviors such as "punching" and "kicking." A dynamic-learning model of human activity in video is essential to building an adaptive computer recognition system for human actions and interactions. To develop such a system, I have worked on statistical learning methods that use a hierarchical Bayesian network to recognize human activity patterns. The current system can learn and recognize interaction patterns between two persons and can distinguish among positive behaviors such as "hand shaking," "standing hand-in-hand," and "hugging"; neutral behaviors such as "approaching," "departing," and "pointing"; and negative behaviors such as "pushing," "punching," and "kicking."
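To give a flavor of Bayesian recognition of interaction types, here is a minimal sketch. It swaps in a naive Bayes classifier (the simplest two-layer Bayesian network) for the hierarchical network described above, so it is an illustration of the general idea only. The body-part events, the three classes chosen, and every probability value are hypothetical and are not taken from the actual system.

```python
import math

# Hypothetical uniform prior over three of the interaction classes.
PRIOR = {"hand shaking": 1 / 3, "pushing": 1 / 3, "kicking": 1 / 3}

# Hypothetical likelihoods P(body-part event | interaction class).
LIKELIHOOD = {
    "hand shaking": {
        "arm": {"extended": 0.8, "rest": 0.2},
        "leg": {"raised": 0.05, "rest": 0.95},
    },
    "pushing": {
        "arm": {"extended": 0.9, "rest": 0.1},
        "leg": {"raised": 0.1, "rest": 0.9},
    },
    "kicking": {
        "arm": {"extended": 0.2, "rest": 0.8},
        "leg": {"raised": 0.9, "rest": 0.1},
    },
}


def posterior(observed):
    """Posterior over interaction classes given observed body-part events.

    `observed` maps a body part ("arm", "leg") to its observed event.
    Body parts are treated as conditionally independent given the class.
    """
    log_scores = {}
    for label, prior in PRIOR.items():
        log_p = math.log(prior)
        for part, event in observed.items():
            log_p += math.log(LIKELIHOOD[label][part][event])
        log_scores[label] = log_p
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


result = posterior({"arm": "rest", "leg": "raised"})
print(max(result, key=result.get))
```

In a learning system the likelihood tables would be estimated from labeled video rather than written by hand.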

Representation of Event Semantics

One of the goals of video computation is to provide a user-friendly interface between a computer system and ordinary users. A syntactic representation, such as a natural-language verbal description, is desirable for an efficient human-computer interface. I have worked on developing an event-description methodology that provides syntactic event structure and event semantics. My approach is to represent human action as an intentional operation made toward a target and to represent human interaction as a pair of individual actions. In this framework, human action is automatically described verbally according to "subject + verb + object" syntax, and human interaction is represented in terms of "cause + effect" semantics between the actions.
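The "subject + verb + object" and "cause + effect" structure can be sketched as a small data model. This is only an illustration under assumptions: the class names `Action` and `Interaction`, the field names, and the sentence template are hypothetical and do not come from the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One person's action: an operation directed toward a target."""
    subject: str
    verb: str
    obj: str

    def describe(self) -> str:
        # "subject + verb + object" syntax
        return f"{self.subject} {self.verb} {self.obj}"


@dataclass(frozen=True)
class Interaction:
    """A pair of individual actions linked by cause and effect."""
    cause: Action
    effect: Action

    def describe(self) -> str:
        # Hypothetical sentence template for "cause + effect" semantics.
        return f"{self.cause.describe()}, and as a result {self.effect.describe()}."


push = Action("person A", "pushes", "person B")
retreat = Action("person B", "moves away from", "person A")
print(Interaction(push, retreat).describe())
```

Keeping each action as a structured triple, rather than as free text, means the same event can be rendered as a sentence or matched against a query.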

Broader Research Interests

My broad research interests include the following:

1. Image & Video Processing

Image segmentation. Tracking of deformable objects in video. Color processing.

2. Pattern Recognition and Computer Vision

Statistical and structural pattern recognition frameworks.
Graphical models. Neural networks.
Human body modeling. Motion tracking and understanding. Human activity recognition.

3. Human Vision in the Context of Sensory Neuroscience and Psychophysics

Perceptual organization in biological vision.
Psychophysics of visual search and eye movement.

4. Computational Modeling of Visual Processing

Computational modeling of biological visual processing.
Comparative vision in evolution and its application to artificial vision.

Demos & Presentations

  1. Appearance-based method:
    Simultaneous Segmentation and Tracking of Multiple Deformable Body Parts

    Each image links to the corresponding video clip, which is compressed with the Microsoft MPEG-4 v2 encoder at 15 fps.



  2. Model-based method:
    Model-based Human Motion Capture

    A 3D cylinder model is projected onto the 2D image plane and fitted to monocular video sequences.


  3. Model-based method:
    Video Retrieval of Human Interactions using Model-based Motion Tracking and Multi-Layer Finite State Automata

    Poster (PDF, large: 6.9 MB)

  4. Syntactic pattern-recognition method:
    Event Semantics for High-level Understanding of Two-Person Interactions

    Poster (PDF)

  5. Head Detection and Pose Estimation in 3D Space

    View-based detection and estimation of multiple heads in grayscale video imagery
    (Preliminary study)
    Poster (PDF)
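The projection step mentioned in the model-based motion capture demo above (a 3D cylinder projected onto the 2D image plane) can be sketched as follows. This is a minimal illustration, not the demo's code: it assumes an ideal pinhole camera, and the function names, the sampling scheme, the focal length, and the placement of the cylinder are all hypothetical. The fitting of the projected model to the video is not shown.

```python
import math


def cylinder_points(radius, height, n_around=8, n_along=3):
    """Sample surface points of a cylinder whose axis is the y-axis,
    centered at the origin of its own model coordinates."""
    if n_along < 2:
        raise ValueError("n_along must be at least 2")
    points = []
    for i in range(n_along):
        y = -height / 2 + height * i / (n_along - 1)
        for j in range(n_around):
            theta = 2 * math.pi * j / n_around
            points.append((radius * math.cos(theta), y, radius * math.sin(theta)))
    return points


def project(point, focal_length, center=(0.0, 0.0, 5.0)):
    """Pinhole projection of a model point onto the image plane.

    The model is translated by `center` into camera coordinates
    (camera at the origin, looking along +z); no rotation is applied.
    """
    x, y, z = (p + c for p, c in zip(point, center))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / z, focal_length * y / z)


# Hypothetical limb segment: radius 0.1, length 0.5, 5 units from the camera.
limb = cylinder_points(radius=0.1, height=0.5)
outline_2d = [project(p, focal_length=500.0) for p in limb]
print(len(outline_2d))
```

A full body model would chain several such cylinders with joint rotations before projecting them.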

Previous Projects

Discrimination Enhancement by Perceptual Organization


Psychological Disturbance caused by Letters in Double Image

