Robust and Efficient Classification for Videos captured in the Wild
|Author:||Behrooz Mahasseni, Oregon State University|
|Date:||October 20, 2016|
Understanding videos of human actions, recorded in an uncontrolled setting, is an open problem in computer vision. Video surveillance, content retrieval, autonomous driving and sports analysis are examples of practical applications. We focus our research on efficiency and robustness of action recognition in real-world videos.
Our initial work aimed to advance traditional approaches that use hand-crafted video features. Specifically, to improve robustness, we relaxed the viewpoint dependence of existing methods and developed a multitask learning approach for view-invariant activity recognition. To improve efficiency, we formulated an approximate policy iteration approach for budgeted semantic video segmentation.
Next, inspired by the successful application of deep learning in computer vision, we present a multimodal deep learning framework that improves the robustness of activity recognition via a deep fusion of multimodal data, where diverse sensors (e.g., video camera, 3D skeleton, Kinect camera, audio recordings) capture important cues about the ongoing events. To fuse the multimodal data, we define a new hybrid method to regularize LSTMs across different sources of data.
Finally, we extend our initial work on efficient semantic video segmentation to develop a deep long short-term memory (LSTM) policy iteration for cost-efficient semantic video segmentation.
We believe these research projects advance computer vision because the developed approaches can: 1) meet the stringent runtime requirements of many applications, and 2) work in less sanitized settings with small datasets or data coming from heterogeneous sources.
Behrooz Mahasseni is a fifth-year Ph.D. student at Oregon State University. He works under the supervision of Prof. Sinisa Todorovic, and his main research area is video analysis and representation. He started his Ph.D. working on view-invariant activity recognition and feature space learning for videos recorded from different viewpoints. Since 2014, his main focus has been understanding video content using deep learning techniques.
His latest work is on regularizing long short-term memory for action classification in uncontrolled settings. During his summer 2016 internship at NVIDIA Research, he worked on temporal video segmentation using deep temporal attention models.