Is it possible to implement artificial intelligence (AI) in metallic devices? This was the question that stayed with me after I took a class called "Computer and Mind." It led me to another question: could an AI perceive the physical world fundamentally through the criterion of survival? I was also fascinated by the idea, from the book "On Intelligence," that a single unit of computation in the brain, the neuron, together with its connectivity, can process many different types of input data. That fascination, combined with my interest in visual data, led me to study computer vision as a first step toward building artificial intelligence. My primary research goal is to understand the underlying nature of input data and to predict meaningful structures using efficient representations.
My first exposure to computer vision was during my Master's studies. I extensively analyzed the effect of bounding-box-based object representations, which are widely used in object tracking and object detection because of their simplicity. In particular, I focused on handling the ambiguity caused by the mismatch between an object's shape and its bounding box. In my thesis, I proposed appearance models that accurately discriminate the object region from the background. I also participated in a study, presented at ECCV 2014, that used two bounding boxes to avoid relying on information from the ambiguous region around a conventional single bounding box. After graduation, I joined the Korea Institute of Science and Technology (KIST) as a research staff member and worked on scene flow estimation from a pair of RGB-D frames. Dense correspondence estimation is an essential problem in modeling dynamic 3D objects such as the human body. With an RGB-D camera, I had access to both image and depth data. I sought to generalize total variation (TV), a widely used motion prior that is robust near motion boundaries. Employing total generalized variation (TGV) made the estimator favor more natural, piecewise-smooth solutions instead of the piecewise-constant ones that TV tends to produce. Furthermore, I adopted a deformation graph, which efficiently leverages surface geometry, to estimate motion with large displacements.
Through these research projects, I realized that efficient representation is critically important in many computer vision problems, and I decided to develop methods that combine multiple structures with informative data. For instance, superpixels, which I used as segments in my thesis, have irregular shapes and carry less information than patches. Although they reduced the computational cost, they lack distinctive information such as contours and are not repeatable. As a result, additional computation was required to handle them. In 3D motion estimation, combining two different types of information, texture and geometry, was challenging because each was represented and converted into an energy term independently even though the two are correlated. When the energy terms favor conflicting solutions, there is no guarantee that minimizing a simple weighted sum of them yields the true global solution. Furthermore, we relied heavily on motion priors that are not data-dependent; they encode only our own assumptions instead of drawing on the abundant data available on the web.