Previous Work

Recent developments in computer vision have opened the door to more sophisticated algorithms. We wish to pursue the implications of these new approaches and generalize their methods further, allowing for much looser assumptions and more robust applications.

Feature Tracking and 3D Reconstruction

Prior to these advances, we were restricted to understanding the relationships between images through trackable features within them. Although feature tracking is relatively quick and robust, it is limited in nature. The Scale-Invariant Feature Transform (SIFT) is a commonly used method that reduces an image to a set of highly trackable features and finds "pattern" matches between images. It describes each feature in a way that is unaffected by rotation and scaling. The method works well on video sequences, where each frame is very similar to the frames before and after it, and it is also useful for finding a known object in a scene.
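The matching step of a SIFT-style pipeline can be illustrated with a small sketch. The snippet below is not SIFT itself (which also detects keypoints and builds orientation histograms); it only shows the nearest-neighbour matching with Lowe's ratio test that such pipelines typically use, on toy hand-made descriptors:

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match descriptors from image A to image B using Lowe's ratio test.

    For each descriptor in A, find its two nearest neighbours in B
    (Euclidean distance) and keep the match only when the closest
    neighbour is clearly better than the second closest.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy 4-dimensional "descriptors": the first two rows of A have
# near-identical counterparts in B; the third is ambiguous because
# two descriptors in B are equally close to it.
a = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0]])
b = np.array([[0.9, 0.0, 0.0, 0.0],
              [0.0, 1.1, 0.0, 0.0],
              [0.5, 0.4, 0.0, 0.0],
              [0.4, 0.5, 0.0, 0.0]])
print(ratio_test_match(a, b))  # → [(0, 0), (1, 1)]; the ambiguous third is rejected
```

Rejecting ambiguous matches this way is what makes the matching robust: a descriptor that looks similar to many places in the other image is dropped rather than matched incorrectly.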

Building on the feature matches produced by SIFT, Structure from Motion (SFM) algorithms can infer camera parameters, such as relative movement, zoom, and focal point, and then build a rough (sparse) point cloud describing the structure of a rigid scene. The point cloud gives a general 3D shape for the raw input images. The approach works well when the images are taken in a relatively small vicinity and look at the same subject, as in Photo Tourism.
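One core step of SFM, once camera poses are known, is triangulating matched features into 3D points. The sketch below shows linear (DLT) triangulation with two hypothetical calibrated cameras; a full SFM system would first estimate the camera matrices themselves from the matches:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its
    projections x1, x2 in two cameras with known 3x4 projection
    matrices P1, P2. Each image point gives two linear constraints
    on the homogeneous 3D point X; we stack them and solve.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest
    # singular value, then de-homogenized.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X through camera matrix P to 2D."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)  # → approximately [0.5, 0.2, 4.0]
```

Repeating this for every matched feature yields the sparse point cloud described above; the sparsity follows directly from the fact that only trackable features are triangulated.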

The problems with these techniques lie in their assumptions. SIFT requires the images to remain mostly unchanged, or at least to overlap substantially; large differences in the scene cause improper matching, and manual segmentation is required to find objects. SFM works only on relatively rigid scenes and bodies, so non-rigid motion (cloth, facial movement, etc.) cannot be accurately reconstructed. Additionally, SFM produces only a very sparse point cloud: assumptions must be made about the object in order to reproduce a solid structure, and too often these assumptions yield poor general results. Finally, these techniques require a feature-rich scene. An object with very few trackable features will not be reconstructed properly, and shadows, transparencies, and specular highlights (reflections in general) cause problems.

Surface Reconstruction

Unwrap Mosaics (broken link on Microsoft Research) takes a different approach to retrieving information from, and modifying, an image sequence. It produces a mapping from a video sequence to a single image in which an object's 3D surface texture is flattened into a 2D surface. As an analogy, it produces a world map from a video of a spinning globe. The mapping is reversible, so the user may edit the mosaic and re-project the edits back onto the video sequence. The unwrapping process is largely automated, requiring user intervention only when the mapping is visibly wrong (e.g., a person appears to have an extra ear). Additionally, unwrap mosaics works quite well on non-rigid, deforming objects.

Several problems limit this technique's usefulness in the general case. First, the user must segment the object from the scene before unwrapping; there are effective tools that segment a video sequence quickly, but it is one more step in the process. Second, the method assumes the object has a cylindrical topology. As long as the object has no overlapping protrusions or appendages and the camera circles the object in a plane (the object rotates about a single axis relative to the camera), unwrap mosaics does a great job; once this no longer holds, the flattened image falls apart beyond repair. Third, the method only works well when there is smooth continuity between adjacent frames: motion blur and repeating texture patterns cause the algorithm to produce poor results. Finally, although unwrap mosaics may aid the 3D reconstruction of a scene, it does not allow the user to modify the 3D structure of the data, such as adding or removing objects or editing the geometry of the surface.

Image Segmentation

Image segmentation is a well-explored area of image processing. Many different methods have been developed, each with varying effectiveness.

Some recent research on the gradient domain of an image has implications that could greatly assist edge detection and patchwork building. GradientShop approaches image filters from a new direction, incorporating the gradients behind the pixels into the filter. This allows artists to manipulate the low-order regions of an image without affecting the high-order regions, or to exaggerate the edges within an image without increasing overall noise.
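The core gradient-domain idea can be shown with a much-simplified 1D sketch. This is not GradientShop's actual method (which solves a least-squares optimization over 2D image gradients); it only illustrates the principle of editing gradients rather than pixels, here by amplifying large gradients (edges) while leaving small ones (smooth regions) untouched:

```python
import numpy as np

def exaggerate_edges(signal, edge_gain=2.0, edge_threshold=0.5):
    """Toy 1D gradient-domain edit: amplify gradients above a
    threshold (edges), keep the rest unchanged, then integrate the
    modified gradients back into a signal.
    """
    grad = np.diff(signal)
    # Edit in the gradient domain: only large jumps are scaled,
    # so smooth regions keep their original appearance (and noise level).
    edited = np.where(np.abs(grad) > edge_threshold, grad * edge_gain, grad)
    # Re-integrate via cumulative sum, anchored at the first sample.
    return np.concatenate([[signal[0]], signal[0] + np.cumsum(edited)])

# A noisy step edge: the 0.1 jitters stay put, the 0.9 jump doubles.
step = np.array([0.0, 0.0, 0.1, 1.0, 1.1, 1.1])
print(exaggerate_edges(step))  # → [0.  0.  0.1 1.9 2.  2. ]
```

Working on gradients and re-integrating, rather than scaling pixels directly, is what lets edge strength change without amplifying the low-amplitude variation elsewhere in the signal.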
