Communications of the ACM

Technical opinion

Video Image Stabilization and Registration Technology


Video Image Stabilization and Registration (VISAR) is a software algorithm that corrects for zoom, tilt, and jitter in flawed digital video. It allows users to dramatically improve videotape sequences and still images extracted from moving video (see footnote 1). It works by taking the numeric-coded grid of pixels in one image from the video and matching it to subsequent ones. The result is a steadier, clearer, and more coherent series of images. VISAR will enable homeowners, law enforcement, and others to enhance videotape sequences on desktop computers.

David Hathaway and Paul Meyer of the National Space Science and Technology Center developed VISAR at NASA's Marshall Space Flight Center. Hathaway, a solar physicist, was interested in a way to better study the violent explosions of the Sun. Meyer, a meteorologist and computer scientist, needed to more precisely track weather phenomena for launch safety and effectiveness. The tool is a natural fit with their research, since telescopes are traditionally jittery instruments. For the inventors, each successive case has brought new technical challenges, opportunities to improve the technology, and satisfaction in contributing to society.

The development of VISAR began years before 1996, when the Southeast Bomb Task Force unit of the FBI requested help with video evidence of the bombing during the 1996 Olympic Games in Atlanta. The primary piece of evidence consisted of over 400 frames (approximately 13 seconds) of video captured using a handheld camcorder. The images were filmed at night and were too dark, fuzzy, and unstable to interpret. Investigators found that in this case, VISAR clearly outperformed the alternatives. Since the bombing, the software has generated digital video that has been the primary evidence in dozens of significant criminal cases involving murder, kidnapping, and robbery.

It is clear that this information technology will have a tremendous effect on quality of life through advances in science [2]. VISAR was featured on the January 26, 2002 edition of the television program "America's Most Wanted." In that program, VISAR was used to clarify the build, face, and height of the murderer of a convenience store clerk. At the conclusion of the first war in Iraq, VISAR was used by ABC News to compare video of Saddam Hussein with images known to show the dictator: the network wanted to know if it was Hussein or a body double in the telecast [3]. Using Video Analyst™, it took 90 minutes to determine, with 99% confidence, that it was Hussein. Table 1 lists VISAR's awards and indicates its promise as a highly marketable government-developed software product.



How VISAR Works

Before VISAR can be used, the candidate video sequence must satisfy several fundamental requirements. First, at least two images are needed in which the subject is not significantly different in presentation. For instance, if only two images are available, one of the front of a face and the other of the side, VISAR cannot help. Second, a minimal amount of stabilization is required before further refinement can be done. Experience has demonstrated that the frames-per-second rate is not important.

Although the point of the VISAR software is to automate digital video correction, a meticulous and skilled system operator is essential to project success. The user must go through a trial-and-error problem-solving process that requires patience and a scientific mind-set, because in most cases the best results come from iterating through several steps multiple times. Hathaway is not yet comfortable completely delegating the most critical projects to others [1].

VISAR works by combining images in a video sequence, thus leveraging the compilation of available information. VISAR becomes more powerful as more information becomes available, because it averages out noise. The output of VISAR can be either a single image or a video sequence. When used to produce another video (with fewer frames per second than the original), VISAR works very well in controlling horizontal and vertical jitter. It also addresses two effects that preexisting video refinement technologies did not: erratic rotation and zoom.
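The noise-averaging claim can be illustrated with a small simulation. This is only a sketch under stated assumptions, not VISAR's implementation: it assumes the frames are already perfectly registered, that each pixel carries independent Gaussian noise, and that frames are combined by a plain mean. The scene values, noise level, and function names (`make_noisy_frames`, `co_add`, `rms_error`) are hypothetical.

```python
import random

def make_noisy_frames(true_pixels, n_frames, noise_sigma, seed=0):
    """Simulate n_frames registered copies of one scene, each with
    independent Gaussian noise added to every pixel (an assumption)."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, noise_sigma) for p in true_pixels]
            for _ in range(n_frames)]

def co_add(frames):
    """Average the frames pixel by pixel (a plain mean, assumed here)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true scene."""
    squared = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return (squared / len(truth)) ** 0.5

# Hypothetical dim scene: 2,000 pixel values between 10 and 20.
scene_rng = random.Random(1)
truth = [scene_rng.uniform(10.0, 20.0) for _ in range(2000)]
frames = make_noisy_frames(truth, n_frames=50, noise_sigma=5.0)

single_error = rms_error(frames[0], truth)
stacked_error = rms_error(co_add(frames), truth)
print(f"single frame RMS error:     {single_error:.2f}")
print(f"50-frame average RMS error: {stacked_error:.2f}")
```

Under these assumptions the error of the averaged image shrinks roughly with the square root of the number of frames, which is why more frames give a cleaner result. Real footage violates the assumptions to some degree, since registration is never perfect and noise is not purely Gaussian.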

Registration is the first step in the VISAR problem-solving methodology. Figure 1 shows how VISAR registers an image in sets (boxes) of pixels, and it illustrates why user skill matters. Choosing boxes that are too small might result in inaccurate tilt and zoom corrections. Selecting boxes that are too big might include unnecessary information, such as a bird flying in the foreground or background that is not of interest. Enhancement, whereby frames are co-added, is the second step. VISAR is unique in that it adds information from multiple frames. Corrections are made by an optimization routine that uses the standard correlation coefficient, which allows each pixel to be paired with the corresponding pixel of another image and compared. The user's initial registration of pixels is therefore critical: a poor choice can yield correlation coefficients that are not useful.
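To make the role of the correlation coefficient concrete, the sketch below registers one box of pixels between two frames by trying every small integer shift and keeping the one with the highest Pearson correlation. It is a deliberately simplified stand-in: the article does not describe VISAR's optimization routine, and VISAR also corrects rotation and zoom, which this example ignores. The synthetic image, the box position and size, and all function names are hypothetical.

```python
import random

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat box carries no information for matching
    return cov / (var_a * var_b) ** 0.5

def patch(image, top, left, size):
    """Flatten the size x size box whose upper-left corner is (top, left)."""
    return [image[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]

def register_box(reference, target, top, left, size, max_shift):
    """Exhaustively search integer shifts; return (best_shift, best_score)."""
    ref_patch = patch(reference, top, left, size)
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = patch(target, top + dy, left + dx, size)
            score = correlation(ref_patch, candidate)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score

# Synthetic test: the target frame is the reference moved by (2, -3) plus noise.
rng = random.Random(42)
HEIGHT = WIDTH = 32
TRUE_DY, TRUE_DX = 2, -3
reference = [[rng.uniform(0, 255) for _ in range(WIDTH)] for _ in range(HEIGHT)]
target = [[reference[(r - TRUE_DY) % HEIGHT][(c - TRUE_DX) % WIDTH] + rng.gauss(0, 2.0)
           for c in range(WIDTH)] for r in range(HEIGHT)]

shift, score = register_box(reference, target, top=10, left=10, size=8, max_shift=4)
print("estimated shift (dy, dx):", shift)
print(f"correlation at best shift: {score:.3f}")
```

The box size trade-off described above shows up here as well: a very small box gives the correlation few pixels to work with, while a large one risks including content that moves independently of the subject.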

The benefits of VISAR make the software valuable for a variety of uses. Benefits include clearer and enhanced still images, smoother edges, and reduced video noise. Perhaps its most valuable contribution is that it stabilizes rotation and zoom effects in video, even when the background and foreground move at different rates (as when the camera is moving through space, such as over water or through a wooded area). VISAR technology has no comparable substitute in the field of video reproduction. Four sample videos showing the corrective power of VISAR are available for download at science.nasa.gov/newhome/headlines/ast04may99_1.htm.

Figure 2 shows how VISAR is used to clarify images by combining a series of video frames. The single frame on the left shows how noise is amplified when the frame is brightened using standard image software. The picture on the right shows how VISAR improves clarity by leveraging information contained in approximately one second (50 frames) of video.


Commercialization

Table 2 indicates the broad range of uses for VISAR technology. Two examples illustrate the impact VISAR can have on society. First, it was used jointly by NASA, the FBI, and the U.S. Department of Defense to solve the "Russian Mafia Gold Heist" crime after processing just two images [1]. It will eventually enable police to compile a composite picture of a suspect's face by focusing on the reflection in the rearview mirror of an automobile the suspect is driving. In the lab, VISAR can be used to match fingerprints, or better yet, enable the matching of face-prints. Given the growing amount of video evidence becoming available to police, it will become increasingly prominent in law enforcement.

In another application, the Casey Eye Institute at the Oregon Health Sciences University in Portland uses VISAR in a research program that studies video of cell movements associated with immune system diseases in the eye. Ultrasounds are notorious for their poor-quality videos, which often leave even experienced doctors puzzled. Medical diagnoses could be made with much greater precision and confidence if imaging were clearer and more stable. The technique provides, for the first time, clear indications of individual cell movement. The resulting data gives a much better picture of how the immune system works in the human eye.

Future commercialization opportunities for the technology are abundant. Public safety applications will include toll booths, airports, emergency vehicles, commercial security, and surveillance. In another example, home videos of tornados are frequently distorted. As a meteorologist, Paul Meyer would like to use VISAR to more closely monitor changes in cloud formations, hurricanes, and tornados. Perhaps the greatest commercialization opportunity for VISAR is the home video market. VISAR provides users with the ability to repair problems associated with zoom, jitter, tilt, and focus. Due to the inherently subjective nature of imaging, the capability requires some user supervision during video debugging with VISAR. As new versions are developed, it is impractical to think of the full capabilities of VISAR residing in handheld cameras. Because VISAR has such command over each pixel in a series of images, special effects add-ins will become greatly enhanced in the amateur movie market.


Conclusion

The commercial success of VISAR was unlikely, considering the specialized tradition of most NASA software innovations. For instance, the two inventors were more concerned with observations of the sun and weather systems than with elegant software design. VISAR was originally written in Interactive Data Language (IDL), which required five minutes to process one image frame. Converting to C++ reduced the processing time to 15 seconds. Eventually, the inventors would like the product to generate real-time corrections as the subject is being filmed. Expect prices in the video-processing market to decrease rapidly, as more computing platforms facilitate the capability and as digital camcorders prevail.
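The per-frame timings above imply a large gap between the C++ version and real-time operation. The short calculation below works through the arithmetic using the figures quoted in this article. The 30 frames-per-second rate is an assumption, inferred from the roughly 400 frames in 13 seconds of the 1996 evidence clip, and the function name is hypothetical.

```python
def processing_minutes(n_frames, seconds_per_frame):
    """Total processing time in minutes for a clip of n_frames."""
    return n_frames * seconds_per_frame / 60

IDL_SECONDS_PER_FRAME = 5 * 60  # five minutes per frame, from the text
CPP_SECONDS_PER_FRAME = 15      # from the text
CLIP_FRAMES = 400               # approximate size of the 1996 evidence clip
ASSUMED_FPS = 30                # assumption: about 400 frames / 13 seconds

speedup = IDL_SECONDS_PER_FRAME / CPP_SECONDS_PER_FRAME
idl_minutes = processing_minutes(CLIP_FRAMES, IDL_SECONDS_PER_FRAME)
cpp_minutes = processing_minutes(CLIP_FRAMES, CPP_SECONDS_PER_FRAME)
# Real time means one frame every 1/ASSUMED_FPS seconds.
realtime_gap = CPP_SECONDS_PER_FRAME * ASSUMED_FPS

print(f"IDL to C++ speedup: {speedup:.0f}x")
print(f"400-frame clip in IDL: {idl_minutes:.0f} minutes")
print(f"400-frame clip in C++: {cpp_minutes:.0f} minutes")
print(f"further speedup needed for real time: {realtime_gap}x")
```

On these figures, the rewrite made processing 20 times faster, yet a 400-frame clip would still take about 100 minutes, and real-time correction at the assumed frame rate would need a further 450-fold improvement.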

VISAR exemplifies how important diversity is in the software development process. In addition, it shows how special-purpose R&D projects can result in highly commercializable products. The product is a great example of some of the software innovations now being created by the U.S. federal government that can be licensed for a nominal fee by U.S. companies and organizations. Prominent corporations, like Intel, which wants to make a VISAR chip, are already showing an interest in this technology. NASA encourages companies to help customize VISAR for niche products, and demand has been strong. Companies wishing to license VISAR should contact the Technology Transfer Department in NASA's Marshall Space Flight Center at techtran.msfc.nasa.gov/working/how.html.


References

1. Hathaway, D. Personal interview (July 9, 2003) National Space Science and Technology Center, Huntsville, AL.

2. Nabors, S. Personal interview (July 10, 2003) Marshall Space Flight Center, Redstone Arsenal, Huntsville, AL.

3. Rowell, E.D. Getting rid of the shakes (Nov. 3, 2000); abcnews.go.com/sections/tech/CuttingEdge/cuttingedge001103.html.


Author

Gary F. Templeton ([email protected]) is an assistant professor of Information Systems at Mississippi State University in Starkville, MS.


Footnotes

1 See U.S. Patents 6,459,822 and 6,560,375.


Figures

Figure 1. How VISAR registers an image.

Figure 2. A single frame vs. 50 frames added together using VISAR.


Tables

Table 1. VISAR's awards.

Table 2. The versatile uses of VISAR.



©2006 ACM  0001-0782/06/0200  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.


 
