The Translational Genomics Research Institute (TGen) has developed a new computer data compression technique for storing, analyzing, and sharing huge volumes of genomic sequencing data. In tests, Genomic SQueeZ (G-SQZ) compressed data by as much as 80 percent, while maintaining the relative order of the data and allowing for selective content access. G-SQZ, which will be made freely available for research and academic use, has the potential to save researchers and others millions of dollars.
TGen's solution takes a novel approach to Huffman coding, which assigns shorter codes to the most frequently occurring pieces of information. G-SQZ analyzes the frequency of the A, C, G, and T letters that make up DNA and can also encode the annotation information. Its indexing system provides access points at regular intervals, so the data does not all need to be decoded from the start.
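The two ideas in the paragraph above, frequency-based codes and an index of regular access points, can be illustrated with a minimal Python sketch. This is textbook Huffman coding with a simple offset table, not G-SQZ's actual algorithm or file format, which the article does not describe. The function names, the `interval` parameter, and the use of a string of "0"/"1" characters in place of packed bits are all assumptions made for readability.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(symbols):
    """Return {symbol: bitstring}; more frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap never compares the dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, codes, interval):
    """Encode symbols and record the bit offset of every interval-th symbol."""
    bits, index, offset = [], [], 0
    for i, s in enumerate(symbols):
        if i % interval == 0:
            index.append(offset)  # hypothetical access point
        bits.append(codes[s])
        offset += len(codes[s])
    return "".join(bits), index


def decode_from(bits, codes, index, interval, start, n):
    """Decode n symbols beginning at symbol position start.

    Decoding begins at the nearest earlier access point rather than at
    the beginning of the stream.
    """
    reverse = {c: s for s, c in codes.items()}
    block = start // interval
    pos = index[block]
    skip = start - block * interval  # symbols to discard inside the block
    out, current = [], ""
    while pos < len(bits) and len(out) < skip + n:
        current += bits[pos]
        pos += 1
        if current in reverse:  # prefix-free codes: first match is a symbol
            out.append(reverse[current])
            current = ""
    return "".join(out[skip:])


# Usage: a made-up sequence in which A is far more common than T.
seq = "AAAAAAAACCCCGGT" * 4
codes = build_huffman_codes(seq)
bits, index = encode(seq, codes, interval=8)
fragment = decode_from(bits, codes, index, interval=8, start=19, n=10)
```

Here `fragment` is recovered by starting at the access point for symbol 16 and discarding three symbols, so the first 16 symbols are never decoded. A real tool would pack the bits into bytes and store the code table and index alongside the data.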
TGen plans to accommodate parallel computing in its design. "While indexed and compressed representation is ready, the parallel computing functionality is undergoing a testing phase," says TGen scientist Waibhav Tembe. "But this is where it is headed."
From TGen News
Abstracts Copyright © 2010 Information Inc., Bethesda, Maryland, USA