Yale computer scientists recently demonstrated HadoopDB, their new open source system for managing huge amounts of data, at the VLDB conference in Lyon, France. The researchers used the gathering to present the results of a performance analysis and to give an overview of the system's architecture, run-time performance, loading time, fault tolerance, and scalability.
HadoopDB combines parallel database management system (DBMS) technology with the MapReduce software framework to handle petabytes of data. DBMS technology is well suited to managing structured data in tables with trillions of rows, while MapReduce, which Google uses to process data on the Web, allows for greater control and flexibility in retrieving data. "We get the performance of parallel database systems with the scalability and ease of use of MapReduce," says Yale professor Daniel Abadi.
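The MapReduce model mentioned above splits a job into a map phase that emits key-value pairs and a reduce phase that aggregates the values grouped under each key. As a hedged illustration of the programming model only (a generic sketch, not HadoopDB's actual code), here is a minimal single-machine word count in Python:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data systems", "parallel data systems"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 1, "data": 2, "systems": 2, "parallel": 1}
```

In a real deployment, the map and reduce functions run in parallel across many machines; HadoopDB's contribution is pushing much of this work down into single-node database engines for better performance.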
Some tasks, such as finding patterns in the stock market, earthquakes, consumer behavior, and disease outbreaks, will now take only hours rather than days. "People have all this data, but they're not using it in the most efficient or useful way," Abadi says.
From Yale University
Abstracts Copyright © 2009 Information Inc., Bethesda, Maryland, USA