Prajwal Tuladhar’s Blog
 
programming, life and some random thoughts


Jan 14 2010

One crucial difference between MapReduce and SQL queries

Published by Prajwal Tuladhar under Hadoop

"MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries." (Excerpt from Hadoop: The Definitive Guide)
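
To make the "two functions" idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the names map_fn, reduce_fn, and run_job are hypothetical, and the word-count task is just an assumed example. The point is that neither function knows how much data there is or how many machines would be involved.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Map: one (offset, line) pair -> a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: one (word, [1, 1, ...]) group -> a list with one (word, total) pair."""
    return [(word, sum(counts))]


def run_job(records, map_fn, reduce_fn):
    """Hypothetical local driver standing in for the framework.

    It applies map_fn to every record, groups the emitted values by key
    (the "shuffle" step), and then applies reduce_fn to each group.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)

    results = []
    for key in sorted(groups):
        results.extend(reduce_fn(key, groups[key]))
    return results


# Sample input invented for illustration.
lines = ["the quick brown fox", "the lazy dog", "The end"]
result = run_job(enumerate(lines), map_fn, reduce_fn)
print(result)
```

In this sketch only run_job would need to change to spread the work over a cluster; map_fn and reduce_fn would stay the same for three lines or three billion, which is the property the excerpt describes.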

