Prajwal Tuladhar’s Blog
 
programming, life and some random thoughts

Archive for the 'Hadoop' Category

Mar 16 2010

Nice write up on HBase

Published by Prajwal Tuladhar under Hadoop

The two-part article On HBase is a must-read if you are interested in NoSQL technology.

HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. The advantages far outweigh the problems. Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well.

There has been quite a bit of a war going on between Cassandra and HBase, and of course they have different design philosophies: under the CAP theorem, HBase puts its emphasis on consistency while Cassandra puts its emphasis on availability.

Happy Reading!



Jan 14 2010

One crucial difference between MapReduce and SQL query

Published by Prajwal Tuladhar under Hadoop

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries. (Excerpt from Hadoop: The Definitive Guide)
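
To make the "two functions over key-value pairs" idea concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code: `map_fn`, `reduce_fn` and `run_job` are names I made up for illustration, and `run_job` is only a single-process stand-in for the shuffle and sort work that a real cluster would do between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: turn one (line number, text) pair into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: collapse all the counts for one word into a total."""
    yield word, sum(counts)


def run_job(records: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Hypothetical local driver; a real framework distributes these steps."""
    # Run the map function and group its output values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)

    # Run the reduce function once per key, in sorted key order.
    result: Dict[str, int] = {}
    for out_key in sorted(groups):
        for word, total in reduce_fn(out_key, groups[out_key]):
            result[word] = total
    return result


small = [(0, "hbase hadoop"), (1, "hadoop")]
print(run_job(small))  # {'hadoop': 2, 'hbase': 1}
```

Notice that neither `map_fn` nor `reduce_fn` knows how many records exist or how many machines are involved, which is the property the excerpt is pointing at: the same two functions would run unchanged on a much larger input.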


