Prajwal Tuladhar’s Blog
 
programming, life and some random thoughts

Feb 25 2010

Yet another database variation talk

Published by Prajwal Tuladhar under MongoDB, Scalability

Recently there has been a lot of talk about hybrid database setups: using a traditional SQL-based database for relatively static data, and key-value or column-oriented stores (Cassandra, HBase) or document-based databases (MongoDB, CouchDB) for data that changes very frequently.

This approach seems more pragmatic than relying on a single database implementation. And once again, one should not forget that no single solution fits every context.

Presentation from PyCon:

The presenter says that Redis is the only one of its kind in the NoSQL ecosystem, which I don’t think is quite true: MongoDB also leans heavily on memory (through memory-mapped data files), although it is not a purely in-memory database. The real difference is that MongoDB is document-based, while Redis is a key-value store.

Apart from that, the talk is worth watching!!!



Jan 18 2010

SQL and NoSQL - the rant continues

Published by Prajwal Tuladhar under Scalability

I’ve been subscribed to Planet CouchDB for quite some time, and it’s a great resource for news about NoSQL technologies, especially CouchDB.

Through it, I came across two interesting blog posts: one criticizing Amazon SimpleDB and NoSQL technologies in general, and the other answering that criticism.

One can find a large number of articles and blog posts arguing for either the SQL camp or the NoSQL camp. It seems like the whole database world has been divided into two camps, just like during the Cold War: capitalism and communism (I won’t say which one is which, decide for yourself :) ).

In my opinion, all these arguments and counter-arguments are kind of unnecessary, because both kinds of tools are quite powerful in their respective contexts.

I often find people citing some company foo that uses some tool or technology bar and does a great job of scaling its overall architecture.

People often cite Google, Yahoo! and Facebook when making a point about SQL or NoSQL, but it is worth remembering that these companies scale so efficiently precisely because they do not use only SQL or only NoSQL technology.

Google, for example, uses Bigtable, a column-oriented database technology (one member of the NoSQL family), for indexing the web, while it also uses MySQL to a significant extent; in fact, it has contributed patches to MySQL. The same is true of Facebook and Yahoo!.

Databases are hammers; MapReduce is a screwdriver

The article is quite an interesting read. It differentiates conventional (SQL) databases from MapReduce, a Google-developed technique for aggregating large data sets in a distributed environment, which is also used by a number of NoSQL technologies such as MongoDB, CouchDB and many others.

I think the same concept can be used for SQL and NoSQL.

SQL is a hammer while NoSQL is a screwdriver

So, instead of ranting about which one is superior, it would be better to combine them and use both to create a scalable and robust architecture. Technology-agnostic design and technology-agnostic architecture (which treat the database in abstract terms, i.e. using SQL and/or NoSQL as the context demands) are the most important things to consider when talking about scalability.

Update: When people use the term NoSQL, it would be better if they meant Not Only SQL rather than No SQL.



Jan 14 2010

One crucial difference between MapReduce and SQL query

Published by Prajwal Tuladhar under Hadoop

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries. - Excerpt from Hadoop: The Definitive Guide
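
To make the "two functions over key-value pairs" idea concrete, here is a minimal word-count sketch in plain JavaScript. The names mapFn, reduceFn and runMapReduce are my own, and the driver is only a toy that runs everything in one process; a real framework such as Hadoop would spread the same two functions over a cluster.

```javascript
// map: one input pair (line number, line text) -> a list of (word, 1) pairs
function mapFn(key, line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// reduce: one key plus every value collected for it -> a single (word, total) pair
function reduceFn(word, counts) {
  return [word, counts.reduce((sum, n) => sum + n, 0)];
}

// Toy single-process driver (hypothetical): run map, group by key, run reduce.
function runMapReduce(inputPairs, map, reduce) {
  const groups = new Map();
  for (const [key, value] of inputPairs) {
    for (const [outKey, outValue] of map(key, value)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(outValue);
    }
  }
  return [...groups].map(([key, values]) => reduce(key, values));
}

const lines = [
  [0, "SQL is a hammer"],
  [1, "NoSQL is a screwdriver"],
];
const wordCounts = Object.fromEntries(runMapReduce(lines, mapFn, reduceFn));
console.log(wordCounts);
```

Note that neither mapFn nor reduceFn knows how many lines there are or how many machines are involved, which is the point the excerpt makes.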



Nov 15 2009

MongoDB’s performance as compared to others

Published by Prajwal Tuladhar under MongoDB


I haven’t used PostgreSQL or Tokyo Tyrant, so I can’t say much about them. And technically, I really don’t think one should compare MySQL, which is a relational database, with document-based non-relational databases like CouchDB and MongoDB.

In my opinion, MongoDB outperforms CouchDB in terms of querying, insertion and ease of use, but CouchDB’s support for MVCC and transactions is quite interesting. One of the cons of MongoDB is that its data size grows at a freakishly high rate.

Still, it’s great to see that NoSQL (Not Only SQL) is in full swing.

Download the OpenSQL comparison PDF (don’t forget to read the conclusion, though) via Hacker News.



Nov 15 2009

MapReduce API for MongoDB

Published by Prajwal Tuladhar under MongoDB

Lately, I’ve been doing some work with MongoDB. If you don’t know it or haven’t used it, it’s a document-based database system, which means it’s fundamentally different from a traditional DBMS like MySQL or Oracle.

Systems like MongoDB, along with similar technologies like CouchDB, make significant use of MapReduce. MapReduce is basically a two-step process consisting of Map and Reduce: Map turns each document into zero or more key-value pairs, which are then grouped by key, while Reduce performs a specific operation (such as counting or summing) on the values grouped under each key. You can find more information about it all over the web.

Since the MongoDB PHP driver does not provide a dedicated MapReduce API, I’ve created my own using MongoDB::command. You can find it on GitHub.
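
For context, a wrapper like this ultimately has to hand the server a command document. The sketch below shows roughly what that document looks like, written as a plain JavaScript object. The helper buildMapReduceCommand is hypothetical and not part of any driver. The field names follow MongoDB's documented mapReduce command, but what a given server version accepts may differ (for example, older servers did not need out), so treat this as an assumption rather than a description of my library's internals.

```javascript
// Hypothetical helper: builds a command document for a map/reduce run.
// The command name must be the first key, so it is written first.
function buildMapReduceCommand(collectionName, mapSource, reduceSource, query = {}) {
  return {
    mapReduce: collectionName, // collection to read documents from
    map: mapSource,            // JavaScript source of the map function
    reduce: reduceSource,      // JavaScript source of the reduce function
    query: query,              // optional filter applied before map runs
    out: { inline: 1 },        // return results in the reply instead of a collection
  };
}

const command = buildMapReduceCommand(
  "animal_tags",
  "function() { this.tags.forEach(function(x) { emit(x, {count: 1}); }); }",
  "function(key, values) { var t = 0; values.forEach(function(v) { t += v.count; }); return {count: t}; }"
);
console.log(Object.keys(command));
```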

Simple Usage:


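<?php
// Connect with the legacy Mongo PHP driver and select the database.
$db_name = "test_dbs";
$mongodb = new MongoDB(new Mongo(), $db_name);

// Map: emit one {count: 1} value for every tag of each document.
$map = <<<MAP
	function() {
		this.tags.forEach(
			function(x) {
				emit(x, {count: 1});
			}
		);
	}
MAP;

// Reduce: sum the partial counts. The result has the same shape as the
// emitted values, because MongoDB may feed reduce output back into reduce.
$reduce = <<<REDUCE
	function(key, values) {
		var total = 0;
		values.forEach(function(v) { total += v.count; });
		return {count: total};
	}
REDUCE;

$map_reduce = new MongoMapReduce($map, $reduce);
$collection_name = "animal_tags";
$response = $map_reduce->invoke($mongodb, $collection_name);
print_r($response->getRawResponse());
if ($response->valid()) {
	echo "Total Execution Time: {$response->getTotalExecutionTime()} milliseconds\n";
	$count_data = $response->getCountsData();

	echo "Count Data\n";
	foreach ($count_data as $key => $value) {
		echo "{$key}: {$value}\n";
	}
	echo "********************\n";
	// Each result row holds the tag in "_id" and the reduced value in "value".
	foreach ($response->getResultSet() as $tag) {
		echo "{$tag["_id"]}\n";
		echo "Count: {$tag["value"]["count"]}\n";
		echo "****************\n";
	}
}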
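
A note on writing the reduce function: MongoDB may call reduce more than once for the same key and feed earlier results back in, and it skips reduce entirely for a key with a single emitted value. So the value returned by reduce should have the same shape as the values emitted by map. The plain JavaScript sketch below simulates such a two-pass reduce outside MongoDB. All function names in it are my own, and it only illustrates the idea; it does not show how the server schedules the calls.

```javascript
// Emitted values and reduce output share one shape: { count: n }.
function mapTags(doc) {
  return doc.tags.map((tag) => [tag, { count: 1 }]);
}

// Safe: sums the counts, so partial results can be reduced again.
function reduceCounts(key, values) {
  let total = 0;
  for (const value of values) total += value.count;
  return { count: total };
}

// Unsafe: counts array entries, so a partial result is counted as 1.
function reduceByLength(key, values) {
  return { count: values.length };
}

// Collect every value emitted for the "cat" tag (three documents carry it).
const emitted = [
  { tags: ["cat", "pet"] },
  { tags: ["cat"] },
  { tags: ["cat", "wild"] },
]
  .flatMap(mapTags)
  .filter(([tag]) => tag === "cat")
  .map(([, value]) => value);

// Simulate a re-reduce: reduce the first two values, then combine with the rest.
function reduceInTwoPasses(reduce, values) {
  const partial = reduce("cat", values.slice(0, 2));
  return reduce("cat", [partial, ...values.slice(2)]);
}

const correct = reduceInTwoPasses(reduceCounts, emitted);
const broken = reduceInTwoPasses(reduceByLength, emitted);
console.log(correct, broken);
```

The summing version reports all three occurrences of the tag, while the length-based version loses one as soon as a partial result is reduced again.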

Usage with Mongo Collections


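<?php

// Load library classes from ../lib on demand
// (spl_autoload_register replaces the deprecated __autoload hook).
spl_autoload_register(function ($class_name) {
    require_once "../lib/" . $class_name . '.php';
});

$db_name = "test_dbs";
$mongodb = new MongoDB(new Mongo(), $db_name);

// A collection class bound to the "animal_tags" collection.
class AnimalTag extends XMongoCollection {

	const COLLECTION_NAME = "animal_tags";

	public function __construct(MongoDB $mongoDB) {
		$this->collectionName = self::COLLECTION_NAME;
		parent::__construct($mongoDB, $this->collectionName);
	}
}

$animal_tags = new AnimalTag($mongodb);

// Map: emit one {count: 1} value for every tag of each document.
$map = <<<MAP
	function() {
		this.tags.forEach(
			function(x) {
				emit(x, {count: 1});
			}
		);
	}
MAP;

// Reduce: sum the partial counts. The result has the same shape as the
// emitted values, because MongoDB may feed reduce output back into reduce.
$reduce = <<<REDUCE
	function(key, values) {
		var total = 0;
		values.forEach(function(v) { total += v.count; });
		return {count: total};
	}
REDUCE;

// Run the job directly on the collection object and print each tag count.
$response = $animal_tags->mapReduce(new MongoMapReduce($map, $reduce));
if ($response->valid()) {
	foreach ($response->getResultSet() as $tag) {
		echo "{$tag["_id"]}\n";
		echo "Count: {$tag["value"]["count"]}\n";
		echo "****************\n";
	}
}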

Enjoy!!!

