Prajwal Tuladhar’s Blog
 
programming, life and some random thoughts

Archive for the 'Programming' Category

Jan 18 2010

SQL and NoSQL – the rant continues

Published under Scalability

It’s been quite some time since I subscribed to Planet CouchDB, and it’s a great resource for news about NoSQL technologies, especially CouchDB.

From the same source, I got a chance to read two interesting blog posts. One criticized Amazon SimpleDB and NoSQL technologies in general, and the other was an answer to that criticism.

One can find a large number of articles and blog posts arguing for either the SQL or the NoSQL side. It seems the whole database world has been divided into two camps, just like during the Cold War: capitalism and communism (I won’t say which camp is which, decide yourself :) ).

In my opinion, all these arguments and counterarguments are unnecessary, because both kinds of tools are quite powerful in their respective contexts.

I often find people citing the example of foo company using foo tools / technology and doing a great job of scaling its overall architecture.

People often cite Google, Yahoo! and Facebook when they make points about SQL and NoSQL, but it should also be considered that these companies scale so efficiently precisely because they use neither only SQL nor only NoSQL technology.

Google, for example, uses BigTable, a column-oriented database technology (one instance on the NoSQL horizon), for indexing the web, while it also makes significant use of MySQL; in fact, it has contributed patches to MySQL. The same is true of Facebook and Yahoo!.

Databases are hammers; MapReduce is a screwdriver

The article is a quite interesting read that differentiates conventional (SQL) databases from MapReduce, a Google-developed technique for aggregating large sets of data in a distributed environment, which is also used by a number of NoSQL technologies such as MongoDB, CouchDB and many others.

I think the same concept can be used for SQL and NoSQL.

SQL is a hammer while NoSQL is a screwdriver

So, instead of ranting about which one is superior, it would be better to combine both and use them to build a scalable and robust architecture. Technology-agnostic design and technology-agnostic architecture (which treat the database in abstract terms, i.e. using SQL and/or NoSQL as the context demands) are the most important things to consider when talking about scalability.
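
As a rough illustration of what treating the database in abstract terms could look like, here is a minimal sketch. The names TableStore, DocumentStore and UserDirectory are hypothetical, and both stores are in-memory stand-ins rather than real SQL or NoSQL clients. The application code only knows about save() and find(), so either backend can be plugged in.

```javascript
// Hypothetical stand-in for a SQL-backed store: fixed columns, rows in a table.
class TableStore {
  constructor(columns) {
    this.columns = columns;
    this.rows = [];
  }
  save(id, record) {
    // Keep only the declared columns, as a schema would.
    const row = { id };
    for (const column of this.columns) row[column] = record[column] ?? null;
    this.rows = this.rows.filter((r) => r.id !== id).concat(row);
  }
  find(id) {
    return this.rows.find((r) => r.id === id) ?? null;
  }
}

// Hypothetical stand-in for a document store: schemaless records by key.
class DocumentStore {
  constructor() {
    this.docs = new Map();
  }
  save(id, record) {
    this.docs.set(id, { id, ...record });
  }
  find(id) {
    return this.docs.get(id) ?? null;
  }
}

// Application code depends only on save() and find(), not on the backend.
class UserDirectory {
  constructor(store) {
    this.store = store;
  }
  register(id, name) {
    this.store.save(id, { name });
  }
  nameOf(id) {
    const user = this.store.find(id);
    return user ? user.name : null;
  }
}

// The same application code runs unchanged against both backends.
const backends = [new TableStore(["name"]), new DocumentStore()];
const names = backends.map((store) => {
  const directory = new UserDirectory(store);
  directory.register(1, "alice");
  return directory.nameOf(1);
});
console.log(names);
```

Which backend sits behind the interface then becomes a decision made per context, not something baked into the application code.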

Update: When people use the term NoSQL, it would be better if they meant Not Only SQL rather than No SQL.



Jan 16 2010

The six costs of Data

Published under Scalability

Excerpt from The Art of Scalability, Chapter 27, Too Much Data

  1. Storage costs to store data
  2. People and software to manage data
  3. Power and space to make storage work
  4. Capital to ensure the proper power infrastructure
  5. Processing power to traverse the data
  6. Backup time and costs


Jan 15 2010

Rough estimates of cost of scaling

Published under Links, Programming

Some quite interesting stats on the dollar cost of scaling on different web platforms.

But I’m not sure why it puts web servers (Nginx, Apache), languages (PHP) and web frameworks (Rails, Django, Mochiweb, Jetty) into a single group; that doesn’t make sense.

http://journal.dedasys.com/2010/01/12/rough-estimates-of-the-dollar-cost-of-scaling-web-platforms-part-i



Jan 14 2010

One crucial difference between MapReduce and SQL query

Published under Hadoop

MapReduce is a linearly scalable programming model. The programmer writes two functions—a map function and a reduce function—each of which defines a mapping from one set of key-value pairs to another. These functions are oblivious to the size of the data or the cluster that they are operating on, so they can be used unchanged for a small dataset and for a massive one. More importantly, if you double the size of the input data, a job will run twice as slow. But if you also double the size of the cluster, a job will run as fast as the original one. This is not generally true of SQL queries. – Excerpt from Hadoop: The Definitive Guide
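
To make the two functions from the excerpt concrete, here is a small word-count sketch. The names mapWords, reduceCounts and runJob are made up for illustration, and runJob is a single-process stand-in for what a framework such as Hadoop would distribute across a cluster. Note that the map and reduce functions know nothing about how much data there is or where it runs.

```javascript
// Map step: turn one input record (a line of text) into [key, value] pairs.
function mapWords(line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .map((word) => [word, 1]);
}

// Reduce step: collapse all values seen for one key into a single value.
function reduceCounts(word, counts) {
  return [word, counts.reduce((sum, n) => sum + n, 0)];
}

// Hypothetical single-process driver. A real framework would split the
// input, run map tasks on many nodes, and route pairs to reducers by key.
function runJob(lines, map, reduce) {
  const groups = new Map();
  for (const line of lines) {
    for (const [key, value] of map(line)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  return [...groups].map(([key, values]) => reduce(key, values));
}

const result = runJob(
  ["SQL is a hammer", "NoSQL is a screwdriver"],
  mapWords,
  reduceCounts
);
console.log(result);
```

Only the driver would need to change to process a larger input on more machines; mapWords and reduceCounts stay exactly as they are.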



Dec 03 2009

Node.js: Changing the way we do I/O

Published under JavaScript

Node.js might be the most exciting single piece of software in the current JavaScript universe. Ryan received a standing ovation for his talk and he really deserved it!

http://jsconf.eu/2009/video_nodejs_by_ryan_dahl.html

I tried Node.js last week and I was quite impressed. Truly, Node.js has the potential to change the way we do I/O. Basically, it’s an evented programming environment built on an underlying C and C++ stack (Python is used only by its build tooling), with Google’s V8 JavaScript engine exposing that stack to JavaScript so that network programming (using polling) can be done in a different way. Here are some of the key design goals of this technology:

  • No function should directly perform I/O (use a callback to do so)
  • Stream everything, never force the buffering of data (somewhat like Comet)
  • Have built-in support for the most important protocols: HTTP, DNS, TCP
  • Support many HTTP features: chunked requests, keep-alive, hanging requests for Comet applications
  • Offer an API that sits somewhere between the flexibility of client-side JavaScript and old Unix tools
  • Platform independent

Example:
A simple Twitter client I wrote last week:

// Written against the Node.js API as it stood in late 2009 (http.createClient,
// sys.puts, "body" / "complete" events); later releases changed these calls.
var http = require("http"),
	sys = require("sys");
// Base64 comes from a separate Base64 library that must be loaded first.
var connection = http.createClient(80, "twitter.com");
var since_id = '';      // id of the newest tweet seen so far
var interval = 300000;  // poll every 5 minutes
function getTweets()	{
	var url = "/statuses/friends_timeline.json";
	// After the first poll, only ask for tweets newer than the last one seen
	if (since_id != '')	{
		url += "?since_id=" + since_id;
	}
	var request = connection.get(url, {
		"content-type": "application/json",
		"User-Agent": "NodeJS HTTP Client by Infynyxx",
		"host": "twitter.com",
		"Authorization": "Basic " + Base64.encode("infynyxx:*****")
	});
	request.finish(function(response)	{
		var responseBody = "";
		response.setBodyEncoding("utf8");
		// The body arrives in chunks; accumulate them until the response completes
		response.addListener("body", function(chunk)	{
			responseBody += chunk;
		});
		response.addListener("complete", function()	{
			var tweets = JSON.parse(responseBody);
			if (tweets.error)	{
				sys.puts("Error: " + tweets.error);
			}
			else	{
				var length = tweets.length;
				if (length > 0)	{
					sys.puts("Getting new tweets...\n");
					sys.puts("Number of new tweets: " + length + "\n");
					var str = "";
					tweets.reverse();  // oldest first, newest last
					// Remember the newest id so the next poll skips tweets already shown
					since_id = tweets[length - 1].id;
					tweets.forEach(function(element, index) {
						str += element.text + "\n";
						str += element.user.name + "\n";
						str += element.created_at + "\n";
						str += "*************************\n";
					});
					sys.puts(str);
				}
			}
		});
	});
	// Schedule the next poll; the first request is made by the call below
	setTimeout(getTweets, interval);
}

getTweets();

It also uses a Base64 library. Full code @ GitHub
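
The chunk-accumulating pattern in the client (listen for body chunks, then act once the response is complete) can be sketched in isolation with the EventEmitter class from current Node.js releases. This is only an illustration: collectBody is a hypothetical helper, the "data" and "end" event names follow today's stream conventions instead of the "body" and "complete" events used above, and the response object is a stand-in that emits two hard-coded chunks.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: collect streamed chunks and hand the full body to a
// callback once the source signals that it is done.
function collectBody(source, onDone) {
  const chunks = [];
  source.on("data", (chunk) => chunks.push(chunk));
  source.on("end", () => onDone(chunks.join("")));
}

// Stand-in for a network response; a real one would emit chunks as they
// arrive from the socket.
const response = new EventEmitter();
let body = null;
collectBody(response, (text) => {
  body = text;
});

// Simulate a JSON payload arriving in two pieces.
response.emit("data", '[{"text":');
response.emit("data", '"hello"}]');
response.emit("end");

// EventEmitter runs listeners synchronously inside emit(), so the callback
// has already stored the body at this point.
const tweets = JSON.parse(body);
console.log(tweets[0].text);
```

The caller never blocks while waiting for data; it only registers what should happen when each event fires, which is the core of the evented style described above.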


