Sunday, July 26, 2015

What is NoSQL?

I recently read Paul Ford's article, "What is code?", a compelling look at the ideas and culture of software and the people who create it. He describes the almost religious attachment programmers have to the many competing technologies they use: corporate vs. startup, East Coast vs. West Coast, Windows vs. Linux vs. Mac, C vs. Ruby vs. Python vs. PHP vs. C# vs. Java, Django vs. Rails... There is also a divide between SQL and NoSQL. SQL is well defined: there is an ANSI standard for SQL (Structured Query Language), schemas are declared up front, and the relational model behind it is grounded in relational algebra. NoSQL has nothing comparable. In fact, there isn't even a tight definition beyond "not SQL".
The site ranked highest by Google for the word "NoSQL", nosql-database.org, offers this definition:

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.

It is an interesting definition. SQL databases are very elegant, and until the advent of Web 2.0 they seemed to handle any data problem you could throw at them. Web 2.0, and social networks in particular, made data volumes explode, because users were no longer only consuming a site's content (reading and downloading it) but also contributing to it. Facebook currently has over 1.4 billion users, most of whom post and create content, and this is where SQL databases started to fail. Elegant SQL became a bottleneck, and the SQL databases available could not easily scale to what is now called Internet Scale. New technologies had to be developed to serve data at Internet Scale, and that is pretty much what NoSQL is. Some of the other features in that definition are mostly side effects of solving the scaling problem.

Interestingly, there are cases where SQL databases have been used for Internet Scale sites. Instagram is a good example. In a blog post, Instagram co-founder Mike Krieger explains how they scaled out PostgreSQL to handle Instagram's data. However, they also had to employ several NoSQL products such as Memcache and Redis, plus a fair amount of engineering, to make it work for the millions of users creating content every day.
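"Scaling out" a SQL database like PostgreSQL usually means sharding: splitting the data across several servers by some key, which is also what "horizontally scalable" in the definition above refers to. The Python sketch below is a generic illustration of that idea, not Instagram's actual code; the shard count, the server names, and routing by user ID with a simple modulo are all assumptions made for the example.

```python
# A minimal sketch of sharding by user ID (hypothetical names and sizes throughout).
# Each user is assigned to one of many *logical* shards, and the logical shards
# are assigned to a smaller set of *physical* database servers.

NUM_LOGICAL_SHARDS = 16                      # assumption: chosen for illustration
SERVERS = ["db-a", "db-b", "db-c", "db-d"]   # hypothetical PostgreSQL hosts


def logical_shard(user_id: int) -> int:
    """Map a user ID to a logical shard with a simple modulo."""
    return user_id % NUM_LOGICAL_SHARDS


def server_for_shard(shard: int) -> str:
    """Map a logical shard to a physical server (round-robin placement)."""
    return SERVERS[shard % len(SERVERS)]


def route(user_id: int) -> tuple[int, str]:
    """Return (logical shard, server) where this user's rows would live."""
    shard = logical_shard(user_id)
    return shard, server_for_shard(shard)


if __name__ == "__main__":
    for uid in (1, 17, 42, 1_000_000):
        shard, server = route(uid)
        print(f"user {uid} -> shard {shard} on {server}")
```

Keeping more logical shards than physical servers is a common design choice: when a server is added, whole logical shards can be moved to it rather than re-partitioning every row. The trade-off is that queries spanning many users now have to touch several servers, which is one reason caching layers such as Memcache and Redis become attractive in front of the database.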

Because of its success with Internet Scale projects, programmers have been using NoSQL for everything, and not always with good results. On my recent projects we have started experimenting with NoSQL to see whether we get the speed and optimizations we need. I will not pretend that these projects come anywhere near Internet Scale; they have maybe 150,000 users at best. Still, after years of slow, complicated SQL queries and indexes, I hope we finally get the speed we need. Wish me luck.
