Friday, July 31, 2015

Control Z in a NoSQL world

I was going on a trip with my 13-year-old son when, out of the blue, he said, "What if you can hit 'Command Z' in the real world?" "What?" "What if you spilled your drink and you hit 'Command Z' and it never happened?" Silence, as my mind was blown. As a millennial raised in the age of the Internet and cloud computing, he was wondering whether our real-world actions could be as transient as our virtual ones. According to the laws of physics, entropy means that energy and structure break down, and processes tend not to be reversible. But have we defied the laws of physics in our virtual world? I remember how awesome it was to use a word processor and learn how to fix typos. In high school, typos were fixed with correction tape, and there was always a little smudge showing that a mistake had been made. 'Control Z', meanwhile, just makes your actions go away as if they never happened. My son's comment made me wonder whether I would want to 'Control Z' anything I have done in my life. The reality is that, like everyone, I regret some of my actions and decisions and their consequences. But those same actions may also have had some wonderful consequences, or at least taught me lessons. So I don't think I would use 'Control Z'. I would live with the smudges in my life.

In our virtual world, 'Control Z' is not as powerful as it seems. Our email is archived and backed up to other systems. Gmail recently came out with a feature for unsending messages, but it only works for a short period after you hit the send button. Facebook posts can be shared and propagated almost as soon as you post them, and even if you retract a post, it lives on in some archive.

What does this mean in a NoSQL world? NoSQL databases are powering the Internet these days, and though it is not required, these data stores often contain denormalized data, where the same fact is stored in multiple places. For instance, a user's name may be stored in her user profile and also in all of her posts, comments, and likes. Let's say a user gets married and, being traditional, changes her last name. Then all of her posts, comments, and likes need to be updated. SQL databases usually solve this problem by storing the name once, in a users table, and storing the user's immutable primary key in the posts, comments, and likes tables. A query then joins the tables to display the name. NoSQL databases usually do not join objects, hence the need to denormalize, or repeat, the data. This makes 'Control Z' much harder and reminds us that we do live in a world where actions leave a mark, those marks are hard to erase, and we have to live with the smudges that life gives us.
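To make the difference concrete, here is a minimal sketch of the name-change example, assuming a single user and hypothetical table and field names (`users`, `posts`, `author_name`, and so on). Python's built-in `sqlite3` module stands in for a SQL database, and a plain list of dictionaries stands in for a document store; it is not the API of any particular NoSQL product. The normalized rename writes one row and the join picks it up everywhere, while the denormalized rename has to find and rewrite every copy.

```python
import sqlite3

# --- Normalized (SQL) model: the name lives in exactly one row. ---
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY,
                        user_id INTEGER NOT NULL REFERENCES users(id),
                        body TEXT NOT NULL);
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Jane Smith')")
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(1, "Hello world"), (1, "Big news!")],
)


def sql_rename(conn, user_id, new_name):
    """One UPDATE touches one row; every post sees the change through the join."""
    cur = conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    return cur.rowcount  # number of rows written


def sql_post_authors(conn):
    """Join posts to users at read time to display each post's author name."""
    rows = conn.execute(
        "SELECT users.name FROM posts JOIN users ON posts.user_id = users.id "
        "ORDER BY posts.id"
    )
    return [name for (name,) in rows]


# --- Denormalized (document-style) model: the name is copied into each document. ---
# Hypothetical documents; a real document store would hold these across many servers.
documents = [
    {"type": "profile", "user_id": 1, "name": "Jane Smith"},
    {"type": "post", "user_id": 1, "author_name": "Jane Smith", "body": "Hello world"},
    {"type": "post", "user_id": 1, "author_name": "Jane Smith", "body": "Big news!"},
    {"type": "comment", "user_id": 1, "author_name": "Jane Smith", "body": "Thanks!"},
    {"type": "like", "user_id": 1, "author_name": "Jane Smith", "target": "post-42"},
]


def doc_rename(docs, user_id, new_name):
    """Every copy of the name must be found and rewritten; any copy missed stays stale."""
    written = 0
    for doc in docs:
        if doc["user_id"] != user_id:
            continue
        field = "name" if doc["type"] == "profile" else "author_name"
        doc[field] = new_name
        written += 1
    return written


print(sql_rename(conn, 1, "Jane Doe"))       # 1
print(sql_post_authors(conn))                # ['Jane Doe', 'Jane Doe']
print(doc_rename(documents, 1, "Jane Doe"))  # 5
```

Undoing the change follows the same pattern in reverse: one row in the normalized model, every copy in the denormalized one, and any copy that was missed or already replicated somewhere else keeps the old name.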

Sunday, July 26, 2015

What is NoSQL?

I recently read Paul Ford's article, "What is code?", and it is a very compelling look into the ideas and culture of software and the people who create it. He describes the religious adherence programmers have to the many competing technologies they use: corporate vs. startup, East Coast vs. West Coast, Windows vs. Linux vs. Mac, C vs. Ruby vs. Python vs. PHP vs. C# vs. Java, Django vs. Rails... There is also a divide between SQL and NoSQL. SQL is more clearly defined: there is an ANSI standard for SQL (Structured Query Language), schemas are defined up front, and it is based on relational algebra. NoSQL is not clearly defined. In fact, there isn't even a tight definition beyond "not SQL".
The site that Google ranks highest for the word NoSQL, nosql-database.org, offers this definition:

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.

It is an interesting definition. SQL databases are very elegant, and until the advent of Web 2.0 they seemed to handle any data problem you could throw at them. Web 2.0, and social networks in particular, made data explode, because the users of a site were no longer only consuming content (reading and downloading data) but also contributing to it. Facebook currently has over 1.4 billion users, and most of them are posting and creating content. This is where SQL databases started to fail. Elegant SQL became a bottleneck, and none of the SQL databases out there could scale to what is now called Internet Scale. So new technologies had to be developed to serve data at Internet Scale, and that is pretty much what NoSQL is. The other features in the definition are mostly side effects of solving the scaling problem.

Interestingly enough, there are cases where SQL databases have been used for Internet Scale sites. A good example is Instagram. In this posting, Instagram co-founder Mike Krieger explains how they scaled out PostgreSQL to handle Instagram's data. However, they had to employ several NoSQL products, such as Memcached and Redis, plus a fair amount of engineering, to make it work for the millions of users creating content every day.
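As a rough illustration of what scaling out a SQL database can mean, here is a minimal sketch of generic sharding by user ID. It is not Instagram's implementation. Several in-memory SQLite databases stand in for separate database servers, and `NUM_SHARDS`, `shard_for`, and the `posts` table are hypothetical names chosen for the example.

```python
import sqlite3

NUM_SHARDS = 4  # hypothetical; a real deployment sizes this to its data and hardware

# Each in-memory SQLite database stands in for a separate database server.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for shard in shards:
    shard.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL, body TEXT NOT NULL)"
    )


def shard_for(user_id: int) -> sqlite3.Connection:
    """Route by user ID so all of one user's rows live together on one shard."""
    return shards[user_id % NUM_SHARDS]


def add_post(user_id: int, body: str) -> None:
    shard_for(user_id).execute(
        "INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body)
    )


def posts_by_user(user_id: int) -> list:
    """A single-user query touches exactly one shard."""
    rows = shard_for(user_id).execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [body for (body,) in rows]


def count_all_posts() -> int:
    """A query across all users has to visit every shard and combine the results."""
    return sum(s.execute("SELECT COUNT(*) FROM posts").fetchone()[0] for s in shards)


add_post(7, "first post")
add_post(7, "second post")
add_post(8, "hello")
print(posts_by_user(7))   # ['first post', 'second post']
print(count_all_posts())  # 3
```

Plain modulo routing is the simplest choice, but changing `NUM_SHARDS` moves most users to a different shard, which is one reason real systems often map many logical shards onto fewer physical servers or use consistent hashing. Queries that span users, like `count_all_posts`, also get harder once the data is split, which hints at the extra engineering such a setup requires.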

Due to its success with Internet Scale projects, programmers have been using NoSQL for everything, and not necessarily with good results. On my recent projects, we have started experimenting with NoSQL to see whether it gives us the speed and optimizations we need. I will not pretend that these projects are even approaching Internet Scale, with maybe 150,000 users at best. Still, after years of slow, complicated SQL queries and indexes, we are hoping NoSQL gives us the speed we need. Wish me luck.