
Monday, June 15, 2015

NoSQL - Open source equals more innovation?

As I have moved my team to using more NoSQL data stores, I have found an interesting side topic to the SQL vs. NoSQL debate: the vast majority of NoSQL technology is open source. Open source has been around for a while. It started in 1985 with the Free Software Foundation and is now fostered by the Open Source Initiative.

But we can trace the roots of the concept back to the Founding Fathers and the creation of the U.S. patent system, which may seem anathema to our current views of open source. The U.S. government wanted to spur technical innovation by encouraging inventors to share their inventions publicly, in return giving them the exclusive right to license those inventions for a limited time (today 20 years, which is forever in software time frames). After that time, the invention enters the public domain. The alternative to a patent was to keep the invention secret and not share it at all. This is what Coca-Cola did with its formula, and it is generally called a trade secret.

So it is interesting that patents have recently come to be seen as inhibiting innovation. NoSQL has been the most significant innovation in data technology, and almost all the main players are open source: MongoDB, Couchbase, CouchDB, Cassandra, Redis, Elasticsearch, Lucene, Hadoop... Here is one list from last year.

So what is it that makes open source drive innovation? I found my own entry into this world quite intriguing. I jumped on the forums to learn about the new software I was testing, and soon I was poking around GitHub to understand how the software works. I noticed that one of the software packages did not work with the JVM I was using, so I tweaked some Java code (I am not a Java programmer). The tweak was submitted on GitHub, where the moderators incorporated it into the next release. Similarly, for another product I found I needed connection pooling; working with the community, we came up with C# code to pool the connections, and those modifications were rolled into the next release.
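The pooling code mentioned above was written in C# and is not reproduced here. As a rough illustration of the general idea only, here is a minimal Python sketch of a fixed-size connection pool: connections are opened once up front, borrowed for a request, and always handed back. The `factory` callable, the class name and the fake string "connections" are hypothetical stand-ins, not the API of any of the products discussed.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A minimal fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory, size):
        # `factory` is a hypothetical zero-argument callable that opens one connection.
        self._idle = queue.Queue(maxsize=size)  # thread-safe FIFO of idle connections
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Wait for an idle connection (raises queue.Empty if `timeout` expires)
        # rather than opening a brand-new one for every request.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool even if the caller raised.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


# Usage with a fake factory that just hands out labelled strings.
labels = iter(["conn-1", "conn-2"])
pool = ConnectionPool(lambda: next(labels), size=2)
with pool.connection() as conn:
    print(conn, pool.idle_count())  # conn-1 1
print(pool.idle_count())            # 2
```

Bounding the pool means a burst of requests waits for a free connection instead of opening an unbounded number of new ones against the server.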

I really did not spend much time on these code changes, but open source took my minor efforts and improved the overall code base in a way that proprietary code never would have. If thousands of programmers are doing this, you can see how powerful open source is in driving innovation and software development.

As a side note, I was just granted a patent on some 3D integral photography software I created with a friend. It took almost 5 years, and in those 5 years a lot of software innovation has occurred!



Friday, June 12, 2015

Ventures into NoSQL

Last November, my team was sitting around our conference table trying to figure out how to take our search function to the next level. We wanted to build a mini-Google to effectively search and mine our data in any way possible. We had millions of records of data covering over 100,000 companies in our sectors.

At that point, we had already started to migrate to a Service Oriented Architecture, so we wanted the supreme search service. All our data was stored in a Microsoft SQL Server database, which had served us well for 7 years but was starting to slow down noticeably. So we were ready to migrate to at least a newer version of SQL Server.

Back to our meeting: one of our guys asked, "How does Google do it?" I said that they have a massive index stored on a grid of servers distributed all over the world. At that point we realized we had to build a massive index of our own data, on a much smaller scale, and that SQL did not seem to be the appropriate tool. So began our dive into the world of NoSQL.
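The kind of index meant here is usually an inverted index, the structure that Lucene-based engines are built around: instead of scanning every record for a word, you look the word up and get back the records that contain it. Here is a toy Python sketch of the idea, assuming simple lowercase word tokens and boolean AND queries; the records are made up for illustration, and real engines add stemming, scoring and on-disk storage on top of this.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each term to the set of record IDs that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index, query):
    """Return IDs of records containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical records; the real data set is not shown in this post.
records = {
    1: "Acme Robotics acquired by Globex",
    2: "Globex spins off Initech",
    3: "Initech hires new CEO",
}
index = build_index(records)
print(sorted(search(index, "globex")))       # [1, 2]
print(sorted(search(index, "initech ceo")))  # [3]
```

The cost of a lookup depends on the number of matching records rather than the total number of records, which is why an index like this scales to millions of rows where a full-text `LIKE` scan does not.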

There is a bewildering array of technologies: key-value stores, document stores, wide-column stores, graph databases, and several indexing engines based on Apache Lucene.
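To make these categories a little more concrete, here is one hypothetical company record shaped the way each family of store tends to model data. These are plain Python structures that illustrate the data models only; they are not the API of any particular product, and the company and person names are invented.

```python
import json

# One hypothetical company record, shaped for each family of NoSQL store.

# Key-value store: an opaque value that is looked up only by its key.
key_value = {
    "company:1001": json.dumps({"name": "Acme Robotics", "founded": 2009}),
}

# Document store: the value is a structured document whose fields can be queried.
document = {
    "_id": "1001",
    "name": "Acme Robotics",
    "founded": 2009,
    "executives": [{"name": "Jane Doe", "title": "CEO"}],
}

# Wide-column store: row key -> column family -> column -> value.
wide_column = {
    "1001": {
        "profile": {"name": "Acme Robotics", "founded": "2009"},
        "deals": {"2014-03": "acquired by Globex"},
    },
}

# Graph database: entities are nodes, relationships are edges between them.
nodes = {
    "1001": {"type": "Company", "name": "Acme Robotics"},
    "p7": {"type": "Person", "name": "Jane Doe"},
}
edges = [("p7", "CEO_OF", "1001")]

# Reading a nested field straight out of the document shape.
ceos = [e["name"] for e in document["executives"] if e["title"] == "CEO"]
print(ceos)  # ['Jane Doe']
```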

At first we looked at graph databases, since our data tracked companies involved in mergers and acquisitions, where companies folded into other companies and then spun off again. In addition, people were often serial entrepreneurs, or hired CEOs who would prepare companies for sale. A social graph of this world seemed appropriate. However, while graph databases are highly tuned for finding relationships many levels deep, it was not obvious how they could tackle text searching and other types of searches.
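The "many levels of relationships" query that graph databases excel at is essentially a bounded traversal: start from a person or company and find everything connected within a few hops. Here is a small Python sketch of that query as a breadth-first search over an adjacency list; it stands in for what a graph database would do natively, and the names and relationships are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: every node reachable from start in <= max_hops edges."""
    seen = {start: 0}  # node -> hop distance from start
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                frontier.append(neighbor)
    del seen[start]
    return seen


# Hypothetical relationship graph (undirected, stored as adjacency lists).
graph = {
    "Jane Doe": ["Acme Robotics", "Initech"],      # CEO at both companies
    "Acme Robotics": ["Jane Doe", "Globex"],       # acquired by Globex
    "Globex": ["Acme Robotics", "Umbrella Labs"],  # later spun off Umbrella Labs
    "Initech": ["Jane Doe"],
    "Umbrella Labs": ["Globex"],
}
print(within_hops(graph, "Jane Doe", 2))
# {'Acme Robotics': 1, 'Initech': 1, 'Globex': 2}
```

Notice that nothing in this traversal helps with a query like "companies whose description mentions robotics"; that is the text-search gap described above.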

Next, we had a consultant build a small application in NodeJS using MongoDB in the cloud as a data store. MongoDB has the best reputation as a document store, and I knew a few others who were using it on their projects. However, we wanted to test a document store ourselves, and at that point our aging Windows infrastructure could only support Couchbase. So we figured we could at least get a proof of concept up and running in our current environment. In addition, Couchbase had a plugin to connect it to Elasticsearch, a really powerful text and data indexing engine based on Lucene. And so we started...