Friday, June 12, 2015

Ventures into NoSQL

Last November, my team was sitting around our conference table trying to figure out how to take our search function to the next level. We wanted to build a mini-Google to effectively search and mine our data in any way possible. We had millions of records of data covering over 100,000 companies in our sectors.

At that point, we had already started migrating to a service-oriented architecture, so what we wanted was the supreme search service. All our data was stored in an MS SQL Server database, which had served us well for 7 years but was starting to slow down noticeably. So we were ready to migrate to at least a newer version of SQL Server.

Back in our meeting, one of our guys asked, "How does Google do it?" I said that they have a massive index stored on a grid of servers distributed all over the world. At that point, we realized we had to build a massive index of our own data, on a much smaller scale, and SQL did not seem to be the appropriate tool. So began our dive into the world of NoSQL.
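
To make the idea of an index a little more concrete: at its core, a search index is usually an inverted index, a map from each word to the records that contain it. Below is a toy sketch in Python under heavy assumptions. The sample records and the function names `build_index` and `search` are made up for illustration; this is not Google's design, nor the system we ended up building.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercased word to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the matches for the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample records, not real data.
records = {
    1: "Acme Software acquired by Globex",
    2: "Globex spins off Initech",
    3: "Acme Software hires new CEO",
}
index = build_index(records)
matches = search(index, "acme software")
```

Looking up a word is a single dictionary access no matter how many records exist, which is what makes this structure attractive for search. A real engine adds tokenization, ranking, and distribution across servers on top of it.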

The NoSQL world is a bewildering array of technologies: key-value stores, document stores, wide-column stores, and graph databases, as well as several indexing engines based on Apache Lucene.

At first we looked at graph databases, since our data tracked companies involved in mergers and acquisitions, where companies folded into other companies and then spun off again. In addition, the people involved were often serial entrepreneurs or hired CEOs who would prepare companies for sale. A social graph of this world seemed appropriate. However, graph databases seemed to be highly tuned for finding many levels of relationships, and it was not so obvious how they could tackle text searching and other types of searches.

Next, we had a consultant build a small application in Node.js using MongoDB in the cloud as a data store. MongoDB has the best reputation as a document store, and I knew a few others who were using it in their projects. However, we wanted to test things out ourselves, and our aging Windows infrastructure at that point could only support Couchbase. So we figured that we could at least get a proof of concept up and running in our current environment. In addition, Couchbase had a plugin to connect it to Elasticsearch, which is a really powerful text and data indexing engine based on Lucene. And so we started...
