Showing posts with label Parent Child relationship. Show all posts

Thursday, January 28, 2016

NoSQL Confession: I miss “joins”

I have been working with ElasticSearch and Couchbase this past year and a half, and I have a confession to make. Yes, it has been amazing to see how to build a flexible, scalable and fast database that integrates sophisticated text searches with more traditional value-matching queries. At Berkery Noyes, we have built, on NoSQL technology, an amazing tool to sift through millions of records of business intelligence, including web visits, landing pages, emails, phone calls, merger and acquisition activity and even changes to company personnel. We use this tool to focus our efforts, and preliminary usage shows that the search tool can identify solid leads. But…


To do this, we have needed to de-normalize a large amount of data into our data documents. The NoSQL “join” features in both ElasticSearch and Couchbase are still too primitive to use effectively. I keep hitting problems like the inability to sort on “joined” documents, or severe latencies in updates to indices. On top of that, there is the headache of ensuring that de-normalized data is up to date. Oh for the old days of Primary Keys and Foreign Keys. However, for us, there is no turning back: the new power that our search app has is so useful that we deal with it. I guess I miss the old days of 2013, when we could have all our data tied up in a neat algebraic package on a SQL database. We live in a world that can be a little messy, so perhaps our data should reflect that.
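To make the de-normalization headache concrete, here is a minimal sketch in plain Python. It does not use the ElasticSearch or Couchbase APIs; the record layout, field names and the `rename_company` helper are all made up for illustration. It only shows why one change to a "parent" record fans out into many document updates once the data has been copied.

```python
# Hypothetical normalized source of truth: company id -> company record.
companies = {42: {"name": "Acme Corp", "industry": "Software"}}

# Hypothetical de-normalized search documents: each event carries its own
# copy of the company name so it can be searched and sorted without a join.
events = [
    {"id": 1, "type": "web_visit", "company_id": 42, "company_name": "Acme Corp"},
    {"id": 2, "type": "phone_call", "company_id": 42, "company_name": "Acme Corp"},
]


def rename_company(company_id, new_name):
    """Update the source record, then refresh every document holding a copy.

    Returns the number of de-normalized documents that had to be rewritten.
    """
    companies[company_id]["name"] = new_name
    updated = 0
    for doc in events:
        if doc["company_id"] == company_id:
            doc["company_name"] = new_name
            updated += 1
    return updated


print(rename_company(42, "Acme Holdings"))  # 2
```

With a foreign key, the rename would touch one row. Here it touches every document that embeds the name, and any document missed by the refresh is silently stale.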

Tuesday, December 20, 2011

Classes versus Concepts Taxonomies

As I post these entries on LinkedIn as well as other social media, sometimes a discussion will take off on one of those sites and not be shown here on this blog. One of the most interesting responses to the "True relationship between parent and child" post was on LinkedIn and covered the nature of the nodes in one's taxonomy. Does each node represent a class or a concept? Denise B. of Kent State University raised this thought, and said that often one needs two taxonomies: one where the nodes are classes and another where the nodes are concepts. The nature of your taxonomy will then define the nature of the parent-child relationship. In a class-based taxonomy, the child is a subclass of the parent: the child has all the kinds of attributes of the parent, plus some special attributes specific to the child class. In a concept-based taxonomy, each node embodies a concept, and hence the relationship to a child node is not so strict. Thanks to Denise for the response.
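For readers who think in code, the distinction can be sketched roughly as follows. This is only an illustration of the two ideas, not something from the LinkedIn discussion; the class names, attributes and the `concept_links` mapping are invented for the example.

```python
from dataclasses import dataclass


# Class-based taxonomy: the child is a strict subclass of the parent.
@dataclass
class Software:
    vendor: str  # attribute every kind of software has


@dataclass
class DatabaseSystem(Software):
    query_language: str  # extra attribute specific to the child class


# Concept-based taxonomy: nodes are just concepts with looser links,
# so a "child" need not share the parent's attributes.
concept_links = {
    "Software": ["Database Systems", "Open Source", "Licensing"],
}

db = DatabaseSystem(vendor="Oracle", query_language="SQL")
print(isinstance(db, Software))  # True
```

In the class-based version, anything true of `Software` must also be true of `DatabaseSystem`. In the concept-based version, "Licensing" sits under "Software" without being a kind of software at all.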

Friday, December 16, 2011

True relationship between parent and child

I recently had to make a change to our taxonomy search. Our previous search would match a selected node and all its children. However, we found a need to search and match just a node, and not any of its child nodes. I have seen this feature in many other systems, but we never found the need to implement it in our system until recently. This highlights a deeper question: what do we expect the relationship between parent and child nodes to be? When we first developed our taxonomy, we had long, heated discussions on this issue for our industry tree. Let's take the example of a well-known software company, Oracle. Under our software category we have operating systems, business applications, desktop applications, database systems, email systems, graphics applications, etc. Oracle is a big software company that makes all kinds of software; it does not create every type of software under our software category, but it does make most of them. One of our team members suggested that if a company is categorized as something, then it must do all the subcategories. His suggestion was to categorize a company multiple times, once for each type of product it actually makes. However, it starts to get ugly when a company like Oracle operates in most of the subcategories but not all. Our general consensus was to categorize a company in the parent category if it covers a good portion of the subcategories, but not all. If the company covers only a few of the subcategories, then we will have multiple categorizations. The real key is consistency. We take the approach (and near cliché) that the parent is something different than the sum of its children.
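The two search modes can be sketched in a few lines of Python. This is not our production code; the tiny tree, the company tags (including the made-up "ExampleDB Inc.") and the `search` function with its `include_children` flag are assumptions chosen to illustrate the idea.

```python
# Hypothetical taxonomy: parent node -> list of child nodes.
CHILDREN = {
    "Software": ["Operating Systems", "Business Applications", "Database Systems"],
    "Operating Systems": [],
    "Business Applications": [],
    "Database Systems": [],
}

# Hypothetical categorizations: company -> set of nodes it is tagged with.
COMPANIES = {
    "Oracle": {"Software"},                  # covers most subcategories
    "ExampleDB Inc.": {"Database Systems"},  # covers a single subcategory
}


def subtree(node):
    """Return the node plus all of its descendants."""
    found = {node}
    stack = [node]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(node, include_children=True):
    """Match companies tagged with the node, optionally with its descendants."""
    targets = subtree(node) if include_children else {node}
    return sorted(name for name, tags in COMPANIES.items() if tags & targets)


print(search("Software"))                          # ['ExampleDB Inc.', 'Oracle']
print(search("Software", include_children=False))  # ['Oracle']
```

The node-only search is what makes the parent "something different than the sum of its children": it returns only the companies we deliberately placed at the parent level.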