Showing posts with label Business Space. Show all posts

Thursday, January 28, 2016

NoSQL Confession: I miss “joins”

I have been working with ElasticSearch and Couchbase for the past year and a half, and I have a confession to make. Yes, it has been amazing to learn how to build a flexible, scalable, and fast database that integrates sophisticated text searches with more traditional value-matching queries. At Berkery Noyes, we have used NoSQL technology to build an amazing tool that sifts through millions of records of business intelligence, including web visits, landing pages, emails, phone calls, merger and acquisition activity, and even changes in company personnel. We use this tool to focus our efforts, and preliminary usage shows that the search tool can identify solid leads. But…


To do this, we have needed to denormalize a large amount of data into our data documents. The NoSQL “join” features in both ElasticSearch and Couchbase are still too primitive to use effectively. I keep hitting problems like the inability to sort on “joined” documents, or severe latencies in updates to indices. On top of that, there is the headache of ensuring that denormalized data stays up to date. Oh, for the old days of primary keys and foreign keys. However, for us there is no turning back: the new power our search app has is so useful that we deal with it. I guess I miss the old days of 2013, when we could have all our data tied up in a neat algebraic package in a SQL database. We live in a world that can be a little messy, so perhaps our data should reflect that.
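The trade-off above can be illustrated with a toy in-memory store. This is a minimal sketch, not the ElasticSearch or Couchbase API: the `DenormalizedStore` class, its methods, and the industry/company fields are all hypothetical. It shows why copying a related value into each document makes sorting on it trivial, while every change to the source record becomes a fan-out rewrite.

```python
from typing import Dict, List


class DenormalizedStore:
    """Hypothetical toy stand-in for a document store (not a real NoSQL client API)."""

    def __init__(self) -> None:
        self.industries: Dict[str, str] = {}       # industry_id -> name (source of truth)
        self.docs: Dict[str, Dict[str, str]] = {}  # company_id -> search document

    def add_industry(self, industry_id: str, name: str) -> None:
        self.industries[industry_id] = name

    def add_company(self, company_id: str, name: str, industry_id: str) -> None:
        # Copy the industry name into the document at write time, so queries
        # can filter and sort on it without a join at read time.
        self.docs[company_id] = {
            "name": name,
            "industry_id": industry_id,
            "industry_name": self.industries[industry_id],
        }

    def rename_industry(self, industry_id: str, new_name: str) -> List[str]:
        # The price of denormalizing: every document holding a copy must be
        # found and rewritten, or searches will return stale values.
        self.industries[industry_id] = new_name
        touched = []
        for company_id, doc in self.docs.items():
            if doc["industry_id"] == industry_id:
                doc["industry_name"] = new_name
                touched.append(company_id)
        return touched

    def sorted_by_industry(self) -> List[str]:
        # Sorting on the "joined" value is a plain local sort (ties broken by id).
        ordered = sorted(self.docs.items(), key=lambda kv: (kv[1]["industry_name"], kv[0]))
        return [company_id for company_id, _ in ordered]
```

In a real system, the loop in `rename_industry` corresponds to re-indexing every affected document, which is where the update latency and staleness headaches come from.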

Thursday, March 15, 2012

Run, don't walk - Simplicity and Speed

One of the unique features of our taxonomy is that we like to show how near items classified with it are to one another. Since we are classifying companies, this allows us to define a "sphere of competition". The first version of our software for determining "nearness" used a walking algorithm that walked up and down the trees to compute distances. Searches for "nearby companies" took a long time, up to a minute. Recently we changed the algorithm to use some algebra in place of the walking, and now searches take less than a second! The algebraic algorithm was both simpler and faster. Check it out on mandasoft.com; use the "see comparable deals" link.
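The post does not spell out the algebra, so the following is only a sketch of one common technique (materialized paths) with hypothetical names, set beside a walking version for contrast. The walking version follows parent links for every comparison; the path version stores each node's root-to-node path once, after which the distance is the two depths minus twice the depth of their shared prefix.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]  # root-to-node labels, e.g. ("Media", "Publishing", "Magazines")


def walk_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Walking approach: climb parent links until the two nodes meet (assumes a single root)."""
    up: Dict[str, int] = {}
    node: Optional[str] = a
    steps = 0
    while node is not None:  # record every ancestor of a with its distance from a
        up[node] = steps
        node = parent[node]
        steps += 1
    node, steps = b, 0
    while node not in up:  # climb from b until reaching one of a's ancestors
        node = parent[node]
        steps += 1
    return steps + up[node]


def path_distance(a: Path, b: Path) -> int:
    """Algebraic approach: with root paths precomputed, distance is arithmetic on lengths."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # hops from a up to the common ancestor, plus hops down to b
    return len(a) + len(b) - 2 * shared
```

Both functions return the same hop counts; the difference is that the path version needs no tree traversal at query time.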

Wednesday, December 14, 2011

Banyan provides an interesting view on our data.

The folks at Ongig.com, a new video-enhanced job site, asked for some of our data to determine the top 25 most efficient tech companies based on profit per employee. The winner is SanDisk, followed by Google and Apple.

http://ongig.com/blog/wall-street/profit-per-employee-tech#more-1399

We were able to provide this easily because our chief taxonomist had created a top-level node in our industry tree called Information Technology, which has child nodes for Computer Manufacturers, Software Companies, Electronic Information, and IT Consultants. All these nodes have other parents elsewhere in the tree, but the Banyan structure helps pull it all together.

I would also like to share a comment by Leo Meerman from LinkedIn. He pointed out that what I call a "Banyan Tree" is better known as a "Polyhierarchy".
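A polyhierarchy can be represented as a map from each node to its list of parents. The sketch below gathers every company filed anywhere under a top-level node such as Information Technology, even though each child node also sits under another branch. The second parent of each node and the company names are hypothetical, since the post does not say what the other parents are.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def build_children(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a node -> parents map; a node may appear under several parents."""
    children: Dict[str, List[str]] = defaultdict(list)
    for node, node_parents in parents.items():
        for p in node_parents:
            children[p].append(node)
    return children


def descendants(root: str, children: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search; `seen` stops nodes reachable by several paths from repeating."""
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def companies_under(root: str, parents: Dict[str, List[str]], classified: Dict[str, str]) -> List[str]:
    """Companies classified at `root` or at any node beneath it."""
    nodes = descendants(root, build_children(parents)) | {root}
    return sorted(company for company, node in classified.items() if node in nodes)


# Hypothetical example: each IT child also hangs under a second parent.
PARENTS = {
    "Computer Manufacturers": ["Information Technology", "Manufacturing"],
    "Software Companies": ["Information Technology", "Publishing"],
    "Electronic Information": ["Information Technology", "Media"],
    "IT Consultants": ["Information Technology", "Consulting"],
}
```

The same company shows up whether you start from Information Technology or from its other parent, which is the point of the Banyan structure.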

Monday, December 12, 2011

Challenges of Searching a Complex Taxonomy

As I have noted, our team has developed a very complex multi-faceted taxonomy that gives a very precise categorization of any given business. The complexity we have created comes at a price: it creates challenges for users searching for businesses. A company is categorized by who its clientele is, what business needs it fulfills, how it fulfills those needs, and what channels it uses. A typical user does not think of businesses in such a manner. For instance, a typical search would be "I would like all businesses that sell education software." In early versions of our application, users had to be savvy enough to enter education as the clientele criterion and software as the industry (the "how a business operates"). Needless to say, users had to be trained to use the system, and hence it was used only by our in-house taxonomist, who would receive search requests and return the results in a handy report. This was not ideal, because our taxonomist has many other things to do. The challenge for us was to make our taxonomy search as easy as Google. Nobody has to be trained to use Google, and we found that whenever training was required, that part of our application generally went unused unless the payback was great. In addition, since we were already fielding queries, our users found it much easier to send an email to our team than to run the search on their own.
So how did we simplify our search? Well, in an earlier post, I mentioned that we found there were phrases users wanted in our taxonomy that would not fit in a single tree or dimension of our taxonomy. I called these phrases "Meta-Terms". The example I gave was Trade Magazines, which are B2B magazines. In our taxonomy, we map this term to magazines in our Industry tree and to B2B in our Clientele tree.
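As a minimal sketch of the idea, a meta-term can be stored as a phrase mapped to one node per tree, so that a search becomes a filter on every mapped dimension. The `META_TERMS` table and the flat company records are hypothetical, and the sketch matches nodes exactly rather than also matching their descendants, as a real tree search would.

```python
from typing import Dict, List

# Hypothetical meta-term table: phrase -> {tree name: node in that tree}.
META_TERMS: Dict[str, Dict[str, str]] = {
    "trade magazines": {"industry": "Magazines", "clientele": "B2B"},
}


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so that 'Trade  Magazines' still matches."""
    return " ".join(phrase.lower().split())


def search(phrase: str, companies: Dict[str, Dict[str, str]]) -> List[str]:
    """Companies whose classification matches every tree the meta-term maps to."""
    facets = META_TERMS.get(normalize(phrase))
    if facets is None:
        return []
    return sorted(
        name
        for name, tags in companies.items()
        if all(tags.get(tree) == node for tree, node in facets.items())
    )
```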
So the key to simplifying our search was to use "Meta-Terms" as the model for a Google-like search. Users can now type into a text box, and the system checks whether the input matches an existing "Meta-Term" or what I call an implicit "Meta-Term". Implicit "Meta-Terms" are created by synthesizing synonyms from the vocabularies of our different trees. There are over a million possible combinations across the trees, but some of them are unlikely to exist (e.g., e-discovery software for teenagers). So we have created the list of synthesized terms from the companies we have already categorized, which number around 23,000. From these categorized companies and all the synonyms, we get a list of about 200,000 implicit meta-terms. Our "Google" search box then matches the user input against this list of terms and runs a query based on how the matched meta-term maps to our multiple dimensions. This is still in testing, but it shows tremendous promise.
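The synthesis step might look roughly like this. It is a sketch under stated assumptions: only two of the trees are used, the phrase template "<clientele word> <industry word>" is a guess, and the synonym lists are hypothetical. The key point it illustrates is that only tree-node combinations that actually occur among categorized companies are expanded, which prunes unlikely pairs from the full cross product.

```python
from itertools import product
from typing import Dict, List, Tuple

Synonyms = Dict[str, List[str]]  # tree node -> words a user might type for it


def implicit_meta_terms(
    companies: Dict[str, Dict[str, str]],
    clientele_synonyms: Synonyms,
    industry_synonyms: Synonyms,
) -> Dict[str, Tuple[str, str]]:
    """Map synthesized phrases to (clientele node, industry node) pairs.

    Only pairs observed among already categorized companies are expanded.
    """
    observed = {(tags["clientele"], tags["industry"]) for tags in companies.values()}
    terms: Dict[str, Tuple[str, str]] = {}
    for clientele, industry in sorted(observed):  # sorted for deterministic output
        for c_word, i_word in product(clientele_synonyms[clientele], industry_synonyms[industry]):
            terms[f"{c_word} {i_word}"] = (clientele, industry)
    return terms
```

Matching the search box input is then a dictionary lookup on the normalized phrase, and the returned node pair drives the multi-dimensional query.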

Monday, December 5, 2011

Spheres of Competition

Last week I spoke of looking at a business taxonomy in a new way. Generally, people think of taxonomies as a vocabulary with perhaps a hierarchical structure of categories and sub-categories. However, when you build a multi-dimensional taxonomy as our team has, you can start to think of it as a spatial topology. There are four trees, and each one defines a dimension in our business taxonomy space. This is analogous to the special theory of relativity in physics, where you have the x, y, and z dimensions plus the time dimension. An "event" is a point in the space-time continuum defined by those four dimensions. In our business taxonomy space, a "company" is a point in the spatial topology defined by our four dimensions. If you draw a small sphere around a given company's point in our taxonomy, you will get all the competitors of that company. We have seen that as you widen the sphere, the outlying companies are less likely to be competitors. The key to making this work is to define the distances between points in a given dimension's tree. We generally recognize that the distance between parent and child gets shorter the deeper you go into the tree, and that the distance between siblings is slightly more than that between parent and child. We also recognize that some siblings may be closer in meaning than others. Our distance algorithm has to take all these things into consideration. Our work has been experimental, but it has returned interesting results. We have used this in our drill-down feature on mandasoft.com. The space still has to be tweaked, and I may leverage algorithms similar to Einstein's general relativity, where actual company revenue data at a point in our topology could warp the spatial distances, just as physical mass warps physical space. Any thoughts?
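The post describes these distance rules only qualitatively. Here is a minimal sketch of one way to encode them, in which every constant and structure is an assumption rather than the production algorithm: each node is stored as its root-to-node path, the parent-child distance halves at each level (`DECAY`), a sibling hop costs 1.2 times the parent-child distance at that level (`SIBLING_FACTOR`), an optional per-pair override stands in for siblings that are closer in meaning, and the dimensions are combined as a Euclidean distance so that drawing a sphere becomes a radius filter.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

Path = Tuple[str, ...]     # root-to-node labels within one dimension's tree
Company = Dict[str, Path]  # dimension name -> position in that dimension's tree

DECAY = 0.5           # assumed: each level down halves the parent-child distance
SIBLING_FACTOR = 1.2  # assumed: siblings sit slightly farther apart than parent and child


def edge_weight(depth: int) -> float:
    """Distance from a node at `depth` (children of the root are depth 1) to its parent."""
    return DECAY ** (depth - 1)


def tree_distance(a: Path, b: Path,
                  sibling_factors: Optional[Dict[FrozenSet[str], float]] = None) -> float:
    """Distance between two nodes of one tree under the assumed rules."""
    shared = 0  # depth of the lowest common ancestor
    while shared < min(len(a), len(b)) and a[shared] == b[shared]:
        shared += 1
    leg_a = sum(edge_weight(d) for d in range(shared + 1, len(a) + 1))
    leg_b = sum(edge_weight(d) for d in range(shared + 1, len(b) + 1))
    if shared == len(a) or shared == len(b):
        return leg_a + leg_b  # ancestor and descendant (or the same node)
    # Replace the up-and-over turn at the common ancestor with one sibling hop,
    # optionally tuned (hypothetical overrides) for siblings closer in meaning.
    factor = SIBLING_FACTOR
    if sibling_factors:
        factor = sibling_factors.get(frozenset((a[shared], b[shared])), SIBLING_FACTOR)
    turn = edge_weight(shared + 1)
    return (leg_a - turn) + (leg_b - turn) + factor * turn


def company_distance(x: Company, y: Company) -> float:
    """Combine the per-dimension distances as a Euclidean norm."""
    return math.sqrt(sum(tree_distance(x[dim], y[dim]) ** 2 for dim in x))


def sphere(target: str, companies: Dict[str, Company], radius: float) -> List[Tuple[str, float]]:
    """Companies within `radius` of `target`, nearest first."""
    origin = companies[target]
    hits = [(name, company_distance(origin, pos))
            for name, pos in companies.items() if name != target]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])
```

For example, two siblings at depth 2 are 0.6 apart, slightly more than the 0.5 between either of them and their parent, and one level deeper both figures halve. The revenue-based warping floated at the end of the post is not modeled.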