Monday, December 12, 2011

Challenges of Searching a Complex Taxonomy

As I have noted, our team has developed a very complex, multi-faceted taxonomy that gives a very exact categorization of any given business. That complexity comes at a price: it makes searching for businesses harder for users. A company is categorized by who its clientele is, what business needs it fulfills and how it fulfills them, and what channels it uses. A typical user does not think of businesses this way. For instance, a typical search would be "I would like all businesses that sell education software." In early versions of our application, users had to be savvy enough to enter education as the clientele criterion and software as the industry criterion (the "how a business operates" dimension).

Needless to say, users had to be trained to use the system, so in practice it was used only by our in-house taxonomist, who would receive search requests and return the results in a handy report. This was not ideal, because our taxonomist has many other things to do. In addition, since we were already fielding queries, our users found it much easier to email our team than to run the search on their own. The challenge for us was to make our taxonomy search as easy as Google. Nobody has to be trained to use Google, and we found that any part of our application that required training generally went unused unless the payback was great.

So how did we simplify our search? In an earlier post, I mentioned that there were phrases users wanted in our taxonomy that would not fit in a single tree or dimension. I called these phrases "Meta-Terms". The example I gave was Trade Magazines, which are B2B magazines. In our taxonomy, we map this term to magazines in our Industry tree and to B2B in our Clientele tree.
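The Meta-Term idea can be sketched as a simple lookup: one phrase maps to a term in each of several trees, and a company matches only if it is categorized under all of them. This is a minimal Python sketch, not our actual implementation; the Company class, the META_TERMS table, search_meta_term, and the two sample companies are hypothetical, and it assumes each company stores a set of terms per dimension.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model: each company holds a set of terms
# for each dimension (tree) of the taxonomy.
@dataclass
class Company:
    name: str
    categories: dict[str, set[str]] = field(default_factory=dict)

# An explicit Meta-Term maps one user-facing phrase to terms in several trees.
# "Trade Magazines" -> magazines (Industry) + B2B (Clientele).
META_TERMS: dict[str, dict[str, str]] = {
    "trade magazines": {"industry": "magazines", "clientele": "b2b"},
}

def search_meta_term(phrase: str, companies: list[Company]) -> list[str]:
    """Return names of companies categorized under every term the Meta-Term maps to."""
    mapping = META_TERMS.get(phrase.strip().lower())
    if mapping is None:
        return []
    return [
        c.name
        for c in companies
        if all(term in c.categories.get(dim, set()) for dim, term in mapping.items())
    ]

# Toy data for illustration only.
companies = [
    Company("Acme Trade Press", {"industry": {"magazines"}, "clientele": {"b2b"}}),
    Company("Teen Reads Monthly", {"industry": {"magazines"}, "clientele": {"consumers"}}),
]
print(search_meta_term("Trade Magazines", companies))
```

Keying the table by a normalized phrase keeps each lookup to a single dictionary access; what this sketch does not handle is synonyms, which is exactly the gap implicit Meta-Terms fill.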
So the key to simplifying our search was to use "Meta-Terms" as the model for a Google-like search. Users can now type into a text box, and the system checks whether the input matches an existing "Meta-Term" or what I call an implicit "Meta-Term". Implicit "Meta-Terms" are created by synthesizing synonyms from the vocabularies of our different trees. There are over a million possible combinations across the trees, but some of them are unlikely to exist (e-discovery software for teenagers, for example). So we built the list of synthesized terms only from the companies we have already categorized (about 23,000). From these categorized companies and all the synonyms, we get a list of about 200,000 implicit "Meta-Terms". Our "Google" search box then matches the user input against this list and runs a query based on how the matched "Meta-Term" maps to our multiple dimensions. This is still in testing, but it shows tremendous promise.
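The synthesis step can be sketched roughly as follows. This assumes only two trees (Clientele and Industry), a hypothetical synonym table, and two assumed phrasings ("education software" and "software for education"); the names SYNONYMS, build_implicit_meta_terms, and match, and the sample data, are illustrative. Phrases are generated only from (clientele, industry) pairs that already occur on categorized companies, which is what keeps the list far smaller than the full cross product of the trees.

```python
from itertools import product
from typing import Optional

# Hypothetical synonym table: (tree, term) -> user-facing synonyms.
SYNONYMS: dict[tuple[str, str], list[str]] = {
    ("clientele", "education"): ["education", "schools"],
    ("industry", "software"): ["software", "apps"],
}

def synonyms(tree: str, term: str) -> list[str]:
    # Fall back to the term itself when no synonyms are listed.
    return SYNONYMS.get((tree, term), [term])

def build_implicit_meta_terms(categorized: list[tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Synthesize phrases only from (clientele, industry) pairs seen on real companies."""
    index: dict[str, dict[str, str]] = {}
    # Deduplicate and sort the observed pairs so the output is deterministic.
    for clientele, industry in sorted(set(categorized)):
        mapping = {"clientele": clientele, "industry": industry}
        for c_syn, i_syn in product(synonyms("clientele", clientele),
                                    synonyms("industry", industry)):
            # Two assumed phrasings; if two pairs yield the same phrase,
            # the pair processed later overwrites the earlier one.
            index[f"{c_syn} {i_syn}"] = mapping
            index[f"{i_syn} for {c_syn}"] = mapping
    return index

def match(user_input: str, index: dict[str, dict[str, str]]) -> Optional[dict[str, str]]:
    # Normalize case and whitespace, then do an exact-match lookup.
    return index.get(" ".join(user_input.lower().split()))

# Toy (clientele, industry) pairs taken from "categorized companies".
observed = [("education", "software"), ("b2b", "magazines"), ("education", "software")]
index = build_implicit_meta_terms(observed)
print(match("Education   Software", index))
```

The mapping returned by match is what would then be turned into a per-dimension query. An unlikely combination such as "e-discovery software for teenagers" never enters the index because no categorized company carries that pair.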
