Friday, December 23, 2011

Winter Solstice Taxonomy

A friend of mine has his own business, and he recently had an interesting problem. He sent his client list a generic holiday email which did not mention Christmas or any other specific holiday. One of the recipients replied very negatively, upset that Christmas is never mentioned by name. My friend happens to have been raised by parents who come from two different religious traditions, so, not surprisingly, he wants to wish everyone well. So how can his dilemma be solved by taxonomy? As I mentioned earlier, our team uses multiple dimensions to classify businesses. He could create a taxonomy of holidays. For instance, you could have a category for the midwinter holiday, and you could code each instance to a given religious audience: Christian, Jewish, Muslim, Hindu, Pagan, Agnostic, Atheist, etc., which would give us imagery and words for Christmas, Hanukkah, Eid, Diwali, Saturnalia, Winter Solstice, etc. You could then continue this for the major fall, spring and summer holidays. My son's school, interestingly enough, only celebrates the equinoxes and solstices.

In all seriousness, I wish you all a Happy New Year!

Wednesday, December 21, 2011

Challenges of Classifying a Business

One of the biggest challenges our team has in building a taxonomy of businesses is the actual classifying of a business. The process at best is semi-automated. Our classifiers generally have to look at the website of a given business and try to determine what it does. The website, generally speaking, is not written to describe how a business operates, but rather to sell the business's products and services. Each website has its own content, and we used to concentrate on the "About Us" page where the business defines itself. Unfortunately, the "About Us" page is usually some sort of vague mission statement. We even looked at our own website, and found the wording written by our Marketing guy was so general that you would not know what we did! So our classifiers have learned to scan many pages of a business's site to learn what the business does, how it does it and who it sells to. We have built some scrapers, but our results have been mediocre. We are currently looking at ways to scrape company websites and intelligently gather data to further automate the classification. The big problem we saw was that matching words on the website against words in our thesaurus gave us too many false positives. We are now looking at weighting the words in our thesaurus by how well they matched in the past against verified classifications. Has anyone explored these types of auto-classification?
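
The weighting idea can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: the category names, the sample history and the function names are hypothetical, and the smoothing rule is just one simple choice. Each thesaurus word is weighted by the share of past verified classifications in which a match on that word turned out to be correct, and a site is scored by summing those weights instead of counting raw matches.

```python
from collections import defaultdict

def learn_weights(history):
    """history: (word, category, was_correct) triples from verified classifications."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for word, category, was_correct in history:
        total[(word, category)] += 1
        if was_correct:
            hits[(word, category)] += 1
    # Add-one smoothing, so a word seen only once is neither fully trusted nor dismissed.
    return {key: (hits[key] + 1) / (total[key] + 2) for key in total}

def score_site(words, weights):
    """Sum the learned weights of every thesaurus word found on the site, per category."""
    present = set(words)
    scores = defaultdict(float)
    for (word, category), weight in weights.items():
        if word in present:
            scores[category] += weight
    return dict(scores)

# Hypothetical history: "solutions" matched Software three times and was wrong each time.
history = [
    ("software", "Software", True),
    ("software", "Software", True),
    ("software", "Software", False),
    ("solutions", "Software", False),
    ("solutions", "Software", False),
    ("solutions", "Software", False),
    ("clinic", "Healthcare", True),
]
weights = learn_weights(history)
scores = score_site(["our", "solutions", "for", "your", "clinic"], weights)
```

With this history, a vague marketing word like "solutions" contributes far less to the Software score than "software" does, which is the kind of false-positive damping described above.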

Tuesday, December 20, 2011

Classes versus Concepts Taxonomies

As I post these entries on LinkedIn as well as other social media, sometimes a discussion will take off on one of those sites and is not shown here on this blog. One of the most interesting responses to the "True relationship between parent and child" post was on LinkedIn, and covered the nature of nodes in one's taxonomy. Does each node represent a class or a concept? Denise B. of Kent State University raised this thought, and said that often one needs two taxonomies: one where the nodes are classes and another where the nodes are concepts. The nature of your taxonomy will then define the nature of the parent-child relationship. In a class-based taxonomy, the child is a subclass of the parent: the child has all the attributes of the parent, plus some special attributes specific to the child class. In a concept-based taxonomy, each node embodies a concept, and hence the relationship to a child node is not so strict. Thanks to Denise for the response.

Friday, December 16, 2011

True relationship between parent and child

I recently had to make a change to our taxonomy search. Our previous search would match a selected node and all its children. However, we found a need to search and match just a node and not any of its child nodes. I have seen this feature in many other systems, but we never found the need to implement it in our system till recently. This highlights a deeper question: what do we expect the relationship between parent and child nodes to be? When we first developed our taxonomy, we had long, heated discussions on this issue for our industry tree. Let's take the example of a well-known software company, Oracle. Under our software category we have operating systems, business applications, desktop applications, database systems, email systems, graphics applications, etc. Oracle is a big software company that makes most, but not all, of the types of software under our software category. One of our team members suggested that if a company is categorized under a node, then it must do all of that node's subcategories. His suggestion was to categorize a company multiple times, once for each type of product it actually makes. However, it starts to get ugly when a company like Oracle operates in most of the subcategories but not all. Our general consensus was to categorize a company in the parent category if it covers a good portion of the subcategories, even if not all. If the company covers only a few of the subcategories, then we give it multiple categorizations. The real key is consistency. We take the approach (a near cliche) that the parent is something different than the sum of its children.
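
The two search modes can be sketched in a few lines. This is only an illustration under assumed data: the tree fragment follows the software categories listed above, the "Example" companies are made up, and the function names are hypothetical.

```python
# Hypothetical fragment of an industry tree: parent -> list of child categories.
CHILDREN = {
    "Software": ["Operating Systems", "Database Systems", "Business Applications"],
    "Database Systems": ["Relational Databases"],
}

# Hypothetical categorizations; Oracle sits on the parent node, as discussed above.
COMPANIES = [
    ("Oracle", "Software"),
    ("ExampleDB Inc.", "Relational Databases"),
    ("ExampleOS Ltd.", "Operating Systems"),
    ("ExampleMedia LLC", "Magazines"),
]

def subtree(node):
    """Return the node together with all of its descendants."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(CHILDREN.get(current, []))
    return found

def search(node, include_children=True):
    """Match companies on the node alone, or on the node plus everything beneath it."""
    wanted = subtree(node) if include_children else {node}
    return [name for name, category in COMPANIES if category in wanted]
```

Here `search("Software", include_children=False)` returns only the companies categorized on the parent itself, while the default mode also pulls in companies categorized on any subcategory.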

Wednesday, December 14, 2011

Banyan provides an interesting view on our data.

The folks at Ongig.com, a new video-enhanced job site, asked for some of our data to see which are the top 25 most efficient tech companies based on profit per employee. The winner is SanDisk, followed by Google and Apple.

http://ongig.com/blog/wall-street/profit-per-employee-tech#more-1399

We were able to easily provide this because our chief taxonomist had created a top level node in our industry tree called Information Technology which has child nodes for Computer Manufacturers, Software Companies, Electronic Information, and IT Consultants. All these nodes have other parents elsewhere in the tree, but the Banyan helps pull it all together.

I would also like to add a comment by Leo Meerman from LinkedIn. He mentioned that what I call a "Banyan Tree" is better known as a "Polyhierarchy".

Tuesday, December 13, 2011

The Banyan Tree - a new hierarchy

Most taxonomies are set up in a hierarchical tree format. Our team's taxonomy consists of four distinct trees, one for each facet of how a business operates. Developing the taxonomy and software over the last 8 years, we found at some point that the tree structure became too strict. There were certain categories that did not want to be under just one parent category. A prime example of this situation is video game companies. These companies make software for entertainment purposes. As this industry has matured, it has become closely linked with the big entertainment companies, and these companies employ teams of artists and writers as well as programmers. Historically, these businesses would be classified as software, but they are so closely tied to entertainment companies that it seems odd for them not to turn up in our results when searching for entertainment companies. One solution is to move video game studios under the entertainment category, but then we lose the software aspect of the business. Our team's solution was to change the nature of our trees. We now allow nodes to have multiple parents, creating what I call the "Banyan" tree, named after a tree from India that has multiple trunks to the ground.
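
A tree whose nodes may have several parents can be sketched as below. This is a minimal illustration, not a description of any real implementation: the class name, method names and the cycle check are assumptions made for the example.

```python
from collections import defaultdict

class BanyanTree:
    """A hierarchy in which a node may have more than one parent."""

    def __init__(self):
        self.parents = defaultdict(set)
        self.children = defaultdict(set)

    def descendants(self, node):
        """Every node reachable by following child links from the given node."""
        seen = set()
        stack = [node]
        while stack:
            for child in self.children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def add_edge(self, parent, child):
        # Multiple parents are allowed, but a node must never become its own ancestor.
        if parent == child or parent in self.descendants(child):
            raise ValueError(f"{parent} -> {child} would create a cycle")
        self.parents[child].add(parent)
        self.children[parent].add(child)

tree = BanyanTree()
tree.add_edge("Software", "Video Games")
tree.add_edge("Entertainment", "Video Games")
```

With both edges in place, a search under either "Software" or "Entertainment" reaches "Video Games", so neither aspect of the business is lost.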


Looking at the Wikipedia article, we see that the banyan tree name comes from the Gujarati word for merchant, because merchant markets were often located under these great trees. It seems appropriate for a business taxonomy.

Monday, December 12, 2011

Challenges of Searching a Complex Taxonomy

As I have noted, our team has developed a very complex multi-faceted taxonomy which gives a very exact categorization of any given business. The complexity we have created comes with a price: it creates challenges for users searching for businesses. A company is categorized by who its clientele is, what business needs it fulfills, how it fills those needs, and what channels it uses. A typical user does not think of businesses in such a manner. For instance, a typical search would be "I would like all businesses that sell education software." In early versions of our application, users had to be savvy enough to enter education as the search criterion for clientele and software for industry (the "how a business operates"). Needless to say, users had to be trained to use the system, and hence the system was only used by our in-house taxonomist, who would get search requests and give back results in a handy report. This was not ideal, because our taxonomist has many other things to do. The challenge for us was to make our taxonomy search as easy as Google. Nobody has to be trained to use Google, and we found that if training was involved, that part of our application generally was not used unless the payback was great. In addition, since we were already fielding queries, our users found it much easier to send an email to our team than to run the search on their own. So how did we simplify our search? Well, in an earlier post, I mentioned we found that there were phrases that users wanted in our taxonomy that would not fit in a single tree or dimension of our taxonomy. I called these phrases "Meta-Terms". The example I gave was Trade Magazines, which are B2B magazines. In our taxonomy we map this term to magazines in our Industry tree and B2B in our Clientele tree.
So the key to simplifying our search was to use "Meta-Terms" as a model for Google-like search. Users can now type into a text box, and the system will see if the input matches an existing "Meta-Term" or what I call an implicit "Meta-Term". Implicit "Meta-Terms" are created by synthesizing synonyms from our different trees' vocabularies. There are over a million possible combinations from the trees, but some of the combinations are unlikely to exist (like e-discovery software for teenagers). So we have created a list of synthesized terms from the companies we have already categorized (around 23,000 of them). From these categorized companies and all the synonyms, we get a list of about 200,000 implicit meta-terms. Our "Google" search box then matches the user input to our list of terms and runs a query based on how that meta-term maps to our multiple dimensions. This is still in testing, but shows tremendous promise.
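
A rough sketch of how implicit meta-terms could be synthesized and searched follows. It is deliberately simplified and rests on assumptions: only two of the dimensions are used, the synonym lists and sample categorizations are invented, and matching is an exact lookup on the normalized phrase.

```python
from itertools import product

# Hypothetical synonym lists for a few nodes in the Clientele and Industry trees.
SYNONYMS = {
    "Education": ["education", "educational", "school"],
    "Software": ["software", "applications"],
    "B2B": ["b2b"],
    "Magazines": ["magazines"],
}

# Explicit meta-terms are maintained by hand.
EXPLICIT = {"trade magazines": {"clientele": "B2B", "industry": "Magazines"}}

# (clientele, industry) pairs taken from companies that are already categorized,
# so unlikely combinations never make it into the index.
CATEGORIZED = [("Education", "Software"), ("B2B", "Magazines")]

def build_index(categorized, synonyms, explicit):
    """Combine synonyms across trees, but only for combinations seen on real companies."""
    index = dict(explicit)
    for clientele, industry in sorted(set(categorized)):
        for c_word, i_word in product(synonyms[clientele], synonyms[industry]):
            index.setdefault(f"{c_word} {i_word}",
                             {"clientele": clientele, "industry": industry})
    return index

def lookup(text, index):
    """Normalize the user's input and return the matching mapping, or None."""
    return index.get(" ".join(text.lower().split()))

INDEX = build_index(CATEGORIZED, SYNONYMS, EXPLICIT)
```

A call such as `lookup("Education Software", INDEX)` yields the node for each dimension, which can then drive the multi-dimensional query; a phrase that matches no explicit or implicit meta-term returns `None`.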

Friday, December 9, 2011

The Pecking Order - Rules Sets for Mapping

Continuing from the last post, let us look at how mapping from one taxonomy to another can be used. I had used the analogy of a mapping rules set as a camera: the rules set collapses the multi-dimensional taxonomy to a single-dimension taxonomy, just as a camera takes a three-dimensional space and collapses it to a two-dimensional image. I also mentioned that the mapping rules in the rules set are prioritized. Going back to our camera, we know the objects in front will block objects in the back of our image. So with our taxonomy camera, mapping rules at the top of the order will override or block rules lower in the order. Now let us look around at how our businesses are run. Most businesses have a sales team. At our business, the sales team are managing directors and they do much more than just sales. Sales teams are given jurisdictions so as to keep them from stepping on each other's toes. Depending on what a business's product is, a business can define these sales jurisdictions by geography or by specialty based on domain knowledge. Taxonomies will not help define geographic sales territories, but if your business defines sales jurisdictions based on what the clients are (as opposed to where they are), you can use mapping rules sets. Our team has defined a simplified taxonomy which describes the different practice groups, and then a mapping rules set that maps from our multi-dimensional taxonomy onto the simplified taxonomy. As anyone might know, the agreed-upon rules for defining sales territories can be quite intricate and often contested, and prioritized rules sets let any business define its sales territories as broadly or as narrowly as it needs. This pecking order of rules can be used to allocate leads to our sales teams in a consistent and efficient manner.

Thursday, December 8, 2011

A One Way Street to Clarity and Simplicity - More on Mapping a taxonomy to a taxonomy

In my last post, I did not elaborate too much on the rules sets that contain the logic to map from one space to another. These rules sets are interesting in that, when you map from a complex multi-dimensional taxonomy space to a simpler domain-specific taxonomy space, it is usually a one-way mapping. A good way to think of it is to think about how photography works. A camera has a lens that focuses an image of a three-dimensional space onto a two-dimensional piece of film. Needless to say, there is a loss of information when the camera takes a picture, because the resulting image is just a single view of a three-dimensional scene. Can we recreate the three-dimensional space from our two-dimensional photo? Not really, though I have seen some software that guesses. Nevertheless, we still love photography. I was just looking at my wedding pictures last night, and in a way photography gives us a clearer vision of our shared reality from an authorial viewpoint. A great portrait or landscape captures a moment and gives it clarity.

Let's get back to our idea of rules sets (our taxonomy camera), and how they map from a complex multi-dimensional taxonomy space to a simpler domain-specific taxonomy space. We develop the simpler taxonomy to give us a perspective on a domain, one with clarity and simplicity. We use it to give an authorial view of certain business sectors in a way that our more general-purpose taxonomy cannot.

For those with a mathematical bent, I can say that our rules sets are prioritized, and the fact that some rules have greater priority than others makes these rules sets one-way: they collapse the information to a simpler view. If we ran our rules sets on companies classified using the complex taxonomy to get the simpler classification, and then ran the rules sets in reverse on the simple taxonomy to get categorizations in the complex taxonomy, the derived categorizations would not be the same as the original complex classification.
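
The one-way behaviour can be shown with a small sketch of a prioritized rules set in which the first matching rule wins. The rules, facet names and node labels are hypothetical, chosen only to make the collapse visible.

```python
# A hypothetical rules set, highest priority first. Each rule pairs the facet values
# it requires with the node it maps to in the simpler, domain-specific taxonomy.
RULES = [
    ({"need": "Healthcare"}, "Healthcare"),
    ({"industry": "Software"}, "General Software"),
    ({}, "Other"),  # catch-all at the bottom of the pecking order
]

def map_company(facets, rules=RULES):
    """Return the target of the first rule whose conditions all hold."""
    for conditions, target in rules:
        if all(facets.get(key) == value for key, value in conditions.items()):
            return target
    return None

# Two different points in the complex taxonomy...
emr_vendor = {"clientele": "Hospitals", "industry": "Software", "need": "Healthcare"}
health_magazine = {"clientele": "Consumer", "industry": "Magazines", "need": "Healthcare"}

# ...land on the same node, because the healthcare rule blocks the rules below it.
collapsed = {map_company(emr_vendor), map_company(health_magazine)}
```

Since both companies map to "Healthcare", nothing in the simple classification says which of the two original categorizations it came from, so the mapping cannot be run backwards to recover them.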

Wednesday, December 7, 2011

Mapping a taxonomy to a taxonomy

In my last post, I talked about "meta-terms", which map commonly used expressions to nodes in multiple trees. This concept could be taken much further. When our team built our four-dimensional taxonomy, our goal was to be able to classify any business, and to find similarities between companies even though traditionally they would be considered to be operating in different arenas. My favorite example is to look at Intuit, which creates financial software for the consumer, and compare it to H&R Block, which provides financial services for consumers. In the tax arena, they both provide help to people doing their taxes, and compete directly. Our taxonomy categorizes Intuit as a consumer software company for taxes, while H&R Block is a consumer service company for taxes. As you see, these companies overlap on what they do and who they do it for, but not on how they do it. Interestingly enough, Intuit started offering a professional help service and H&R Block started offering a software package.

What this brings up is that our taxonomy is complicated. Our team produces reports on Merger and Acquisition activity in a variety of segments (http://mandasoft.com), and each of these business segments likes to be broken down using a taxonomy specific to its domain. How do we reconcile the need for a taxonomy with nodes that can be used across multiple domains with the need for easy-to-understand, domain-specific terms in a given domain? The way I like to see this problem is that we have a vocabulary that works great when looking at the business world at the 50,000 foot level, but when we get down into the trenches, the terms start to look vague and confusing at the lower altitudes. The way we solved this was by building a system to create simple 50 ft level taxonomies for specific domains (e.g. healthcare media and software). We categorize each business using the 50,000 ft level taxonomy, and we then have rules sets that map from the 50,000 ft level taxonomy to the 50 ft level taxonomy. The utility is especially clear when we create multiple domains with their own rules sets (e.g. healthcare media and software, and Cloud Computing): a business which may reside in both domains only needs to be categorized once, in the 50,000 ft level taxonomy. We can create as many domains as we need and not have to reclassify companies as our domain views evolve!
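
The "categorize once, view many ways" idea can be sketched as follows. The two domain names come from the examples above; the rules, facet values and labels inside them are assumptions invented for illustration.

```python
# Hypothetical rules sets, one per 50 ft domain view, each in priority order.
DOMAINS = {
    "Healthcare Media and Software": [
        ({"need": "Healthcare", "industry": "Software"}, "Healthcare Software"),
        ({"need": "Healthcare", "industry": "Magazines"}, "Healthcare Media"),
    ],
    "Cloud Computing": [
        ({"channel": "SaaS"}, "Hosted Applications"),
    ],
}

def domain_views(facets, domains=DOMAINS):
    """Apply every domain's rules set to one 50,000 ft categorization."""
    views = {}
    for domain, rules in domains.items():
        for conditions, label in rules:
            if all(facets.get(key) == value for key, value in conditions.items()):
                views[domain] = label
                break  # first matching rule wins within a domain
    return views

# Categorized once in the general taxonomy (hypothetical company).
hosted_emr_vendor = {
    "clientele": "Hospitals",
    "industry": "Software",
    "need": "Healthcare",
    "channel": "SaaS",
}
views = domain_views(hosted_emr_vendor)
```

The same record shows up in both domain views without being reclassified, and adding a new domain only means adding another rules set to `DOMAINS`.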

Tuesday, December 6, 2011

"Meta-terms" in a multi-faceted taxonomy

Most taxonomies have synonyms for their nodes. A good example is a node for hospitals. Hospitals could also be known as Medical Centers, Clinics, Surgery, Health Service, etc. These are the synonyms typical of any given taxonomy. In the multi-faceted taxonomy our team uses, we have such synonyms, but sometimes we find certain key industry terms that actually span multiple dimensions. For instance, Trade Books, Trade Magazines and HIMS (Healthcare Information and Management Systems) are terms that act like a synonym, except that they point to nodes in multiple trees. Trade Books are books for consumers. In our taxonomy we map this term to books in our Industry tree and consumer in our Clientele tree. Trade Magazines are B2B magazines. In our taxonomy we map this term to magazines in our Industry tree and B2B in our Clientele tree. We call these special terms "Meta-terms". They act as little mini-maps in our taxonomy, and expand our vocabulary like synonyms do. Later I will describe how we can build more complicated maps and rule sets to build mini-taxonomies that can use new sets of key words to define new ontologies which sit on top of our main taxonomy.
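
The difference between a synonym and a meta-term can be sketched as a pair of lookup tables. The table contents follow the examples above; the names `SYNONYMS`, `META_TERMS` and `resolve` are hypothetical.

```python
# A synonym points at a single node in a single tree.
SYNONYMS = {
    "medical centers": ("Industry", "Hospitals"),
    "clinics": ("Industry", "Hospitals"),
}

# A meta-term points at one node in each of several trees.
META_TERMS = {
    "trade books": {"Industry": "Books", "Clientele": "Consumer"},
    "trade magazines": {"Industry": "Magazines", "Clientele": "B2B"},
}

def resolve(term):
    """Translate a user's term into {tree: node} pairs; empty if the term is unknown."""
    key = term.strip().lower()
    if key in META_TERMS:
        return dict(META_TERMS[key])
    if key in SYNONYMS:
        tree, node = SYNONYMS[key]
        return {tree: node}
    return {}
```

Resolving "Trade Magazines" returns a node in both the Industry and Clientele trees, whereas "Clinics" resolves to a single Industry node.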

Monday, December 5, 2011

Spheres of Competition

Last week I spoke of looking at a business taxonomy in a new way. Generally, people think of taxonomies as a vocabulary with perhaps a hierarchical structure of categories and sub-categories. However, when you build a multi-dimensional taxonomy as our team has, you can start to think of it as a spatial topology. There are four trees, and each one defines a dimension in our business taxonomy space. This thought is analogous to the special theory of relativity from physics, where you have the x, y, z dimensions plus the time dimension. An "event" is a point in the space-time continuum defined by those four dimensions. In our business taxonomy space, a "company" is a point in the spatial topology defined by our four dimensions. If you draw a small sphere around a given company's point in our taxonomy, you will get the competitors of that company. We have seen that as you widen the sphere, the outlying companies are less likely to be competitors. The key to making this work is to define the distances between points in a given dimension's tree. We generally find that the distance between parent and child gets shorter the deeper you go into the tree, and that the distance between siblings is slightly more than that between parent and child. We also realize that some siblings may be closer in meaning than others. Our distance algorithm has to take all these things into consideration. Our work has been experimental, but has returned interesting results. We have used this in our drill-down feature on mandasoft.com. The space defined has to be tweaked, and I may leverage algorithms similar to Einstein's general relativity, where actual data such as company revenue at a point in our topology could warp the spatial distances, just like physical mass warps physical space. Any thoughts?
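
One simple way to realize such a distance is sketched below. It is not the algorithm described above, only an assumed stand-in: each parent-child edge costs `decay ** depth` (so edges get shorter deeper in the tree), the distance within a tree is the cost of the path through the nearest common ancestor, and the per-tree distances are combined like Euclidean coordinates. Under this scheme siblings end up twice as far apart as parent and child, which is cruder than the "slightly more" mentioned above. Only two dimensions are used, and all trees and companies are hypothetical.

```python
import math

# Hypothetical trees, child -> parent, each with a single root called "All".
PARENT = {
    "industry": {
        "All": None,
        "Software": "All",
        "Tax Software": "Software",
        "Accounting Software": "Software",
        "Services": "All",
        "Tax Preparation": "Services",
    },
    "clientele": {"All": None, "Consumer": "All", "B2B": "All"},
}

# Hypothetical companies, each a point with one coordinate per tree.
COMPANIES = {
    "TaxSoft Co.": {"industry": "Tax Software", "clientele": "Consumer"},
    "LedgerSoft Co.": {"industry": "Accounting Software", "clientele": "Consumer"},
    "B2B Tax Co.": {"industry": "Tax Software", "clientele": "B2B"},
    "TaxPrep Co.": {"industry": "Tax Preparation", "clientele": "Consumer"},
}

def path_to_root(node, parent):
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def tree_distance(a, b, parent, decay=0.5):
    """Cost of walking from a up to the nearest common ancestor and down to b."""
    path_a, path_b = path_to_root(a, parent), path_to_root(b, parent)
    ancestors_a = set(path_a)
    common = next(node for node in path_b if node in ancestors_a)
    common_depth = len(path_to_root(common, parent)) - 1

    def climb(path):
        depth = len(path) - 1
        # The edge into a node at depth d costs decay ** (d - 1).
        return sum(decay ** (d - 1) for d in range(common_depth + 1, depth + 1))

    return climb(path_a) + climb(path_b)

def company_distance(x, y):
    """Combine the per-tree distances as if they were Euclidean coordinates."""
    return math.sqrt(sum(
        tree_distance(COMPANIES[x][dim], COMPANIES[y][dim], PARENT[dim]) ** 2
        for dim in PARENT
    ))

def sphere(center, radius):
    """Names of all other companies within the given radius of the center company."""
    return sorted(name for name in COMPANIES
                  if name != center and company_distance(center, name) <= radius)
```

Widening the radius around "TaxSoft Co." first picks up the sibling software company, then the company that differs only in clientele, and last the services company on the far branch of the industry tree.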

Friday, December 2, 2011

Why is a multi-dimensional faceted taxonomy better for Business?

As we have seen, we can more closely describe a company and how it operates by using four different hierarchical trees to categorize its clientele, its methodologies, its solutions and its channels. This faceted taxonomy definitely gives us a better view of any particular business, but at what cost? Our team has definitely found that categorizing a given company with this taxonomy is challenging. The taxonomist must thoroughly research a given business, and then be able to abstract that understanding into the four different dimensions. Another challenge we found in our first search tool was that the user needed to understand how to abstract the kinds of businesses they were looking for into the four different dimensions. These two issues make us wonder whether it is worth it. (Currently, we are working on software algorithms to make the search more user friendly and to make suggestions from web scraping to help the categorization.)


We find it is worth it. The reason is that, unlike NAICS codes, our taxonomy allows us to dig into micro-market views. NAICS is good for broad markets, but not for close views of a given market segment. Looking back to the 2004 Presidential Election, we see that President Bush's team was able to efficiently direct resources by using Microtargeting. The impetus of our taxonomy is Mergers & Acquisitions, and an important part of that process is valuation. To value the potential target company of an acquisition, investment bankers search for recent comparable deals, a task that is impossible using NAICS codes. With our taxonomy, we can do this. The reason is that we have moved away from the traditional way of viewing "Business", which is in terms of a vocabulary. Instead, we can now view "Business" as a multi-dimensional space where we can define "Spheres of Competition" to determine "comparables". Stay tuned for more.

Thursday, December 1, 2011

Requirements for a better Business Taxonomy Part 4

My last post talked about how we can categorize a company in three different ways: 1) who their customers or audience are, 2) how they serve their clientele, and 3) what business need they fulfill for their clientele. This is what I have heard called a faceted taxonomy, but I prefer to call it a multidimensional taxonomy. Now we must ask ourselves whether there are any more dimensions which could be useful. Our team has found one which is a little goofy. This is what I call the channel dimension. A company will provide a service, but there are now all these new means of reaching clients, such as mobile or internet, so maybe we can have a small domain which defines these different ways of reaching people. For our example of healthcare EMR software, a company could channel its services through licensed software installed at the client, or via a subscription to hosted software, also known as Software as a Service (SaaS). Could anyone think of another useful dimension for a Business Taxonomy?