Follow the exploration on how to build a better taxonomy for business. I believe current taxonomies are lacking. Using a spatial concept of taxonomy, our team has built a taxonomy that powers research into comparables for Mergers & Acquisitions in the evolving sector of Information, Software and Media, and I would like to share our thoughts on how to build a better taxonomy.
Thursday, March 15, 2012
Run don't walk - Simplicity and Speed
One of the unique features of our taxonomy is that we show how near items are to other items classified using our taxonomy. Since we are classifying companies, this allows us to define a "sphere of competition". The first version of our software that determined "nearness" used a walking algorithm, which walked up and down the trees to determine distances. Searches for "nearby companies" took a long time, up to a minute. Recently we changed the algorithm to use some algebra in place of the walking, and now searches take less than a second! The algebraic algorithm is both simpler and faster. Check it out on mandasoft.com, using the "see comparable deals" link.
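To illustrate why arithmetic can replace walking, here is a minimal sketch in Python. It assumes each node is stored as its path from the root, which is a hypothetical representation and not necessarily what the production code does. Under that assumption the distance between two nodes falls out of the path lengths and the length of their shared prefix, with no traversal of the tree. The node names are made up for the example.

```python
from itertools import takewhile

def tree_distance(path_a, path_b):
    """Number of edges between two nodes, each given as its path from the root."""
    # Length of the shared prefix equals the depth of the lowest common ancestor.
    common = sum(
        1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(path_a, path_b))
    )
    # Edges up from a to the common ancestor, plus edges down to b.
    return (len(path_a) - common) + (len(path_b) - common)

# Hypothetical node paths in an industry tree.
databases = ("Software", "Infrastructure", "Database Systems")
email = ("Software", "Infrastructure", "Email Systems")
magazines = ("Media", "Magazines")

print(tree_distance(databases, email))      # siblings: 2
print(tree_distance(databases, magazines))  # different branches: 5
```

Because the paths can be stored alongside each classified company, a comparison like this needs only the two paths and never touches the tree itself.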
Tuesday, January 17, 2012
Taxonomy Evolution Conundrum
Our team has been developing our taxonomy for almost ten years now. Our goal is to classify businesses by looking at how they operate, who they serve, and what they do, and our focus has been on media and software businesses. Needless to say, over the last ten years there have been major changes to the media and software industries with the introduction of smart phones, tablet computers, cloud computing, SaaS, virtualization, etc. To handle this evolution of the content we are classifying, we needed to make sure our framework was solid and that the taxonomy could change, with the ability to add nodes, merge nodes, and link nodes, and to make sure our classifications migrated with the changes. However, change is never apparent when it happens. When we saw the first businesses operating in Social Networking, we originally classified them basically as forums of user generated content, as opposed to editorial content. But as the business and technology took off and showed itself to be a new business model, we realized we had to add the term Social Networking to our taxonomy. Now our problem was that we had to go back and re-evaluate the companies that were classified as forums and see if they were really Social Networks. One way to fix this problem is to have an auto-classifier: you set up a new set of rules to recognize Social Networking, then re-run the auto-classifier on those companies. But here is the conundrum: we noticed this evolution in business models because we had human eyes seeing the trend. How can you expect an auto-classifier to see that? What are your thoughts on this problem?
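The bookkeeping side of this problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Taxonomy` class, its method names, and the company names are all hypothetical. It shows a new term being added and the affected companies being queued for human review, and it does not attempt to show an auto-classifier.

```python
from dataclasses import dataclass, field

@dataclass
class Taxonomy:
    # node name -> parent name (None for a root)
    parents: dict = field(default_factory=dict)
    # company name -> set of node names
    classifications: dict = field(default_factory=dict)

    def add_node(self, name, parent=None):
        self.parents[name] = parent

    def classify(self, company, node):
        self.classifications.setdefault(company, set()).add(node)

    def review_queue(self, node):
        """Companies a person should re-evaluate when a related new term appears."""
        return sorted(c for c, nodes in self.classifications.items() if node in nodes)

    def reclassify(self, company, old_node, new_node):
        """Move one company from old_node to new_node after review."""
        nodes = self.classifications[company]
        nodes.discard(old_node)
        nodes.add(new_node)

tax = Taxonomy()
tax.add_node("User Generated Content")
tax.add_node("Forums", parent="User Generated Content")
tax.classify("ExampleForumCo", "Forums")
tax.classify("ExampleSocialCo", "Forums")

# The new term arrives; everything under "Forums" needs a second look.
tax.add_node("Social Networking", parent="User Generated Content")
to_review = tax.review_queue("Forums")
tax.reclassify("ExampleSocialCo", "Forums", "Social Networking")
print(to_review)
```

The sketch only handles the mechanics of migration; deciding which companies belong under the new term is still the judgment call described above.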
Monday, January 9, 2012
A Multi-Dimensional Quandary - Social Networking
Today our team had an interesting issue come up. As our taxonomy tries to model business, we try to keep on top of how business models evolve, and we find we have to reconsider terms and what they mean. The term that gave us pause today was Social Networking. We use a multi-dimensional taxonomy with four distinct trees that model different aspects of a business, and each one roughly answers one of the following questions: 1) who is the clientele of a business, 2) how does a company do business, 3) what problems does a business solve or what subject area does it specialize in, and 4) what channel does the business use to reach customers. Our problem was that we had Social Networking defined in our business solution tree, but then we found that Social Networking started to morph into something beyond Facebook, Twitter and LinkedIn. Shopping sites started to incorporate Social Networking into their businesses, and then Social Networking no longer seemed like an "end" but a "means to an end". So our solution, which is far from the only solution, was to add social networking to our channel tree, where it sits along with the mobile and online terms. I hate having the same term in multiple trees, but the term is now used in multiple contexts, and we do have other repeated terms for different contexts. What are your thoughts?
Wednesday, January 4, 2012
Flying a Kite - Starting a Taxonomy
For Christmas, I got this wonderful book about the Brooklyn Bridge. It is called "The Great Bridge" by David McCullough. In one part, he writes about the first bridge to cross the Niagara Gorge, built by Charles Ellet. The way Ellet started the bridge was to offer five dollars to the first American boy who could fly a kite over to the Canadian side of the gorge. The bridge span was 1,010 feet, and young Homer Walsh won the prize. Ellet took the kite string that spanned the gorge, tied successively heavier cords to it, and pulled them across until he had a heavy cable spanning the gorge, and from that he built his bridge.
This story reminded me of our team's first efforts at building a business taxonomy. We started with a simple flat set of categories, and then added a second set of categories. After that we migrated to hierarchical trees, and then to banyans, and today we are still adding features and complexity to our business taxonomy. But we could not have gotten to where we are today unless we had first tried a simple solution to span our own problem.
Tuesday, December 20, 2011
Classes versus Concepts Taxonomies
As I post these entries on LinkedIn and other social media as well, sometimes a discussion will take off on one of those sites and is not shown here on this blog. One of the most interesting responses to the "True relationship between parent and child" post was on LinkedIn and covered the nature of the nodes in one's taxonomy. Does each node represent a class or a concept? Denise B. of Kent State University raised this thought, and said that often one needs two taxonomies, one where the nodes are classes and another where the nodes are concepts. The nature of your taxonomy will then define the nature of the parent-child relationship. In a class based taxonomy, the child is a subclass of the parent: the child has all the kinds of attributes of the parent plus some special attributes specific to the child class. In a concept based taxonomy, each node embodies a concept, and hence the relationship to a child node is not so strict. Thanks to Denise for the response.
Friday, December 16, 2011
True relationship between parent and child
I recently had to make a change to our taxonomy search. Our previous search would match a selected node and all its children. However, we found a need to search and match just a node and not any of its child nodes. I have seen this feature in many other systems, but we never found the need to implement it in our system till recently. This highlights a deeper question: what do we expect the relationship between parent and child nodes to be? When we first developed our taxonomy, we had long heated discussions on this issue for our industry tree. Let's take the example of a well known software company, Oracle. Under our software category we have operating systems, business applications, desktop applications, database systems, email systems, graphics applications, etc. Oracle is a big software company that makes most of the types of software under our software category, but not all of them. One of our team members suggested that if a company is categorized as something then it must do all the subcategories. His suggestion was to categorize a company multiple times, once for each type of product it actually makes. However, it starts to get ugly when a company like Oracle is operating in most of the subcategories but not all. Our general consensus was to categorize a company in the parent category if it covers a good portion of the subcategories, even if not all. If the company covers only a few of the subcategories, then we will have multiple categorizations. The real key is consistency. We take the approach (and near cliche) that the parent is something different than the sum of its children.
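The two search modes can be sketched as follows. This is a minimal Python illustration, assuming the tree is held as a parent-to-children dictionary and each company carries a set of category names; the function names and company names are hypothetical.

```python
def descendants(children, node):
    """Return the node plus every node below it."""
    found = [node]
    for child in children.get(node, []):
        found.extend(descendants(children, child))
    return found

def search(companies, children, node, include_children=True):
    """Match companies on the node alone, or on the node and its whole subtree."""
    targets = set(descendants(children, node)) if include_children else {node}
    return sorted(name for name, cats in companies.items() if cats & targets)

children = {"Software": ["Database Systems", "Email Systems"]}
companies = {
    "BroadSoftwareCo": {"Software"},         # covers a good portion of the subcategories
    "DatabaseOnlyCo": {"Database Systems"},  # covers only one subcategory
}

print(search(companies, children, "Software"))
print(search(companies, children, "Software", include_children=False))
```

With the parent treated as something different from the sum of its children, the exact-match mode returns only the companies that were deliberately placed on the parent node.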
Monday, December 12, 2011
Challenges of Searching a Complex Taxonomy
As I have noted, our team has developed a very complex multi-faceted taxonomy which gives a very exact categorization of any given business. The complexity we have created comes with a price. It creates challenges for users searching for businesses. A company is categorized by who its clientele is, what business needs it fulfills, how it fills those needs, and what channels it uses. A typical user does not think of businesses in such a manner. For instance, a typical search would be "I would like all businesses that sell education software." In early versions of our application, users would have to be savvy enough to enter education as the search criterion for the clientele and software for the industry (the "how a business operates"). Needless to say, users had to be trained to use the system, and hence the system was only used by our in-house taxonomist, who would get search requests and then give the requesters results in a handy report. This was not ideal, because our taxonomist has many other things to do. The challenge for us was to make our taxonomy search as easy as Google. Nobody has to be trained to use Google, and we found that if training was involved, that part of our application was generally not used unless the payback was great. In addition, since we were already fielding queries, our users found it much easier to send an email to our team than to run the search on their own. So how did we simplify our search? Well, in an earlier post, I mentioned we found that there were phrases users wanted in our taxonomy that would not fit in a single tree or dimension of our taxonomy. I called these phrases "Meta-Terms". The example I gave was Trade Magazines, which are B2B magazines. In our taxonomy we then map this term to magazines in our Industry tree and B2B in our Clientele tree.
So the key to simplifying our search was to use "Meta-Terms" as a model for Google-like search. Users can now type into a text box, and the system will see if the input matches an existing "Meta-Term" or what I call an implicit "Meta-Term". Implicit "Meta-Terms" are created by synthesizing synonyms from our different trees' vocabularies. There are over a million possible combinations from the trees, but some of the combinations are unlikely to exist (like e-discovery software for teenagers). So we have created a list of synthesized terms from the companies we have already categorized (around 23,000 of them). From these categorized companies and all the synonyms, we get a list of about 200,000 implicit meta-terms. Our "Google" search box then matches the user input to our list of terms and runs a query based on how that meta-term maps to our multiple dimensions. This is still in testing, but shows tremendous promise.
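As a rough sketch of how implicit meta-terms might be synthesized, the Python below combines synonyms only for node combinations that already have classified companies. Everything here is an assumption made for illustration: the synonym lists, the use of just two dimensions (clientele and industry) in place of four, and the exact-phrase matching.

```python
from itertools import product

# Hypothetical synonyms for a few nodes.
synonyms = {
    "Education": ["education", "school"],
    "Software": ["software", "applications"],
    "B2B": ["b2b", "trade"],
    "Magazines": ["magazines"],
}

# Only (clientele, industry) pairs that already have classified companies.
observed = {("Education", "Software"), ("B2B", "Magazines")}

def build_index(observed, synonyms):
    """Map each synthesized phrase to the nodes it stands for."""
    index = {}
    for clientele, industry in observed:
        for c_syn, i_syn in product(synonyms[clientele], synonyms[industry]):
            index[f"{c_syn} {i_syn}"] = {"clientele": clientele, "industry": industry}
    return index

def lookup(index, text):
    """Normalize case and whitespace, then look for an exact phrase match."""
    return index.get(" ".join(text.lower().split()))

index = build_index(observed, synonyms)
print(len(index))
print(lookup(index, "Trade Magazines"))
print(lookup(index, "teenager e-discovery"))
```

Restricting the index to observed combinations is what keeps the list far smaller than the full cross product of the trees.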
Friday, December 9, 2011
The Pecking Order - Rules Sets for Mapping
Continuing from the last post, let us look at how mapping from one taxonomy to another can be used. I had used the analogy of a mapping rules set as a camera: the rules set collapses the multi-dimensional taxonomy to a single-dimension taxonomy, just as a camera takes a three dimensional space and collapses it to a two dimensional image. I also mentioned that the mapping rules in the rules set are prioritized. Going back to our camera, we know the objects in front will block objects in the back of our image. So with our taxonomy camera, mapping rules at the top of the order will override or block rules lower in the order. Now let us look around at how our businesses are run. Most businesses have a sales team. At our business, the members of the sales team are managing directors, and they do much more than just sales. Sales teams are given jurisdictions so as to keep them from stepping on each other's toes. Depending on what a business's product is, a business can define these sales jurisdictions by geography or by specialty based on domain knowledge. Taxonomies will not help define geographic sales territories, but if your business defines sales jurisdictions based on what the clients are (as opposed to where they are), you can use Mapping Rules Sets. Our team has defined a simplified taxonomy which describes the different practice groups, and then a mapping rules set that maps from our multi-dimensional taxonomy onto the simplified taxonomy. As anyone might know, the agreed upon rules for defining sales territories can be quite intricate and often contested, and using prioritized rules sets helps to define any business's sales territories as broadly or narrowly as needed. This pecking order of rules can be used to allocate leads to our sales teams in a consistent and efficient manner.
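A pecking order like this can be sketched as an ordered list in which the first matching rule wins. The Python below is a minimal illustration under assumptions: the rule format, the `allocate` function, and the practice group names are hypothetical, not the rules our team actually uses.

```python
def allocate(company, rules, default="Unassigned"):
    """Return the practice group of the first rule whose conditions all match."""
    for conditions, group in rules:  # rules are listed highest priority first
        if all(company.get(dim) == value for dim, value in conditions.items()):
            return group
    return default

# Hypothetical rules set; earlier rules block later ones.
rules = [
    ({"clientele": "Healthcare"}, "Healthcare Group"),
    ({"industry": "Software"}, "Software Group"),
    ({"industry": "Magazines", "clientele": "B2B"}, "B2B Media Group"),
]

lead = {"clientele": "Healthcare", "industry": "Software"}
print(allocate(lead, rules))  # matches the first two rules; the first one wins
print(allocate({"clientele": "B2B", "industry": "Magazines"}, rules))
```

Changing a contested territory then comes down to reordering or editing rules, with no change to how any company is classified.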
Thursday, December 8, 2011
A One Way Street to Clarity and Simplicity - More on Mapping a taxonomy to a taxonomy
In my last post, I did not elaborate too much on rules sets that contain the logic to map from one space to another. These rules sets are interesting in that when you map from a complex multi-dimensional taxonomy space to a simpler domain specific taxonomy space, it is usually a one way mapping. A good way to think of it is to think about how photography works. A camera has a lens that focuses an image of a three dimensional space onto a two dimensional piece of film. Needless to say, there is a loss of information when the camera takes a picture, because the resulting image is just a single view of a three dimensional scene. Can we recreate the three dimensional space from our two dimensional photo? Not really, though I have seen some software that guesses. Nevertheless, we still love photography. I was just looking at my wedding pictures last night, and in a way photography gives us a clearer vision of our shared reality from an authorial viewpoint. A great portrait or landscape captures a moment and gives it clarity.
Let's get back to our idea of rules sets (our taxonomy camera), and how they map from a complex multi-dimensional taxonomy space to a simpler domain specific taxonomy space. We develop the simpler taxonomy to give us a perspective on a domain, one with clarity and simplicity. We use it to give an authorial view of certain business sectors in a way that our more general purpose taxonomy cannot.
For those with a mathematical bent: our rules sets consist of prioritized rules, and the fact that some rules take priority over others is what makes these rules sets one way and collapses the information to a simpler view. If we ran our rules sets on companies classified using the complex taxonomy to get the simpler classification, and then ran the rules sets in reverse on the simple taxonomy to get categorizations in the complex taxonomy, the derived categorizations would not be the same as the original complex classification.
Wednesday, December 7, 2011
Mapping a taxonomy to a taxonomy
In my last post, I talked about "meta-terms", which map a commonly used expression to nodes in multiple trees. This concept could be taken much further. When our team built our four dimensional taxonomy, our goal was to be able to classify any business, and to find similarities between companies even though traditionally they would be considered to be operating in different arenas. My favorite example is to look at Intuit, which creates financial software for the consumer, and compare it to H&R Block, which provides financial services for consumers. In the tax arena, they both provide help to people doing their taxes, and compete directly. Our taxonomy categorizes Intuit as a consumer software company for taxes, while H&R Block is a consumer service company for taxes. As you see, these companies overlap on what they do and who they do it for, but not on how they do it. Interestingly enough, Intuit started offering a professional help service and H&R Block started to offer a software package.
What this brings up is that our taxonomy is complicated. Our team produces reports on Merger and Acquisition activity in a variety of segments (http://mandasoft.com), and each of these business segments likes to be broken down using its own taxonomy specific to its domain. How do we reconcile the need for a taxonomy with nodes that can be used across multiple domains with the need for easy to understand, domain specific terms in a given domain? The way I like to see this problem is that we have a vocabulary that works great when looking at the business world at the 50,000 foot level, but when we get down into the trenches, the terms start to look vague and confusing at the lower altitudes. The way we solved this was by building a system to create simple 50 ft level taxonomies for specific domains (e.g. healthcare media and software). We categorize each business using the 50,000 ft level taxonomy, and we then have rules sets that map from the 50,000 ft level taxonomy to the 50 ft level taxonomy. The utility is especially clear when we create multiple domains with their own rules sets (e.g. healthcare media and software, and Cloud Computing): a business which may reside in both domains only needs to be categorized once, in the 50,000 ft level taxonomy. We can create as many domains as we need and not have to reclassify companies as our domain views evolve!
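The "categorize once, view in many domains" idea can be sketched as one general classification passed through several domain rules sets. The Python below is an illustration only: the domain labels, the rule format, and the use of "Cloud" as a channel value are all hypothetical.

```python
# Each hypothetical domain has its own ordered rules: (required values, label in that domain).
domains = {
    "Healthcare Media and Software": [
        ({"clientele": "Healthcare", "industry": "Software"}, "Healthcare IT"),
        ({"clientele": "Healthcare", "industry": "Magazines"}, "Healthcare Publishing"),
    ],
    "Cloud Computing": [
        ({"channel": "Cloud", "industry": "Software"}, "SaaS"),
    ],
}

def views(company, domains):
    """Map one general classification to a label in every domain whose rules match."""
    result = {}
    for domain, rules in domains.items():
        for required, label in rules:
            if all(company.get(dim) == value for dim, value in required.items()):
                result[domain] = label
                break  # first matching rule in a domain wins
    return result

# Classified once at the general level.
company = {"clientele": "Healthcare", "industry": "Software", "channel": "Cloud"}
print(views(company, domains))
```

Adding a new domain in this sketch means adding one more entry to `domains`; the company records themselves are untouched.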
Monday, December 5, 2011
Spheres of Competition
Last week I spoke of looking at a business taxonomy in a new way. Generally, people think of a taxonomy as a vocabulary with perhaps a hierarchical structure of categories and sub-categories. However, when you build a multi-dimensional taxonomy as our team has, you can start to think of it as a spatial topology. There are four trees, and each one defines a dimension in our business taxonomy space. This thought is analogous to the special theory of relativity from physics, where you have the x, y, z dimensions plus the time dimension. An "event" is a point in the space time continuum defined by those four dimensions. In our business taxonomy space, a "company" is a point in the spatial topology defined by our four dimensions. If you draw a small sphere around a given company's point in our taxonomy, you will get the competitors of that company. We have seen that as you widen the sphere, the outlying companies are less likely to be competitors. The key to making this work is to define the distances between points in a given dimension's tree. We have generally found that the distance between parent and child is shorter the deeper you get into the tree, and that the distance between siblings is slightly more than that between parent and child. We also realize that some siblings may be closer in meaning than others. Our distance algorithm has to take all these things into consideration. Our work has been experimental, but has returned interesting results. We have used this in our drill-down feature on mandasoft.com. The space defined has to be tweaked, and I may leverage algorithms similar to Einstein's general relativity, where actual data such as company revenue at a point in our topology could warp the spatial distances, just like physical mass warps physical space. Any thoughts?
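To make the sphere idea concrete, here is a minimal Python sketch. It rests on several assumptions that are not from the post: each node is given as its path from the root, edge weights halve with every level of depth, and the four per-tree distances are combined as a Euclidean distance. It deliberately ignores the finer points mentioned above, such as the exact ratio of sibling to parent-child distance and some siblings being closer in meaning than others. All company names and node names are made up.

```python
import math

def edge_weight(depth, decay=0.5):
    """Weight of the edge leading down from a node at `depth`; shrinks with depth."""
    return decay ** depth

def tree_distance(path_a, path_b, decay=0.5):
    """Weighted distance between two nodes, each given as its path from the root."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    up = sum(edge_weight(d, decay) for d in range(common, len(path_a)))
    down = sum(edge_weight(d, decay) for d in range(common, len(path_b)))
    return up + down

def company_distance(a, b):
    """Euclidean combination of the per-dimension tree distances."""
    return math.sqrt(sum(tree_distance(a[dim], b[dim]) ** 2 for dim in a))

def sphere(center, companies, radius):
    """Names of all companies within `radius` of the center point."""
    return sorted(n for n, c in companies.items() if company_distance(center, c) <= radius)

companies = {
    "TaxSoftCo": {"clientele": ("Consumer",), "industry": ("Software", "Applications"),
                  "solution": ("Finance", "Tax"), "channel": ("Online",)},
    "TaxServiceCo": {"clientele": ("Consumer",), "industry": ("Services", "Professional"),
                     "solution": ("Finance", "Tax"), "channel": ("Retail",)},
    "TradeMagCo": {"clientele": ("B2B",), "industry": ("Media", "Magazines"),
                   "solution": ("Construction",), "channel": ("Print",)},
}

print(sphere(companies["TaxSoftCo"], companies, radius=4.0))
print(sphere(companies["TaxSoftCo"], companies, radius=5.0))
```

In this toy data the software company and the service company that solve the same problem for the same clientele fall inside the smaller sphere, while the unrelated trade magazine only appears once the radius is widened.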
Wednesday, November 30, 2011
Requirements for a better Business Taxonomy Part 3
Having discussed that we can have multiple dimensions for a detailed Business Taxonomy, let's see what dimensions we might want to have. The first two dimensions we discussed describe 1) who the company's clientele is (for media we should look at the audience) and 2) how the company services its clientele. I suggest we also add a dimension for 3) what business need the company fulfills for its clientele. For instance, our hypothetical healthcare software could support a particular process. A big new push in healthcare is Electronic Medical Records (EMR). If we have this third dimension, we can now classify a Healthcare Consulting company specializing in EMR. Now, if we search for businesses providing EMR solutions, we will get results for any company working in that space, whether it is a software company or a consultant. Most taxonomies "solve" this problem by searching on a mix of keywords and their tree structure. This third dimension gives a way to tie in companies that are working on related subjects but using different methodologies. Look at H&R Block and Intuit. One is a service company and one is a software company, but both provide tax solutions. We will look at more dimensions in the next post.
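A small Python sketch shows how the third dimension changes what a search returns. The `Classification` type, the `find` function, and the company names are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple

class Classification(NamedTuple):
    clientele: str  # who the company serves
    how: str        # how it services them
    need: str       # what business need it fulfills

companies = {
    "ExampleEMRSoftware": Classification("Healthcare", "Software", "Electronic Medical Records"),
    "ExampleEMRConsulting": Classification("Healthcare", "Consulting", "Electronic Medical Records"),
    "ExampleTaxService": Classification("Consumer", "Services", "Tax"),
}

def find(companies, **criteria):
    """Return companies whose classification matches every given dimension."""
    return sorted(
        name for name, c in companies.items()
        if all(getattr(c, dim) == value for dim, value in criteria.items())
    )

# Searching on the need alone ties together software and consulting companies.
print(find(companies, need="Electronic Medical Records"))
print(find(companies, need="Electronic Medical Records", how="Software"))
```

Leaving a dimension out of the criteria is what lets the search cut across methodologies, with no keyword matching involved.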
Tuesday, November 29, 2011
Requirements for a better Business Taxonomy Part 2
Following my previous post, we see that a business can be classified in a parent-child hierarchical taxonomy, but sometimes one could create a sub-category which is really expressing not a sub-type of the parent category but rather a different aspect of the business. For example, healthcare software is not really a sub-category of software. Healthcare defines the customer base or subject matter of the software. A true sub-type of software would be infrastructure software or business application software. An improved business taxonomy would then categorize a company in multiple ways or dimensions. For instance, you could have a dimension to describe the clientele or market that the company serves. So our healthcare software company would have its clientele set to healthcare. Another dimension would describe how the company solves the business problems; in the case of our healthcare software, the company would be categorized as software. In my next post we will discuss other possible dimensions for a business taxonomy.
Monday, November 28, 2011
Requirements for a better Business Taxonomy Part 1
Most business taxonomies I have seen are hierarchical, with parent and child relationships. A good example may be cloud computing. The term is new and kind of vague, but it can be divided into various subcategories, like cloud computing hosting such as Amazon Web Services or Microsoft Azure, which host virtual machines in the cloud, or cloud computing infrastructure services such as cloud based backup services or antivirus services. However, if you see a cloud application that is specially oriented towards a given vertical like Healthcare, which has special requirements like HIPAA, do you want to create a cloud subcategory for healthcare? What if there is licensed software which manages Electronic Medical Records and is HIPAA compliant but does not run in the cloud? Do you create a subcategory for Healthcare under licensed software too? Then how do you find all the HIPAA compliant solutions? It seems that you need to categorize businesses in multiple ways. In effect, you should have a multi-dimensional hierarchical taxonomy to be able to better categorize, and hence find, businesses in any database. More on what these categories should be in my next post!
Wednesday, November 23, 2011
Why the North American Industry Classification System (NAICS) is no good?
When navigating a database of businesses, you need a taxonomy in order to find companies in an industry you are interested in. You would think that the NAICS would be ideal; however, in practice none of the commercial databases use it. The reason is found on the US Census web site.
As stated on US Census web site, "The North American Industry Classification System (NAICS) is the standard used by Federal statistical agencies in classifying business establishments for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy." and "NAICS was developed under the auspices of the Office of Management and Budget (OMB), and adopted in 1997 to replace the Standard Industrial Classification (SIC) system. It was developed jointly by the U.S. Economic Classification Policy Committee (ECPC), Statistics Canada, and Mexico's Instituto Nacional de Estadistica y Geografia, to allow for a high level of comparability in business statistics among the North American countries."
The reason it is not useful for finding companies is that it was built to track broad statistical trends. When you need to analyze business segments of our market, you will see that a finer grained and richer taxonomy is needed. I have started this blog to explore this issue as my team and I continue to tackle it. Stay tuned.