Friday, December 16, 2011

True relationship between parent and child

I recently had to make a change to our taxonomy search. Our previous search would match a selected node and all of its children. However, we found a need to search for and match just a node, without any of its child nodes. I have seen this feature in many other systems, but we never needed to implement it in ours until recently. This highlights a deeper question: what do we expect the relationship between parent and child nodes to be?

When we first developed our taxonomy, we had long, heated discussions on this issue for our industry tree. Let's take the example of a well-known software company, Oracle. Under our software category we have operating systems, business applications, desktop applications, database systems, email systems, graphics applications, etc. Oracle is a big software company that makes all kinds of software; it creates most, but not all, of the types of software under our software category. One of our team members suggested that if a company is categorized as something, then it must do everything in the subcategories. His suggestion was to categorize a company multiple times, once for each type of product it actually makes. However, this starts to get ugly when a company like Oracle operates in most of the subcategories but not all.

Our general consensus was to categorize a company in the parent category if it covers a good portion of the subcategories, even if not all. If the company covers only a few of the subcategories, then we give it multiple categorizations. The real key is consistency. We take the approach (a near cliché) that the parent is something different from the sum of its children.
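To make the two search modes concrete, here is a minimal sketch in Python. It is not our actual implementation: the `CHILDREN` and `TAGGED` data, the `search` function, the `include_children` flag, and the two "Example" companies are all hypothetical names invented for illustration.

```python
# Hypothetical slice of the industry tree: parent category -> child categories.
CHILDREN = {
    "Software": ["Operating Systems", "Database Systems", "Email Systems"],
}

# Hypothetical categorizations. Oracle sits on the parent because it covers
# a good portion of the subcategories; the made-up companies cover only a few.
TAGGED = {
    "Oracle": {"Software"},
    "ExampleDB Co": {"Database Systems"},
    "ExampleMail Co": {"Email Systems", "Database Systems"},
}


def subtree(category):
    """Return the category plus all of its descendants."""
    names = {category}
    for child in CHILDREN.get(category, []):
        names |= subtree(child)
    return names


def search(category, include_children=True):
    """Return companies matching the node, optionally including its children."""
    wanted = subtree(category) if include_children else {category}
    return sorted(name for name, cats in TAGGED.items() if cats & wanted)


print(search("Software"))                          # node and all its children
print(search("Software", include_children=False))  # just the node itself
```

With this sample data, the first call returns all three companies, while the second returns only Oracle, since it is the only one categorized directly on the parent node.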

2 comments:

  1. The decision as to whether the search & retrieval of a parent term (broader term) should automatically retrieve everything that was tagged with any of its child/narrower terms should depend on whether you can and want to use the broader term to mean the broader category "in general" and not its specific narrower terms. This distinction is easier to make in manual indexing/tagging than in auto-classification.
    Your example of Oracle brings up different issues of hierarchies, facets, and tagging. Your "software category" sounds like a hierarchy of product types. Oracle is either a company name or a brand type. It belongs in a separate hierarchy/facet than the generic product types. That facet could be called Manufacturers, Brands, Companies, Vendors, whatever makes sense to you. Then a content resource gets tagged with both terms for the type of software, such as database systems, and the company Oracle.
    I'm not exactly sure what you are referring to when you say "categorize" a company many times. Do you mean put the taxonomy term for the company name in multiple locations in the taxonomy, or do you mean tag a content resource about the company with multiple taxonomy terms covering different software products?

  2. Very impressive thoughts on this quandary. I especially like Darin's coextensive classification (see discussion on LinkedIn). Also, to Heather's point, internally we have discussed whether to classify a business by classifying its products, as Heather suggests. The origins of our design concern Mergers and Acquisitions. We tend to look at businesses as salable entities, and so we see companies as having "Lines of Business" rather than products. Looking back at Oracle, which recently purchased Sun Microsystems, we see that Oracle took on a new Line of Business, specifically server hardware and operating systems, as well as Java.
