Wednesday, December 21, 2011

Challenges of Classifying a Business

One of the biggest challenges our team faces in building a taxonomy of businesses is actually classifying a business. The process is at best semi-automated. Our classifiers generally have to look at the website of a given business and try to determine what it does. The website, generally speaking, is not written to describe how a business operates, but rather to sell the business's products and services. Each website has its own content, and we used to concentrate on the "About Us" page, where the business defines itself. Unfortunately, the "About Us" page is usually some sort of vague mission statement. We even looked at our own website and found that the wording written by our marketing guy was so general that you would not know what we did! So our classifiers have learned to scan many pages of a business's site to learn what it does, how it does it, and who it sells to.

We have built some scrapers, but our results have been mediocre. We are now looking at ways to scrape company websites and intelligently gather data to further automate the classification. The big problem we saw was that matching words on the website against words in our thesaurus gave us too many false positives. We are now looking at weighting the words in our thesaurus by how well they matched verified classifications in the past. Has anyone explored these types of auto-classification?
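
To make the weighting idea concrete, here is a minimal sketch in Python. It is not our production code: the function names, the data layout (a thesaurus as a dict of category to single-word terms, and verified classifications as text/category pairs), and the add-one smoothing are all assumptions made for illustration. Each thesaurus word is weighted by the share of previously verified sites containing it that ended up in that category, so an ambiguous word like "service" counts for less than a specific one like "pipe".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_weights(thesaurus, verified):
    """Weight each (term, category) pair from past verified classifications.

    thesaurus: dict mapping category -> set of single-word terms (hypothetical layout)
    verified:  list of (site_text, category) pairs already checked by a person
    """
    seen = Counter()  # term -> number of verified sites containing it
    hits = Counter()  # (term, category) -> how many of those sites were in that category
    for text, category in verified:
        for term in tokenize(text):
            seen[term] += 1
            hits[(term, category)] += 1

    weights = {}
    for category, terms in thesaurus.items():
        for term in terms:
            # Add-one smoothing (an assumption): terms never seen before get 0.5
            weights[(term, category)] = (hits[(term, category)] + 1) / (seen[term] + 2)
    return weights


def classify(text, thesaurus, weights):
    """Score every category by summing the weights of the thesaurus terms found."""
    words = tokenize(text)
    scores = {
        category: sum(weights[(term, category)] for term in terms & words)
        for category, terms in thesaurus.items()
    }
    return max(scores, key=scores.get), scores
```

In use, you would call `learn_weights` once over the verified classifications and then `classify` on the scraped text of each new site. A plain word match would count "service" equally for every category that lists it; here its contribution depends on how often it actually pointed to that category before.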

3 comments:

  1. Hi Keith - Interesting topic. The "big problem" you are encountering is very common and is caused by the ambiguity of language. No matter how much you tweak the thesaurus, the ambiguity will remain in the source material. Have you considered using a software tool that is built on linguistic semantics? This will help fix the problem of word meaning.

  2. Bryan, thanks for the comment. I agree semantics is the way to go. We tried all sorts of tricks using weighting, the Porter stemming algorithm, etc. Plus, we suffer from garbage in, garbage out with a lot of the texts! What packages have you used?

  3. Hi Keith,

    I have a background in environmental studies, sustainability, and education, so I offer an opinion from the other side of the coin.

    It sounds like you may need to work alongside the client, or maybe a representative, and learn a little about their product or service (at least to design the webpage), what they represent, and get a taste of the company's personality or mission.

    It is my thought that the customer is a "new" generation. I find that older people don't have time to review a website, or as my dad calls it, "the machine." Today, people want fast service, easy-to-read content, a "to the point" attitude, and a clear sense of what the company can do to better their lives. Website designers and companies need to remember that people do not all read the same way either. Some may have conditions that affect reading, such as ADD or ADHD.

    It is my thought that people may not care how a business operates, but rather whether the company offers environmentally friendly products.

    As you mentioned, if the "About Us" page offers only some sort of vague mission statement, the company needs to work on its description and on what it wants to convey to the world.
