that I wouldn't presume to add anything to it, except that the first question you need to ask your vendor is: what is your intercoder reliability score? If it's under 80%, run in the opposite direction, because you'll never be able to adequately defend your data in front of the board or the C-suite. You need human verification to make sure what you're actually reporting on is even relevant to your business.


I can see that you are an expert in your field! I am launching a website soon, and your information will be very useful for me. Thanks for all your help, and I wish you every success in your business.
Posted by: Ed Hardy Clothing Sale | December 21, 2009 at 12:45 AM
I concur with Mike, particularly about regionality and sentiment. Thanks for pointing out the post: it really does summarize my experiences. Text-analytics algorithms are very limited and (ask a linguist) very crude in their approach to the complexities of human language. We won't see a reliable, valid solution in our lifetime.
Posted by: Andrew Laing | August 17, 2009 at 12:41 PM
The post's a very good summation of all that's wrong with automated analysis tools. Intercoder reliability, btw, is simply the measure of how closely two or more coders will analyse and rate the same piece of text. This is an especially crucial issue in multi-lingual programmes, where cultural norms and language use can be very distinct.
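To make that definition concrete, here is a minimal sketch of how two coders' labels might be compared. It assumes the "80%" threshold mentioned above refers to raw percent agreement, and it also computes Cohen's kappa, a common chance-corrected measure that nobody in this thread names and that is included only for illustration. The function names and the ten sentiment labels are hypothetical, not real coding data.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items where both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    labels = set(counts_a) | set(counts_b)
    # Expected agreement if each coder labelled at random with their own label frequencies
    p_e = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    if p_e == 1:
        return 1.0  # both coders used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical sentiment labels for ten posts: a human coder vs. an automated tool
    human = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "neu", "pos", "neg"]
    tool = ["pos", "neg", "pos", "pos", "neu", "neg", "neu", "pos", "pos", "neg"]
    agreement = percent_agreement(human, tool)
    print(f"percent agreement: {agreement:.0%}")
    print(f"Cohen's kappa: {cohens_kappa(human, tool):.2f}")
    print("meets 80% threshold" if agreement >= 0.80 else "below 80% threshold")
```

With these made-up labels the coders agree on 7 of 10 posts (70%), which falls short of the 80% bar, and kappa is lower still (about 0.54) because some of that agreement would be expected by chance alone.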
Aside from the lack of accuracy in sentiment rating (which won't be solved any time soon for automated tools, even Google's, free or otherwise), the main issue, as pointed out in the original post, is the overwhelming volume of material that has to be filtered out because it is irrelevant, spam or immaterial. It is an enormous and unacknowledged issue.
Thanks, Katie, for spotting that post...
Posted by: Mike Daniels | August 12, 2009 at 12:42 PM
My question reveals my own ignorance but also why you are a measurement goddess. What is an "intercoder reliability score"?
Posted by: Lucas Held | August 12, 2009 at 11:52 AM