Social sentiment analysis is easily the most overrated of the Hadoop applications, which should be no surprise, given that we live in a world with a constantly connected and expressive population. This use of Hadoop draws on all kinds of content from content management systems, blogs, forums, and other social media tools to build a sense of what individuals are doing (for instance, life events) and how they react to the people around them (sentiment). Since text-based data doesn’t usually fit into a relational database management system (RDBMS), Hadoop is a natural place to explore and analyse this kind of data.
Language is difficult to interpret, even for human beings at times, especially if we are reading text written by people in a social group that’s different from our own. This group of people may be speaking our language, but their expressions and style can be so foreign that we have no idea whether they’re talking about a good experience or a bad one. For example, if we heard the word “bomb” in reference to a movie, we might conclude that the movie was not good (or that it was good, if we are part of the youth movement that recognizes “it’s the bomb” as a compliment); and if we are in the airline security business, the word “bomb” would lead us to a very different interpretation. The point is that language is used in a variety of distinct ways and is constantly evolving.
When we analyse sentiment on social media, we can choose from multiple approaches. The basic method programmatically parses the text, extracts strings, and applies rules. In the most common cases, this mechanism is practical and reasonable. But as requirements change and rules grow more complicated, manually coding text extraction stops being feasible from the standpoint of code maintenance and, especially, performance optimization. Grammar- and rules-based approaches to text processing are computationally expensive, which is an important constraint for large-scale extraction in Hadoop. The more involved the rules (which is inevitable for complex purposes such as sentiment extraction), the more processing is needed.
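To make the rules-based method concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not a production approach: the word lists, the negation rule, and the function name `rule_based_sentiment` are all assumptions made up for this example.

```python
import re

# Hypothetical, hand-written lexicons; a real rule set would be far larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}
NEGATORS = {"not", "never", "no"}


def rule_based_sentiment(text):
    """Label text using keyword rules plus one simple negation rule."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            # Remember the negator; it flips the next sentiment word we meet.
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The movie was not good"))
print(rule_based_sentiment("I love this airline"))
print(rule_based_sentiment("That movie was the bomb"))
```

The last call shows the weakness described above: no rule covers “the bomb”, so the text comes back as neutral, and handling it means adding yet another rule. Every such addition makes the code harder to maintain and slower to run over a large data set.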
In software development, a statistics-based alternative is becoming increasingly common for sentiment analysis. Rather than manually writing complex rules, we can use the classification-oriented machine-learning models in Apache Mahout. The catch is that we need to train our models with examples of positive and negative sentiment. The more training data we provide (for example, text from tweets along with our classification of each), the more accurate our results tend to be. Social sentiment analysis can be applied across a wide range of industries, for example food safety and health care.
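The statistics-based approach can be sketched with a small naive Bayes classifier. The example below is plain Python, not Mahout's API, and the four training sentences are invented for illustration; it only shows the general shape of training on labelled text and then classifying new text.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up training examples: (text, label) pairs we classified by hand.
TRAINING = [
    ("I love this airline, great service", "positive"),
    ("what a great movie, loved it", "positive"),
    ("terrible flight and awful food", "negative"),
    ("I hate the delays, bad service", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict(model, "great service"))
print(predict(model, "awful delays"))
```

Nothing in `predict` encodes a rule about any particular word. Improving the results means adding more labelled examples to `TRAINING`, which is the trade-off described above: less hand-written logic, but a need for training data.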