Problem Statement

Understand the sentiment of user reviews and provide useful information for the end-user as well as the product manufacturer regarding public opinion of the product.

Abstract

Sentiment analysis is a widely studied Natural Language Processing task in which the semantic orientation of a text unit is judged. A major challenge, however, is identifying the entities towards which an opinion is expressed. Sentitool, an Aspect-Based Sentiment Analysis system, receives as input a set of texts (product reviews) discussing a particular entity (e.g., a new model of a mobile phone). The system attempts to detect the main (e.g., the most frequently discussed) aspects (features) of the entity (e.g., ‘battery’, ‘screen’) and to estimate the average sentiment of the texts per aspect (e.g., how positive or negative the opinions are on average for each aspect). This involves two steps: first the aspect term is extracted from a sentence, and then the polarity of the opinion corresponding to that aspect is judged. We adopted an approach based on Probabilistic Graphical Models (PGMs). A linear-chain CRF is trained with features based on word vectors and text processing techniques (POS tags, dependency parses) to sequentially label the aspect terms in a sentence. An SVM classifier then identifies the polarity corresponding to each aspect, with features based on cosine similarity with words from SentiWordNet.

Introduction

The term aspect refers to a feature of a product, service or topic being discussed in a text. Sentiment analysis refers to the identification and extraction of subjective impressions from text sources. It aims to determine the attitude of a person with respect to something in particular, or the overall contextual polarity of a document. In general, a binary composition of opinions is assumed: for/against, like/dislike, good/bad, etc. However, an opinion may also be categorized as neutral.

When a review or a social media post talks about a product or service, the user might want to discuss multiple aspects or sub-topics related to it. For example, in a restaurant review, while the customer might have good things to say about the quality of the food, she might be disappointed with the service offered to her, and she might think the decor needs to be revamped. A general sentiment analyzer that determines only the overall sentiment towards the product or service might therefore not capture the full essence of the review. Hence the need for Aspect-Based Sentiment Analysis: a more fine-grained analysis of user feedback, which enables service providers and product manufacturers to identify the business aspects that need improvement.

The ultimate goal is to generate summaries listing all the aspects and their overall polarity. The outcome will be the average sentiment for each aspect of an entity.

Related Work

We came across several works along the same lines as our project. Our basic inspiration for this problem statement comes from [4]. We found a blend of further ideas in [1] and [2] and use the same framework as those papers, albeit with some changes. We have also taken some considerations from [3], which presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions.

Challenges

Applications

Parameter-based sentiment analysis of user reviews would allow us to give detailed feedback to the manufacturer. Such feedback would help them understand whether the general public is unhappy with a certain aspect of their product, and hence help them modify it accordingly. For example, users may be unhappy with the screen resolution of the new iPhone 6s. It can also be used to develop new products with emphasis on those particular parameters.

Such an analysis also helps us provide a targeted recommendation system for the users. For example, we can provide suggestions for products with good sentiment on screen resolution to users who might have complained about the same in their previous reviews.

Dataset

The site was crawled to obtain the reviews. The extracted data had to be cleaned again to remove duplicated data (over 2 GB). It is divided into various categories (books, toys, movies, etc.).
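The duplicate-removal step mentioned above could look roughly like the sketch below. It is only an illustration: the write-up does not say how duplicates were detected, so the normalization rule (lowercasing and collapsing whitespace) and the function names `normalize` and `deduplicate` are assumptions.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(reviews):
    """Yield each review whose normalized text has not been seen before.

    Only a fixed-size digest per review is kept in memory, which helps
    when the crawl is large.
    """
    seen = set()
    for review in reviews:
        digest = hashlib.sha1(normalize(review).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield review


if __name__ == "__main__":
    crawled = ["Great phone", "great   phone ", "Bad battery"]
    for review in deduplicate(crawled):
        print(review)
```

Exact-match hashing only catches verbatim (or near-verbatim) repeats; reviews that were reworded would need a fuzzier comparison.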

Approach

The project can be divided into three major tasks: data extraction and processing, aspect and category detection, and sentiment polarity assignment.

Data Extraction involves collecting data (user reviews and other meta-data) from popular e-commerce websites.

The Processing step converts unstructured data (raw HTML) into a structured format (relational tables) which can be used by our tool to determine the various aspects and their corresponding sentiments for each product.
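As a rough illustration of the processing step, the sketch below parses raw HTML and loads the reviews into a relational table using only the Python standard library. The markup it expects (`<div class="review" data-product="..." data-rating="...">` with no nested `div` elements) and the `reviews` table layout are hypothetical; the real pages and schema used in the project are not described here.

```python
import sqlite3
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect (product_id, rating, text) rows from review <div> blocks.

    Assumes hypothetical markup: <div class="review" data-product=...
    data-rating=...> containing plain text and no nested <div>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # attributes of the review div being read
        self._chunks = []     # text fragments seen inside that div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._current = attrs
            self._chunks = []

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            text = " ".join("".join(self._chunks).split())
            self.rows.append((
                self._current.get("data-product"),
                int(self._current.get("data-rating", 0)),
                text,
            ))
            self._current = None


def load_reviews(html: str, conn: sqlite3.Connection) -> int:
    """Parse the HTML and insert the reviews; return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "id INTEGER PRIMARY KEY, product_id TEXT, rating INTEGER, body TEXT)"
    )
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    conn.executemany(
        "INSERT INTO reviews (product_id, rating, body) VALUES (?, ?, ?)",
        parser.rows,
    )
    conn.commit()
    return len(parser.rows)
```

Keeping one row per review, keyed by product, makes it straightforward to pull all reviews for a product later when computing per-aspect sentiment.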

Aspect Category (Entity and Attribute). Identify every entity E and attribute A pair E#A towards which an opinion is expressed in the given text. E and A should be chosen from predefined inventories of Entity types (e.g. laptop, keyboard, operating system, restaurant, food, drinks) and Attribute labels (e.g. performance, design, price, quality) per domain. Each E#A pair defines an aspect category of the given text.

Sentiment Polarity. Each identified E#A pair of the given text has to be assigned a polarity, from a set P = {positive, negative, neutral}.
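The E#A pair and its polarity can be represented as a small validated record. The sketch below is one possible representation, not the tool's actual data model: the inventories are filled with the example labels from the text, and the class name `Opinion` is an assumption.

```python
from dataclasses import dataclass

# Hypothetical inventories for one domain, using the example labels above.
ENTITIES = {"laptop", "keyboard", "operating system"}
ATTRIBUTES = {"performance", "design", "price", "quality"}
POLARITIES = {"positive", "negative", "neutral"}  # the set P


@dataclass(frozen=True)
class Opinion:
    """One opinion: an entity, an attribute and the polarity assigned to the pair."""

    entity: str
    attribute: str
    polarity: str

    def __post_init__(self):
        # Reject labels that are not in the predefined inventories.
        if self.entity not in ENTITIES:
            raise ValueError(f"unknown entity: {self.entity!r}")
        if self.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {self.attribute!r}")
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")

    @property
    def category(self) -> str:
        """The aspect category in E#A notation."""
        return f"{self.entity}#{self.attribute}"


if __name__ == "__main__":
    opinion = Opinion("keyboard", "design", "negative")
    print(opinion.category, opinion.polarity)
```

Validating against the inventories at construction time keeps labels outside the predefined sets from leaking into later aggregation.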

Data Extraction and Processing

For the purposes of this project, user reviews were collected from the e-commerce website Flipkart.

Tools used :

The crawling process was divided into 2 steps:

The data was stored in a relational database to allow for easy access in the future.

The following section explains the crawling process in detail, using Flipkart as an example:

Aspect Detection
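The abstract describes aspect detection as sequence labelling with a linear-chain CRF. The sketch below shows only the data preparation for such a model: per-token feature dictionaries and BIO labels. It is a simplified illustration under stated assumptions: POS tags are supplied by an external tagger, the word-vector and dependency-parse features mentioned in the abstract are omitted, and the feature names and the `B-ASPECT`/`I-ASPECT` tag names are hypothetical.

```python
def token_features(tokens, pos_tags, i):
    """Build a feature dict for token i, including its immediate neighbours."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        "pos": pos_tags[i],
    }
    if i > 0:
        features["-1:word.lower"] = tokens[i - 1].lower()
        features["-1:pos"] = pos_tags[i - 1]
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["+1:word.lower"] = tokens[i + 1].lower()
        features["+1:pos"] = pos_tags[i + 1]
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens, pos_tags):
    """Feature dicts for every token in the sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


def bio_labels(tokens, aspect_spans):
    """Turn (start, end) token spans (end exclusive) into BIO labels."""
    labels = ["O"] * len(tokens)
    for start, end in aspect_spans:
        labels[start] = "B-ASPECT"
        for j in range(start + 1, end):
            labels[j] = "I-ASPECT"
    return labels


if __name__ == "__main__":
    tokens = ["The", "battery", "life", "is", "great"]
    pos_tags = ["DT", "NN", "NN", "VBZ", "JJ"]
    print(bio_labels(tokens, [(1, 3)]))
    print(sentence_features(tokens, pos_tags)[1])
```

Lists of feature dicts paired with label sequences in this shape are what CRF toolkits such as sklearn-crfsuite take as training input; the neighbouring-token features let the model learn that aspect terms often span several adjacent nouns.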

Category Detection

The category should be chosen from predefined inventories of Attribute labels (e.g. design, performance, price, quality) per domain.

Some examples highlighting these annotations are given below :

Sentiment Polarity

To obtain a metric for the features of a product, we need to identify the opinion on each feature from the available reviews. We view the goal of reading multiple reviews as finding widely held opinions and weighing the positive against the negative, and we wish to automate this task using NLP and machine-learning techniques. Each identified aspect and category pair in the given text has to be assigned a polarity from the set P = {positive, negative, neutral}. The neutral label applies to mildly positive or mildly negative sentiment, as in examples 3 and 4 below. Once the polarities of the aspects are found, the polarities of the corresponding categories are adjusted accordingly, and the final polarity of each category is maintained and updated with each review.
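One simple way to maintain a per-category polarity that is updated with each review is a running average. The sketch below assumes a numeric mapping of the three labels (+1, 0, -1) and a threshold for turning the average back into a label; both are illustrative choices rather than values taken from the tool, and the class name `CategoryPolarity` is hypothetical.

```python
# Assumed numeric encoding of the polarity set P.
SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


class CategoryPolarity:
    """Running average polarity per category, updated one opinion at a time."""

    def __init__(self, threshold: float = 0.25):
        self.threshold = threshold  # assumed cut-off for a non-neutral label
        self._sum = {}
        self._count = {}

    def update(self, category: str, polarity: str) -> None:
        """Fold one aspect-level polarity into its category's running totals."""
        self._sum[category] = self._sum.get(category, 0.0) + SCORE[polarity]
        self._count[category] = self._count.get(category, 0) + 1

    def average(self, category: str) -> float:
        """Mean score in [-1, 1]; raises KeyError for an unseen category."""
        return self._sum[category] / self._count[category]

    def label(self, category: str) -> str:
        """Map the running average back onto {positive, negative, neutral}."""
        avg = self.average(category)
        if avg > self.threshold:
            return "positive"
        if avg < -self.threshold:
            return "negative"
        return "neutral"


if __name__ == "__main__":
    tracker = CategoryPolarity()
    for polarity in ["positive", "positive", "negative"]:
        tracker.update("battery", polarity)
    print(tracker.average("battery"), tracker.label("battery"))
```

Storing only a sum and a count per category means new reviews can be folded in without re-reading earlier ones.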

Challenges

Conclusion

We ran our model on the Amazon review dataset that we crawled. The tool handles both single and compound reviews. It generates summaries listing all the aspects and their overall polarity. The outcome is the average sentiment for each aspect of an entity.

The only drawback is that the tool relies on SentiWordNet for the polarity of the aspects, and the accuracy of these polarities is limited. This work can be extended to other datasets, and a more accurate polarity source can be used in place of SentiWordNet.

Tags

#IRE #IIIT-H #Information #Retrieval #Extraction #Amazon #AmazonReviews #Reviews #SentimentAnalysis #MachineLearning #ML #Sentiment #CustomerReviews #Products #ProductReviews #SMAI #CustomerReviewAnalytics #Analysis #Customer #Rating