Project

Using the Yelp dataset, we built a predictive analytics application that helps identify which emotions have the greatest influence on Yelp ratings in Charlotte, based on the correlation between the sentiment analysis of user reviews and the average star rating of each business.
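As a rough illustration of the correlation idea, the sketch below computes a Pearson correlation coefficient between a per-business emotion score and the average star rating. The choice of Pearson correlation, the function name `pearson`, and the sample values are all assumptions made for illustration; they are not taken from the project's data or code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up per-business values, for illustration only (not project data)
joy_score = [0.10, 0.25, 0.40, 0.55]
avg_stars = [2.0, 3.0, 4.0, 5.0]

r = pearson(joy_score, avg_stars)
print(f"correlation between joy score and average stars: {r:.3f}")
```

A coefficient near 1 or -1 would suggest that the emotion moves closely with the rating, while a value near 0 would suggest little linear relationship.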

TECHNOLOGIES USED:

R

Sentiment analysis of user reviews, conducted with the Syuzhet library.

Pandas

Dataset cleansing, filtering, and merging into a single dataset for further analysis.
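The cleansing, filtering, and merging steps could look roughly like the following sketch. The column names (`business_id`, `city`, `stars`, `text`) and the sample rows are assumptions made for illustration; the project's actual schema and cleaning rules are not described here.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the project's actual schema
businesses = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "city": ["Charlotte", "Phoenix", "Charlotte"],
    "stars": [4.0, 3.5, None],
})
reviews = pd.DataFrame({
    "business_id": ["b1", "b1", "b2", "b3"],
    "text": ["Great food", "  ", "Slow service", "Nice staff"],
})

# Filtering: keep only businesses located in Charlotte
charlotte = businesses[businesses["city"] == "Charlotte"]

# Cleansing: drop businesses with no average star rating
charlotte = charlotte.dropna(subset=["stars"])

# Cleansing: drop reviews whose text is empty or whitespace only
reviews_clean = reviews[reviews["text"].str.strip() != ""]

# Merging: attach each remaining review to its business record
merged = reviews_clean.merge(charlotte, on="business_id", how="inner")
print(merged)
```

An inner join is used here so that only reviews with a matching Charlotte business survive; a different join type may suit other cleaning goals.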

Matplotlib

Visualization of the multiple linear regression applied to the dataset.

scikit-learn

Multiple linear regression and decision tree analysis applied to the dataset.
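A minimal sketch of the two analyses, assuming scikit-learn and a regression-style decision tree, is shown below. The data is synthetic, the two feature columns merely stand in for emotion scores, and the hyperparameters (`max_depth=3`) are arbitrary choices rather than the project's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic features: two columns standing in for emotion scores (e.g. joy, anger)
X = rng.random((200, 2))

# Synthetic target built from known weights, so the fit can be checked by eye
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]

# Multiple linear regression: one coefficient per feature plus an intercept
linear = LinearRegression().fit(X, y)
print("coefficients:", linear.coef_, "intercept:", linear.intercept_)

# Decision tree: feature importances give a rough ranking of the inputs
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
importances = tree.feature_importances_
print("feature importances:", importances)
```

In this setup the regression coefficients indicate the direction and size of each feature's effect on the rating, while the tree's feature importances offer a second, non-linear view of which features matter most.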

HTML

Project Website development.

Bootstrap

Project Website responsiveness and design.

Dataset

Dataset used: Yelp Reviewer Dataset

The dataset compiles information on 11 metropolitan areas: Edinburgh (UK), Stuttgart (Germany), Montreal (Canada), Toronto (Canada), and, in the US, Pittsburgh, Charlotte, Champaign-Urbana, Phoenix, Las Vegas, Madison, and Cleveland.

NRC Emotion Lexicon

List of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
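To show how a word-to-emotion lexicon can be turned into per-review scores, here is a simplified sketch that counts lexicon hits in a review. The project itself used the Syuzhet library in R; this Python version is only an illustration, and the `TOY_LEXICON` entries are invented stand-ins, not actual NRC Emotion Lexicon entries.

```python
import re
from collections import Counter

# Toy stand-in lexicon: these word-to-label entries are invented for
# illustration and are NOT taken from the NRC Emotion Lexicon.
TOY_LEXICON = {
    "delicious": {"joy", "positive"},
    "friendly": {"trust", "positive"},
    "rude": {"anger", "disgust", "negative"},
    "waiting": {"anticipation"},
}


def emotion_counts(text, lexicon):
    """Count how many words in `text` are associated with each label."""
    counts = Counter()
    # Lowercase the text and split it into simple word tokens
    for word in re.findall(r"[a-z']+", text.lower()):
        for label in lexicon.get(word, ()):
            counts[label] += 1
    return counts


counts = emotion_counts(
    "Delicious food and friendly staff, but the host was rude.",
    TOY_LEXICON,
)
print(dict(counts))
```

Each review then yields a small vector of emotion and sentiment counts, which can be aggregated per business and compared against its average stars.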

Extra variables

Yelp allows readers to tag reviews with three attributes: “cool”, “useful”, and “funny”. These were included to give additional context on how the reviews were interpreted.

6,000,000

Yelp Reviews Analyzed

MultiLinear Regression

Decision Tree