How to use rated reviews for sentiment classification

Sentiment classification is a fascinating use case for machine learning. Regardless of complexity, you need two core components to deliver meaningful results: a machine learning engine and a significant volume of structured data to train that engine.

Last month, we added the new “rating” field for rated review sites covered in the Webz.io threaded discussions data feed. With millions of rated reviews, anyone can access high-quality structured datasets that pair a natural language string with a numerical representation of its sentiment: the familiar star rating of 1 through 5.

In this blog post, we show you how to collect your own training datasets of rated reviews and use them to train a classification model (we worked with Stanford NLP, but you can use whichever classification engine makes sense for your use case). For simplicity, any review rated above 4 stars (rating:>4) is assigned a positive sentiment, while any review rated below 2 stars (rating:<2) is considered negative.

For our demo, we put together five datasets: two train/test pairs (split 80%/20%, respectively) and one additional test dataset:

  • General domain model training dataset (80% subset)
  • General domain model test dataset (remaining 20% subset)
  • Domain-specific training dataset (80% subset)
  • Domain-specific test dataset (remaining 20% subset)
  • Domain-specific “blind” dataset, never introduced during training, used to run the final test

Domain specificity can dramatically improve the results of a sentiment classification engine. For example, a reference to “bugs” in a hotel review is very likely negative. However, a discussion of bugs in a software code review won’t necessarily trigger a negative signal to a sentiment classification engine.

All code samples are freely available in our Sentiment Classifier repository on GitHub. Here’s what you’ll need to set it up yourself:

1. Setup

Let’s get the basics taken care of:
Install the Webz.io Python SDK
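A minimal install, assuming the SDK is published on PyPI under the package name ‘webzio’:

    pip install webzio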

Install Apache Maven and create a project template:
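One way to do both, using the standard Maven quickstart archetype (the group and artifact IDs here are placeholders; choose your own):

    # Install Maven (e.g. with Homebrew on macOS, or apt on Debian/Ubuntu)
    brew install maven

    # Generate a minimal Java project skeleton
    mvn archetype:generate -DgroupId=io.webz.sentiment \
        -DartifactId=sentiment-classifier \
        -DarchetypeArtifactId=maven-archetype-quickstart \
        -DinteractiveMode=false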

2. Rated Review Dataset Collection

The first component of our code foundation is a Python script that uses the Webz.io Python SDK to collect the rated reviews that will make up our datasets.

The output of this script is a ‘resources’ directory, which will contain the train/test files for our engine.

2.1 Set the project directory via Terminal:
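For example, if you generated the project skeleton shown in the setup step:

    cd sentiment-classifier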

2.2 Create the Python file that will collect the training/testing data:
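From inside the project directory:

    touch collect_data.py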

2.3 Edit the file ‘collect_data.py’ with a text editor or an IDE:

2.3.1 The first step of the script is to cover our imports (third-party modules), so add these imports to the top of the script:
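The published script defines its own imports; a minimal sketch needs the Webz.io SDK plus ‘os’ for file paths and directory creation (assuming the PyPI package name ‘webzio’):

    import os

    import webzio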

2.3.2 Initialize the Webz.io SDK with your private API token:
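Assuming the SDK follows the documented webzio interface, initialization is a single call; replace YOUR_API_TOKEN with the private token from your Webz.io dashboard:

    webzio.config(token="YOUR_API_TOKEN")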

2.3.3 Set the relative location of the train/test files:
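For example, keeping everything under a ‘resources’ directory next to the script (the constant name is our own choice):

    RESOURCES_DIR = "resources"
    os.makedirs(RESOURCES_DIR, exist_ok=True)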

2.3.4 Build the generic function that fetches the necessary data from Webz.io; after fetching the data, the function creates the relevant files inside the ‘resources’ directory:
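A sketch of what such a function could look like; the authoritative version lives in the GitHub repository. The pagination calls (‘webzio.query’, ‘webzio.get_next’) follow the SDK’s documented interface, and the tab-separated ‘<class> TAB <text>’ output format is chosen to match what Stanford’s ColumnDataClassifier expects later on:

    def collect(filename, query, limit, sentiment, train_ratio=0.8):
        """Fetch up to `limit` review texts matching `query` and split them
        into <filename>.train / <filename>.test under RESOURCES_DIR."""
        output = webzio.query("filterWebContent", {"q": query})
        texts = []
        while output.get("posts") and len(texts) < limit:
            for post in output["posts"]:
                # Flatten whitespace so tabs/newlines don't break the
                # tab-separated training format.
                text = " ".join(post["text"].split())
                if text:
                    texts.append(text)
                if len(texts) >= limit:
                    break
            if len(texts) < limit:
                output = webzio.get_next()  # fetch the next page of results

        split = int(len(texts) * train_ratio)
        for suffix, chunk in ((".train", texts[:split]), (".test", texts[split:])):
            if not chunk:
                continue
            with open(os.path.join(RESOURCES_DIR, filename + suffix), "a") as f:
                for text in chunk:
                    # One example per line: <sentiment class> TAB <review text>
                    f.write("%s\t%s\n" % (sentiment, text))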

2.3.5 Build the queries for the relevant data, and create the files.

Add the ‘__main__’ section of the code. In every call to the ‘collect()’ function, we pass the filename for the train/test files, the actual Webz.io query for the specific data, the limit on the number of lines of text to process and save, the sentiment class (positive/negative) for the current query, and the partition of the received data between the train and test files (an 80%/20% train/test split).
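A sketch of that section. The query strings are illustrative examples of Webz.io query syntax (site, rating, and language filters) rather than the exact queries from the repository, and the limits are arbitrary:

    if __name__ == "__main__":
        # General domain model: rated reviews from any covered site
        collect("general", "language:english rating:>4", 5000, "Positive", 0.8)
        collect("general", "language:english rating:<2", 5000, "Negative", 0.8)

        # Domain-specific model: hotel reviews from Booking.com
        collect("booking", "site:booking.com rating:>4", 5000, "Positive", 0.8)
        collect("booking", "site:booking.com rating:<2", 5000, "Negative", 0.8)

        # Blind test set: hotel reviews from Expedia, test-only (train ratio 0)
        collect("expedia", "site:expedia.com rating:>4", 1000, "Positive", 0.0)
        collect("expedia", "site:expedia.com rating:<2", 1000, "Negative", 0.0)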

2.4 Finally, let’s run the script from the Terminal to collect the data and create the files:
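    python collect_data.py

When it finishes, the ‘resources’ directory should contain the train/test files for both models plus the blind ‘expedia.test’ file.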

3. Build the Classifier Models

We can now build two classifier models with the datasets collected above. For this demonstration we chose the Stanford NLP classifier; each training example pairs one of our two classes, Positive or Negative, with its string of text. The classification project is written in Java using Maven, so let’s open the project and start working.

3.1 Get into the project directory via Terminal

3.2 Add the project dependencies (third-party packages) by adding the following to ‘pom.xml’ (at the project root, per Maven convention) under the ‘<dependencies>’ tag:
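For example, to pull in Stanford CoreNLP (the version below is only a placeholder; check Maven Central for a current release):

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>4.5.4</version>
    </dependency>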

3.3 Create a properties file to initialize the classification models.

Let’s create that file for both of our models inside the ‘resources’ directory from stage 2, and save it as ‘review-sentiment.prop’.
Copy and paste the following properties and save the file:
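A plausible configuration for a simple n-gram text classifier; these are standard ColumnDataClassifier options, though the exact tuning in the repository may differ:

    # Column 0 holds the gold class, column 1 the review text
    goldAnswerColumn=0
    displayedColumn=1

    # Extract n-gram features from the text column
    useClassFeature=true
    1.useNGrams=true
    1.usePrefixSuffixNGrams=true
    1.maxNGramLeng=4
    1.minNGramLeng=1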

3.4 Edit the Java code

3.4.1 Imports
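The imports below cover the ColumnDataClassifier usage sketched in the following steps (the java.util imports support the score bookkeeping in 3.4.4):

    import java.util.HashMap;
    import java.util.Map;

    import edu.stanford.nlp.classify.Classifier;
    import edu.stanford.nlp.classify.ColumnDataClassifier;
    import edu.stanford.nlp.ling.Datum;
    import edu.stanford.nlp.objectbank.ObjectBank;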

3.4.2 Declare the Stanford NLP ‘ColumnDataClassifier’ class variable inside the generated ‘App’ class:
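Something along these lines, pointing at the properties file from step 3.3 (the field is static so the helper functions below can share it):

    public class App {
        // Reads feature-extraction settings from the .prop file created above
        private static final ColumnDataClassifier cdc =
                new ColumnDataClassifier("resources/review-sentiment.prop");

        // ... functions from the next steps go here ...
    }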

3.4.3 Create the ‘getSentimentFromText’ function, which takes a text string and a classifier object and returns the sentiment class of the given text:
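A minimal sketch: ColumnDataClassifier builds datums from tab-separated lines, so we prepend an empty gold-class column before the text:

    static String getSentimentFromText(String text, Classifier<String, String> classifier) {
        // Column 0 (the gold answer) is left empty because it is unknown here
        Datum<String, String> datum = cdc.makeDatumFromLine("\t" + text);
        return classifier.classOf(datum);  // "Positive" or "Negative"
    }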

3.4.4 Create the ‘setScore’ function, which takes a test file and a classifier object and returns the precision, recall, and F1-score for both the positive and negative classes:
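One way such a function might work (a sketch; the repository’s version may differ): iterate over the test file, tally the confusion counts per class, and derive the three metrics from them:

    static void setScore(String testFile, Classifier<String, String> classifier) {
        // Confusion counts keyed as "<gold>|<predicted>"
        Map<String, Integer> counts = new HashMap<>();
        for (String line : ObjectBank.getLineIterator(testFile, "utf-8")) {
            String gold = line.split("\t", 2)[0];
            String predicted = classifier.classOf(cdc.makeDatumFromLine(line));
            counts.merge(gold + "|" + predicted, 1, Integer::sum);
        }
        for (String cls : new String[]{"Positive", "Negative"}) {
            String other = cls.equals("Positive") ? "Negative" : "Positive";
            int tp = counts.getOrDefault(cls + "|" + cls, 0);
            int fp = counts.getOrDefault(other + "|" + cls, 0);
            int fn = counts.getOrDefault(cls + "|" + other, 0);
            // Assumes both classes occur in the test file (no zero divisions)
            double precision = tp / (double) (tp + fp);
            double recall = tp / (double) (tp + fn);
            double f1 = 2 * precision * recall / (precision + recall);
            System.out.printf("%s  P=%.3f  R=%.3f  F1=%.10f%n", cls, precision, recall, f1);
        }
    }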

3.4.5 Create the ‘main’ function, which initializes the general and the domain-specific models, tests their scores on the hotel reviews input, and prints the results:
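A sketch of the flow, with file names matching the collection script from section 2:

    public static void main(String[] args) {
        // Train the general model and the hotel-specific model
        Classifier<String, String> general =
                cdc.makeClassifier(cdc.readTrainingExamples("resources/general.train"));
        Classifier<String, String> hotels =
                cdc.makeClassifier(cdc.readTrainingExamples("resources/booking.train"));

        // Score both models on the blind hotel-review test set
        System.out.println("General model on expedia.test:");
        setScore("resources/expedia.test", general);
        System.out.println("Domain-specific model on expedia.test:");
        setScore("resources/expedia.test", hotels);
    }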

4. Evaluating Performance and Results

We can evaluate the performance of each model using the F1-score (the harmonic mean of precision and recall) for the positive and negative sentiment classifications produced by each model.
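In other words, F1 = 2 × Precision × Recall / (Precision + Recall). For the general model’s positive class in the table below, 2 × 0.864 × 0.843 / (0.864 + 0.843) ≈ 0.853, in line with the reported 0.8537 (the table’s F1 values are presumably computed from unrounded counts, hence the small difference in the last digits).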

4.1 Test the score of each model using the ‘expedia.test’ dataset.
4.2 View the results and evaluate.

As expected, the results clearly show that the domain-specific model trained on rated hotel reviews delivers better performance, particularly on negative reviews.

General model (80/20 train/test on Booking.com data)
             Precision   Recall    F1-Measure
  Positive   0.864       0.843     0.8536585366
  Negative   0.852       0.872     0.8620689655

General model test (on Expedia.com data)
  Positive   0.878       0.805     0.8400520156
  Negative   0.659       0.770     0.7105882353

Domain-specific model (80/20 train/test on Booking.com data)
  Positive   0.878       0.908     0.8926553672
  Negative   0.902       0.871     0.8862275449

Domain-specific model test (on Expedia.com data)
  Positive   0.899       0.892     0.8926553672
  Negative   0.825       0.836     0.8307692308

We put this tutorial together as a high-level demonstration of the kind of machine learning models you can train using Webz.io data. You could apply your own models to a wide variety of use cases: business intelligence, cybersecurity, financial analysis, and much more. In fact, we would love to receive your feedback and learn more about creative uses of our data in machine learning models!
