
How to use rated reviews for sentiment classification

Posted on February 9, 2017 by

Sentiment classification is a fascinating use case for machine learning. Regardless of complexity, you need two core components to deliver meaningful results: a machine learning engine and a significant volume of structured data to train that engine.

Last month, we added a new “rating” field for the rated review sites covered in the threaded discussions data feed. With millions of rated reviews available, anyone can access high-quality structured datasets that pair a natural-language review with a numerical sentiment label: the familiar star rating of 1 through 5.

In this blog post, we show you how to collect your own training datasets of rated reviews and use them to train a classification model (we worked with Stanford NLP, but you can use whichever classification engine makes sense for your project). For simplicity, any review of 4 stars and above (rating:>4) is labeled positive, while any review of 2 stars and below (rating:<2) is labeled negative.
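The labeling rule above can be expressed as a small function. This is a hypothetical local helper: label_for_rating is not part of the Webhose SDK or the post's code. In the post, the filtering happens server-side through the rating:>4 and rating:<2 query terms. This sketch simply follows the prose: 4 stars and up are positive, 2 and below are negative, and 3-star reviews are assumed to be skipped. Whether rating:>4 itself includes 4-star reviews depends on the query syntax, which the sketch does not rely on.

```python
from typing import Optional


def label_for_rating(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a sentiment label, or None to skip it.

    Assumed thresholds (from the post's prose): >= 4 is positive,
    <= 2 is negative, and 3-star reviews are left out as ambiguous.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3 stars: not used as a training label in this sketch


print([label_for_rating(r) for r in (5, 4, 3, 2, 1)])
# ['positive', 'positive', None, 'negative', 'negative']
```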

For our demo, we put together five datasets: two train/test pairs, each split 80%/20%, plus one extra test dataset:

  • General-domain model training dataset (80% subset)
  • General-domain model test dataset (remaining 20% subset)
  • Domain-specific training dataset (80% subset)
  • Domain-specific test dataset (remaining 20% subset)
  • Domain-specific “blind” dataset, never seen during training, used for the final test
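The 80%/20% split behind these datasets can be sketched in a few lines. This sketch assumes the reviews are held in memory as strings. split_train_test and its seed parameter are hypothetical names, not part of the post's library. The cut point uses the same int(len(lines) * partition) formula as the collect() function later in this post. The seeded shuffle is an addition of this sketch, meant to make the split reproducible.

```python
import random


def split_train_test(lines, partition=4 / 5, seed=42):
    """Split review lines into (train, test) lists.

    `partition` is the fraction that goes to the train side, cut at
    int(len(lines) * partition). Sorting and then shuffling with a fixed
    seed (an assumption of this sketch) makes the split reproducible and
    independent of the order in which results were returned.
    """
    items = sorted(lines)  # sets have no stable order, so fix one first
    random.Random(seed).shuffle(items)
    cut = int(len(items) * partition)
    return items[:cut], items[cut:]


reviews = {f"review {i}" for i in range(10)}
train, test = split_train_test(reviews)
print(len(train), len(test))  # 8 2

# partition=0 sends everything to the test side, like the "blind" dataset
blind_train, blind_test = split_train_test(reviews, partition=0)
print(len(blind_train), len(blind_test))  # 0 10
```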

Domain specificity can substantially improve the results of a sentiment classifier. For example, a mention of “bugs” in a hotel review is very likely negative, whereas a discussion of bugs in a software code review doesn’t necessarily signal negative sentiment.

All code samples are freely available in our Sentiment Classifier library on GitHub. Here’s what you’ll need to set it up yourself:

1. Setup

Let’s get the basics taken care of:
Install the Webhose Python SDK

 $ git clone
 $ cd webhose-python
 $ python install

Install Apache Maven and create a project template:

 $ mvn archetype:generate -DgroupId=com.webhose.reviewSentiment -DartifactId=review-sentiment -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

2. Rated Review Dataset Collection

The first component of our code is a Python script that uses the webhose-python SDK to collect the rated reviews that make up our datasets.

The output of this script is a ‘resources’ directory, which will contain the train/test files for our engine.

2.1 Change into the project directory in the Terminal

$ cd PROJECT_LOCATION/review-sentiment

2.2 Create the Python file that will collect the training/testing data

$ touch

2.2 Edit the file ‘’ in a text editor or IDE:

2.2.1 The script starts with its imports (standard-library and third-party modules), so add them to the top of the file:

from __future__ import division
import os
import re
import time
import webhose

2.2.2 Initialize the Webhose SDK with your private token




2.2.3 Set the relative location of the train/test files

resources_dir = './src/main/resources'

2.2.4 Build the generic collect() function, which fetches the necessary data and then writes the relevant train/test files into the ‘resources’ directory.

def collect(filename, query, limit, sentiment, partition):
    lines = set()
    # Run the given query and keep collecting posts up to the given limit
    response =
    while len(response.posts) > 0 and len(lines) < limit:
        # Go over the list of posts returned in the response
        for post in response.posts:
            # Keep only texts that are neither too short nor too long
            if 1000 > len(post.text) > 50:
                # Extract the text from the post object and clean it
                text = re.sub(r'(\([^\)]+\)|(stars|rating)\s*:\s*\S+)\s*$', '', post.text.replace('\n', '').replace('\t', ''), 0, re.I)
                # Add the cleaned text to the lines we will save in the train/test files
                lines.add(text)
        print 'Getting %s' %
        # Request the next 100 results
        response = response.get_next()
    # Fix one ordering of the collected lines and compute the train/test cut point
    items = list(lines)
    split = int(len(items) * partition)
    # Build the train file (first part of the returned documents)
    with open(os.path.join(resources_dir, filename + '.train'), 'a+') as train_file:
        for line in items[:split]:
            train_file.write('%s\t%s\n' % (sentiment, line))
    # Build the test file (rest of the returned documents)
    with open(os.path.join(resources_dir, filename + '.test'), 'a+') as test_file:
        for line in items[split:]:
            test_file.write('%s\t%s\n' % (sentiment, line))
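The sketch below shows what the cleaning step in collect() does to a review. It applies the same regular expression and newline/tab removal to a few made-up review strings. clean_review_text is a hypothetical wrapper. The pattern only strips a single trailing parenthetical or a trailing "stars:"/"rating:" marker; parentheses in the middle of the text are left alone.

```python
import re

# Pattern copied from collect(): matches ONE trailing parenthetical, such as
# "(Translated by Google)", or a trailing "stars: X" / "rating: X" marker,
# plus any whitespace after it, at the very end of the text.
TRAILING_NOISE = re.compile(r'(\([^\)]+\)|(stars|rating)\s*:\s*\S+)\s*$', re.I)


def clean_review_text(raw):
    """Apply the same cleanup as collect() to one review text."""
    # Drop newlines and tabs (tabs would break the label<TAB>text file format)
    flat = raw.replace('\n', '').replace('\t', '')
    return TRAILING_NOISE.sub('', flat)


examples = [
    "Lovely room, friendly staff.\tRating: 5/5",
    "Great location (Translated by Google)",
    "Nice (but noisy) hotel",
    "Clean rooms.\nGood breakfast.",
]
for raw in examples:
    print(repr(clean_review_text(raw)))
```

Because newlines are replaced with an empty string, sentences on separate lines get glued together ("Clean rooms.Good breakfast."). Replacing '\n' with a space would be a reasonable variation, though it is not what the original script does.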

2.2.5 Build the queries for the relevant data and create the files.

Add the ‘__main__’ section of the script. Each call to ‘collect()’ passes five arguments: the base filename for the train/test files, the query for that specific data, the maximum number of lines of text to process and save, the sentiment class (positive/negative) for the query, and the fraction of the retrieved data that goes to the train file. A fraction of 4/5 gives an 80%/20% train/test split; the ‘from __future__ import division’ line makes 4/5 evaluate to 0.8 under Python 2.

if __name__ == '__main__':
    # Create the resources directory if it does not exist
    if not os.path.exists(resources_dir):
        os.makedirs(resources_dir)
    # Get reviews from various sources for training and testing the general classifier, up to 400 lines per class,
    # split 80%/20% between the general.train file and the general.test file
    collect('general', 'language:english AND rating:>4 -site:expedia.*', 400, 'positive', 4/5)
    collect('general', 'language:english AND rating:<2 -site:expedia.*', 400, 'negative', 4/5)
    # Get hotel reviews for training and testing the domain-specific classifier, up to 400 lines per class,
    # split 80%/20% between the booking.train file and the booking.test file
    collect('booking', 'language:english AND rating:>4 AND', 400, 'positive', 4/5)
    collect('booking', 'language:english AND rating:<2 AND', 400, 'negative', 4/5)
    # Get hotel reviews for the final blind test, up to 300 lines per class;
    # with a partition of 0, all lines are saved to expedia.test
    collect('expedia', 'language:english AND rating:>4 AND', 300, 'positive', 0)
    collect('expedia', 'language:english AND rating:<2 AND', 300, 'negative', 0)

2.3 Finally, run the script from the Terminal to collect the data and create the files:

$ python PROJECT_LOCATION/review-sentiment/
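Before moving on to the classifier, it can help to sanity-check the generated files. This sketch assumes each line has the form 'label<TAB>text', which is what collect() writes. It counts the lines per label and rejects malformed ones. summarize_lines and summarize_dataset are hypothetical helpers. The file names are the ones produced by the script's __main__ section; expedia.train is left out because a partition of 0 leaves it empty.

```python
from collections import Counter
from pathlib import Path


def summarize_lines(lines, source="<lines>"):
    """Count labels in an iterable of 'label<TAB>text' lines.

    Raises ValueError on a line with no tab or with no text after the tab.
    """
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        label, sep, text = line.rstrip("\n").partition("\t")
        if not sep or not text.strip():
            raise ValueError(f"{source}:{line_no}: expected 'label<TAB>text'")
        counts[label] += 1
    return counts


def summarize_dataset(path):
    """Count labels in one train/test file written by collect()."""
    with open(path, encoding="utf-8") as f:
        return summarize_lines(f, source=str(path))


if __name__ == "__main__":
    # Same location as resources_dir in the collection script
    resources = Path("./src/main/resources")
    for name in ("general.train", "general.test",
                 "booking.train", "booking.test", "expedia.test"):
        path = resources / name
        if path.exists():
            print(name, dict(summarize_dataset(path)))
        else:
            print(name, "not found")
```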
3. Build the classifier models

We can now build two classifier models from the datasets collected above. For this demonstration we chose the Stanford NLP ColumnDataClassifier. Our two classes are positive and negative, and each training example pairs one of them with a review text.
The classification project is written in Java and built with Maven, so let’s open the project and start working.

3.1 Change into the project directory in the Terminal

$ cd PROJECT_LOCATION/review-sentiment

3.2 Add the project dependencies (third-party packages) by adding the following to the project’s ‘pom.xml’ (in the project root, where the quickstart archetype creates it), inside the ‘<dependencies>’ tag:



























3.3 Create a properties file to configure the classification models.

Create this file, shared by both of our models, inside the ‘resources’ directory from step 2 and save it as review-sentiment.prop.
Copy and paste the following properties into it and save the file:


# Features









# Printing




# Mapping





# Optimization








3.3 Edit the Java code

Edit ‘src/main/java/com/webhose/reviewSentiment/’

3.3.1 Imports


import com.google.common.io.Resources;
import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;
import edu.stanford.nlp.objectbank.ObjectBank;

import java.io.IOException;
import java.text.NumberFormat;

3.3.2 Declare the stanford-nlp ColumnDataClassifier as a static field inside the generated ‘App’ class

 private static ColumnDataClassifier cdc;

3.3.3 Create the ‘getSentimentFromText’ method, which takes a text and a classifier and returns the predicted sentiment class of that text

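private static String getSentimentFromText(String text, Classifier<String,String> cl) throws Exception {
    // The leading tab leaves column 0 (the label column in the train/test files) empty,
    // so the review text lands in column 1
    Datum<String, String> d = cdc.makeDatumFromLine("\t" + text);
    return cl.classOf(d);
}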

3.3.4 Create the ‘setScore’ method, which takes a test file and a classifier and returns the precision, recall and F1 score for both the positive and the negative class

private static String setScore(String testFileName, Classifier<String,String> cl) {
    String results = "";
    NumberFormat percentFormatter = NumberFormat.getPercentInstance();

    // Calculate the score of the 'positive' class
    int tp = 0;  // labeled positive, predicted positive
    int fn = 0;  // labeled positive, predicted negative
    int fp = 0;  // labeled negative, predicted positive
    for (String line : ObjectBank.getLineIterator(Resources.getResource(testFileName).getPath(), "utf-8")) {
        try {
            Datum<String, String> d = cdc.makeDatumFromLine(line);
            // Strip the leading "label<TAB>" so that only the review text is classified
            String sentiment = getSentimentFromText(line.replace(d.label() + "\t", ""), cl);
            if (d.label().equals("positive") && sentiment.equals("positive")) {
                tp++;  // true-positive
            } else if (d.label().equals("positive") && sentiment.equals("negative")) {
                fn++;  // false-negative
            } else if (d.label().equals("negative") && sentiment.equals("positive")) {
                fp++;  // false-positive
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    double precision = (double) tp / (tp + fp);
    double recall = (double) tp / (tp + fn);
    results += "\nPositive Results:\n";
    results += "Precision: " + percentFormatter.format(precision) + "\n";
    results += "Recall: " + percentFormatter.format(recall) + "\n";
    results += "F1: " + (2 * precision * recall) / (precision + recall) + "\n";

    // Calculate the score of the 'negative' class
    tp = 0;  // labeled negative, predicted negative
    fn = 0;  // labeled negative, predicted positive
    fp = 0;  // labeled positive, predicted negative
    for (String line : ObjectBank.getLineIterator(Resources.getResource(testFileName).getPath(), "utf-8")) {
        try {
            Datum<String, String> d = cdc.makeDatumFromLine(line);
            String sentiment = getSentimentFromText(line.replace(d.label() + "\t", ""), cl);
            if (d.label().equals("negative") && sentiment.equals("negative")) {
                tp++;  // true-positive
            } else if (d.label().equals("negative") && sentiment.equals("positive")) {
                fn++;  // false-negative
            } else if (d.label().equals("positive") && sentiment.equals("negative")) {
                fp++;  // false-positive
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    precision = (double) tp / (tp + fp);
    recall = (double) tp / (tp + fn);
    results += "\nNegative Results:\n";
    results += "Precision: " + percentFormatter.format(precision) + "\n";
    results += "Recall: " + percentFormatter.format(recall) + "\n";
    results += "F1: " + (2 * precision * recall) / (precision + recall) + "\n";
    return results;
}
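The counting logic in setScore can be cross-checked outside Java. This Python sketch scores one class at a time as a one-vs-rest problem, mirroring the two passes in setScore. per_class_scores is a hypothetical name, and the labels below are made up for illustration. Unlike the Java version, it returns 0.0 rather than NaN when a denominator is zero, which is an assumption of this sketch.

```python
def per_class_scores(pairs, target):
    """Precision, recall and F1 for one class from (gold, predicted) pairs.

    `target` is treated as the positive class of a one-vs-rest problem,
    so the same function scores both 'positive' and 'negative'.
    """
    tp = sum(1 for gold, pred in pairs if gold == target and pred == target)
    fp = sum(1 for gold, pred in pairs if gold != target and pred == target)
    fn = sum(1 for gold, pred in pairs if gold == target and pred != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (gold label, predicted label) pairs, for illustration only
pairs = [
    ("positive", "positive"), ("positive", "positive"), ("positive", "positive"),
    ("positive", "negative"),
    ("negative", "negative"),
    ("negative", "positive"), ("negative", "positive"),
]
for label in ("positive", "negative"):
    p, r, f = per_class_scores(pairs, label)
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```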


3.3.5 Create the ‘main’ method, which builds the general and the domain-specific classifiers, scores each on its own held-out test set and on the blind hotel test set, and prints the results

public static void main( String[] args ) throws IOException {
    // Construct the ColumnDataClassifier object from the properties file
    cdc = new ColumnDataClassifier(Resources.getResource("review-sentiment.prop").getPath());
    // Build the general classifier from the general train file
    Classifier<String,String> generalCl = cdc.makeClassifier(cdc.readTrainingExamples(Resources.getResource("general.train").getPath()));
    // Build the domain-specific (hotel) classifier from the booking train file
    Classifier<String,String> hotelsCl = cdc.makeClassifier(cdc.readTrainingExamples(Resources.getResource("booking.train").getPath()));

    // General classifier self-test (the 20% held out in general.test)
    System.out.println("General Classifier stats:");
    System.out.println(setScore("general.test", generalCl));

    // Domain-specific classifier self-test (the 20% held out in booking.test)
    System.out.println("Domain-Specific Classifier stats:");
    System.out.println(setScore("booking.test", hotelsCl));

    // Compare both classifiers on the blind data-set (expedia.test), never seen during training
    System.out.println("Comparison Results:");
    System.out.println("General Classifier score:");
    System.out.println(setScore("expedia.test", generalCl));
    System.out.println("Domain-Specific Classifier score:");
    System.out.println(setScore("expedia.test", hotelsCl));
}



4. Evaluating Performance and Results

We can evaluate the performance of each model with the F1 score (the harmonic mean of precision and recall), computed separately for the positive and negative sentiment classes.
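For reference, these are the standard definitions that the setScore method computes. They are written in terms of true positives (TP), false positives (FP) and false negatives (FN) for the class being scored:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

As a worked check, take the general model's positive class on general.test: 2 × 0.864 × 0.843 / (0.864 + 0.843) ≈ 0.8534. The results table below reports 0.8537, and the small difference is consistent with precision and recall having been rounded to three decimals there.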

4.1. Test the score of each model on the expedia.test dataset.
4.2. View and evaluate the results

As expected, the domain-specific model trained on hotel reviews outperforms the general model on the blind expedia.test data. F1 rises from about 0.84 to 0.89 for positive reviews and from about 0.71 to 0.83 for negative reviews.

Model (test set)                             Class     Precision  Recall  F1-Measure
General, 80/20 split (general.test)          Positive  0.864      0.843   0.8536585366
General, 80/20 split (general.test)          Negative  0.852      0.872   0.8620689655
General, blind test (expedia.test)           Positive  0.878      0.805   0.8400520156
General, blind test (expedia.test)           Negative  0.659      0.77    0.7105882353
Domain-specific, 80/20 split (booking.test)  Positive  0.878      0.908   0.8926553672
Domain-specific, 80/20 split (booking.test)  Negative  0.902      0.871   0.8862275449
Domain-specific, blind test (expedia.test)   Positive  0.899      0.892   0.8926553672
Domain-specific, blind test (expedia.test)   Negative  0.825      0.836   0.8307692308





We put this tutorial together as a high-level demonstration of the kind of machine learning models you can train using data. You could apply your own models to a wide variety of use cases, such as business intelligence, cybersecurity and financial analysis. In fact, we would love to hear from you about creative uses of our data in machine learning models!


This entry was posted in Big Data, Technology.
