Using Natural Language Processing (AWS Comprehend) to Analyze Text (Cardi B Lyrics)

Humans spend a lot of time reading, analyzing, and responding through text (emails, chats, etc). A lot of this is inefficient or not for pleasure (such as the amount of payroll companies spend to read through feedback emails or the amount of time I spend sifting through Outlook each day). Using Natural Language Processing (NLP), we can reduce the inefficient and not-for-pleasure reading we do so that time can be re-invested into something more productive or fulfilling.

For fun, I scrapily ran the lyrics to Cardi B’s “I Like It” through AWS Comprehend to see what its response would be. I also ran a review of Mission Impossible: Fallout through the same service.  The full output for Cardi B can be viewed here.  The full output for Mission Impossible: Fallout can be viewed here.

While these are low value examples, a more real-world use case for Comprehend could be using AWS Comprehend to detect the language of emails sent to your company to adjust the routing destination in real-time (should they go to you English team or your Spanish?). Another example would be using Comprehend to collect feedback on your new product launch or ad campaign. For example, we could easily capture Twitter mentions for a brand, funnel those into an S3 bucket, and run the contents of that buck to split out negative vs positive vs neutral vs mixed sentiment mentions. From there, we could surface the most frequent adjectives and entities mentioned for each group of each sentiment bucket. It’s a cheap, quick way to customer capture and analyze feedback that would otherwise be ignored.

Initial Setup
In each of the last couple of posts, I’ve outline how to create an IAM user for your project so I won’t repeat that again.  After we have our IAM user created and the credentials added to our /.aws/credentials file, we’ll import the AWS PHP SDK and ComprehendClient.  Next, we’ll create a Comprehend client, define the API version, region, and credentials profile to use:

<?php

require '/home/vendor/autoload.php'; 
use Aws\Comprehend\ComprehendClient;

$client = new Aws\Comprehend\ComprehendClient([
    'version'     => '2017-11-27',
    'region'      => 'us-west-2',
    'profile'     => 'fk-comprehend',
]);
$review="Your Comprehend text";

Exploring the Options
In this example, we’ll detect the language(s), entities (objects, businesses, etc), key phrases, sentiment, and syntax (parts of speech) of our sample texts (Cardi B lyrics and a movie review).  For all of these except DetectDominantLanguage, the language is a required input.  If we use Comprehend to identify that first, then we can simply repeat its output in later functions.  For each output, Comprehend also spits out a confidence score which basically  tells you how confident it is in the output.  This could be used to ignore low-confidence suggestions, thus increasing the accuracy of the models you build using Comprehend.

DetectDominantLanguage Example
This will detect the language and spit out the ISO abbreviation.

//Detecting Dominant Language
$result = $client->detectDominantLanguage([
    "Text" => "$review",
]);

echo "<h1>DetectDominantLanguage</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['Languages'] as $phrase) {
    echo "Language ".$phrase['LanguageCode']." has a confidence score of ".round($phrase['Score']*100)."%.<br />";
}

DetectSentiment Example

//Detecting Sentiment
$result = $client->detectSentiment([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectSentiment</h1><pre>";
print_r($result);
echo "</pre>";

echo "Sentiment: ".$result['Sentiment']."<br />";
echo "Positive: ".round($result['SentimentScore']['Positive']*100)."%<br />";
echo "Negative: ".round($result['SentimentScore']['Negative']*100)."%<br />";
echo "Neutral: ".round($result['SentimentScore']['Neutral']*100)."%<br />";
echo "Mixed: ".round($result['SentimentScore']['Mixed']*100)."%<br />";

DetectKeyPhrases Example

//Detecting KeyPhrases
$result = $client->detectKeyPhrases([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectKeyPhrases</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['KeyPhrases'] as $phrase) {
    echo "Phrase ".$phrase['Text']." has a score of ".round($phrase['Score']*100)."%.<br />";
}

DetectSyntax Example

//Detecting Syntax
$result = $client->detectSyntax([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectSyntax</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['SyntaxTokens'] as $syntax) {
    echo "Phrase ".$syntax['Text']." is as ".$syntax['PartOfSpeech']['Tag']." (with ".round($syntax['PartOfSpeech']['Score']*100)."% confidence).<br />";
}

DetectEntities Example

//Detecting Entities
$result = $client->detectEntities([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectEntities</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['Entities'] as $syntax) {
    echo "Phrase ".$syntax['Text']." is as ".$syntax['Type']." (".round($syntax['Score']*100)."% confidence).<br />";
}

The Results for Cardi B and Tom Cruise

The full output for Cardi B can be viewed here.  This one is the most interesting of the two as “I like it” has a Spanish verse.  You can see how Comprehend dealt with it when it was passed as English.  It also does a good job of determining when “bitch” is a noun vs an adjective except in the line “Where’s my pen? Bitch I’m signin'” — I’m unsure as to why.

The full output for Mission Impossible: Fallout can be viewed here.  The interesting piece here is the sentiment analysis: NEUTRAL (8% positive, 36% negative, 39% neutral, and 17% mixed).  After reading the review, I would say this is pretty in-line with the reviewer and Comprehend did a good job of identifying the overall sentiment of the article.

Leave a Reply