Google have released SyntaxNet to the public, and as an analyst let me say this: this is the biggest news we’ve heard in a long time. Here’s why.
SyntaxNet is a library of tools that can help machines understand language, a process called Natural Language Understanding (NLU). To understand language, a machine needs to be able to:
- break down a sentence into its components (words, mostly),
- analyse all components,
- see how they relate to each other, and
- understand the role of each component in the sentence.
This process is called parsing. Parsing is second nature to us: we know that in the sentence “Ben is drinking wine”, Ben is the subject, the action is drinking, and the object is wine. Simple, right? Not so simple for machines, which find this task very complex. There are a few tools out there built to run this process, and these are called parsers.
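To make this concrete, here’s a minimal sketch – plain Python, with the parse hand-written for illustration – of the kind of output a dependency parser produces for that sentence. Parsey McParseface emits something similar in CoNLL format: one row per token, with a part-of-speech tag, a pointer to the word it depends on (its “head”), and the relation between them. Tags and relations are simplified here.

```python
# A hand-written stand-in for a parser's output on "Ben is drinking wine".
parse = [
    # (index, word, POS tag, head index, relation to head)
    (1, "Ben",      "NOUN", 3, "nsubj"),  # the subject of "drinking"
    (2, "is",       "VERB", 3, "aux"),    # auxiliary supporting "drinking"
    (3, "drinking", "VERB", 0, "root"),   # the main action, head of the sentence
    (4, "wine",     "NOUN", 3, "dobj"),   # the direct object
]

# Walking the structure recovers exactly the analysis above:
# who is doing what, and to what.
for index, word, pos, head, relation in parse:
    head_word = "ROOT" if head == 0 else parse[head - 1][1]
    print(f"{word:<10} {pos:<5} -> {head_word} ({relation})")
```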
Alongside SyntaxNet, Google have open-sourced a parser for the English language, called Parsey McParseface. (Google and its sense of humour.)
This is a significant step towards language understanding, but it’s not very useful on its own. Sure, it can call out the nouns and verbs in a sentence, and it can tell which words are connected to which others, but that’s not where Parsey’s value lies. Its full potential comes out when it’s used with other technologies, or to solve other problems in computational linguistics – problems like sentiment analysis.
Being able to parse a sentence correctly is instrumental to accurate sentiment analysis. Parsey, in particular, can now help solve one of sentiment analysis’ biggest problems: comparative opinions.
Sentiment Analysis and Comparative Opinions
Comparative opinions are essentially comparisons, where one topic is compared to one or more other topics. In the case of “A works better than B”, A and B are the topics, also known as “targets”, while “better” is the sentiment shifter, as it changes the sentiment that the verb (“works”) carries.
Sentiment analysis hasn’t been great at identifying the actual sentiment behind these comparisons. Take the following sentence:
“Coke is better than Pepsi.”
A sentiment analysis of that sentence will return it as positive, mainly thanks to the positive sentiment shifter (“better”). If I were the social analyst for Coke, that sentence would work entirely in my favour; if I were the social analyst for Pepsi, the opposite would be true. The sentiment changes depending on which topic (target) we shift our focus to: Coke is being talked about positively here, but Pepsi is being talked about negatively.
Another problem is multiple comparisons in one sentence, a situation that sends most social listening tools into a frenzy. Take this sentence, for example:
“Coke is better than Pepsi, but they’re both worse than Dr Pepper.”
The sentiment for this sentence as a whole isn’t easily identifiable. The first clause (“Coke is better than Pepsi”) is positive, as we’ve already established; however, the second clause (“but they’re both worse than Dr Pepper”) is negative. As a result, most social listening tools will just tag the whole sentence as “neutral”. Technically, neutral means that a sentence doesn’t express any sentiment. However, a lot of social listening tools also use neutral for anything their algorithm cannot easily categorise: the tool doesn’t know which side to pick (positive vs. negative), so it just throws the sentence in the “neutral” bucket.
This parser changes a lot of things. First of all, it can identify all the topics, verbs and sentiment shifters in a sentence. It can then help tell which direction sentiment is shifting in – where from and where to. So, for the last sentence, the parser does the following (a code sketch follows the list):
- it identifies the two verbs in the sentence (“is” and “are”)
- it then splits the sentence into two clauses: the main clause “Coke is better than Pepsi”, and the subordinate “but they’re both worse than Dr Pepper”.
- (It’s called a subordinate clause because it only makes sense in the presence of the main clause it depends on: “but they’re both worse than Dr Pepper” doesn’t make sense on its own, but add “Coke is better than Pepsi” and it all makes perfect sense.)
- it identifies the two sentiment shifters (“better” and “worse”) as comparatives, and it attaches them to their respective verbs, forming two pairs (binomials): (is – better) and (are – worse)
- it picks up on the topics of the sentence (“Coke”, “Pepsi”, “Dr Pepper”, and the pronoun “they”)
- it identifies that “both” (attached to the pronoun “they”) groups “Coke” and “Pepsi” together: (Coke – Pepsi)
- it then determines the direction of the sentiment shifters:
- the binomial (is – better) shifts the sentiment from Coke to Pepsi,
- the binomial (are – worse) shifts the sentiment from (Coke – Pepsi) to Dr Pepper.
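To show how a sentiment tool could act on this output, here’s a toy sketch – not SyntaxNet itself, and the clause structures are hand-built from the parse above – of the direction logic. A small lexicon of shifters decides which side of each comparison wins:

```python
# Toy illustration of comparative sentiment shifting; the shifter lexicon
# and the pre-parsed clauses are both invented for this sketch.
POSITIVE_SHIFTERS = {"better"}
NEGATIVE_SHIFTERS = {"worse"}

# Hand-built from the parse of:
# "Coke is better than Pepsi, but they're both worse than Dr Pepper."
# Note how the (Coke - Pepsi) grouping via "both" shows up as the
# subjects of the second clause.
clauses = [
    {"subjects": ["Coke"],          "shifter": "better", "objects": ["Pepsi"]},
    {"subjects": ["Coke", "Pepsi"], "shifter": "worse",  "objects": ["Dr Pepper"]},
]

def target_sentiment(clauses):
    """Score each target: +1 when a comparison favours it, -1 when not."""
    scores = {}
    for clause in clauses:
        if clause["shifter"] in POSITIVE_SHIFTERS:
            winners, losers = clause["subjects"], clause["objects"]
        else:  # a negative shifter favours the other side of the comparison
            winners, losers = clause["objects"], clause["subjects"]
        for target in winners:
            scores[target] = scores.get(target, 0) + 1
        for target in losers:
            scores[target] = scores.get(target, 0) - 1
    return scores

print(target_sentiment(clauses))
# {'Coke': 0, 'Pepsi': -2, 'Dr Pepper': 1}
```

Notice the per-target result: Pepsi loses both comparisons, Dr Pepper wins its only one, and Coke nets out to zero – exactly the nuance a single whole-sentence score would flatten.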
The parser does a lot more than that, but when it comes to sentiment analysis, these are the most relevant steps.
Now, it’s important to note that this parser isn’t a sentiment analysis tool. It parses clauses, breaking down sentences, paragraphs and whole pages into their various components – topics, verbs, connectives, etc. As such, the parser doesn’t tell us that sentiment is shifting from one topic to another, but it does call out these shifters and the topics they relate to.
The Flaws of Parsey McParseface
This English parser is the most accurate in the world, beating even Google’s previous algorithms, and that’s great. However, at 94% accuracy, it is not perfect. While there’s room for improvement, it will probably never get to 100%, mostly because language is complex and often ambiguous. We’re not immune to this complexity either. Take, for example, this sentence:
“I saw a guy on a hill with a telescope.”
A simple sentence, right? Not so much when you start analysing its alternate meanings:
- There’s a guy on a hill, and I’m watching him with a telescope
- I’m seeing a guy, who is on a hill, and he has a telescope
- There’s a guy, he’s on a hill, and this hill has a telescope on it
- I’m on a hill, and I saw a guy using a telescope
Or perhaps we could get dark and interpret that sentence as “There’s a guy on a hill, and I’m sawing him with a telescope”. (Yeah, I know.)
There are countless sentences with ambiguous meanings, sentences that humans could interpret differently. Since we can’t unanimously agree on these interpretations ourselves, it’s only rational to have the same expectations of tools like Parsey.
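Under the hood, each of those readings corresponds to a different parse tree: the phrase “with a telescope” can attach to different heads (and “on a hill” likewise). A tiny illustration of the attachment choices, invented for this sketch:

```python
# Each reading of "I saw a guy on a hill with a telescope" comes from a
# different attachment of "with a telescope" in the parse tree. All of
# these trees are grammatical; the parser has to commit to one.
attachments = {
    "saw":  "I used a telescope to watch him",
    "guy":  "the guy is holding a telescope",
    "hill": "the hill has a telescope on it",
}
for head, reading in attachments.items():
    print(f'"with a telescope" attaches to "{head}": {reading}')
```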
Now, on to the big question: what does this mean for sentiment analysis, particularly in the context of social listening? And why is this the biggest news I’ve heard in a long time?
SyntaxNet and Social Listening
While SyntaxNet is a great addition to Natural Language Processing (NLP) technology, it shines brightest when used with other tools and algorithms, making them even stronger and more reliable. There are so many reasons why this excites me.
Here are some of them:
- Less neutral: Parsey can help social listening tools provide more accurate sentiment analysis, reducing the number of times mentions are tagged as neutral when they really aren’t. A tool can pass complex sentences to Parsey, which can then do its magic and break the sentences down into manageable clauses; these are then passed back to the tool for sentiment analysis on each clause. You’ll then end up with a sentiment for each clause, as well as an overall sentiment score for the whole sentence, paragraph or document, depending on how advanced the social listening tool is (see the sketch after this list).
- Competitor analysis: a lot of social listening tools claim excellence in competitor analysis, yet many of them either aren’t aware of the issues around comparative opinions (as we’ve discussed) or don’t quite know how to fix them. Parsey can help here: while not 100% accurate, it can at least give you a much better breakdown of how people talk about you vs. your competitors than what you’re used to from your current social listening tool.
- Aspects and features: social listening tools can start playing with aspect-based sentiment analysis, i.e. determining the sentiment expressed towards different aspects of entities. So, if someone wrote a blog post reviewing your latest product, imagine if your social listening tool could break down the product features listed in the post and quickly tell you the sentiment towards each one. Just knowing that your brand or your product is mentioned positively or negatively, without knowing which aspects lead people to those conclusions, isn’t good enough – if anything, you only have half the picture.
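To make the “less neutral” point concrete, here’s a minimal sketch of how a listening tool might delegate clause-splitting to a parser while keeping its own per-clause classifier. Both parse_into_clauses and clause_sentiment are naive stand-ins invented for this sketch, not real SyntaxNet calls:

```python
# Per-clause sentiment with an overall roll-up. In a real pipeline the
# parser (e.g. Parsey McParseface) would produce the clause split, and the
# listening tool's own classifier would score each clause; the two
# functions below are deliberately naive stand-ins.

def parse_into_clauses(text):
    # Stand-in: split on the coordinating conjunction only.
    return [clause.strip() for clause in text.split(", but ")]

def clause_sentiment(clause):
    # Stand-in: a two-word lexicon instead of a trained classifier.
    if "better" in clause:
        return 1
    if "worse" in clause:
        return -1
    return 0

def overall_sentiment(text):
    clauses = parse_into_clauses(text)
    scores = [clause_sentiment(clause) for clause in clauses]
    return {
        "per_clause": list(zip(clauses, scores)),
        "overall": sum(scores) / len(scores),
    }

print(overall_sentiment(
    "Coke is better than Pepsi, but they're both worse than Dr Pepper."))
# {'per_clause': [('Coke is better than Pepsi', 1),
#                 ("they're both worse than Dr Pepper.", -1)],
#  'overall': 0.0}
```

The overall score still averages out to neutral – but now the tool keeps the per-clause detail, so nothing gets silently thrown in the “neutral” bucket.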
I hope this piques the interest of social listening vendors and pushes them to provide better text analysis and sentiment analysis.
With all these great innovations coming out this year, I can confidently say that there’s never been a better time to be a social analyst.
But I’ll write on that some other day.
(You can read Google’s announcement here, and more on sentiment analysis here.)