When doing social listening, how do you track mentions of products with similar names? Or better, how do you do social listening for two products when one’s name is just a variant of the other? Products like the iPhone 6S and iPhone 6S Plus, or the Microsoft Surface and Microsoft Surface Book, or the Lumia 950 and Lumia 950 XL…
If you’re looking for mentions of both products together, then you won’t have any issues with a query like this:
“iphone 6S” OR “iphone 6S plus”
You won’t have any issues looking for mentions of the variant either:
“iPhone 6S plus”
What is tricky, however, is searching for mentions of the product whose name is contained in the variant’s name. Searching for mentions of “iPhone 6S Plus” is much easier than searching for mentions of “iPhone 6S”, because a search for the latter returns mentions of both products.
In other words,
“iPhone 6S” = (“iphone 6S” OR “iphone 6S plus”)
To keep things simple in this post, we’ll refer to the shorter name as “Variant A” and to the longer name (which contains Variant A) as “Variant A B”. So, “iPhone 6S” is Variant A, while “iPhone 6S Plus” is Variant A B (as it contains Variant A and more).
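The overlap is easy to reproduce outside any tool. The sketch below is a minimal Python illustration, assuming a phrase search simply looks for the query words in consecutive order. The tokenize and contains_phrase helpers and the two sample mentions are made up for this example and do not come from any social listening product.

```python
import re

def tokenize(text):
    """Lowercase a mention and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

# Hypothetical sample mentions, one per product.
mentions = [
    "Just ordered the iPhone 6S in rose gold",
    "The iPhone 6S Plus is too big for my pocket",
]

# Variant A ("iphone 6s") matches both mentions,
# Variant A B ("iphone 6s plus") matches only the second one.
variant_a = [m for m in mentions if contains_phrase(m, "iphone 6s")]
variant_a_b = [m for m in mentions if contains_phrase(m, "iphone 6s plus")]

print(len(variant_a), len(variant_a_b))
```

Searching for Variant A picks up both mentions, while searching for Variant A B picks up only its own.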
So, what if you only want to look at mentions about Variant A without bringing up mentions of “Variant A B”?
Real Case Scenario: Microsoft Surface
I recently wanted to see mentions of the new Surface Pro 4 during its launch event on 6th October. While I could’ve just used this query…
"surface pro 4"
…I knew that it would only limit my search. As I soon found out, only a small percentage of people used the full name of the new Surface (“surface pro 4”) throughout the event.
During the same event Microsoft released another product called Surface Book. Due to that, doing a search for “Surface” would return mentions of both the Surface (tablet) as well as the Surface Book (laptop). Two different products with similar names, the first one being “Variant A” and the second being “Variant A B”.
The problem here is that I don’t want to see mentions of the Surface Book unless it’s for a good reason, like people comparing the Surface to the Surface Book. In those cases the Surface Book is mentioned, but the mention as a whole is still about the Surface. So, how do we approach the query?
Here’s where the NEAR operator comes to the rescue. The query would then be:
(surface -"surface book")
OR (surface NEAR/15f "surface book")
OR ("surface book" NEAR/15f surface)
What we’re saying with this query is essentially:
“I want to find all mentions of Surface as long as the Surface Book isn’t mentioned, or mentions of the Surface as long as it’s mentioned close to the Surface Book.”
Why “NEAR” and not “AND surface book”?
Could we use the following query instead?
(surface -“surface book”) OR (surface AND “surface book”)
This query tells the tool:
“I want to find all mentions of Surface as long as the Surface Book isn’t mentioned, or mentions of the Surface as long as it’s mentioned with Surface Book in the same page.”
That wouldn’t work for two reasons: relevance and Boolean interpretation.
First of all, relevance. As mentioned in my previous post, if we’re looking for mentions of the Surface and the Surface Book together in a relevant context, then the NEAR operator is a much better alternative to AND. We don’t want every page that happens to contain both keywords; we’re only interested in mentions where the two keywords appear together in a relevant context (e.g. in a comparison).
Secondly, Boolean interpretation is a key factor, i.e. how the query is interpreted through Boolean logic. When doing a search for "surface AND book", we’re telling a tool to do the following:
- scan a page: does it contain the keyword surface? If so, continue;
- scan the same page: does it also contain the keyword book? If so, return as a valid mention.
However, when doing a search for "surface book" AND surface (which is the same as surface AND “surface book”), we’re telling the tool to do this:
- scan a page: does it contain the phrase "surface book"? If so, continue;
- scan the same page: does it also contain the keyword surface? If so, return as a valid mention.
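Because the word “surface” is contained in “surface book”, any page that mentions “surface book” also mentions “surface” by default. The second check therefore always passes: surface AND “surface book” returns every Surface Book mention, which brings back exactly the mentions that the first half of the query excluded.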
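To make the Boolean point concrete, here is a small Python sketch. It assumes keyword and phrase matching work on consecutive words; the helper functions and the sample pages are invented for illustration and are not how any particular tool is implemented.

```python
import re

def tokenize(text):
    """Lowercase a page and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))

# Hypothetical sample pages.
pages = [
    "The new Surface looks great",
    "I want the Surface Book",
    "Surface or Surface Book, which one should I buy?",
    "Windows 10 devices event tonight",
]

def has_surface(page):
    return contains_phrase(page, "surface")

def has_book(page):
    return contains_phrase(page, "surface book")

# surface AND "surface book"
and_hits = [p for p in pages if has_surface(p) and has_book(p)]
# "surface book" on its own
phrase_hits = [p for p in pages if has_book(p)]

# (surface -"surface book") OR (surface AND "surface book")
combined_hits = [p for p in pages
                 if (has_surface(p) and not has_book(p))
                 or (has_surface(p) and has_book(p))]
# surface on its own
surface_hits = [p for p in pages if has_surface(p)]

print(and_hits == phrase_hits)        # the AND clause adds nothing to "surface book"
print(combined_hits == surface_hits)  # the whole query collapses to plain surface
```

Under these assumptions both comparisons hold, so the AND version of the query behaves like a plain search for surface.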
All of that is different from surface NEAR/xf "surface book", where x is any number greater than or equal to 0, and where f tells the tool to look for the second keyword only when it follows the first one. Thus, this part of our original query…
(surface NEAR/15f "surface book")
OR ("surface book" NEAR/15f surface)
tells the social listening tool to do the following:
- scan a page: does it contain the keyword surface? If so, continue;
- scan up to 15 words after surface (first clause) and up to 15 words before it (second clause): is surface book present in either direction? If so, return as a valid mention.
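The steps above can be sketched in Python. This is only an approximation under stated assumptions: near_forward is a hypothetical stand-in for NEAR/nf that counts the words between the end of the first term and the start of the second, and the real tool may measure distance differently. The function names and sample mentions are invented for this example.

```python
import re

def tokenize(text):
    """Lowercase a mention and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_starts(words, phrase):
    """Indexes at which the words of `phrase` begin in `words`."""
    target = tokenize(phrase)
    return [i for i in range(len(words) - len(target) + 1)
            if words[i:i + len(target)] == target]

def near_forward(words, first, second, n):
    """Assumed NEAR/nf: `second` starts after `first` ends, with at most n words in between."""
    first_len = len(tokenize(first))
    return any(0 <= j - (i + first_len) <= n
               for i in phrase_starts(words, first)
               for j in phrase_starts(words, second))

def matches_query(text, n=15):
    """(surface -"surface book") OR (surface NEAR/nf "surface book") OR ("surface book" NEAR/nf surface)"""
    words = tokenize(text)
    has_surface = bool(phrase_starts(words, "surface"))
    has_book = bool(phrase_starts(words, "surface book"))
    return ((has_surface and not has_book)
            or near_forward(words, "surface", "surface book", n)
            or near_forward(words, "surface book", "surface", n))

# A mention where the two names are more than 15 words apart.
far_apart = "Surface Book " + " ".join(["word"] * 20) + " Surface"

print(matches_query("The new Surface looks great"))                       # Surface only
print(matches_query("I want the Surface Book"))                           # Surface Book only
print(matches_query("Surface or Surface Book, which one should I buy?"))  # comparison
print(matches_query(far_apart))                                           # too far apart
```

In this sketch the word surface inside “surface book” never counts as following or preceding its own phrase, which is what lets the directional version drop mentions that only talk about the Surface Book.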
Why the direction parameter?
In other words, why use the following:
(surface NEAR/15f "surface book") OR ("surface book" NEAR/15f surface)
when we could use
(surface NEAR/15 “surface book”)
Because the latter, while shorter than the former, behaves very much like the AND operator we discussed earlier.
Try it: search for…
surface NEAR/x “surface book”
…and choose any number to replace the x. It can be 0, 1, 5, 10, 20, 50 - you’ll get the same number of results.
(surface NEAR/15 “surface book”)
returns 93,519 mentions for the past 8 days. I can swap the 15 with any other number, and I’ll still get 93,519. I’ll also get exactly the same results using the following:
(surface AND “surface book”)
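One plausible explanation for the identical counts can be sketched in Python. How the tool actually measures distance is not documented in this post; the sketch assumes that the word surface inside the phrase “surface book” counts as an occurrence at distance 0 from that phrase. The near_any helper and the sample pages are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a page and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_starts(words, phrase):
    """Indexes at which the words of `phrase` begin in `words`."""
    target = tokenize(phrase)
    return [i for i in range(len(words) - len(target) + 1)
            if words[i:i + len(target)] == target]

def near_any(words, first, second, n):
    """Assumed undirected NEAR/n: the two terms start within n words of each other."""
    return any(abs(i - j) <= n
               for i in phrase_starts(words, first)
               for j in phrase_starts(words, second))

# Hypothetical sample pages.
pages = [
    "The new Surface looks great",
    "I want the Surface Book",
    "Surface or Surface Book, which one should I buy?",
    "Windows 10 devices event tonight",
]
tokenized = [tokenize(p) for p in pages]

# surface AND "surface book"
and_count = sum(1 for w in tokenized
                if phrase_starts(w, "surface") and phrase_starts(w, "surface book"))

# surface NEAR/n "surface book" for several distances
near_counts = {n: sum(1 for w in tokenized
                      if near_any(w, "surface", "surface book", n))
               for n in (0, 1, 5, 10, 20, 50)}

print(and_count)
print(near_counts)
```

Because the phrase always sits at distance 0 from its own first word, every distance gives the same count as the AND query in this model.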
We only see a difference and a considerably lower number of results when we add a direction parameter (f):
(surface NEAR/nf "surface book") OR ("surface book" NEAR/nf surface)
If I choose 15 as my distance parameter (n=15), I get 22,692 mentions. Compare that to our initial 93,519 and you’ll see that only 24% of those mentions are the ones we really need.
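For anyone who wants to check the percentage, the arithmetic is below in Python. The two counts are the ones reported above; the variable names are just labels chosen for this example.

```python
all_hits = 93_519          # surface NEAR/15 "surface book" (same as the AND query)
directional_hits = 22_692  # both NEAR/15f clauses combined

share = directional_hits / all_hits
dropped = all_hits - directional_hits

print(f"{share:.1%}")  # 24.3%
print(dropped)         # 70827
```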
How Does This Affect Me?
This post focused on the Microsoft Surface, but you can apply the same logic to other products with “Variant A” and “Variant A B” naming. While this might seem like a niche search, there are many other products out there that follow a similar naming convention, where one product name is just a variant of another.
Back to the Surface example, I had a feeling that just searching for “surface pro 4” would limit the number of mentions, and I was right: only 15% of people who talked about the new Surface called it the “Surface Pro 4”, and most of them weren’t actually “people” but just press articles that had to refer to the product with its full name. Using the NEAR operator I was able to find all the mentions I needed.
This example shows the importance of having a social listening tool that can handle complex queries like this one, so that you don’t have to sift through unnecessary mentions and can create queries that are as targeted and accurate as possible. After all, the quality of the mentions you receive is only as good as the quality of the query you create.
Disclaimer: I actually ended up with a longer query than that. As you can imagine, a search for just "surface" and no context around it would bring up mentions about all kinds of surfaces (instead of only Microsoft's device). To fix that I contextualised the query with (microsoft OR windows10devices), the latter being the official hashtag of the event.