Proximity Operators (and Why You Need Them)

There are so many features to look out for when shopping for social listening tools, but one thing that should definitely be on your list of requirements is this: proximity operators. Here’s what they are and why you need them.

What are proximity operators?

If you’ve ever had to build a social listening query then you may be familiar with the AND, OR, and NOT Boolean operators (technically the parentheses and quotes are operators too, but they don’t get as much recognition).

There’s another operator that usually sits alongside them: NEAR, the proximity operator. This specifies that two words must be within a certain distance of each other. It’s useful when you need to contextualise your search, or when you’re looking for two keywords/topics that may sometimes appear on the same page but not in relation to each other.

For example, there have been over 33,400 Tweets about the Microsoft Surface so far this month, yet only 22% of them have the words “Microsoft” and “Surface” right next to each other. In 78% of those 33.4k mentions, there’s at least one word (or letter) separating “Microsoft” from “Surface”:

42% of people who mention the Surface also mention Microsoft nearby, almost to clarify which tablet is being talked about (this is not unusual for products that are named after a common word):

Apple’s iPad Pro proves Microsoft was always right about the Surface

— The Next Web

Microsoft may reveal their Surface Pro 4 at its special event on October 6.

— GameSpot

See prepare for crunch time in this new commercial for the .

— Los Angeles Rams

35% of all Tweets call the tablet “Microsoft’s Surface”, thus separating the two keywords with the possessive “’s”:

All eyes on Microsoft's Surface Pro 4 event. Apple admitted today that the Surface idea is a good one

— Tom Warren

Not everyone is going to mention a product, a brand or a topic the same way that you would. Merely searching for “Microsoft Surface” would’ve surfaced not even a third of all the mentions you need. That’s when the proximity operator comes in handy: with it you can just tell your tool that you’re looking for the words “Microsoft” and “Surface” as long as they’re close by, even if they’re not adjacent:

microsoft NEAR surface

How do proximity operators work?

The NEAR operator is normally made up of two components:

distance parameter, usually a number (≥0) (e.g. keyword1 NEAR/5 keyword2); the number dictates the proximity between words, i.e. how many words separate keyword1 from keyword2;
direction parameter, usually just a letter (e.g. keyword1 NEAR/5f keyword2) or a predefined set of words (e.g. keyword1 FOLLOWED BY keyword2); this dictates the direction where the proximity is sought, i.e. to the left or to the right of keyword1.

For instance:

juice NEAR/5 orange returns any mentions that contain “juice” and “orange” within 5 words of each other;
juice NEAR/5f apple (we can interpret “f” as “following” or “forward”) returns any mentions where “apple” appears after “juice”, with 5 or fewer terms in between.
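
To make the two parameters concrete, here is a minimal Python sketch of how a NEAR check could behave. It is only an illustration, not how any particular tool implements it: the function names (tokenize, near) and the forward_only flag are made up for this example, and it assumes the distance counts the words sitting between the two keywords, so that a distance of 0 means "adjacent".

```python
import re

def tokenize(text):
    # Assumption: a "word" is any run of letters or digits, so
    # "Microsoft's" becomes the two tokens "microsoft" and "s".
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text, first, second, distance, forward_only=False):
    """Return True if `first` and `second` occur with at most
    `distance` words between them. With forward_only=True,
    `second` must come after `first` (the "f" in NEAR/5f)."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in first_positions:
        for j in second_positions:
            if i == j:
                continue  # same token, only possible if both keywords are equal
            if forward_only and j < i:
                continue  # `second` appears before `first`
            words_between = abs(j - i) - 1
            if words_between <= distance:
                return True
    return False

# juice NEAR/5 orange: direction does not matter
print(near("fresh orange and carrot juice", "juice", "orange", 5))
# juice NEAR/5f orange: "orange" comes before "juice", so no match
print(near("fresh orange and carrot juice", "juice", "orange", 5, forward_only=True))
```

Under these assumptions, "Microsoft's Surface" would not match microsoft NEAR/0 surface, because the possessive "s" counts as one word in between, but it would match microsoft NEAR/1 surface.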

This isn’t limited to social listening tools: it can be done in most tools that let you search through text using Boolean logic. The format of proximity operators differs from tool to tool, but they are so important for contextual searches that numerous databases, utilities, and web search engines support them. For instance:

you can use the format keyword1 near:n keyword2 in Bing, e.g. apple near:5 "operating system" will only find search results where “apple” and “operating system” are within 5 words of each other;
Yandex omits the word “NEAR” altogether, but you can define the distance parameter using the format keyword1 /n keyword2, e.g. apple /5 "operating system".

If either parameter is omitted, that constraint simply doesn’t apply. So, if I write juice NEAR/5 orange, I don’t care whether “orange” is found within 5 words before “juice” or after it. The same applies to the distance parameter (e.g. juice NEAR orange), although it’s quite rare nowadays to find tools that don’t require one.

Every search/monitoring/listening tool is different, and as such the format of the proximity operator will differ from tool to tool. Check with your vendor on how to make the most of this operator and how best to format it for your searches.

How is NEAR better than AND?

NEAR is a lot better than AND for so many reasons, but here’s the main one: AND looks out for two keywords in the same page, while NEAR makes sure that those keywords are close by for relevance and context.

NEAR helps you weed out most (if not all) irrelevant mentions, making sure you’re putting the right context to words that may otherwise be too ambiguous or irrelevant together.

NEAR is incredibly valuable when looking out for mentions outside of Twitter and Facebook. When using “AND” you may find a lot of press releases, blog posts or forum posts where two words just happen to be mentioned in the same page, without being related.

For example, if you search for microsoft AND iPhone you’ll find this article from The Verge for one reason alone: while the article talks about Microsoft, you’ll find the “more from The Verge” section right at the end of the article where one link just happens to contain the word “iPhone” (first link from bottom).

The iPhone clearly has nothing to do with the article, but you’ll still end up with this as a search result for your query – after all both Microsoft and iPhone are mentioned in the same page.

Meanwhile, the query microsoft NEAR/5 iPhone will give you a link like this one, where the article talks about Microsoft and the iPhone. The two keywords are related due to proximity, making NEAR a much better option than AND.
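
The difference can be sketched in a few lines of Python. This is an illustration only: the two sample texts are invented stand-ins for the kind of pages described above (they are not the actual articles), the helper names are hypothetical, and the distance is assumed to count the words between the two keywords.

```python
import re

def words(text):
    # Assumption: lowercase runs of letters/digits count as words.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(text, a, b):
    """AND: both keywords appear somewhere in the text."""
    tokens = words(text)
    return a.lower() in tokens and b.lower() in tokens

def matches_near(text, a, b, distance):
    """NEAR/distance: at most `distance` words between the keywords."""
    tokens = words(text)
    a_pos = [i for i, t in enumerate(tokens) if t == a.lower()]
    b_pos = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(0 < abs(i - j) <= distance + 1 for i in a_pos for j in b_pos)

# Invented example: an article about Microsoft followed by an unrelated link.
unrelated_page = (
    "Microsoft announced a new version of its operating system today. "
    "The update ships next month and will be free for existing customers. "
    "More stories: a review of the latest iPhone case"
)
# Invented example: a sentence where the two keywords really are related.
related_page = "Microsoft released Office for the iPhone this week."

for page in (unrelated_page, related_page):
    print(matches_and(page, "microsoft", "iphone"),
          matches_near(page, "microsoft", "iphone", 5))
```

Both texts satisfy the AND query, but only the second one satisfies NEAR/5, which is the filtering effect described above.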

On Distance Parameters

How close should two words be to be relevant? In other words, what number should you put after NEAR/?

I usually go for the following rule of thumb:

0: to find keywords right next to each other;
3 to 5: to find keywords in the same phrase;
10, 15: to find keywords in the same sentence;
30, 40, 50: to find keywords in the same paragraph.

“Isn’t NEAR/0 the same as using the quotes operator?”

Writing keyword1 NEAR/0 keyword2 is not the same as writing “keyword1 keyword2”, although the results may overlap.

This is for two reasons: first of all, while two keywords under quotes will be found in that same order (e.g. “keyword1 keyword2”), using the proximity operator and 0 as the distance parameter will return any occurrences of those words whether one directly follows or precedes the other. So,

keyword1 NEAR/0 keyword2 = 
("keyword1 keyword2") 
OR ("keyword2 keyword1")
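
As a rough illustration of that equivalence, the Python sketch below treats a quoted phrase as "these words, consecutively, in this order" and NEAR/0 as the same check tried in both orders. The helper names (phrase, near0) are hypothetical, and real tools may tokenise text differently.

```python
import re

def words(text):
    # Assumption: lowercase runs of letters/digits count as words.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase(text, *terms):
    """Quoted-phrase match: the terms appear consecutively, in this order."""
    tokens = words(text)
    wanted = [t.lower() for t in terms]
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def near0(text, a, b):
    """NEAR/0 as described above: adjacent, in either order."""
    return phrase(text, a, b) or phrase(text, b, a)

forward = "the new Microsoft Surface is out"
reverse = "a Surface Microsoft fans will love"

print(phrase(forward, "microsoft", "surface"), near0(forward, "microsoft", "surface"))
print(phrase(reverse, "microsoft", "surface"), near0(reverse, "microsoft", "surface"))
```

The quoted phrase only matches the first text, while NEAR/0 matches both, which is why the two result sets overlap without being identical.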

Secondly, NEAR/0 can often make your query more manageable, especially if you’re dealing with repetitive keywords or phrases. For example, if we’re looking for mentions of the main Microsoft Office applications, instead of writing the following

"microsoft word" 
OR "microsoft excel" 
OR "microsoft access" 
OR "microsoft powerpoint" 
OR "microsoft outlook"...

…you can start with the constant (the main word that keeps repeating, “microsoft”) and put the variables (the only words that change, e.g. “excel”, “access”, “word”) linked together by the OR operator, within parentheses:

microsoft NEAR/0 
(word OR excel OR access 
OR powerpoint OR outlook)

(cue the distributive property.)
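
To see how much shorter the factored query is, here is a small Python sketch that builds both forms from the same list of keywords. The function names are invented for this example, and the expansion assumes the NEAR/0 behaviour described earlier, where each pair matches in either order, so the compact query is slightly broader than the quotes-only list above.

```python
def compact_query(constant, variables):
    """Build: constant NEAR/0 (v1 OR v2 OR ...)."""
    return f"{constant} NEAR/0 ({' OR '.join(variables)})"

def expanded_query(constant, variables):
    """Spell out the same query as quoted phrases, in both word orders."""
    phrases = []
    for variable in variables:
        phrases.append(f'"{constant} {variable}"')
        phrases.append(f'"{variable} {constant}"')
    return " OR ".join(phrases)

apps = ["word", "excel", "access", "powerpoint", "outlook"]
short_form = compact_query("microsoft", apps)
long_form = expanded_query("microsoft", apps)

print(short_form)
print(long_form)
print(len(short_form), "characters versus", len(long_form))
```

Adding another application to the list costs one extra word in the compact form, but two extra quoted phrases in the expanded one.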

This is just one example of how the NEAR operator makes queries not only more accurate but more manageable too, which is essential if you’re building queries under a character limit.

Proximity searches are invaluable when doing social listening. Should your social listening tool have a way to let you do this? Yes, and if it doesn’t already I’d ask your vendor if this is something they’re planning to do anytime soon.

Chris McCormick

another nice article ben, one thing i’d add to the final comparison to using quotes is that you can use wildcards with the near operator where you can’t with quotes… eg: (microsoft NEAR/3 tech*)
Also notice you’re using disqus down here so we’ll be bringing in this comment as of this morning’s disqus data integration! this mention will be coming to a dashboard near you in three… two… one…