On Word Clouds

Word clouds are graphs that visualise the distribution of words in a body of text. In other words, if you have some text, they show you which words appear more frequently, or which words are more important, or which words are more relevant. 

Typically a word cloud shows you the frequency of words: the more often a word is mentioned, the bigger it is in the cloud. Frequency isn’t the only parameter; in fact, you can also use:

  • the colour of words, e.g. words used with a positive connotation can be coloured green, while words used with a negative connotation can be coloured red;
  • the intensity of the colour, e.g. a light shade of green can mean that a word has a weaker positive sentiment, while a deep shade of red can mean that a word has a stronger negative sentiment;
  • the place of a word on an axis, the axis determining time or any other gradable parameter, e.g. recency of word usage from left to right on the x axis.
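
As a rough illustration of the "bigger means more frequent" idea, here is a minimal Python sketch that scales font size linearly with word count. The function name `font_sizes`, the point-size range, and the sample text are all assumptions made for this example; real word cloud tools may well use a different scale.

```python
from collections import Counter


def font_sizes(text, min_pt=10, max_pt=48):
    """Map each word to a font size that grows linearly with its count."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    spread = highest - lowest
    sizes = {}
    for word, count in counts.items():
        # When every word has the same count, fall back to the smallest size.
        scale = (count - lowest) / spread if spread else 0.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Hypothetical sample text: "snow" appears most often, so it gets the largest size.
sizes = font_sizes("snow snow snow train train delay")
```

Colour, intensity, or position could be driven in the same way by swapping the count for a sentiment score or a timestamp.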

There are so many ways to show a word cloud and so many ways to make it useful. However, word clouds do have their pitfalls.

The usefulness of word clouds usually depends on (1) the tool that generates the graph, (2) the final graph (your end product), and (3) the audience the graph is for (i.e. those who are going to read the graph).

Oftentimes, people put more importance on the aesthetics of a word cloud at the expense of its usefulness - a pretty word cloud can be the worst thing for you to show in a presentation, and not every representation of data is accurate or intelligible or useful. 

These are some of the most common pitfalls when it comes to word clouds:

  • No context: have you ever looked at a word cloud and thought to yourself, “what does it all mean?”. If so, then the graph most likely didn’t have any context attached to it. What is this word cloud telling you? Maybe it’s a word cloud from a Facebook thread - in which case, what are the words telling you? Are these the most frequently used words or the words that held the most weight in a conversation? Does this graph show you a static view of words as they are, or does it show you a progression of how these words have been used, from fading themes to trending themes?
  • Not easy to read: some tools cram in way too many parameters and produce poorly designed word clouds, where the shape of the words, their colours (do they denote something, or have they been chosen at random?), or other visual cues aren’t obvious. Other tools squash words together or put words too far apart (with no clear reason justifying the gaps), or they use a font size that is too small or a typeface that is just not suitable (I mean, do you really need a word cloud in Papyrus?). Sadly, I’ve seen way too many word cloud generators use designs that pay no respect whatsoever to accessibility.
  • Not representative of the data: just because you’ve built a word cloud from text data doesn’t mean that you’ve actually represented your data sample. A lot of word clouds suffer from something close to what statisticians call overfitting. Overfitting happens when a model follows its sample data so closely that it loses sight of the bigger picture, which makes it useless if you’re trying to infer meaning. How many word clouds have you seen where the biggest words were conjunctions (and, but, if), articles (the, a, an), or prepositions (to, in, on, after, before)? I’m sure those were some of the most mentioned words, but that’s only because those words are the common building blocks of the English language - so if the word cloud’s purpose is to “visualise the most frequently mentioned words”, a word cloud like that wouldn’t necessarily be wrong, but it would definitely be useless and not fit for purpose. When you follow the formula so closely that the result can’t be applied to pretty much anything, you have something very much like overfitting.

For a word cloud to work it needs to pay attention to (1) linguistics, (2) social context, and (3) readability.

Let’s go through each one, and to make it practical let’s use a real-life example: the recent snowstorm that hit the UK.


Linguistics

Unless you’re trying to see which grammatical parts are most present in your body of text, seeing conjunctions and prepositions is completely useless in a word cloud.

For instance, it’s absolutely useless for me to know that the most used words in this tweet…


…are in (3 occurrences), the and if (2 occurrences each) - a preposition, an article, and a conjunction.

A good word cloud should take into account things like linguistics (particularly semantics), and by default remove the basic building blocks of a sentence that, on their own, do not carry meaning.

If I’m the social community manager of a train company, it’s more useful for me to know that in the replies under our recent post about train disruptions due to the bad weather…

…the most mentioned words were “refund” and “compensation”, not “but” or “not”.
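
Dropping those building blocks is usually called stop-word removal. Here is a minimal sketch of it in Python. The stop-word list is deliberately tiny and the replies are made up for this example; both are assumptions, and a real tool would use a much longer list.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "but", "if", "or", "to", "in", "on",
              "of", "for", "is", "are", "was", "it", "my", "i", "we", "our"}


def content_word_counts(text):
    """Count words after dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Made-up replies under a hypothetical train disruption post.
replies = (
    "Where is my refund? The train was cancelled and I want a refund. "
    "Is compensation on offer if the train is late?"
)
counts = content_word_counts(replies)
```

With the stop words gone, "refund" and "train" come out on top instead of "is" and "the".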

Linguistics should play a bigger role than it currently does in the creation of word clouds for another reason: a lot of tools stop at visualising individual words. As a result, you have a word cloud spitting out a bunch of commonly mentioned words, and that can be useful up to a point. But our communication is bigger than just words. We use emoji, we use phrasal verbs (e.g. “get away with”, “looking forward to”, “put up with”), we use various types of phrases, we use hashtags as an amalgamation of words, and we often use words to call out people, places, organisations, and more. If your word cloud generator doesn’t keep that linguistic context in mind, it’s of little to no use to you.
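
To make that concrete, here is a small Python sketch of a tokenizer that keeps known phrases, hashtags, and @mentions together as single tokens instead of splitting them into individual words. The phrase list, the `@ExampleRail` handle, and the sample sentence are hypothetical, and emoji are not handled here at all.

```python
import re

# Hypothetical phrase list; a real tool would need a far larger lexicon.
PHRASES = ["looking forward to", "put up with", "get away with"]

# Hashtags and @mentions are matched first so they survive as single tokens.
TOKEN_PATTERN = re.compile(r"[#@]\w+|\w+")


def tokenize(text):
    """Split text into tokens, keeping phrases, hashtags and mentions whole."""
    text = text.lower()
    for phrase in PHRASES:
        # Join the words of a known phrase with underscores so it stays one token.
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Not looking forward to the commute #StormEmma @ExampleRail")
```

Plain substring matching like this is naive (it would also match a phrase that happens to sit inside longer words), so treat it as a starting point rather than a finished tokenizer.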

Social context

Word cloud generators should bear in mind certain conventions that we use when we talk to each other online. As mentioned earlier, word clouds should be able to interpret semantics. Take the following phrases:

  • I have a white house
  • We visited the White House
  • She’s a reporter working for the White House
  • We’ve asked the @WhiteHouse for clarification

The same two words (“white” and “house”) carry a different meaning in each of the four sentences:

  • In the first one we just have two common words, “white” and “house”, adjective and noun, nothing special here;
  • In the second sentence “the White House” is a location. A word cloud generator can infer that from the verb that precedes it - “visited”;
  • In the third sentence, “the White House” is used as an organisation. A word cloud generator can infer that from the verb phrase that precedes it - “working for”;
  • In the fourth sentence, @WhiteHouse is an organisational account. A lot of social platforms and social management/listening tools do differentiate between individual accounts (people’s personal accounts) and organisational accounts (company and brand accounts, as well as anything that isn’t an individual account, so you’ll often see bots in this category too).

If you find a word cloud generator that is clever enough to understand that “White House” has different connotations in those four sentences, you’ll be miles ahead of most tools out there. A tool like that should also let you build word clouds based on these different social cues - say you want a word cloud of all the most mentioned places in a body of text, or perhaps the most mentioned organisations, or maybe only the most mentioned people.
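
The cues described above (capitalisation, the words that come before, and the @ sign) can be sketched as a few naive rules in Python. This is a toy built only around the four example sentences: the function name, the cue lists, and the labels are all assumptions, and real tools would more likely rely on a trained entity recognition model than on hand-written rules.

```python
import re

# Hypothetical cue lists, taken from the four example sentences only.
LOCATION_VERBS = ("visited",)
ORGANISATION_VERBS = ("working for",)


def classify_white_house(sentence):
    """Label how 'white house' is used in a sentence, using naive cues."""
    if re.search(r"@WhiteHouse\b", sentence):
        return "account"
    if "White House" in sentence:
        # Capitalised: treat as a named entity, then look at the words before it.
        before = sentence.split("White House")[0].lower()
        if any(verb in before for verb in ORGANISATION_VERBS):
            return "organisation"
        if any(verb in before for verb in LOCATION_VERBS):
            return "location"
        return "named entity"
    if "white house" in sentence.lower():
        return "common words"
    return "not mentioned"


labels = [
    classify_white_house("I have a white house"),
    classify_white_house("We visited the White House"),
    classify_white_house("She's a reporter working for the White House"),
    classify_white_house("We've asked the @WhiteHouse for clarification"),
]
```

Once each mention has a label like this, a word cloud of only places, only organisations, or only accounts is just a matter of filtering by label before counting.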


Readability

A word cloud should be readable to the person who creates it and to its target audience. Sometimes, the creator and the target audience are one and the same. A decision maker should be able to log into whatever social intelligence tool you’re using and create an intelligible word cloud of their own. While that’s not always possible with a lot of social analytics tools, things are changing. Take Brandwatch, for example. They’ve just announced two new features, called Explore + Entities.

Explore gives everyone a window into what’s going on in the data, helping you run quick diagnostics. You can track things like brand health (is there an increasing negative association with my brand? Is there a social crisis brewing?), influencer diagnostics (what topics are the movers and shakers of this conversation? What’s being discussed among influencers?) and content strategy key stats (what do the popular conversations look like on this platform? What are the common brand associations?).

Brandwatch Entities

Brandwatch Explore and Brand Associations

These diagnostic tools let anyone - and I mean anyone - log in and get a feel for what’s going on, without having to depend on the social media analyst to answer, “what’s going on?”. Everyone should be equipped with a tool to help answer that question, and this feature does just that.


Entities helps elevate word clouds from “a group of the most mentioned words in one place” to a more contextual visualisation. You can see the most mentioned locations, or the most mentioned organisations, or the most mentioned people, hashtags, emoji - you name it. You can also segment them by a number of parameters, including gender, sentiment, or even trending level. That last one is super useful if you’re doing content strategy exercises: which topics are hot and which ones are not? Which topics are the most talked about, and which topics are fading? Again, it’s not just about measuring the number of occurrences, but also the bigger picture - maybe one trend is decreasing as a new trend is appearing, so while the new topic might not have as many mentions as the older one yet, it’s one to monitor.

Brandwatch Entities x Storm Emma
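
The rising-versus-fading idea can be sketched in Python by comparing mention counts across two periods. This is not how Brandwatch computes its trending level (that isn't described here); the function name `trend_report` and the mention data are made up for illustration.

```python
from collections import Counter


def trend_report(earlier_mentions, later_mentions):
    """Return (topic, later_count, change) tuples, biggest risers first."""
    earlier, later = Counter(earlier_mentions), Counter(later_mentions)
    topics = set(earlier) | set(later)
    report = [(t, later[t], later[t] - earlier[t]) for t in topics]
    # Sort by change (descending), then by topic name for a stable order.
    return sorted(report, key=lambda row: (-row[2], row[0]))


# Made-up topic mentions for two consecutive weeks.
last_week = ["snow"] * 8 + ["delays"] * 3
this_week = ["snow"] * 5 + ["delays"] * 3 + ["refund"] * 4
report = trend_report(last_week, this_week)
```

In this made-up data "snow" still has the most mentions this week, but it is fading, while "refund" has fewer mentions and is the one rising fastest.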


Tools like these are essential for analysts and marketers alike, to see how the influence and relevance of topics ebb and flow over time. 

I could talk for days about the importance of these features, and the importance of having a word cloud that makes sense, but you’ll have to try them for yourself to see how you can put that into practice in your own scenarios. Book a demo with Brandwatch to see these features in action, and tell them Ben sent you.

PS for more info on Brandwatch Explore + Entities, click here.
