Twitter APIs vs Twitter Firehose: Why The Difference Matters

If you’re looking to listen to the chatter on social media, specifically Twitter, you need to have a tool that can adequately do so based on your needs. Thankfully there are lots of tools out there that can help you with social listening on Twitter. However, while a lot of these tools (free and premium) will say that “they have access to Twitter”, it’s worth enquiring about the type of access they have. This can dictate not only the accuracy of your search results, but also how many mentions you’ll get, as well as their recency (i.e. real-time mentions or past mentions).

There are currently three ways to access Twitter:

  • Search API
  • Streaming API
  • Twitter Firehose

While you may have already heard of “Twitter Firehose”, it’s worth explaining what the difference is between these three types of access, and how they can impact you.

However, before we begin, it’s worth clarifying a very important detail: what’s an API?

What is an API?

API stands for Application Programming Interface, and it's how an application can be used by other applications.

Think of the API as the bridge between Application A and Application B - this bridge defines the rules by which Application B can communicate with Application A.

In our case, Twitter APIs state how other applications and devices can communicate with Twitter.

Twitter has several APIs, each serving a different function - searching, posting, etc. So, for instance, if you use a third-party Twitter application like Hootsuite to post to Twitter, it’ll use the POST APIs to communicate with Twitter and tell it to post your tweet.

When it comes to searching on Twitter, a tool like Hootsuite has three ways of accessing the platform, starting with the most commonly used way: the Twitter Search API.

Twitter Search API

The first API that tools can use to search on Twitter is the Search API. This lets you search through past tweets. You can get a feel for how this works by running a search on Twitter itself. Keywords, usernames and places are the main things you can search for through this API.

When you query Twitter via the Search API, the maximum number of search results Twitter returns to you is 3,200 (with a limit of 180 searches every 15 minutes).

Due to these restrictions, you are limited in the volume of matching tweets that Twitter will return to you.
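To make these limits concrete, here is a rough Python sketch of the arithmetic behind them. The constants are the figures quoted above; the page size of 100 results per request and the function names are assumptions made purely for illustration, and nothing here calls Twitter.

```python
import math

# Figures quoted in the article above.
MAX_RESULTS = 3200          # most results returned for one query
REQUESTS_PER_WINDOW = 180   # searches allowed per rate-limit window
WINDOW_SECONDS = 15 * 60    # one 15-minute window, in seconds

# Assumption for illustration only: each request returns up to 100 tweets.
PAGE_SIZE = 100


def seconds_between_requests() -> float:
    """Spacing that keeps a steady run of searches under the rate limit."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW


def requests_needed(wanted: int, page_size: int = PAGE_SIZE) -> int:
    """Requests needed to page through `wanted` results, capped at MAX_RESULTS."""
    capped = min(wanted, MAX_RESULTS)
    return math.ceil(capped / page_size)


if __name__ == "__main__":
    print(seconds_between_requests())  # 5.0
    print(requests_needed(10_000))     # 32
```

In other words, under these assumptions a tool can run one search roughly every five seconds, and asking for 10,000 matching tweets still stops at the 3,200 cap.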

Twitter Streaming API

Another API that Twitter offers is the Streaming API. Unlike the Search API, the Streaming API gives you tweets in (near) real-time as long as they match your search query (e.g. keywords, usernames, places).

However, like most APIs, there’s a limitation here - out of all the tweets that match your search query in real-time, Twitter only returns a small percentage of them (source).

How many tweets you get depends on various factors, such as overall demand on Twitter, which is something to look out for during periods of high traffic (e.g. during the World Cup). However, the main factor is how specific or broad your search query is. To illustrate:

You decide to search for all tweets that contain the words “social media”:

  • If the tweets that contain “social media” make up less than 1% of all Tweets currently being posted on Twitter, you’ll receive all the mentions that match your query;
  • If a lot of people are talking about social media, raising the volume above 1% of all Tweets currently being posted, you’ll only get a sample stream of the tweets.
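The two cases above can be sketched as a small Python function. This is a simplified model, not Twitter's actual sampling logic: it assumes that once a query exceeds the 1% threshold, the delivered volume is held at exactly 1% of all tweets. The function name and the example rates are hypothetical.

```python
STREAM_CAP = 0.01  # the 1% threshold described above


def delivered_share(matching_per_sec: float, total_per_sec: float) -> float:
    """Rough share of matching tweets you would receive.

    Assumption: above the cap, delivery is held at 1% of total volume.
    """
    if total_per_sec <= 0:
        raise ValueError("total_per_sec must be positive")
    if matching_per_sec <= 0:
        return 1.0  # nothing matches, so nothing is dropped
    cap = STREAM_CAP * total_per_sec
    return min(1.0, cap / matching_per_sec)


if __name__ == "__main__":
    # Hypothetical rates: 6,000 tweets per second across all of Twitter.
    print(delivered_share(30, 6000))   # under 1% of traffic: everything arrives
    print(delivered_share(120, 6000))  # 2% of traffic: roughly half arrives
```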

One way to overcome this limitation is by refining your query: if you combine “social media” with other keywords like “marketing” and “analytics” (or other parameters like location), you’ll increase the chance of retrieving all (if not most) matching tweets. So, as a rule of thumb, the more keywords you combine, the lower the chance that more than 1% of all current Tweets match your query.
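As a toy illustration of why combining keywords helps, the Python sketch below measures what share of a handful of made-up tweets matches a broad query versus a narrower one. The sample tweets and the `match_share` helper are invented for this example, and plain substring matching stands in for whatever query operators the real API supports.

```python
def match_share(tweets, keywords):
    """Fraction of tweets that contain every keyword (case-insensitive)."""
    if not tweets:
        return 0.0
    wanted = [k.lower() for k in keywords]
    hits = sum(1 for t in tweets if all(k in t.lower() for k in wanted))
    return hits / len(tweets)


# Made-up sample standing in for the live stream of tweets.
sample = [
    "Social media marketing tips for small brands",
    "Our social media analytics dashboard is live",
    "Social media is exhausting today",
    "Watching the World Cup final",
]

broad = match_share(sample, ["social media"])                # 0.75
narrow = match_share(sample, ["social media", "marketing"])  # 0.25

if __name__ == "__main__":
    print(broad, narrow)
```

The narrower query matches a smaller share of the stream, which is what keeps it under the threshold on real traffic.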

Why the limitations? It ultimately comes down to Twitter’s current infrastructure - there needs to be something bigger that can support all of the tweets and all the queries that may match them, in (near) real-time. The answer to that is the Twitter Firehose.

Twitter Firehose

Just like the Twitter Streaming API, the Twitter Firehose gives you all the tweets in (near) real-time. However, unlike the Streaming API, there are no caps or limitations on the number of search results you can have - you’re guaranteed to have all the tweets that match your search queries.

While this is a great advantage, it doesn’t come for free. The Firehose is very costly, especially for individual users. Hence, the best option is to use a tool that has full access to the Twitter Firehose.

At the moment, only a handful of Certified Product Partners have access to the Twitter Firehose, and the vast majority of them only offer 1-2 years of Twitter data (mainly due to the costs involved).

Other tools only have partial access to the Firehose, like BuzzFinder, a social analytics tool that only draws from the Japanese Firehose due to its customer base.

A special mention goes to Brandwatch, the only tool in the industry that currently provides access to the full historical Twitter data since Twitter’s inception in 2006. This is thanks to Brandwatch’s latest feature, Hindsight. This is definitely (and highly) recommended if you’re looking to uncover trends, do in-depth analysis, track performance over time, and much more.

Does It All Really Matter?

Now that we’ve covered the essential differences between the three types of access to Twitter, you’ll see why it matters to check which type your tool makes use of.

If you’re looking for a sample of tweets for a social listening report, or if you’re looking to analyse past trends, then accessing Twitter via the Search API may be enough for you.

If you’re looking to implement a data visualisation monitor (e.g. through Tweetdeck, or an in-house built tool) to look at specific mentions in real-time, then the Streaming API is great for this.

However, if you’re looking to implement real-time social monitoring, whether it’s for an ad-hoc situation (e.g. an event you’re following or a brand crisis you’re going through), or a permanent “digital strategy”, then the Twitter Firehose is your best option.

Don’t just rely on tool providers and account managers to tell you that they have access to Twitter - make sure you check what you have access to, so you’re aware of what you’re getting in return.

In a Nutshell

To recap:

  • Search API searches in the past (limited in search results - 3,200 at most);
  • Streaming API searches in (near) real-time (number of results is sampled);
  • Twitter Firehose tracks all tweets, past and present (unlimited search results, with costs involved).
