We’ve just wrapped up another eventful season of the annual phenomenon that is the Indian Premier League. Records were broken, trophies and caps won, and careers and fortunes made in a span of 6 high-octane weeks. Fans, being the lifeblood of any successful sporting event, were given extra-special attention this time around. The millions of tweets that social-media-savvy fans across India posted during the tournament formed a growing corpus of social content that was used throughout the IPL to engage fans as well as gauge sentiment.
A host of interesting analyses and visualizations were published on the official IPL website and its Facebook and Twitter pages, and broadcast on TV, to feed the frenzy. However, there were a few glaring and some subtle errors in data collection as well as visualization, and I could only surmise the confusion that must have prevailed among the millions of people exposed to these. Nonetheless, I would push any nagging sense of unease to the back of my mind and enjoy the game(s). That was until the IPL Final, when things finally came to a head.
Facebook Fan Map
The Facebook Fan Map is an India map that supposedly shows a state-wise split of supporters. WHO SUPPORTS WHOM? reads the tagline below it. Well, beats me! I can’t make any reasonable inference from the map except that the purple shading indicates KKR has considerable support compared to KXIP in East and South India. My initial peeves with this map are:
- What is the criterion for shading a state Purple or Light Grey – surely each state would have supporters of both teams? Would a clear margin be required between the 2 figures for a final shade? If so, is that margin based on absolute or percentage values?
- Light Grey, really? Wouldn’t Red have been an infinitely better colour to clearly indicate KXIP support, especially considering that state borders are light grey as well? The map shows only 3 states (one very large one stretching across India) supporting KXIP!
- What do the states shaded Dark Grey represent? Those where a clear margin hasn’t been recorded? Or those where not enough tweets came in? Why are these more prominent than the KXIP states?
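To make the first and third peeves concrete, here is a minimal sketch of what an explicit shading rule could look like. This is not how the official map was built (that rule was never published); the function name, the category labels and both thresholds are my own assumptions, chosen only to show that the criterion has to be stated somewhere.

```python
from typing import Literal

Shade = Literal["team_a", "team_b", "too_close", "insufficient_data"]


def shade_state(
    votes_a: int,
    votes_b: int,
    min_votes: int = 100,          # assumed cut-off for "enough data"
    min_margin_pct: float = 10.0,  # assumed margin needed to call a state
) -> Shade:
    """Decide how to shade one state from the two teams' supporter counts."""
    total = votes_a + votes_b
    if total < min_votes:
        # Too few responses to say anything about this state.
        return "insufficient_data"

    # Percentage margin: the lead as a share of all responses in the state.
    margin_pct = abs(votes_a - votes_b) / total * 100
    if margin_pct < min_margin_pct:
        return "too_close"

    return "team_a" if votes_a > votes_b else "team_b"
```

With a rule like this printed in the legend, a dark grey state would have an unambiguous meaning, and "too close" and "not enough data" could get different shades.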
For the uninitiated, this kind of representation, where geographical areas are colour-filled, is a specific type of map visualization called a choropleth: a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income (source: Wikipedia).
Choropleths are intrinsically prone to erroneous interpretation, which we can examine within the context of Facebook Fan Maps from 2 other matches:
MATCH 35: RCB vs RR
On close examination we find that 10 states support RCB while only 7 states support RR.
However, the map draws our eyes towards the large states in Central and Northern India because of their geographical size, and viewers may be prone to declare RR the fan map victor, or at least call it a tie. The commentators went with a tied verdict on the day.
MATCH 39: SH vs KXIP
At first glance, the aerial view shows a close tie. A count confirms that the 2 teams are nearly even stevens: 9 and 10 states. Now, is one team more equal than the other? What about the actual number of people in each state? Wouldn’t Maharashtra, UP and Bihar, with their high populations, add considerable weight to KXIP? Then again, which states have a large tech-savvy population?
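The gap between "states won" and "fans won" is easy to demonstrate. The sketch below compares the two tallies for the same data; the function name is my own, and the figures in the usage note are invented placeholders, not real IPL numbers.

```python
def summarise(fans_by_state: dict[str, tuple[int, int]]):
    """Return (states won per team, total supporters per team).

    fans_by_state maps a state name to (supporters of A, supporters of B).
    """
    states_won = {"A": 0, "B": 0}
    supporters = {"A": 0, "B": 0}
    for a, b in fans_by_state.values():
        supporters["A"] += a
        supporters["B"] += b
        if a > b:
            states_won["A"] += 1
        elif b > a:
            states_won["B"] += 1
        # An exact tie in a state is credited to neither team.
    return states_won, supporters
```

For example, summarise({"Small1": (30, 10), "Small2": (25, 15), "Big": (100, 400)}) gives team A two states to team B's one, yet team B has 425 supporters to team A's 155. A map coloured by state would show A ahead; a column chart of totals would show the opposite.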
If I were to visualize the fan information for the IPL FINAL, I might look to do away with the map altogether! Surely the same information could be conveyed very easily through simple column charts? But, if I do decide to use a map (choropleth, to be exact), I would overlay it with simple bubble or pie charts to circumvent any visualization inference issues due to state boundaries. I’ve created a quick sample on Tableau Public below:
Interact with and download the Tableau workbook here.
Let’s back up a bit and examine some of the individual components of IPL social media activity for inconsistencies in both data collection as well as data visualization.
Firstly, the official IPL website has a tweet counter for the hashtag #PepsiIPL that reads: 4827316. Call me pedantic, but I’d like to know the context for this number:
- Does it represent the number of tweets posted in the last day, the last week, the last month, across this year’s tournament, or across years?
- Alternatively, when was the screenshot taken? Wouldn’t a current time/date on the tweet counter be pertinent?
- Does it include retweets, or only unique tweets?
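All three questions boil down to parameters that any counter must fix, whether or not it displays them. Here is a rough sketch under my own assumptions: the Tweet record and its fields are hypothetical stand-ins, not the Twitter API, and I have no knowledge of how the official counter was actually computed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record; field names are illustrative only.
    text: str
    created_at: datetime
    is_retweet: bool


def count_tweets(
    tweets: list[Tweet],
    hashtag: str,
    start: datetime,
    end: datetime,
    include_retweets: bool = True,
) -> int:
    """Count tweets containing `hashtag` posted in the window [start, end)."""
    tag = hashtag.lower()
    count = 0
    for tweet in tweets:
        # Simple substring match, so a longer tag with the same prefix also matches.
        if tag not in tweet.text.lower():
            continue
        if not (start <= tweet.created_at < end):
            continue
        if tweet.is_retweet and not include_retweets:
            continue
        count += 1
    return count
```

The same set of tweets yields different totals depending on the window and the retweet flag, which is exactly why a bare number without those details is hard to interpret.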
Twitter Player Battles
Another interesting addition was the Twitter Player Battle where fans were encouraged to tweet in support of their star player from the team of their choice in any match, as shown below for the IPL Final (Sehwag vs Gambhir).
The default format of the tweet was:
I’m backing #Sehwag in the @IPL #playerbattles. Vote for your choice now http://bit.ly/P3gs0x #PepsiIPL
The issue with this form of data collection is that while Sehwag has been mentioned using the #Sehwag hashtag, there is really no way to tell whether the sentiment was positive or negative! I am also fairly confident that no form of text processing or sentiment analysis was done to derive the true intent of each tweet.
For example, I could easily have tweeted the above (there was another version with more colourful language, but that was processed and edited out before publishing this blog ;)), and my vote would still have counted FOR Sehwag rather than against him! You get the idea of how misleading this can be, especially when even major news agencies use these as the basis of their articles (read: http://zeenews.india.com/sports/cricket/ipl-7/ipl-7-kings-xi-punjab-pips-kolkata-knight-riders-on-twitter-battle_788746.html).
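To illustrate the gap, here is a deliberately crude sketch that contrasts counting by hashtag alone with a count that first screens for negative wording. I am assuming the official tally worked like the first function, which I cannot confirm. The cue-word list is an invented placeholder, and a keyword screen is no substitute for proper sentiment analysis; it only shows that the two approaches can disagree.

```python
import re

# Hypothetical cue words, for illustration only; a real system would need far more.
NEGATIVE_CUES = {"not", "never", "worst", "overrated", "flop"}


def naive_vote(text: str, player: str) -> bool:
    """Count a vote whenever the player's hashtag appears, whatever the tone."""
    return f"#{player.lower()}" in text.lower()


def flagged_negative(text: str) -> bool:
    """Crude screen: does the tweet contain any negative cue word?"""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & NEGATIVE_CUES)


def cautious_vote(text: str, player: str) -> bool:
    """Count a vote only if the hashtag appears and no negative cue is found."""
    return naive_vote(text, player) and not flagged_negative(text)
```

The default tweet passes both checks. A made-up tweet such as "#Sehwag is the worst pick in the @IPL #playerbattles #PepsiIPL" still counts as a vote under naive_vote, but is rejected by cautious_vote.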
At BRIDGEi2i, we have designed and implemented rigorous algorithms and robust data visualization solutions for blue-chip clients using both structured and quantitative data as well as qualitative and new-age unstructured data. Feel free to give us a shout with your comments about the blog post or to get in touch with us and discuss your analytics and visualization needs.
This blog is authored by Farid Jalal, Business Analytics Expert at BRIDGEi2i
The views and opinions expressed in this article are those of the author and do not necessarily reflect the official position or viewpoint of BRIDGEi2i.