Behind the Scenes of a Live World Cup

This year’s World Cup coverage put a heavy emphasis on live, in-game updates and analysis.

As the Sports department planned its live-blogging strategy, staff in Graphics and Interactive News started thinking about what sorts of interactive features would best complement live game coverage. Eventually, we arrived at three collaborations:

The Live Tracker

Without question, the live tracker was one of the most ambitious live-update projects we’ve ever attempted on NYTimes.com. During game runs, we refreshed the data files every 2 seconds; the Flash client embedded across the blogs and home page editions picked up fresh data every 15 seconds.

[Screenshot: the live game tracker for the World Cup championship]

This left us with two key problems: How could we reliably refresh game data every two seconds? And how could we ensure we were meeting the massive scale of hundreds of thousands of Flash clients?

Feeding the Data

For live game data, The Times partnered with Match Analysis. Match Analysis' main business is providing analytic tools to professional soccer teams (MLS, German and Mexican clubs, among others), but the company agreed to produce a special data feed for us detailing each match, touch by touch and pass by pass.

Because of the range of data we hoped to include in the tracker (touches, passes, goals, shots, ball locations — both singly and in the aggregate), we needed to ingest twelve separate XML files from the feed to generate a single XML file for the Flash.

Sequential file ingestion quickly proved too slow. After considering various threading and caching strategies, we eventually turned to a combination of disk caching and Paul Dix’s excellent typhoeus gem. Typhoeus provides a simple Ruby wrapper around libcurl-multi, enabling us to load several files in parallel.
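Roughly, the parallel fetch looks like the sketch below. It is written against the current Typhoeus API rather than our production code, and the feed URLs are placeholders:

    require 'typhoeus'

    # Queue several feed requests on one hydra so libcurl-multi runs them in parallel.
    # The URL below is a placeholder, not a real Match Analysis endpoint.
    hydra = Typhoeus::Hydra.new

    requests = %w[match-info key-events players lineups].map do |name|
      request = Typhoeus::Request.new("http://example.com/feed/#{name}.xml")
      hydra.queue(request)
      request
    end

    hydra.run  # blocks until every queued request has completed

    bodies = requests.map { |request| request.response.body }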

This way, ingestion time was bound only by the time of the slowest request. Using erubis for XML templating helped minimize file generation time, and additional Ruby classes handled various calculations and re-aggregations as the data was loaded.
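The templating step itself is simple; a minimal sketch, with an invented template and placeholder data rather than the production schema, looks something like this:

    require 'erubis'

    # Compile the XML template once, then re-render it as fresh data arrives.
    # The template and variable names are made up, not the production schema.
    template = Erubis::Eruby.new(<<~XML)
      <game shots="<%= totals[:shots] %>">
      <% events.each do |e| %>  <event minute="<%= e[:minute] %>" type="<%= e[:type] %>" />
      <% end %></game>
    XML

    events = [{ minute: 12, type: 'goal' }]  # placeholder data
    totals = { shots: 9 }                    # placeholder data

    File.write('tracker.xml', template.result(binding))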

By staggering requests and file regeneration into two separate threads (game events such as goals were pulled from the feed every 2 seconds; minute-by-minute totals and aggregates were refreshed only every 15 seconds), we were able to reliably meet refresh targets at a reasonable load on an Amazon EC2 high-CPU medium instance.
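In outline, the staggering amounts to two polling loops on separate threads; the refresh methods below are stand-ins for the real ingest code:

    # Hypothetical refresh methods standing in for the real feed/ingest code.
    def refresh_game_events; end  # goals, cards and other key events
    def refresh_aggregates;  end  # minute-by-minute totals and aggregates

    fast = Thread.new { loop { refresh_game_events; sleep 2  } }
    slow = Thread.new { loop { refresh_aggregates;  sleep 15 } }

    [fast, slow].each(&:join)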

Client-Side

Of course, server-side data production is only half of any live-tracker equation. Some simple, obvious steps (gzipping, slimming extraneous data and XML structure) helped hold a full game file to around 11 KB. We went further and split updates into two parts: a full file read on SWF init, and an updates file limited to events from the last minute of game play. The SWF polled the update file, a relatively svelte 2.6 KB, every 15 seconds.
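On the generation side, the split is straightforward. In the sketch below the event structure and file names are invented, but the idea is the same: one complete file, plus a small file holding only the most recent minute of events:

    # Hypothetical event records; the real ones came from the Match Analysis feed.
    Event = Struct.new(:at, :xml)

    def write_tracker_files(events, dir)
      # Full file: read once when the SWF initializes.
      File.write(File.join(dir, 'game.xml'),
                 "<game>#{events.map(&:xml).join}</game>")

      # Updates file: only events from the last minute, polled every 15 seconds.
      recent = events.select { |event| event.at > Time.now - 60 }
      File.write(File.join(dir, 'updates.xml'),
                 "<updates>#{recent.map(&:xml).join}</updates>")
    end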

To handle the unpredictable and potentially significant scale, we turned to the cloud for hosting. We considered various schemes involving Amazon CloudFront to reduce end-user latency, but we eventually decided to serve files directly from Amazon S3, given the ease and immediacy of file updates.

Game Operations

The 64 matches were distributed among the Interactive News staffers. The publication process was largely automated, but a staff member stayed on hand to monitor the process and coordinate with the other groups involved in the effort, such as SMS and mobile.

To help us spot on-air trouble quickly, I used term-ansicolor to colorize the game-run terminal display. Here’s a view from the driver’s seat:

[Screenshot: loading and processing stats during a game]
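The coloring itself is just term-ansicolor's String mixin; the status messages below are invented for illustration:

    require 'term/ansicolor'

    # Mix the color helpers into String so status lines can be colorized inline.
    class String
      include Term::ANSIColor
    end

    puts 'feed refresh OK (1.2s)'.green
    puts 'GOAL detected, regenerating full file'.yellow.bold
    puts 'feed request timed out'.red.bold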

Organizing XML

As the number of XML files grew, we created a light wrapper around Typhoeus’ multi-request queue to keep things organized. A slightly modified version of the base class can be found on GitHub.

Using this class, creating a group of files to fetch became very simple:

    class ResultsXML < BaseFetcher
 
      def fetch
        queue_request(:match_info, 'get-match-info-xml')
        queue_request(:key_events, 'get-key-events-xml')
        queue_request(:players,    'get-players-xml')
        queue_request(:lineups,    'get-lineups-xml')
        super
      end
 
    end

And actually fetching the XML became even simpler:

    r = ResultsXML.fetch
    r[:key_events].search("//goal")
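The base class itself isn't reproduced here, but a rough approximation of the idea (queue named requests on a shared hydra, run them in parallel, and hand back parsed documents keyed by name) might look like this. The base URL and the use of Nokogiri for parsing are assumptions, not the published code:

    require 'typhoeus'
    require 'nokogiri'

    # A rough approximation of the fetcher base class described above.
    # BASE_URL and the Nokogiri parsing are assumptions, not the published code.
    class BaseFetcher
      BASE_URL = 'http://example.com/feed'  # placeholder endpoint

      def self.fetch
        new.fetch
      end

      def initialize
        @hydra    = Typhoeus::Hydra.new
        @requests = {}
      end

      # Subclasses call queue_request for each named file, then `super` to run them.
      def queue_request(name, path)
        request = Typhoeus::Request.new("#{BASE_URL}/#{path}")
        @hydra.queue(request)
        @requests[name] = request
      end

      def fetch
        @hydra.run  # all queued requests execute in parallel
        @requests.each_with_object({}) do |(name, request), docs|
          docs[name] = Nokogiri::XML(request.response.body)
        end
      end
    end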

The Leader Board

We built out our team standings table in a Ruby on Rails app. Rails produced HTML from the Match Analysis data, and the HTML was then server-side included into our content management system.
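In spirit, the publishing step renders the standings template to a string and writes it where the CMS's server-side include can pick it up. The sketch below uses a current Rails API, and the template, model and path names are invented:

    # Hypothetical sketch: render the standings table to a static HTML fragment
    # for server-side inclusion. Template, model and path names are invented.
    html = ApplicationController.render(
      template: 'standings/table',
      layout:   false,
      assigns:  { teams: Team.order(:group, :rank) }
    )

    File.write('/var/www/includes/worldcup_standings.html', html)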

Handling the hover action in such a sprawling table became a challenge. In an early draft, the mouseover event on each cell directly modified styles on other cells. Though this performed marginally well in Firefox, it was unacceptable in Internet Explorer.

The published version wraps the table in two container divs. A series of precomputed styles is inserted:

    #hoverContainer.nytwc-team-MEX td.nytwc-team-MEX,
    #hoverContainer.nytwc-team-USA td.nytwc-team-USA,
    /* ... one for each country ... */
    .nytint-genericCell { background-color: #cccccc !important; }

and a country-specific class name is applied to each table cell:

    <td class="nytwc-standing-right nytwc-team-URU  out-of-cup "></td>

When a cell is hovered over (or clicked), the handler simply applies the corresponding country-specific class to the container:

    var handleHover = function() {
      var hoverClass = 'nytwc-team-' + getCountryId(this.className);
      hoverContainer[0].className = hoverClass;
    };

With this approach, we no longer needed to inspect and modify each table cell; instead, we took advantage of the browser's event bubbling and precomputed CSS rules to handle mouse events.

Facebook Popularity

For some time, folks at The Times have been interested in examining how social media references track news events. The World Cup presented an opportunity to collaborate with Facebook on a piece analyzing player mentions.

In consultation with our Graphics department, Facebook developed a JSON-based API allowing us to query for term frequencies at various time scales. We, in turn, built a series of scripts to query the API for player names and construct a set of tab files containing both overall rankings and minute-by-minute details for top-ranked players.
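The API itself was private, so the sketch below only gestures at the shape of those scripts; the endpoint, parameters and field names are all invented:

    require 'net/http'
    require 'json'
    require 'uri'

    # Hypothetical sketch: query a term-frequency endpoint for each player name
    # and write a tab-delimited summary. None of the URL, parameter or field
    # names below are the real (private) Facebook API.
    def mention_counts(player, interval: 'minute')
      uri = URI('https://example.com/term_frequency')
      uri.query = URI.encode_www_form(term: player, interval: interval)
      JSON.parse(Net::HTTP.get(uri))
    end

    players = ['Landon Donovan', 'Diego Forlan', 'David Villa']

    File.open('player_rankings.tsv', 'w') do |f|
      players.each do |player|
        counts = mention_counts(player)
        f.puts [player, counts['total']].join("\t")
      end
    end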

One of the most difficult challenges in this piece was detecting and addressing term ambiguity: players with common names (“Rodriguez”) or ambiguous names (“Green”) proved problematic. Facebook’s internal analytics staff provided invaluable assistance in tuning these queries.

Here, we experimented in fronting Amazon S3 with the Softlayer origin-pull CDN to address latency issues we’ve experienced in the past with moderate-volume S3 files. In New York, Softlayer reduced average retrieval on the main data file from 16 ms to 7 ms over 5,000 requests and 50 concurrent connections.

This was also our first project to use s3-publisher, a Ruby gem that takes care of gzipping and setting common headers when pushing files to S3 for public consumption.
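This isn't s3-publisher's own interface; it's a rough sketch of the work the gem automates (gzip the body and set the headers public clients expect), written against the plain aws-sdk-s3 client with placeholder bucket and key names:

    require 'aws-sdk-s3'
    require 'zlib'

    # What a gem like s3-publisher handles for you: gzip the payload and set
    # the headers public clients need. Bucket and key names are placeholders.
    s3   = Aws::S3::Client.new(region: 'us-east-1')
    body = File.read('player_rankings.tsv')

    s3.put_object(
      bucket:           'example-bucket',
      key:              'worldcup/player_rankings.tsv',
      body:             Zlib.gzip(body),
      content_encoding: 'gzip',
      content_type:     'text/tab-separated-values',
      cache_control:    'max-age=15'  # short TTL so clients pick up fresh data
    )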

The Take-Away

The World Cup occupied a unique space in our live-update repertoire: though updates came more quickly and at a more granular level than ever before, they did not reach the full sweep of a project such as the Olympics. We hope to use the lessons learned here to craft a faster, more comprehensive data report in future efforts.
