A decades-old global video game developer, publisher, and hardware company generates several gigabytes of telemetry data every day. Its successful game franchises attract millions of players on iOS and Android devices daily, and their clicks and interactions produce highly valuable data.
Aggregated and analyzed, this data holds the key to player engagement and retention. It includes event streams that record when players play, for how long, which levels they reach, and how much money they spend on virtual goods such as new levels and avatars. Social networks complement this information with details about players’ real-life preferences.
To analyze this data and turn it into business intelligence, the company needed an analytics platform.
The solution was built in the following steps:
Solution Architecture
The solution was built on Amazon Web Services (AWS). Software Development Kits (SDKs) for the different game technology platforms, including Android and iOS, enabled the games to push event data to the data collection server with minimal programming.
Data Collection
Data was collected by a server built on Node.js, which receives high-velocity data from players’ mobile devices and writes it in real time to folders in an Amazon S3 (Simple Storage Service) bucket. Node.js provides an event-driven architecture and a non-blocking I/O API, both of which help the server's throughput and scalability.
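The "folders" on S3 are really key prefixes, and a predictable prefix layout makes it easy for a later batch job to pick up one time window at a time. The sketch below shows one possible layout, written in Python for brevity rather than in Node.js. The prefix scheme (`events/game=.../dt=.../hour=...`), the newline-delimited JSON encoding, and the function names are all assumptions made for illustration; the original solution's layout is not described.

```python
import json
from datetime import datetime, timezone


def build_object_key(game_id, received_at, batch_id):
    """Return an S3-style key grouping event batches by game, date, and hour.

    The layout is hypothetical; it only illustrates time-partitioned prefixes.
    """
    ts = received_at.astimezone(timezone.utc)  # normalize to UTC before partitioning
    return (
        f"events/game={game_id}/"
        f"dt={ts:%Y-%m-%d}/hour={ts:%H}/"
        f"{batch_id}.jsonl"
    )


def serialize_batch(events):
    """Encode a batch as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in events)
```

A collection server would buffer incoming events briefly, call `serialize_batch` on the buffer, and upload the result under the key returned by `build_object_key`. The upload itself is left out here so the sketch stays free of any AWS client dependency.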
An Amazon EMR cluster processed the collected event streams from S3 multiple times every day. For each batch, a cluster was launched on demand and sized to the data volume; it processed the batch, wrote the results back to S3, and was then shut down to save on costs.
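Sizing a cluster to the data volume can be as simple as dividing the batch size by a per-node capacity and clamping the result. The following is a minimal sketch of that idea only; the 5 GiB per node figure, the bounds, and the function name are illustrative assumptions, not values from the original solution.

```python
def nodes_for_batch(batch_bytes, bytes_per_node=5 * 1024**3,
                    min_nodes=2, max_nodes=20):
    """Pick a node count proportional to batch size, clamped to [min, max].

    All defaults are hypothetical tuning values.
    """
    if batch_bytes < 0:
        raise ValueError("batch_bytes must be non-negative")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    wanted = -(-batch_bytes // bytes_per_node)
    return max(min_nodes, min(max_nodes, wanted))
```

The lower bound keeps very small batches from starting an undersized cluster, and the upper bound caps the cost of an unusually large batch.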
MapReduce jobs validated and cleaned the event data and wrote the results back to S3. Hive jobs then further processed these files to generate facts, dimensions and aggregated facts for later analysis.
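The validation and cleaning step can be pictured as a function applied to each raw event, followed by de-duplication. The sketch below is plain Python rather than an actual MapReduce job, and the field names (`player_id`, `game_id`, `event_type`, `timestamp`) and the rules applied are assumptions chosen for illustration; the source does not describe the real event schema.

```python
# Hypothetical schema: the fields every event is assumed to carry.
REQUIRED_FIELDS = ("player_id", "game_id", "event_type", "timestamp")


def clean_event(raw):
    """Return a normalized event dict, or None if the event is invalid."""
    if any(raw.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # a required field is missing or empty
    try:
        ts = int(raw["timestamp"])
    except (TypeError, ValueError):
        return None  # timestamp is not an integer
    if ts <= 0:
        return None
    return {
        "player_id": str(raw["player_id"]).strip(),
        "game_id": str(raw["game_id"]).strip(),
        "event_type": str(raw["event_type"]).strip().lower(),
        "timestamp": ts,
    }


def clean_batch(raw_events):
    """Validate every event and drop exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for raw in raw_events:
        event = clean_event(raw)
        if event is None:
            continue
        key = tuple(event[field] for field in REQUIRED_FIELDS)
        if key not in seen:
            seen.add(key)
            cleaned.append(event)
    return cleaned
```

In a MapReduce setting, `clean_event` would correspond roughly to the map phase, and the de-duplication to a reduce keyed on the event fields.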
Data Persistence and Visualization
To support rapid query and analysis, the Hive output was loaded into a data mart built on MySQL (Amazon Redshift was later tried as well), and Tableau was used to create dashboards and interactive charts.
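To make the data mart step concrete, here is a small sketch that loads aggregated facts into a table and runs the kind of query a dashboard might issue. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table name `fact_daily_revenue` and its columns are hypothetical, and `INSERT OR REPLACE` is SQLite syntax (in MySQL the rough equivalent would be `REPLACE INTO`).

```python
import sqlite3


def load_daily_revenue(conn, rows):
    """Load (dt, game_id, country, paying_players, revenue_cents) rows.

    Re-loading the same (dt, game_id, country) key overwrites the old row,
    so a batch can be re-run safely.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_daily_revenue (
               dt TEXT NOT NULL,
               game_id TEXT NOT NULL,
               country TEXT NOT NULL,
               paying_players INTEGER NOT NULL,
               revenue_cents INTEGER NOT NULL,
               PRIMARY KEY (dt, game_id, country))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO fact_daily_revenue VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def revenue_by_game(conn):
    """Return total revenue per game, the kind of rollup a chart would show."""
    cur = conn.execute(
        "SELECT game_id, SUM(revenue_cents) FROM fact_daily_revenue "
        "GROUP BY game_id ORDER BY game_id"
    )
    return cur.fetchall()
```

Keying the table on date, game, and country means a re-processed batch replaces its earlier output instead of double-counting it.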
As a result of this solution, the game company gained valuable insights, including:
- The conversion rates of players from free to paying customers based on geography, game title, and other dimensions
- The skew in the distribution of paying customers (a small number of players accounted for a large part of the total spending)
- An understanding of each player’s playtime across multiple games (surfacing opportunities for cross-promotion within each game)
- Detection of fraud by comparing the game’s telemetry data about purchases with the app store data for in-app purchases (it turned out that hackers had exploited a vulnerability in the game design, which was quickly corrected)
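Three of the insights above reduce to simple calculations once the data is aggregated. The sketch below illustrates them in Python under stated assumptions: the record shapes, the field name `is_paying`, the 10% cutoff for "top spenders", and the use of a transaction ID to match telemetry against app store records are all hypothetical and are not taken from the original solution.

```python
from collections import defaultdict


def conversion_rate_by(players, dimension):
    """Share of paying players per value of `dimension` (e.g. country)."""
    totals = defaultdict(int)
    paying = defaultdict(int)
    for player in players:
        key = player[dimension]
        totals[key] += 1
        if player["is_paying"]:
            paying[key] += 1
    return {key: paying[key] / totals[key] for key in totals}


def top_spender_share(spend_by_player, top_fraction=0.1):
    """Fraction of total spending that comes from the top spenders."""
    amounts = sorted(spend_by_player.values(), reverse=True)
    total = sum(amounts)
    if total == 0:
        return 0.0
    top_n = max(1, int(len(amounts) * top_fraction))  # always at least one player
    return sum(amounts[:top_n]) / total


def unverified_purchases(telemetry_ids, store_ids):
    """Purchases the game reported that the app store has no record of."""
    return sorted(set(telemetry_ids) - set(store_ids))
```

A high value from `top_spender_share` reflects the skew described above, and a non-empty result from `unverified_purchases` is a signal worth investigating, not proof of fraud on its own.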