A Look Under the Hood: DataXu’s Parallel Test Capability


 

DataXu is a programmatic marketing cloud. We provide digital marketing automation for Fortune 500 brands. Our services include managing, organizing and securing marketing data; simple-to-use analytics and insights; and automatically deployed, machine-learning-driven media purchasing. We manage petabytes of data and tens of thousands of simultaneous campaigns, and we process over 2 trillion events per month on our real-time platform. Working at this scale on a 24×7 basis for clients around the globe requires a massive amount of infrastructure automation. This post by Eugene Muzykin describes how we apply that automation to testing.

Here at DataXu, we strive to innovate in all aspects of our business. We use the latest available programming languages and tools for our programmatic marketing platform. When it comes to Quality Assurance (QA) and testing of our complex technology stack, we use the best available technologies and frameworks as well. Today, I’m going to walk through an example of a test solution that we currently have deployed in our platform.

We use GitHub, Jenkins and Ansible, with Skytap as the cloud solution for our automated test infrastructure. Check out additional information on our cloud service API project with Skytap HERE. We have Continuous Integration (CI) set up through GitHub’s code management system for the main components of our technology infrastructure. This means that whenever a developer commits even the smallest code change, they will see whether the change breaks critical functionality. The basic CI flow is triggered by a code change. The next steps are to spin up an environment in the cloud with that change, run a set of tests, and report the results back to the developer through GitHub and Jenkins.

Tests were developed and added for the reporting warehouse component until the runtimes made it impractical to run them in the existing test infrastructure. The solution we adopted was to run tests in parallel in separate, identical environments. This initial change reduced running times from over eight hours to three and a half hours. Test environments were set up in parallel, the tests ran in those parallel environments, and the test results were combined as the last step. The combined results were reported to GitHub so that decisions about merging code could be made. This parallel functionality was achieved using Jenkins Flow scripting.
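
To make the fan-out and fan-in pattern described above concrete, here is a minimal sketch in Python. It is not our Jenkins Flow script. The function names, the environment naming and the result format are all hypothetical, and the environment step is a stub that simply marks every test as passed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite_in_environment(env_name, tests):
    """Hypothetical stand-in for one parallel test environment.

    A real job would provision the environment, deploy the code change
    and execute the tests. This stub only reports every test as passed.
    """
    return {test: "PASS" for test in tests}


def run_in_parallel(shards):
    """Run one worker per shard, then merge all results into one dict."""
    combined = {}
    # max(1, ...) keeps the executor valid even when there are no shards.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [
            pool.submit(run_suite_in_environment, f"env-{index}", shard)
            for index, shard in enumerate(shards)
        ]
        for future in futures:
            # Combining results is the last step, after every shard finishes.
            combined.update(future.result())
    return combined


def overall_status(results):
    """Collapse the combined results into a single status for reporting."""
    if all(outcome == "PASS" for outcome in results.values()):
        return "success"
    return "failure"
```

Calling `run_in_parallel([["t1", "t2"], ["t3"]])` runs two stub environments side by side and returns one merged result dictionary, which `overall_status` reduces to the single value that would be reported back for the merge decision.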

 

A proof of concept (POC) was also implemented, which extended the original idea of parallel test capability with additional functionality. The POC leveraged parallel environments to speed up test execution, but it first made a request to determine the current resource usage. If sufficient Skytap resources were available, a higher number of parallel environments would be used. If cloud service usage was high, only a small number of parallel environments would be used. Please see details in the diagram below. The various test suites were split up dynamically based on the number of parallel environments. These tests were run and their results were combined as the final step in the testing cycle.

[Diagram EM2: proof-of-concept parallel test flow]
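
The decision logic in the POC can be sketched roughly as follows, again in Python rather than Jenkins Flow. The usage threshold, the minimum and maximum environment counts, and the round-robin split are illustrative assumptions, not the values or the strategy used in the actual POC. The usage figure is passed in as a plain number, so no Skytap API call is shown.

```python
def choose_environment_count(used_fraction, min_envs=2, max_envs=8,
                             busy_threshold=0.75):
    """Pick how many parallel environments to use.

    used_fraction is the current cloud usage between 0.0 and 1.0.
    All default values here are hypothetical.
    """
    if used_fraction >= busy_threshold:
        # Cloud usage is high: fall back to a small number of environments.
        return min_envs
    # Sufficient resources are available: use the higher number.
    return max_envs


def split_suites(suites, env_count):
    """Distribute test suites round-robin across env_count shards."""
    shards = [[] for _ in range(env_count)]
    for index, suite in enumerate(suites):
        shards[index % env_count].append(suite)
    # Drop empty shards so no environment is started with nothing to run.
    return [shard for shard in shards if shard]
```

For example, with usage at 90 percent the sketch picks two environments, and `split_suites(["a", "b", "c", "d", "e"], 2)` produces the shards `["a", "c", "e"]` and `["b", "d"]`. A split based on historical suite runtimes would balance the shards better than round-robin, at the cost of tracking that data.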

As you can see, when it comes to QA and testing at DataXu, we leverage automation as much as possible to achieve the highest level of quality in our releases. We use Continuous Integration to provide constant feedback to our developers and to ensure that quality. We’ve used the parallel capability in Jenkins to cut test run time from over eight hours to three and a half hours. We continuously improve our QA processes based on the tools and frameworks that are available.