DataXu hosted a meetup on 7/16, sharing its solution for automating builds, tests, and deployments. Our automation infrastructure uses GitHub, Jenkins, Ansible, and AWS. This post outlines the challenges we faced with our existing infrastructure and how we combined these tools into an overall solution we are proud of. We will discuss:
• Automatically Provisioning AMIs
• Configuring Jenkins Jobs with Jenkins Job Builder
• Triggering Jenkins Jobs on GitHub Events
More information about the meetup can be found here: http://www.meetup.com/Automation/events/223535032/
This GitHub repository can be used for reference: https://github.com/ferrants/dataxu-automation-demo
The Slides can be viewed here: http://files.meetup.com/2055411/automating_the_build_test_deploy_infrastructure.pptx
Our Jenkins master sat in a closet, manually configured with a few slaves. Jenkins jobs would poll for changes and build / run unit tests whenever polling found any. If someone was working on a new branch, they would create a new job to test that branch. This system mostly worked for us, but as we grew, we ran into a number of challenges.
Commits that were not tested could be merged to master. This throws the team off and developers may base new work off a bad state.
Jenkins jobs are a hassle to manage. We had a lot of jobs that were similar, so manually updating all of them when we needed to was cumbersome and error-prone. It was the wild west – anyone could change a job, we had no traceability for who changed a job, what they changed, or why the change was made.
We were frequently running out of Jenkins capacity, resulting in some builds waiting hours to run. Developers needed quicker feedback.
Jenkins would go down, slaves would be stale and builds would fail, sometimes resulting in days wasted. Slaves might need to be rebuilt or have their hard-drive space cleared to allow new jobs to run. There seemed to be “always something” to do in regards to slave maintenance.
Machine provisioning and deployments were being done manually, with people ssh-ing into machines and running a list of commands from a wiki page. This is time consuming and error-prone.
We planned the next generation of our automation infrastructure and came up with this list of requirements.
- Automatically test everything before it gets to master.
- Automated Job configurations must be stored in git.
- Jenkins must be able to scale automatically. We want to have as much capacity as we need, but not keep extra machines running all the time. Save $$$$$, don’t waste time!
- The system must be robust. Issues should not persist. If there is a weird issue, re-run or kill machines, don’t waste my time making me look into it.
- All machines must be provisioned and deployed to using some configuration management. This includes Jenkins master and slaves.
We identified these tools:
- GitHub-Webhooks – Triggers Jenkins Jobs based on GitHub Events. Created at DataXu.
- GitHub-PR – Allows programmatic use of GitHub pull requests. Created at DataXu.
- Jenkins Job Builder – Configures all Jenkins Jobs. Templates and macros allow reuse between jobs. Job definitions stored in git.
- Jenkins Amazon EC2 Plugin – Spins up (and down) jenkins slaves based on demand.
- Jenkins Build Flow Plugin (not recommended by Jenkins core, see Workflow Plugin) – Allows orchestration of multiple jobs together.
- Ansible – Configures AMIs, including Jenkins master and slaves. Runs deployments.
Note: This includes code examples. They are just rough examples, to be used for reference. Read the Ansible and Jenkins Job Builder documentation for more information. Our Jenkins is used as an example here, but this all can be generalized for your individual applications.
The Open Source community is awesome and there are roles that can bootstrap your setup. See ansible-role-jenkins. We have fully automated builds of Jenkins master and slave AMIs, and can test them before deploying to our production Jenkins installation.
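As a rough illustration, an AMI-bake playbook might look like the sketch below. The host group, role name, and variable names are assumptions for illustration, not our exact setup; see the ansible-role-jenkins documentation for real options.

```yaml
# Sketch: provision a freshly launched build instance with Jenkins via a
# community role, then snapshot it into an AMI with the ec2_ami module.
# Host group, role name, and variable names are illustrative.
- hosts: jenkins_ami_builder
  become: yes
  roles:
    - ansible-role-jenkins    # e.g. a community Jenkins role

- hosts: localhost
  connection: local
  tasks:
    - name: Snapshot the provisioned instance into a Jenkins master AMI
      ec2_ami:
        instance_id: "{{ builder_instance_id }}"
        name: "jenkins-master-{{ ansible_date_time.date }}"
        wait: yes
        region: us-east-1
```

Because the AMI is built from scratch each time, a staging Jenkins can be stood up from it and validated before the production installation is touched.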
This is just a brief reference on AutoScaling Group Deployment using Ansible, not a comprehensive guide:
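A minimal sketch of what such a deployment playbook could contain, assuming the ec2_lc and ec2_asg modules; the AMI id, keypair, security group, and sizes are placeholders you would replace with your own:

```yaml
# Sketch: roll an AutoScaling Group over to a newly baked Jenkins AMI.
# All names and sizes are illustrative placeholders.
- hosts: localhost
  connection: local
  tasks:
    - name: Create a launch configuration pointing at the new AMI
      ec2_lc:
        name: "jenkins-lc-{{ build_number }}"
        image_id: "{{ ami_id }}"
        instance_type: m3.medium
        key_name: our-keypair
        security_groups: [jenkins-sg]
        region: us-east-1

    - name: Replace the ASG instances with ones from the new launch config
      ec2_asg:
        name: jenkins-asg
        launch_config_name: "jenkins-lc-{{ build_number }}"
        min_size: 1
        max_size: 1
        desired_capacity: 1
        replace_all_instances: yes
        region: us-east-1
```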
Now we have deployed an instance of Jenkins running in AWS.
If you want elastic builds, be sure to configure the Amazon EC2 plugin during the deployment, or manually on the running instance afterward. The benefits of this plugin are:
- Builds should not wait around long: instead of sitting in a queue, a new slave will spin up and take the job (with configurable max limits).
- If you usually have a lot of slaves, you can save some money by making them ephemeral (slaves are terminated after a configurable idle timeout).
- Slaves should stay in a cleaner state. If one is in a bad state, simply delete it. A fresh one should always spin up from the original slave AMI you generated.
We also connect Jenkins to our internal LDAP for authenticating and permissioning users/groups to run jobs.
Now, Jenkins Job Builder (JJB) goes to work. Create a git repository named something like jenkins-jobs. Keep your job configurations in /jobs/ and a config file at the top level.
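A sketch of that layout and a first, self-updating job; the repository URL and file paths are illustrative assumptions (JJB's top-level config is typically a jenkins_jobs.ini holding the Jenkins URL and credentials):

```yaml
# jenkins-jobs/
# ├── jenkins_jobs.ini    <- JJB config: Jenkins URL, user, API token
# └── jobs/
#     └── jenkins-jobs.yaml
#
# jobs/jenkins-jobs.yaml: when this job runs, JJB re-reads the repo and
# reconfigures Jenkins, so Jenkins keeps itself up to date.
- job:
    name: jenkins-jobs_master
    node: master
    scm:
      - git:
          url: git@github.com:yourorg/jenkins-jobs.git
          branches:
            - master
    builders:
      - shell: |
          jenkins-jobs --conf jenkins_jobs.ini update jobs/
```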
Notice that we use a naming convention for our jobs to meet the following goals. Jobs should:
- Be easily grouped by purpose
- Facilitate templating
- Be easily identifiable to the naked eye
- Be easily addressable (esp. in logical groups) with our GitHub-Webhooks triggers
You don’t need to follow our naming convention exactly (though we’d be flattered if you did, thankyouthankyouverymuch), but having a naming convention for Jenkins jobs helps with the aforementioned goals. Here are a few examples from our naming convention: <repo>_master runs on every commit to a repository’s master branch, and <repo>_pr runs against that repository’s pull requests.
Install JJB locally and create your first job to test and reconfigure Jenkins jobs:
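Roughly, the local loop looks like this (a sketch; jenkins_jobs.ini is assumed to point at your Jenkins instance):

```shell
# Install Jenkins Job Builder locally.
pip install jenkins-job-builder

# Dry run: render the job XML locally without touching Jenkins.
jenkins-jobs --conf jenkins_jobs.ini test jobs/ -o output/

# Deploy: push the rendered job configuration to the Jenkins server.
jenkins-jobs --conf jenkins_jobs.ini update jobs/
```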
Test this locally and deploy it with the above commands. Now, if you look at the configured (and running) Jenkins URL, there should be a jenkins-jobs_master job you can run manually.
This simple example belies the power of JJB. We have close to 500 defined Jenkins jobs, all stored in source control, using under 100 files. All job changes go through a pull-request-based code review and testing on a test/staging server. Combining JJB with our AMI-based deployment of Jenkins allows us to test Jenkins upgrades and any new plugins in a built-with-one-click staging environment that closely mirrors our production Jenkins installation, and we use a representative sampling of our job types (ruby, python, java, etc) to validate the new Jenkins version or plugin changes.
We use GitHub-Webhooks to automatically trigger Jenkins jobs based on events in GitHub. You can run github-webhooks on an aws instance that is internet-accessible. You should limit access to this instance to only allow connections from GitHub and your internal infrastructure. See GitHub’s documentation on whitelisting.
Once the instance is online, you need to test GitHub’s connection to it and start sending events. Go to https://github.com/organizations/settings/hooks and configure a hook to send an event to it. Additional documentation here.
We configure GitHub-Webhooks to try to trigger a repo’s master job on every master commit.
You can test this by adding a new job in your jenkins-jobs repository and committing to master. Confirm that your jenkins-jobs_master job is triggered and that Jenkins is updated (the new job you added appears in Jenkins’ list of jobs).
Great! Now you have automated Jenkins configuration! Next, let’s trigger jobs to run against pull requests to your jenkins-jobs repo. Create the following job:
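A sketch of that PR job, assuming the webhook trigger supplies a sha1 parameter naming the commit to test (the repository URL is also illustrative). Because the build only does a JJB dry run, broken job definitions never reach master:

```yaml
# Sketch: test JJB configuration changes coming in via pull requests.
# The sha1 parameter and repository URL are illustrative assumptions.
- job:
    name: jenkins-jobs_pr
    node: master
    parameters:
      - string:
          name: sha1
          description: 'Commit or branch to check out and test'
    scm:
      - git:
          url: git@github.com:yourorg/jenkins-jobs.git
          branches:
            - '${sha1}'
    builders:
      - shell: |
          # Dry run only -- render job XML, do not update Jenkins.
          jenkins-jobs test jobs/ -o output/
```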
And add the following to your GitHub-webhooks configuration: https://github.com/ferrants/dataxu-automation-demo/blob/master/ansible-playbook/templates/github-webhooks-userdata.sh.j2#L13
Now you have a Jenkins infrastructure that allows you to easily create and review job configurations, and test any incoming changes before they reach master. If you installed and configured the EC2 plugin, you will also have Jenkins slaves that scale up and down as you need them.
Read up on how to use jenkins-job-builder; you can set up your Jenkins jobs to:
- Email developers on failure
- Report the status of the commit back to GitHub using the GitHub Status API
- Use a common template for building all java, python, ruby projects (We’ve found that this can save significant time and resources while also speeding up the adoption of automation during the development of new projects).
Refer back to the Jenkins AMI and Deployment sections. They are simple Ansible playbooks that were run locally to create an AMI and deploy it to AWS. Let’s make some other jobs that create and deploy AMIs. First, put your Ansible playbook in a git repo as jenkins_deploy_ami.yml, and name the repo ansible-playbooks. Now we can create generic jobs that use those playbooks, and run a jenkins_deploy_ami job in Jenkins to create a new Jenkins AMI and deploy it to AWS.
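One way to sketch those generic jobs is with a JJB job-template, which expands once per playbook name listed in the project; the repository URL and playbook names are illustrative:

```yaml
# Sketch: one Jenkins job per playbook in the ansible-playbooks repo,
# generated from a single template. Names and URL are illustrative.
- job-template:
    name: '{playbook}'
    scm:
      - git:
          url: git@github.com:yourorg/ansible-playbooks.git
          branches:
            - master
    builders:
      - shell: |
          ansible-playbook {playbook}.yml

- project:
    name: ansible-playbook-jobs
    playbook:
      - jenkins_build_ami
      - jenkins_deploy_ami
    jobs:
      - '{playbook}'
```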
Jenkins Build Flow Plugin is not recommended by the Jenkins core team, who have created the Workflow Plugin to handle similar tasks (and more). The below example is how we use “flow jobs” now. New implementations using the Workflow Plugin may be similar, but the Workflow Plugin is not yet supported by JJB.
This job runs our build/unit-test job; if that succeeds, it creates an AMI, tests the AMI, and, if those tests pass, deploys it. The flow is triggered whenever there is a commit to the relevant repository’s master branch.
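As a sketch, the flow job can itself be defined in JJB. The Build Flow DSL stops at the first failed build() call, which gives us the “if successful” chaining; the job names here are illustrative, not our exact ones:

```yaml
# Sketch: a Build Flow job chaining build -> bake -> test -> deploy.
# Each build() call runs a Jenkins job and fails the flow if it fails.
- job:
    name: myapp_pipeline
    project-type: flow
    dsl: |
      build('myapp_master')       // build and unit test
      build('myapp_build_ami')    // bake a fresh AMI
      build('myapp_test_ami')     // smoke-test the AMI
      build('myapp_deploy_ami')   // deploy it to AWS
```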
Note: We are just reaching this phase of automated merging and are using this for a limited number of repositories.
If you have enough confidence in your tests, a pull request that has passed all necessary testing could be merged to master immediately. We created a tool to look up pull requests in GitHub, github-pr. Our Jenkins jobs are configured to report back to GitHub using the Status API, therefore we can use github-pr to look up whether a job triggered by a PR has passed or failed.
$ github-pr list -r dataxu/myapp --table
#    State  Status   Merge  Base    Head         Title
---  -----  -------  -----  ------  -----------  -----------------------
794  open   success  clean  master  some-branch  added something awesome
$ github-pr merge -r dataxu/myapp -n 794
We can use this script to do a number of things in an automated job that runs nightly:
- Look up all successful branches/pull-requests, merge them into an integration branch for longer-duration testing before merging to master
- Merge pull requests
- Add labels to pull requests (“failed_merge”, “failed_nightly_testing”, etc)
- Comment the status back to pull requests (“Congratulations, your PR passed nightly testing and was merged to master! A celebratory toast may be in order, but we defer to your discretion.”)
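Using only the two invocations shown above, the merge step of such a nightly job might be sketched like this; the awk parsing of the --table output is an illustrative assumption, and would be fragile in practice:

```shell
# Sketch: merge every open PR whose status is success and whose merge
# state is clean, per the table columns shown above.
github-pr list -r dataxu/myapp --table \
  | awk '$2 == "open" && $3 == "success" && $4 == "clean" {print $1}' \
  | while read -r pr; do
      github-pr merge -r dataxu/myapp -n "$pr"
    done
```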
Practically everything runs through Jenkins. We automatically trigger builds on commits to master and for opened/updated pull requests. Using Ansible, we have created jobs that automatically create AMIs and deploy them. Job flows tie jobs together, creating workflows or pipelines that proceed through multiple steps/jobs based on each prior step’s success or failure. Job configurations are stored in SCM, so all changes are reviewed, tested, and approved before merging.
By creating standard job templates for things like “a java build using maven” or “a ruby gem build”, we don’t need to reinvent the wheel every time there is a new project; we just fill in some configuration values, test it, and we’ve got a new Jenkins job for very little cost. Using a naming convention for jobs (<repo>_master, <repo>_pr, etc.) makes jobs easy to find and to group together in definition and triggering, and makes each job’s purpose clear at a glance.
We’ve found this combination of GitHub, Jenkins, JJB, and Ansible (along with our own recently open-sourced tools GitHub-Webhooks and github-pr) to be immensely helpful in scaling and maintaining a thriving infrastructure of automation at DataXu. We take pride in the gains we’ve made to boost developer productivity (and the accompanying overall velocity boost this provides to the company as a whole) and look forward to building even bigger and better things in the future.