I’m Richard Baines, Product Manager for our Application Programming Interface (API) Platform, which allows the secure exchange of data between HMRC and third parties like the tax software packages used by many businesses and agents.
In a previous blog post, Steve Rowlands, Digital Operations Director, talked a bit about the DevOps culture we’ve built within our teams. Transformations in culture and working practices are often hard to understand without seeing an example of what success looks like. I was recently discussing my own team's journey with some of our leadership team and we agreed it would be a good idea to share my team’s story more broadly.
What is DevOps?
DevOps is a software development methodology that combines specific ways of working with a set of tools, or ‘toolchain’, in order to speed up or enhance the software development cycle. It’s not an alternative to Agile; the thinking behind it is a logical extension of Agile software development, which aims to deliver value to users faster than other project management methodologies.
The ‘ways of working’ element includes automating as much as possible, and getting the right level of monitoring and assurance early on in the build so we know straight away when things aren’t going to plan. This then informs the decisions we make on the ‘toolchain’, setting the stage for organisations to develop a continuous integration and continuous delivery pipeline.
I like to think of these two elements simply as the questions we ask ourselves: “how quickly can we release?” and “how, using automation, can we build as much security and reliability into our products as possible?”
Finally, a core concept of DevOps is that the people who are developing a service work in close collaboration with the people who will be supporting the service when it’s live. This gives the name its two parts: “Dev” and “Ops”.
How we “did DevOps” in the API Platform
The API Platform is unusual in HMRC digital in that it’s a platform within a platform.
It’s hosted on the award-winning Multi Digital Tax Platform (MDTP), so out of the box my team benefits from the work the MDTP teams have been doing on building a continuous integration and continuous deployment pipeline. All HMRC digital teams benefit from their common set of build and monitoring tools, which meant we didn’t have to build everything from scratch to start using this toolchain.
Last year we undertook a large project to move over to a new cloud provider. This was a big project for us and we wanted to make sure we really got the benefit from our immutable infrastructure, rather than feel like we’d just replicated a physical server in the cloud.
We brought Seb, a new ‘DevOps Engineer’, into the team, and for the first time had someone on site with us in the Shipley digital delivery centre to advise the team on ways of working. In true DevOps fashion, Seb advocated good cultural practices in the way the team managed the API Platform micro-services in pre-production environments. He also brought a lot of insight on the toolchain: the team were able to pair with him and gain a greater understanding of how our new cloud infrastructure was going to be configured.
Along with Richard, our Technical Architect, Seb and I spent a lot of time in London working on this large project with our MDTP colleagues. In the spirit of collaboration, this allowed my team to gain greater knowledge and discuss best practice with the various MDTP teams.
Why DevOps has worked well for us
In line with the core concept of DevOps I mentioned earlier, as we have expanded we have kept the Infrastructure Engineers and Developers in one team, attending the same stand-ups, sprint planning and other team-related activities.
Having our infrastructure engineers and developers working together means the team’s conversations always include the effect of the code on the infrastructure, and vice versa. We don’t want a wall between these two roles, and we’ve found that having one multidisciplinary scrum team has worked really well for us on the API Platform.
This close collaboration doesn’t just create a shared sense of ownership of the service, it also creates a feedback loop. If a live incident occurs on the API Platform, we don’t just respond: we also record full details so we can hold a blame-free post-mortem together when the whole team is back in the office. We discuss whether something might need to go on our team’s backlog to improve the service, or whether we should further refine our monitoring and alerting.
The ‘inspect and adapt’ feedback loop has helped us continually refine the API Platform, and has led us to regularly achieve 99.99% availability each month.
We know a lot earlier when changes might not work, because we have worked hard to make sure our pre-production environments are more ‘lifelike’; in other words, they are as close to live as we can make them. The work we do in pre-production is repeatable when we start to promote it towards ‘live’ in our deployment pipeline, and consequently the results of each test become more predictable. This means we don’t have to do scary things like testing in live, which could leave our users exposed to things not working before our assurance checks are finished.
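To put that figure in context, here is a rough sketch of the arithmetic, assuming a 30-day month. The function name is hypothetical and only there for illustration.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Return the minutes of downtime allowed in a period at a given availability.

    Assumes a fixed-length period of `days` days (30 by default).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


# 99.99% over an assumed 30-day month leaves roughly 4.32 minutes of downtime.
print(round(downtime_budget_minutes(99.99), 2))
```

In other words, hitting 99.99% in a month leaves only a few minutes of total downtime to play with.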
Testing is simpler with the automated test suite we have developed, which removes reliance on one person to be a “tester”. The team take collective responsibility for the quality of the work they do, and anyone can trigger the tests. If you’ve worked in a team before, you’ll know how important it is that the team feel like they share ownership of their goals. This can help them work together more fluidly to get new features into production.
Another reason we can be more confident when we make deployments is that they are easy to back out. Infrastructure deployments are ‘rolling’ and automatically won’t proceed if there are any issues. In this style of deployment, traffic moves seamlessly from the old version of our service and infrastructure to the new one.
These rolling deployments mean no impact on our users and no additional planning needed for us to mitigate that impact. Personally, as a Product Manager, that means a lot less stakeholder engagement for a release, because I don’t need to work with colleagues outside my development team to mitigate the impact of any downtime. This frees up more of my time to spend with my users talking about more enjoyable things like improvements!
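The general idea of a rolling deployment can be sketched in a few lines. This is a minimal illustration, not the MDTP tooling itself: the instance naming (`name@version`), the `rolling_deploy` function and the `is_healthy` check are all assumptions made for the example, and the sketch only halts the rollout rather than reverting instances already replaced.

```python
from typing import Callable, List


def rolling_deploy(
    instances: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Replace instances one at a time, halting at the first unhealthy replacement.

    Instances are hypothetical strings of the form "name@version".
    Returns the instances serving traffic when the rollout finishes or halts.
    """
    serving = list(instances)
    for index, old in enumerate(instances):
        candidate = f"{old.split('@')[0]}@{new_version}"
        if not is_healthy(candidate):
            # Halt here: the remaining old instances keep serving traffic.
            break
        serving[index] = candidate
    return serving
```

If every replacement passes its health check, all traffic ends up on the new version; if one fails, the rollout stops and the untouched old instances carry on serving users.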
Improving the way that we work with DevOps means that we are deploying more frequently and doing multiple, small deployments. This lets us deliver value faster and reduces the risk of delivering new features into production.
After these new deployments are up and running, the auto-healing we’ve put in place will automatically terminate any unhealthy instances and rebuild them, giving us tangible benefits from using cloud infrastructure and an API Platform that’s much easier to support.
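As a rough illustration of the auto-healing idea, a single pass over the fleet might look like the sketch below. The `heal` function, the health map and the `launch_replacement` callback are hypothetical names for the example, and it assumes every replacement comes up healthy, which a real system would verify.

```python
from typing import Callable, Dict


def heal(
    instances: Dict[str, bool],
    launch_replacement: Callable[[str], str],
) -> Dict[str, bool]:
    """Run one healing pass over a map of instance name -> healthy flag.

    Healthy instances are kept. Unhealthy ones are dropped (terminated) and
    replaced by whatever name `launch_replacement` returns.
    """
    healed: Dict[str, bool] = {}
    for name, healthy in instances.items():
        if healthy:
            healed[name] = True
        else:
            # Assumption: the rebuilt instance is healthy once launched.
            healed[launch_replacement(name)] = True
    return healed
```

In practice a pass like this would run continuously against real health checks, so nobody has to step in by hand when an instance goes bad.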
Ideas for the future
Hopefully you’ve learnt from our journey that DevOps for us is mainly about using the latest technologies to make our software deployment pipeline better, and changing the way we think to improve the way we work.
We are constantly thinking of ways to improve what we do, ranging from a backlog of things we’d like to automate, to better feedback loops for the API Producer teams who use our platform. If you’d like to share your ideas, experiences or questions, don’t hesitate to use the comments section below and get in touch!
Richard Baines, Product Manager API Platform