Dec 18, 2014

My DevOps Journey and Experimentation

Hello reader,
I hope this post is worth a few pennies to you :)


I aspire to roll out DevOps/Continuous Delivery across my entire organization. Are we there yet? Not quite. I know our destination, but I want us to enjoy a journey filled with fun, learning and contentment. The common view is that DevOps is a mindset and culture change first, and that methodologies, tools and frameworks come afterwards. I believe somewhat the opposite: without a proven working model, changing people's mindset in any organization is no trivial task. People need to see it to believe it!

Nearly a year and a half ago, many of the development and test teams in my organization were in jeopardy because our test automation was not up to the mark. With so many defects reported, regressions kept multiplying like piglets. We could not directly use any open source test framework because of the complex embedded domain we work in; using those tools would have needed a whole lot of mocks. So we went round and round, losing a great deal of time with very few tests running.

When frustration was at an all-time high and motivation at an all-time low, I picked up a cup of coffee and started chatting with a few of my team members in our team work area. In India we have a chain of coffee outlets called Café Coffee Day whose tagline is “A lot can happen over coffee”. Let me disagree and say “A hell of a lot more can happen over coffee with the team”.

That’s when we decided to write our own functional test automation framework, one that fit our embedded requirements. We started with a small PoC and took it to our technical mentor. He was honest enough to say nice things about our framework but brutal enough to pinpoint its shortcomings. The framework went through an overhaul and a few plastic surgeries, and there you go: we branded it Genie, a functional test automation suite. It is now open enough, with a DSL that can be extended to test any product, that others can benefit too.

I have always considered quick build turnaround and short test cycle times to be the cornerstones of DevOps/Continuous Delivery. While I believe we have cleared the major hurdle of test automation decently well, the challenge I now face is ensuring that all teams understand its importance and embrace test automation in their own projects. So we formed a test automation guild, which I organized so that we meet weekly to share best practices and APIs that can help across teams and keep us from reinventing the wheel. Of course, who wants to change? Attendance in the guild was poor at first, and I was not sure I was reaching the last developer. Then a senior colleague helped conduct a technical debt reduction day, on which we stressed the importance of test automation and encouraged people to do nothing but test automation that day, and slowly we gathered momentum. Now I can confidently say that everyone understands how important test automation is. Teams are trying to include a user story for reducing technical debt in every sprint.

The next big challenge was build time reduction. Each clean (full) build took about 50 minutes during peak development hours and about 35 minutes at other times. Every developer makes many builds a day in our Continuous Integration system, so reducing the build time is imperative for raising the productivity of every developer and, consequently, of the organization. Around this time my organization gave me the responsibility of implementing DevOps across the board, and with more vigour we started solving the build time issues. We wanted to ensure that after the first full build, every subsequent build is incremental. We worked towards this goal, and the build time has now come down to approximately 9 minutes in the same environment. Needless to say, there were many challenges. One was clock skew, which meant the correctness of an incremental build could not be guaranteed; the Linux servers were hosted and owned by another team, and it was quite a task to get them synchronized via NTP (Network Time Protocol). I then started asking teams across the board to adopt these incremental build changes so that everyone benefits. As I write this abstract, many teams have tried them and found them working, and they are now configuring the changes in their CI environments. The simple math of the productivity improvement is as below:
Build time reduction in CI (builds triggered by CI) - ROI
  • Total number of builds per day (worst case) = 2250 builds (actual data taken as a sample)
  • Time for one full build (avg) = 30 mins
  • Time for one incremental build (avg) = 10 mins
  • Time saved per build = 20 mins
  • Time saved for 2250 builds = 750 hours every day
*Statistics may vary depending on the number of builds that happen every day

Build time reduction in BUILD Server by developers – ROI
  • Total number of builds per day (worst case) = 800 builds (actual data taken as a sample)
  • Time for one full build (avg) = 30 mins
  • Time for one incremental build (avg) = 10 mins
  • Time saved per build = 20 mins
  • Time saved for 800 builds = approximately 266 hours every day, i.e., about 33 man days every day
*Statistics may vary depending on the number of builds that happen every day
Total time saved = 750 (CI) + 266 (Dev) = 1016 hrs a day
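
For anyone who wants to redo this math with their own numbers, here is a minimal Python sketch of the same arithmetic. The build counts and build times are the sample figures quoted above. The function and constant names are just illustrative, and the 8-hour working day behind the "man days" figure is my assumption.

```python
# Sample figures from the ROI lists above.
FULL_BUILD_MIN = 30         # average full build, in minutes
INCREMENTAL_BUILD_MIN = 10  # average incremental build, in minutes
HOURS_PER_MAN_DAY = 8       # assumption: one man day = 8 working hours


def hours_saved_per_day(builds_per_day,
                        full_min=FULL_BUILD_MIN,
                        incr_min=INCREMENTAL_BUILD_MIN):
    """Hours saved per day when every build is incremental instead of full."""
    saved_minutes = builds_per_day * (full_min - incr_min)
    return saved_minutes / 60


ci_hours = hours_saved_per_day(2250)   # builds triggered by CI
dev_hours = hours_saved_per_day(800)   # builds started by developers
total_hours = ci_hours + dev_hours
dev_man_days = dev_hours / HOURS_PER_MAN_DAY

print(f"CI builds:        {ci_hours:.1f} hours saved per day")
print(f"Developer builds: {dev_hours:.1f} hours saved per day "
      f"({dev_man_days:.1f} man days)")
print(f"Total:            {total_hours:.1f} hours saved per day")
```

The figures in the lists are rounded down, so the script reports slightly more than 266 and 1016 hours.
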
We also realized that we can get away from building on the shared servers (which slow down under load) and let each developer use their own desktop machine as an exclusive build server. To realize this idea, we are having developers install a Linux VM on their Windows machines and run the builds there. Any PC with 8 GB of RAM should be enough, with 4 GB allotted to the VM. This way the dependency on one supreme build server is avoided and the hardware capabilities of the developer’s machine are used to the fullest extent possible.

We are also evaluating emulators for running the automated test suite, to address the shortage of hardware. A counterpart team in another region developed an emulator for their platform using PC virtualization customized on top of specific device drivers. Instead of reinventing the wheel, we took their emulator and are planning to make it work for our platform.

Another angle on this problem: we are trying to have developers’ hardware submitted to the continuous integration system so that the automated test suite can run on it at night. This way we are looking at a global optimum in which many more test suites can be executed in parallel.

That is the power of looking at the big picture – DevOps/Continuous Delivery. So, as of now we are in the process of crossing the two big hurdles which are test automation and build time reduction. 

What is the next one that we are attacking as part of DevOps? For that, let me first explain the DevOps pipeline that I came up with for my organization.
This may sound like basics, but trust me, it is a mammoth task to get these things going, especially across the organization. We carry a lot of legacy while the whole world is going open source. It is time to dust it all off and move ahead. So, as part of that, my team is working towards migrating to Jenkins, an open source CI server. What we have currently is an in-house proprietary system, and it comes with its own overheads, including poor documentation and people dependency.
Hence the work on the Jenkins migration started a few months ago; we try to reuse the available plugins and at the same time write our own custom plugins to fit some of our specific needs. One of my counterparts is also working towards migrating from ClearCase to Git. While the Git migration came as an organization mandate, it also helps us, as Git and Jenkins work well in tandem.

Whenever a build fails, Jenkins shall trigger an email and text messaging notification to the culprit’s mailbox and mobile phone respectively.
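
To make the idea concrete, here is a rough Python sketch of what such a failure notification could contain. It is not our Jenkins setup; in Jenkins itself this would normally be configured through notification plugins. The function names, the sender address, the job name and the log URL are all hypothetical, and the sketch only composes the messages without sending anything.

```python
from email.message import EmailMessage

SMS_LIMIT = 160  # assumption: keep the text within a single SMS


def build_failure_email(job, build_number, culprit_email, log_url):
    """Compose (but do not send) the mail for the developer who broke the build."""
    msg = EmailMessage()
    msg["From"] = "ci-noreply@example.com"  # hypothetical sender address
    msg["To"] = culprit_email
    msg["Subject"] = f"[CI] Build FAILED: {job} #{build_number}"
    msg.set_content(
        f"Your last change appears to have broken {job} #{build_number}.\n"
        f"Console log: {log_url}\n"
    )
    return msg


def build_failure_sms(job, build_number, limit=SMS_LIMIT):
    """Short text for the mobile phone, truncated to the SMS limit."""
    text = f"Build FAILED: {job} #{build_number}. Check your mail for details."
    return text[:limit]


if __name__ == "__main__":
    mail = build_failure_email(
        "platform-nightly", 42, "developer@example.com",
        "http://ci.example.com/job/platform-nightly/42/console",
    )
    print(mail["Subject"])
    print(build_failure_sms("platform-nightly", 42))
```

Handing the composed mail to a mail server and the text to an SMS gateway is left out here, since those details depend entirely on the infrastructure at hand.
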

On the ops side, part of the staging step has been tried so far: the application is digitally signed and pushed over the air (OTA). Our focus has been on the dev side of DevOps, to ensure that we get our basics strong and right.
