When Mike Loukides published the long essay What is DevOps? in book form for O’Reilly Media, he gave it a subtitle that would become well-known: Infrastructure as Code. That essay, just 20 pages long, made a few key predictions:
- Infrastructure moves into the code. The systems that run the software live in the cloud and are created by code.
- The operations role would move into the teams.
- Monitoring moves into the platform. The virtual machines we create, through code, to serve our software, will include built-in monitoring.
Eight years later, it may just be time to ask if these predictions were accurate, what we have learned, and, perhaps, what’s coming next.
Infrastructure as Code
Loukides’s essay included several famous examples, such as Chaos Monkey at Netflix, that were full-blown computer programs doing infrastructure work. The most popular idea at the time was that operations people would become full-blown computer programmers, writing programs in Python or perhaps Ruby to set up a series of virtual machines that would run application code. That custom Python and Ruby code would need to manage resources, scaling, availability, and so on.
That turned out to be hard to write, harder to debug, and nearly impossible to keep running.
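A toy sketch shows what that imperative style looked like. The `FakeCloud` class and its methods here are hypothetical stand-ins for a real provider SDK, invented for illustration; a production version would also need error handling, retries, and monitoring, which is exactly the part that proved hard to keep running.

```python
class FakeCloud:
    """Hypothetical stand-in for a cloud provider API."""
    def __init__(self):
        self.servers = []

    def create_server(self, name):
        self.servers.append(name)

    def delete_server(self, name):
        self.servers.remove(name)


def reconcile(cloud, desired_count):
    """Imperatively tell the cloud *how* to reach the desired state."""
    while len(cloud.servers) < desired_count:
        cloud.create_server(f"app-{len(cloud.servers)}")
    while len(cloud.servers) > desired_count:
        cloud.delete_server(cloud.servers[-1])


cloud = FakeCloud()
reconcile(cloud, 3)
print(cloud.servers)  # three app servers
```

Every failure mode (a create call that times out, a server that dies between loop iterations) has to be handled by hand in code like this, which is why the industry moved away from it.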
The industry did step up, in a couple of powerful ways.
First, in 2013, at the Python Conference, Solomon Hykes and Sebastien Pahl unveiled Docker, a lightweight containerization tool for Linux systems. A year later, Google made Kubernetes open source. Kubernetes and Docker introduced one major difference from traditional “infrastructure as code”—they are more configuration- and command-driven than code-driven.
The popular term for this is declarative DevOps. Put simply, instead of writing classic imperative code telling the computer “how” to create the servers, you create a configuration file that tells it “what” and run a command. In Kubernetes terms, that means a manifest file instead of a series of kubectl commands run from the command line, or worse, a Python program running the kubectl commands in an infinite “while” loop, attempting to monitor the system and take corrective action. Bob Reselman, a consultant and trainer, suggests that a manifest file creates a reusable asset that is easier to audit and control.
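A minimal Kubernetes Deployment manifest illustrates the declarative style; the names and image tag below are chosen purely for illustration. The file states only the “what”—three running copies of a container—and leaves the “how” to the cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 3               # the "what": keep three copies running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25 # illustrative image tag
```

Applied with `kubectl apply -f deployment.yaml`, this becomes the reusable, auditable asset Reselman describes: Kubernetes itself runs the reconciliation loop, restarting or rescheduling containers to match the manifest.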
While Infrastructure as Code has not taken over every aspect of software, it has been critical in enabling the rise of microservices, which teams can often run themselves.
Operations into the teams
At least for microservices, it may be fair to say that operations is now part of the software development team. That is, for new services, I see teams supporting the services they create. This isn’t true of every organization I work with, but then, none of these changes is universal. IT is becoming as spread out as media, where big cities still have newspapers, radio, television, cable, and the web all working at the same time, often at the same media company.
Another innovation is an entirely new job category, the Site Reliability Engineer, or SRE. SREs are responsible for system availability, latency, performance, emergency response, capacity, and so on. They can both monitor and take corrective action on a large number of sites and services. This is sort of a “DevOps” job in that it brings software development rigor to operations. Personally, I find it a bit sad, because instead of development and operations working together, we’ve invented an entirely new job category. It does seem to work for huge companies with scalability problems; smaller groups can just put operations onto the teams.
Monitoring in the platform
A lot can go wrong on the path from the phone to the router to the web server to the microservice to the database, all the way out to the internet of things device. One thing that hasn’t happened with Kubernetes is the built-in support for monitoring we were hoping for. Cloud hosting companies do provide amazing dashboards to look at the health of servers, but tracking messages across services, a core part of observability, is something most groups have to plan for on their own.
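A toy sketch of the kind of message tracking groups end up planning themselves: each request carries a correlation ID, so one message can be followed across services. The service functions and the in-memory log are hypothetical; a real system would pass the ID in an HTTP header and ship logs to an aggregator.

```python
import uuid

# Hypothetical in-memory log; a real system would use a log aggregator.
LOG = []


def log(service, correlation_id, message):
    LOG.append((service, correlation_id, message))


def gateway(request):
    """Entry point: attach a correlation ID if the request lacks one."""
    cid = request.get("correlation_id") or str(uuid.uuid4())
    log("gateway", cid, "received request")
    return microservice({"correlation_id": cid, "body": request["body"]})


def microservice(request):
    """Downstream service: reuse the caller's correlation ID."""
    cid = request["correlation_id"]
    log("microservice", cid, "handled request")
    return {"correlation_id": cid, "status": "ok"}


response = gateway({"body": "hello"})
# Reconstruct the trace for this one request by filtering on its ID.
trace = [entry for entry in LOG if entry[1] == response["correlation_id"]]
print([f"{svc}: {msg}" for svc, _, msg in trace])
```

The point is that nothing in the platform does this for free: every service has to agree to propagate the ID, which is exactly the planning burden described above.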
It may be part of what’s next.
While Windows containers do work, at least in theory, I’ve yet to see a company actually use them. Kubernetes remains mostly a solution for Linux systems, particularly web and perhaps database servers. For the time being, staff engineers will have to get used to working in a heterogeneous environment, where traditional operations staff will continue to play a role.
Then there is monitoring. There are packages and open source systems, like Istio, that instrument cloud systems and automatically create monitors and audit trails. The problem I see is that they require great amounts of CPU and memory, which, in the cloud, translates to dollars. They can also roughly double the network requirements. More than once I have seen a company spend tens or hundreds of thousands of dollars, plus a few engineer-years, to implement a monitoring system, only to turn it off because the system requirements actually impacted production.