Showing posts with label DevOps. Show all posts
Showing posts with label DevOps. Show all posts

Thursday, November 29, 2018

A really interesting AWS DevOps job opening

I just received this email, and the job looks incredibly interesting to me. If you've got AWS and DevOps experience, please contact Bhaskar directly (contact details below):

Direct Client:: In person interview is needed. No skype/WebEx/Phone.

Location: Boston, MA
Duration: 12+ months (37.5 hrs/week)
Rate: Open

•             Help with production issues and deployments and any analytics/data science products
•             Move the team  closer to continuous deployment, improving tooling (i.e. automation) and use of infrastructure (i.e. try-test-iterate faster)
•             Setup deployment infrastructure for Mayflower, our style guide and visual component library
•             Setup deployment infrastructure for analytics and data science products, that AWS Lambda and Docker based (Goal: Reduced time to go from engagement to receiving data)
•             Create and implement a strategy to monitor Digital Services supported applications and accurately notify engineers of problems
•             Construct and maintain a Threat Model for Digital Services supported applications, and implement solutions for gaps in our security based on it
•             Develop a reusable infrastructure playbook and tooling for rapidly deploying Data Team packages to internal customers
o             Quickly standup standard environment and/or distribute or handoff work deliverables
•             Mentor the team on good DevOps practices
•             Development and deployment of RESTful APIs, including documentation
•             Setup and facilitate processes and environments for the creation of new services; this would include the creation of processes and deployments to dev, test and prod environments for the proof of concepts services developed throughout the agency.

Skills Needed
•             5-8 years of experience in the below categories
•             Experience with AWS cloud platform specifically with the following services (or equivalent services within alternative cloud-based platforms)
o             AWS CLI
o             Cloud Formation
o             Cloud Front
o             S3 management
o             RDS
o             DynamoDB
o             SNS & SQS
o             EC2 management
o             Elastic beanstalk and other auto-scaling services
o             Lambda Function (python & node)
o             API Gateway
o             AWS Route 53 and AWS Cert Manager
•             Linux terminal
•             Experience with an IaaS (preferably: Amazon Web Services)
•             Virtual machines
•             Monitor production web applications
•             People and technical process improvement/re-engineering
•             Communicating effectively
•             Continuous integration/deployment
•             Conducting technical and behavioral interviews
•             Infrastructure security practices
•             Documenting in plain language
•             RESTful APIs
•             Infrastructure automation tools
•             Amazon Web Services
•             “Serverless” architecture such as AWS Lambda
•             Microservices architecture
•             Bonus points for experience with:
o             Acquia Cloud
o             PHP
o             Drupal
o             Agile/iterative development
o             User Experience (UX) practice
o             Python
o             Ansible
o             Docker
o             Other Coding Experience


Bhaskar Nainwal
Software People Inc.
Ph: 631-739-8915 © Fax: 631-574-3122

Thursday, July 20, 2017

DevOps and Microservices Architecture done right - IBM Netcool Agile Service Manager


Our last article described just how easy it is to upgrade any or all of the components of Agile Service Manager. This article is meant to describe some of the design, patterns and processes that had to go into the application itself to allow a two-command in-place upgrade.

Microservices Architecture

Yes, this is a trendy buzzword these days, and that's only part of the reason I'm using it here. In general, a "microservice" can describe almost anything you access over the web - a website, a document, etc. Absolutely anything. But one very important concept about microservices when designing an application is separating functions that then expose all of their capability through some interface. In ASM, there are several "services", each of which is implemented in a Docker container. Each one of these "services" actually provides a number of related "microservices", which are then exposed via URLs accessible through the host system. All of these services communicate with one another through the exposed microservices. 

ASM's strict separation of functionality allows a lot of flexibility in application development. For example, one service is the File Observer, which is used to read a file of a specific format, which contains topology information. The main purpose of this service is to read a file and convert it into data that is then sent to the Topology service, which is responsible for processing the data for its purposes and ultimately sending it to the Cassandra (database) service, which will persistently store the data on a filesystem that's available on the host and accessible via the Search/ElasticSearch service. Notice that this application pattern is very similar to existing application patterns, but in this case each service is provided by a separate container.

Containers vs VMs

Docker containers are MUCH smaller than full virtual machines. Additionally, Docker has defined and implemented numerous usecases that make containers easier to create, deploy, configure and orchestrate than VMs. So, while ASM could have been delivered as multiple VMs (via OVA files and some type of hypervisor-specific orchestration), the use of Docker containers makes deployment and management much simpler. I'm saying that a similar result could have been achieved with VMWare vSphere, but IBM's Docker solution for this application seems to be sleeker to me.

Containers vs J2EE Applications in an App Server

In many ways, Docker can be seen as similar to a J2EE Application Server like WebSphere - it provides a common architecture with functions, capabilities and services that are shared among the applications/containers running within it. However, Docker containers can run applications written in any language you want - from Java to R to Haskell. Anything you can run on the host OS can be run inside a container. Containers can also be given strict resource limits for CPU usage, memory, file access, etc. To me, containers seem to be much more like atomic units than J2EE applications. 

As an example that I believe many people can relate to, an Application Server can be thought of as your browser, with each tab being an "application". It doesn't happen often, but one tab can crash your entire browser. Docker has been written specifically to avoid this with containers, which is a great thing.

IBM's Design Choices

IBM appears to have chosen this particular pattern in order to make the application as manageable as possible from both perspectives - development AND administration. When upgrading the components of the application, each service/container is basically free to do whatever it wants as long as it continues to adhere to its published REST interface (since REST is the only interface IBM has created for the services).

What does this have to do with DevOps?

DevOps requires frequent building and deploying, ideally in a manner that does not cause any regression test failures. The structure of this application wholeheartedly adheres to this requirement, and it is brilliant. 

I'm certain you can think of thousands of ways that this doesn't apply to some application that you deal with, but you should ignore those thoughts when thinking of the future. I truly believe that most, if not all, enterprise-scale applications will be rewritten using either this pattern or one that's extremely similar to it. And everyone in all areas of IT needs to be ready for the new opportunities and challenges that will come with it.

Monday, April 3, 2017

DevOps: Operations Can't Fail

Agile and DevOps are all about "Fail Fast", which is fine for developers, but absolutely unacceptable for Operations.

For a recent example, just look at the recent AWS outage:

That was caused by someone debugging an application. None of us want our Operations department to be in that position, but it can obviously happen. I think there are one or more reasons behind why it happened, and I've got some opinions on how we need to work to ensure it doesn't happen to us:

Problem: Developers think Operations is easy

Absolutely everything labeled "DevOps" is aimed at allowing Development to do just enough "operations" to get by. But we in Operations know that it takes a lot more: Change and Configuration Management, Event Management, Business Service Modeling, and the list goes on and on. Individual Development teams don't necessarily understand these practices outside of their own application.

One Solution: We need to learn about "the new stuff"

The only way we'll be invited to the table to talk to development teams is to learn about the tools they're using (Jira, Puppet/Chef, Kubernetes, Docker, etc.). This will allow us to use a similar vocabulary when meeting with them. Without this basic knowledge, they simply won't invite us to any of their discussions.

Problem: Developers think Operations is unnecessary

Individual Development teams often don't see why the Operations department even exists. They have their tools that allow them to consistently deploy their application, so why does Operations need to be involved. They don't understand that any one of their 20-or-so "incidental" microservices may actually be absolutely critical to some other application in the environment.

One Solution: After learning the new stuff, ask to be involved

The Operations Manager needs to get involved with the Development teams. She needs to give Development teams some type of framework or process or SOMETHING that makes their application's metrics and availability visible to the Enterprise. This will allow ALL involved parties to understand the situation when there is an outage.

A great graphic from Ingo Averdunk at IBM

The parts in light blue (Logging, Monitoring, Event Mgmt, Notification, Runbook Automation, ChatOps and Root Cause Analysis) are those components that need to be standardized across all applications. If your Operations team isn't meeting with Development, you won't get to explain the need for the standard suite of tools.

There are other problems and other solutions

This post is meant to help Operations in a sea of DevOps information that is aimed only at Development, in the hope that we can reign things in and continue to ensure that the entire enterprise is healthy and available.

Friday, March 31, 2017

DevOps: The functions that must be standardized among different applications

DevOps appears to be here to stay, so from an Operations perspective, we need to ensure that all of the Development teams are playing together nicely and following some common rules.


I just realized that many Dev teams don't fully understand the need for Ops when they're implementing DevOps. Here are the foundational reason, IMO, behind the need for Operations:

Business Continuity

In many enterprises, applications never die, and customers continue to need support long after the original application development team has moved on. If applications don't follow some basic standard practices, they can easily be forgotten by the people who need to support them - Operations. Developers want to move on to the next new thing, which is great for Dev, but horrible for Ops. There are numerous classifications of applications that can't simply change on a whim due to factors such as regulatory control. Regulations affect a truly stunning number of companies, from utilities to taxis to manufacturing. Unless Dev is going to take responsibility for the support of their application over its entire lifespan (which can be 5 to even 20 years), Operations needs to be involved.

Integration With Other Applications

Applications need to talk to one another at some point. And when those connections fail, all involved application teams usually point fingers at one another. To minimize this finger-pointing, all applications should adhere to some common standards, several categories of which are found below. Even if all Development teams coordinate tightly in your company, there are still MANY external applications being used that need to be supported (e.g. WebSphere, Oracle, etc.). And the management of these applications needs to be coordinated with the in-house applications being built. Operations provides this management and coordination.


Application logging should be somewhat standardized to allow the log data to be collected and parsed for important information. This doesn't mean they all need to log in exactly the same format, but they should all adhere to some best practices, such as:

Every log entry should have a timestamp and a unique identifier (such as transaction ID)
Logs should be human readable
Identify the source of the message
Avoid multi-line messages if possible
Use name-value pairs (possibly log in JSON format)


Applications NEED to be monitored at very least for performance (response time) and availability (up/down). Ideally you want to have data collectors at each tier of a multi-tiered application to give you transaction topology and detailed monitoring data, but this can come later. At a bare minimum, all applications need to be monitored using some type of synthetic transactions, which run dummy/non-"real" transactions through the system to gather constant performance and availability metrics.

Event Management

While many applications log information, there are parts of the infrastructure that can only send "events" to some remote destination. The most common types of events are "SNMP traps" (SNMP=Simple Network Management Protocol), which are generated by network equipment such as routers and switches. A cohesive management strategy by operations needs to manage information in log files and events to allow for correlation between and among different systems. For example, a JDBC call from an application may fail, but the application itself doesn't know if this is a failure of the database itself, the network infrastructure or possibly even DNS misconfiguration. The event management function of the Operations group works on identifying these relationships in order to help perform Root Cause Analysis of incidents. This decreases the amount of resources required to resolve an issue.


Who needs to be notified when "something" goes wrong? Do you want every application team to receive an emergency text in the middle of the night for every problem? Probably not. The Operations team is usually responsible for sending (and, more importantly, suppressing) the appropriate notifications. This is tightly related to Event Management and Root Cause Analysis.

Runbook Automation

Anyone who is responsible for handling a ticket needs to have some idea of what to do. Runbooks are sequences of steps an operator can run to gather more information and/or resolve an issue. Runbooks need to be maintained to ensure that they're valid and up-to-date. Application teams often don't have all of the experience needed to create comprehensive runbooks. They are created over time by the Operations staff, who are constantly handling issues.


In an enterprise, the ideal situation is that each user has ONE userid and password (or certificate, etc.) that they use to authenticate to all applications. This authentication storage mechanism needs to be maintained. This is another function provided by Operations.


DevOps is currently a very popular methodology, and it serves its purpose very well. It allows Development teams to continuously deploy applications to provide better business value. Operations is still required to perform quite a few functions that simply aren't in the purview of Development.