Authentication Maturity Model (Part 1)
My last post outlines how to use maturity models to describe the robustness of an enterprise cybersecurity practice. It describes an organizational structure as a fully-connected graph with 4 nodes (App & Platform, Operations, Data Protection, and Identity & Access). I also assert a relationship between overall maturity and information flow between nodes.
In the next 2 posts (2 models in each), I’ll walk through creating maturity models for user and API authentication (authN) and relate them back to our 4-node structure.
Why Authentication?
At first glance, authN appears simple. All you’ve gotta do is roll out a standard for Single Sign-On (SSO) and API authN. Why is that interesting?
In my experience, authN is a complex, nuanced, and continually evolving topic. Consider the evolving world of authentication protocols and services. Protocols evolve, and developer preference shifts between OpenID Connect, OAuth2, mutual TLS, and newer token binding, sometimes even reverting to SAML and Kerberos. The services landscape is also changing quickly. Cloud providers offer a spectrum of services, and the Kubernetes community has created new interest in sidecar architectures, most notably with Istio. Enterprises have to make sense of this somehow.
IMO, success with authN at a large enterprise requires sophisticated thinking and the ability to execute collaboratively across the organization. It seems like a good topic to explore here.
Meet Acme Corp.
Our scenario takes place at Acme Corp, a fictional company. Acme is based on a series of mostly-true stories.
The technology arm of Acme Corp consists of various business groups and a central IT function. They report through different hierarchies and meet at the CEO. Security is a central IT function. The business and security groups both have the ability to develop and deliver software, but the security group is small compared to the business groups. Business developers outnumber security developers by ~200:1.
Business developers are under intense pressure to deliver new functionality. They want to build cloud-native apps and microservices.
Everyone at Acme Corp is aware the cloud landscape is changing quickly.
The security team wants to figure out how to help the business safely embrace the cloud for sensitive workloads. They know their job is to empower the rest of the organization, and they have valuable domain knowledge across the various cybersecurity disciplines. The security team sees their role as establishing best practices and controls for the cloud.
Opportunities for Failure
Before we get to a maturity model, I think it’s important to list some of the ways Acme could fail at authenticating user and API requests. These are illustrative, admittedly incomplete, and appear in no particular order.
- Business developers create their own authentication service they don’t fully understand.
- Security team uses SSO or CA integration as a lever to force thorough reviews and approval of everything the business developers are planning.
- Business developers wait to add authN until their workload nears production.
- Security team “approves” authN technology the business developers won’t use.
- Business developers decide they are going to bring back Basic Auth.
- Security team clings to old servers for SSO and API authN, refusing to modernize.
- Security team tries to force old technology into the cloud or a container.
- Acme developers write their own SDKs for validating sessions or tokens.
- Workloads don’t properly log attempts to subvert authentication mechanisms.
- Business developers hard-code configuration into their apps and have to make source code changes every time they promote to another environment.
- Non-prod environments have unrealistic security services.
- Either team decides to wait for the “NewWhizBangy authentication feature from TheMagicPlatform”, thereby committing to a milestone fixed in both date and feature set.
- Acme places a big bet on a technology that never materializes or matures.
- Acme becomes paralyzed and avoids making authN decisions.
In short: there are lots of ways for this to break. As described in the last post, I think maturity models can help avoid some of these pitfalls.
Success
We want to get quality app onboarding without scale constraints on the security team. Some call this “building guardrails”, others describe it like a highway.
Metaphors always have limits; I’ll test those limits here. This morning I crossed the Chattahoochee River in my car after taking my daughter to school. I didn’t have to call anyone for an approval to use the bridge. I didn’t have to make an appointment and wait. I didn’t attempt to build a more convenient bridge after reading local building codes (DIY bridge). I didn’t have to wonder if the bridge had moved in the last week. I didn’t test the bridge for safety beforehand. I didn’t bang into the guardrails to see if they worked. I didn’t look for ways to go around the bridge. I didn’t build a ramp and jump the river (bridgeless architectures). The bridge was a non-event.
We kinda want a lot of security to be the same way (yes, I realize technology changes more than bridges).
The metaphor isn’t all that important. To the business groups, it’s a way to get security capabilities without much manual involvement from the security team. To the security team, it’s a way for business developers to build secure apps independently.
We can’t click our heels and have it all. We have to grow. We need something that emphasizes maturity over any singular technology.
I don’t know of any rules for starting, but failing to start is a sure way to fail.
Maturity 1 — Compliance Minimum
This level describes intent. In the authN domain, let’s say we require integration with robust authN services and avoid bespoke authN implementations. A starting point for that intent is a simple written policy, call it a 1-pager. Referring to our synaptic structures from the last post, we are strengthening the connection between App & Platform and Identity & Access Management.
The Policy
This policy applies to workloads created at Acme and exposed to users or as an API. Unless there’s an approved exception, all workloads subject to this policy will:
- Properly authenticate and log every request.
- Integrate with approved authN services using approved protocols.
- Integrate with authN services early in the lifecycle, prior to production.
- Use only approved and properly configured SDKs for token, cookie, and certificate validation.
Super-simple. These policy directives are non-controversial and align with best practices. They give both groups (security & business) degrees of autonomy while also prescribing a reasonable architecture. However, the policy says nothing about how Acme gets it done.
What’s Required
Don’t be deceived: there’s a ton of work here. Some obvious next steps are defining what “properly” and “approved” mean and creating an exception process. Lots of collaboration is required. Security can’t create something and “heave” it over the wall to the business.
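One way to start pinning down “approved” is to write the policy directives down as data, so both groups argue about the same definitions. This is a minimal sketch under stated assumptions: the `Workload` onboarding record, its fields, and the allow-list entries are all hypothetical placeholders, not Acme’s actual definitions. At Maturity 1, the same checks would still be applied during manual reviews.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder allow-lists. Deciding what belongs here is the "define approved"
# work described above; these entries are hypothetical.
APPROVED_AUTHN_SERVICES = {"corp-sso"}
APPROVED_PROTOCOLS = {"oidc", "oauth2", "mtls"}
APPROVED_SDKS = {"spring-security-oauth2"}


@dataclass
class Workload:
    """Hypothetical onboarding record a team fills in for one workload."""
    name: str
    authn_service: str
    protocol: str
    sdks: List[str] = field(default_factory=list)
    authenticates_and_logs_every_request: bool = False
    integrated_before_production: bool = False
    exception_id: str = ""  # non-empty when an approved exception applies


def policy_findings(w: Workload) -> List[str]:
    """Return one finding per policy directive the workload does not meet."""
    if w.exception_id:
        return []  # in this sketch, an approved exception waives every directive
    findings = []
    if not w.authenticates_and_logs_every_request:
        findings.append("must authenticate and log every request")
    if w.authn_service not in APPROVED_AUTHN_SERVICES or w.protocol not in APPROVED_PROTOCOLS:
        findings.append("must use an approved authN service and protocol")
    if not w.integrated_before_production:
        findings.append("must integrate with authN before production")
    unapproved = [s for s in w.sdks if s not in APPROVED_SDKS]
    if unapproved:
        findings.append("unapproved SDKs: " + ", ".join(unapproved))
    return findings


if __name__ == "__main__":
    homegrown = Workload("storefront", "homegrown-login", "basic", ["my-jwt-lib"])
    for finding in policy_findings(homegrown):
        print(finding)
```

A real exception process would probably be narrower than this, waiving a single directive rather than all of them.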
Acme must define “properly” for the different platforms in the cloud (VMs, PaaS, k8s, FaaS). These platforms are changing quickly, so “properly” will change regularly.
Next, the security team has to expose and maintain useful authentication services in pre-production environments, along with defined methods for onboarding new apps.
We also need a way to communicate a fully defined policy to the groups actually building cloud-native apps.
There’s also the feedback loop and metrics.
Points of Friction / Where it will break
Maturity 1 doesn’t scale for anyone. It’s easy to see how, from the security team’s point of view, onboarding 100 apps will require 10x the work of 10 apps.
It will be very hard for Acme to keep their workloads current, and they are sure to end up with lots of snowflake implementations.
Verification will likely occur through manual reviews, which will lead to a waterfall release process.
Stagnation at Maturity 1 increases organizational pressure and dissatisfaction.
Maturity 2 — Industry Baseline
At this level, we begin to add scale and repeatability with technology. IMO, we want approved, automatic, and easily auditable (the 3A’s?) mechanisms for authenticating and logging requests.
It’s OK if we can’t provide these capabilities for all types of cloud workloads (VMs, Containers, PaaS, FaaS) right away. It’s fine to make limiting decisions. Make platform decisions knowing they will change. Again, no singular group can make these decisions on their own. Effective collaboration is required.
Here’s a good starting place:
- The Security team collaborates with business groups to review and approve combinations of {workload type, environment, SDK, authN service binding}. Specific areas of interest include authN protocol, authN credential handling, SDK configuration and parameterization, request logging and log forwarding, API configuration, and session management.
- The collaboration includes Security Operations for their domain knowledge. Specific areas of interest include log contents/encoding/format, proper forwarding, and making sense of the log contents. Look for attempts to defeat authN mechanisms; alerting on these signals is important. Some examples: a new authN service appears (it might be a malicious one), a new credential with root scope appears at an authN service, or an app receives requests with bad tokens. Even if these don’t reflect an attack, they create good information flow between groups.
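The reviewed combinations from the first bullet could live in a small registry keyed on the whole tuple, so an approval for one workload type or environment never silently covers another. A minimal sketch; the field values and registry entries are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Combination(NamedTuple):
    """One reviewed {workload type, environment, SDK, authN service binding}."""
    workload_type: str  # e.g. "vm", "paas", "k8s", "faas"
    environment: str    # e.g. "dev", "preprod", "prod"
    sdk: str
    authn_binding: str


# Hypothetical output of joint security/business reviews.
APPROVED = {
    Combination("paas", env, "spring-security-oauth2", "corp-sso-oidc")
    for env in ("dev", "preprod", "prod")
}


def is_approved(c: Combination) -> bool:
    return c in APPROVED


def unreviewed_environments(
    workload_type: str,
    sdk: str,
    authn_binding: str,
    environments: Iterable[str] = ("dev", "preprod", "prod"),
) -> List[str]:
    """Environments where this workload type/SDK/binding has not been approved."""
    return [
        env for env in environments
        if not is_approved(Combination(workload_type, env, sdk, authn_binding))
    ]


if __name__ == "__main__":
    print(unreviewed_environments("paas", "spring-security-oauth2", "corp-sso-oidc"))
    print(unreviewed_environments("k8s", "spring-security-oauth2", "corp-sso-oidc"))
```

The first call prints an empty list. The second lists all three environments because the Kubernetes combination has not been reviewed yet.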
For authN, I think the industry baseline is the ability to automatically configure workloads to properly authenticate requests while providing essential visibility to security operations.
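The Security Operations signals listed above could start as simple rules over authN log events. This sketch assumes hypothetical event dictionaries (with `type`, `service`, `scopes`, `client_id`, and `app` keys) and an arbitrary bad-token threshold. Real field names depend on Acme’s log format, and rules like these would more likely run in whatever tooling Security Operations already uses than in a standalone script.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical values; real ones come from Acme's inventory and tuning.
KNOWN_AUTHN_SERVICES = {"corp-sso"}
BAD_TOKEN_THRESHOLD = 5  # assumed per-app threshold within one batch of events


def alerts(events: Iterable[Dict]) -> List[str]:
    """Turn a batch of authN log events into alert messages."""
    out: List[str] = []
    bad_tokens: Counter = Counter()
    for e in events:
        kind = e.get("type")
        if kind == "authn_service_seen" and e["service"] not in KNOWN_AUTHN_SERVICES:
            # A new authN service might be a malicious one.
            out.append(f"unknown authN service: {e['service']}")
        elif kind == "credential_created" and "root" in e.get("scopes", []):
            out.append(f"root-scoped credential created: {e['client_id']}")
        elif kind == "token_rejected":
            bad_tokens[e["app"]] += 1
    # Bad tokens are only interesting in volume, so count them per app.
    for app, count in sorted(bad_tokens.items()):
        if count >= BAD_TOKEN_THRESHOLD:
            out.append(f"{app} rejected {count} bad tokens")
    return out


if __name__ == "__main__":
    batch = [
        {"type": "authn_service_seen", "service": "shadow-idp"},
        {"type": "credential_created", "client_id": "ci-bot", "scopes": ["root"]},
    ] + [{"type": "token_rejected", "app": "storefront"}] * 5
    for alert in alerts(batch):
        print(alert)
```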
Be On The Lookout (BOLO)
Irrational agendas can derail progress. Group collaboration requires some degree of individual maturity. The authN domain is notorious for pet peeves. I have several. The goal of these collaborations is to sustainably improve Acme’s authN, not a means for finally addressing pet peeves. I feel a future post in the making.
What’s Required (Example)
This maturity level requires code. Let’s describe some of the code (keep in mind these are needed for each environment and each workload type):
- Automation that provisions authN services for apps
- Automation that creates SDK configurations for apps
- Log sinks/destinations that are configured and available
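Because each piece is needed per environment and per workload type, the work multiplies. The sketch below simply enumerates that matrix. The environment, workload-type, and task names are hypothetical. It also shows why the earlier advice about making limiting decisions matters: starting with a single workload type shrinks the matrix considerably.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical lists; Acme's would follow from its platform decisions.
ENVIRONMENTS = ["dev", "preprod", "prod"]
WORKLOAD_TYPES = ["vm", "paas", "k8s", "faas"]
AUTOMATION = ["provision_authn_service", "generate_sdk_config", "configure_log_sink"]


def automation_matrix(
    automation: Sequence[str],
    environments: Sequence[str],
    workload_types: Sequence[str],
) -> List[Tuple[str, str, str]]:
    """Every (task, environment, workload type) cell that needs working automation."""
    return list(product(automation, environments, workload_types))


if __name__ == "__main__":
    everything = automation_matrix(AUTOMATION, ENVIRONMENTS, WORKLOAD_TYPES)
    paas_only = automation_matrix(AUTOMATION, ENVIRONMENTS, ["paas"])
    print(len(everything), len(paas_only))  # 36 9
```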
This code can appear in a few different form factors. Open Service Brokers, API Gateways, and Istio plugins are good candidates. Let’s look at some of the features in an Open Service Broker and how they could age as platforms evolve.
Let’s take the simplest example: a Java website on a PaaS (imagine that). Let’s say the website’s users authenticate with an SSO service. The server-side processes must authenticate tokens from the SSO service, but they will also likely issue and manage their own cookies for session management. Java SDKs (maybe Spring) can handle these tasks, but they require tedious configuration for each environment (dev, pre-prod, prod, maybe even by geo). The SDKs require access to credentials, which is a whole topic unto itself. The Java website must also be provisioned with the SSO service, which means the SSO service needs a policy allowing it to issue tokens to the website for users.
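To make the per-environment configuration concrete, here is a sketch that renders Spring Boot’s `spring.security.oauth2.client.*` properties for each environment from one table. The issuer URLs, client IDs, and the `corp-sso` registration ID are hypothetical. The client secret is left as a `${SSO_CLIENT_SECRET}` placeholder for Spring to resolve from the environment at runtime, since credential handling is a topic of its own.

```python
from typing import Dict

# Hypothetical per-environment SSO settings; hosts and IDs are placeholders.
SSO_ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {"issuer": "https://sso.dev.example.com", "client_id": "storefront-dev"},
    "preprod": {"issuer": "https://sso.preprod.example.com", "client_id": "storefront-preprod"},
    "prod": {"issuer": "https://sso.example.com", "client_id": "storefront"},
}


def spring_oauth2_properties(env: str, registration: str = "corp-sso") -> str:
    """Render Spring Boot OAuth2 client properties for one environment."""
    cfg = SSO_ENVIRONMENTS[env]
    reg = f"spring.security.oauth2.client.registration.{registration}"
    prov = f"spring.security.oauth2.client.provider.{registration}"
    lines = [
        f"{reg}.client-id={cfg['client_id']}",
        # Placeholder resolved by Spring at runtime; no secret is written here.
        f"{reg}.client-secret=${{SSO_CLIENT_SECRET}}",
        f"{reg}.authorization-grant-type=authorization_code",
        f"{reg}.scope=openid,profile",
        f"{prov}.issuer-uri={cfg['issuer']}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(spring_oauth2_properties("prod"))
```

Each rendered block could be saved as `application-<env>.properties`, so the active Spring profile picks the right one. This avoids hard-coding configuration into the app and changing source code on every promotion.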
Open Service Brokers are pieces of automation well-suited for these types of authN tasks. Pivotal ships one such broker, named the Single Sign-On Service. This service is a great time saver for most organizations and worth checking out. My point is simple: search for areas you can automate without much risk. IMO, Service Brokers are a safe bet.
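The broker pattern boils down to two steps: provisioning registers the app with the SSO service, and binding hands the app the credentials its SDK needs. Below is a toy, in-memory sketch of that lifecycle. It loosely follows the Open Service Broker provision/bind/unbind/deprovision flow, including a bind result with a `credentials` object, but it is not the OSB HTTP API and not Pivotal’s product. The class, method names, and credential fields are hypothetical.

```python
import secrets
import uuid
from typing import Dict, List


class SsoBroker:
    """Toy broker: provision registers an SSO client, bind returns its credentials."""

    def __init__(self, issuer: str):
        self.issuer = issuer
        self.instances: Dict[str, Dict] = {}  # instance_id -> registered SSO client
        self.bindings: Dict[str, str] = {}    # binding_id -> instance_id

    def provision(self, instance_id: str, app_name: str, redirect_uri: str) -> None:
        # Registering the client is the SSO-side policy that lets the service
        # issue tokens to this app for its users.
        if instance_id in self.instances:
            raise ValueError(f"instance {instance_id} already exists")
        self.instances[instance_id] = {
            "client_id": f"{app_name}-{uuid.uuid4().hex[:8]}",
            "client_secret": secrets.token_urlsafe(32),
            "redirect_uris": [redirect_uri],
        }

    def bind(self, instance_id: str, binding_id: str) -> Dict[str, Dict[str, str]]:
        # The platform injects these into the app's environment, so no developer
        # copies secrets into source code or config files.
        client = self.instances[instance_id]
        self.bindings[binding_id] = instance_id
        return {"credentials": {
            "issuer_uri": self.issuer,
            "client_id": client["client_id"],
            "client_secret": client["client_secret"],
        }}

    def unbind(self, binding_id: str) -> None:
        self.bindings.pop(binding_id, None)

    def deprovision(self, instance_id: str) -> None:
        bound: List[str] = [b for b, i in self.bindings.items() if i == instance_id]
        if bound:
            raise ValueError(f"unbind {bound} before deprovisioning")
        self.instances.pop(instance_id, None)


if __name__ == "__main__":
    broker = SsoBroker("https://sso.example.com")
    broker.provision("inst-1", "storefront", "https://storefront.example.com/callback")
    print(broker.bind("inst-1", "bind-1")["credentials"]["client_id"])
```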
Some might react with “But WAIT! What?! What about Istio? Istio sidecars might make all of that obsolete. We don’t want to be obsolete. We skate to where the puck is going.” I’m bullish on Istio, and I think it’s going to prove useful over time, but let’s look at Istio in context.
In my experience, most software has the shelf life of an organic banana.
For bananas, consume and replace is better than acquire and hold. The apps you’re building today will need to evolve over time, and you could reasonably plan for Istio sidecars. If that’s the case, then Java SDKs for token validation and session management leave the application. They move (in some form) into Istio sidecars that execute before application code in the request path. Maybe the Java SDKs morph to Golang, maybe the same SDKs move to another process wrapper. I don’t know.
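To picture token validation leaving the application, here is a sketch of a wrapper that verifies a bearer token before any application code runs, roughly the position a sidecar occupies in the request path. It is not how Istio works internally. Istio configures JWT validation declaratively (its `RequestAuthentication` resource) and typically uses asymmetric keys published by the issuer, whereas this self-contained sketch uses a shared HS256 key and plain Python functions. The key, handler, and request shape are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Optional, Tuple

Handler = Callable[[dict], Tuple[int, str]]


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Mint a compact HS256 JWT (used here only to exercise the sketch)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_hs256(token: str, key: bytes) -> Optional[dict]:
    """Return claims if signature, algorithm, and expiry check out; else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, (int, float)) or exp <= time.time():
        return None  # tokens without a numeric, future expiry are rejected
    return claims


def with_authn(app: Handler, key: bytes) -> Handler:
    """Run token validation in front of the app, the way a sidecar would."""
    def wrapped(request: dict) -> Tuple[int, str]:
        auth = request.get("headers", {}).get("authorization", "")
        claims = verify_hs256(auth[len("Bearer "):], key) if auth.startswith("Bearer ") else None
        if claims is None:
            return 401, "unauthenticated"
        # The app only ever sees requests that already passed authN.
        return app({**request, "user": claims.get("sub", "")})
    return wrapped


def app(request: dict) -> Tuple[int, str]:
    return 200, f"hello {request['user']}"
```

`app` never parses tokens, so replacing `with_authn` with a platform component would leave the application code unchanged.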
Though I can’t say exactly how it happens, the progression moves responsibility from the app developer further into the platform. This aligns with current trends and common sense. Open Service Brokers kinda do the same thing: they shift service provisioning and SDK configuration further into the platform. I think it’s entirely possible that Istio becomes an Open Service Broker consumer. If that happens, then Open Service Brokers seem like a reasonable investment today.
We could do the same exercise with gateways and other technologies. I’m hopeful the points are clear.
Next Up
I haven’t talked much about testing. I think authN testing is essential for the next maturity level. If done properly, it increases quality, strengthens cross-group synaptic connections, and facilitates faster evolution.








