So, Okta was recently compromised. As is the case after every security incident, there has been a bunch of FUD – TechCrunch says the data breach has affected hundreds of Okta customers, the Hacker News crowd is up in arms, Cloudflare’s CEO says he is so disappointed in Okta that he might enter the space himself, and the Tenable CEO went off on a rant about transparency. I understand journalists need to sensationalize and vendors need to posture, but in all the ruckus we seem to have lost sight of some key lessons from this security incident that the industry could benefit from.
This article attempts to fix that – I will outline what actually seems to have happened at Okta, describe how this situation could occur at any SaaS provider, and provide some tangible ideas on how to reduce such risk.
Caveat: While I am fairly knowledgeable about the subject matter here, I have no insider information about what happened at Okta. Please let me know if there are any factual errors in my notes below.
What actually happened at Okta?
Okta is a leading cloud-based workforce identity provider; millions of employees and contractors use Okta every day to authenticate into their corporate applications. Okta uses an internal application named SuperUser that allows its support engineering team to administer tenants on behalf of the customer. Okta support engineering itself is handled by Sykes, the Business Process Outsourcing (BPO) subsidiary of the contact center giant Sitel. These support engineers use devices owned and managed by Sitel.
Some time between Jan-16-2022 and Jan-20-2022, a Sitel device used by an Okta support engineer was compromised and the threat actor gained RDP (Remote Desktop Protocol) access to it. We can assume RDP access gave them full control of a device on which the support engineer was logged into the SuperUser application that administers the Okta service.
On Jan-20-2022, the hacker attempted to escalate the compromise. Using the support engineer’s valid Okta session on the compromised device, the hacker tried to register a new authentication factor. The support engineer detected this unauthorized registration attempt and escalated it to Okta Security and the hack was contained.
The Okta Security team examined their logs to scope the blast radius from this incident. Their worst-case analysis assumes all Sitel devices were compromised between Jan-16-2022 and Jan-21-2022. Under this assumption, Okta estimates the worst the hacker or hacking group could have done was to access the Okta tenant of 366 customers (~2.5%) via the SuperUser interface. Further, the SuperUser interface itself has limited access to internal systems – it cannot create or delete users in the customer tenant, cannot download customer databases, cannot access source code – so Okta feels comfortable in saying their service is fully operational and there are no corrective actions customers need to take.
What did Okta get right?
Despite the hoopla over this security breach, Okta seems to have done most things right here.
- The support engineer subcontractor didn’t have VPN access to the Okta production network; instead, they only had access to the applications needed to do their job. The hacker couldn’t move laterally across Okta’s network or compromise Okta devices.
- The SuperUser application implements least privilege concepts, so support engineers can’t create or delete users. Even though the hacker/hacking group had full control of a device on which a user was logged into the SuperUser application, they did not gain unfettered access to the Okta tenant environments.
- Okta retained detailed audit logs. Both the external forensics firm and Okta Security were able to check access patterns and create a forensic report to better understand the blast radius relatively quickly.
- Okta had a good working security relationship with its BPO. Okta Security knew of the incident within hours of Sitel detecting its device was compromised. Many BPOs I know would choose not to disclose such incidents.
These points give me a lot of confidence in the Okta solution and in the company’s security processes.
What did Okta get wrong?
Okta has access to all the security technology it needs so it likely wouldn’t gain much by purchasing new security products. Their processes around tracking security incidents at third-party contractors and disclosing impact around cyber attacks seem a bit slow, but that, too, is well within what I have seen in the industry.
I think the main area where Okta hasn’t done well is crisis communication. They’ve come across as tardy and less than transparent. We saw this first in their handling of the Log4j CVEs, including Log4Shell, where many of their initial answers around the potential exposure and subsequent impact were vague and unsatisfactory. There are no Okta employees on Hacker News responding to comments. Their Twitter game is pretty weak. Okta would probably do well to take some ideas from Cloudflare here – crisis or not, Cloudflare uses both formal and informal channels very effectively to get their point of view across.
Why is this relevant to all SaaS providers?
The most interesting thing about the Okta incident is that it is not specific to identity or security service providers – it could have happened to any company that provides a SaaS offering. In fact, the Okta incident is in many ways analogous to Uber’s “god mode” controversy where employees could spy on its riders.
Every SaaS provider has one or more SuperUser-like administration interfaces for its internal staff to provide support across multiple customer accounts. These administration interfaces are restricted to a specific set of users because unauthorized access could seriously compromise multiple customer environments.
Then, once a SaaS provider reaches a certain size, it starts leveraging third parties – partners, vendors, BPOs – to serve its customer base. These third parties need access across customer environments to perform their jobs so more administration interfaces are operationalized. This further increases the associated risk.
In my ideal world, the Okta incident would have prompted a deeper discussion on how SaaS providers can and should provide least-privilege access to their administration interfaces, especially those used by third parties. We would share cybersecurity best practices and debate specific approaches. I see none of that so far … let me try to start that discussion.
How can SaaS providers better secure their administration interfaces?
Here are 4 security principles all SaaS providers should employ to secure their administration interfaces.
1 – No Static Credentials
Many administration interfaces are secured with shared usernames and passwords. These credentials are often used in administrative scripts and are not easy to change. The LAPSUS$ group specifically recruited employees and contractors who had access to these types of privileged static credentials.
Instead, administration interfaces should use authentication tied to SSO via the corporate identity provider, combined with MFA. Okta has implemented SSO + MFA for its SuperUser application (it had better, it is a corporate identity provider after all!) and that’s what enabled it to contain this security incident.
If configuring dedicated SSO for your internal-only administration interfaces is not a priority for your dev teams, you could instead rotate the credentials regularly via a password manager.
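As a rough illustration of the rotation fallback, here is a minimal sketch in Python. It assumes a 30-day rotation window and an in-memory dictionary standing in for the password manager; the names `MAX_AGE`, `needs_rotation`, and `rotate_stale` are hypothetical, and a real setup would call your password manager's own API instead.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Assumption: credentials older than 30 days are considered stale.
MAX_AGE = timedelta(days=30)


def needs_rotation(rotated_at, now, max_age=MAX_AGE):
    """Return True when a credential has reached the rotation window."""
    return now - rotated_at >= max_age


def rotate_stale(credentials, now):
    """Replace every stale secret with a fresh random one.

    `credentials` maps a name to {"secret": str, "rotated_at": datetime};
    it is a stand-in for a password manager vault (hypothetical layout).
    Returns the names of the credentials that were rotated.
    """
    rotated = []
    for name, entry in credentials.items():
        if needs_rotation(entry["rotated_at"], now):
            # token_urlsafe draws from the OS cryptographic random source.
            entry["secret"] = secrets.token_urlsafe(32)
            entry["rotated_at"] = now
            rotated.append(name)
    return rotated


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    vault = {
        "admin-script": {"secret": "old", "rotated_at": now - timedelta(days=45)},
        "report-bot": {"secret": "fresh", "rotated_at": now - timedelta(days=5)},
    }
    print("rotated:", rotate_stale(vault, now))
```

Rotation limits how long a leaked password stays useful, but it does not tie access to an individual the way SSO + MFA does, so treat it as a stopgap.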
2 – Enforce Device Trust
As we saw with this Okta incident, device compromise is all too common today. Most device compromises occur over the network, so all devices used for admin access should have their firewalls enabled. Further, ensure remote access servers (such as RDP and SSH) are turned off on these devices. To enforce this, ensure a user can’t even sign on to a service from an unregistered device where you can’t validate device posture.
A common technique to enforce device trust is to allow access only from certain IP addresses that a device can obtain solely through a VPN client that checks device posture. Many organizations use this technique, but it fails if multiple users are on the same network and share an IP address. In the case of a breach, you cannot map the IP addresses in logs to specific devices, so you have to assume every single device in that shared network has been compromised.
A better technique to ensure device trust is to employ a device certificate stored in a secure enclave on the device. This technique allows you to uniquely identify a compromised device regardless of the network it is on.
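To make the idea concrete, here is a minimal sketch in Python of a registry that identifies devices by the SHA-256 fingerprint of their certificate and checks posture before allowing sign-on. Everything here is an assumption for illustration: the `Device` fields, the `DeviceRegistry` class, and the use of raw certificate bytes in place of a certificate presented over mutual TLS from a secure enclave.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Device:
    device_id: str
    firewall_enabled: bool
    remote_access_disabled: bool  # RDP/SSH servers turned off
    revoked: bool = False


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of the device certificate bytes."""
    return hashlib.sha256(cert_der).hexdigest()


class DeviceRegistry:
    """Hypothetical registry mapping certificate fingerprints to devices."""

    def __init__(self):
        self._devices = {}

    def register(self, cert_der: bytes, device: Device) -> None:
        self._devices[fingerprint(cert_der)] = device

    def revoke(self, device_id: str) -> None:
        # A compromised device can be cut off individually,
        # regardless of which network or IP address it uses.
        for device in self._devices.values():
            if device.device_id == device_id:
                device.revoked = True

    def authorize(self, cert_der: bytes):
        """Return (allowed, device id or reason for denial)."""
        device = self._devices.get(fingerprint(cert_der))
        if device is None:
            return False, "unregistered device"
        if device.revoked:
            return False, f"{device.device_id}: certificate revoked"
        if not (device.firewall_enabled and device.remote_access_disabled):
            return False, f"{device.device_id}: failed posture check"
        return True, device.device_id
```

Because every decision is keyed to one certificate, the access logs name a specific device, which is what lets you scope a breach to that device instead of a whole shared network.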
3 – Audit Programmatic Access
Admin interfaces often need to interact with multiple systems in a SaaS provider environment. Support teams often automate these interactions for provisioning (e.g., AWS), ticketing (e.g., Jira), chat (e.g., Slack), and more.
These application-to-application interactions authenticate via API keys and are typically exempted from the policies and audits that are applied to human-initiated interactions. API keys are a threat vector that attackers always look to exploit. Okta never mentioned it in their report, but the LAPSUS$ hackers claimed on Telegram that they explicitly searched for, and were able to find, AWS access keys in Okta’s Slack channels.
API keys used to build administration interfaces are particularly sensitive. Don’t exempt programmatic access from policies and audits – make sure these are also authenticated and authorized consistently.
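One way to picture this is a thin gateway that every API-key call passes through, so that programmatic access is authorized and logged just like a human session. The sketch below is in Python and is illustrative only: `ApiGateway`, the scope strings, and the log fields are hypothetical names, not part of any product mentioned here.

```python
import hashlib
from datetime import datetime, timezone


class ApiGateway:
    """Hypothetical choke point that authorizes and audits API-key calls."""

    def __init__(self):
        self._keys = {}      # sha256(key) -> (service name, allowed scopes)
        self.audit_log = []  # one entry per call, allowed or denied

    @staticmethod
    def _digest(key: str) -> str:
        # Store only a hash so the gateway never holds raw keys.
        return hashlib.sha256(key.encode()).hexdigest()

    def register_key(self, key: str, service: str, scopes) -> None:
        self._keys[self._digest(key)] = (service, frozenset(scopes))

    def call(self, key: str, scope: str, action):
        """Run `action` only if the key is known and holds `scope`."""
        service, scopes = self._keys.get(
            self._digest(key), ("unknown", frozenset())
        )
        allowed = scope in scopes
        # Denied attempts are recorded too; they are often the useful signal.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "service": service,
            "scope": scope,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{service} is not allowed to use {scope}")
        return action()


if __name__ == "__main__":
    gateway = ApiGateway()
    gateway.register_key("example-key", "ticket-bot", {"tickets:read"})
    print(gateway.call("example-key", "tickets:read", lambda: "ticket list"))
    print(gateway.audit_log)
```

Scoping each key to the few operations its integration needs also limits the damage if a key does leak into a chat channel.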
4 – Define Approval Workflows for Privileged Access
The two-man rule is a time-tested protocol for sensitive operations, and shouldn’t be reserved for nuclear submarines. Tying privileged admin access to an approval workflow can significantly reduce the risk of breach.
Traditionally, approval workflows have been associated with tedious bureaucracies and waterfall style development. But, no more! Modern solutions utilize just-in-time provisioning with API integrations that support Slack webhooks, Jira tickets, and more, making approval workflows easy to operationalize and not-too-onerous to use.
Any time a user elevates to super admin privilege, it should trigger an approval flow.
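The core of such a flow is small. Here is a minimal Python sketch of a two-person, time-limited elevation, under the assumption that a grant lasts one hour and that a second person must approve it. `ElevationRequest` and its fields are hypothetical; a real deployment would hook the approval step into Slack, Jira, or a similar tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ElevationRequest:
    requester: str
    role: str
    ttl: timedelta = timedelta(hours=1)  # assumed just-in-time window
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None

    def approve(self, approver: str, now: datetime) -> None:
        # Two-person rule: nobody can approve their own elevation.
        if approver == self.requester:
            raise PermissionError("requesters cannot approve themselves")
        self.approved_by = approver
        self.approved_at = now

    def is_active(self, now: datetime) -> bool:
        """The privilege exists only after approval and before expiry."""
        if self.approved_at is None:
            return False
        return now < self.approved_at + self.ttl


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    request = ElevationRequest("alice", "super-admin")
    request.approve("bob", now)
    print(request.is_active(now + timedelta(minutes=30)))
```

Letting the grant expire on its own means a hijacked session holds super admin rights for at most the approved window, not indefinitely.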
In summary, what happened at Okta could have happened at any SaaS vendor. The industry should learn from this incident and better secure its administration interfaces. There are several commercial solutions, including Zero Trust Network Access. Contact Banyan Security to learn more about how we can help you with such initiatives.