IR Is Not ITSM
- July 26, 2017
I’m not an ITIL expert. Yes, I’ve been at an ITSM company for over five years, so maybe it’s a bit like playing an expert on TV. I probably use some ITIL terms incorrectly, but if you’re a security person, you’ll likely forgive any mix-ups on the IT front.
Back when I first started at ServiceNow, we were using our IT Service Management incident application for security incident response. It was a much smaller company in 2011 with around 400 employees. Compare that to over 5,000 employees (and 3,600 customers) today!
But our first foray into home-grown security response didn’t go all that well, and for the right reasons: security incident response is handled differently than ITSM. The responsibility assignment matrix (RACI/DACI/your acronym of choice) for security is different from the one used for ITSM. The ITIL framework used for the incident management lifecycle is very specific to IT. Neither translated well to security.
ITSM is also highly focused on availability. IT teams are largely measured on uptime. Of course, availability is important, but it’s just one-third of the CIA triad (confidentiality, integrity, and availability). Your systems can be up and available 100% of the time, but that doesn’t mean you’re winning at security. In fact, some downtime is inevitable for good security hygiene.
Here’s a quick example of the different perspectives on availability. Say a web server goes down, and it wasn’t because of some kind of attack. The IT team will work tirelessly to get that server back up and running, and it will be an all-hands-on-deck situation until they succeed. The security team, however, sees this downed server as secure because no one can access it. Okay, that’s a bit of an exaggeration, but you can see there’s a marked difference in priorities between IT and security.
On the security side, every incident is top priority until more is known about it. An incident in the “new” state (which is really “triage” for security) requires immediate response. Once it has been triaged and the extent of the incident is understood, it can be prioritized…and that might mean deliberately letting it sit for days, weeks, or even months. Determining incident priority depends on a number of factors, not just the incident category or type.
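To make that triage-first rule concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how ServiceNow or any other product implements it; the class names, state labels, and priority levels are all assumptions I made up for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    # Hypothetical priority scale; lower number means more urgent.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass
class SecurityIncident:
    title: str
    state: str = "triage"  # the "new" state, treated as triage
    assessed_priority: Optional[Priority] = None

    @property
    def effective_priority(self) -> Priority:
        # Until triage is complete, every incident is treated as top priority.
        if self.state == "triage" or self.assessed_priority is None:
            return Priority.CRITICAL
        return self.assessed_priority

    def complete_triage(self, priority: Priority) -> None:
        # Once the extent is understood, the real priority takes over.
        self.assessed_priority = priority
        self.state = "prioritized"


incident = SecurityIncident("Unusual outbound traffic from build server")
print(incident.effective_priority.name)  # CRITICAL
incident.complete_triage(Priority.LOW)
print(incident.effective_priority.name)  # LOW
```

The point of the sketch is that priority is an outcome of triage rather than something derived from the incident category at submission time.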
IT incidents also tend to have a fairly fast initial response SLA, but that first gate is usually just acknowledging that the incident was submitted. Then another SLA timer begins, with timelines that vary based on the type of request, ticket, or outage. Letting something sit isn’t advised unless you enjoy receiving angry emails from users.
With security incidents, the contain and eradicate stages can take a long time to complete for a number of reasons. Sometimes waiting is the right choice when investigating an anomaly. Also, it’s important to ensure a threat is completely eradicated before closing out an incident. That verification can take weeks or months to complete.
Another difference for security is that confidentiality is critical. It’s no big secret that Joe Employee asked for a new phone or that the marcom team requested five more Creative Cloud licenses. But security incidents are kept private until more is known about them: who or what is affected, how bad it is, and the extent of any damage. Once the security team has that information, the sensitivity level can be lowered, if appropriate. These incidents can even be used for educational purposes. For example, real-life phishing attempts can be used as training tools. On the other hand, some incidents, especially those involving human resources matters or personally identifiable information, must maintain the highest level of confidentiality.
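As a rough sketch of that confidentiality rule, the Python below starts every incident at the most restricted level and only allows the level to drop after triage, and never when personal data is involved. The sensitivity levels, field names, and the `lower_sensitivity` method are hypothetical names chosen for illustration, not part of any real product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical levels; higher number means more restricted.
    OPEN = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass
class IncidentRecord:
    summary: str
    involves_personal_data: bool = False  # HR matters, PII, and similar
    triaged: bool = False
    sensitivity: Sensitivity = Sensitivity.RESTRICTED  # private by default

    def lower_sensitivity(self, target: Sensitivity) -> bool:
        """Lower the sensitivity level if the rules allow it; return True on success."""
        # Not enough is known yet, or the incident must stay confidential.
        if not self.triaged or self.involves_personal_data:
            return False
        # This method only lowers the level; it never raises or repeats it.
        if target >= self.sensitivity:
            return False
        self.sensitivity = target
        return True


phish = IncidentRecord("Phishing email reported by finance", triaged=True)
print(phish.lower_sensitivity(Sensitivity.OPEN))  # True

hr_case = IncidentRecord("Leaked payroll file", involves_personal_data=True, triaged=True)
print(hr_case.lower_sensitivity(Sensitivity.INTERNAL))  # False
```

In this sketch the phishing incident can be opened up for training use, while the incident involving personal data stays restricted no matter what is requested.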
And once an incident is closed comes the post-incident review (PIR), a step that doesn’t really exist in ITSM. How often and how quickly do you complete PIRs? I’ve heard people respond with weekly or monthly, and on rare occasions, as soon as the incident is closed. The longer you wait, the more difficult creating the PIR will be, as those involved forget or misplace the details. The right technologies can automate PIR creation to make this part of the process easier and faster. Then you can use that PIR as an executive status update for critical incidents.
IR and ITSM are both critical to keeping businesses up and running, but you can’t substitute one for the other. The teams, priorities, and processes are completely different for a good reason. Gartner analyst Anton Chuvakin wrote, “think of security IR as responding to a ‘business incident,’ not an IT issue.” Those business incidents need their own purpose-built application, which is why ServiceNow no longer uses our ITSM incident application for security. Instead, we built Security Operations on the same platform, allowing security and IT to collaborate without being forced to make one size (awkwardly) fit all.