Six Questions About Big Data Cyber Risk Answered
- September 20, 2017
One of the hottest topics for both Dell EMC and Hortonworks today is how to protect big data repositories, or data lakes, from the emerging breed of cyber-attacks. We sat down to discuss some of the common questions we’ve faced on this topic, and would love to hear your thoughts and contributions. Our thanks also to Simon Elliston Ball for his contributions to the discussion.
The threats to big data environments fall into a few broad areas.
First, these are ‘target-rich’ environments. Years of consolidating data, to simplify management and deliver value to data science and analytics, have made them an appealing destination for cyber attackers. They will be subject to many ‘advanced persistent threats’: cyber attackers and organisations using extremely focused, targeted techniques, ranging from spear-phishing to DDoS attacks, to gain access to or exploit your big data platforms in some way.
Second, they are powerful computational environments. Encryption-based attacks such as ransomware, if they are ever unleashed on big data operating environments, could therefore spread very rapidly.
Third, big data repositories are often accessible to many employees internally. In general, this is a good thing: how else could organisations tap into the potential value of big data? But a comprehensive framework to monitor and manage data access and security is required to protect against abuse or exploits.
The good news is that WannaCry and other ransomware variants currently in the field don’t really target the operating systems on which big data platforms run. The bad news is that it’s probably just a matter of time before they do. And because these environments are such capable computational resources, these exploits could spread fast if steps aren’t taken to protect them.
Much about the way big data platforms are architected could help protect against these forms of malware, provided the right steps are taken. Here are some suggestions:
One of the most common misunderstandings in deploying big data environments is that you can still define RTOs and RPOs (recovery time and recovery point objectives) for the infrastructure as a whole. You can’t: it’s too large! You’d have to build in so much redundancy that the whole thing would become commercially unviable. Instead, you need to set RTOs and RPOs for individual data sets or storage tiers within the environment. In this context, you need to leave enough slack in your resources to keep the right number of snapshots of key data sets in place to insulate you from risk. This might mean anything from 30 to 50 percent unused capacity in a given storage tier, set aside for snapshots, though the upper end would be verging on overkill in most cases.
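To make the per-data-set approach concrete, here is a minimal Python sketch that estimates how much snapshot space a set of data sets needs and checks it against a tier’s reserve. The `DataSetPolicy` class, its fields, the example figures and the space model (each snapshot only holds the data changed during its interval) are all illustrative assumptions, not a description of how Isilon or HDFS snapshots are actually accounted for.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataSetPolicy:
    """Hypothetical per-data-set protection policy (illustrative only)."""
    name: str
    size_tb: float            # live data held by this data set, in TB
    daily_change_rate: float  # assumed fraction of the data set rewritten per day
    rpo_hours: float          # recovery point objective: max tolerable data loss
    retention_days: float     # how long each snapshot is kept

    def snapshots_retained(self) -> int:
        # Meeting the RPO needs one snapshot per RPO interval,
        # and each one is kept for the whole retention window.
        return math.ceil(self.retention_days * 24 / self.rpo_hours)

    def snapshot_space_tb(self) -> float:
        # Simplifying assumption: a snapshot only consumes space for the data
        # changed during its interval (copy-on-write style accounting).
        change_per_interval_tb = (
            self.size_tb * self.daily_change_rate * self.rpo_hours / 24
        )
        return self.snapshots_retained() * change_per_interval_tb


def tier_reserve_ok(capacity_tb: float, reserve_fraction: float,
                    policies: Iterable[DataSetPolicy]) -> bool:
    """True if the combined snapshot space of all data sets fits in the tier's reserve."""
    reserve_tb = capacity_tb * reserve_fraction
    needed_tb = sum(p.snapshot_space_tb() for p in policies)
    return needed_tb <= reserve_tb


if __name__ == "__main__":
    # Made-up data sets on a hypothetical 1,000 TB tier.
    events = DataSetPolicy("customer_events", size_tb=400, daily_change_rate=0.05,
                           rpo_hours=4, retention_days=7)
    archive = DataSetPolicy("logs_archive", size_tb=300, daily_change_rate=0.02,
                            rpo_hours=24, retention_days=30)
    for reserve in (0.3, 0.4, 0.5):
        print(reserve, tier_reserve_ok(1000, reserve, [events, archive]))
```

With these made-up figures the two data sets need about 320 TB of snapshot space (140 TB plus 180 TB), so the tier fails the check with a 30 percent reserve and passes with 40 or 50 percent. Real snapshot overhead depends on the platform and the workload, so measured change rates should replace the assumed ones.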
Educating employees is a critical part of protecting any environment, because people are a more likely first point of entry into an organisation than anything else. That means raising awareness of the dangers of spear-phishing, modern malware attacks and beyond. The standard tricks, such as redirecting people to malicious websites and downloads or sending dubious email attachments, have become much more sophisticated.
Those attempting to hack a Hadoop cluster might start by targeting a system administrator with a ServiceNow helpdesk request. This camouflage makes the attack difficult to spot. It’s important to remember that the people coming after these resources are good at what they do: not script kiddies or mass-market ransomware opportunists, but people intent on causing serious damage, for either ideological or commercial reasons.
Even with training, people will remain a weak link. Given one rough estimate that the “per event” reputational and regulatory impact of a breach can cost up to two percent of market cap, and given that a breach is eventually all but inevitable, having good remediation policies, processes and technologies in place is key.
The critical component here is audit, given the need to know exactly where your data is being stored, controlled and processed, and what it’s being used for, in an evolving regulatory context. This is something you apply to your own use of big data, but it is also something big data enables you to achieve for other systems. The audit and exfiltration monitoring tools you build in as part of your hygiene planning around big data are useful here, but their logs are of no use without analytics, and without the ability to cross-reference and cross-check other data resources. For example, if a piece of personal information has been accessed on one system, does it also exist on others? And should it therefore have been deleted from all of them?
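The cross-referencing question can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical inventory that maps each system to the identifiers of the personal-data records it holds, and hypothetical audit events recorded as `(system, record_id)` pairs; the names `copies_elsewhere`, `hdfs_lake`, `crm` and `marketing_db` are invented for the example. In practice the inventory would have to come from data catalogues or discovery scans, and the events from your audit logs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical inventory: system name -> identifiers of the personal-data
# records it holds (e.g. built from data catalogues or discovery scans).
Inventory = Dict[str, Set[str]]

# Hypothetical audit event: (system where the event happened, record identifier).
Event = Tuple[str, str]


def copies_elsewhere(inventory: Inventory,
                     events: Iterable[Event]) -> Dict[str, Set[str]]:
    """For each event, find the *other* systems that also hold the same record.

    Records with no copies outside the event's system are left out of the result.
    """
    result: Dict[str, Set[str]] = defaultdict(set)
    for system, record_id in events:
        for other_system, record_ids in inventory.items():
            if other_system != system and record_id in record_ids:
                result[record_id].add(other_system)
    return dict(result)


if __name__ == "__main__":
    inventory = {
        "hdfs_lake": {"cust-001", "cust-002", "cust-003"},
        "crm": {"cust-001", "cust-002"},
        "marketing_db": {"cust-002"},
    }
    # An access on the data lake: where else does this person's data live?
    print(copies_elsewhere(inventory, [("hdfs_lake", "cust-002")]))
    # An erasure on the CRM: which copies survived and should also be deleted?
    print(copies_elsewhere(inventory, [("crm", "cust-001")]))
```

Here the access to `cust-002` on the data lake flags copies in `crm` and `marketing_db`, and the erasure of `cust-001` from `crm` flags a surviving copy in `hdfs_lake`. The same check serves both questions in the paragraph above; only the kind of event fed in differs.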
The rise in the volume of unstructured data represents a huge number of unknowns. As a result, we are going to see a significant opportunity around digital transformation. Organisations will be forced to assess how they handle data and make big improvements to the structure of their environments, their ability to run analytics, their ability to retrieve information quickly, and so on. Otherwise, they may face regulatory investigation or enforcement for failing to embed appropriate data governance and data security.
For those interested in practical ways to tackle these problems, Dell EMC Isilon has built-in tools that aid recovery from a ransomware attack; however, detection and prevention are a much better alternative. Fortunately, Dell EMC partners with Superna and Varonis to offer ideal solutions.
If you’re interested in how Dell EMC Isilon and Hortonworks customers tackle other challenges around gaining value from their big data, join our upcoming webinar on “Batch + real-time analytics convergence” in late November.