On the DBIR, data analysis and information security

I read through Verizon's Data Breach Investigations Report (DBIR) yesterday, and I believe (actually, I hope) that this report will be a turning point in the way we handle information security and data breach reporting.

I don’t say this because of the refreshing lack of pie charts, or because the meme density in the analysis text is over Nine Thousand. I say this because of how the data was handled in the report. It was handled… like data.

By the numbers

The DBIR team made bold analytical decisions about how to work with what they had. As far as I can tell, nobody had clustered attack and breach information based on behavior and other features before. When you look at the result, the outcome seems logical and natural.

Now this, boys and girls, is what good data analysis looks like: you formulate a hypothesis and use the data to validate it. The only intervention by the analysts (unless there was some huge cherry picking involved, which is unlikely given the unprecedented size of the data set) was deciding where to cut the dendrogram produced by the hierarchical clustering.
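To make "deciding where to cut the dendrogram" concrete, here is a minimal sketch of agglomerative hierarchical clustering. It is not the DBIR team's actual method. Several assumptions are mine: the incident records and feature names are invented toy data, each incident is treated as a set of behavioral features, distance is Jaccard distance, clusters are joined by average linkage, and `agglomerate`, `cut` and `threshold` are hypothetical names. The point is that the algorithm builds the whole merge tree by itself, and the analyst's one judgment call is the height at which to cut it.

```python
from itertools import combinations

# Hypothetical toy incidents (invented for illustration, NOT DBIR data).
# Each incident is described by a set of behavioral features.
INCIDENTS = {
    "inc1": {"malware", "pos_terminal", "ram_scraper"},
    "inc2": {"malware", "pos_terminal", "ram_scraper", "stolen_creds"},
    "inc3": {"phishing", "stolen_creds", "web_app"},
    "inc4": {"phishing", "stolen_creds", "web_app", "sql_injection"},
    "inc5": {"lost_laptop", "physical"},
    "inc6": {"lost_laptop", "physical", "unencrypted"},
}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; 0.0 means identical feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, data) -> float:
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(data[x], data[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def agglomerate(data):
    """Merge the closest pair of clusters until one remains.

    Returns the merge history (the dendrogram) as (cluster_a, cluster_b, height).
    """
    clusters = [frozenset([k]) for k in sorted(data)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], data))
        height = average_linkage(a, b, data)
        merges.append((a, b, height))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(data, merges, threshold):
    """Cut the dendrogram: replay only merges whose height is <= threshold.

    Average-linkage merge heights never decrease, so we can stop at the
    first merge above the threshold.
    """
    clusters = [frozenset([k]) for k in sorted(data)]
    for a, b, height in merges:
        if height > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    merges = agglomerate(INCIDENTS)
    # Cutting low keeps the three behavioral "patterns" apart.
    print(cut(INCIDENTS, merges, threshold=0.5))
    # → [['inc1', 'inc2'], ['inc3', 'inc4'], ['inc5', 'inc6']]
```

Moving `threshold` up to 0.95 in this toy example folds the two malware and credential clusters together and leaves the lost-laptop incidents on their own. That single number is the kind of analyst decision the paragraph above is talking about; everything else falls out of the data.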

And while this may seem “Oh! Magical!” to us Infosec people, an experienced data analyst would go: “Meh, so what? That is literally the first thing you do when you have a new data set to analyze”.

Yes, my data analyst friend, it is. And I really hope it is not the last thing we do with it.

Data will set you free

I was about to wrap this up when I came across a few criticisms on Twitter (from people whose work I absolutely respect) dismissing the results, in the spirit of “We already knew that! Cut out the marketing stunt!”

I can totally get behind the pushback on marketing, as I have been guilty of it many times myself. The “same-ism” and inaccuracies in these breach reports are staggering. However, we are dealing with a different animal here, and I wish the naysayers would read through it and draw a more informed conclusion.

The simple fact that we can analyze a sample of the unknown population of breaches worldwide and reach the same conclusions as experts who have access to classified info and secret squirrel intel fills me with hope. Hope that we may be ready to move away from the cargo cult and shamanism that pervade our industry today. Hope that data can start to be used more meaningfully to drive decision making in Infosec.

Information Security is not a special snowflake. We are not “different”. If you have enough data to analyze, the patterns will emerge.

Suggested reading: The Hedgehog and the Fox