This year Invincea Labs will give two presentations at Black Hat. Josh Saxe will present new research on a scalable approach for analyzing code-sharing relationships among millions of malicious binaries. Giacomo Bergamo will present Cynomix, Invincea’s new platform for automated malware analysis and visualization. Below are the full descriptions of their presentations.
The millions of unique malicious binaries gathered in today’s white-hat malware repositories are connected through a dense web of hidden code-sharing relationships. If we could recover this shared-code network, we could provide much-needed context for and insight into newly observed malware. For example, our analysis could leverage previous reverse engineering work performed on a new malware sample’s older “relatives,” accelerating the reverse engineering process.
Various approaches have been proposed to see through malware packing and obfuscation to identify code sharing. A significant limitation of these existing approaches, however, is that they are either scalable but easily defeated, or complex but unable to scale to millions of malware samples. A further issue is that even the more complex approaches described in the research literature tend to exploit only one “feature domain,” be it malware instruction sequences, call graph structure, application binary interface metadata, or dynamic API call traces, leaving these methods open to defeat by intelligent adversaries.
How, then, do we assess malware similarity and “newness” in a way that both scales to millions of samples and is resilient to the zoo of obfuscation techniques that malware authors employ? In this talk, I propose an answer: an obfuscation-resilient ensemble similarity analysis approach that addresses polymorphism, packing, and obfuscation by estimating code sharing in multiple static and dynamic technical domains at once, such that it is very difficult for a malware author to defeat all of the estimation functions simultaneously. To make this algorithm scale, we use an approximate feature counting technique and a feature-hashing trick drawn from the machine-learning domain, allowing for fast feature extraction and fast retrieval of sample “near neighbors” even when handling millions of binaries.
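To make the feature-hashing and ensemble ideas concrete, here is a minimal sketch and not the implementation presented in the talk. It assumes each sample is a dictionary mapping a feature domain to a set of string features, hashes every domain into a fixed-size bit vector, and averages the per-domain Jaccard similarity. The function names, the bucket count, the example domain names, and the choice of Jaccard similarity with a plain average are all illustrative assumptions.

```python
import hashlib


def hash_features(features, num_buckets=1024):
    """Hashing trick: map string features into a fixed-size bit vector.

    The vector length is independent of how many distinct features exist,
    at the cost of occasional collisions between unrelated features.
    """
    vector = [0] * num_buckets
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_buckets
        vector[index] = 1
    return vector


def jaccard(vec_a, vec_b):
    """Jaccard similarity between two equal-length bit vectors."""
    both = sum(1 for a, b in zip(vec_a, vec_b) if a and b)
    either = sum(1 for a, b in zip(vec_a, vec_b) if a or b)
    return both / either if either else 0.0


def ensemble_similarity(sample_a, sample_b, num_buckets=1024):
    """Average the per-domain similarity over domains present in both samples.

    Assumption: a plain average; a real ensemble might weight domains.
    """
    domains = sample_a.keys() & sample_b.keys()
    if not domains:
        return 0.0
    scores = [
        jaccard(
            hash_features(sample_a[d], num_buckets),
            hash_features(sample_b[d], num_buckets),
        )
        for d in domains
    ]
    return sum(scores) / len(scores)


# Hypothetical samples with two feature domains each.
sample_1 = {
    "api_calls": {"CreateFileW", "WriteFile", "CreateRemoteThread"},
    "strings": {"cmd.exe", "evil_mutex"},
}
sample_2 = {
    "api_calls": {"CreateFileW", "WriteFile", "RegSetValueExW"},
    "strings": {"cmd.exe", "evil_mutex"},
}
similarity = ensemble_similarity(sample_1, sample_2)
```

Because each domain contributes its own score, an author who disguises only one domain (say, the strings) still leaves the other domains' similarity intact, which is the intuition behind the ensemble.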
Our algorithm was developed over the course of three years and has been evaluated both internally and by an independent test team at MIT Lincoln Laboratory: we scored the highest on these tests against four competing malware cluster recognition techniques, and we believe this was because of our unique “ensemble” approach. In the presentation, I will give details on how to implement the algorithm and will go over the evaluation results in a series of large-scale interactive malware visualizations. As part of the algorithm description, I will walk through a Python machine learning library that we will be releasing in the conference materials, which allows users to detect feature frequencies over billions of items on commodity hardware.
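One well-known way to estimate feature frequencies in fixed memory is a count-min sketch. The example below is a minimal sketch of that general technique and is not the library mentioned above; the class name, table dimensions, and hashing scheme are assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a fixed-size table.

    Estimates never undercount; collisions can only inflate them.
    """

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest cell is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(item))


# Hypothetical stream of features extracted from many samples.
sketch = CountMinSketch()
for feature in ["CreateRemoteThread", "cmd.exe", "CreateRemoteThread", "CreateRemoteThread"]:
    sketch.add(feature)
```

Memory use depends only on `width` and `depth`, not on the number of distinct features, which is what makes this family of techniques attractive when the feature space is too large to count exactly.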
Josh Saxe – Associate Research Director, Invincea Labs
Josh Saxe currently serves as Associate Research Director for Data Science at Invincea Labs, where he is Principal Investigator on a DARPA program focused on developing a novel machine learning system for automatically discovering, analyzing, and visualizing evolutionary relationships between malicious software artifacts. Previously, Josh served as lead research engineer at Applied Minds, an inter-disciplinary technology think-tank. There Josh led a two-year research project focused on dynamic social network analysis and visualization of social media communications data for cybersecurity.
Wednesday, August 6th, 2014
Mandalay Bay Ballroom – South Seas F
The stream of malicious software artifacts (malware) discovered daily by computer security professionals is a vital signal for threat intelligence, as malware bears telling clues about who active adversaries are, what their goals are, and how we can stop them. Unfortunately, while security operations centers collect huge volumes of malware daily, this “malware signal” goes underutilized as a source of defensive intelligence, because organizations lack the right tools to make sense of malware at scale.
To contribute to addressing this problem we will be launching Cynomix.org at the opening of Black Hat USA 2014. Cynomix will include three key, novel capabilities that we hope will broadly impact the way malware analysis is performed:
- A subsystem for revealing “social network” style relationships between malware samples based on their shared characteristics. This subsystem allows analysts to see a group of malware samples in relation to a population-scale database of millions of malware samples.
- A subsystem for revealing malware sample capabilities based on correlations between samples’ extracted technical symbols and a machine-learning model trained on web question-and-answer documents.
- A subsystem for automatically generating statistically principled Yara signatures for malware samples and malware sample groups based on Bayesian reasoning at scale. This subsystem will allow users of Cynomix to quickly defend against new malware families before anti-virus companies generate signatures for them.
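As a rough illustration of how Bayesian reasoning could guide signature generation, the sketch below scores each candidate string by the posterior probability that a sample containing it belongs to the target family, then emits a Yara rule from the top-scoring strings. This is not Cynomix's algorithm: the function names, the add-one smoothing, the 0.5 prior, and the "N of them" condition are all assumptions made for the example.

```python
def score_strings(family_samples, background_samples, prior=0.5):
    """Return {string: P(family | string present)} using Bayes' rule.

    Each sample is a set of strings. Add-one smoothing keeps the
    likelihoods away from 0 and 1 when counts are small.
    """
    candidates = set().union(*family_samples)
    scores = {}
    for s in candidates:
        p_family = (sum(s in x for x in family_samples) + 1) / (len(family_samples) + 2)
        p_background = (sum(s in x for x in background_samples) + 1) / (len(background_samples) + 2)
        evidence = p_family * prior + p_background * (1 - prior)
        scores[s] = p_family * prior / evidence
    return scores


def build_yara_rule(name, scores, top_n=3, min_matches=2):
    """Emit a Yara rule that fires when enough top-scoring strings are present."""
    ranked = sorted(scores, key=lambda s: (-scores[s], s))[:top_n]
    if not ranked:
        raise ValueError("no candidate strings to build a rule from")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, s in enumerate(ranked):
        # Escape backslashes and double quotes for a Yara text string.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    lines += [
        "    condition:",
        f"        {min(min_matches, len(ranked))} of them",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical data: strings seen in a malware family vs. a background corpus.
family = [{"evil_mutex", "kernel32.dll"}, {"evil_mutex", "kernel32.dll", "cfg.dat"}]
background = [{"kernel32.dll"}, {"kernel32.dll", "user32.dll"}]
scores = score_strings(family, background)
rule_text = build_yara_rule("example_family", scores)
```

Strings that are common everywhere, such as `kernel32.dll` in this toy data, score near the prior and rank below strings that are distinctive to the family.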
In our demonstration presentation at Black Hat Arsenal we will introduce Black Hat attendees to Cynomix.org, which will host a freely available version of our system. As part of our demonstration we will give detailed explanations of our platform’s visualizations and algorithms while also helping people to sign up to use the system in their own security operations work.
Giacomo Bergamo – Invincea Labs
Giacomo leads the Cynomix project and also supports the Cyber Genome DARPA program focused on automatically discovering and visualizing characteristics of and relationships between malicious software artifacts. Previously, Giacomo worked as a lead engineer and concept designer at various startups, founded a social entrepreneurship nonprofit, and performed research at RAND and other think tanks on topics ranging from optimization of battlefield intelligence gathering to creating unmanned vehicles capable of learning behaviors in simulated environments.
Thursday, August 7th, 2014
Mandalay Bay Ballroom – Breakers JK – Station 6