Background
This guide discusses methods for modeling cyber threat intelligence using the Cisco Threat Intelligence Model (CTIM). It also introduces best practices for client developers using the Cisco Threat Intelligence API (CTIA).
This document is managed and maintained by the Cisco XDR Threat Intelligence team. Use of this document for anything other than educational purposes is strictly prohibited.
Nothing comes from nothing, and CTIM didn't spring from a vacuum. To understand why CTIM and CTIA are the way they are, it helps to know a little about the business and technological context in which they arose.
What is Cyber Threat Intelligence?
First, let us be clear what we mean when we talk about Cyber Threat Intelligence. The Center for Internet Security provides this useful definition:
Cyber threat intelligence is what cyber threat information becomes once it has been collected, evaluated in the context of its source and reliability, and analyzed through rigorous and structured tradecraft techniques by those with substantive expertise and access to all-source information.
There are three processes taking place in sequence here:
- Collection
- Evaluation
- Analysis
It might help to think of each of these processes as part of a data refining pipeline. In the same way that an oil refinery converts crude petroleum into a variety of refined petroleum products, this data refining pipeline starts by gathering raw data (such as information about how computer software behaves in a variety of contexts). It then progressively refines that data by decomposing and structuring it, and enriches it by placing it in context with data from other origins, each with its own level of refinement. Refining the data in this way allows us to isolate the parts that help us make more informed decisions. The end result is what we call actionable intelligence.
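The refining pipeline described above can be sketched as three small functions, one per process. This is only a toy illustration: the `Observation` class, the `SOURCE_RELIABILITY` scores, the function names, and the sample data are all hypothetical and are not part of CTIM or CTIA.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One structured piece of raw data (hypothetical shape, not a CTIM entity)."""
    source: str
    value: str
    context: dict = field(default_factory=dict)


# Hypothetical reliability scores per data source, used only for this sketch.
SOURCE_RELIABILITY = {"sandbox": 0.9, "forum-post": 0.3}


def collect(raw_records):
    """Collection: structure raw (source, value) pairs into observations."""
    return [Observation(source=s, value=v) for s, v in raw_records]


def evaluate(observations, min_reliability=0.5):
    """Evaluation: keep observations from sufficiently reliable sources."""
    kept = []
    for obs in observations:
        reliability = SOURCE_RELIABILITY.get(obs.source, 0.0)
        if reliability >= min_reliability:
            obs.context["reliability"] = reliability
            kept.append(obs)
    return kept


def analyze(observations, known_bad):
    """Analysis: add outside context and keep only what supports a decision."""
    return [
        {"value": o.value, "reliability": o.context["reliability"], "action": "block"}
        for o in observations
        if o.value in known_bad
    ]


raw = [
    ("sandbox", "evil.example.com"),
    ("forum-post", "maybe-bad.example.org"),
    ("sandbox", "cdn.example.net"),
]
intel = analyze(evaluate(collect(raw)), known_bad={"evil.example.com"})
```

Each stage discards or enriches data, so what comes out the end is much smaller than what went in, and every remaining item carries the context needed to act on it.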
To understand where the Cisco Threat Intelligence Model comes from, we must consider the technology that immediately preceded it: Secure Malware Analytics (formerly Threat Grid).
Controlled Detonation
Without getting lost in the details, Secure Malware Analytics is a sandbox designed to safely perform controlled detonation of malware samples. This is how that works, in general:
- A sample is submitted. This could be any file, document, executable, image, email attachment, nearly anything that can be packaged into a file on a computer.
- Some virtual machines are created, and the file is "detonated" in them. Documents are opened, executables are run, and so on.
- Secure Malware Analytics carefully observes the system to see what happens to it when the sample is run. What does the sample do? What does it change? What memory does it access? What files does it create? What websites does it visit? Does it install software? What are its behaviors? All of these observations are recorded.
- After a while, the virtual machine is recycled and the observation data is stored, to be evaluated later by various automated systems as well as further inspected by human analysts.
Each sample can easily yield hundreds of megabytes of observation data. By the end of 2015, Secure Malware Analytics was processing upwards of 1,000,000 samples each month and generating hundreds of terabytes of raw cyber threat information in the process.
That's a lot of raw material. Certainly more than can easily be displayed in a web app, although Secure Malware Analytics does pretty wondrous things with all of its data. However, one of the things that the system could tell us with great confidence is whether or not a given sample was malicious.
And so, for reasons that are logical and entirely forgivable, customers started using Secure Malware Analytics as a verdict service. They wanted to know whether their email attachments were safe to open. All of them. This was not an ideal situation, because sandboxes aren't optimized for this use case. They're for performing a deeper inspection on samples.
We were generating and storing an awful lot of data, when most of the time what people really wanted was a simple yes or no.
But we knew they really needed more than that. People would need some context about why we thought a given sample was malicious, and what that meant for their organization. Most importantly, we knew that if evidence of a threat was detected on their network, they probably needed help deciding what to do about it.
The CTIA Project
And so, at the end of 2015 we started working on a prototype for a fast verdict lookup service. We called it the Cisco Threat Intelligence API (CTIA).
Building on the work of the OpenAPI Initiative, we wanted our CTIA prototype to be a robust REST API with executable documentation that was both human-readable and machine-readable, which would enable service discovery platforms later on. For this we chose a Swagger 2.0 implementation.
For our early data model, having learned from the complexity of storing threat intelligence in Secure Malware Analytics, we chose to borrow many concepts from the emerging Structured Threat Information Expression (STIX™), which defines itself as:
a structured language for describing cyber threat information so it can be shared, stored, and analyzed in a consistent manner.
However, we didn't focus on implementing an exchange for STIX data, such as a Trusted Automated eXchange of Intelligence Information (TAXII) service. STIX is a fine wire format, but our main emphasis was on rapid storage and retrieval to accelerate analysis and to ease incident response.
This is a very important distinction between the model that became CTIM and its roots in STIX. We weren't trying to build an exchange for threat intelligence indicators. Our raw data wasn't so much the patterns we were looking out for as the actual behavior that we could observe malware engaging in, right in front of us. We had petabytes of such data. It just needed to be refined and placed into an actionable context.
And so, development of our REST API began apace.
Extracting the Model
Once the CTIA prototype was built in the spring of 2016 and we started demonstrating what we could do with it, we realized that we needed to isolate the model from the API. That way we could build other tools on top of the model, and the first such tool was our Incident Response Orchestration Hub.
So, we spun off the Cisco Threat Intelligence Model (CTIM) project from CTIA, and over the next several years we continued to build those tools on top of our elegant model and lightning-fast API, both of which are open source.
This guide is written in the hope that it will enable developers and threat analysts to more easily model their cyber threat intelligence assets using the Cisco Threat Intelligence Model (CTIM) and Cisco Threat Intelligence API (CTIA).
This guide is nowhere near exhaustive. It focuses on the entity types most commonly encountered by threat analysts and tool developers.
Intended Audience
- This guide is aimed at CTIA client developers or threat analysts building tools to model their threat intelligence data in CTIM and store it in a CTIA server.
- API access via Cisco XDR is not required. Developers without access to Cisco XDR can still store intel in a local or on-premises deployment of CTIA, if they wish.
- All of the data in this guide is presented in JSON format, for simplicity.
Objectives
By the time you finish this guide, you should have learned the following:
- How to build common CTIM objects and package them into bundles.
- The advantages of using bundles:
  - Using external IDs to avoid unwanted entity duplication
  - Using transient IDs to reduce your volume of HTTP requests
- How to POST the resulting bundle to CTIA.
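As a preview of these objectives, the following is a minimal Python sketch of the overall shape: two entities and the relationship between them are packaged into one bundle, tied together with transient IDs, and wrapped in a single POST request. Treat everything specific here as an assumption to check against the CTIM schema and your CTIA deployment: the field names, the `transient:` ID format, the `sighting-of` relationship type, the `/ctia/bundle/import` path, the bearer-token header, and the placeholder host are illustrative. The entities are deliberately incomplete and omit fields that a real CTIA server would likely require. The helper names `transient_id`, `build_bundle`, and `make_import_request` are hypothetical.

```python
import json
import urllib.request
import uuid


def transient_id(entity_type):
    """Make a temporary ID for use inside one bundle (format is an assumption)."""
    return f"transient:{entity_type}-{uuid.uuid4()}"


def build_bundle(source):
    """Package an indicator, a sighting, and their relationship into one bundle."""
    indicator_id = transient_id("indicator")
    sighting_id = transient_id("sighting")

    indicator = {
        "type": "indicator",
        "id": indicator_id,
        # A stable external ID lets the server recognize a repeat submission
        # of the same entity instead of creating a duplicate.
        "external_ids": ["example-feed-indicator-0001"],
        "title": "Example indicator",
        "producer": source,
    }
    sighting = {
        "type": "sighting",
        "id": sighting_id,
        "external_ids": ["example-feed-sighting-0001"],
        "count": 1,
    }
    relationship = {
        "type": "relationship",
        "id": transient_id("relationship"),
        # Transient IDs let entities reference each other before the server
        # has assigned real IDs, so everything fits in one request.
        "source_ref": sighting_id,
        "target_ref": indicator_id,
        "relationship_type": "sighting-of",
    }
    return {
        "type": "bundle",
        "source": source,
        "indicators": [indicator],
        "sightings": [sighting],
        "relationships": [relationship],
    }


def make_import_request(bundle, base_url, token):
    """Prepare (but do not send) a POST of the bundle; path and auth are assumptions."""
    return urllib.request.Request(
        url=f"{base_url}/ctia/bundle/import",
        data=json.dumps(bundle).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


bundle = build_bundle("Example Feed")
request = make_import_request(bundle, "https://ctia.example.invalid", "TOKEN")
print(json.dumps(bundle, indent=2))
```

The request object is only constructed, never sent. Passing it to `urllib.request.urlopen` would perform the actual POST against a real server.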