Introducing Securonix Extended Data Science Suite: Modern Data-centric Architecture for Security Data Science, Detection Engineering, and Threat Hunting


By Oliver Rochford, Applied Research Director, Securonix

 

For most security users, out-of-the-box detections and analytics are key capabilities: they make it possible to stand up an effective security operations capability quickly and to stay ahead of an ever-evolving threat landscape. We have entire teams developing detections, playbooks, and dashboards for this purpose. But some organizations have requirements so unique that they need to tailor and develop models and analytics themselves. Similarly, to discover novel and evasive attacks, threat hunters increasingly need deeper access to operational security data than digital forensics and incident response (DFIR) typically requires, specifically the ability to conduct freeform exploratory threat analysis and to visualize and analyze data much as data scientists do. Detection engineering, too, is increasingly becoming a dedicated function in large security operations teams, especially in organizations targeted by advanced threats. Traditional security operations solutions do not cater to these power users, who instead must build and maintain disparate data architectures and pipelines using a bewildering array of components, languages, and tools.

This leads to a number of different problems and challenges:

 

Data and Threat Science

The current approach to developing new data science content for security relies on a separate technology stack for data pipelines. It demands excessive data wrangling, ETL, and data labeling before features can be extracted, a prototype developed, and models trained in an external set of data science tools. The prototype then needs to be reimplemented for production deployment on a SIEM or XDR. Data scientists end up spending most of their time getting their data into shape, and an end-to-end project requires several different specialists, from data engineers to developers.

 

Threat Hunting and Threat Research

Most SIEMs are not an ideal platform for threat hunting. While they support search and basic analytics operations well, they lack the features needed for rapid exploratory threat data analysis or for more sophisticated analytics such as mixed time series analysis. Analytics capabilities are included, but they are often canned and preset and cannot easily be extended. Additionally, the analysis and visualization methods used for incident response and threat detection are not the same as those required to identify novel and previously unknown threats. Finally, hunting is an organic process that organizations do not easily capture, so the knowledge of how a hunt was conducted, and with it the opportunity to learn from a successful hunt, is lost.

 

Detection Engineering

Delivering new and updated detection content has always been one of the most labor-intensive and time-sensitive challenges in cybersecurity. Threat intelligence must be monitored and reviewed, and IOCs and TTPs then extracted and codified, usually in a bewildering number of languages and syntaxes for different solutions, including regular expressions, Perl, and Python, as well as more recently YARA, SIGMA, Query DSL, KSQL, and Kestrel. Even if you don’t engineer detections yourself and rely instead on vendor-supplied detection content, your mileage will still vary based on the vendor’s technical debt and design decisions. To fully benefit from the new detection-as-code philosophy and make security operations more agile, detection engineering needs suitable tooling.
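To make the detection-as-code idea concrete, here is a minimal sketch in Python of a detection expressed as a versionable, testable object. Everything in it is an assumption made for illustration: the `Detection` class, the `process_name` and `command_line` event fields, and the rule itself are hypothetical and are not taken from XDS or any Securonix API.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout: this is an illustration of the
# detection-as-code pattern, not a real product interface.


@dataclass(frozen=True)
class Detection:
    rule_id: str
    description: str
    predicate: Callable[[dict], bool]  # returns True when an event matches

    def matches(self, event: dict) -> bool:
        return self.predicate(event)


# Matches the -enc / -EncodedCommand switch, a common way to pass an
# encoded payload to PowerShell (other abbreviations are not covered here).
_ENCODED_FLAG = re.compile(r"\s-(enc|encodedcommand)\b", re.IGNORECASE)

encoded_powershell = Detection(
    rule_id="demo-001",
    description="PowerShell launched with an encoded command",
    predicate=lambda e: (
        e.get("process_name", "").lower() == "powershell.exe"
        and bool(_ENCODED_FLAG.search(e.get("command_line", "")))
    ),
)


def run_detections(events, detections):
    """Return (rule_id, event) pairs for every detection that matches."""
    return [(d.rule_id, e) for e in events for d in detections if d.matches(e)]


# Made-up sample events used as a regression test for the rule.
sample_events = [
    {"process_name": "powershell.exe", "command_line": "powershell.exe -enc SQBFAFgA"},
    {"process_name": "powershell.exe", "command_line": "powershell.exe -File backup.ps1"},
    {"process_name": "cmd.exe", "command_line": "cmd.exe /c dir"},
]

hits = run_detections(sample_events, [encoded_powershell])
```

Because the rule is ordinary code, it can live in version control, be reviewed like any other change, and be checked against sample events before it is deployed.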

More jarringly, even though these three fields all utilize the same data and serve the same consumers, they lack a common data architecture or integrated collaborative workflow.

Figure 1: Threat Science, research, and engineering with disparate and dislocated tooling and workflows.

 

For each of these use cases, it is common to build separate data pipelines with repeated overheads for ETL and data transformation. That approach neither works well nor scales, whether you’re a mature and sophisticated end-user trying to develop your own analytics and ML models, running a critical SOC for an organization being targeted by advanced threats, or an MSSP developing detections for collective defense across all of your customers.

 

Enter Securonix Extended Data Science Suite

Having faced the same challenges, we’ve decided to go in a different direction at Securonix, one better aligned with our unified platform philosophy: XDS, our Extended Data Science Suite for threat science, detection engineering, and hunting.

Figure 2: Integrated threat science, research, and detection workflows via Securonix XDS.

 

XDS is a brand new and ground-breaking way for Securonix customers to work with their security data, providing the context necessary to bring sophisticated solutions to bear. As one of the first vendors to open up model parameters to users, Securonix has always been at the forefront of enabling data scientists. XDS allows users to go even deeper. 

Securonix XDS is the industry’s first cloud-native security data science and detection-as-code development environment. An enterprise cloud data science integrated development environment (IDE), XDS is a full-fledged workbench for security MLOps, detection engineering, and threat hunting.  

XDS is designed to make it easy for security data scientists and researchers, threat hunters, and detection engineers to develop, visualize, and debug security threat models, detections, and code-based threat hunting playbooks. With XDS, customers have the tools they need to engineer data and express their domain knowledge to tailor models to their own unique needs.

Figure 3: Collaborative threat data science, detection engineering, and hunting workflows with XDS.

 

XDS provides managed Jupyter notebooks for a variety of data engineering and development use cases using R, Python, and Scala, and supports EMR clusters based on PySpark for performance-optimized distributed big data processing, connected directly to the enriched data and third-party context contained within the Securonix Security Data Lake.

Today, organic processes such as hunting and exploratory data analysis are hard to capture. We have heard from our customers that they have difficulty documenting these processes since they involve highly creative and non-linear exploration. As a result, the knowledge of how a threat was initially discovered, or how a data scientist determined an important feature, is not transferable within the organization. And when the highly skilled personnel capable of performing these functions leave the company, this knowledge is lost. With notebooks, the creative process is documented right with the code, and can be used for training and knowledge transfer, as well as documenting the initial discovery of a breach and the process of incident response.

Notebooks are an important part of the solution since they provide the ability to document otherwise organic processes such as hunting and exploratory analysis. 

Another differentiating factor is that XDS is not focused solely on supervised machine learning, as many other currently available approaches are. Realistically, data science for security must draw on a broader array of statistical and machine learning methods to detect the full breadth of attack techniques, to improve reproducibility, and to address the fact that adversarial behavior is nonstationary: its probability distributions change with time. XDS aims to provide a more comprehensive set of capabilities to support mixed data science approaches, a topic we will discuss in more detail in a future post.
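As a small illustration of why nonstationarity matters, the following Python sketch compares a fixed baseline with a rolling one on synthetic data. The login counts are invented, the function name `rolling_zscores` is hypothetical, and a rolling z-score is only one simple way to adapt to drift; it is not presented as the method XDS uses.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscores(values, window):
    """Score each value against the mean and stdev of the preceding `window` values.

    Returns None where there is not yet a full window of history,
    or where the history has zero spread.
    """
    history = deque(maxlen=window)
    scores = []
    for v in values:
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            scores.append((v - mu) / sigma if sigma > 0 else None)
        else:
            scores.append(None)
        history.append(v)
    return scores


# Synthetic daily login counts (made up): the normal level shifts from ~10 to ~50.
counts = [10, 11, 9, 10, 12, 10, 50, 51, 49, 50, 52, 50, 51]

scores = rolling_zscores(counts, window=5)

# A static baseline learned once from the first five days never adapts,
# so it keeps scoring the new normal as extreme.
static_mu, static_sigma = mean(counts[:5]), stdev(counts[:5])
static_last = (counts[-1] - static_mu) / static_sigma
```

In this toy series the rolling score spikes on the day the level shifts and then settles once the window has caught up, while the static score for the final day stays very high. The trade-off is that a rolling baseline can also absorb a slow, sustained attack, which is one reason a single method is rarely enough.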

Backed by Securonix Threat Labs, a team of highly skilled threat researchers, detection engineers, and data scientists, we have built, and will continue to add to, a set of sophisticated notebooks that are integrated into the platform. These XDS notebooks provide tools to accelerate the variety of functions a security operations team needs, including time series modeling, peer group analytics, hunting playbooks, interactive recommender systems to guide hunting and incident response, and many others.
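To give a flavor of what peer group analytics can look like in a notebook, here is a minimal Python sketch that scores each user against the median of their peer group using the modified z-score (based on the median absolute deviation). The user names, group names, and upload volumes are invented, and `peer_outliers` is a hypothetical helper; this is a generic statistical technique, not a description of the notebooks shipped with XDS.

```python
from statistics import median


def peer_outliers(activity, peer_groups, threshold=3.5):
    """Flag users whose activity deviates strongly from their peer group's median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD,
    where MAD is the median absolute deviation within the group.
    """
    flagged = {}
    for group, members in peer_groups.items():
        values = [activity[u] for u in members]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this group; the sketch cannot score it
        for user in members:
            score = 0.6745 * (activity[user] - med) / mad
            if abs(score) > threshold:
                flagged[user] = round(score, 2)
    return flagged


# Made-up daily upload volumes in MB per user.
activity = {
    "alice": 120, "bob": 135, "carol": 110, "dave": 2400, "erin": 128,
    "frank": 900, "grace": 950, "heidi": 870,
}

# Made-up peer groups; in practice these might come from HR or directory data.
peer_groups = {
    "finance": ["alice", "bob", "carol", "dave", "erin"],
    "backup_admins": ["frank", "grace", "heidi"],
}

flagged = peer_outliers(activity, peer_groups)
```

The point of grouping is context: an upload volume of around 900 MB is unremarkable for the backup administrators here, but the same volume would stand out in the finance group, and a single global threshold would miss that distinction.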

XDS provides you with unfettered and highly scalable direct access to all of the security data that you collect with Securonix Next-Gen SIEM, UEBA, and Open XDR, and we will also soon be adding the ability to publish your models and detections back to the platform and into production. 

 

What’s Next? 

As one of the first vendors to open up model parameters to users for fine-tuning and optimization, Securonix has always been at the forefront of enabling data science and detection engineering. We have many users who are customizing their own threat chains and models, and some are even creating brand new ones using the tool available in our flagship Next-Gen SIEM. Stay tuned for exciting updates from the Data Science team that can be immediately helpful to you.