Curtains … Open!
Truesec is increasingly asked by current and prospective customers how we manage our “sources” for our managed dark web monitoring solution. Basically, they’re asking how they can trust the service to do what it’s supposed to.
Knowing how to trust a service is clearly an important concern. This trust, however, cannot be based solely on our access to data, or in other words, our sources. To answer the question of “trust” properly, we need to widen the scope of this discussion to the broader problem area surrounding dark web monitoring. You see, this is not primarily a problem of data access but rather a data sense-making problem.
In this article, I’ll aim to address these questions directly and thoroughly explain our reasoning and processes concerning dark web monitoring.
For Most Organizations, Data Access Is Not the Problem
Let’s begin by stating that dark web monitoring and all that it entails is a rather complex field. The very nature of dark web monitoring makes it challenging to provide exact numbers or guarantees about quality, truthfulness, etc. After all, it’s an intelligence topic, and you can never be sure in intelligence, only more or less so.
In essence, and to a certain extent, data could be considered a commodity. In our opinion, the challenge has more to do with deriving relevant, actionable, and meaningful insights and conclusions from all available data. Deriving such insights will come down to experience, knowledge, and know-how.
That’s not to say that data collection is easy; far from it. It’s a tedious process that requires deliberate and careful consideration every step of the way. There are, of course, differences between one data collector and another; they’ll have different collection strategies and priorities, but ultimately, they’re collecting from the same finite number of sources.
As a managed security service provider (MSSP), however, we don’t differentiate (entirely) on our access to data, but on how we make sense of that data when viewed from the larger context and perspective of not a single customer, but a collective of many. In other words, we don’t attempt to differentiate on data but on insight: how we transform all this data into relevant and actionable intelligence for you as a customer.
A Brief History of Truesec (and Why You Should Care!)
Truesec, founded in Sweden in 2006, has always been purely a cybersecurity company, built and led by experts. Since then, it has grown considerably and is still very much an expert-led organization, offering managed and professional services. As of late 2024, we employ more than 350 people.
This foundation of being an expert-led company has meant that our managed services have been built on a core principle: they should focus on what matters – expert knowledge and experience. This core, overarching principle leads to a different perspective on cybersecurity.
We focus on cyber defensive actions that matter, based on our real-world experience. That experience comes from responding to incidents (Truesec has the largest incident response team in Northern Europe) and from investigating and analyzing alerts in various security products (we are THE managed detection and response company in Sweden, and increasingly outside of it!).
With all this said, there’s a long-standing and very strong culture of being the best at what we do. We have the highest concentration of Microsoft MVPs; we host one of the largest cybersecurity summits, staffed by our own experts; we have world-renowned security researchers, and so on.
Again, all of this is to say that we know what we’re talking about when it comes to cybersecurity. Now, let’s explore how all of this affects our dark web monitoring capabilities.
Overarching Principles
We know that not all sources are created equal. We also know that the Pareto Principle broadly applies to dark web sources. We can confidently state that 80% of all activity (probably higher) stems from 20% of all sources (probably less).
This leads us to the first principle:
- Quality over quantity, always.
Our experience has told us (repeatedly) that this is not a numbers game in the sense that those who can scrape and inventory the most will also be the best. Our experience tells us that what matters is the quality of the sources and how we derive and extract meaning from them. I will return to this later.
This takes us to the second principle:
- There is no complete coverage, only confident coverage.
By that, we mean no vendor will ever have complete coverage or be able to guarantee the most complete data. In this sense, what matters is that findings are not false positives, that they are actionable, and that they lead to meaningful decision-making about your cybersecurity posture.
Confident coverage means precisely that: our processes, knowledge, experience, experts, and analysis give us reasonable grounds to assume we have sufficient source coverage. We maintain adequate coverage and strike an appropriate balance between being cost-effective and being complete.
Lastly, our third principle:
- Silence is good; transparent silence is better.
Contrary to what can sometimes be gleaned from discussions about threat intelligence, dark web monitoring, etc., this is mostly a quiet place. You shouldn’t expect results to come in every day or even every week. Vendors often inundate you with information to make it seem that their service provides value.
We don’t do that. We provide signals, and as we all know, a signal should come with little to no noise. Again, this circles back to our core principle of focusing on what matters.
This, however, has its own set of challenges. If it’s silent all the time, is it working or not, and is it worth it? How do you know? For some, it may NOT be worth it. But there’s something to learn here; you need to get comfortable with silence.
However, what we can do is explain what happened during the silence. Where were you NOT found? Is activity increasing or decreasing across relevant and current sources?
All right, one overarching principle and three core principles. Let’s keep digging! But before we do, we need to understand what we’re trying to achieve in the first place. What’s the outcome we’re trying to give you as a customer?
Why We Collect in the First Place
To talk about how and from where we collect, it’s important to begin with WHY we collect. Our managed dark web monitoring service is geared toward a few base types of reportable events (a small modeling sketch follows the list):
- Leaked Credentials
- Leaked Data
- Brand Mentions
- Keyword Mentions
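To make these four categories concrete, here is a minimal sketch of how such reportable events could be modeled. This is an illustration on my part, not our actual schema; the enum values and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class FindingType(Enum):
    """The four base reportable event types (names are illustrative)."""
    LEAKED_CREDENTIALS = "leaked_credentials"
    LEAKED_DATA = "leaked_data"
    BRAND_MENTION = "brand_mention"
    KEYWORD_MENTION = "keyword_mention"


@dataclass
class Finding:
    """One reportable event, tied back to where it was observed."""
    finding_type: FindingType
    source: str               # e.g. a forum, leak site, or Telegram channel
    observed_at: datetime
    summary: str
    actionable: bool = False  # set after analyst review, not by the collector
```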
The first two are somewhat self-explanatory, but the other two are perhaps less self-evident. Brand mentions are, well, mentions of your brand. But not all mentions are interesting – again, focusing on what matters. A mention of a brand should be relevant and actionable from a defensive perspective.
The most common categories of sources dealing with brand mentions that matter are forums, ransomware and leak sites, and Telegram channels (for now). You’ll find that even when considered together, there will not be a bazillion sources to monitor, but rather a few thousand. And of these, not all will be as relevant and prominent as others.
But where we collect from is ultimately a question of where the criminals are and how they peddle their wares. If you know and follow that, you’ll also find potentially relevant information.
Quality Control of Sources
A number of controls go into finding new sources, adding them, validating them over time, and extracting data from them; a sketch of how such a pipeline might look follows the list below.
- Continuous Discovery – New sources are continuously discovered by crawling and parsing previously collected data, as well as through manual discovery.
- Prioritization – Sources are prioritized based on relevance, frequency of updates, and potential impact on results.
- Validation – The quality of the source is assessed: is it useful to the collection, and does it keep providing updates?
- Categorization – What type of source is it? A new marketplace, forum, leakware blog?
- Cross-Referencing – Is the new source just another address for an already existing source?
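As a rough sketch of how these controls could fit together, consider the following; the fields, weights, and scoring are assumptions made for illustration, not our production tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    addresses: set[str] = field(default_factory=set)  # known mirrors / URLs
    category: str = "unknown"      # forum, marketplace, leakware blog, ...
    relevance: float = 0.0         # 0..1, analyst-assigned
    updates_per_week: float = 0.0  # observed update frequency
    validated: bool = False        # passed quality validation


def is_duplicate(candidate: "Source", known: list["Source"]) -> bool:
    """Cross-referencing: is this just another address for an existing source?"""
    return any(candidate.addresses & existing.addresses for existing in known)


def priority_score(src: "Source") -> float:
    """Prioritization: weigh relevance and update cadence; halve unvalidated sources.

    The weights are made up for the example and carry no special meaning.
    """
    score = 0.7 * src.relevance + 0.3 * min(src.updates_per_week / 7.0, 1.0)
    return score if src.validated else score * 0.5
```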
As you can see, considerable work is involved in managing sources and collecting from new ones. That brings us to partnerships, the primary way through which we collect data.
Data Collection Through Partners
We don’t collect most data ourselves anymore; instead, we collect it through partners. Just like we don’t build our own endpoint detection and response (EDR) tools, we don’t build our own dark web spiders and scraper tools. We build processing, analysis, and reporting tools.
We do, however, maintain access to certain forums and marketplaces because, every so often, we must manually verify a reported finding or require additional contextual information about an event necessary to make an analytical judgment about a particular finding.
But again, our job, our expertise, lies not in the collection itself but in deciding what to collect and in interpreting and analyzing the collected data. While we have historically built our own collection machinery for scraping and parsing data, we know how difficult that is to do well.
Most importantly, we know what type of data we want collected and from which categories. So, we’ve made sure that our data partners can provide us with that. We also trust them to keep up to date with new forums, ransomware “blogs,” and Telegram channels (for example). But it’s a partnership, and every so often, we come across new sources and leaks that we ask them to index and add to their collection tooling.
Over time and with experience, however, you also learn which forums, marketplaces, etc., you must collect from. For example, you’re not covering the most critical criminal forums if you’re not collecting from BreachForums, Exploit, RAMP, and XSS.
In the end, we’ve realized that finding and evaluating new sources is hard work and something we ultimately do better through partners. We know how to process and analyze data to find what’s relevant and what matters.
Analysis and Processing
This is arguably where we spend most of our effort and time. By this stage, we’ve crafted relevant data queries that should extract what we require on behalf of our customers. Now, it’s time to figure out what’s relevant and what’s not.
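To give a feel for what a tuned data query can look like, here is a simplified, hypothetical brand-mention filter that suppresses obvious noise; the brand terms and exclusion patterns are made up for illustration and are not taken from any real query.

```python
import re

# Hypothetical brand terms and noise patterns; real queries are far richer.
BRAND_TERMS = re.compile(r"\b(examplecorp|example-corp)\b", re.IGNORECASE)
NOISE_PATTERNS = re.compile(r"job\s+posting|press\s+release|stock\s+ticker", re.IGNORECASE)


def is_relevant_mention(post_text: str) -> bool:
    """Surface posts that mention the brand and don't match known noise."""
    return bool(BRAND_TERMS.search(post_text)) and not NOISE_PATTERNS.search(post_text)


posts = [
    "selling vpn access to ExampleCorp internal network",  # relevant
    "ExampleCorp press release: quarterly results",        # noise
]
hits = [p for p in posts if is_relevant_mention(p)]  # only the first post remains
```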
Most of the time, if queries have been tuned and defined appropriately, they should be fairly quiet. This is one area where expectations are often unjustified: many companies seem to be under the impression that there’s a ton of data about them out there and that threat actors are continuously discussing their targets and victims.
This is typically not the case. More commonly, it’s silence – nothing to see here! And yet, that silence is exactly what you’re paying for. Let’s use an analogy.
If you’re contracting a physical security company to perform surveillance and monitoring of your premises, you’re not expecting there to be burglars every single day, at all hours. In fact, you’re probably not expecting there to be any at all. Yet, some of us value the peace of mind we get when someone keeps an eye out continuously.
The same thing applies here: most dark web monitoring is, and should be, a low-frequency system, but when and if something is found, you definitely want to know about it. Leaked credentials, for instance, are a category that may seem as if it should become less relevant as we introduce more authentication factors and secure mechanisms.
But the devil’s in the details. Leaked credential monitoring may become even more important as a “hit” may indicate a leaked session token/cookie, nullifying your current multi-factor authentication steps to a certain extent. There are a few ifs and buts here, but it generally holds true. Until we actually get mechanisms to protect session cookies appropriately, leaked credential monitoring will remain relevant.
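As a minimal sketch of why a credential “hit” that includes session cookies deserves a harsher triage than a bare password, consider the following; the data model and triage labels are hypothetical assumptions, not our actual process.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialFinding:
    username: str
    has_password: bool
    session_cookies: list[str] = field(default_factory=list)  # e.g. from a stealer log


def triage(finding: CredentialFinding) -> str:
    """A leaked session cookie can sidestep MFA, so it outranks a bare password."""
    if finding.session_cookies:
        return "high: possible live session, MFA may not help; revoke sessions now"
    if finding.has_password:
        return "medium: force a password reset and review MFA coverage"
    return "low: monitor"
```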
In the end, analysis is about determining what’s relevant and what’s not, and that’s what we do and provide for you.
Reporting and Data Metrics
Most of the time, we find nothing because there’s nothing to find. If you’ve read this far, you probably intuitively understand that finding nothing still takes resources and effort. In times of silence, we can still turn to raw numbers to appreciate what’s happening, even though it doesn’t directly affect the organization; a sketch of how such metrics might be computed follows the list below.
- How many “findings” were generated during a specific timeframe and consequently processed and analyzed by analysts?
- What query terms generated the most findings?
- How many new data leaks/breaches, or attacks against named entities, have been discovered during the reporting period?
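Here is a sketch of how such period metrics might be derived from a batch of findings; the dictionary keys are assumptions for illustration, not a real product schema.

```python
from collections import Counter
from datetime import datetime


def period_metrics(findings: list[dict], start: datetime, end: datetime) -> dict:
    """Summarize a reporting period from raw findings.

    Each finding is assumed to carry 'observed_at' (datetime), 'query_term' (str),
    and optionally 'is_new_breach' (bool); these keys are illustrative only.
    """
    in_period = [f for f in findings if start <= f["observed_at"] < end]
    by_term = Counter(f["query_term"] for f in in_period)
    return {
        "total_findings": len(in_period),
        "top_query_terms": by_term.most_common(5),
        "new_breaches": sum(1 for f in in_period if f.get("is_new_breach")),
    }
```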
Finally, Continuous Improvement
Monitoring solutions are only as good as they are current and relevant. If we fail to predict or otherwise anticipate cybercriminal behavior as it relates to where they “congregate” to advertise, sell, and buy services, we’ll fail to provide customers with tangible value.
This necessitates that we continuously improve how we evaluate the quality of our current data providers and align customer needs with our collection, processing, and analysis.
Ultimately, trying to define the length of an unspecified piece of rope is a losing game. To a certain extent, what matters most is that you do something about this usually rather dark and mysterious place. Doing nothing can be justified, but most of the time, there’s something to find and something to do about it.
If you’d like to see what we do, how we help you, and what you can do about it, please don’t hesitate to reach out and ask us really hard questions. We might be able to answer; if not, we’ll tell you!