Introducing Enhanced Forum Structuring: An Interview With Our Product and Client Engagement Teams

March 13, 2023

In honor of the launch of our newest product feature, our marketing team sat down with DarkOwl’s Director of Client Engagement, Caryn Farino, and Product Manager, Josh Berman, to learn more.


Thanks for sitting down with me today! Let’s start with some intros.

Josh: I’m Josh Berman. I’m a Product Manager here at DarkOwl. I’ve been with the company a little over five years – five and a half years. My background prior to this was in digital forensics, and before that, audio engineering. More recently, I got into cybersecurity and started here as a Product Engineer, then moved into product management, where I’ve been for a couple of years.

Caryn: My name is Caryn Farino. I’m the Director of Client Engagement here at DarkOwl and have been with the organization for just over 2.5 years. I currently manage all of our client relationships. My background is in OSINT, so I am really excited about a lot of the work that DarkOwl does to highlight darknet specific activity.  

Let’s dive into our first question. What are we talking about when we talk about “forums” and “forum structuring”?  

Josh: The old way of doing things was that we would collect a webpage, scrape all the text out, and give that to our clients. The advantage was that it was simpler from a development point of view and allowed us to really focus on the depth and breadth of our data. It was the first step in all of this. From a user perspective, though, that makes it difficult to understand what you’re looking at. There’s a lot of text on a forum page, a marketplace page, or a ransomware page, pretty much any page you’re looking at, that is not relevant to what you’re actually looking for. So something like following a forum thread in a document that’s a wall of text is very difficult. Not a lot of fun.

Forum structuring basically strips out the parts of the page that are irrelevant. We take what matters, so the actual thread, usernames, post dates, things like that, and structure it in our data store in a way that is easier to interpret and interact with. People can do things like sort and filter by post date rather than just by when we found it, see other activity by that user, see specifically what they posted, search within a post and not just across the entire page, etc. It’s a big advantage in terms of how we’re presenting the data, how users interact with it, and how they can understand it.

Caryn: I would just add on: forums by design are discussion boards. They allow users to create topics and engage in conversations. Because there’s a lot of consistency in that layout, we want to try to replicate that experience for our users. With this revamp of our forum data, we’re now allowing our clients to navigate our data like they would on a forum: to look at those individual posts, reconstruct the thread, and look at what other activity might be associated with that user on that board.

Figures 1 and 2 (left to right): Previous view of a thread versus new enhanced view

Why is having access to this data important in the first place?  

Caryn: There are a lot of different types of darknet forums, so we’re going to have a variety of different use cases for our clients. Some of the more prominent boards are going to have data leaks, and others are highly technical communities talking about and engaging in hacking and exploit development. We’ll also see traditional fraud use cases – threat actors focusing on banking fraud, healthcare fraud, identity theft, and so on. There’s just a lot of different activity going on in these forums. We really want to be able to expose all of this for our clients to make sure that they understand what these threats are and what information is being put out there, so that they can feed it into their threat model frameworks and cyber risk programs.

Josh: I don’t think I can say it much better than that. Criminal activity happens on these forums, and it’s important not just for law enforcement to be able to see it; cybersecurity companies looking after their own security need to be able to see this information as well. It’s important for them to see what’s going on in these forums, what people are talking about, and what threat actors are targeting, especially if it is their own business, their employees, or their clients.

What enhancements have been made on the backend to our forum processing?

Josh: Basically, we are treating forum threads post by post rather than page by page. Page by page, like I said, makes it difficult to really track what’s going on. We used to treat the entire page as one blob of text, whereas now we’re treating it post by post so we can extract things like the usernames, the post dates, the post body, things like that. This makes it easier to search within a post and easier to reconstruct that thread in chronological order – to interpret what’s actually going on, rather than looking at an entire page and trying to figure out what it’s related to.
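
To make the post-by-post idea concrete, here is a minimal Python sketch. It is not DarkOwl's implementation: the `ForumPost` fields and the `split_page` helper are hypothetical names chosen for illustration, and the sketch assumes the username, post date, and body have already been pulled out of the page markup.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ForumPost:
    """One structured post (hypothetical schema, for illustration only)."""
    thread_id: str
    page: int
    username: str
    posted_at: datetime   # when the author posted it
    crawled_at: datetime  # when the crawler collected the page
    body: str


def split_page(thread_id, page, crawled_at, raw_posts):
    """Turn one crawled page into per-post records.

    raw_posts is assumed to be a list of (username, posted_at, body)
    tuples already extracted from the page.
    """
    return [
        ForumPost(thread_id, page, username, posted_at, crawled_at, body)
        for username, posted_at, body in raw_posts
    ]


page_posts = split_page(
    "thread-1", 3, datetime(2023, 3, 1),
    [("alice", datetime(2023, 2, 10, 9, 0), "first reply"),
     ("bob", datetime(2023, 2, 10, 9, 30), "second reply")],
)
```

Each record carries both a post date and a crawl date, so searching, sorting, and filtering can work on individual posts instead of on the whole page.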

Caryn: I’ll just highlight that because of the work our product and engineering teams have done, the presentation layer within the user interface is now a much more streamlined experience, so our users can navigate all of that data more easily. This is also mirrored for our API clients, giving them the same opportunity to search and present forum data without complex queries.

Why did the team focus on these improvements? 

Caryn: In working with our clients over the years, we’ve gotten a lot of feedback surrounding document post dates. So, with these improvements, we’ve added dual date capabilities: clients can see not only when we crawled that data, but also when the data was posted by these forum actors. That really allows clients to dive into more specific timelines when they find information of concern.

What are some of the new features that you both are most excited about?  

Josh: For me, it’s the thread reconstruction. So back to what I said earlier about page by page – there’s really no way to link one page to another. So, a site, a forum on the darknet, might have ten pages in a thread and you might stumble upon page three. Well, how do you find page one, page seven, etc.? There was not really a good way to do that without our thread reconstruction. We’ve now taken care of all of that for you. So regardless of what page it was posted on, if it’s part of the same thread, we can reconstruct that in chronological order. So that’s definitely a feature I’m most excited about.  
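
As a rough illustration of thread reconstruction, the sketch below groups posts by thread and sorts each thread by post date, regardless of which page a post was crawled from. The dictionary keys (`thread_id`, `page`, `username`, `posted_at`, `body`) and the function name are assumptions made for this example, not DarkOwl's schema or method.

```python
from collections import defaultdict
from datetime import datetime


def reconstruct_threads(posts):
    """Group posts by thread and order each thread by post date."""
    threads = defaultdict(list)
    for post in posts:
        threads[post["thread_id"]].append(post)
    return {
        thread_id: sorted(items, key=lambda p: p["posted_at"])
        for thread_id, items in threads.items()
    }


# Posts collected out of order: page 3 was crawled before page 1.
crawled = [
    {"thread_id": "t1", "page": 3, "username": "carol",
     "posted_at": datetime(2023, 1, 9), "body": "latest reply"},
    {"thread_id": "t1", "page": 1, "username": "alice",
     "posted_at": datetime(2023, 1, 2), "body": "original post"},
    {"thread_id": "t2", "page": 1, "username": "dave",
     "posted_at": datetime(2023, 1, 5), "body": "unrelated thread"},
    {"thread_id": "t1", "page": 2, "username": "bob",
     "posted_at": datetime(2023, 1, 4), "body": "first reply"},
]

threads = reconstruct_threads(crawled)
ordered_bodies = [p["body"] for p in threads["t1"]]
```

Stumbling on page three first no longer matters here, because the ordering comes from each post's own date rather than from the page it sat on.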

Caryn: I would say our DarkOwl clients are going to be most excited about that feature as well – how simple it is to navigate and reconstruct all of the information that was part of a specific discussion or thread. As an analyst, I’m personally excited about the ability to pivot and look at what else the user has said on that forum. I think that’s an extremely valuable add-on: you can look not only at the posts and threads themselves but also at what other activity that individual is involved in. We’re also extracting all of the usernames that are within the thread itself. That allows more social network analysis on threat actors communicating on the thread or on a specific topic.

Josh: The other thing I was going to mention was the post-date sorting and filtering. People don’t generally care as much about when we found something, they care when it was actually posted. So maybe we found something yesterday that was posted five years ago. Not really a big deal, but these improvements allow people to show things that were actually posted for the first time within a certain time period. So whatever time period they’re interested in, they can filter to that range. They can sort by post-date to see the most recent stuff first. So it makes it a lot easier to get fresh and relevant data.  
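
The difference between crawl date and post date can be sketched in a few lines of Python. This is only an illustration under assumed field names (`posted_at`, `crawled_at`) and a hypothetical helper `posted_between`; it does not reflect the actual product or API.

```python
from datetime import datetime


def posted_between(posts, start, end):
    """Return posts with start <= posted_at < end, newest first."""
    hits = [p for p in posts if start <= p["posted_at"] < end]
    return sorted(hits, key=lambda p: p["posted_at"], reverse=True)


posts = [
    # Found recently, but originally posted years earlier.
    {"body": "old leak", "posted_at": datetime(2018, 3, 1),
     "crawled_at": datetime(2023, 3, 12)},
    {"body": "new listing", "posted_at": datetime(2023, 3, 10),
     "crawled_at": datetime(2023, 3, 12)},
    {"body": "recent question", "posted_at": datetime(2023, 3, 5),
     "crawled_at": datetime(2023, 3, 6)},
]

recent = posted_between(posts, datetime(2023, 3, 1), datetime(2023, 4, 1))
recent_bodies = [p["body"] for p in recent]
```

Filtering on `crawled_at` would have kept the five-year-old post in the results; filtering on `posted_at` drops it and puts the most recent post first.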

Any other thoughts on how you both see current clients utilizing this?  

Caryn: I want to start by saying that within the last few days, we’ve gotten an overwhelmingly positive response from our clients on these new features. Structured data is just easier to work with overall. But I think the biggest benefit is that by breaking out these forum posts into individual documents, we’re going to offer our clients a more concise result set where they can guarantee that their keywords are going to appear in that post, as opposed to scattered across the thread. That’s going to save analysts time sifting through potentially non-relevant results to find the actual data they care about. And then further, with the addition of the forum usernames to our existing user search feature, clients can now look at what else those threat actors are posting, leading to a more robust dataset to work with. So if you find your keywords in a post, you can quickly create a repository of other activity by that actor. For example, say a threat actor is discussing which organizations are vulnerable to a certain CVE, and that triggers your alert. That same user later posts on another forum about domain admin or local admin access for sale, but doesn’t name the organization (only its location or industry). You can now use that information to support a connection, where historically you wouldn’t have been able to tie those two results together by keyword alone.
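
The two ideas in this answer, matching keywords within a single post and then pivoting on the author, can be sketched as follows. The data, the organization name "ExampleCorp", the usernames, and both helper functions are made up for illustration; this is not DarkOwl's search implementation.

```python
def posts_matching_all(posts, keywords):
    """Posts whose own body contains every keyword (case-insensitive)."""
    terms = [k.lower() for k in keywords]
    return [p for p in posts if all(t in p["body"].lower() for t in terms)]


def activity_by_user(posts, username):
    """Every post by one username, for pivoting from a single hit."""
    return [p for p in posts if p["username"] == username]


posts = [
    {"forum": "forum-a", "username": "actor1",
     "body": "Which orgs are still exposed to this CVE?"},
    {"forum": "forum-a", "username": "actor2",
     "body": "ExampleCorp has not patched the CVE yet."},
    {"forum": "forum-b", "username": "actor2",
     "body": "Selling domain admin access, manufacturing sector."},
]

# Only the post that contains both terms matches, not the whole thread.
hits = posts_matching_all(posts, ["examplecorp", "cve"])

# Pivot: everything else the matching author has posted.
pivot = activity_by_user(posts, hits[0]["username"])
```

The second post by `actor2` never mentions the organization, so a keyword search alone would miss it; the username pivot is what surfaces it.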

