[Webinar Transcription] Why Social Media and Darknet Data Go Hand-in-Hand for Robust Cyber Investigations

July 18, 2023

In today’s world, the internet is an integral part of everyone’s personal life, and even more so of every organization. Over the past several years, social media platforms have come to play a big part in an organization’s strategy and digital footprint as people connect, share information, and express themselves. In addition, the darknet and darknet-adjacent platforms, characterized by anonymity and illicit activity, have grown in popularity.

In this webinar, DarkOwl CEO Mark Turnage and Socialgist CRO Justin Wyman explore how the two interconnect and dive into the following topics:

Data collection and enhanced insights
Online identities and connections
Social engineering and phishing attacks
Reputational risk
Ethical concerns and legal challenges

For those that would rather read the presentation, we have transcribed it below.

NOTE: Some content has been edited for length and clarity.

Kathy: I would like now to introduce Justin Wyman, the CRO for Socialgist, and Mark Turnage, the CEO for DarkOwl. I’m going to turn it over to them to do some introductions and introduce their companies and then we’ll get started. Justin.

Justin: Thank you, Kathy. I really appreciate you putting this together. And thank you, Mark, for joining me in this webinar.

I feel like our two companies are different sides of the same coin, in the sense that we both scour the internet looking for online conversation. Socialgist specializes in what I would call public conversation: people talking on blogs, message boards, forums, and social networks about everything under the sun, including brands, political issues, etcetera. We’ve been doing this since 2001. Our goal is to take all the information on the left, package it in that blue box in the middle, and then distribute it to analytics platforms on the right.

We call this DaaS, or data as a service. Our core value is providing high-quality global datasets of the world’s online conversations. Our key strengths are important for this webinar. Our coverage is very broad: 30-plus languages. We provide a lot of context, which means history. And we really focus on high-quality, low-spam data collection. A lot of this work is looking for a needle in a haystack, and if you don’t have accurate data, you’ll get a lot of false needles.

This is just a sample of our data sources. The thing to understand is that there are many different parts of the internet to potentially mine for insights. There are blogs. There is journalism and news, which is where you watch things spread from social media to online media. Videos, like YouTube, are obviously important. Forums, or threaded conversations, are where you see really hobbyist conversations. And then there are review sites, where in this example people may be trying to phish, or looking for or selling competing products, and social networks, such as Parler, Truth Social, and those types of things.

Kathy: Mark, would you like to introduce DarkOwl?

Mark: Thank you. It’s a delight to be here. Thanks for hosting, Kathy, and thanks to Justin; we’ve been looking forward to this webinar. As Justin said, we’re two sides of the same coin. In fact, if you took the presentation Justin just gave and substituted darknet data for all the data sources that Socialgist collects, you would get to DarkOwl. We have been collecting data for well over a decade, and we supply that data to our customers. Just for the sake of clarity, we specialize only in the darknet and in related deep web and surface web sites that repost data from the darknet. We supply that data through our Vision UI or through a range of APIs and data feeds.

This gives you a sense of what we’re talking about. At the bottom of that slide is the traditional definition of a darknet: a network that usually requires a specialized browser to access. Once you are in those darknets, your user identity is obfuscated, and oftentimes the traffic is encrypted. The darknet traditionally traces its beginnings back to the Tor network, but as you can see, a range of other darknets have arisen for a variety of reasons. For example, the third one, called ZeroNet, is very popular in China. It’s a blockchain-based darknet, so the conversations that occur on ZeroNet are actually distributed around a blockchain, and in order to collect data from ZeroNet, you have to continually crawl the entire blockchain to recreate a single conversation.

Perhaps unsurprisingly, darknets are popular among criminal groups because of user obfuscation, and with the rise of cryptocurrency, a relatively anonymous currency, they are the perfect place to commit crimes. We also collect from the deep web and the surface web, but we generally don’t collect from social media or from the sites that Socialgist collects from, which makes us ideal partners. We collect from authenticated websites, the deep web, and some high-risk surface sites. Direct messaging platforms such as Discord, Telegram, and IRC are new platforms for us, and increasingly a lot of criminal activity is moving to these direct messaging platforms. I think the topic we want to discuss here today is how the data that DarkOwl collects fits into a cyber investigation and an analytical context, and how it fits with what Socialgist is doing.

Just very briefly, this gives you a sense of the volume of data coming out of the darknet that we collect on a daily, weekly, and monthly basis, and some of the types of data that we collect.

Kathy: Thank you both for introducing your companies. Today we would like to start off with the first talking point: data collection.

Data Collection

Socialgist specializes in collecting and aggregating social media data while DarkOwl focuses on collecting dark web data. Can you both talk to how these two are connected?

Justin: I will start. I think what’s important to understand is that in an increasingly interconnected world, you have what’s called a butterfly effect, where small things can snowball very quickly. If you compare the sources Mark presented on his slide with mine, you can see a very interconnected world. What Mark and I have spoken about a lot in our partnership is how often things that are damaging to brands, or that become the subject of cyber investigations, start in the dark. That’s where they get organized. But you cannot tell whether something is going to have an impact, especially when the criminals’ intention is to attack perception or brand awareness, until it bubbles up into the public web. So it’s smart to cast a wide net across dark and public sources to see which issues are emerging in the darknet and then moving into the public web and traditional news. Would you agree with that, Mark?

Mark: I absolutely agree with that. In fact, we find that threat actors regularly use social media to bubble up threats, data they’ve stolen, and other information to the surface web. We also find, by the way, that most threat actors are very active on social media, whether personally or in their professional, quote unquote, capacity as criminals, and that helps in identifying them. It’s very easy to pivot back and forth between the two in trying to identify who they are. Oftentimes we are able to identify them by virtue of their use of social media and the commonality between what they’re doing on social media and what they’re doing in the darknet. I have some examples we can talk about later on.

Data Insights

What kind of data can come from social media that helps investigators or threat intelligence teams? And what about from the darknet?

Justin: Anyone in crisis communications or PR will have two principles: how fast do you get the insight, and how accurate is the insight? Social media addresses the accuracy component and really helps you understand what’s happening; another word for accuracy here might be validity. When you see issues that are very important to you bubbling up in social media, you know they have momentum. That snowball is building; that’s the butterfly effect. So when I think about how the darknet and the social web work together, an issue popping up in social media or the public web is a very big sign of validity. You can use that to decide which threats are real, because as somebody in cyber investigations, you’ll have a list of 10, 20, or 100 issues, and you’re constantly trying to work out which ones are real. Social media data gives you that validity: okay, this is something we need to pay attention to, especially in information warfare.

Mark: The range of darknet data that is of interest to analysts and investigators is very broad. The darknet is a primary repository of threat data. That can be data that’s been hacked or stolen from organizations. It can be vulnerabilities that are bought, sold, or discussed in the darknet as a way to gain entry into organizations. It can be a wide range of PII on executives and companies. To Justin’s point, there are also disinformation specialists who offer their services in the darknet. I equate the darknet to the 2 or 3 city blocks in every town where all the crime occurs, and we see ourselves as a primary policeman for the activities that occur there. And the darknet is obviously a growing phenomenon. The chart I showed earlier shows that what started out as the Tor network is now a number of distributed networks. As an example, we extract data from 25,000 to 30,000 darknet sites a day into our platform. But to Justin’s point, when you start to see data bubble up into social media or the surface web from the darknet, or from actors who are very active in the darknet, you know that something has happened. They’re bubbling it up for a reason, usually to draw attention to the fact that they have committed an act, extracted data from an organization, or are in the middle of a ransomware attack. And you can easily see that when it bubbles up to the social media level.

Justin: I thought an important point you brought up on your slides was the Chinese aspect of ZeroNet. We’ve watched this together. As you know, the 2 or 3 city blocks is a great analogy, but those blocks are growing. They’re getting more organized and more effective. ZeroNet in China is a great example: we watch actors organize in the dark web, move up into Chinese forums, and then move into the US public web. So the question is, at what point are you going to become aware of that? If you only become aware of it when it hits the public web in the US, that’s probably not the speed you want. If you’re a crisis communications person, you want to understand that threat while it’s still in ZeroNet, so you can prepare for it long in advance. You want to understand the threat as early in that kill chain as possible, and that’s why our two platforms work so well together.

Mark: And by the way, Justin, the example you cited with respect to ZeroNet also applies to the use of Telegram in the Russia-Ukraine conflict, and to the various spiders that have arisen from the primary use of Telegram by threat actors on both the Russian and the Ukrainian side to spill and leak data and attack each other. That activity then spreads through the broader social media environment, and it has frankly changed the landscape of how we think about threats and how we see threat actors pivoting between social media and the darknet. It is amazing to think about the lag between traditional news and what we know is happening online when it comes to the Ukraine war. When certain things happen, they are not nearly as surprising to you and me, because we’ve been watching them for a while. We obviously can’t predict the future, but we can anticipate it better by using that kill chain, as you described.

There’s no question about that. And it is interesting to me: if we were executives at traditional media companies, incorporating the speed with which news travels, particularly on social media, would be a real challenge. I know, for example, that when I hear something is late-breaking or newly breaking, I go to social media first. It beats all the mainstream media sources in terms of speed; there’s no question. And I’m not unusual in relying on that. Certainly the younger generation relies on it almost to the exclusion of any other source.

Justin: Traditional media no longer breaks news. It’s supposed to analyze news and it always struggles when it tries to do the other thing.

Kathy: We’ve had a question come in and someone would like to know, how do you know when a company is being targeted on the dark web?

Mark: That’s a great question. The first part of my answer is that oftentimes companies are named in the darknet: “We are attacking XYZ company,” or “We have a back door into XYZ company, and here’s some data we’ve already exfiltrated.” So, shockingly, the first thing to do is look in our platform and see which companies are being named as targets. Second, threat actors will often post IP data for the targets they’re going after, and if you know your IP range, you can see that you’re being actively targeted. But the most common way to know it is that threat actors will extract data from a company, post it on surface web sites or in social media, and say, “We have attacked XYZ company, and here’s proof,” along with some embarrassing documents. At the same time, ransomware operators will be talking to the company directly and saying, “We have a lot more of this data, and we’re going to leak it unless you pay us a ransom.” This is a case where seeing what’s happening in the darknet and seeing what’s happening in the surface net go hand in hand. And as Justin said earlier, you don’t want to be on the receiving end of that. You don’t want to find your company’s most confidential data already posted, in whole or in part, on social media or surface web sites. At that point, your response is way behind where it should be. So it’s pretty easy to see which companies are being targeted in the darknet.
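To make Mark’s point about IP ranges concrete, here is a minimal illustrative sketch; it is not part of the webinar and not a DarkOwl or Socialgist API. It assumes you already have IP addresses extracted from a darknet post and know your organization’s CIDR blocks. The ranges, addresses, and the `matching_ips` helper below are all hypothetical, and the check uses only Python’s standard `ipaddress` module.

```python
import ipaddress

# Hypothetical CIDR blocks owned by the organization
# (RFC 5737 documentation ranges, used here as placeholders).
OUR_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127
]


def matching_ips(posted_ips, our_ranges=OUR_RANGES):
    """Return the posted addresses that fall inside any of our ranges.

    Entries that are not valid IP addresses are skipped rather than raising,
    since text scraped from a post is often messy.
    """
    hits = []
    for raw in posted_ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # ignore malformed entries
        # Membership test is False when IP versions differ (v4 vs v6).
        if any(addr in net for net in our_ranges):
            hits.append(str(addr))
    return hits


# Hypothetical addresses pulled from a darknet post.
posted = ["192.0.2.17", "203.0.113.5", "198.51.100.200", "not-an-ip", "198.51.100.9"]
print(matching_ips(posted))  # ['192.0.2.17', '198.51.100.9']
```

A real workflow would feed this from whatever collection platform you use and alert on any hit; the point is only that, once the posted data and your own ranges are in hand, spotting that you are being targeted is a simple membership check.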

Justin: To build on Mark’s point, what’s interesting about information warfare is that for it to be useful, you have to make it public at some point. You usually have to increase the value of the data by announcing publicly that you have it. Maybe that starts in the darknet, or maybe it starts in the public web. But one advantage the good guys have is that when somebody has information on you, it’s only valuable when it’s used publicly. So eventually they will reveal themselves.

Online Identities and Connections

Using social media and darknet data can help paint a picture of a cybercriminal or group. How can you use these data sets and tools in tandem?

Mark: I’ll give you an example, one I referenced earlier. A few years ago, one of our clients was being subjected to online disinformation campaigns in Latin America that they thought might originate in Russia, and those campaigns were actually causing physical attacks on their facilities in Latin America. They asked us to look into it in the darknet and see what we could find. The threat actor was very active on social media, making threats against our client, but was also very active in the darknet. So we started in the darknet, where we were able to trace certain activity and certain identities. We noticed that in the darknet he was using a specific, quite unusual username, so we pivoted back to social media to see whether anyone else was using that username there. We found a user with that username on some fairly obscure social media sites.

We then pivoted to those social media sites and, as is the case with many social media sites, we were able to identify an IP address, located in Siberia of all places, and contact details, including an email address. We then pivoted back to the darknet and asked: is this email address, identified on this social media site, in use anywhere else in the darknet? We found that the social media account tied directly to one of the darknet accounts he was using to launch these disinformation attacks on our client. We pivoted back and forth several more times, and we finally found, believe it or not, a social media post where the actor had posted his picture, and in the end we believed he was actually acting at the behest of the Russian government. That’s a perfect example of identities existing both in the darknet and on social media. To be honest, he was a bit sloppy, but that’s a hallmark of many criminals: they can be sloppy when pivoting between the two. We would not have been able to do that analysis using darknet data alone. We had to pivot to social media and back several times to reach the conclusion that this was a Russian threat actor, probably acting at the behest of the Russian government, in targeting our client.

Justin: I think what’s interesting about this is that their job is not that different from most jobs. If you’re running an ongoing concern and trying to achieve objectives, you need to establish an identity that is known in many worlds, right? Just as I’m on LinkedIn, I’m the same person on Facebook and the same person on Instagram. So while they’re obviously a little more opaque than we would be, they still have to be identifiable across these various mediums, and that gives a real opportunity for forensic analysis to follow things along that kill chain.

Social Engineering and Phishing Attacks

How does social engineering differ in social media and on the darknet?

Mark: It depends on what the social engineering is being used for. Phishing attacks are usually emails targeting specific individuals or groups of individuals, with a view to getting them to open a file that corrupts their computer and then gaining access to their network or to the data on their computer. Social engineering refers broadly to identifying those individuals or targets ahead of time so that the phishing attacks can be much more sophisticated. I’ll give an example. I’ve been subject to social engineering and phishing attacks, both sophisticated and unsophisticated. An unsophisticated attack is somebody sending me an email saying, “Hey, click on this article; it would be of interest to you.” A sophisticated attack appears to come from my CFO and says, “Mark, attached is a file I need you to look at urgently. Call me now.”

Now, to get to that latter email, they have done some research on Mark Turnage. They have to know who my CFO is, and they then have to build a template that looks as if it’s coming from my CFO. All of that data is available in the darknet. My email address is available in the darknet. Biographical information about Mark Turnage is available in the darknet. And for most executives, by the way, it’s also available on the surface web. You can go to the DarkOwl website and see who our management team is; it’s very common for companies to post that data. So pivoting back and forth between the darknet and social media enables the targeting we are talking about, the targeting of executives and of individuals in organizations and companies, and allows criminals to do what they do.

Justin: The thing that scares me, and Mark, I’m sure you’ve seen this too, is how little information you need to do social engineering these days. It’s literally about five seconds of audio, and you can basically clone my voice. Mark and I were talking before this call about how we now have, I think, the first political campaign ever to recreate somebody else’s voice for its ads, having somebody literally say what they don’t want to say and publishing that on television. So I think we’re going to live in a world where social engineering and social media are very personalized. To Mark’s point, because we’re all online, we all have identities, and it’s only going to get easier to trick people with more and more realistic content.

Mark: To use the example Justin gave, which I think he posted on social media this morning: when you have deepfakes and you can imitate somebody’s voice, or produce a video of somebody, really well, in an almost undetectable way, the opportunities for phishing attacks grow exponentially. Take the example where I get an email from my CFO saying, “Mark, I need you to open this file.” Imagine that instead of an email, it’s a voicemail, or a voicemail attached to an email, that sounds exactly like my CFO. The range of potential abuse of that technology is remarkable. I was just amazed, Justin, that the first use of it was a presidential campaign. That’s the part that surprised me. It wasn’t really phishing; it was just politics.

Justin: When we thought we couldn’t go lower, we go a little bit lower.

Reputational Risk

According to a recent report by Deloitte, 87% of executives rate reputational risks as more important than other strategic initiatives. What are your thoughts on that?

Mark: If I had to read behind that statistic, I would guess that the reason most executives worry about reputational risk more than other strategic initiatives is that, to a large degree, they don’t control it. Once an attack, say a misinformation or disinformation attack, is mounted on a company, recovering from it is inherently more difficult than almost anything else. So to Justin’s earlier point, you want to stay ahead of any disinformation attacks, and you want to have a plan in place for how to react if they do arise. If you can get early warning signals from social media, chat rooms, and forums that people are targeting your company or organization, you have the chance to stay on the front foot rather than the back foot. Am I right about that, Justin?

Justin: I believe so. What struck me when I read that statistic is that it leaves me, in this business, very optimistic: people understand the risks and how they impact their business. That’s a very impressive number coming from the C-suite. I believe most of that responsibility was placed on the CEO or CFO, meaning they understand that this is something they can’t control. The other thing embedded in that study that I thought was really important was consumer perception; reputational risk is kind of the bigger version of consumer perception. When it comes to phishing and social engineering, people really understand that this is a problem, probably because they’ve seen many of their peers get burned by now, and they’re trying to figure out what to do. The big step, now that we understand the problem, is how do you execute on it? When those people raise their hands, how do Mark and I help them put systems in place that keep them protected?

Ethical Concerns and Legal Challenges

What challenges do you both face?

Mark: At DarkOwl, we face a set of ethical challenges every day: how do we collect data from the darknet in an ethical manner and make it available to legitimate clients while respecting the privacy of people whose data has been posted to the darknet? As an example, we don’t participate in darknet sites that require purchasing data in order to participate, because we don’t want to fund the criminal ecosystem. So there are clearly darknet sites that we will not collect data from. What we’re trying to do is return stolen data to its rightful owners, or alert them to the threats arising from the darknet. There’s a natural, inherent balancing act between legitimate privacy concerns on one hand and the need to continuously monitor this environment, from which many threats arise, on the other.

Justin: In our world, we think a lot about the town square and public conversation and how important that is. When things are in the public square, our biggest ethical concern is actually not on our side; it’s with the companies providing the public square. We have major social networks creating environments where misinformation spreads. A lot of other information is spreading as well, but misinformation is spreading too. What’s concerning to me and my company is that these large social networks seem to be abdicating their responsibility to moderate this town square in an attempt to save money and become profitable. You can’t sell gasoline to a bunch of people and then be upset when everything is on fire. So the big concern I’m seeing is that moderation is going down, which is causing a rise in disinformation, because disinformation is filling a vacuum that previously wasn’t there.

Kathy: What are the key technological or macro developments in the space to be aware of?

Mark: Everybody is talking about AI, and rightly so, to be honest. If there’s anybody on this webinar who hasn’t tried ChatGPT or any of its lookalikes, I would highly encourage you to do so. AI is moving at a very rapid pace. Justin spoke in his introduction about the signal-to-noise ratio and the noisiness of data. Both our companies collect so much data that parsing through the noise to get to your particular signal is oftentimes quite challenging, even with the tools we provide. Critically, I think AI will enable investigators and companies to get to that signal much faster and to monitor in a much more comprehensive way. But as with all technologies, it’s also used by criminals. We were talking about deepfakes, but AI can be used in other criminal contexts as well. So it’s going to be an interesting challenge going forward to see both how AI is used to protect companies and how it is used to attack organizations.

Justin: I was reading earlier this morning about a study estimating that there are 220 news websites that are entirely AI-generated at this point. That’s up from about 73 just months ago.

Tools go both ways, right? You can create bad content and you can identify bad content. But consider the lessons from the previous version of AI, recommendation engines, where social networks kept surfacing more and more clickable content, which is usually conspiracy-based or negative. Soon they won’t just be recommending content. Recommendation engines needed a library of content, or people creating content, but these systems are going to be able to create it on their own in real time, test it, and then say, “Oh, this vein is working; keep going deeper and deeper.” That is a massive macro trend that I think is really going to change how we think about information, and maybe, in a weird way, create a rise of journalism again, because we’re going to need some validation when we can’t trust what’s in our feeds. And the last trend is the one I mentioned previously: as this rise of content is happening, social networks seem to be taking a step back from moderation, which again, I think, is going to embolden people with ill intent.

Mark: I think that’s very clear, by the way. Another potential use of AI on the criminal side: if I were going to mount a disinformation campaign against a company or an organization, I could very easily use generative AI to produce an extremely professional-sounding set of facts that are misinformation or disinformation and use them offensively, and I could generate that almost instantly. So, to your earlier point that companies at the C-suite level have to be cognizant of the risks they’re facing, that’s a massive risk. Instead of responding to a disinformation specialist who’s putting out a rumor that your company did X, Y, Z or was involved in some criminal or bad act, you could be facing what looks like a legitimate article with legitimate-sounding facts that’s been generated by AI. Then you’re up against a much steeper cliff in terms of responding. What is interesting is that most of these people are opportunistic: they’re taking a misstep and amplifying it. But soon, or probably already today, they’re going to be able to create a perceived misstep and amplify that. So you will be under attack for things you had no connection to, but that lack of connection won’t change how consumers perceive you unless you’re very much on top of it.