Navigating the Dark Waters of Leaks and Breaches: The Hidden Challenges of Data Collection

September 17, 2024

It seems like every day a new report is released detailing data that has been leaked from an organization. There are very few individuals in the world who do not have some personal data that has been exposed in a data leak. It is a global problem, and the leaked data can have serious ramifications for the individuals or organizations that are exposed.

Therefore, it is important that we understand exactly what a leak is, what it means, and what challenges there are around collecting leaks. Furthermore, we need to know what remediation action to take when our data is found to be leaked, and understand exactly how our data made it online and who it is available to. In this blog, we will explore these areas.

Although terms like “leak” and “breach” tend to be used interchangeably, they mean different things, and the nuances explain how the data was obtained. There are also several other terms that describe how the data was obtained and what it might include.

Leak 

A leak refers to the unintentional or accidental release or exposure of information. It can happen due to a variety of reasons, such as human error, poor security practices, or faulty software. The majority of the time, there is no malicious intent linked to the leak and the information is released in error.  

Examples of leaks include an organization leaving an FTP server open, or unintentionally releasing private information onto a website. It is not always the case that a malicious actor has identified and obtained this data, but that does often happen.

One recent example of a leak collected by DarkOwl is the leak of Trello data. Data purported to be from Trello was posted on BreachForums, a hacking forum, on July 16, 2024. According to the post, Trello had an open API endpoint that allowed unauthenticated users to map an email address to a Trello account. Data exposed includes email addresses, names, profile data, user identification numbers (UID), and usernames. According to the threat actor, the leak is from January 16, 2024, and contains 15,111,945 unique email addresses. The threat actor stated that the database is useful for doxing (publicly naming, or publishing private or personally identifiable information (PII) about, an unwitting target), noting that email addresses are matched to full names and aliases are matched to personal email addresses.

Figure 1: Trello Leak on BreachForums 

Breach 

A breach is a deliberate, unauthorized intrusion into a system or network to access, steal, or manipulate data. It is usually carried out with malicious intent by hackers or cybercriminals. This information is then routinely sold or shared online for financial gain. Hackers will often find vulnerabilities in an organization’s network and use these to exfiltrate data. Methods range from something as simple as obtaining a user’s credentials to deploying complex malware. Often the data that is taken relates to customer data or employee credentials, although other data can also be taken.

A recent example of breach data obtained by DarkOwl is the National Public Data Breach. Data purported to be from National Public Data (NPD) was posted on BreachForums, once again, on August 6, 2024. According to the post by threat actor Fenice, the full NPD database was breached by SXUL. Data exposed includes full names, dates of birth, physical addresses, phone numbers, and Social Security Numbers.  

The National Public Data leak was first offered for sale by USDoD on BreachForums on April 7, 2024, for $3.5 million USD. The dataset is reported to have 2.9 billion rows and cover data from 2019-2024. USDoD continued to advertise the sale of this data through June 2024. On July 21, 2024, Alexa69 uploaded data from National Public Data to BreachForums, indicating it came from USDoD’s leak.

On August 12, 2024, National Public Data disclosed a data security incident believed to have involved a third-party bad actor who hacked into the data in late December 2023, with data leaked in April 2024 and July 2024. According to the company’s official statement, the breach contained names, email addresses, phone numbers, and mailing addresses.

Figure 2: NPD breach advert on BreachForums 

Insider 

An insider, in this context, is someone who is based within an organization, has access to information or systems, and chooses to either release information or share access or assistance with others. There are many reasons that they might do this, but if they do not follow whistleblower protocols then this is an illegal act.

These types of leaks can be devastating due to the access that some employees have and the information that they are able to obtain. The data can be released in a variety of ways and is usually made freely available.  

Some of the most famous examples of insider leaks are those of Edward Snowden and Julian Assange, where US classified information was leaked by those individuals to journalists and via their own websites. A more recent example is that of Jack Teixeira, an airman first class of the Massachusetts Air National Guard, who photographed classified documents and leaked them on a Discord server, from which they were later shared on other social media networks.

Figure 3: Image of classified data leaked on Discord 

Ransomware 

Traditionally, ransomware was the act of locking a company’s systems and data and demanding a payment to release that data. However, modern ransomware groups not only lock access to the data but also exfiltrate it and extort the company by threatening to release the data online. This is known as the double extortion technique. Some groups now skip the locking step entirely and rely only on the threat of releasing the data.

Ransomware attacks are on the rise with companies of all sizes being possible targets. Most ransomware groups will host a leak site, or shame site, on the dark web where they will list their victims and threaten to release their data if they do not pay. They often provide details of the company, as well as images proving that they have access to the data.  

Unlike other leaks, ransomware leaks tend to be very large in size and contain a full dump of a company’s system. They can include very sensitive information, but often also include documents which provide no real information. This data is rarely curated, and security experts often have to trawl through it to establish exactly what has been released and what threat it poses. However, this should not diminish the huge risk and reputational damage that the release of ransomware leaks poses.

Below is an example of a Ransomware leak site that DarkOwl collects from. 

Figure 4: Hunter Ransomware leak page 

Scrape 

A scrape is when an individual, usually a threat actor but sometimes a security researcher, collects data from publicly available websites and amalgamates it so that it appears to be a leak. The information contained in a scrape is all publicly available and can be found using open-source techniques. However, grouping it all together can allow threat actors to use the information for nefarious means and reduces the amount of time that they need to spend researching their targets. It is always recommended that individuals share only necessary information online.

A recent example of a scraped data leak is the Yellow Pages leak. This was a consolidation of data from Yellow Pages, which is available online, that was released on the dark web. Other companies that have been victims of this kind of activity include LinkedIn.

Figure 5: Scraped Yellow Pages data available on BreachForums 

Combo 

A combo list is an amalgamation of data that has appeared in other leaks, although the source of the data is not always clear. A combo list traditionally consists of an email address and a password. As it is unclear where the data is from, the leak of this data usually poses a low threat and does not provide much actionable intelligence, although passwords should still be changed.  

Recently, however, combo lists built from stealer logs have started to circulate that contain a URL, email address, and password. These pose a larger threat because the threat actor may be able to access the site for which the password has been leaked.
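As a rough illustration of why the three-field format is more actionable, the sketch below splits one such line into its parts. The `url:email:password` layout and the colon delimiter are assumptions made for this example; real lists vary widely in delimiter and field order, and the sample values are made up. `ComboRecord` and `parse_combo_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ComboRecord(NamedTuple):
    url: str
    email: str
    password: str


def parse_combo_line(line: str) -> Optional[ComboRecord]:
    """Parse one 'url:email:password' line (an assumed layout, not a standard)."""
    line = line.strip()
    if not line:
        return None
    # Split from the right so the colon in "https://" (or a port) stays in the URL.
    # A password that itself contains ':' would be split incorrectly; this
    # sketch deliberately ignores that case.
    parts = line.rsplit(":", 2)
    if len(parts) != 3:
        return None
    url, email, password = parts
    # Drop lines whose middle field does not look like an email address.
    if not url or "@" not in email:
        return None
    return ComboRecord(url, email, password)


# Made-up sample line, not taken from any real leak.
record = parse_combo_line("https://example.com/login:alice@example.com:hunter2")
print(record)
```

A traditional two-field line such as `alice@example.com:hunter2` returns `None` here, which reflects the difference described above: without the URL, there is no indication of which site the credentials belong to.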

A recent combo list collected by DarkOwl is CHINA COMBOLIST, which was made available on Nulled, on July 26, 2024. According to the post, this data is from China. Data exposed includes email addresses and plaintext passwords. 

Figure 6: Combo list from China 

Although DarkOwl does collect combo lists, we do not prioritize them because the data has previously been released and they have limited value. Nonetheless, if an email address appears in a combo list, an increase in malicious cyber activity should be expected against the individuals represented in the leak as the information propagates to additional threat communities. There is also additional risk if the credentials were reused on other systems.

Stealer Logs 

A stealer is another word for an infostealer, or information stealer. A stealer is “a software-based program, typically malware, that is deployed on victim devices that when executed or downloaded is designed to take credentials, cookies, and sensitive information to take advantage of the victim financially, engage in fraud, and possibly identity theft.” After the stealer has covertly accessed stored information, it will transmit the data back to the cybercriminal.  

Threat actors will make the data stolen by stealers available, both for free and for sale, on the darknet and on Telegram. They will release information which includes URLs of sites visited, associated usernames or email addresses, and passwords, as well as cookies. This data can also include details of the software installed on a machine, cryptocurrency wallets, gaming platforms and other data.

Data from stealer logs is generally fairly fresh and released soon after it is stolen, which makes it more likely that the passwords released are up to date and have not been changed. Stealer logs can therefore pose a very high risk to the individuals and companies affected.

Figure 7: Sample of recent stealer log collected by DarkOwl 

Now that we have covered the different types of leaks that are made available, it is important to explore the ways in which these leaks are shared and where this information is available, as this can form part of the risk assessment of the threat posed by the release of the data. In this section the term “leak” will be used generically to cover all types of leaks listed above unless otherwise stated.  

For Sale 

Many leaks are made available for sale on dark web forums and marketplaces. The price depends on the data that the threat actor has stolen and the value that they think it will have.

It is illegal to purchase stolen data unless you are the original owner of the data! 

In some cases, after a period of time and if the seller has made enough money, the data may become freely available. In other cases, threat actors who have been able to obtain the data will subsequently share it for free on the dark web. However, there are some leaks that never become available for free.

For Free 

Many threat actors will release data for free on forums and marketplaces. Sometimes they do this in order to increase their reputation in the community or because they do not think that there is much value in the data. If information is made available for free it is considered open-source data and can be collected.  

Ransomware 

If a company does not pay the ransom, ransomware groups will release the data, usually on their leak site, at the time they previously designated. They will make all of the files available for free on the site for others to download. These will likely be collected by security researchers and threat actors alike. The data in these leaks can be used for further attacks or to cause reputational damage.  

There are also some ransomware groups that will seek to make further money off of the data that they have stolen, and they will occasionally make the release of the data available to the highest bidder. This is especially true for high value targets.  

Subscriptions 

Some threat actors will offer subscriptions to the data that they have stolen. This is usually the case with actors who are operating stealer malware. As new logs come in each day, they will offer subscriptions to view this data. Subscriptions can be for varying periods of time, from a week to a month to a lifetime subscription.

Figure 8: Example of a Telegram channel offering a data subscription

Reputation/Credits 

Although a threat actor may offer a leak for free, on certain sites you will only be able to access the download link if you use credits which you have earnt on the site. Credits can be purchased or can be earned via reputation on a site, by making posts, sharing data, reacting to other posts, etc.  

Figure 9: Example of required credits to release a leak 

Or Not Released: Nation-State Actors

There are some leaks that never appear to be released. We know that they happened as the company affected reported the breach to their regulator as they are mandated to do in certain countries, but we never see the data shared on the dark web or in any other area. In most cases it is likely that this information was stolen by a nation-state actor who is using the data for their own intelligence needs. However, some actors may choose to keep the data to themselves for their own reasons.  

It is very important to collect leaks in order to understand what data a company has exposed and therefore what potential risk they have. This is also important on an individual basis as people can be subject to financial crime and identity theft. While threat actors will use this data to commit further crimes, security researchers use this data to protect organizations and companies. However, we all face similar challenges when dealing with this data.  

Volume 

The sheer number of leaks, breaches, and other data releases that appear on a daily basis is a challenge in and of itself. It is hard to keep up with what has been posted on the various dark web sites, as well as on the personal websites of certain threat actors. Analysts have to trawl through this data on a daily basis to keep up, and then make an assessment about what data is real, verified, and useful to others. Some data released is much more actionable than others, and unfortunately a judgment sometimes needs to be made about what to prioritize. In an ideal world we would be able to mitigate all the risk posed, but this simply cannot be done for every single leak.

Availability 

Availability is also an issue. Often reports will appear in the media highlighting a leak, and people will want access to it. However, there can be a variety of reasons why it might not be available. The leak may not have been released. It may be available but only for sale. The data may have been confidentially shared with a third party, either by a threat actor or sometimes by law enforcement, which means that it is not available to the wider security community.

Formats 

Because leaks can take many different forms, as described above, and come from a variety of different victims, the format in which the data appears can be a challenge. No two leaks are the same, and to make sure that you are extracting the most relevant and useful data it is often necessary to analyze, review, and normalize the data in order to understand what it contains. This can be a difficult process that takes time.
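To give a sense of what normalization can involve, here is a minimal sketch that maps differently labelled columns from two delimited files onto one set of field names. It is not DarkOwl's process. The alias table, the `normalize_rows` function, and the sample data are all hypothetical, and real leaks are usually far messier (no header row, mixed encodings, nested formats).

```python
import csv
import io

# Hypothetical mapping from header names seen in different files
# to one canonical schema.
FIELD_ALIASES = {
    "email": "email", "e-mail": "email", "mail": "email",
    "user": "username", "username": "username", "login": "username",
    "pass": "password", "password": "password", "pwd": "password",
}


def normalize_rows(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Read delimited text with a header row and rename known columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key is None or value is None:
                continue  # ragged row: extra or missing cells
            canonical = FIELD_ALIASES.get(key.strip().lower())
            if canonical:
                row[canonical] = value.strip()
        if "email" in row:
            row["email"] = row["email"].lower()  # compare addresses case-insensitively
        if row:
            rows.append(row)
    return rows


# Made-up samples with different headers, column order, and delimiters.
leak_a = "E-Mail,Pass\nAlice@Example.com,hunter2\n"
leak_b = "login;pwd;mail\nbob;secret;bob@example.org\n"
print(normalize_rows(leak_a))
print(normalize_rows(leak_b, delimiter=";"))
```

Columns that are not in the alias table are silently dropped in this sketch. In practice an analyst would want to review unknown columns rather than discard them, which is part of why the process takes time.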

Size of data and the slowness of TOR 

Some leaks are very large, particularly those that come from ransomware attacks. This can pose issues in downloading the data, particularly if it is being shared via TOR. TOR is notoriously slow, and downloading large amounts of data over it is a challenge. It is not uncommon for downloading a ransomware leak to take weeks or months. However, threat actors do attempt to get around this challenge by providing download links to third-party file hosting providers or making the download available via torrent.
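A quick back-of-the-envelope calculation shows how a large dump turns into weeks of downloading. The figures below (a 2 TB leak and a sustained 500 KB/s) are hypothetical values chosen for illustration, not measurements of any particular leak site, and `download_days` is a hypothetical helper.

```python
def download_days(size_gb: float, speed_kb_per_s: float) -> float:
    """Days needed to transfer size_gb at a sustained speed_kb_per_s."""
    if speed_kb_per_s <= 0:
        raise ValueError("speed must be positive")
    size_kb = size_gb * 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    seconds = size_kb / speed_kb_per_s
    return seconds / 86_400  # seconds per day


# Hypothetical figures: a 2 TB dump at a sustained 500 KB/s.
print(round(download_days(2_000, 500), 1))  # 46.3
```

At those assumed figures the transfer takes roughly a month and a half of uninterrupted downloading, before accounting for dropped connections or retries.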

DarkOwl actively collects leaks which are freely available and makes these available to our customers to ensure they are able to monitor for any exposure that they might have. We seek to obtain leaks which contain data which is high value and is most likely to be used in ongoing attacks. We actively seek leaks which include PII and offer unique data which is not shared elsewhere.  

Furthermore, we seek to ensure that we collect leaks which are global in nature, not focusing on one geographical location. Every area of the world is at risk from data leaks, and we seek to make sure we can support the protection of as many areas as possible.

We also seek to collect the leaks that are most important to our customers, and will pursue requested leaks wherever possible. This includes ongoing monitoring of our vast dark web data to identify, as soon as possible, if and when a leak is made available.

There are several steps that both companies and individuals can take in order to remediate the risk that is posed by data leaks. The following are examples of actions that can be taken.  

  • Freeze your credit report 
  • Create and maintain a strong password policy 
  • Use password managers
  • Actively monitor for exposure in leaks
  • Stay vigilant for social engineering and phishing attacks
  • Change passwords if included in a breach, or on a regular basis 
  • Enable 2FA on all available accounts 
  • Limit the amount of personal data that you share online, including social media sites and other sources
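
For the exposure-monitoring item in the list above, one simple approach is to check the email addresses found in a leak against the domains an organization cares about. The sketch below is illustrative only. The function name `exposed_accounts`, its inputs, and the sample addresses are assumptions, and a real monitoring workflow would draw on many more identifiers than email domains.

```python
def exposed_accounts(leaked_emails, monitored_domains):
    """Return leaked addresses whose domain is one we monitor (case-insensitive)."""
    domains = {d.lower().lstrip("@") for d in monitored_domains}
    hits = set()
    for email in leaked_emails:
        local, sep, domain = email.strip().lower().rpartition("@")
        # Skip strings with no '@' or an empty local part.
        if sep and local and domain in domains:
            hits.add(f"{local}@{domain}")
    return sorted(hits)


# Made-up sample data.
leak = ["Alice@Example.com", "bob@other.org", "alice@example.com ", "not-an-email"]
print(exposed_accounts(leak, ["example.com"]))
```

Any address returned would be a candidate for the password change and 2FA steps listed above.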

Curious to learn more about DarkOwl’s collection process? Contact us.


Copyright © 2024 DarkOwl, LLC All rights reserved.
DarkOwl is a Denver-based company that provides the world’s largest index of darknet content and the tools to efficiently find leaked or otherwise compromised sensitive data. We shorten the timeframe to detection of compromised data on the darknet, empowering organizations to swiftly detect security gaps and mitigate damage prior to misuse of their data.