Because darknets are fluid and often unpredictable, accurately characterizing them, or even capturing a snapshot of them, can prove challenging. Our analysts are engaged in an ongoing research project to address this, and have begun to map the darknet using quantitative analytics and machine learning techniques.
Today's blog post describes our darknet analysis project in more detail and offers a first glimpse at what we're doing to understand the current state of the darknet.
The challenge and our approach
Hidden services on Tor frequently come and go: criminals often change their onion addresses to avoid apprehension, and many servers are operated out of personal residences, so their uptime fluctuates with their operators' daily schedules. Quantitative analysis gives us an indication of whether we are collecting data that is relevant to our customers, improves our broader understanding of the network, and offers an opportunity to fine-tune our collection methodologies.
The majority of darknet sites can be categorized into a couple of dozen subjects, ranging from X-rated content to drugs and social media sites used for communication.
The engine that powers our OWL Vision platform is constantly and intelligently scraping the darknet for new content, using machine learning to categorize the services it captures in an effort to understand the shape and feel of its current state.
The following chart shows the breakdown of roughly 60,000 sites across Tor and I2P, with data analyzed up through August 8, 2017.
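To make the categorization step more concrete, here is a minimal sketch of one common way to sort page text into subjects: a multinomial naive Bayes classifier. This is an illustrative assumption, not a description of the OWL Vision engine; the class name, the training pages, and the choice of algorithm are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # pages seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def train(self, labeled_pages):
        """Learn word frequencies from (text, category) pairs."""
        for text, category in labeled_pages:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the category with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical hand-labeled pages, invented purely for illustration.
TRAINING_PAGES = [
    ("exploit kit botnet malware for sale", "Hackers"),
    ("zero day exploit ddos service", "Hackers"),
    ("chat room forum messages friends", "Social Media & Chatrooms"),
    ("join our chat community forum", "Social Media & Chatrooms"),
    ("upload download file mirror archive", "File-Sharing"),
    ("share file torrent download", "File-Sharing"),
]

categorizer = NaiveBayesCategorizer()
categorizer.train(TRAINING_PAGES)
print(categorizer.predict("new malware exploit"))
```

A production system would need far more training data, better tokenization, and handling for pages that fit no category, but the basic idea of learning word patterns per subject carries over.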
While the chart above provides a snapshot of what we've crawled up until now, we also monitor the “breath” of the darknet by tagging sites that have recently updated their content. This metric provides insight into where current events and developments are taking place.
The following chart shows a breakdown of the most recently updated or modified segments of the darknet, looking only at sites that have uploaded new content over the last five days. This summary covers roughly 15,000 unique onion and eepsite addresses.
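The recency filter described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the record layout (`address`, `category`, `last_modified`), the function names, and the sample sites are all hypothetical, and the five-day window is taken from the text.

```python
from collections import Counter
from datetime import datetime, timedelta


def recently_updated(sites, as_of, window_days=5):
    """Keep only sites whose content changed within the window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return [s for s in sites if cutoff <= s["last_modified"] <= as_of]


def category_shares(sites):
    """Return each category's share of the given sites as a fraction of 1."""
    counts = Counter(s["category"] for s in sites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical crawl records, invented purely for illustration.
AS_OF = datetime(2017, 8, 8)
SITES = [
    {"address": "site-a.onion", "category": "Hackers",
     "last_modified": datetime(2017, 8, 7)},
    {"address": "site-b.onion", "category": "Hackers",
     "last_modified": datetime(2017, 8, 5)},
    {"address": "site-c.i2p", "category": "Social Media & Chatrooms",
     "last_modified": datetime(2017, 8, 4)},
    {"address": "site-d.onion", "category": "Wikis",
     "last_modified": datetime(2017, 7, 1)},
    {"address": "site-e.i2p", "category": "File-Sharing",
     "last_modified": datetime(2017, 8, 2)},
]

recent = recently_updated(SITES, AS_OF)
shares = category_shares(recent)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0%}")
```

Running the same `category_shares` function over the full site list and over the filtered list gives the two breakdowns that the charts compare.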
- Interestingly, most of the current activity on the darknet occurs in the Hackers community, with ~26% of the darknet containing hacking-related content or materials.
- Social Media & Chatrooms and File-Sharing services doubled their share among recently updated sites, relative to their share of the entire address space.
- Wiki-related sites, such as The Hidden Wiki, have a large footprint on the darknet but showed little to no recent activity, supporting claims that many wiki mirror sites are scams as opposed to links to legitimate darknet onions.
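The comparisons in the list above come down to dividing a category's share of recently updated sites by its share of the whole address space. Here is a minimal sketch of that arithmetic; the function name is hypothetical, and the numbers are invented round figures for illustration, not values read from our charts.

```python
def share_ratio(recent_shares, overall_shares):
    """Divide each category's recent share by its overall share.

    A ratio above 1 means the category is over-represented among recently
    updated sites; a ratio below 1 means it is comparatively dormant.
    """
    return {
        category: recent_shares.get(category, 0.0) / overall
        for category, overall in overall_shares.items()
        if overall > 0
    }


# Invented, partial example shares (fractions of 1); not real measurements.
OVERALL = {"Social Media & Chatrooms": 0.05, "File-Sharing": 0.04, "Wikis": 0.20}
RECENT = {"Social Media & Chatrooms": 0.10, "File-Sharing": 0.08, "Wikis": 0.01}

ratios = share_ratio(RECENT, OVERALL)
for category, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{category}: {ratio:.2f}x")
```

In this toy data, a category that doubles in proportion shows up with a ratio near 2, while a large but inactive category such as wikis lands well below 1.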
Curious about something you've read on our blog? Want to learn more? Please reach out. We're more than happy to have a conversation.