In the digital age, understanding user behavior and engagement within online communities is crucial for any OSINT or dark web investigator. Threat actors increasingly use Telegram channels to communicate, sell illicit goods, and share disinformation, among other activities. Monitoring these channels is important for tracking the activities of these groups and mitigating the threats they may pose to individuals and organizations.
However, these channels can contain a very large volume of data. DarkOwl therefore wanted to establish whether AI (artificial intelligence) could be used to analyze the data in a specific channel, and what could be discerned from it.
Disclaimer: This blog is based on AI analysis of the content of a Telegram channel, covering the period from August 2022 to February 2024. DarkOwl does not take responsibility for the analysis conducted, which is shared for informational purposes only. Other AI platforms are available.
The Telegram channel “Star Fraud Chat” was selected for analysis to see what information could be gleaned about this group from its activity on Telegram. Security researchers have attributed this group to Scattered Spider, which is assessed to have been responsible for the attack on MGM Casinos. Furthermore, the group is assessed to be part of a collective that calls itself “the Com”, which reportedly conducts cyber attacks as well as cyberbullying and swatting.
DarkOwl analysts used open-source tools to download the content of the Telegram chat from its creation in August 2022 to February 2024. This generated a CSV file containing participants' usernames, activity timestamps, message content, media shares, replies, and reactions. ChatGPT was then used to analyze the CSV file, with specific questions asked to identify what information could be gleaned.
This blog was written with a mix of AI analysis produced directly by ChatGPT as well as analyst review and enrichment.
ChatGPT was asked to analyze the CSV file to see what information could be found. It initially had issues due to the file's formatting but overcame them after a few attempts.
ChatGPT said:
The initial attempt to read the CSV file encountered issues due to irregularities in the data format, such as varying numbers of fields per line and improperly escaped or quoted strings. By inferring the delimiter and adjusting parsing strategies, we successfully re-parsed the data using a semicolon (;) as the delimiter. This correction allowed us to structure the dataset properly, revealing 56,210 rows and 42 columns, including information on message IDs, display names, user IDs, message texts, timestamps, and reactions.
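The re-parsing ChatGPT describes can also be approximated locally, which makes the step reproducible. The following is a minimal Python sketch using only the standard library's `csv` module: it guesses the delimiter from a sample of lines, falls back to a semicolon if guessing fails, and counts rows whose field count does not match the header. It is not the method DarkOwl or ChatGPT used; the sample size and the semicolon fallback are assumptions.

```python
import csv
import io


def parse_export(raw_text: str, fallback: str = ";") -> tuple[list[dict], int]:
    """Parse a chat export whose delimiter is not known in advance.

    Returns the parsed rows plus a count of rows whose field count did not
    match the header (a symptom of unescaped delimiters in message text).
    """
    # Sniff the dialect from the first lines only, restricted to common
    # delimiters so commas inside message text are less likely to mislead it.
    sample = "\n".join(raw_text.splitlines()[:50])
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=";,\t|").delimiter
    except csv.Error:
        delimiter = fallback  # assumption: semicolon, as in this export

    rows, irregular = [], 0
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    for row in reader:
        # DictReader stores surplus fields under the key None and fills
        # missing trailing fields with None, so either case flags a bad row.
        if None in row or None in row.values():
            irregular += 1
        rows.append(row)
    return rows, irregular
```

Reporting the irregular-row count, rather than silently dropping those rows, lets an analyst judge whether the parse is trustworthy before asking any questions of the data.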
First, we wanted to understand which topics were most commonly discussed in this Telegram channel, to gauge what threat its users could pose. We therefore asked ChatGPT to analyze the message content and summarize what was discussed.
ChatGPT said:
The most common words in the messages are general and do not provide specific insight into the topics discussed. To better understand the themes, we should look for specific terms and phrases related to the context of the chat.
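Raw word counts are dominated by common words, which is why this first pass revealed little. A typical first remedy is to drop stop words and very short tokens before counting. The sketch below assumes the message texts are already loaded as a list of strings; the stop-word list is a small, hypothetical example and would need extending to cover chat slang.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list for illustration only.
STOP_WORDS = {
    "the", "a", "an", "i", "u", "you", "to", "and", "is", "it",
    "of", "in", "for", "me", "my", "that", "this", "lol",
}


def top_terms(messages, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent informative tokens across all messages."""
    counts = Counter()
    for text in messages:
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            # Skip stop words and very short tokens such as "u" or "ok".
            if token not in stop_words and len(token) > 2:
                counts[token] += 1
    return counts.most_common(n)
```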
We wanted more information, so we asked ChatGPT to analyze the fraud-related language in the messages in more depth and to provide samples from the data. ChatGPT itself suggested ways in which we could conduct further analysis. The output was as follows:
ChatGPT said:
The chat primarily revolves around fraudulent activities, including the sale of financial and personal information, hacking services, social engineering techniques, and requests for various account types. There is also a market for managing and providing access to certain panels or services related to these activities. This environment indicates a highly active and diverse exchange of illegal services and information.
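A transparent complement to asking an AI model for themes is to tag messages against keyword lists and keep a few samples per theme, so the results can be checked and reproduced. The theme names and keywords below are hypothetical illustrations loosely based on the categories ChatGPT reported, not terms confirmed in the channel.

```python
import re
from collections import Counter, defaultdict

# Hypothetical themes and keywords, for illustration only.
THEMES = {
    "financial_info": ["fullz", "ssn", "bank log", "cc dump"],
    "hacking_services": ["hack", "exploit", "rdp"],
    "social_engineering": ["sim swap", "phish", "vish"],
    "panels": ["panel"],
}


def tag_themes(messages, themes=THEMES, max_samples=3):
    """Count messages per theme and keep up to max_samples examples of each."""
    # Leading word boundary only, so "hack" also matches "hacking".
    patterns = {
        theme: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + ")", re.IGNORECASE)
        for theme, words in themes.items()
    }
    counts = Counter()
    samples = defaultdict(list)
    for text in messages:
        for theme, pattern in patterns.items():
            if pattern.search(text):
                counts[theme] += 1  # each message counts once per theme
                if len(samples[theme]) < max_samples:
                    samples[theme].append(text)
    return counts, samples
```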
Once we understood what was being discussed, we wanted to learn more about the channel's users. We began asking ChatGPT specific questions about them, starting with which users were the most active.
ChatGPT said:
The analysis determined the most active users based on the number of messages sent. The top 10 most active users are:
1. **”.”** – 1,076 messages
2. **netbox** – 1,062 messages
3. **ㅤ** (a whitespace character) – 718 messages
4. **MY UZI HOLDS A HUNDRED ROUND CONSCIENCE** – 697 messages
5. **ʹ** (a single quote) – 504 messages
6. **Osama (GENERATING)** – 469 messages
7. **Perc** – 468 messages
8. **bloed** – 347 messages
9. **Systoned** – 327 messages
10. **boomb** – 320 messages
These users have significantly contributed to the conversation, making them central figures in the community.
The analysis highlighted that many of the most active users did not have a conventional display name, using punctuation or blank characters instead. The analysis also did not account for the fact that users can change their display names, so messages from the same user may appear under different names in the data. Because the extracted data includes user IDs, the analysis could instead be based on those.
The channel's admins are listed on the Telegram channel; they include a bot that is likely used to manage new joiners. None of the admins appear among the most active users.
We then wanted to look at activity on the channel from a different perspective: not just who had posted in the chat, but whose posts drew reactions. Telegram users can react to posts with emojis, so we looked at which users' posts had received the most reactions from other users.
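Counting by user ID rather than display name avoids both of the problems above, and collecting every display name seen for an ID shows when a user has renamed themselves. In this sketch, rows are assumed to be dictionaries such as those produced by `csv.DictReader`, and the column names `user_id` and `display_name` are hypothetical; the real export's headers may differ.

```python
from collections import Counter, defaultdict


def activity_by_user_id(rows, id_col="user_id", name_col="display_name"):
    """Return (user_id, message_count, sorted display names) by activity."""
    counts = Counter()
    names = defaultdict(set)
    for row in rows:
        uid = row.get(id_col)
        if not uid:
            continue  # skip service messages or rows with no sender
        counts[uid] += 1
        names[uid].add(row.get(name_col) or "")
    return [(uid, n, sorted(names[uid])) for uid, n in counts.most_common()]
```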
ChatGPT said:
Reactions to messages indicate the level of engagement and resonance of user contributions. The users who received the most reactions are:
1. **Star Sanctuary** – 34 reactions
2. **netbox** – 15 reactions
3. **Jack** – 12 reactions
4. **ㅤ** (a whitespace character) – 9 reactions
5. **.** (a period) – 8 reactions
6. **Perc** – 6 reactions
7. **digital cannibal** – 6 reactions
8. **MY UZI HOLDS A HUNDRED ROUND CONSCIENCE** – 6 reactions
9. **Stardusts in my memoryCard** – 5 reactions
10. **Panama Creative** – 5 reactions
This list highlights the users whose contributions garnered the most engagement from others.
This list did include posts by some of the admins. However, given the volume of posts in the channel, the number of reactions appeared relatively low.
We then wanted to understand how users were interacting, so we asked ChatGPT to analyze the replies between them. However, this did not take into account the nature of the posts, and some posts may attract more interest than others. Additional context would be needed to understand how the users were interacting.
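Normalizing by volume makes this comparison concrete: a user with many posts but few reactions has a low reactions-per-message ratio. This sketch assumes a hypothetical `reactions_count` column holding one integer per message; the actual export may instead store reactions per emoji across several columns, which would need to be summed first.

```python
from collections import Counter


def reaction_stats(rows, id_col="user_id", reactions_col="reactions_count"):
    """Return (user_id, reactions, messages, reactions_per_message), most reactions first."""
    messages = Counter()
    reactions = Counter()
    for row in rows:
        uid = row.get(id_col)
        if not uid:
            continue  # skip rows with no sender
        messages[uid] += 1
        try:
            reactions[uid] += int(row.get(reactions_col) or 0)
        except ValueError:
            pass  # malformed cell: count the message but add no reactions
    stats = [(uid, reactions[uid], n, reactions[uid] / n) for uid, n in messages.items()]
    stats.sort(key=lambda t: t[1], reverse=True)
    return stats
```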
ChatGPT said:
Understanding who interacts with whom provides insights into the community dynamics. The top user interactions, based on reply activity, include:
1. **’** replied to **fan (real)** – 6 times
2. **Perc** replied to **netbox** – 5 times
3. **nef** replied to **boxingloves** – 5 times
4. **Systoned** replied to **Osama (GENERATING)** – 5 times
5. **ㅤ** (whitespace character) replied to **Osama (GENERATING)** – 4 times
6. **Osama (GENERATING)** replied to **Systoned** – 4 times
7. **ㅤ** replied to **geekypoo** – 3 times
8. **Osama (GENERATING)** replied to **ㅤ RR BB** – 3 times
9. **King** replied to **𐕣₩** – 3 times
10. **netbox** replied to **Uno[Shop in Bio]** – 3 times
These interactions highlight the most engaged pairs of users within the community.
However, this again did not account for display-name changes, nor did it show which users made the most replies overall, or which users generally received the most replies and for what content. This information could likely be obtained from the data by asking ChatGPT more precise questions.
We then wanted to see whether these interactions could be visualized. Network analysis is a very useful tool that lets analysts see connections between individuals, identify who is communicating with whom, and spot patterns. So, we asked ChatGPT to create this visualization and to clean it up.
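Resolving each reply to the user ID of the original author, and tallying replies sent and received separately, addresses these gaps. This sketch assumes hypothetical `message_id`, `user_id`, and `reply_to_message_id` columns; replies to messages that are missing from the export are skipped.

```python
from collections import Counter


def reply_stats(rows, id_col="user_id", msg_col="message_id",
                reply_col="reply_to_message_id"):
    """Return (pairs, sent, received) counters keyed by user ID."""
    # Map every message ID to the user ID of its author.
    author_of = {row[msg_col]: row.get(id_col) for row in rows if row.get(msg_col)}
    pairs, sent, received = Counter(), Counter(), Counter()
    for row in rows:
        replier = row.get(id_col)
        target_msg = row.get(reply_col)
        target = author_of.get(target_msg) if target_msg else None
        if not replier or not target:
            continue  # not a reply, or the original message is not in the export
        pairs[(replier, target)] += 1
        sent[replier] += 1
        received[target] += 1
    return pairs, sent, received
```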
ChatGPT said:
A network graph was created to visualize the interactions between users. Each node represents a user, and each directed edge indicates that one user has replied to another. This graph illustrates the structure of interactions, showing which users are more centrally involved in conversations and how clusters of users interact more closely with each other.
Being able to review this information is very useful for analysis, giving us insight into which users in Star Fraud Chat interact the most and are the most active on the channel. Being able to manipulate this graph and see which users are most active could give analysts a starting point for deciding which users to investigate.
We then wanted to know whether we could identify when the chat was most active, as this might indicate which time zone the users were in and help us narrow down a location. We therefore asked ChatGPT to review the timestamps.
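That starting point can be made quantitative with a simple degree ranking over the reply graph. This standard-library sketch takes reply counts keyed by hypothetical `(replier_id, target_id)` pairs, treats each distinct pair as one directed edge, ignores self-replies, and ranks users by degree (edges in plus edges out), breaking ties by total reply volume. A library such as networkx could be used to lay out and draw the same graph; degree is just one simple centrality measure and not necessarily what ChatGPT's visualization emphasized.

```python
from collections import Counter


def rank_by_degree(pairs):
    """Rank users in a directed reply graph.

    `pairs` maps (replier_id, target_id) -> number of replies. Each distinct
    pair is one directed edge; a user's degree is in-edges plus out-edges.
    """
    degree = Counter()
    volume = Counter()
    for (src, dst), n in pairs.items():
        if src == dst:
            continue  # ignore users replying to themselves
        degree[src] += 1
        degree[dst] += 1
        volume[src] += n
        volume[dst] += n
    ranking = [(user, degree[user], volume[user]) for user in degree]
    # Highest degree first; break ties by total replies sent plus received.
    ranking.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return ranking
```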
ChatGPT said:
Analyzing the timestamp data revealed that users are most active at 1 AM, with a total of 3,973 messages sent during this hour. This peak in activity suggests that the majority of users might be located in a specific time zone or find this time particularly convenient for engaging in conversations.
However, ChatGPT was not able to identify the time zone without additional context. The timestamps provided by Telegram appeared to use GMT, which would indicate that most posts were made in the early hours of the morning in Europe, or in the late afternoon to evening in the US (roughly 5 PM to 9 PM, depending on the time zone and daylight saving time).
As this analysis was inconclusive, we wanted to see whether the messages contained any information linking users to a location, so we asked ChatGPT to look for this within the posts.
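One way to test the time-zone hypothesis is to shift the UTC timestamps into a few candidate zones and see where the peak hour lands. This sketch assumes ISO 8601 timestamps that are UTC when no offset is given, and it uses fixed offsets that ignore daylight saving time, so each result is approximate. The peak hour is a weak signal on its own: it shows when people post, not where they are.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Candidate zones as fixed UTC offsets. These ignore daylight saving time,
# so each result is only approximate for part of the year.
CANDIDATE_OFFSETS = {
    "UTC/GMT": 0,
    "Central Europe (CET)": 1,
    "US Eastern (EST)": -5,
    "US Pacific (PST)": -8,
}


def peak_hour(timestamps, offset_hours=0):
    """Return (hour, message_count) for the busiest local hour, or None if empty."""
    tz = timezone(timedelta(hours=offset_hours))
    hours = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
        hours[dt.astimezone(tz).hour] += 1
    return hours.most_common(1)[0] if hours else None


def peaks_by_zone(timestamps, zones=CANDIDATE_OFFSETS):
    """Peak local hour for each candidate zone."""
    return {name: peak_hour(timestamps, off) for name, off in zones.items()}
```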
ChatGPT said:
A search for location-related keywords identified 152 messages referencing specific locations. Examples include:
– ““victory park” its a 2 story police department…”
– “u in middle of street”
– “ghost town”
– “smh u the same ni**a scared to meet up and we …”
– “**🎉 Get the Best RDP/VPS Deal with StealthRDP!…”
These references vary in context, ranging from conversational mentions to specific addresses or locations.
This analysis did not add much value, as the locations mentioned were very generic. Again, more specific questions and directions may improve the results here, for example focusing on countries or cities rather than generic terms.
The analysis of Star Fraud Chat provides valuable insights into user activity, engagement, and interaction patterns. By understanding the most active users, the times of peak activity, and the dynamics of user interactions, we can better understand how this group operates and where best to focus our analysis. We can also analyze the topics discussed to understand what threats this group poses.
This also highlights how AI can assist analysts in their investigations by reducing the time it takes to review large amounts of data. However, these examples also show the importance of asking AI models very specific questions and ensuring they understand what information you are seeking; these models are only as good as the questions they are asked.
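Following that suggestion, a search restricted to country and city names avoids generic phrases such as "ghost town". The gazetteer in this sketch is a short, hypothetical list; a real one would come from a curated place-name dataset, and matches would still need manual review because many place names are ambiguous.

```python
import re
from collections import Counter

# Hypothetical gazetteer for illustration only.
GAZETTEER = ["United States", "United Kingdom", "Canada", "New York",
             "London", "Toronto", "Florida"]


def place_mentions(messages, places=GAZETTEER):
    """Count whole-word, case-insensitive mentions of each place name."""
    canonical = {p.lower(): p for p in places}
    # Longest names first so a longer name wins over a shorter one it contains.
    alternatives = "|".join(map(re.escape, sorted(places, key=len, reverse=True)))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    counts = Counter()
    for text in messages:
        for match in pattern.findall(text):
            counts[canonical[match.lower()]] += 1
    return counts
```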