The Dark Side of AI

August 14, 2024

The speed and scale at which large language models (LLMs) have captured the attention of investors, the public, and tech startups cannot be overstated. This technology will undoubtedly revolutionize not only our personal interactions with technology but also business, analysis, medicine, and nearly every industry in some capacity. However, there is a dark side to this revolutionary technology. As the founder of Linux, Linus Torvalds, often stated, “With great power comes great responsibility.”

Bad actors have already begun leveraging these LLMs for nefarious purposes, and cybersecurity professionals have been highlighting proof-of-concepts to warn of various ways these models can be exploited. This blog will point out some of the early examples and known vulnerabilities of these LLMs and speculate on where they could lead us in the not-so-distant future.

Types of LLM Exploitation

Prompt Injection – Exploitation of LLMs for nefarious purposes has manifested in various forms. One prevalent method is prompt injection, which ranges from straightforward to sophisticated techniques. At its basic level, malicious users attempt to bypass security filters by presenting prompts in a manner that deceives the LLM into providing unintended responses. For instance, they might present multiple prompts where one is benign and the other malicious, hoping the model responds to the malicious one. This creates a continual challenge for developers and security experts who must constantly adapt to new tactics. On the more advanced end of prompt injection, techniques like encoding malicious questions in base64 can evade security measures by using encoded prompts to evade detection of harmful content. Developers are made aware of these types of obfuscations and appear quick to mitigate them in later releases.

Jailbreaking – Jailbreaking GPT models and prompt injection share significant similarities. When bad actors successfully devise a sophisticated set of prompts that circumvent multiple security measures, such as crafting prompts to generate malware, and the LLM consistently responds, it qualifies as a “jailbroken LLM.” In this compromised state, the model retains these malicious prompts, enabling users to interact with it normally while evading security filters. These jailbroken models are actively traded on the dark web for various illicit purposes.

One notorious example is FraudGPT, prominently featured on the dark web. It purports to execute a wide range of malicious activities, including generating phishing emails, creating keyloggers, producing malware in multiple programming languages, obfuscating malware to evade detection, scanning websites for vulnerabilities, crafting phishing pages, and more. The below image was extracted from a dark web site selling subscriptions of their version of a jailbroken LLM. If you want to learn more about Jailbreak GPTs you should check our a previously written DarkOwl blog that dives deeper into these GPTs.

Training Data Poisoning – Emerging as a significant concern for cybersecurity professionals and LLM engineers, in this method, threat actors and black hat hackers introduce malicious data during the model training process. This tainted training data becomes embedded in the model’s algorithms, eliminating the need for prompt injection. Consequently, malicious or unsafe responses are ingrained in the model’s core functionalities. Depending on the nature of the maliciously infused data, this could potentially enable outputs ranging from the generation of malware to the production of deepfakes and dissemination of misinformation directly from the model’s core algorithm.

Leakage – Leakage refers to various methods that enable LLMs to return sensitive data inadvertently captured during training, which was not intended for redistribution to users. This includes access tokens, personally identifiable information, cookies, and other data types assimilated during the model’s training phase. Such leaks can happen through prompt injection or more advanced techniques. Below is an example posted on X of a user whose cell phone number was captured and used in the output of an OpenAI ChatGPT response. As these models get access to more and more user data, you can imagine the impact of these leakages becoming even more concerning.

AI Agents – This represents a slightly more sophisticated form of exploitation compared to our previous examples. With the rise of AI integration in programming and its accessibility via APIs, there is a burgeoning interest in “AI Agents.” These agents operate autonomously and sometimes possess special privileges on the host computer. For instance, a program could scan files, read data, copy logs, inspect system defenses like Windows Defender, and relay this information to an LLM. Each “agent” is tasked with retrieving specific information—such as scanning logs for leaked passwords or identifying vulnerabilities in a WordPress instance running on a server—using the LLM model. Finally, another agent might execute commands on the host computer based on the information gathered. These agents perform autonomous actions, resembling a sophisticated virus operating intelligently within your environment.

Summary

As we explore the realm of Large Language Models, their rapid advancement offers promising potential across diverse industries—from streamlining business processes to advancing medical diagnostics. However, alongside these opportunities, there are significant challenges. Malicious actors exploit vulnerabilities such as prompt injection and training data poisoning, utilizing these powerful tools for cyber threats and manipulation. It’s crucial to remain vigilant and aware of potential misuse of these tools and mitigate the risk—from potential data breaches to orchestrated misinformation or even AI agent malware.