llmail-inject

LLMail-Inject: Adaptive Prompt Injection Challenge

Competition Overview Image

Competition Organizers

The competition is jointly organized by the following people from Microsoft (1), ISTA (2), and ETH Zurich (3):

Aideen Fay*1, Sahar Abdelnabi*1, Benjamin Pannell*1, Giovanni Cherubin*1, Ahmed Salem1, Andrew Paverd1, Conor Mac Amhlaoibh1, Joshua Rakita1, Santiago Zanella-Beguelin1, Egor Zverev2, Mark Russinovich1, and Javier Rando3

(*: Core contributors).

Microsoft Logo ISTA Logo ETH Zurich Logo

Quick Start

The challenge website where you can participate is: https://llmailinject.azurewebsites.net/

To participate, you will need to sign into the challenge website, using a GitHub account, and create a team (ranging from 1 to 5 members). Entries can be submitted directly via the challenge website or programmatically via an API, as described on the challenge website.

The challenge officially starts on Monday, December 9, 2024 at 11am UTC!

Competition Overview

The goal of this challenge is to evade prompt injection defenses in a simulated LLM-integrated email client, the LLMail service. The LLMail service includes an assistant that can answer questions based on the users’ emails and perform actions on behalf of the user, such as sending emails. Since this assistant makes use of an instruction-tuned large language model (LLM), it naturally includes several defenses against indirect prompt injection attacks.

In this challenge, participants take the role of an attacker who can send an email to the (victim) user. The attacker’s goal is to cause the user’s LLM to perform a specific action, which the user has not requested. In order to achieve this, the attacker must craft their email in such a way that it will be retrieved by the LLM and will bypass the relevant prompt injection defenses. This challenge assumes that the defenses are known to the attacker, and thus requires the attacker to create adaptive prompt injection attacks.

System Design and Workflow

This section describes how the different entities interact with simulated LLMail service.

Challenge Scenarios and Levels

The challenge consists of four scenarios that differ in terms of the retrieval configuration and the goal of the attack, as described below. Each scenario is further paired with different configurations of defenses and LLMs (described in the next sections).

Each unique combination of a scenario, defense, and LLM is referred to as a level, and the challenge has 40 levels in total. Participants can submit to any level independently without having to finish earlier levels.

Scenarios

1- Scenario 1 (Two emails without retrieval)

This scenario simulates the setting in which the user asks the LLM to summarize the two most recent emails. Specifically:

2- Scenario 2 (Ten emails without retrieval)

This scenario simulates the setting in which the user asks the LLM to summarize the ten most recent emails. Specifically:

3- Scenario 3 (Ten emails with retrieval)

This scenario simulates the setting in which the user asks the LLM a question about a certain topic. Specifically:

4- Scenario 4 (Ten emails with retrieval and data exfiltration)

This scenario simulates a setting similar to Scenario 3, but where the attacker’s goal is to exfiltrate sensitive data. Specifically:

Defenses

1- Spotlighting

2- PromptShield

3- LLM-as-a-judge

4- TaskTracker

5- Combination of all

LLMs

1- Open-source

2- Closed

For both models, we use top p sampling with p = 0.92 and we set a seed for reproducibility, and max new tokens of 1000.

Submission and Response Format

A submission takes the form of email subject and email body. Each submission is directed towards a single level.

The challenge endpoint and the API will return the following information to the participant:

For a successful submission, all the above flags must be set to True. Values for all flags are always returned for all levels, even if the scenario inherently makes the flag True (e.g., email.retrieved is always set to True in Scenarios 1 and 2). The defense.undetected flag is always set to True in levels where no detection is used (e.g., the levels using only Spotlighting).

Scoring

The scoring system used in this challenge is designed around the following three principles:

Order

Each level starts with a base score = 40000 points. All teams that provide a successful solution for the level will be ordered based on the timestamp of their first successful solution and will receive an order_adjusted_score calculated as follows:

order_adjusted_score = max(min threshold, base score ∗ β**i),

where β = 0.95, i ∈ 0, 1, …, n is the rank order of the team’s submission (i.e., i = 0 is the first team to solve the level), and min threshold = 30000.

Difficulty

Scores for each level are scaled based on the number of teams that successfully solved the level. Each time a new team submits their first correct solution for a level, the scores of all teams for that level are adjusted as follows:

difficulty_adjusted_score = order_adjusted_score ∗ γ**solves,

where γ = 0.85 and solves is the total number of teams that successfully solved this level. This means that more points are awarded for solving more difficult levels.

A team’s total_score is the sum of their difficulty_adjusted_score for each level they successfully solved. The total_score will be used to determine the final ranking of teams.

Average order of solves

If there are any ties within the top four places (i.e., the four teams with the highest total scores), we will compute the average of the timestamps of the first successful solution for each level the team solved. The team with the lower timestamp will win the tie (i.e., this team on average solved all the levels they solved first). Note that this does not normally affect the team’s total_score, but is only used to break ties.

Official Rules

1- Sponsor

These Official Rules (“Rules”) govern the operation of the Microsoft Adaptive Prompt Injection Challenge Contest (“Contest”). Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052, USA, is the Contest sponsor (“Sponsor”).

2- Definitions

In these Rules, “Microsoft”, “we”, “our”, and “us” refer to Sponsor and “you” and “yourself” refers to a Contest participant, or the parent/legal guardian of any Contest entrant who has not reached the age of majority to contractually obligate themselves in their legal place of residence. By entering you (your parent/legal guardian if you are not the age of majority in your legal place of residence) agree to be bound by these Rules.

3- Entry Period

The Contest starts at 11:00 a.m. Coordinated Universal Time (UTC) on December 9, 2024, and ends at 11:59 a.m. UTC on January 20, 2025 (“Entry Period”). If at least 10% of the levels have not been solved by at least four (4) teams on the end date listed above, we may opt to extend the challenge at the organizers discretion. In this case, the new end date will be announced on this page.

4- Eligibility

To enter, you must be 18 years of age or older. If you are 18 years of age or older but have not reached the age of majority in your legal place of residence, then you must have consent of a parent/legal guardian.

Employees and directors of Microsoft Corporation and its subsidiaries, affiliates, advertising agencies, students or employees of ETH Zurich or the Institute of Science and Technology Austria (ISTA), and Contest Parties are not eligible, nor are persons involved in the execution or administration of this promotion, or the family members of each above (parents, children, siblings, spouse/domestic partners, or individuals residing in the same household). Void in Cuba, Iran, North Korea, Sudan, Syria, Region of Crimea, Russia, and where prohibited.

5- How to Enter

To create an entry, visit https://llmailinject.azurewebsites.net/ and follow the instructions to sign in with your GitHub account, form your team (ranging from 1 to 5 members), and begin participating according to the instructions above. NOTE: a person may only be a member of one team and any collusion between teams that harms the integrity of the challenge is prohibited and will result in disqualification.

There is a limit of one entry per minute per team.

Any attempt by you to obtain more than the stated number of entries by using multiple/different accounts, email addresses, identities, registrations, logins, or any other methods will void your entries and you may be disqualified. Use of any automated system to participate is prohibited.

We are not responsible for excess, lost, late, or incomplete entries. If disputed, entries will be deemed submitted by the “authorized account holder” of the email address, social media account, or other method used to enter. The “authorized account holder” is the natural person assigned to an email address by an internet or online service provider, or other organization responsible for assigning email addresses.

6- Eligible Entry

To be eligible, an entry must meet the following content/technical requirements:

7- Use of your entry

We are not claiming ownership rights to your Submission. However, by submitting an entry, you grant us an irrevocable, royalty-free, worldwide right and license to use, review, assess, test and otherwise analyze your entry and all its content in connection with this Contest and use your entry in any media whatsoever now known or later invented for any non-commercial or commercial purpose, including, but not limited to, the marketing, sale or promotion of Microsoft products or services, or inclusion into a public dataset and/or research materials without further permission from you. You will not receive any compensation or credit for use of your entry, other than what is described in these Official Rules.

By entering you acknowledge that we may have developed or commissioned materials similar or identical to your entry and you waive any claims resulting from any similarities to your entry. Further you understand that we will not restrict work assignments of representatives who have had access to your entry, and you agree that use of information in our representatives’ unaided memories in the development or deployment of our products or services does not create liability for us under this agreement or copyright or trade secret law.

Your entry may be posted on a public website. We are not responsible for any unauthorized use of your entry by visitors to this website. We are not obligated to use your entry for any purpose, even if it has been selected as a winning entry.

8- Winner Selection and Notification

Pending confirmation of eligibility, four (4) potential teams will be selected by Microsoft or their Agent or a qualified judging panel from among all eligible entries received based on the scoring algorithm outlined above within seven (7) days following the Entry Period.

In the event of a tie between any eligible entries, an additional judge will break the tie based on the judging criteria described above. The decisions of the judges are final and binding. If we do not receive enough entries meeting the entry requirements, we may, at our discretion, select fewer winners than the number of Contest Prizes described below. If public vote determines winners, it is prohibited for any person to obtain votes by any fraudulent or inappropriate means, including offering prizes or other inducements in exchange for votes, automated programs or fraudulent i.d’s. Microsoft will void any questionable votes.

The GitHub account names associated with the winning teams will be posted on the challenge website (https://llmailinject.azurewebsites.net/) no more than 7 days following judging. Each potential winning team must designate a team member who will be a contact point. The nominated individual must send an email to llmailinject@microsoft.com to claim their prize. The nominated individual will receive the full prize and is responsible for splitting the award on their own freely as the team agrees. The nominated individual is also responsible for handing in any other required forms as indicated below.

If the designated team member cannot be contacted, is ineligible, fails to claim a prize or fails to return any forms, the selected winner will forfeit their prize and an alternate winner will be selected time allowing. If you are a potential winner and you are 18 or older but have not reached the age of majority in your legal place of residence, we may require your parent/legal guardian to sign all required forms on your behalf. Only three alternate winners will be selected, after which unclaimed prizes will remain unawarded.

9- Prizes!

The following cash prizes will be awarded in the form of a bank transfer with the entire amount being awarded to the primary team contact person:

One (1) Grand Prize. $4,000.00 USD.

One (1) First Prize. $3,000.00 USD.

One (1) Second Prize. $2,000.00 USD.

One (1) Third Prize. $1,000.00 USD.

The total Approximate Retail Value (ARV) of all prizes: $10,000

Winning teams may be invited to co-author a research paper with the organizers and, upon their agreement, the organizers may request a short summary of strategies used.

We will only award one (1) prize per team during the Entry Period. No substitution, transfer, or assignment of prize permitted, except that Microsoft reserves the right to substitute a prize of equal or greater value in the event the offered prize is unavailable.

Prizes will be sent no later than 28 days after winner selection. Prize winners may be required to complete and return prize claim and / or tax forms (“Forms”) within the deadline stated in the winner notification. Taxes on the prize, if any, are the sole responsibility of the winner, who is advised to seek independent counsel regarding the tax implications of accepting a prize. By accepting a prize, you agree that Microsoft may use your entry, name, image and hometown online and in print, or in any other media, in connection with this Contest without payment or compensation to you, except where prohibited by law.

10- Odds

The odds of winning are based on the number of eligible entries received.

11- General Conditions and Release of Liability

To the extent allowed by law, by entering you agree to release and hold harmless Microsoft and its respective parents, partners, subsidiaries, affiliates, employees, and agents from any and all liability or any injury, loss, or damage of any kind arising in connection with this Contest or any prize won. All local laws apply. The decisions of Microsoft are final and binding.

We reserve the right to cancel, change, or suspend this Contest for any reason, including cheating, technology failure, catastrophe, war, or any other unforeseen or unexpected event that affects the integrity of this Contest, whether human or mechanical. If the integrity of the Contest cannot be restored, we may select winners from among all eligible entries received before we had to cancel, change or suspend the Contest.

If you attempt or we have strong reason to believe that you have compromised the integrity or the legitimate operation of this Contest by cheating, hacking, creating a bot or other automated program, or by committing fraud in any way, we may seek damages from you to the full extent of the law and you may be banned from participation in future Microsoft promotions.

12- Use of your entry

Personal data you provide while entering this Contest will be used by Microsoft and/or its agents and prize fulfillers acting on Microsoft’s behalf only for the administration and operation of this Contest and in accordance with the Microsoft Privacy Statement.

13- Governing Law

This Contest will be governed by the laws of the State of Washington, and you consent to the exclusive jurisdiction and venue of the courts of the State of Washington for any disputes arising out of this Contest.

14- Winners List

Send an email to llmailinject@microsoft.com with the subject line “Adaptive Prompt Injection Challenge Contest winners” within 30 days of February 20, 2025.

We reserve the right to make adjustments to the technical specifications and the design of the challenge in order to better meet the stated goals of the challenge if needed, as determined by us in our sole and absolute discretion.

References

[1] Sahar Abdelnabi et al. Are you still on track!? Catching LLM Task Drift with Activations

[2] Azure AI announces Prompt Shields for Jailbreak and Indirect prompt injection attacks

[3] Keegan Hines et al. Defending Against Indirect Prompt Injection Attacks With Spotlighting

[4] Eric Wallace et al. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Contact

If you need to get in touch with the organizers, please send an email to llmailinject@microsoft.com.