This page lists all academic papers, research blogs, and technical reports referenced throughout the PyRIT documentation.
- Aakanksha, Ahmadian, A., Ermis, B., Goldfarb-Tarrant, S., Kreutzer, J., Fadaee, M., & Hooker, S. (2024). The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm. arXiv Preprint arXiv:2406.18682. https://arxiv.org/abs/2406.18682
- Adversa AI. (2023). Universal LLM Jailbreak: ChatGPT, GPT-4, Bard, Bing, Anthropic, and Beyond. https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
- Andriushchenko, M., & Flammarion, N. (2024). Does Refusal Training in LLMs Generalize to the Past Tense? arXiv Preprint arXiv:2407.11969. https://arxiv.org/abs/2407.11969
- Anthropic. (2024). Many-Shot Jailbreaking. https://www.anthropic.com/research/many-shot-jailbreaking
- Aqrawi, A., & Abbasi, A. (2024). Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA). arXiv Preprint arXiv:2409.03131. https://arxiv.org/abs/2409.03131
- Lin, K.-H., & ATR Community. (2026). ATR: Agent Threat Rules: Open Detection Standard for AI Agent Threats. https://doi.org/10.5281/zenodo.19178002
- Bethany, E., Bethany, M., Flores, J. A. N., Jha, S. K., & Najafirad, P. (2024). Jailbreaking Large Language Models with Symbolic Mathematics. arXiv Preprint arXiv:2409.11445. https://arxiv.org/abs/2409.11445
- Bhardwaj, R., & Poria, S. (2023). Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment. arXiv Preprint arXiv:2308.09662. https://arxiv.org/abs/2308.09662
- Bhardwaj, R., Anh, D. D., & Poria, S. (2024). Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. arXiv Preprint arXiv:2402.11746. https://arxiv.org/abs/2402.11746
- Brahman, F., Kumar, S., Balachandran, V., Dasigi, P., Pyatkin, V., Ravichander, A., Wiegreffe, S., Dziri, N., Chandu, K., Hessel, J., Tsvetkov, Y., Smith, N. A., Choi, Y., & Hajishirzi, H. (2024). The Art of Saying No: Contextual Noncompliance in Language Models. arXiv Preprint arXiv:2407.12043. https://arxiv.org/abs/2407.12043
- Bryan, P., Severi, G., de Gruyter, J., Jones, D., Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Fourney, A., Maxwell, W., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., … Kumar, R. S. S. (2025). Taxonomy of Failure Mode in Agentic AI Systems. https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf
- Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. arXiv Preprint arXiv:2501.07238. https://arxiv.org/abs/2501.07238
- Bullwinkel, B., Russinovich, M., Salem, A., Zanella-Beguelin, S., Jones, D., Severi, G., Kim, E., Hines, K., Minnich, A., Zunger, Y., & Kumar, R. S. S. (2025). A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks. arXiv Preprint arXiv:2507.02956. https://arxiv.org/abs/2507.02956
- Bullwinkel, B., Severi, G., Hines, K., Minnich, A., Kumar, R. S. S., & Zunger, Y. (2026). The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers. arXiv Preprint arXiv:2602.03085. https://arxiv.org/abs/2602.03085
- Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419