Bibliography - PyRIT Documentation

All academic papers, research blogs, and technical reports referenced throughout the PyRIT documentation.

References¶

Aakanksha, Ahmadian, A., Ermis, B., Goldfarb-Tarrant, S., Kreutzer, J., Fadaee, M., & Hooker, S. (2024). The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm. arXiv Preprint arXiv:2406.18682. https://arxiv.org/abs/2406.18682
Adversa AI. (2023). Universal LLM Jailbreak: ChatGPT, GPT-4, Bard, Bing, Anthropic, and Beyond. https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
Andriushchenko, M., & Flammarion, N. (2024). Does Refusal Training in LLMs Generalize to the Past Tense? arXiv Preprint arXiv:2407.11969. https://arxiv.org/abs/2407.11969
Anthropic. (2024). Many-Shot Jailbreaking. https://www.anthropic.com/research/many-shot-jailbreaking
Aqrawi, A., & Abbasi, A. (2024). Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA). arXiv Preprint arXiv:2409.03131. https://arxiv.org/abs/2409.03131
Lin, K.-H., & ATR Community. (2026). ATR: Agent Threat Rules — Open Detection Standard for AI Agent Threats. 10.5281/zenodo.19178002
Bethany, E., Bethany, M., Flores, J. A. N., Jha, S. K., & Najafirad, P. (2024). Jailbreaking Large Language Models with Symbolic Mathematics. arXiv Preprint arXiv:2409.11445. https://arxiv.org/abs/2409.11445
Bhardwaj, R., & Poria, S. (2023). Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment. arXiv Preprint arXiv:2308.09662. https://arxiv.org/abs/2308.09662
Bhardwaj, R., Anh, D. D., & Poria, S. (2024). Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. arXiv Preprint arXiv:2402.11746. https://arxiv.org/abs/2402.11746
Brahman, F., Kumar, S., Balachandran, V., Dasigi, P., Pyatkin, V., Ravichander, A., Wiegreffe, S., Dziri, N., Chandu, K., Hessel, J., Tsvetkov, Y., Smith, N. A., Choi, Y., & Hajishirzi, H. (2024). The Art of Saying No: Contextual Noncompliance in Language Models. arXiv Preprint arXiv:2407.12043. https://arxiv.org/abs/2407.12043
Bryan, P., Severi, G., de Gruyter, J., Jones, D., Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Fourney, A., Maxwell, W., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., … Kumar, R. S. S. (2025). Taxonomy of Failure Mode in Agentic AI Systems. https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf
Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. https://arxiv.org/abs/2501.07238
Bullwinkel, B., Russinovich, M., Salem, A., Zanella-Beguelin, S., Jones, D., Severi, G., Kim, E., Hines, K., Minnich, A., Zunger, Y., & Kumar, R. S. S. (2025). A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks. arXiv Preprint arXiv:2507.02956. https://arxiv.org/abs/2507.02956
Bullwinkel, B., Severi, G., Hines, K., Minnich, A., Kumar, R. S. S., & Zunger, Y. (2026). The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers. arXiv Preprint arXiv:2602.03085. https://arxiv.org/abs/2602.03085
Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419