Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Bibliography

All academic papers, research blogs, and technical reports referenced throughout the PyRIT documentation.

Citation Keys
References
  1. Aakanksha, Ahmadian, A., Ermis, B., Goldfarb-Tarrant, S., Kreutzer, J., Fadaee, M., & Hooker, S. (2024). The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm. arXiv Preprint arXiv:2406.18682. https://arxiv.org/abs/2406.18682
  2. Adversa AI. (2023). Universal LLM Jailbreak: ChatGPT, GPT-4, Bard, Bing, Anthropic, and Beyond. https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
  3. Andriushchenko, M., & Flammarion, N. (2024). Does Refusal Training in LLMs Generalize to the Past Tense? arXiv Preprint arXiv:2407.11969. https://arxiv.org/abs/2407.11969
  4. Anthropic. (2024). Many-Shot Jailbreaking. https://www.anthropic.com/research/many-shot-jailbreaking
  5. Aqrawi, A., & Abbasi, A. (2024). Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA). arXiv Preprint arXiv:2409.03131. https://arxiv.org/abs/2409.03131
  6. Lin, K.-H., & ATR Community. (2026). ATR: Agent Threat Rules — Open Detection Standard for AI Agent Threats. 10.5281/zenodo.19178002
  7. Bethany, E., Bethany, M., Flores, J. A. N., Jha, S. K., & Najafirad, P. (2024). Jailbreaking Large Language Models with Symbolic Mathematics. arXiv Preprint arXiv:2409.11445. https://arxiv.org/abs/2409.11445
  8. Bhardwaj, R., & Poria, S. (2023). Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment. arXiv Preprint arXiv:2308.09662. https://arxiv.org/abs/2308.09662
  9. Bhardwaj, R., Anh, D. D., & Poria, S. (2024). Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. arXiv Preprint arXiv:2402.11746. https://arxiv.org/abs/2402.11746
  10. Brahman, F., Kumar, S., Balachandran, V., Dasigi, P., Pyatkin, V., Ravichander, A., Wiegreffe, S., Dziri, N., Chandu, K., Hessel, J., Tsvetkov, Y., Smith, N. A., Choi, Y., & Hajishirzi, H. (2024). The Art of Saying No: Contextual Noncompliance in Language Models. arXiv Preprint arXiv:2407.12043. https://arxiv.org/abs/2407.12043
  11. Bryan, P., Severi, G., de Gruyter, J., Jones, D., Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Fourney, A., Maxwell, W., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., … Kumar, R. S. S. (2025). Taxonomy of Failure Mode in Agentic AI Systems. https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf
  12. Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. https://arxiv.org/abs/2501.07238
  13. Bullwinkel, B., Russinovich, M., Salem, A., Zanella-Beguelin, S., Jones, D., Severi, G., Kim, E., Hines, K., Minnich, A., Zunger, Y., & Kumar, R. S. S. (2025). A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks. arXiv Preprint arXiv:2507.02956. https://arxiv.org/abs/2507.02956
  14. Bullwinkel, B., Severi, G., Hines, K., Minnich, A., Kumar, R. S. S., & Zunger, Y. (2026). The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers. arXiv Preprint arXiv:2602.03085. https://arxiv.org/abs/2602.03085
  15. Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419