Bibliography

All academic papers, research blogs, and technical reports referenced throughout the PyRIT documentation.

Citation Keys

Aakanksha et al., 2024; Adversa AI, 2023; Andriushchenko & Flammarion, 2024; Anthropic, 2024; Aqrawi & Abbasi, 2024; Lin & ATR Community, 2026; Bethany et al., 2024; Bhardwaj & Poria, 2023; Bhardwaj et al., 2024; Brahman et al., 2024; Bryan et al., 2025; Bullwinkel et al., 2025; Bullwinkel et al., 2025; Bullwinkel et al., 2026; Chao et al., 2023; Chao et al., 2024; Cui et al., 2024; Apart Research, 2025; Derczynski et al., 2024; Ding et al., 2023; Rehberger, 2024; Rehberger, 2025; Gehman et al., 2020; Ghosh et al., 2025; Ghosh et al., 2025; Gong et al., 2025; Gupta et al., 2024; Haider et al., 2024; Han et al., 2024; Hines et al., 2024; Inie et al., 2025; Ji et al., 2023; Ji et al., 2024; Jiang et al., 2025; Jones et al., 2025; Kingma & Ba, 2014; Li et al., 2024; Li et al., 2024; Li et al., 2024; Lin et al., 2023; Liu et al., 2024; Liu et al., 2024; Munoz et al., 2024; Luo et al., 2024; Lv et al., 2024; Mazeika et al., 2023; Mazeika et al., 2024; McKee & Noever, 2024; Mehrotra et al., 2023; Microsoft Security Response Center, 2024; Mozilla 0DIN, 2024; Palaskar et al., 2025; Pfohl et al., 2024; Webster, 2025; Priyanshu, 2024; Roccia, 2024; Röttger et al., 2023; Röttger et al., 2025; Russinovich et al., 2024; Russinovich et al., 2025; Scheuerman et al., 2025; Shaikh et al., 2022; Shayegani et al., 2025; Shen et al., 2023; Sheshadri et al., 2024; Souly et al., 2024; Alexandersson, 2023; Tan et al., 2026; Tang et al., 2025; Tedeschi et al., 2024; Taylor, 2024; Vidgen et al., 2023; Wang et al., 2023; Wang et al., 2023; Wang et al., 2025; Wei et al., 2023; Xie et al., 2024; Yu et al., 2023; Yuan et al., 2023; Zeng et al., 2024; Zhang et al., 2024; Ziems et al., 2022; Zou et al., 2023

References
  1. Aakanksha, Ahmadian, A., Ermis, B., Goldfarb-Tarrant, S., Kreutzer, J., Fadaee, M., & Hooker, S. (2024). The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm. arXiv Preprint arXiv:2406.18682. https://arxiv.org/abs/2406.18682
  2. Adversa AI. (2023). Universal LLM Jailbreak: ChatGPT, GPT-4, Bard, Bing, Anthropic, and Beyond. https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
  3. Andriushchenko, M., & Flammarion, N. (2024). Does Refusal Training in LLMs Generalize to the Past Tense? arXiv Preprint arXiv:2407.11969. https://arxiv.org/abs/2407.11969
  4. Anthropic. (2024). Many-Shot Jailbreaking. https://www.anthropic.com/research/many-shot-jailbreaking
  5. Aqrawi, A., & Abbasi, A. (2024). Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA). arXiv Preprint arXiv:2409.03131. https://arxiv.org/abs/2409.03131
  6. Lin, K.-H., & ATR Community. (2026). ATR: Agent Threat Rules: Open Detection Standard for AI Agent Threats. https://doi.org/10.5281/zenodo.19178002
  7. Bethany, E., Bethany, M., Flores, J. A. N., Jha, S. K., & Najafirad, P. (2024). Jailbreaking Large Language Models with Symbolic Mathematics. arXiv Preprint arXiv:2409.11445. https://arxiv.org/abs/2409.11445
  8. Bhardwaj, R., & Poria, S. (2023). Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment. arXiv Preprint arXiv:2308.09662. https://arxiv.org/abs/2308.09662
  9. Bhardwaj, R., Anh, D. D., & Poria, S. (2024). Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. arXiv Preprint arXiv:2402.11746. https://arxiv.org/abs/2402.11746
  10. Brahman, F., Kumar, S., Balachandran, V., Dasigi, P., Pyatkin, V., Ravichander, A., Wiegreffe, S., Dziri, N., Chandu, K., Hessel, J., Tsvetkov, Y., Smith, N. A., Choi, Y., & Hajishirzi, H. (2024). The Art of Saying No: Contextual Noncompliance in Language Models. arXiv Preprint arXiv:2407.12043. https://arxiv.org/abs/2407.12043
  11. Bryan, P., Severi, G., de Gruyter, J., Jones, D., Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Fourney, A., Maxwell, W., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., … Kumar, R. S. S. (2025). Taxonomy of Failure Mode in Agentic AI Systems. https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf
  12. Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. arXiv Preprint arXiv:2501.07238. https://arxiv.org/abs/2501.07238
  13. Bullwinkel, B., Russinovich, M., Salem, A., Zanella-Beguelin, S., Jones, D., Severi, G., Kim, E., Hines, K., Minnich, A., Zunger, Y., & Kumar, R. S. S. (2025). A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks. arXiv Preprint arXiv:2507.02956. https://arxiv.org/abs/2507.02956
  14. Bullwinkel, B., Severi, G., Hines, K., Minnich, A., Kumar, R. S. S., & Zunger, Y. (2026). The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers. arXiv Preprint arXiv:2602.03085. https://arxiv.org/abs/2602.03085
  15. Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419