# LLMLingua Prompt Compression

## Introduction
The LLMLingua Prompt Compression tool speeds up large language model inference and enhances the model's perception of key information by compressing the prompt, with minimal performance loss.
## Requirements
PyPI package: `llmlingua-promptflow`.

- For Azure users: follow the wiki for AzureML or the wiki for AI Studio to prepare the compute session.
- For local users, install the package:

  ```bash
  pip install llmlingua-promptflow
  ```

  You may also want to install the Prompt flow for VS Code extension.
## Prerequisite
Create a MaaS (model as a service) deployment for a large language model in the Azure model catalog. Taking the Llama model as an example, you can learn how to deploy and consume Meta Llama models with model as a service from the guidance for Azure AI Studio.
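If you work from the SDK, one way to make the deployment reachable by the tool is to store its endpoint and key in a `CustomConnection` via the Prompt flow client. The snippet below is a minimal sketch: the connection name and the `api_base`/`api_key` field names are illustrative assumptions, so match them to whatever keys the tool actually expects.

```python
from promptflow.client import PFClient
from promptflow.entities import CustomConnection

# Field names below are assumptions for illustration; fill in the endpoint
# URL and key from your MaaS deployment in Azure AI Studio.
connection = CustomConnection(
    name="myconn",  # referenced later by the tool's `myconn` input
    configs={"api_base": "https://<your-maas-deployment>.inference.ai.azure.com"},
    secrets={"api_key": "<your-deployment-key>"},  # stored as a secret
)

pf = PFClient()
pf.connections.create_or_update(connection)
```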
## Inputs
The tool accepts the following inputs:
| Name | Type | Description | Required |
|------|------|-------------|----------|
| prompt | string | The prompt that needs to be compressed. | Yes |
| myconn | CustomConnection | The created connection to a MaaS resource for calculating log probability. | Yes |
| rate | float | The maximum compression rate target to be achieved. Default value is 0.5. | No |
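To see these inputs in action, you can test a flow that contains the compression node from the SDK. This is a sketch under stated assumptions: the flow folder path and input names are hypothetical, and the `myconn` connection is typically bound on the tool node inside the flow definition rather than passed as a flow input.

```python
from promptflow.client import PFClient

pf = PFClient()

# Hypothetical local flow folder containing an LLMLingua compression node;
# `prompt` and `rate` correspond to the tool inputs in the table above.
result = pf.test(
    flow="./my-compression-flow",
    inputs={
        "prompt": "<a long prompt to compress>",
        "rate": 0.5,  # target maximum compression rate
    },
)
print(result)  # the flow output contains the compressed prompt string
```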
## Outputs
| Return Type | Description |
|-------------|-------------|
| string | The resulting compressed prompt. |
## Sample Flows
Find example flows using the `llmlingua-promptflow` package here.
## Contact
Please reach out to the LLMLingua Team (llmlingua@microsoft.com) with any issues.