Huggingface Integration
Introduction
This document outlines the integrations between Olive and Huggingface. Discover how to use Huggingface resources within Olive.
Input Model
Use the HfModel type to optimize or evaluate a Huggingface model. The default task is text-generation-with-past.
Huggingface Hub model
Olive can automatically retrieve models from the Huggingface Hub:
"input_model": {
    "type": "HfModel",
    "model_path": "meta-llama/Llama-2-7b-hf"
}
Local model
If the Huggingface model is already available locally, set model_path to its directory:
"input_model": {
    "type": "HfModel",
    "model_path": "path/to/local/model"
}
Note: You must also have the tokenizer and other necessary files in the same local directory.
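As a rough pre-flight check for the note above, the following sketch lists which files are missing from a local model directory. The helper name missing_model_files is hypothetical, and the file names config.json and tokenizer_config.json are an assumption based on typical Huggingface checkpoints; Olive does not document an exact list here, so adjust them to your model.

```python
from pathlib import Path

# Assumption: file names commonly found in a Huggingface checkpoint directory.
# Adjust this tuple to match the files your model actually needs.
EXPECTED_FILES = ("config.json", "tokenizer_config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "path/to/local/model" is the placeholder path used in the config above.
    print(missing_model_files("path/to/local/model"))
```

An empty list means every expected file was found; it does not guarantee that the model loads.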
Azure ML model
Olive supports loading a model from your Azure Machine Learning workspace. Find detailed configurations here.
Example: Llama-2-7b from the Azure ML model catalog:
"input_model": {
    "type": "HfModel",
    "model_path": {
        "type": "azureml_registry_model",
        "name": "Llama-2-7b",
        "registry_name": "azureml-meta",
        "version": "13"
    }
}
Model config loading
Olive can automatically retrieve model configurations from the Huggingface Hub:
Olive retrieves the model configuration from transformers for later use.
If optimum is installed and the model_type and task are supported, Olive also fetches the IO config and dummy inputs required by the OnnxConversion pass from OnnxConfig. In that case, you do not need to specify the IO config manually when using the OnnxConversion pass.
You can also provide your own IO config, which overrides the automatically fetched IO config and dummy inputs:
"input_model": {
    "type": "HfModel",
    "model_path": "meta-llama/Llama-2-7b-hf",
    "io_config": {
        "input_names": [ "input_ids", "attention_mask", "position_ids" ],
        "output_names": [ "logits" ],
        "input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
        "input_types": [ "int64", "int64", "int64" ],
        "dynamic_axes": {
            "input_ids": { "0": "batch_size", "1": "sequence_length" },
            "attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
            "position_ids": { "0": "batch_size", "1": "sequence_length" }
        }
    }
}
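A hand-written IO config is easy to get out of sync, because input_shapes and input_types are matched to input_names by position. The sketch below is a hypothetical helper (check_io_config is not part of Olive) that checks this kind of consistency before a workflow is run. It assumes the field names shown in the example above and does not reproduce Olive's own validation.

```python
def check_io_config(io_config):
    """Return a list of consistency problems found in an io_config dict."""
    problems = []
    names = io_config.get("input_names", [])
    shapes = io_config.get("input_shapes")
    types = io_config.get("input_types")

    # Shapes and types are positional, so each list must line up with input_names.
    if shapes is not None and len(shapes) != len(names):
        problems.append("input_shapes length does not match input_names")
    if types is not None and len(types) != len(names):
        problems.append("input_types length does not match input_names")

    known = set(names) | set(io_config.get("output_names", []))
    shape_of = dict(zip(names, shapes or []))
    for tensor, axes in io_config.get("dynamic_axes", {}).items():
        if tensor not in known:
            problems.append(f"dynamic_axes refers to unknown tensor {tensor!r}")
            continue
        for axis in axes:
            # JSON object keys are strings, so "0" and "1" are converted to ints.
            if tensor in shape_of and not 0 <= int(axis) < len(shape_of[tensor]):
                problems.append(f"axis {axis} is out of range for {tensor!r}")
    return problems
```

Calling check_io_config on the io_config dictionary loaded from your configuration file returns an empty list when no inconsistency is detected.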
Huggingface datasets
Olive supports automatically downloading and applying Huggingface datasets to Passes and Evaluators.
Datasets can be added to the data_configs section of the configuration file with "type": "HuggingfaceContainer". More details about data_configs can be found here.
You can then reference the dataset by its name in the Pass config.
Example: datasets in data_configs:
"data_configs": [{
    "name": "oasst1_train",
    "type": "HuggingfaceContainer",
    "load_dataset_config": {
        "data_name": "timdettmers/openassistant-guanaco",
        "split": "train"
    },
    "pre_process_data_config": {
        "text_cols": ["text"],
        "strategy": "line-by-line",
        "max_seq_len": 512,
        "pad_to_max_len": false
    }
}]
Pass config:
"session_params_tuning": {
    "type": "OrtSessionParamsTuning",
    "data_config": "oasst1_train"
}
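Because a pass refers to a dataset only by name, a typo in data_config is easy to miss. The sketch below is a hypothetical helper (undefined_data_configs is not an Olive API) that reports pass references with no matching entry in data_configs. It assumes the passes are stored under a top-level "passes" key of the loaded configuration, which the snippets above do not show explicitly.

```python
def undefined_data_configs(config):
    """Return (pass_name, data_config) pairs whose data_config is not defined."""
    # Names declared in the data_configs section.
    defined = {dc["name"] for dc in config.get("data_configs", [])}
    missing = []
    # Assumption: passes live under a top-level "passes" mapping.
    for pass_name, pass_cfg in config.get("passes", {}).items():
        ref = pass_cfg.get("data_config")
        # Only string references are checked; other forms are ignored here.
        if isinstance(ref, str) and ref not in defined:
            missing.append((pass_name, ref))
    return missing
```

An empty result means every pass that names a data_config points at a defined dataset.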
Huggingface metrics
Huggingface metrics in Olive are backed by the Huggingface evaluate library. Refer to the Huggingface metrics page for a complete list of available metrics.
Example metric config:
{
    "name": "accuracy",
    "type": "accuracy",
    "backend": "huggingface_metrics",
    "data_config": "oasst1_train",
    "sub_types": [
        {"name": "accuracy", "priority": -1},
        {"name": "f1"}
    ]
}
Please refer to metrics for more details.
Huggingface login
For certain gated models or datasets, you need to log in to your Huggingface account to access them. If the Huggingface resources you are using require a token, add hf_token: true to the Olive system configuration. Olive then manages the Huggingface login process automatically, allowing you to access these gated resources.
Local system, docker system and Python environment system
For local, Docker, and Python environment systems, run huggingface-cli login in your terminal to log in to your Huggingface account. Find more details about login here.
AzureML system
Follow these steps to enable Huggingface login for AzureML system:
1. Get your Huggingface token string from Settings -> Access Tokens.
2. Create or use an existing Azure Key Vault. Assume the key vault is named my_keyvault_name. Add a new secret named hf-token and set its value to the token from the first step. Note that Olive reserves the hf-token secret name specifically for Huggingface login; do not use this name in this key vault for any other purpose.
3. Make sure your configuration file has an azureml_client section, and add a new attribute keyvault_name to it. For example:
   "azureml_client": {
       "subscription_id": "<subscription_id>",
       "resource_group": "<resource_group>",
       "workspace_name": "<workspace_name>",
       "keyvault_name": "my_keyvault_name"
   }
4. Configure the Managed Service Identity (MSI) for the host compute or target compute. Detailed instructions can be found here. Then grant the host compute or target compute access to the key vault resource following this guide.
5. Add hf_token: true to the AzureML system configuration:
   "aml_system": {
       "type": "AzureML",
       "config": {
           "hf_token": true
       }
   }
With the above steps, Olive can automatically retrieve your Huggingface token from the hf-token secret in the my_keyvault_name key vault and log in to your Huggingface account in the AML job.
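The configuration side of these steps can be checked before submitting a job. The following sketch is a hypothetical helper (azureml_login_problems is not part of Olive) that flags AzureML systems with hf_token enabled when azureml_client has no keyvault_name. It assumes that systems such as aml_system are stored under a top-level "systems" key, and it cannot verify the key vault secret or the MSI permissions themselves.

```python
def azureml_login_problems(config):
    """Return messages for AzureML systems that enable hf_token without a key vault."""
    problems = []
    keyvault = config.get("azureml_client", {}).get("keyvault_name")
    # Assumption: systems are declared under a top-level "systems" mapping.
    for name, system in config.get("systems", {}).items():
        if system.get("type") != "AzureML":
            continue
        if system.get("config", {}).get("hf_token") and not keyvault:
            problems.append(
                f"{name}: hf_token is enabled but azureml_client.keyvault_name is not set"
            )
    return problems
```

An empty list only means the two configuration entries agree; the secret named hf-token must still exist in the key vault and the compute must be able to read it.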
E2E example
For the complete example, please refer to Bert Optimization with PTQ on CPU.