# Huggingface Integration

## Introduction

This document outlines the integrations between Olive and Huggingface. Discover how to use Huggingface resources within Olive.

## Input Model

Use the `HfModel` type if you want to optimize or evaluate a Huggingface model. The default `task` is `text-generation-with-past`.

### Huggingface Hub model

Olive can automatically retrieve models from the Huggingface hub:

```json
"input_model":{
    "type": "HfModel",
    "model_path": "meta-llama/Llama-2-7b-hf"
}
```

### Local model

If you have the Huggingface model prepared locally:

```json
"input_model":{
    "type": "HfModel",
    "model_path": "path/to/local/model"
}
```

**Note:** You must also have the tokenizer and other necessary files in the same local directory.

### Azure ML model

Olive supports loading a model from your Azure Machine Learning workspace. Find detailed configurations [here](./azureml_integration.md).

Example: [Llama-2-7b](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/registry/azureml-meta) from the Azure ML model catalog:

```json
"input_model":{
    "type": "HfModel",
    "model_path": {
        "type": "azureml_registry_model",
        "name": "Llama-2-7b",
        "registry_name": "azureml-meta",
        "version": "13"
    }
}
```

### Model config loading

Olive can automatically retrieve model configurations from the Huggingface hub:

- Olive retrieves the model [configuration](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig) from transformers for future usage.
- Olive simplifies the process by automatically fetching configurations such as the IO config and dummy inputs required for the `OnnxConversion` pass from [OnnxConfig](https://huggingface.co/docs/optimum/main/en/exporters/onnx/package_reference/configuration#optimum.exporters.onnx.OnnxConfig), if `optimum` is installed and the `model_type` and `task` are supported. This means there is no need for you to manually specify the IO config when using the `OnnxConversion` pass.
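Since Olive configuration files are plain JSON, the `input_model` entries above can also be assembled programmatically. A minimal standard-library sketch (the output filename `olive_config.json` is illustrative, not required by Olive):

```python
import json

# Assemble the Huggingface Hub "input_model" entry shown above.
# "task" is omitted here; it defaults to "text-generation-with-past".
input_model = {
    "type": "HfModel",
    "model_path": "meta-llama/Llama-2-7b-hf",
}

config = {"input_model": input_model}

# Write the (partial) Olive config; the filename is an illustrative choice.
with open("olive_config.json", "w") as f:
    json.dump(config, f, indent=4)
```

The same pattern works for the local-path and Azure ML variants: only the value of `model_path` changes.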
You can also provide your own IO config, which will override the automatically fetched IO config and dummy inputs:

```json
"input_model": {
    "type": "HfModel",
    "model_path": "meta-llama/Llama-2-7b-hf",
    "io_config": {
        "input_names": [ "input_ids", "attention_mask", "position_ids" ],
        "output_names": [ "logits" ],
        "input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
        "input_types": [ "int64", "int64", "int64" ],
        "dynamic_axes": {
            "input_ids": { "0": "batch_size", "1": "sequence_length" },
            "attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
            "position_ids": { "0": "batch_size", "1": "sequence_length" }
        }
    }
}
```

## Huggingface datasets

Olive supports automatically downloading and applying [Huggingface datasets](https://huggingface.co/datasets) to Passes and Evaluators. Datasets can be added to the `data_configs` section of the configuration file with `"type": "HuggingfaceContainer"`. More details about `data_configs` can be found [here](../tutorials/configure_data.rst). You can reference a dataset by its name in the Pass config.

Example: datasets in `data_configs`:

```json
"data_configs": [{
    "name": "oasst1_train",
    "type": "HuggingfaceContainer",
    "load_dataset_config": {
        "data_name": "timdettmers/openassistant-guanaco",
        "split": "train"
    },
    "pre_process_data_config": {
        "text_cols": ["text"],
        "strategy": "line-by-line",
        "max_seq_len": 512,
        "pad_to_max_len": false
    }
}]
```

Pass config:

```json
"session_params_tuning": {
    "type": "OrtSessionParamsTuning",
    "data_config": "oasst1_train"
}
```

## Huggingface metrics

Huggingface metrics in Olive are supported by [Huggingface evaluate](https://huggingface.co/docs/evaluate/index). Refer to the [Huggingface metrics page](https://huggingface.co/metrics) for a complete list of available metrics.
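The `huggingface_metrics` backend delegates metric computation to the `evaluate` library. As a rough plain-Python illustration of what the `accuracy` and `f1` sub-metrics used in the example metric config measure (this is not Olive or `evaluate` code):

```python
def accuracy(predictions, references):
    # Fraction of predictions that exactly match the reference labels.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def f1(predictions, references, positive=1):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(p == positive and r == positive for p, r in zip(predictions, references))
    fp = sum(p == positive and r != positive for p, r in zip(predictions, references))
    fn = sum(p != positive and r == positive for p, r in zip(predictions, references))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

preds, refs = [1, 0, 1, 1], [1, 0, 0, 1]
print(accuracy(preds, refs))  # 0.75
print(f1(preds, refs))        # 0.8
```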
Example metric config:

```json
{
    "name": "accuracy",
    "type": "accuracy",
    "backend": "huggingface_metrics",
    "data_config": "oasst1_train",
    "sub_types": [
        {"name": "accuracy", "priority": -1},
        {"name": "f1"}
    ]
}
```

Please refer to [metrics](../overview/options.md#metrics) for more details.

## Huggingface login

Some gated models or datasets require you to log in to your Huggingface account before you can access them. If the Huggingface resources you are using require a token, add `hf_token: true` to the Olive system configuration. Olive will then automatically manage the Huggingface login process, allowing you to access these gated resources.

### Local system, docker system and Python environment system

For a local system, docker system or Python environment system, run the command `huggingface-cli login` in your terminal to log in to your Huggingface account. Find more details about login [here](https://huggingface.co/docs/huggingface_hub/quick-start#login).

### AzureML system

Follow these steps to enable Huggingface login for an AzureML system:

1. Get your Huggingface token string from Settings -> [Access Tokens](https://huggingface.co/settings/tokens).
1. Create or use an existing [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/overview). Assume the key vault is named `my_keyvault_name`. Add a new secret named `hf-token` and set its value to the token from the first step. Note that Olive reserves the `hf-token` secret name specifically for Huggingface login; do not use this name in the key vault for any other purpose.
1. Make sure you have an `azureml_client` section in your configuration file, and add a new attribute `keyvault_name` to it. For example:

   ```json
   "azureml_client": {
       "subscription_id": "",
       "resource_group": "",
       "workspace_name": "",
       "keyvault_name" : "my_keyvault_name"
   }
   ```

1. Configure a Managed Service Identity (MSI) for the host compute or target compute.
   Detailed instructions can be found [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?view=azureml-api-2&tabs=sdk#configure-a-managed-identity). Then grant the host compute or target compute access to the key vault resource following this [guide](https://learn.microsoft.com/en-us/azure/key-vault/general/assign-access-policy?tabs=azure-portal).

1. Add `hf_token: true` to the AzureML system configuration:

   ```json
   "aml_system": {
       "type": "AzureML",
       "config": {
           "hf_token": true
       }
   }
   ```

With the above steps, Olive can automatically retrieve your Huggingface token from the `hf-token` secret in the `my_keyvault_name` key vault and log in to your Huggingface account in the AML job.

## E2E example

For a complete example, please refer to [Bert Optimization with PTQ on CPU](https://github.com/microsoft/Olive/tree/main/examples/bert#bert-optimization-with-ptq-on-cpu).
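For reference, the token retrieval in the AzureML login flow above amounts to a standard Key Vault secret read. A rough sketch using the Azure SDK (requires the `azure-identity` and `azure-keyvault-secrets` packages); the helper names are illustrative and this is not Olive's actual implementation:

```python
def vault_url(keyvault_name: str) -> str:
    # Azure Key Vault endpoints follow a fixed URL scheme.
    return f"https://{keyvault_name}.vault.azure.net"

def get_hf_token(keyvault_name: str) -> str:
    """Read the reserved "hf-token" secret (illustrative helper)."""
    # Local imports so the module loads even without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url=vault_url(keyvault_name),
        credential=DefaultAzureCredential(),  # uses the compute's managed identity
    )
    # Olive reserves the secret name "hf-token" for Huggingface login.
    return client.get_secret("hf-token").value

print(vault_url("my_keyvault_name"))  # https://my_keyvault_name.vault.azure.net
```

Calling `get_hf_token` only succeeds on a compute whose managed identity has been granted access to the key vault, as described in the steps above.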