CLI Reference

This page documents the command-line interface of the benchmark-qed data download package.

Download Command

The data download command downloads datasets from GitHub and optionally uploads them to Azure Blob Storage.

!!! note "Supported cloud backends"
    Only Azure Blob Storage (--storage-type blob) is currently supported. Azure Cosmos DB and other backends are not supported.

Arguments

| Argument | Description |
| --- | --- |
| dataset | The dataset to download. One of: AP_news, podcast, example_answers. |
| output_dir | The directory (local) or path prefix (blob) to save the downloaded dataset. |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --storage-type | str | None | Storage backend: blob for Azure Blob Storage. Omit for local filesystem. |
| --container-name | str | None | The blob container name. Required when --storage-type is set. |
| --account-url | str | None | The storage account URL. Authenticates through DefaultAzureCredential (for example, a managed identity). |
| --connection-string | str | None | The storage connection string. Alternative to --account-url for authentication. |
| --base-dir | str | None | Base prefix in blob storage. Files are stored under {base_dir}/{output_dir}/. If omitted, files are stored under {output_dir}/ only. |
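
The option rules above can be expressed as a small pre-flight check, for example in a wrapper script that runs before the CLI is invoked. This is a minimal sketch, not the package's own validation: check_storage_options is a hypothetical helper, and the rule that one of --account-url or --connection-string must be supplied is an assumption inferred from the descriptions in the table.

```python
def check_storage_options(storage_type=None, container_name=None,
                          account_url=None, connection_string=None):
    """Return a list of problems with an option combination (empty if none).

    Hypothetical helper; it mirrors the documented rules, not the CLI's code.
    """
    problems = []
    if storage_type is None:
        # Local filesystem: none of the blob options are needed.
        return problems
    if storage_type != "blob":
        # Only Azure Blob Storage is documented as supported.
        problems.append(f"unsupported --storage-type: {storage_type!r}")
        return problems
    if not container_name:
        problems.append("--container-name is required when --storage-type is set")
    if not account_url and not connection_string:
        # Assumption: one of the two authentication options must be given.
        problems.append("provide --account-url or --connection-string")
    return problems


if __name__ == "__main__":
    for problem in check_storage_options("blob", container_name="my-container"):
        print(problem)
```

An empty list means the combination is consistent with the table; anything else can be reported to the user before any download starts.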

Local Filesystem

benchmark-qed data download AP_news ./input

The dataset is extracted to the specified output_dir (e.g., ./input).

Azure Blob Storage

benchmark-qed data download AP_news input \
  --storage-type blob \
  --container-name my-container \
  --account-url https://myaccount.blob.core.windows.net \
  --base-dir my-project

Or with a connection string:

benchmark-qed data download AP_news input \
  --storage-type blob \
  --container-name my-container \
  --connection-string "$AZURE_STORAGE_CONNECTION_STRING"
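
When several datasets need to be fetched from a script, the same command line can be assembled programmatically. The sketch below assumes the benchmark-qed executable is on PATH and uses only the arguments and options documented on this page; build_download_command is a hypothetical helper, not part of the package.

```python
import shlex

# Dataset names as listed in the Arguments table.
DATASETS = ("AP_news", "podcast", "example_answers")


def build_download_command(dataset, output_dir, *, container_name=None,
                           account_url=None, connection_string=None,
                           base_dir=None):
    """Return the argv list for `benchmark-qed data download` (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    argv = ["benchmark-qed", "data", "download", dataset, output_dir]
    if container_name is not None:
        # In this sketch, passing a container name selects the blob backend.
        argv += ["--storage-type", "blob", "--container-name", container_name]
        if account_url is not None:
            argv += ["--account-url", account_url]
        if connection_string is not None:
            argv += ["--connection-string", connection_string]
        if base_dir is not None:
            argv += ["--base-dir", base_dir]
    return argv


if __name__ == "__main__":
    # Print one local-filesystem command per dataset.
    for name in DATASETS:
        print(shlex.join(build_download_command(name, f"./input/{name}")))
```

The returned list can be passed to subprocess.run(argv, check=True), which avoids shell quoting problems with connection strings.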

Path Structure

  • With --base-dir: Files are stored as {base_dir}/{output_dir}/{file_path} (example: my-project/input/2023/11/22/file.json)
  • Without --base-dir: Files are stored as {output_dir}/{file_path} (example: input/2023/11/22/file.json)

This ensures blob storage mirrors your local directory structure.
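
The naming rule can be illustrated with a few lines of Python. This is a sketch of the documented layout only; blob_name is a hypothetical function, and the package's actual implementation may normalize paths differently.

```python
import posixpath


def blob_name(output_dir, file_path, base_dir=None):
    """Compose a blob name as {base_dir}/{output_dir}/{file_path}.

    Hypothetical helper; base_dir is skipped when it is omitted.
    """
    # Blob names use forward slashes regardless of the local OS.
    parts = [p.strip("/") for p in (base_dir, output_dir, file_path) if p]
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(blob_name("input", "2023/11/22/file.json", base_dir="my-project"))
    print(blob_name("input", "2023/11/22/file.json"))
```

The two calls reproduce the example paths listed above, with and without --base-dir.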

config

Download the specified dataset from the GitHub repository. For local filesystem, the dataset is extracted to output_dir. For Azure Blob Storage:

  • Files are stored under {base_dir}/{output_dir} if base_dir is provided.
  • Files are stored under {output_dir} if base_dir is omitted.

This ensures blob storage mirrors the local directory structure.

Usage

config [OPTIONS] DATASET:{AP_news|podcast|example_answers} OUTPUT_DIR

Arguments

| Name | Description | Required |
| --- | --- | --- |
| DATASET:{AP_news\|podcast\|example_answers} | The dataset to download. | Yes |
| OUTPUT_DIR | The directory to save the downloaded dataset. | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| --storage-type TEXT | Storage type: 'blob' for Azure Blob Storage. Omit for local filesystem. | No | - |
| --container-name TEXT | The blob container name. | No | - |
| --account-url TEXT | The storage account URL (uses managed identity). | No | - |
| --connection-string TEXT | The storage connection string (alternative to account_url). | No | - |
| --base-dir TEXT | Base prefix in blob storage. Files will be stored as: base_dir/output_dir/. If omitted, files are stored under output_dir/ only. | No | - |
| --install-completion | Install completion for the current shell. | No | - |
| --show-completion | Show completion for the current shell, to copy it or customize the installation. | No | - |
| --help | Show this message and exit. | No | - |

Commands

No commands available