pe.data.text.openreview module

namedtuple pe.data.text.openreview.DownloadInfo(url, type)

Bases: namedtuple()

DownloadInfo(url, type)

Fields:

url – Alias for field number 0
type – Alias for field number 1
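
Because DownloadInfo is a named tuple, its two fields can be read by name, by position, or by unpacking. The sketch below rebuilds an equivalent named tuple locally so it runs without the library installed; the example URL is a placeholder, not a real dataset location.

```python
from collections import namedtuple

# Local stand-in mirroring the documented field order (url, type).
# The real class lives in pe.data.text.openreview.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

# Placeholder URL, used only for illustration.
info = DownloadInfo(url="https://example.com/reviews.csv", type="direct")

# Access by field name and by position is interchangeable.
same_url = info.url == info[0]
same_type = info.type == info[1]

# Unpacking follows the declared field order.
url, download_type = info
```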

class pe.data.text.openreview.OpenReview(root_dir='data', split='train', **kwargs)[source]

Bases: TextCSV

The OpenReview dataset used in the ICML 2024 Spotlight paper “Differentially Private Synthetic Data via Foundation Model APIs 2: Text” (https://arxiv.org/abs/2403.01749).

DOWNLOAD_INFO_DICT = {
    'test': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_test.csv', 'direct'),
    'train': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_train.csv', 'direct'),
    'val': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_val.csv', 'direct'),
}

The download information for the OpenReview dataset, keyed by split name.
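
The three entries differ only in the split name embedded in the file name, and all point at the same pinned commit of the AI-secure/aug-pe repository. As an illustration of that pattern (not the library's own code), the mapping could be rebuilt as follows; `BASE_URL`, `split_download_info`, and `download_info_dict` are hypothetical names introduced for this sketch.

```python
from collections import namedtuple

# Local stand-in for pe.data.text.openreview.DownloadInfo.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

# Common prefix of the three documented URLs (pinned to one commit).
BASE_URL = (
    "https://raw.githubusercontent.com/AI-secure/aug-pe/"
    "bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/"
)


def split_download_info(split):
    """Hypothetical helper: build the entry for one split."""
    return DownloadInfo(url=f"{BASE_URL}iclr23_reviews_{split}.csv", type="direct")


# Rebuild the documented mapping from the shared pattern.
download_info_dict = {s: split_download_info(s) for s in ("train", "val", "test")}
```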

__init__(root_dir='data', split='train', **kwargs)[source]

Constructor.

Parameters:

root_dir (str, optional) – The root directory of the dataset. If the dataset is not there, it will be downloaded automatically. Defaults to “data”.
split (str, optional) – The split of the dataset. Must be one of “train”, “val”, or “test”. Defaults to “train”.
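
A minimal usage sketch, assuming the pe package is installed and the machine can reach the download URLs on first use. The wrapper `load_openreview` and the constant `VALID_SPLITS` are hypothetical names added here; they only check the split name up front and then call the constructor documented above. How the constructor itself reacts to an invalid split is not stated in this reference.

```python
# Split names accepted according to the parameter description above.
VALID_SPLITS = ("train", "val", "test")


def load_openreview(root_dir="data", split="train"):
    """Hypothetical convenience wrapper around the OpenReview constructor."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Imported lazily so the check above works even without pe installed.
    from pe.data.text.openreview import OpenReview

    # May download the CSV into root_dir if it is not already present.
    return OpenReview(root_dir=root_dir, split=split)
```

Calling `load_openreview(split="val")` would then return the validation split, downloading it into `data` if needed.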

_download(download_info, data_path, processed_data_path)[source]

Download the dataset.

Parameters:

download_info (pe.data.text.openreview.DownloadInfo) – The download information
data_path (str) – The path to the raw data
processed_data_path (str) – The path to the processed data

Raises:

ValueError – If the download type is unknown
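
The documented behaviour, dispatching on the download type and raising ValueError for an unknown type, can be sketched as below. This is not the library's implementation: `download_sketch` and its `fetch` parameter are hypothetical, and the handling of the “direct” type (fetching straight to the processed path) is an assumption.

```python
from collections import namedtuple
from urllib.request import urlretrieve

# Local stand-in for pe.data.text.openreview.DownloadInfo.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])


def download_sketch(download_info, data_path, processed_data_path, fetch=urlretrieve):
    """Hypothetical stand-in for the type dispatch described for _download."""
    if download_info.type == "direct":
        # Assumption: a "direct" file is already a usable CSV, so it is
        # fetched straight to the processed path and data_path goes unused.
        fetch(download_info.url, processed_data_path)
    else:
        # Mirrors the documented error for unrecognised download types.
        raise ValueError(f"Unknown download type: {download_info.type}")
```

Passing `fetch` as a parameter keeps the sketch testable offline: a stub can stand in for the network call.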