pe.data.text.openreview module
- namedtuple pe.data.text.openreview.DownloadInfo(url, type)
Bases: namedtuple()
DownloadInfo(url, type)
- Fields:
url – Alias for field number 0
type – Alias for field number 1
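As a quick illustration of the field aliases, the sketch below rebuilds an equivalent named tuple with the standard library's `collections.namedtuple`. It is a local stand-in rather than an import from `pe`, and the URL is a placeholder chosen for the example.

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature DownloadInfo(url, type).
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

# Placeholder URL, for illustration only.
info = DownloadInfo(url="https://example.com/reviews.csv", type="direct")

# Each field can be read by name or by position (field numbers 0 and 1).
assert info.url == info[0]
assert info.type == info[1]

# Named tuples also unpack like ordinary tuples.
url, download_type = info
```

Because a named tuple is still a tuple, a plain `(url, type)` pair carries the same information positionally, which is how the entries of `DOWNLOAD_INFO_DICT` are displayed.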
- class pe.data.text.openreview.OpenReview(root_dir='data', split='train', **kwargs)
Bases: TextCSV
The OpenReview dataset used in the ICML 2024 Spotlight paper “Differentially Private Synthetic Data via Foundation Model APIs 2: Text” (https://arxiv.org/abs/2403.01749).
- DOWNLOAD_INFO_DICT = {
    'test': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_test.csv', 'direct'),
    'train': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_train.csv', 'direct'),
    'val': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/iclr23_reviews_val.csv', 'direct'),
  }
The download information for the OpenReview dataset.
- __init__(root_dir='data', split='train', **kwargs)
Constructor.
- Parameters:
root_dir (str, optional) – The root directory of the dataset. If the dataset is not there, it will be downloaded automatically. Defaults to “data”.
split (str, optional) – The split of the dataset: “train”, “val”, or “test”. Defaults to “train”.
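A minimal sketch of how the `split` argument might map onto `DOWNLOAD_INFO_DICT`. It assumes an unknown split is rejected with `ValueError`, which the documentation does not state. `select_download_info` and `BASE_URL` are hypothetical names introduced for illustration; the URLs are the ones listed in `DOWNLOAD_INFO_DICT`.

```python
from collections import namedtuple

# Local stand-in for pe.data.text.openreview.DownloadInfo.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

# Common prefix of the three documented URLs (hypothetical helper constant).
BASE_URL = (
    "https://raw.githubusercontent.com/AI-secure/aug-pe/"
    "bca21c90921bd1151aa7627e676c906165e205a0/data/openreview/"
)

# Rebuilds the documented mapping: one 'direct' CSV download per split.
DOWNLOAD_INFO_DICT = {
    split: DownloadInfo(url=f"{BASE_URL}iclr23_reviews_{split}.csv", type="direct")
    for split in ("train", "val", "test")
}


def select_download_info(split="train"):
    """Return the download entry for a split (hypothetical helper)."""
    if split not in DOWNLOAD_INFO_DICT:
        # Assumption: the real constructor may handle this case differently.
        valid = ", ".join(sorted(DOWNLOAD_INFO_DICT))
        raise ValueError(f"Unknown split {split!r}; expected one of: {valid}")
    return DOWNLOAD_INFO_DICT[split]
```

With the package installed, the documented call would be `OpenReview(root_dir="data", split="val")`, which downloads the validation CSV into `data` if it is not already there.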
- _download(download_info, data_path, processed_data_path)
Download the dataset.
- Parameters:
download_info (pe.data.text.openreview.DownloadInfo) – The download information
data_path (str) – The path to the raw data
processed_data_path (str) – The path to the processed data
- Raises:
ValueError – If the download type is unknown
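The documented behavior (fetch according to the download type, raise `ValueError` for an unknown type) can be sketched as a small dispatch function. This is not the library's implementation: `download` and its injectable `fetch` parameter are hypothetical, only the `'direct'` type that appears in `DOWNLOAD_INFO_DICT` is handled, and the step that produces `processed_data_path` is left out because the documentation does not describe it.

```python
import urllib.request
from collections import namedtuple

# Local stand-in for pe.data.text.openreview.DownloadInfo.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])


def download(download_info, data_path, fetch=urllib.request.urlretrieve):
    """Save the raw file for a 'direct' download; reject any other type.

    `fetch` is called as fetch(url, path); it defaults to the standard
    library's urlretrieve and can be replaced in tests to avoid network I/O.
    """
    if download_info.type == "direct":
        # A direct download stores the URL's content at data_path unchanged.
        fetch(download_info.url, data_path)
    else:
        raise ValueError(f"Unknown download type: {download_info.type}")
    return data_path
```

Passing the fetcher as a parameter keeps the type check testable offline, for example `download(info, "raw.csv", fetch=lambda url, path: None)`.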