pe.data.text.yelp module
- namedtuple pe.data.text.yelp.DownloadInfo(url, type)
Bases:
namedtuple()
DownloadInfo(url, type)
- Fields:
url – Alias for field number 0
type – Alias for field number 1
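Because DownloadInfo is a namedtuple, its two fields can be read by name or by position. The sketch below rebuilds an equivalent type locally with collections.namedtuple so that it runs without the pe package installed. That local definition is an assumption that mirrors the documented signature, not the library's own code. The URL is the "test" entry listed in DOWNLOAD_INFO_DICT.

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature DownloadInfo(url, type).
# This is an illustrative assumption, not the definition used inside pe.
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

info = DownloadInfo(
    url="https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/yelp/test.csv",
    type="direct",
)

# Fields are available by name and by position (url is 0, type is 1).
assert info.url == info[0]
assert info.type == info[1]

# A namedtuple also unpacks like a plain tuple.
url, download_type = info
```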
- class pe.data.text.yelp.Yelp(root_dir='data', split='train', **kwargs)
Bases:
TextCSV
The Yelp dataset used in the ICML 2024 Spotlight paper “Differentially Private Synthetic Data via Foundation Model APIs 2: Text” (https://arxiv.org/abs/2403.01749).
- DOWNLOAD_INFO_DICT = {'test': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/yelp/test.csv', 'direct'), 'train': ('https://drive.google.com/uc?id=1epLuBxCk5MGnm1GiIfLcTcr-tKgjCrc2', 'gdown'), 'val': ('https://raw.githubusercontent.com/AI-secure/aug-pe/bca21c90921bd1151aa7627e676c906165e205a0/data/yelp/dev.csv', 'direct')}
The download information for the Yelp dataset.
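The entries of DOWNLOAD_INFO_DICT are displayed as (url, type) pairs keyed by split name. As a rough sketch of how such a mapping can be consumed, the code below copies the documented values into a local dictionary and wraps a lookup in a helper. The local DownloadInfo type and the helper name get_download_info are hypothetical and are not part of the pe API.

```python
from collections import namedtuple

# Local stand-in for the documented namedtuple (an assumption for illustration).
DownloadInfo = namedtuple("DownloadInfo", ["url", "type"])

# Values copied from the documented DOWNLOAD_INFO_DICT.
_BASE = (
    "https://raw.githubusercontent.com/AI-secure/aug-pe/"
    "bca21c90921bd1151aa7627e676c906165e205a0/data/yelp/"
)
DOWNLOAD_INFO_DICT = {
    "train": ("https://drive.google.com/uc?id=1epLuBxCk5MGnm1GiIfLcTcr-tKgjCrc2", "gdown"),
    "val": (_BASE + "dev.csv", "direct"),
    "test": (_BASE + "test.csv", "direct"),
}


def get_download_info(split):
    """Hypothetical helper: return the DownloadInfo for a split name."""
    try:
        url, download_type = DOWNLOAD_INFO_DICT[split]
    except KeyError:
        raise ValueError(f"Unknown split: {split!r}") from None
    return DownloadInfo(url=url, type=download_type)
```

Note that the "val" split points at dev.csv in the upstream repository, and that only the "train" split uses the "gdown" download type.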
- __init__(root_dir='data', split='train', **kwargs)
Constructor.
- Parameters:
root_dir (str, optional) – The root directory of the dataset. If the dataset is not found there, it is downloaded automatically. Defaults to “data”.
split (str, optional) – The split of the dataset. It should be one of “train”, “val”, or “test”. Defaults to “train”.
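A minimal usage sketch for the constructor, assuming the pe package is installed and importable under the module path documented here. The wrapper load_yelp_split and its up-front check of the split name are hypothetical. This page does not state how the constructor itself reacts to an unsupported split. The import is deferred into the function body so that the check can run even where pe is not installed.

```python
VALID_SPLITS = ("train", "val", "test")


def load_yelp_split(split="train", root_dir="data"):
    """Hypothetical wrapper: validate the split name, then build the dataset."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Deferred import: assumes the pe package is installed.
    from pe.data.text.yelp import Yelp

    # Per the parameter description, a missing dataset under root_dir
    # is downloaded automatically.
    return Yelp(root_dir=root_dir, split=split)


# Example call (commented out because it may trigger a download):
# val_data = load_yelp_split("val", root_dir="data")
```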
- _download(download_info, data_path, processed_data_path)
Download the dataset.
- Parameters:
download_info (pe.data.text.yelp.DownloadInfo) – The download information
data_path (str) – The path to the raw data
processed_data_path (str) – The path to the processed data
- Raises:
ValueError – If the download type is unknown
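The documented behavior (choose a download method by type, raise ValueError for an unknown type) can be sketched as a small dispatch function. This is not the library's implementation. The function name download_raw_file is hypothetical, the "direct" branch assumes a plain HTTP fetch with urllib, and the "gdown" branch assumes the third-party gdown package. The processing step that produces processed_data_path is left out because this page does not describe it.

```python
import os
import urllib.request


def download_raw_file(url, download_type, data_path):
    """Hypothetical sketch: fetch url into data_path based on download_type."""
    # Reject unknown types first, before touching the filesystem or network.
    if download_type not in ("direct", "gdown"):
        raise ValueError(f"Unknown download type: {download_type!r}")

    # Create the parent directory of the target file if there is one.
    parent = os.path.dirname(data_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    if download_type == "direct":
        # Plain HTTP(S) download using the standard library.
        urllib.request.urlretrieve(url, data_path)
    else:
        # Google Drive link; requires the third-party gdown package.
        import gdown

        gdown.download(url, data_path, quiet=False)
```

Checking the type before any other work means an unsupported entry fails fast, without leaving a partially created directory behind.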