automl.nlp.huggingface.training_args
TrainingArgumentsForAuto Objects
```python
@dataclass
class TrainingArgumentsForAuto(TrainingArguments)
```
FLAML custom TrainingArguments.
Arguments:
- task (str): the task name for NLP tasks, e.g., seq-classification, token-classification.
- output_dir (str): the data root directory for outputting the logs, etc.
- model_path (str, optional, defaults to "facebook/muppet-roberta-base"): the path of the language model, either a model name from the Hugging Face model hub (huggingface.co/models) or a local path to the model.
- fp16 (bool, optional, defaults to False): whether to use FP16.
- max_seq_length (int, optional, defaults to 128): the maximum length of the sequence. For the token classification task, this argument is ineffective.
- pad_to_max_length (bool, optional, defaults to False): whether to pad all samples to the model's maximum sentence length. If False, samples are padded dynamically when batching, up to the maximum length in the batch.
- per_device_eval_batch_size (int, optional, defaults to 1): the per-GPU evaluation batch size.
- label_list (List[str], optional, defaults to None): the list of label names. For sequence labeling/token classification, the labels can be in two formats: (1) token labels, e.g., [B-PER, I-PER, B-LOC]; (2) id labels. For (2), pass the label_list (e.g., [B-PER, I-PER, B-LOC]) so that the ids can be converted to token labels when computing the metric with metric_loss_score. See the simple token classification example.
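The arguments above are fields of the dataclass, so they can also be set by constructing it directly. The sketch below is illustrative only: in normal use FLAML builds this object internally from the settings passed to AutoML.fit(), and the flaml.-prefixed import path and the example values are assumptions, not the documented usage.

```python
# Minimal sketch: construct the dataclass directly with the documented fields.
# Assumes the module lives under the flaml package at
# flaml.automl.nlp.huggingface.training_args (adjust for your FLAML version).
from flaml.automl.nlp.huggingface.training_args import TrainingArgumentsForAuto

training_args = TrainingArgumentsForAuto(
    output_dir="data/output/",                  # root directory for logs, checkpoints, etc.
    task="seq-classification",                  # NLP task name
    model_path="facebook/muppet-roberta-base",  # Hugging Face model card name or local path
    fp16=False,                                 # whether to use FP16
    max_seq_length=128,                         # ignored for token classification
    pad_to_max_length=False,                    # pad dynamically per batch when False
    per_device_eval_batch_size=1,               # per-GPU evaluation batch size
    label_list=None,                            # only needed for id-format token labels
)
print(training_args.model_path)
```

Because TrainingArgumentsForAuto subclasses transformers.TrainingArguments, any standard TrainingArguments keyword can be passed alongside the FLAML-specific fields shown here.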