nlp-recipes

Natural Language Processing Best Practices & Examples

Text Classification

This folder contains examples and best practices, written in Jupyter notebooks, for building text classification models. We use the utility scripts in the utils_nlp folder to speed up data preprocessing and model building for text classification.
The models can be used in a wide variety of applications, such as sentiment analysis, document indexing in digital libraries, hate speech detection, and general-purpose categorization in medical, academic, legal, and many other domains. Currently, we focus on fine-tuning pre-trained BERT and XLNet models. We plan to keep adding state-of-the-art models as they become available, and we welcome community contributions.

What is Text Classification?

Text classification is a supervised learning task: a model is trained on labeled documents and then predicts the category, or class, of a new document from its text content. State-of-the-art methods are based on neural networks of various architectures, combined with pre-trained language models or word embeddings.
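To make the supervised setup concrete, here is a minimal sketch of training on labeled text and predicting a category for new text. It deliberately swaps in a tiny bag-of-words Naive Bayes classifier written with the Python standard library, not the BERT or XLNet fine-tuning used in the notebooks. The function names (`tokenize`, `train`, `predict`) and the toy sentences are made up for illustration and are not part of utils_nlp.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real pipelines use subword tokenizers.
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed log-likelihood of the word.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for this sketch.
train_data = [
    ("the film was wonderful and moving", "positive"),
    ("a wonderful cast and a great story", "positive"),
    ("the plot was dull and boring", "negative"),
    ("boring script and terrible acting", "negative"),
]

model = train(train_data)
print(predict(model, "a great and moving story"))
print(predict(model, "terrible and dull acting"))
```

The workflow has the same shape as in the notebooks (labeled examples in, a predicted class out), but a fine-tuned transformer replaces the word counts with learned contextual representations.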

Summary

The following table summarizes each text classification notebook. Each notebook provides more details and guiding principles for building state-of-the-art models.

| Notebook | Environment | Description | Dataset |
|---|---|---|---|
| BERT for text classification on AzureML | Azure ML | A notebook that walks through fine-tuning and evaluating a pre-trained BERT model on a distributed setup with AzureML. | MultiNLI |
| Text Classification of MultiNLI Sentences using Multiple Transformer Models | Local | A notebook that walks through fine-tuning and evaluating a number of pre-trained transformer models. | MultiNLI |
| Text Classification of Multi Language Datasets using Transformer Model | Local | A notebook that walks through fine-tuning and evaluating a pre-trained transformer model on multiple datasets in different languages. | MultiNLI, BBC Hindi News, DAC |
| Text Classification of Multi Language Datasets using MTDNN Model | Local | A notebook that walks through fine-tuning and evaluating a pre-trained MT-DNN model on the MNLI dataset. | MultiNLI |