Tutorials#

Step-by-step guides to learn torchTextClassifiers through practical examples.

Overview#

These tutorials guide you through common text classification tasks, from basic binary classification to advanced multiclass scenarios.

Available Tutorials#

Getting Started#

Binary Classification

Recommended first tutorial

Build a sentiment classifier for product reviews. Learn the complete workflow from data preparation to evaluation.

What you’ll learn:

  • Creating and training tokenizers

  • Configuring models

  • Training with validation data

  • Making predictions

  • Evaluating performance

Difficulty: Beginner | Time: 15 minutes

Binary Classification Tutorial

Intermediate Tutorials#

Multiclass Classification

Classify text into three or more categories, handling class imbalance and choosing appropriate evaluation metrics.

What you’ll learn:

  • Multiclass model configuration

  • Class distribution analysis

  • Reproducibility with seeds

  • Confusion matrices

  • Advanced evaluation metrics

Difficulty: Intermediate | Time: 20 minutes

Multiclass Classification Tutorial

Mixed Features

Combine text with categorical variables for improved classification performance.

What you’ll learn:

  • Adding categorical features alongside text

  • Configuring categorical embeddings

  • Comparing performance improvements

  • Feature combination strategies

Difficulty: Intermediate | Time: 25 minutes

Mixed Features Classification

Advanced Tutorials#

Explainability

Understand which words and characters drive your model’s predictions.

What you’ll learn:

  • Generating attribution scores with Captum

  • Word-level and character-level visualizations

  • Identifying influential tokens

  • Interactive explainability mode

Difficulty: Advanced | Time: 30 minutes

Model Explainability

Multilabel Classification

Assign multiple labels to each text sample for complex classification scenarios.

What you’ll learn:

  • Ragged lists vs. one-hot encoding

  • Configuring BCEWithLogitsLoss

  • Multilabel evaluation metrics

  • Handling variable labels per sample

Difficulty: Advanced | Time: 30 minutes

Multilabel Classification
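
To make the "ragged lists vs. one-hot encoding" distinction concrete, here is a minimal sketch in plain Python. The helper name `ragged_to_multi_hot` is hypothetical and is not part of torchTextClassifiers; the tutorial itself shows the formats the library actually accepts.

```python
def ragged_to_multi_hot(ragged_labels, num_classes):
    """Convert per-sample label lists into fixed-width rows of 0/1 flags."""
    matrix = []
    for labels in ragged_labels:
        row = [0] * num_classes
        for label in labels:
            if not 0 <= label < num_classes:
                raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
            row[label] = 1  # mark each label that applies to this sample
        matrix.append(row)
    return matrix


# Ragged form: each sample lists only the labels that apply to it.
ragged = [[0, 2], [1], []]  # the third sample has no labels at all
multi_hot = ragged_to_multi_hot(ragged, num_classes=3)
print(multi_hot)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```

The ragged form is compact when samples have few labels. The fixed-width form is the shape that a per-class loss such as BCEWithLogitsLoss compares against the model's logits.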

Learning Path#

We recommend following this learning path:

  1. Start with: Quick Start - Get familiar with the basics

  2. Then: Binary Classification Tutorial - Understand the complete workflow

  3. Next: Multiclass Classification Tutorial - Handle multiple classes

  4. Branch out: Mixed Features Classification for categorical features, or Multilabel Classification for multiple labels

  5. Master: Model Explainability - Understand your model’s predictions

Tutorial Format#

Each tutorial follows a consistent structure:

Learning Objectives

What you’ll be able to do after completing the tutorial

Prerequisites

What you need to know before starting

Complete Code

Full working example you can copy and run

Step-by-Step Walkthrough

Detailed explanation of each step

Customization

How to adapt the code to your needs

Common Issues

Troubleshooting tips and solutions

Next Steps

Where to go after finishing

Tips for Learning#

Run the Code#

Don’t just read the examples; run them, then modify them to see what happens:

# Try different values
model_config = ModelConfig(
    embedding_dim=128,  # Was 64 - what changes?
    num_classes=2
)

Start Simple#

Begin with the Quick Start, then move to Binary Classification. Don’t skip ahead!

Use Your Own Data#

Once you understand the examples, try them with your own text data:

# Your data
my_texts = ["your", "text", "samples"]
my_labels = [0, 1, 0]

# Same workflow
classifier.train(my_texts, my_labels, training_config)
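
Before training on your own data, it usually helps to hold out a validation set. The following is a minimal sketch using only the standard library. The function `train_val_split`, its parameters, and the sample texts are illustrative assumptions, not part of torchTextClassifiers.

```python
import random


def train_val_split(texts, labels, val_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then split texts and labels in parallel."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = max(1, int(len(indices) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(items, idx):
        return [items[i] for i in idx]

    return (pick(texts, train_idx), pick(labels, train_idx),
            pick(texts, val_idx), pick(labels, val_idx))


# Illustrative data only; replace with your own texts and labels.
my_texts = ["great product", "terrible", "works fine", "broke quickly", "love it"]
my_labels = [1, 0, 1, 0, 1]

train_texts, train_labels, val_texts, val_labels = train_val_split(my_texts, my_labels)
print(len(train_texts), len(val_texts))  # 4 1
```

Using the same seed on every run keeps the split stable, so metric changes reflect your modifications and not a different shuffle.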

Experiment#

  • Try different tokenizers (WordPiece vs NGram)

  • Adjust hyperparameters (learning rate, embedding dim)

  • Compare model sizes

  • Test different batch sizes
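
One way to keep experiments organized is to enumerate the combinations up front. The sketch below uses only the standard library. The keys `learning_rate` and `batch_size` and all of the values are placeholders chosen for illustration; map them onto whatever fields your model and training configs actually expose.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "embedding_dim": [64, 128],
    "learning_rate": [1e-3, 1e-2],
    "batch_size": [32, 64],
}

names = list(search_space)
experiments = [
    dict(zip(names, values)) for values in product(*search_space.values())
]

for run_id, params in enumerate(experiments, start=1):
    # Train one model per parameter set here and record its validation score.
    print(run_id, params)

print(len(experiments))  # 8
```

Changing one setting at a time, or logging every combination as above, makes it easier to tell which change caused a difference in results.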

Read the Errors#

Error messages are helpful! They often tell you exactly what’s wrong:

# Error: num_classes=2 but got label 3
# Solution: Check your labels. With num_classes=2 they must be 0 and 1 (not 1, 2, 3)
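
If your raw labels do not start at 0 or have gaps, one common fix is to remap them to contiguous integers before training. This is a minimal sketch in plain Python. The helper `remap_labels` is hypothetical and not part of torchTextClassifiers.

```python
def remap_labels(labels):
    """Map arbitrary label values to contiguous integers 0..K-1."""
    classes = sorted(set(labels))
    label_to_index = {label: index for index, label in enumerate(classes)}
    encoded = [label_to_index[label] for label in labels]
    return encoded, label_to_index


raw_labels = [1, 3, 2, 3, 1]  # starts at 1, so 3 is out of range for 3 classes
encoded, mapping = remap_labels(raw_labels)

print(encoded)       # [0, 2, 1, 2, 0]
print(mapping)       # {1: 0, 2: 1, 3: 2}
print(len(mapping))  # 3, the value to use for num_classes
```

Keep the mapping so you can translate predicted indices back to the original label values.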

Getting Help#

Stuck on a tutorial? Here’s how to get help:

  1. Check Common Issues: Each tutorial has a troubleshooting section

  2. Read the API docs: API Reference for detailed parameter descriptions

  3. Review architecture: Architecture Overview for how components work

  4. Ask questions: GitHub Discussions

  5. Report bugs: GitHub Issues

Additional Resources#

Example Scripts#

All tutorials are based on runnable examples in the repository:

Jupyter Notebooks#

Interactive notebooks for hands-on learning:

Contributing#

Want to contribute a tutorial? We welcome:

  • New use cases

  • Alternative approaches

  • Real-world examples

  • Performance tips

See our contributing guidelines to get started!

What’s Next?#

Ready to start? Choose your path: