Architecture
This section explains the design and architecture of torchTextClassifiers.
Overview
torchTextClassifiers is built on a modular, component-based pipeline that balances simplicity for beginners with flexibility for advanced users.
The core pipeline consists of four main components:
- Tokenizer: Converts text strings into numerical tokens
- Text Embedder: Creates semantic embeddings from tokens (with optional attention)
- Categorical Handler: Processes additional categorical features (optional)
- Classification Head: Produces final class predictions
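The four stages can be sketched as a short chain of plain PyTorch modules. All class and parameter names below are illustrative stand-ins, not the library's actual API:

```python
import torch
import torch.nn as nn

# Illustrative pipeline sketch -- the names here are hypothetical,
# not torchTextClassifiers' real classes.

class ToyTokenizer:
    """Stage 1: maps each whitespace-separated word to an integer id."""
    def __init__(self, vocab):
        self.vocab = {w: i + 1 for i, w in enumerate(vocab)}  # 0 = <unk>/pad
    def __call__(self, text, max_len=8):
        ids = [self.vocab.get(w, 0) for w in text.lower().split()][:max_len]
        ids += [0] * (max_len - len(ids))  # pad to a fixed length
        return torch.tensor(ids)

class TextEmbedder(nn.Module):
    """Stage 2: token ids -> one pooled semantic vector per example."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
    def forward(self, ids):                 # ids: (batch, seq)
        return self.emb(ids).mean(dim=1)    # (batch, dim)

class ClassificationHead(nn.Module):
    """Stage 4: pooled embedding -> class logits."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)
    def forward(self, x):
        return self.fc(x)

tok = ToyTokenizer(["great", "movie", "terrible", "plot"])
embedder = TextEmbedder(vocab_size=5, dim=16)
head = ClassificationHead(dim=16, n_classes=2)

ids = torch.stack([tok("great movie"), tok("terrible plot")])
logits = head(embedder(ids))
print(logits.shape)  # torch.Size([2, 2])
```

The optional categorical handler (stage 3) is omitted here; it would contribute extra features between the embedder and the head, as shown under Design Philosophy.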
This design allows you to:
- Understand the clear data flow through the model
- Mix and match components for your specific needs
- Start simple and add complexity as required
- Use the high-level API or drop down to PyTorch for full control
Quick Links
- Architecture Overview: Complete architecture explanation with examples
- API Reference: Reference documentation for all components
Design Philosophy
The architecture follows these principles:
- Modularity
Each component (Tokenizer, Embedder, Categorical Handler, Classification Head) is independent and can be used separately or replaced with custom implementations
- Clear Data Flow
The pipeline shows exactly how data moves from text input through embeddings to predictions, making the model transparent and understandable
- Composability
Components can be mixed and matched to create custom architectures—use text-only, add categorical features, or build entirely custom combinations
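One way such a combination could look in plain PyTorch is to embed a categorical feature and concatenate it with the pooled text representation before the head. The module and parameter names below are hypothetical, chosen only to illustrate the composition pattern:

```python
import torch
import torch.nn as nn

# Hypothetical composition: text path + categorical path -> shared head.

class CategoricalHandler(nn.Module):
    """Embeds one categorical feature (e.g. a source/channel id)."""
    def __init__(self, n_categories, dim):
        super().__init__()
        self.emb = nn.Embedding(n_categories, dim)
    def forward(self, cat_ids):            # cat_ids: (batch,)
        return self.emb(cat_ids)           # (batch, dim)

text_dim, cat_dim, n_classes = 16, 4, 3
text_embedding = nn.Embedding(100, text_dim)        # stand-in text path
cat_handler = CategoricalHandler(n_categories=10, dim=cat_dim)
head = nn.Linear(text_dim + cat_dim, n_classes)

token_ids = torch.randint(0, 100, (2, 6))           # batch of 2, seq len 6
text_vec = text_embedding(token_ids).mean(dim=1)    # (2, 16) pooled
cat_vec = cat_handler(torch.tensor([3, 7]))         # (2, 4)
logits = head(torch.cat([text_vec, cat_vec], dim=1))
print(logits.shape)  # torch.Size([2, 3])
```

Dropping the categorical branch (and shrinking the head's input to `text_dim`) yields a text-only model, which is the sense in which the components compose freely.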
- Flexibility
Start with the high-level torchTextClassifiers wrapper for simplicity, or compose components directly with PyTorch for maximum control
- Type Safety
Extensive use of type hints and dataclasses for better IDE support and fewer runtime errors
- Framework Integration
All components are standard torch.nn.Module objects with seamless PyTorch and PyTorch Lightning integration
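Because every component is a plain nn.Module, a composed classifier drops into any standard PyTorch training loop (or a LightningModule wrapper) without special handling. A minimal sketch of one training step, using a toy stand-in model rather than the library's own classes:

```python
import torch
import torch.nn as nn

# Toy stand-in model: token ids -> pooled embedding -> logits.
model = nn.Sequential(
    nn.EmbeddingBag(50, 16),   # mean-pools embeddings over each sequence
    nn.Linear(16, 2),          # pooled embedding -> class logits
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_ids = torch.randint(0, 50, (4, 7))   # batch of 4 sequences, length 7
labels = torch.tensor([0, 1, 1, 0])

# One ordinary training step -- nothing framework-specific is required.
logits = model(token_ids)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()                 # standard autograd through every component
optimizer.step()
print(logits.shape)  # torch.Size([4, 2])
```

The same model could instead be wrapped in a LightningModule and handed to a Lightning Trainer, since Lightning accepts any nn.Module-based model.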