Models¶
This module contains the abstractions for the model implementations in this package. It defines the basic signature of the __init__() method and the mandatory methods that subclasses have to implement.
The idea is to provide all implemented models in two versions to accommodate different users:
nlp_uncertainty_zoo.models.model.Module
only defines the basic model logic, but no training loop etc. This is ideal for research projects in which you want to integrate the model into your own repository structure, or for tinkering.
nlp_uncertainty_zoo.models.model.Model
defines an out-of-the-box solution for direct application. The model can be trained simply by calling nlp_uncertainty_zoo.models.model.Model.fit()
with some training dataloader.
The Module class¶
The nlp_uncertainty_zoo.models.model.Module
class is supposed to mirror PyTorch’s nn.Module (https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class, including only the bare-bones logic of the
model. By default, input parameters include the number of layers, as well as the sizes for the model vocabulary, embeddings
(input_size), hidden activations and number of classes (output_size).
One also needs to specify the PyTorch device (like “cpu” or “cuda”) and whether the model is used for sequence classification (is_sequence_classifier=True) or sequence labelling (is_sequence_classifier=False). Other kinds of tasks, such as regression, parsing or generation, are not supported at the moment.
The following methods have to be implemented by every subclass of Module:
nlp_uncertainty_zoo.models.model.Module.get_logits()
: Return the logits for a given input, which come in the form of a torch.FloatTensor with dimensions batch_size x sequence_length x output_size for models with only a single prediction or batch_size x num_predictions x sequence_length x output_size for models with multiple predictions, such as MC Dropout or ensembles. If the model is a sequence classifier, sequence_length will be 1.
nlp_uncertainty_zoo.models.model.Module.predict()
: Same as nlp_uncertainty_zoo.models.model.Module.get_logits()
, except that values on the last axis are actual probabilities summing up to 1.
nlp_uncertainty_zoo.models.model.Module.get_sequence_representation()
: Returns the model's representation of a sequence as a torch.FloatTensor of size batch_size x hidden_size. For nlp_uncertainty_zoo.models.bert
and nlp_uncertainty_zoo.models.transformer
models, the sequence representation is obtained from the top-layer hidden activations of the first time step (often corresponding to the [CLS] token) after an additional pooler layer. For the unidirectional nlp_uncertainty_zoo.models.lstm
classes, it is the hidden activations of the last layer at the last time step.
nlp_uncertainty_zoo.models.model.Module.get_uncertainty()
: Returns the uncertainty estimates for an input batch, with the returned tensor having the same shape as for nlp_uncertainty_zoo.models.model.Module.get_logits()
or nlp_uncertainty_zoo.models.model.Module.predict(). If no value is specified for metric_name
, the metric stored in the attribute default_uncertainty_metric is used (which usually refers to predictive entropy). If another metric should be used, pass one of the names in the keys of the attributes single_prediction_uncertainty_metrics or multi_prediction_uncertainty_metrics.
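The relationship between get_logits(), predict() and a predictive-entropy metric can be sketched with plain PyTorch. This is an illustration only, not the package's implementation: the tensors are random stand-ins, the sizes are made up, predictive_entropy is a hypothetical helper, and averaging the class distributions over the prediction axis before taking the entropy is an assumption about how a multi-prediction model might aggregate.

```python
import torch

batch_size, num_predictions, sequence_length, output_size = 2, 5, 7, 3

# Stand-in for get_logits() of a single-prediction model.
single_logits = torch.randn(batch_size, sequence_length, output_size)
# Stand-in for get_logits() of a multi-prediction model (e.g. MC Dropout).
multi_logits = torch.randn(batch_size, num_predictions, sequence_length, output_size)

# predict() differs from get_logits() only in that the last axis is normalized.
single_probs = torch.softmax(single_logits, dim=-1)
multi_probs = torch.softmax(multi_logits, dim=-1)


def predictive_entropy(probs: torch.Tensor) -> torch.Tensor:
    """Entropy over the class axis (hypothetical helper); one score per time step."""
    return -(probs * torch.log(probs.clamp(min=1e-12))).sum(dim=-1)


single_entropy = predictive_entropy(single_probs)
# Assumption: average the class distributions over the prediction axis first.
multi_entropy = predictive_entropy(multi_probs.mean(dim=1))
```

In this sketch both entropy tensors have one score per token, i.e. shape batch_size x sequence_length.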
The Model class¶
The nlp_uncertainty_zoo.models.model.Model
class is intended as a complete drop-in solution for anyone who does not want to write training logic and similar code.
As input parameters, nlp_uncertainty_zoo.models.model.Model
expects the name of the model as a string, a reference to a class, and the model parameters
as a dictionary of keyword arguments that is passed to the nlp_uncertainty_zoo.models.model.Module.__init__()
function of the given nlp_uncertainty_zoo.models.model.Module
subclass.
The model_dir is an optional argument that specifies the path to which the model is saved during training.
The user mainly interacts with the nlp_uncertainty_zoo.models.model.Model
class using the nlp_uncertainty_zoo.models.model.Model.fit()
, nlp_uncertainty_zoo.models.model.Model.predict()
and nlp_uncertainty_zoo.models.model.Model.get_uncertainty()
functions,
where the latter two mirror the function implementations in Module. nlp_uncertainty_zoo.models.model.Model.fit()
expects a training set and, optionally, a validation set as
torch.utils.data.DataLoader (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) instances.
Models Documentation¶
- Define common methods of models. This is done by separating the logic into two parts:
Module: This class only defines the model architecture and forward pass. This is also done so that others can easily copy and adapt the code if necessary.
Model: This wrapper class defines all the other logic necessary to use a model in practice: Training, loss computation, saving and loading, etc.
- class nlp_uncertainty_zoo.models.model.Model(model_name: str, module_class: type, lr: float, weight_decay: float, optimizer_class: ~typing.Type[~torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, scheduler_class: ~typing.Optional[~typing.Type[~torch.optim.lr_scheduler._LRScheduler]] = None, scheduler_kwargs: ~typing.Optional[~typing.Dict[str, ~typing.Any]] = None, model_dir: ~typing.Optional[str] = None, device: ~typing.Union[~torch.device, str] = 'cpu', **model_params)¶
Bases:
ABC
Abstract model class. It is a wrapper that defines data loading, batching, training and evaluation loops, so that the core module class only has to define the model’s forward pass.
- property available_uncertainty_metrics: Dict[str, Callable]¶
Return a dictionary of all available uncertainty metrics of the current model.
- compute_loss_weights(train_split: DataLoader, *args, **kwargs) FloatTensor ¶
Compute loss weights for unbalanced training sets.
- Parameters:
- train_split: DataLoader
- Returns:
- torch.FloatTensor
Tensor containing loss weights (should be of size K).
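One common way to obtain such weights is inverse class frequency. The sketch below shows what a weighting scheme of this kind can look like; it is not the package's actual formula. inverse_frequency_weights is a hypothetical helper, and K is assumed to be the number of classes.

```python
import torch


def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Return one weight per class, larger for rarer classes (hypothetical helper)."""
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    # Classes that never occur get a count of 1 to avoid division by zero.
    counts = counts.clamp(min=1.0)
    # Normalized so that a perfectly balanced set gives a weight of 1 per class.
    return counts.sum() / (num_classes * counts)


labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
weights = inverse_frequency_weights(labels, num_classes=3)
# The result can be passed to torch.nn.CrossEntropyLoss(weight=weights).
```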
- eval(data_split: DataLoader, wandb_run: Optional[Run] = None) Tensor ¶
Evaluate a data split.
- Parameters:
- data_split: DataLoader
Data split the model should be evaluated on.
- wandb_run: Optional[WandBRun]
Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.
- Returns:
- torch.Tensor
Loss on evaluation split.
- fit(train_split: DataLoader, num_training_steps: int, valid_split: Optional[DataLoader] = None, weight_loss: bool = False, grad_clip: float = 10, validation_interval: Optional[int] = None, early_stopping_pat: int = inf, early_stopping: bool = False, verbose: bool = True, wandb_run: Optional[Run] = None, **training_kwargs)¶
Fit the model to training data.
- Parameters:
- train_split: DataLoader
Dataset the model is being trained on.
- num_training_steps: int
Number of training steps until completion.
- valid_split: Optional[DataLoader]
Validation set the model is being evaluated on if given.
- verbose: bool
Whether to display information about current loss.
- weight_loss: bool
Weight classes in loss function. Default is False.
- grad_clip: float
Maximum norm of the parameter gradients; gradients with a larger norm are clipped. Default is 10.
- validation_interval: Optional[int]
Interval of training steps between validations on the validation set. If None, the model is evaluated after each pass through the training data.
- early_stopping_pat: int
Patience in number of training steps before early stopping kicks in. Default is np.inf.
- early_stopping: bool
Whether early stopping should be used. Default is False.
- wandb_run: Optional[WandBRun]
Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.
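How validation_interval, early_stopping and early_stopping_pat can interact is sketched below in plain Python. This is one assumed reading of the parameter descriptions, not the package's training loop: run_training and the list of precomputed validation losses are hypothetical, and patience is counted in training steps since the last improvement.

```python
import math
from typing import List


def run_training(
    valid_losses: List[float],
    validation_interval: int,
    early_stopping: bool = False,
    early_stopping_pat: float = math.inf,
) -> int:
    """Return the training step at which the simulated loop stops.

    valid_losses[i] stands in for the validation loss measured after
    (i + 1) * validation_interval training steps.
    """
    best_loss = math.inf
    best_step = 0
    num_training_steps = len(valid_losses) * validation_interval

    for step in range(1, num_training_steps + 1):
        # ... one optimizer update on a training batch would happen here ...
        if step % validation_interval != 0:
            continue

        loss = valid_losses[step // validation_interval - 1]
        if loss < best_loss:
            best_loss, best_step = loss, step
        elif early_stopping and step - best_step >= early_stopping_pat:
            return step  # no improvement for early_stopping_pat steps

    return num_training_steps


losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
stopped_at = run_training(
    losses, validation_interval=10, early_stopping=True, early_stopping_pat=20
)
full_run = run_training(losses, validation_interval=10)
```

With these toy losses the best value is reached at step 20, so a patience of 20 steps ends the run at step 40, while the run without early stopping uses all 60 steps.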
- get_loss(X: Tensor, y: Tensor, wandb_run: Optional[Run] = None, **kwargs) Tensor ¶
Get loss for a single batch. This just uses cross-entropy loss, but can be adjusted in subclasses by overriding this function.
- Parameters:
- X: torch.Tensor
Batch input.
- y: torch.Tensor
Batch labels.
- wandb_run: Optional[WandBRun] = None
Weights and Biases run to track training statistics.
- Returns:
- torch.Tensor
Batch loss.
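For sequence labelling, a cross-entropy loss is typically computed by merging the batch and time axes. The sketch below assumes that layout; the sizes are made up, and using -100 as an ignored padding label is PyTorch's default ignore_index, not something this documentation specifies.

```python
import torch
import torch.nn.functional as F

batch_size, sequence_length, output_size = 2, 4, 3

logits = torch.randn(batch_size, sequence_length, output_size)
labels = torch.randint(0, output_size, (batch_size, sequence_length))
labels[1, -1] = -100  # pretend the last token of the second sequence is padding

# F.cross_entropy expects (N, C) logits and (N,) labels, so merge batch and time.
loss = F.cross_entropy(
    logits.reshape(-1, output_size),
    labels.reshape(-1),
    ignore_index=-100,
)
```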
- get_uncertainty(input_: LongTensor, *args, metric_name: Optional[str] = None, **kwargs) FloatTensor ¶
Get the uncertainty scores for the current batch.
- Parameters:
- input_: torch.LongTensor
(Batch of) Indexed input sequences.
- metric_name: Optional[str]
Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.
- Returns:
- torch.FloatTensor
Uncertainty scores for the current batch.
- static load(model_path: str)¶
Load model from path.
- Parameters:
- model_path: str
Path model was saved to.
- Returns:
- Model
Loaded model.
- predict(X: Tensor, *args, **kwargs) Tensor ¶
Make a prediction for some input.
- Parameters:
- X: torch.Tensor
Input data points.
- Returns:
- torch.Tensor
Predictions.
- to(device: Union[device, str])¶
Move model to another device.
- Parameters:
- device: Device
Device the model should be moved to.
- class nlp_uncertainty_zoo.models.model.Module(num_layers: int, vocab_size: int, input_size: int, hidden_size: int, output_size: int, is_sequence_classifier: bool, device: Union[device, str], **build_params)¶
Bases:
ABC, Module
Abstract module class, defining how the forward pass of a model looks.
- property available_uncertainty_metrics: Dict[str, Callable]¶
Return a dictionary of all available uncertainty metrics of the current model.
- abstract forward(input_: LongTensor, *args, **kwargs) FloatTensor ¶
Forward pass of the model.
- Parameters:
- input_: torch.LongTensor
(Batch of) Indexed input sequences.
- Returns:
- torch.FloatTensor
Output predictions for input.
Obtain hidden representations for the current input.
- Parameters:
- input_: torch.LongTensor
Inputs ids for a sentence.
- Returns:
- torch.FloatTensor
Representation for the current sequence.
- abstract get_logits(input_: LongTensor, *args, **kwargs) FloatTensor ¶
Get the logits for an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type. Used to create inputs for the uncertainty metrics defined in nlp_uncertainty_zoo.metrics.
- Parameters:
- input_: torch.LongTensor
(Batch of) Indexed input sequences.
- Returns:
- torch.FloatTensor
Logits for current input.
- get_num_learnable_parameters() int ¶
Return the total number of (learnable) parameters in the model.
- Returns:
- int
Number of learnable parameters.
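Counting learnable parameters of any PyTorch module usually amounts to summing the sizes of all parameters that require gradients. The following is a minimal sketch of that idea, not necessarily how this method is implemented; count_learnable_parameters is a hypothetical stand-alone helper.

```python
import torch.nn as nn


def count_learnable_parameters(module: nn.Module) -> int:
    """Sum the number of elements of all parameters that receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


layer = nn.Linear(10, 4)  # 10 * 4 weights + 4 biases
total = count_learnable_parameters(layer)

# Frozen parameters are no longer counted.
layer.bias.requires_grad_(False)
without_bias = count_learnable_parameters(layer)
```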
- get_sequence_representation(input_: LongTensor, *args, **kwargs) FloatTensor ¶
Define how the representation for an entire sequence is extracted from the input ids. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM or the first hidden state for a transformer, adding a pooler layer.
- Parameters:
- input_: torch.LongTensor
Inputs ids for a sentence.
- Returns:
- torch.FloatTensor
Representation for the current sequence.
Define how the representation for an entire sequence is extracted from a number of hidden states. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM or the first hidden state for a transformer, adding a pooler layer.
- Parameters:
- hidden: torch.FloatTensor
Hidden states of a model for a sequence.
- Returns:
- torch.FloatTensor
Representation for the current sequence.
- get_uncertainty(input_: LongTensor, metric_name: Optional[str] = None, **kwargs) FloatTensor ¶
Get the uncertainty scores for the current batch.
- Parameters:
- input_: torch.LongTensor
(Batch of) Indexed input sequences.
- metric_name: Optional[str]
Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.
- Returns:
- torch.FloatTensor
Uncertainty scores for the current batch.
- predict(input_: LongTensor, *args, **kwargs) FloatTensor ¶
Output a probability distribution over classes given an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type.
- Parameters:
- input_: torch.LongTensor
(Batch of) Indexed input sequences.
- Returns:
- torch.FloatTensor
Probabilities for current input.
- training: bool¶
- class nlp_uncertainty_zoo.models.model.MultiPredictionMixin(num_predictions: int)¶
Bases:
object
Mixin class that is used to bundle certain methods for modules that use multiple predictions to estimate uncertainty.
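The methods of the mixin are not listed here, so the following only illustrates the general idea of a multi-prediction module in the style of MC Dropout: several stochastic forward passes are stacked along a new axis, giving the batch_size x num_predictions x seq_len x output_size layout described for get_logits(). The toy output head and all sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_predictions, batch_size, sequence_length, hidden_size, output_size = 5, 2, 4, 8, 3

# Toy output head with dropout, standing in for a full model.
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(hidden_size, output_size))
head.train()  # keep dropout active so that repeated passes differ

hidden = torch.randn(batch_size, sequence_length, hidden_size)
with torch.no_grad():
    # One forward pass per prediction, stacked along a new axis 1.
    logits = torch.stack([head(hidden) for _ in range(num_predictions)], dim=1)
```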