Models

This module includes abstractions for the model implementations in this package, defining the basic signature of the __init__() method and some mandatory methods that have to be implemented by subclasses.

The idea is to provide all implemented models in two versions to accommodate different users:

nlp_uncertainty_zoo.models.model.Module only defines the basic model logic, without a training loop or similar infrastructure. This is ideal for research projects where you would like to integrate the model into your own repository structure, or for tinkering with the model itself.

nlp_uncertainty_zoo.models.model.Model defines an out-of-the-box solution for direct application. The model can simply be trained by calling nlp_uncertainty_zoo.models.model.Model.fit() with a training dataloader.

The Module class

The nlp_uncertainty_zoo.models.model.Module class is supposed to mirror PyTorch’s nn.Module class (https://pytorch.org/docs/stable/generated/torch.nn.Module.html), only including the bare-bones logic of the model. Its standard input parameters include the number of layers (num_layers), as well as the vocabulary size (vocab_size), the embedding size (input_size), the size of the hidden activations (hidden_size) and the number of classes (output_size).

One also needs to specify the PyTorch device (like “cpu” or “cuda”) and whether the model is used for sequence classification (is_sequence_classifier=True) or sequence labelling (is_sequence_classifier=False). Other kinds of tasks, like regression, parsing or generation, are not supported at the moment.

The following methods have to be implemented by every subclass of Module:

nlp_uncertainty_zoo.models.model.Module.get_logits(): Return the logits for a given input as a torch.FloatTensor with dimensions batch_size x sequence_length x output_size for models with only a single prediction, or batch_size x num_predictions x sequence_length x output_size for models with multiple predictions, such as MC Dropout or ensembles. If the model is a sequence classifier, sequence_length will be 1.

nlp_uncertainty_zoo.models.model.Module.predict(): Same as nlp_uncertainty_zoo.models.model.Module.get_logits(), except that values on the last axis are actual probabilities summing up to 1.

nlp_uncertainty_zoo.models.model.Module.get_sequence_representation(): Return the representation of an input sequence as a torch.FloatTensor of size batch_size x hidden_size. For nlp_uncertainty_zoo.models.bert and nlp_uncertainty_zoo.models.transformer models, the sequence representation is obtained from the top-layer hidden activations of the first time step (often corresponding to the [CLS] token) after an additional pooler layer; for the unidirectional nlp_uncertainty_zoo.models.lstm classes, it is the last-layer hidden activations of the last time step.

nlp_uncertainty_zoo.models.model.Module.get_uncertainty(): Return the uncertainty estimates for an input batch, with the return tensor possessing the same shape as with nlp_uncertainty_zoo.models.model.Module.get_logits() or nlp_uncertainty_zoo.models.model.Module.predict(). If no value is specified for metric_name, the metric stored in the attribute default_uncertainty_metric is used (usually predictive entropy). To use another metric, pass one of the keys of the attributes single_prediction_uncertainty_metrics or multi_prediction_uncertainty_metrics as metric_name.
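To make the required interface more concrete, here is a minimal sketch of a sequence model implementing these methods. It subclasses torch.nn.Module directly rather than nlp_uncertainty_zoo.models.model.Module, so it runs without the package installed; the class name ToyLSTMModule and its layer choices are assumptions for illustration, and get_uncertainty() is omitted.

```python
import torch
import torch.nn as nn


class ToyLSTMModule(nn.Module):
    """Hypothetical single-prediction model mirroring the Module interface."""

    def __init__(self, num_layers, vocab_size, input_size, hidden_size,
                 output_size, is_sequence_classifier, device="cpu"):
        super().__init__()
        self.is_sequence_classifier = is_sequence_classifier
        self.embeddings = nn.Embedding(vocab_size, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.output = nn.Linear(hidden_size, output_size)
        self.to(device)

    def get_hidden_representation(self, input_):
        # (batch_size, seq_len) -> (batch_size, seq_len, hidden_size)
        hidden, _ = self.lstm(self.embeddings(input_))
        return hidden

    def get_sequence_representation_from_hidden(self, hidden):
        # Unidirectional LSTM: hidden state of the last time step.
        return hidden[:, -1, :]

    def get_sequence_representation(self, input_):
        hidden = self.get_hidden_representation(input_)
        return self.get_sequence_representation_from_hidden(hidden)

    def get_logits(self, input_):
        if self.is_sequence_classifier:
            # (batch_size, 1, output_size): sequence_length is 1
            return self.output(self.get_sequence_representation(input_)).unsqueeze(1)
        # (batch_size, seq_len, output_size)
        return self.output(self.get_hidden_representation(input_))

    def predict(self, input_):
        # Same shape as get_logits(), normalized over the last axis.
        return torch.softmax(self.get_logits(input_), dim=-1)

    def forward(self, input_):
        return self.get_logits(input_)


torch.manual_seed(0)
classifier = ToyLSTMModule(num_layers=1, vocab_size=100, input_size=8, hidden_size=16,
                           output_size=3, is_sequence_classifier=True)
batch = torch.randint(0, 100, (4, 10))  # 4 sequences of 10 token ids
print(classifier.predict(batch).shape)  # torch.Size([4, 1, 3])
```

Setting is_sequence_classifier=False instead yields logits of shape (4, 10, 3), one distribution per token.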

The Model class

The nlp_uncertainty_zoo.models.model.Model class is intended as a complete drop-in solution for anyone who does not want to write training logic and similar code. As input parameters, nlp_uncertainty_zoo.models.model.Model expects the name of the model as a string, a reference to a nlp_uncertainty_zoo.models.model.Module subclass (module_class), the learning rate (lr) and weight decay (weight_decay), and the model parameters as keyword arguments (collected in **model_params), which are passed to the nlp_uncertainty_zoo.models.model.Module.__init__() method of the given subclass. The model_dir is an optional argument that specifies the path to which the model is saved during training.

The user mainly interacts with the nlp_uncertainty_zoo.models.model.Model class through the nlp_uncertainty_zoo.models.model.Model.fit(), nlp_uncertainty_zoo.models.model.Model.predict() and nlp_uncertainty_zoo.models.model.Model.get_uncertainty() methods, where the latter two mirror their counterparts in Module. nlp_uncertainty_zoo.models.model.Model.fit() expects the training set and, optionally, the validation set as torch.utils.data.DataLoader instances (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).
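As a rough illustration of the inputs fit() works with, the snippet below wraps hypothetical toy tensors in DataLoader instances. The batch format (input ids paired with labels) and the commented-out Model call, including the placeholder SomeModuleSubclass and the hyperparameter values, are assumptions; consult the package's own data utilities for the format its models actually expect.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: sequences of 10 token ids with one class label each
# (sequence classification with 3 classes).
torch.manual_seed(0)
train_inputs = torch.randint(0, 100, (32, 10))
train_labels = torch.randint(0, 3, (32,))
valid_inputs = torch.randint(0, 100, (8, 10))
valid_labels = torch.randint(0, 3, (8,))

train_loader = DataLoader(TensorDataset(train_inputs, train_labels), batch_size=8, shuffle=True)
valid_loader = DataLoader(TensorDataset(valid_inputs, valid_labels), batch_size=8)

# Each batch is an (input ids, labels) pair in this sketch.
X, y = next(iter(train_loader))
print(X.shape, y.shape)  # torch.Size([8, 10]) torch.Size([8])

# Assumed call pattern, based on the signatures documented below; the module
# class and all values are placeholders:
# model = Model("my_model", SomeModuleSubclass, lr=1e-3, weight_decay=0.01, ...)
# model.fit(train_loader, num_training_steps=1000, valid_split=valid_loader)
```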

Models Documentation

Define common methods of models. This is done by separating the logic into two parts:

Module: This class only defines the model architecture and forward pass. This is also done so that others can easily copy and adapt the code if necessary.
Model: This wrapper class defines all the other logic necessary to use a model in practice: Training, loss computation, saving and loading, etc.

class nlp_uncertainty_zoo.models.model.Model(model_name: str, module_class: type, lr: float, weight_decay: float, optimizer_class: Type[torch.optim.optimizer.Optimizer] = torch.optim.adam.Adam, scheduler_class: Optional[Type[torch.optim.lr_scheduler._LRScheduler]] = None, scheduler_kwargs: Optional[Dict[str, Any]] = None, model_dir: Optional[str] = None, device: Union[torch.device, str] = 'cpu', **model_params)

Bases: ABC

Abstract model class. It is a wrapper that defines data loading, batching, training and evaluation loops, so that the core module class only needs to define the model’s forward pass.

property available_uncertainty_metrics: Dict[str, Callable]

Return a dictionary of all available uncertainty metrics of the current model.

compute_loss_weights(train_split: DataLoader, *args, **kwargs) → FloatTensor

Compute loss weights for unbalanced training sets.

Parameters:

train_split: DataLoader: Training split the loss weights are computed from.

Returns:

torch.FloatTensor: Tensor containing loss weights (should be of size K, the number of classes).
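The weighting scheme used by compute_loss_weights() is not described here. One common choice, sketched below under that assumption, is inverse class frequency, w_k = N / (K * n_k), where N is the number of labels, K the number of classes and n_k the count of class k. The function name inverse_frequency_weights is hypothetical.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset


def inverse_frequency_weights(train_split: DataLoader, num_classes: int) -> torch.Tensor:
    """Hypothetical loss weights: w_k = N / (K * n_k)."""
    counts = Counter()
    for _, labels in train_split:
        counts.update(labels.flatten().tolist())
    # Only count valid class indices (e.g. skip padding labels such as -100).
    total = sum(counts[k] for k in range(num_classes))
    weights = torch.zeros(num_classes)
    for k in range(num_classes):
        # Classes that never occur keep a weight of 0.
        if counts[k] > 0:
            weights[k] = total / (num_classes * counts[k])
    return weights


# Imbalanced toy labels: three examples of class 0, one of class 1.
loader = DataLoader(TensorDataset(torch.zeros(4, 2), torch.tensor([0, 0, 0, 1])), batch_size=2)
weights = inverse_frequency_weights(loader, num_classes=2)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)  # errors on the rarer class cost more
```

Here the minority class receives three times the weight of the majority class.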

eval(data_split: DataLoader, wandb_run: Optional[Run] = None) → Tensor

Evaluate a data split.

Parameters:

data_split: DataLoader: Data split the model should be evaluated on.
wandb_run: Optional[WandBRun]: Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.

Returns:

torch.Tensor: Loss on evaluation split.

fit(train_split: DataLoader, num_training_steps: int, valid_split: Optional[DataLoader] = None, weight_loss: bool = False, grad_clip: float = 10, validation_interval: Optional[int] = None, early_stopping_pat: int = inf, early_stopping: bool = False, verbose: bool = True, wandb_run: Optional[Run] = None, **training_kwargs)

Fit the model to training data.

Parameters:

train_split: DataLoader: Dataset the model is being trained on.
num_training_steps: int: Number of training steps until completion.
valid_split: Optional[DataLoader]: Validation set the model is being evaluated on if given.
verbose: bool: Whether to display information about current loss.
weight_loss: bool: Weight classes in loss function. Default is False.
grad_clip: float: Maximum gradient norm; gradients with a larger norm are clipped to this value. Default is 10.
validation_interval: Optional[int]: Interval of training steps between validations on the validation set. If None, the model is evaluated after each pass through the training data.
early_stopping_pat: int: Patience in number of training steps before early stopping kicks in. Default is np.inf.
early_stopping: bool: Whether early stopping should be used. Default is False.
wandb_run: Optional[WandBRun]: Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.
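To show how grad_clip, validation_interval and early_stopping_pat interact, here is a simplified, hypothetical step-based training loop; it is not the package's implementation. It uses torch.nn.utils.clip_grad_norm_ for clipping, treats an infinite patience as disabling early stopping (standing in for early_stopping=False), and assumes train_batches is a list of (input, label) pairs.

```python
import itertools
import math

import torch
from torch import nn


def train_steps(model, loss_fn, optimizer, train_batches, valid_batches,
                num_training_steps, grad_clip=10.0, validation_interval=5,
                early_stopping_pat=math.inf):
    """Hypothetical loop; returns the number of training steps performed."""
    best_valid_loss = math.inf
    steps_without_improvement = 0
    batches = itertools.cycle(train_batches)  # restart the data when exhausted

    for step in range(1, num_training_steps + 1):
        X, y = next(batches)
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        # Rescale gradients if their total norm exceeds grad_clip.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip)
        optimizer.step()

        if step % validation_interval == 0:
            model.eval()
            with torch.no_grad():
                valid_loss = sum(loss_fn(model(Xv), yv).item() for Xv, yv in valid_batches)
            if valid_loss < best_valid_loss:
                best_valid_loss, steps_without_improvement = valid_loss, 0
            else:
                # Patience is counted in training steps, not validations.
                steps_without_improvement += validation_interval
                if steps_without_improvement >= early_stopping_pat:
                    return step
    return num_training_steps


torch.manual_seed(0)
model = nn.Linear(4, 2)
train_batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
steps_run = train_steps(model, nn.CrossEntropyLoss(), optimizer, train_batches,
                        train_batches[:1], num_training_steps=20)
```

With the default infinite patience, the loop always runs all num_training_steps; a finite patience stops training once the validation loss has not improved for that many steps.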

get_loss(X: Tensor, y: Tensor, wandb_run: Optional[Run] = None, **kwargs) → Tensor

Get loss for a single batch. This just uses cross-entropy loss, but can be adjusted in subclasses by overriding this function.

Parameters:

X: torch.Tensor: Batch input.
y: torch.Tensor: Batch labels.
wandb_run: Optional[WandBRun] = None: Weights and Biases run to track training statistics.

Returns:

torch.Tensor: Batch loss.
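Because get_logits() always keeps a sequence axis, a single cross-entropy call can cover both sequence classification and sequence labelling once that axis is flattened. The helper below is a hypothetical sketch for single-prediction logits, not the package's get_loss(); note that torch.nn.functional.cross_entropy ignores labels equal to -100 by default, which is convenient for padded positions.

```python
import torch
import torch.nn.functional as F


def sequence_cross_entropy(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for logits of shape (batch_size, seq_len, output_size).

    y has shape (batch_size, seq_len) for sequence labelling, or (batch_size,)
    for sequence classification, where seq_len is 1.
    """
    output_size = logits.shape[-1]
    # Flatten to (batch_size * seq_len, output_size) and (batch_size * seq_len,).
    return F.cross_entropy(logits.reshape(-1, output_size), y.reshape(-1))


# Uniform logits over 3 classes give a loss of log(3) in both settings.
labelling_loss = sequence_cross_entropy(torch.zeros(2, 5, 3), torch.zeros(2, 5, dtype=torch.long))
classification_loss = sequence_cross_entropy(torch.zeros(2, 1, 3), torch.tensor([0, 2]))
```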

get_uncertainty(input_: LongTensor, *args, metric_name: Optional[str] = None, **kwargs) → FloatTensor

Get the uncertainty scores for the current batch.

Parameters:

input_: torch.LongTensor: (Batch of) Indexed input sequences.
metric_name: Optional[str]: Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.

Returns:

torch.FloatTensor: Uncertainty scores for the current batch.

static load(model_path: str)

Load model from path.

Parameters:

model_path: str: Path the model was saved to.

Returns:

Model: Loaded model.

predict(X: Tensor, *args, **kwargs) → Tensor

Make a prediction for some input.

Parameters:

X: torch.Tensor: Input data points.

Returns:

torch.Tensor: Predictions.

to(device: Union[device, str])

Move model to another device.

Parameters:

device: Device: Device the model should be moved to.

class nlp_uncertainty_zoo.models.model.Module(num_layers: int, vocab_size: int, input_size: int, hidden_size: int, output_size: int, is_sequence_classifier: bool, device: Union[device, str], **build_params)

Bases: ABC, torch.nn.Module

Abstract module class, defining how the forward pass of a model looks.

property available_uncertainty_metrics: Dict[str, Callable]

Return a dictionary of all available uncertainty metrics of the current model.

abstract forward(input_: LongTensor, *args, **kwargs) → FloatTensor

Forward pass of the model.

Parameters:

input_: torch.LongTensor: (Batch of) Indexed input sequences.

Returns:

torch.FloatTensor: Output predictions for input.

abstract get_hidden_representation(input_: LongTensor, *args, **kwargs) → FloatTensor

Obtain hidden representations for the current input.

Parameters:

input_: torch.LongTensor: Input ids for a sentence.

Returns:

torch.FloatTensor: Representation for the current sequence.

abstract get_logits(input_: LongTensor, *args, **kwargs) → FloatTensor

Get the logits for an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type. Used to create inputs for the uncertainty metrics defined in nlp_uncertainty_zoo.metrics.

Parameters:

input_: torch.LongTensor: (Batch of) Indexed input sequences.

Returns:

torch.FloatTensor: Logits for current input.

get_num_learnable_parameters() → int

Return the total number of (learnable) parameters in the model.

Returns:

int: Number of learnable parameters.
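A common way to compute this count in PyTorch is sketched below; that get_num_learnable_parameters() is implemented exactly this way is an assumption, and the function name count_learnable_parameters is hypothetical.

```python
import torch.nn as nn


def count_learnable_parameters(module: nn.Module) -> int:
    # Sum the element counts of all parameters that receive gradients.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


model = nn.Sequential(nn.Embedding(100, 8), nn.Linear(8, 3))
print(count_learnable_parameters(model))  # 827 = 100*8 + (8*3 + 3)

model[0].weight.requires_grad_(False)  # freeze the embeddings
print(count_learnable_parameters(model))  # 27
```

Filtering on requires_grad is what distinguishes learnable parameters from frozen ones, such as fixed pretrained embeddings.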

get_sequence_representation(input_: LongTensor, *args, **kwargs) → FloatTensor

Define how the representation for an entire sequence is extracted from the input ids. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM, or the first hidden state for a transformer, followed by a pooler layer.

Parameters:

input_: torch.LongTensor: Input ids for a sentence.

Returns:

torch.FloatTensor: Representation for the current sequence.

abstract get_sequence_representation_from_hidden(hidden: FloatTensor) → FloatTensor

Define how the representation for an entire sequence is extracted from a number of hidden states. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM, or the first hidden state for a transformer, followed by a pooler layer.

Parameters:

hidden: torch.FloatTensor: Hidden states of a model for a sequence.

Returns:

torch.FloatTensor: Representation for the current sequence.
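The two strategies described above can be sketched on a tensor of hidden states as follows. The Linear + Tanh pooler mirrors the pooler used in BERT; the padding-aware variant and its lengths tensor are additions for illustration and are not described in the source.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = torch.randn(4, 10, 16)  # (batch_size, seq_len, hidden_size)

# Unidirectional LSTM: the last time step has seen the whole sequence.
lstm_repr = hidden[:, -1, :]  # (batch_size, hidden_size)

# With right-padded batches, pick each sequence's last non-padding step instead.
lengths = torch.tensor([10, 7, 3, 10])  # hypothetical true sequence lengths
lstm_repr_padded = hidden[torch.arange(hidden.size(0)), lengths - 1]

# Transformer/BERT-style: first time step ([CLS]) passed through a pooler.
pooler = nn.Sequential(nn.Linear(16, 16), nn.Tanh())
transformer_repr = pooler(hidden[:, 0, :])  # (batch_size, hidden_size)
```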

get_uncertainty(input_: LongTensor, metric_name: Optional[str] = None, **kwargs) → FloatTensor

Get the uncertainty scores for the current batch.

Parameters:

input_: torch.LongTensor: (Batch of) Indexed input sequences.
metric_name: Optional[str]: Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.

Returns:

torch.FloatTensor: Uncertainty scores for the current batch.
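As an illustration of the kind of metric that default_uncertainty_metric usually refers to, the sketch below computes predictive entropy from predicted probabilities, plus mutual information for multi-prediction models. These function names are hypothetical and need not match the metric keys the package uses; in this sketch each metric reduces the class axis, giving one score per sequence position.

```python
import torch


def entropy(probs: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # Entropy over the class axis (last dimension).
    return -(probs * torch.log(probs + eps)).sum(dim=-1)


def predictive_entropy(probs: torch.Tensor) -> torch.Tensor:
    """probs: (batch_size, [num_predictions,] seq_len, output_size)."""
    if probs.dim() == 4:
        probs = probs.mean(dim=1)  # average the predicted distributions
    return entropy(probs)  # (batch_size, seq_len)


def mutual_information(probs: torch.Tensor) -> torch.Tensor:
    """Multi-prediction only: predictive entropy minus expected entropy."""
    expected_entropy = entropy(probs).mean(dim=1)
    return predictive_entropy(probs) - expected_entropy


# Two predictions that confidently disagree: shape (1, 2, 1, 2).
disagreeing = torch.tensor([[[[1.0, 0.0]], [[0.0, 1.0]]]])
scores = predictive_entropy(disagreeing)  # approximately log(2) for the single position
```

Mutual information is close to zero when all predictions agree and grows with their disagreement, which is why it is only meaningful for models with multiple predictions.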

predict(input_: LongTensor, *args, **kwargs) → FloatTensor

Output a probability distribution over classes given an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type.

Parameters:

input_: torch.LongTensor: (Batch of) Indexed input sequences.

Returns:

torch.FloatTensor: Class probabilities for current input.
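The relation between get_logits() and predict() amounts to a softmax over the last axis, which applies unchanged to multi-prediction outputs. The snippet uses random tensors as stand-ins for model logits; averaging over the num_predictions axis to obtain one distribution per position is a common convention, assumed here rather than taken from the package.

```python
import torch

torch.manual_seed(0)
single_logits = torch.randn(4, 10, 3)    # (batch_size, seq_len, output_size)
multi_logits = torch.randn(4, 5, 10, 3)  # (batch_size, num_predictions, seq_len, output_size)

single_probs = torch.softmax(single_logits, dim=-1)  # same shape, last axis sums to 1
multi_probs = torch.softmax(multi_logits, dim=-1)

# Average over the prediction axis, then pick the most likely class per position.
mean_probs = multi_probs.mean(dim=1)           # (batch_size, seq_len, output_size)
predicted_classes = mean_probs.argmax(dim=-1)  # (batch_size, seq_len)
```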

training: bool

class nlp_uncertainty_zoo.models.model.MultiPredictionMixin(num_predictions: int)

Bases: object

Mixin class that is used to bundle certain methods for modules that use multiple predictions to estimate uncertainty.
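One way such a mixin can be used is MC Dropout: keep dropout active at inference and stack num_predictions stochastic forward passes on a new axis, producing the batch_size x num_predictions x seq_len x output_size layout described for get_logits(). The classes ToyMultiPredictionMixin and DropoutClassifier below are hypothetical sketches of that idea, not the package's MultiPredictionMixin; for brevity the toy model takes float features instead of token ids.

```python
import torch
import torch.nn as nn


class ToyMultiPredictionMixin:
    """Hypothetical mixin collecting num_predictions stochastic forward passes."""

    def __init__(self, num_predictions: int):
        self.num_predictions = num_predictions

    def get_multi_logits(self, input_: torch.Tensor) -> torch.Tensor:
        # Keep dropout active (MC Dropout), then restore the previous mode.
        was_training = self.training
        self.train()
        with torch.no_grad():
            # Stack passes on axis 1: (batch_size, num_predictions, seq_len, output_size)
            logits = torch.stack([self(input_) for _ in range(self.num_predictions)], dim=1)
        self.train(was_training)
        return logits


class DropoutClassifier(ToyMultiPredictionMixin, nn.Module):
    def __init__(self, input_size: int, output_size: int, num_predictions: int):
        nn.Module.__init__(self)
        ToyMultiPredictionMixin.__init__(self, num_predictions)
        self.net = nn.Sequential(nn.Dropout(0.5), nn.Linear(input_size, output_size))

    def forward(self, input_: torch.Tensor) -> torch.Tensor:
        # (batch_size, seq_len, input_size) -> (batch_size, seq_len, output_size)
        return self.net(input_)


model = DropoutClassifier(input_size=8, output_size=3, num_predictions=5)
multi_logits = model.get_multi_logits(torch.randn(2, 4, 8))  # shape (2, 5, 4, 3)
```

Because each pass samples a different dropout mask, the stacked predictions differ, and their spread can feed multi-prediction metrics such as mutual information.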