Models

This module includes abstractions for the model implementation in this package, defining the basic signature of the __init__() method and some mandatory methods that have to be implemented by subclasses.

The idea is to provide all implemented models in two versions to accommodate different users:

The Module class

The nlp_uncertainty_zoo.models.model.Module class mirrors PyTorch’s `nn.Module <https://pytorch.org/docs/stable/generated/torch.nn.Module.html>`_ class and contains only the bare-bones logic of the model. By default, input parameters include the number of layers (num_layers), as well as the sizes of the model vocabulary (vocab_size), embeddings (input_size), hidden activations (hidden_size) and the number of classes (output_size).

One also needs to specify the PyTorch device (like “cpu” or “cuda”) and whether the model is used for sequence classification (is_sequence_classifier=True) or sequence labelling (is_sequence_classifier=False). Other kinds of tasks, such as regression, parsing or generation, are not supported at the moment.

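The following methods, which are marked as abstract in the reference further down, have to be implemented by every subclass of Module: forward(), get_hidden_representation(), get_logits() and get_sequence_representation_from_hidden().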

The Model class

The nlp_uncertainty_zoo.models.model.Model class is intended as a complete drop-in solution for anyone who does not want to write training logic and similar aspects themselves. As input parameters, nlp_uncertainty_zoo.models.model.Model expects the name of the model as a string, a reference to the nlp_uncertainty_zoo.models.model.Module subclass to instantiate, and the model parameters as a dictionary of keyword arguments that is passed to the nlp_uncertainty_zoo.models.model.Module.__init__() function of that subclass. The model_dir is an optional argument that specifies the path to which the model is saved during training.

The user mainly interacts with the nlp_uncertainty_zoo.models.model.Model class using the nlp_uncertainty_zoo.models.model.Model.fit(), nlp_uncertainty_zoo.models.model.Model.predict() and nlp_uncertainty_zoo.models.model.Model.get_uncertainty() functions, where the latter two mirror the function implementations in Module. nlp_uncertainty_zoo.models.model.Model.fit() expects the training set and, optionally, a validation set as `torch.utils.data.DataLoader <https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader>`_ instances.

Models Documentation

Define common methods of models. This is done by separating the logic into two parts:
  • Module: This class only defines the model architecture and forward pass. This is also done so that others can easily copy and adapt the code if necessary.

  • Model: This wrapper class defines all the other logic necessary to use a model in practice: Training, loss computation, saving and loading, etc.
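
The split described above can be sketched in plain Python without any dependencies. The names ToyModule, ConstantModule and ToyModel are hypothetical stand-ins and not the package's classes; the sketch only shows the pattern of an abstract architecture class plus a wrapper that builds it from a class reference and keyword arguments.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Type


class ToyModule(ABC):
    """Hypothetical stand-in for the architecture-only class."""

    def __init__(self, output_size: int, **build_params: Any):
        self.output_size = output_size
        self.build_params = build_params

    @abstractmethod
    def forward(self, input_: List[int]) -> List[float]:
        """Map input ids to one score per class."""


class ConstantModule(ToyModule):
    """Trivial subclass that returns the same score for every class."""

    def forward(self, input_: List[int]) -> List[float]:
        return [0.0] * self.output_size


class ToyModel:
    """Hypothetical stand-in for the wrapper around a module."""

    def __init__(self, model_name: str, module_class: Type[ToyModule], **model_params: Any):
        self.model_name = model_name
        # The wrapper instantiates the module from the class and its keyword arguments.
        self.module = module_class(**model_params)

    def predict(self, input_: List[int]) -> List[float]:
        # Delegate to the module; training, saving and loading would also live in the wrapper.
        return self.module.forward(input_)


model = ToyModel("constant", ConstantModule, output_size=3)
scores = model.predict([4, 8, 15])
```

Keeping the architecture in its own class means it can be copied and adapted without pulling in the training code that lives in the wrapper.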

class nlp_uncertainty_zoo.models.model.Model(model_name: str, module_class: type, lr: float, weight_decay: float, optimizer_class: Type[Optimizer] = torch.optim.Adam, scheduler_class: Optional[Type[_LRScheduler]] = None, scheduler_kwargs: Optional[Dict[str, Any]] = None, model_dir: Optional[str] = None, device: Union[device, str] = 'cpu', **model_params)

Bases: ABC

Abstract model class. It is a wrapper that defines data loading, batching, training and evaluation loops, so that the core module class only needs to define the model’s forward pass.

property available_uncertainty_metrics: Dict[str, Callable]

Return a dictionary of all available uncertainty metrics of the current model.

compute_loss_weights(train_split: DataLoader, *args, **kwargs) → FloatTensor

Compute loss weights for unbalanced training sets.

Parameters:
train_split: DataLoader
Returns:
torch.FloatTensor

Tensor containing loss weights (should be of size K).

eval(data_split: DataLoader, wandb_run: Optional[Run] = None) → Tensor

Evaluate a data split.

Parameters:
data_split: DataLoader

Data split the model should be evaluated on.

wandb_run: Optional[WandBRun]

Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.

Returns:
torch.Tensor

Loss on evaluation split.

fit(train_split: DataLoader, num_training_steps: int, valid_split: Optional[DataLoader] = None, weight_loss: bool = False, grad_clip: float = 10, validation_interval: Optional[int] = None, early_stopping_pat: int = inf, early_stopping: bool = False, verbose: bool = True, wandb_run: Optional[Run] = None, **training_kwargs)

Fit the model to training data.

Parameters:
train_split: DataLoader

Dataset the model is being trained on.

num_training_steps: int

Number of training steps until completion.

valid_split: Optional[DataLoader]

Validation set the model is being evaluated on if given.

verbose: bool

Whether to display information about current loss.

weight_loss: bool

Weight classes in loss function. Default is False.

grad_clip: float

Threshold for the gradient norm of the parameters; gradients with a larger norm are clipped. Default is 10.

validation_interval: Optional[int]

Interval of training steps between validations on the validation set. If None, the model is evaluated after each pass through the training data.

early_stopping_pat: int

Patience in number of training steps before early stopping kicks in. Default is np.inf.

early_stopping: bool

Whether early stopping should be used. Default is False.

wandb_run: Optional[WandBRun]

Weights and Biases run to track training statistics. Training and validation loss (if applicable) are tracked by default, everything else is defined in _epoch_iter() and _finetune() depending on the model.
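
The interplay of num_training_steps, validation_interval, early_stopping and early_stopping_pat can be sketched as a plain loop. This is an illustration under stated assumptions, not the package's training loop: the function run_training and its validate callback are hypothetical, and patience is assumed to count training steps since the last improvement of the validation loss.

```python
import math
from typing import Callable


def run_training(
    num_training_steps: int,
    validate: Callable[[int], float],
    validation_interval: int,
    early_stopping: bool = False,
    early_stopping_pat: float = math.inf,
) -> int:
    """Return the number of training steps actually performed."""
    best_loss = math.inf
    steps_since_improvement = 0

    for step in range(1, num_training_steps + 1):
        # One optimizer update on a training batch would happen here.
        steps_since_improvement += 1

        if step % validation_interval == 0:
            val_loss = validate(step)
            if val_loss < best_loss:
                best_loss = val_loss
                steps_since_improvement = 0

        # Stop once the validation loss has not improved for longer than the patience.
        if early_stopping and steps_since_improvement > early_stopping_pat:
            return step

    return num_training_steps


# Validation loss improves until step 20 and gets worse afterwards.
losses = {10: 1.0, 20: 0.5, 30: 0.6, 40: 0.7, 50: 0.8}
stopped_at = run_training(
    50, lambda step: losses[step], validation_interval=10,
    early_stopping=True, early_stopping_pat=15,
)
full_run = run_training(50, lambda step: losses[step], validation_interval=10)
```

With the default patience of infinity, the stopping condition can never trigger, which is consistent with early stopping being off by default.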

get_loss(X: Tensor, y: Tensor, wandb_run: Optional[Run] = None, **kwargs) → Tensor

Get loss for a single batch. This just uses cross-entropy loss, but can be adjusted in subclasses by overriding this function.

Parameters:
X: torch.Tensor

Batch input.

y: torch.Tensor

Batch labels.

wandb_run: Optional[WandBRun] = None

Weights and Biases run to track training statistics.

Returns:
torch.Tensor

Batch loss.
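
For reference, the cross-entropy loss mentioned above can be written out in plain Python. This is a dependency-free sketch of the standard formula (mean negative log-likelihood of the gold labels under a softmax), not the package's implementation; the function name cross_entropy and the list-based inputs are illustrative.

```python
import math
from typing import List


def cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood of the gold labels under softmax(logits)."""
    total = 0.0
    for row, label in zip(logits, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(row)
        log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for v in row))
        total += log_norm - row[label]
    return total / len(labels)


# Two examples with two classes each; both have gold label 0.
loss = cross_entropy([[0.0, 0.0], [2.0, 0.0]], [0, 0])
```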

get_uncertainty(input_: LongTensor, *args, metric_name: Optional[str] = None, **kwargs) → FloatTensor

Get the uncertainty scores for the current batch.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

metric_name: Optional[str]

Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.

Returns:
torch.FloatTensor

Uncertainty scores for the current batch.
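
The lookup by metric_name with a fallback to a default can be illustrated with a small registry. The sketch below works on plain lists of class probabilities; the metric names, the two metric functions and the free function get_uncertainty are hypothetical and only show the dispatch pattern suggested by available_uncertainty_metrics and default_uncertainty_metric.

```python
import math
from typing import Callable, Dict, List, Optional

Probs = List[float]


def predictive_entropy(probs: Probs) -> float:
    """Shannon entropy of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_prob(probs: Probs) -> float:
    """One minus the top probability, so that higher means more uncertain."""
    return 1.0 - max(probs)


available_uncertainty_metrics: Dict[str, Callable[[Probs], float]] = {
    "predictive_entropy": predictive_entropy,
    "max_prob": max_prob,
}
default_uncertainty_metric = "predictive_entropy"


def get_uncertainty(batch_probs: List[Probs], metric_name: Optional[str] = None) -> List[float]:
    # Fall back to the default metric when no name is given.
    if metric_name is None:
        metric_name = default_uncertainty_metric
    if metric_name not in available_uncertainty_metrics:
        raise KeyError(f"Unknown uncertainty metric: {metric_name}")
    metric = available_uncertainty_metrics[metric_name]
    return [metric(probs) for probs in batch_probs]


# A maximally uncertain and a fully confident binary prediction.
scores = get_uncertainty([[0.5, 0.5], [1.0, 0.0]])
```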

static load(model_path: str)

Load model from path.

Parameters:
model_path: str

Path model was saved to.

Returns:
Model

Loaded model.

predict(X: Tensor, *args, **kwargs) → Tensor

Make a prediction for some input.

Parameters:
X: torch.Tensor

Input data points.

Returns:
torch.Tensor

Predictions.

to(device: Union[device, str])

Move model to another device.

Parameters:
device: Union[torch.device, str]

Device the model should be moved to.

class nlp_uncertainty_zoo.models.model.Module(num_layers: int, vocab_size: int, input_size: int, hidden_size: int, output_size: int, is_sequence_classifier: bool, device: Union[device, str], **build_params)

Bases: ABC, torch.nn.Module

Abstract module class, defining how the forward pass of a model looks.

property available_uncertainty_metrics: Dict[str, Callable]

Return a dictionary of all available uncertainty metrics of the current model.

abstract forward(input_: LongTensor, *args, **kwargs) → FloatTensor

Forward pass of the model.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

Returns:
torch.FloatTensor

Output predictions for input.

abstract get_hidden_representation(input_: LongTensor, *args, **kwargs) → FloatTensor

Obtain hidden representations for the current input.

Parameters:
input_: torch.LongTensor

Input ids for a sentence.

Returns:
torch.FloatTensor

Representation for the current sequence.

abstract get_logits(input_: LongTensor, *args, **kwargs) → FloatTensor

Get the logits for an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type. Used to create inputs for the uncertainty metrics defined in nlp_uncertainty_zoo.metrics.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

Returns:
torch.FloatTensor

Logits for current input.

get_num_learnable_parameters() → int

Return the total number of (learnable) parameters in the model.

Returns:
int

Number of learnable parameters.

get_sequence_representation(input_: LongTensor, *args, **kwargs) → FloatTensor

Define how the representation for an entire sequence is extracted from the input ids. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM or the first hidden state for a transformer, adding a pooler layer.

Parameters:
input_: torch.LongTensor

Input ids for a sentence.

Returns:
torch.FloatTensor

Representation for the current sequence.

abstract get_sequence_representation_from_hidden(hidden: FloatTensor) → FloatTensor

Define how the representation for an entire sequence is extracted from a number of hidden states. This is relevant in sequence classification. For example, this could be the last hidden state for a unidirectional LSTM or the first hidden state for a transformer, adding a pooler layer.

Parameters:
hidden: torch.FloatTensor

Hidden states of a model for a sequence.

Returns:
torch.FloatTensor

Representation for the current sequence.
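
The two pooling choices named in the description can be shown on nested lists standing in for hidden states of shape batch_size x seq_len x hidden_size. The helper names below are hypothetical, padding is ignored and no pooler layer is applied, so this is only a sketch of the indexing involved.

```python
from typing import List

# batch_size x seq_len x hidden_size, as nested lists
Hidden = List[List[List[float]]]


def last_hidden_state(hidden: Hidden) -> List[List[float]]:
    """Take the final time step, e.g. for a unidirectional LSTM."""
    return [sequence[-1] for sequence in hidden]


def first_hidden_state(hidden: Hidden) -> List[List[float]]:
    """Take the first time step, e.g. for a transformer's leading token."""
    return [sequence[0] for sequence in hidden]


# One sequence with three time steps and hidden size two.
hidden = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]]
last = last_hidden_state(hidden)
first = first_hidden_state(hidden)
```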

get_uncertainty(input_: LongTensor, metric_name: Optional[str] = None, **kwargs) → FloatTensor

Get the uncertainty scores for the current batch.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

metric_name: Optional[str]

Name of uncertainty metric being used. If None, use metric defined under the default_uncertainty_metric attribute.

Returns:
torch.FloatTensor

Uncertainty scores for the current batch.

predict(input_: LongTensor, *args, **kwargs) → FloatTensor

Output a probability distribution over classes given an input. Results in a tensor of size batch_size x seq_len x output_size or batch_size x num_predictions x seq_len x output_size depending on the model type.

Parameters:
input_: torch.LongTensor

(Batch of) Indexed input sequences.

Returns:
torch.FloatTensor

Predicted probability distribution over classes for the current input.
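
The relation between get_logits() and predict() can be illustrated with a softmax over the last dimension. The sketch below assumes that the probabilities are obtained by a plain softmax over output_size, which is the usual choice but is not confirmed here, and uses nested lists in place of tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn a vector of logits into a probability distribution."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - max_logit) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# batch_size=1, seq_len=2, output_size=3
logits = [[[1.0, 1.0, 1.0], [0.0, 2.0, 0.0]]]
# Apply the softmax to every token; the shape is unchanged.
probs = [[softmax(token) for token in sequence] for sequence in logits]
```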

training: bool

class nlp_uncertainty_zoo.models.model.MultiPredictionMixin(num_predictions: int)

Bases: object

Mixin class that is used to bundle certain methods for modules that use multiple predictions to estimate uncertainty.
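
As a sketch of what working with multiple predictions can look like, the snippet below aggregates num_predictions probability vectors for a single token into a mean prediction and a simple disagreement score. The function names and the choice of variance as the score are assumptions for illustration, not methods of the mixin.

```python
from typing import List


def mean_prediction(predictions: List[List[float]]) -> List[float]:
    """Average num_predictions probability vectors of length output_size."""
    num_predictions = len(predictions)
    return [sum(column) / num_predictions for column in zip(*predictions)]


def mean_variance(predictions: List[List[float]]) -> float:
    """Variance across predictions, averaged over classes."""
    mean = mean_prediction(predictions)
    num_predictions = len(predictions)
    per_class = [
        sum((p[k] - mean[k]) ** 2 for p in predictions) / num_predictions
        for k in range(len(mean))
    ]
    return sum(per_class) / len(per_class)


# Two predictions (e.g. two ensemble members) for a binary decision.
samples = [[0.9, 0.1], [0.5, 0.5]]
avg = mean_prediction(samples)
disagreement = mean_variance(samples)
```

Predictions that agree exactly give a score of zero, so larger values indicate more disagreement between the individual predictions.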