spuco.utils
Utility classes and functions.
Trainer
- class Trainer(trainset: Dataset, model: Module, batch_size: int, optimizer: Optimizer, lr_scheduler: _LRScheduler | None = None, max_grad_norm: float | None = None, criterion: Module = CrossEntropyLoss(), forward_pass: Callable[[Any], Tuple[Tensor, Tensor, Tensor]] | None = None, sampler: Sampler | None = None, device: device = device(type='cpu'), verbose: bool = False)
Bases: object
- __init__(trainset: Dataset, model: Module, batch_size: int, optimizer: Optimizer, lr_scheduler: _LRScheduler | None = None, max_grad_norm: float | None = None, criterion: Module = CrossEntropyLoss(), forward_pass: Callable[[Any], Tuple[Tensor, Tensor, Tensor]] | None = None, sampler: Sampler | None = None, device: device = device(type='cpu'), verbose: bool = False) None
Initializes an instance of the Trainer class.
- Parameters:
trainset (torch.utils.data.Dataset) – The training set.
model (torch.nn.Module) – The PyTorch model to train.
batch_size (int) – The batch size to use during training.
optimizer (torch.optim.Optimizer) – The optimizer to use for training.
lr_scheduler (torch.optim.lr_scheduler._LRScheduler, optional) – The learning rate scheduler to use during training. Default is None.
max_grad_norm (float, optional) – The maximum gradient norm used for gradient clipping. Default is None.
criterion (torch.nn.Module, optional) – The loss function to use during training. Default is nn.CrossEntropyLoss().
forward_pass (Callable[[Any], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]], optional) – The forward pass function to use during training. Default is None.
sampler (torch.utils.data.Sampler, optional) – The sampler to use for creating batches. Default is None.
device (torch.device, optional) – The device to use for computations. Default is torch.device("cpu").
verbose (bool, optional) – Whether to print training progress. Default is False.
- train(num_epochs: int)
Trains the model for the given number of epochs.
- Parameters:
num_epochs (int) – The number of epochs to train for.
- train_epoch(epoch: int) None
Trains the PyTorch model for one epoch.
- Parameters:
epoch (int) – The number of the epoch being trained (used only for logging).
- static compute_accuracy(outputs: Tensor, labels: Tensor) float
Computes the accuracy of the PyTorch model.
- Parameters:
outputs (torch.Tensor) – The predicted outputs of the model.
labels (torch.Tensor) – The ground truth labels.
- Returns:
The accuracy of the model.
- Return type:
float
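As an illustration of the accuracy computation described above, here is a minimal plain-Python sketch. It assumes that `outputs` holds one row of class scores per example and that the predicted class is the highest-scoring one. The helper name `accuracy_from_logits` is hypothetical, and the sketch returns a fraction in [0, 1]; whether the library reports a fraction or a percentage is not stated here.

```python
from typing import Sequence


def accuracy_from_logits(
    outputs: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Fraction of rows whose highest-scoring class equals the label.

    Hypothetical helper: a plain-Python stand-in for a tensor-based accuracy.
    """
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = 0
    for scores, label in zip(outputs, labels):
        # argmax over the class dimension
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Four examples, two classes; the third example is misclassified.
logits = [[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.4]]
targets = [0, 1, 1, 1]
batch_accuracy = accuracy_from_logits(logits, targets)
```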
- get_trainset_outputs()
Gets the model's outputs on the training set.
Custom Indices Sampler
Exemplar Clustering (K-Medoids)
- cluster_by_exemplars(similarity_matrix, num_exemplars, verbose=False) Dict[int, List[int]]
Returns a dictionary mapping each exemplar index to the list of sample indices assigned to it.
- closest_exemplar(sample_index, exemplar_indices, similarity_matrix)
Finds the closest exemplar to a given sample index.
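The assignment step behind these two functions can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that larger entries in `similarity_matrix` mean "more similar" and that the exemplars have already been chosen. How the exemplars themselves are selected is not shown. The names `nearest_exemplar` and `assign_to_exemplars` are hypothetical.

```python
from typing import Dict, List, Sequence


def nearest_exemplar(
    sample_index: int,
    exemplar_indices: Sequence[int],
    similarity_matrix: Sequence[Sequence[float]],
) -> int:
    """Return the exemplar with the highest similarity to the sample."""
    return max(exemplar_indices, key=lambda e: similarity_matrix[sample_index][e])


def assign_to_exemplars(
    exemplar_indices: Sequence[int],
    similarity_matrix: Sequence[Sequence[float]],
) -> Dict[int, List[int]]:
    """Group every sample index under its most similar exemplar."""
    clusters: Dict[int, List[int]] = {e: [] for e in exemplar_indices}
    for sample_index in range(len(similarity_matrix)):
        exemplar = nearest_exemplar(sample_index, exemplar_indices, similarity_matrix)
        clusters[exemplar].append(sample_index)
    return clusters


# Toy 4x4 similarity matrix with two obvious groups: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = assign_to_exemplars([0, 2], sim)
```

With exemplars 0 and 2, samples 0 and 1 fall under exemplar 0, and samples 2 and 3 fall under exemplar 2.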
Miscellaneous Functions
- convert_labels_to_partition(labels: List[int]) Dict[int, List[int]]
Converts a list of labels into a partition dictionary.
- convert_partition_to_labels(partition: Dict[int, List[int]]) List[int]
Converts a partition dictionary into a list of labels.
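The two conversions above are inverses of each other. The sketch below shows one way they could work, assuming that a partition maps each label to the list of example indices carrying that label and that the indices cover 0 to N-1 exactly once. The function names are hypothetical stand-ins, not the library's code.

```python
from typing import Dict, List


def labels_to_partition(labels: List[int]) -> Dict[int, List[int]]:
    """Map each label to the indices of the examples that carry it."""
    partition: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        partition.setdefault(label, []).append(index)
    return partition


def partition_to_labels(partition: Dict[int, List[int]]) -> List[int]:
    """Rebuild the flat label list; assumes indices cover 0..N-1 exactly once."""
    size = sum(len(indices) for indices in partition.values())
    labels = [-1] * size  # -1 is a placeholder that is overwritten below
    for label, indices in partition.items():
        for index in indices:
            labels[index] = label
    return labels


example_labels = [0, 1, 0, 2, 1]
example_partition = labels_to_partition(example_labels)
round_trip = partition_to_labels(example_partition)
```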
- label_examples(unlabled_dataloader: DataLoader, model: Module, device: device)
Labels examples using a trained model.
- Parameters:
unlabeled_dataloader (torch.utils.data.DataLoader) – Dataloader containing unlabeled examples.
model (torch.nn.Module) – Trained model for labeling examples.
device (torch.device) – Device to use for computations.
- Returns:
List of predicted labels.
- Return type:
List[int]
- pairwise_similarity(Z1: tensor, Z2: tensor, block_size: int = 1024)
Computes pairwise similarity between two sets of embeddings.
- Parameters:
Z1 (torch.Tensor) – Tensor containing the first set of embeddings.
Z2 (torch.Tensor) – Tensor containing the second set of embeddings.
block_size (int) – Size of the blocks for computing similarity. Default is 1024.
- Returns:
Pairwise similarity matrix.
- Return type:
np.ndarray
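To illustrate what `block_size` controls, here is a rough NumPy sketch of a block-wise similarity computation. It assumes cosine similarity, which the description above does not specify, and it takes NumPy arrays rather than tensors to stay self-contained. The name `blockwise_cosine_similarity` is hypothetical.

```python
import numpy as np


def blockwise_cosine_similarity(
    z1: np.ndarray, z2: np.ndarray, block_size: int = 1024
) -> np.ndarray:
    """Cosine similarity between every row of z1 and every row of z2.

    The result is filled one (block_size x block_size) tile at a time.
    """
    # Normalize rows once so a dot product of two rows is their cosine similarity.
    z1n = z1 / np.clip(np.linalg.norm(z1, axis=1, keepdims=True), 1e-12, None)
    z2n = z2 / np.clip(np.linalg.norm(z2, axis=1, keepdims=True), 1e-12, None)

    out = np.empty((z1.shape[0], z2.shape[0]), dtype=z1n.dtype)
    for i in range(0, z1.shape[0], block_size):
        for j in range(0, z2.shape[0], block_size):
            # Slices past the end of an array are truncated automatically.
            out[i:i + block_size, j:j + block_size] = (
                z1n[i:i + block_size] @ z2n[j:j + block_size].T
            )
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))
b = rng.normal(size=(7, 3))
blocked = blockwise_cosine_similarity(a, b, block_size=2)
full = blockwise_cosine_similarity(a, b, block_size=1024)
```

The block size does not change the result, only how much is computed at once. Smaller blocks keep each intermediate tile small, which matters most when the per-block computation materializes large temporaries or runs on a device with limited memory.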