spuco.utils

Utility classes and functions.

Trainer

class Trainer(trainset: Dataset, model: Module, batch_size: int, optimizer: Optimizer, lr_scheduler: _LRScheduler | None = None, max_grad_norm: float | None = None, criterion: Module = CrossEntropyLoss(), forward_pass: Callable[[Any], Tuple[Tensor, Tensor, Tensor]] | None = None, sampler: Sampler | None = None, device: device = device(type='cpu'), verbose: bool = False)

Bases: object

__init__(trainset: Dataset, model: Module, batch_size: int, optimizer: Optimizer, lr_scheduler: _LRScheduler | None = None, max_grad_norm: float | None = None, criterion: Module = CrossEntropyLoss(), forward_pass: Callable[[Any], Tuple[Tensor, Tensor, Tensor]] | None = None, sampler: Sampler | None = None, device: device = device(type='cpu'), verbose: bool = False) → None

Initializes an instance of the Trainer class.

Parameters:

trainset (torch.utils.data.Dataset) – The training set.
model (torch.nn.Module) – The PyTorch model to train.
batch_size (int) – The batch size to use during training.
optimizer (torch.optim.Optimizer) – The optimizer to use for training.
lr_scheduler (torch.optim.lr_scheduler._LRScheduler, optional) – The learning rate scheduler to use during training. Default is None.
max_grad_norm (float, optional) – The maximum gradient norm, presumably used for gradient clipping. Default is None.
criterion (torch.nn.Module, optional) – The loss function to use during training. Default is nn.CrossEntropyLoss().
forward_pass (Callable[[Any], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]], optional) – The forward pass function to use during training. Default is None.
sampler (torch.utils.data.Sampler, optional) – The sampler to use for creating batches. Default is None.
device (torch.device, optional) – The device to use for computations. Default is torch.device("cpu").
verbose (bool, optional) – Whether to print training progress. Default is False.

train(num_epochs: int)

Trains the model for the given number of epochs.

Parameters:

num_epochs (int) – The number of epochs to train for.

train_epoch(epoch: int) → None

Trains the PyTorch model for one epoch.

Parameters:

epoch (int) – The number of the epoch being trained (used only for logging).

static compute_accuracy(outputs: Tensor, labels: Tensor) → float

Computes the accuracy of the PyTorch model.

Parameters:

outputs (torch.Tensor) – The predicted outputs of the model.
labels (torch.Tensor) – The ground truth labels.

Returns:

The accuracy of the model.

Return type:

float
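
To illustrate what this computation amounts to, here is a minimal sketch in plain Python. It assumes `outputs` holds one row of per-class scores per example and that the prediction is the argmax of each row. `accuracy_sketch` is a hypothetical name, lists stand in for tensors so the snippet runs without PyTorch, and the sketch returns a fraction; whether the library reports a fraction or a percentage is not stated here.

```python
from typing import Sequence


def accuracy_sketch(outputs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the label."""
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = 0
    for scores, label in zip(outputs, labels):
        # argmax over the class scores of this example
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        correct += int(predicted == label)
    return correct / len(labels)


scores = [[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.4, 0.6]]
targets = [0, 1, 1, 1]
print(accuracy_sketch(scores, targets))  # 3 of 4 predictions match -> 0.75
```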

get_trainset_outputs()

Gets the output of the model on the training set.

Custom Indices Sampler

class CustomIndicesSampler(indices: List[int], shuffle: bool = False)

Bases: Sampler[int]

Samples from the specified indices. Pass any list of indices (for example upsampled, downsampled, or group-balanced) to this class. Shuffling is off by default.

__init__(indices: List[int], shuffle: bool = False)

Samples elements from the specified indices.

Parameters:

indices (list[int]) – The list of indices to sample from.
shuffle (bool, optional) – Whether to shuffle the indices. Default is False.
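
The behaviour described above can be sketched without PyTorch as follows. `IndicesSamplerSketch` is a hypothetical stand-in, not the library class: it does not subclass `torch.utils.data.Sampler`, and the `seed` argument is an addition made here only to keep the example reproducible.

```python
import random
from typing import Iterator, List, Optional


class IndicesSamplerSketch:
    """Hypothetical stand-in for a sampler restricted to a given index list."""

    def __init__(self, indices: List[int], shuffle: bool = False,
                 seed: Optional[int] = None) -> None:
        self.indices = list(indices)
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # `seed` is not part of the documented API

    def __iter__(self) -> Iterator[int]:
        order = list(self.indices)  # copy so the stored list is never mutated
        if self.shuffle:
            self._rng.shuffle(order)
        return iter(order)

    def __len__(self) -> int:
        return len(self.indices)


# Index 2 is repeated to upsample that example.
upsampled = [0, 1, 2, 2, 2, 5]
sampler = IndicesSamplerSketch(upsampled)
print(list(sampler))  # same order as given, because shuffle is False
```

Repeating an index in the list is what makes upsampling possible: the sampler yields every entry, duplicates included. An instance of the documented class can presumably be passed as the `sampler` argument of `Trainer`.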

Exemplar Clustering (K-Medoids)

cluster_by_exemplars(similarity_matrix, num_exemplars, verbose=False) → Dict[int, List[int]]

Returns a dictionary mapping exemplar index to a list of indices.

Parameters:

similarity_matrix (numpy.ndarray) – The similarity matrix.
num_exemplars (int) – The number of exemplars to select.
verbose (bool, optional) – Whether to print progress information.

Returns:

A dictionary mapping exemplar index to a list of indices.

Return type:

dict[int, list[int]]

closest_exemplar(sample_index, exemplar_indices, similarity_matrix)

Finds the closest exemplar to a given sample index.

Parameters:

sample_index (int) – The index of the sample.
exemplar_indices (list[int]) – The indices of the exemplars.
similarity_matrix (numpy.ndarray) – The similarity matrix.

Returns:

The index of the closest exemplar and the similarity score.

Return type:

tuple[int, float]
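
The lookup and the resulting cluster assignment can be sketched as below, assuming that a higher similarity means "closer" and that each sample is assigned to its most similar exemplar. The function names are hypothetical, nested lists stand in for `numpy.ndarray`, and how the exemplars themselves are selected is not covered by this sketch.

```python
from typing import Dict, List, Sequence, Tuple


def closest_exemplar_sketch(sample_index: int,
                            exemplar_indices: Sequence[int],
                            similarity_matrix: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return the exemplar most similar to the sample, with its similarity."""
    row = similarity_matrix[sample_index]
    best = max(exemplar_indices, key=lambda e: row[e])
    return best, row[best]


def assign_to_exemplars_sketch(exemplar_indices: Sequence[int],
                               similarity_matrix: Sequence[Sequence[float]]) -> Dict[int, List[int]]:
    """Group every sample index under its closest exemplar."""
    clusters: Dict[int, List[int]] = {e: [] for e in exemplar_indices}
    for i in range(len(similarity_matrix)):
        exemplar, _ = closest_exemplar_sketch(i, exemplar_indices, similarity_matrix)
        clusters[exemplar].append(i)
    return clusters


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(closest_exemplar_sketch(1, [0, 2], sim))  # (0, 0.9)
print(assign_to_exemplars_sketch([0, 2], sim))  # {0: [0, 1], 2: [2, 3]}
```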

Miscellaneous Functions

convert_labels_to_partition(labels: List[int]) → Dict[int, List[int]]

Converts a list of labels into a partition dictionary.

Parameters:

labels (List[int]) – List of labels.

Returns:

Partition dictionary mapping labels to their corresponding indices.

Return type:

Dict[int, List[int]]
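
As an illustration of the mapping, here is a minimal sketch assuming that position `i` in `labels` is the index of example `i`. `labels_to_partition_sketch` is a hypothetical name, not the library function.

```python
from collections import defaultdict
from typing import Dict, List


def labels_to_partition_sketch(labels: List[int]) -> Dict[int, List[int]]:
    """Group example indices by their label."""
    partition: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        partition[label].append(index)
    return dict(partition)


print(labels_to_partition_sketch([1, 0, 1, 2]))  # {1: [0, 2], 0: [1], 2: [3]}
```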

convert_partition_to_labels(partition: Dict[int, List[int]]) → List[int]

Converts a partition dictionary into a list of labels.

Parameters:

partition (Dict[int, List[int]]) – Partition dictionary mapping labels to their corresponding indices.

Returns:

List of labels.

Return type:

List[int]
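
The inverse direction can be sketched as follows, assuming the partition covers the indices 0 to n-1 with each index appearing exactly once. `partition_to_labels_sketch` is a hypothetical name, and how the library handles missing or duplicated indices is not stated here.

```python
from typing import Dict, List, Optional


def partition_to_labels_sketch(partition: Dict[int, List[int]]) -> List[int]:
    """Rebuild the per-example label list from a label -> indices mapping."""
    size = sum(len(indices) for indices in partition.values())
    labels: List[Optional[int]] = [None] * size
    for label, indices in partition.items():
        for index in indices:
            labels[index] = label
    if any(label is None for label in labels):
        # Happens when an index is duplicated, so another one is never filled.
        raise ValueError("partition does not cover indices 0..n-1 exactly once")
    return labels


print(partition_to_labels_sketch({0: [1], 1: [0, 2], 2: [3]}))  # [1, 0, 1, 2]
```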

label_examples(unlabled_dataloader: DataLoader, model: Module, device: device)

Labels examples using a trained model.

Parameters:

unlabeled_dataloader (torch.utils.data.DataLoader) – Dataloader containing unlabeled examples.
model (torch.nn.Module) – Trained model for labeling examples.
device (torch.device) – Device to use for computations.

Returns:

List of predicted labels.

Return type:

List[int]

pairwise_similarity(Z1: tensor, Z2: tensor, block_size: int = 1024)

Computes pairwise similarity between two sets of embeddings.

Parameters:

Z1 (torch.tensor) – Tensor containing the first set of embeddings.
Z2 (torch.tensor) – Tensor containing the second set of embeddings.
block_size (int) – Size of the blocks for computing similarity. Default is 1024.

Returns:

Pairwise similarity matrix.

Return type:

numpy.ndarray
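
The block-wise idea can be sketched as below. Two points are assumptions, since the text above does not state them: the similarity measure is taken to be cosine similarity, and the blocks are presumably there to bound memory by comparing only `block_size` rows of each input at a time. Plain lists stand in for tensors and arrays, and `pairwise_similarity_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence


def _unit_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each row to unit length (all-zero rows are left as zeros)."""
    unit = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        unit.append([x / norm for x in row])
    return unit


def pairwise_similarity_sketch(z1: Sequence[Sequence[float]],
                               z2: Sequence[Sequence[float]],
                               block_size: int = 1024) -> List[List[float]]:
    """Cosine similarity between every row of z1 and every row of z2."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    u1, u2 = _unit_rows(z1), _unit_rows(z2)
    sim = [[0.0] * len(u2) for _ in u1]
    # Visit the output one (block_size x block_size) tile at a time.
    for i0 in range(0, len(u1), block_size):
        for j0 in range(0, len(u2), block_size):
            for i in range(i0, min(i0 + block_size, len(u1))):
                for j in range(j0, min(j0 + block_size, len(u2))):
                    sim[i][j] = sum(a * b for a, b in zip(u1[i], u2[j]))
    return sim


a = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
b = [[0.0, 1.0], [5.0, 0.0]]
result = pairwise_similarity_sketch(a, b, block_size=2)
print(result)
```

The block size changes only the order in which entries are computed, not the result, so it can be tuned to the available memory.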

get_group_ratios(indices: List[int], group_partition: Dict[Tuple[int, int], List[int]])

Returns the ratio of each group found in the given indices.

Parameters:

indices (List[int]) – The indices for which to compute group ratios.
group_partition (Dict[Tuple[int, int], List[int]]) – Dictionary mapping each group to the indices of the examples in that group.
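
A minimal sketch of the computation, under two assumptions that the signature does not confirm: the result is a dictionary from group to the fraction of `indices` belonging to that group, and groups with no member among `indices` are omitted. `group_ratios_sketch` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Group = Tuple[int, int]


def group_ratios_sketch(indices: List[int],
                        group_partition: Dict[Group, List[int]]) -> Dict[Group, float]:
    """Fraction of the given indices that falls into each group."""
    # Invert the partition so each example index maps to its group.
    index_to_group = {i: group
                      for group, members in group_partition.items()
                      for i in members}
    counts: Dict[Group, int] = {}
    for i in indices:
        group = index_to_group[i]  # raises KeyError if i belongs to no group
        counts[group] = counts.get(group, 0) + 1
    total = len(indices)
    return {group: count / total for group, count in counts.items()}


partition = {(0, 0): [0, 1, 2], (0, 1): [3], (1, 1): [4, 5]}
print(group_ratios_sketch([0, 1, 3, 4], partition))
# {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
```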