spuco.last_layer_retrain

Last Layer Retraining methods.

Deep Feature Reweighting (DFR)

class DFR(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, n_lin_models: int = 20, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, device: device = device(type='cpu'), verbose: bool = False, data_for_scaler: Dataset | None = None)

Bases: object

__init__(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, n_lin_models: int = 20, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, device: device = device(type='cpu'), verbose: bool = False, data_for_scaler: Dataset | None = None)

Initializes the DFR object.

Parameters:
  • group_labeled_set (GroupLabeledDatasetWrapper) – The group-labeled dataset.

  • model (SpuCoModel) – The base model.

  • n_lin_models (int) – Number of linear models to average.

  • labeled_valset_size (float) – The ratio of the labeled data to be used for validation if no validation set is given.

  • C_range (list) – Candidate values of C, the inverse of the L1 regularization strength, as in sklearn.

  • class_weight_options (list) – Candidate class weights to search over.

  • validation_set (GroupLabeledDatasetWrapper) – Data used for hyperparameter selection. If not provided, a labeled_valset_size fraction (half by default) of the group-labeled data is held out instead.

  • data_for_scaler (Dataset) – Data used for fitting the sklearn scaler. If not provided, the group-labeled data is used.

train_single_model(C, X_train, y_train, g_train, class_weight)

Trains a single model.

Parameters:
  • C (float) – Inverse regularization strength for the linear model (see C_range).

  • X_train (numpy.ndarray) – Training features.

  • y_train (numpy.ndarray) – Training labels.

  • g_train (numpy.ndarray) – Training group labels.

  • class_weight (dict or 'balanced', optional) – Weight associated with each class.
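
The g_train argument suggests that group labels steer which samples a single model is fit on. The sketch below assumes, as in the published DFR method, that each linear model sees a group-balanced subsample. balanced_subsample_indices is a hypothetical helper written for illustration and is not part of spuco.

```python
import numpy as np

def balanced_subsample_indices(g, rng):
    """Return sorted indices holding the same number of samples from every group."""
    groups, counts = np.unique(g, return_counts=True)
    n_per_group = counts.min()  # every group is cut down to the smallest group's size
    chosen = [
        rng.choice(np.flatnonzero(g == group), size=n_per_group, replace=False)
        for group in groups
    ]
    return np.sort(np.concatenate(chosen))

rng = np.random.default_rng(0)
g_train = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])  # toy group labels
idx = balanced_subsample_indices(g_train, rng)
balanced_counts = np.bincount(g_train[idx])      # samples kept per group
```

X_train[idx] and y_train[idx] would then be passed to the linear model together with C and class_weight.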

train_multiple_model(C, X_labeled_train, y_labeled_train, g_labeled_train, class_weight)

Trains the DFR model by fitting n_lin_models linear models and averaging them.

Parameters:
  • C (float) – Inverse regularization strength for the linear models (see C_range).

  • X_labeled_train (numpy.ndarray) – Labeled training features.

  • y_labeled_train (numpy.ndarray) – Labeled training labels.

  • g_labeled_train (numpy.ndarray) – Labeled training group labels.

  • class_weight (dict or 'balanced', optional) – Weight associated with each class.
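
Since n_lin_models is documented as the number of linear models to average, one plausible reading is that the coefficients and intercepts of the individual fits are averaged element-wise. A minimal sketch under that assumption follows. average_linear_models is a hypothetical name, and the arrays are toy values.

```python
import numpy as np

def average_linear_models(coefs, intercepts):
    """Average a list of (coef, intercept) pairs element-wise."""
    coef = np.mean(np.stack(coefs), axis=0)
    intercept = np.mean(np.stack(intercepts), axis=0)
    return coef, intercept

# Two toy linear models, each with one output and two features.
coefs = [np.array([[1.0, 0.0]]), np.array([[3.0, 2.0]])]
intercepts = [np.array([0.5]), np.array([1.5])]
avg_coef, avg_intercept = average_linear_models(coefs, intercepts)
```

Averaging over several random subsamples reduces the variance that any single subsample introduces.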

hyperparam_selection(X_labeled_train, y_labeled_train, g_labeled_train, X_labeled_val, y_labeled_val, g_labeled_val)

Performs hyperparameter selection for the DFR model.

Parameters:
  • X_labeled_train (numpy.ndarray) – Labeled training features.

  • y_labeled_train (numpy.ndarray) – Labeled training labels.

  • g_labeled_train (numpy.ndarray) – Labeled training group labels.

  • X_labeled_val (numpy.ndarray) – Labeled validation features.

  • y_labeled_val (numpy.ndarray) – Labeled validation labels.

  • g_labeled_val (numpy.ndarray) – Labeled validation group labels.
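
The selection step can be pictured as a grid search over C_range and class_weight_options that keeps the combination with the best validation score. The sketch below assumes the score is worst-group accuracy on the validation split. select_hyperparams, toy_score, and the class weight dictionaries are hypothetical stand-ins, not spuco code.

```python
from itertools import product

def select_hyperparams(C_range, class_weight_options, score_fn):
    """Return the (C, class_weight) pair with the highest score, plus that score."""
    best_score, best_params = -1.0, None
    for C, class_weight in product(C_range, class_weight_options):
        score = score_fn(C, class_weight)  # e.g. train, then evaluate on validation data
        if score > best_score:
            best_score, best_params = score, (C, class_weight)
    return best_params, best_score

C_range = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01]             # default from the signature
class_weight_options = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 2.0}]  # hypothetical options

def toy_score(C, class_weight):
    # Stand-in for a validation metric; peaks at C = 0.3 with weight 2.0 on class 1.
    return 1.0 - abs(C - 0.3) - 0.1 * abs(class_weight[1] - 2.0)

best_params, best_score = select_hyperparams(C_range, class_weight_options, toy_score)
```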

train()

Retrains the last layer.

evaluate_worstgroup_acc(C, coef, intercept, X_val, y_val, g_val)

Evaluates the worst-group accuracy for the DFR model.

Parameters:
  • C (float) – Inverse regularization strength used to train the linear model.

  • coef (numpy.ndarray) – Coefficients of the linear model.

  • intercept (numpy.ndarray) – Intercept of the linear model.

  • X_val (numpy.ndarray) – Validation features.

  • y_val (numpy.ndarray) – Validation labels.

  • g_val (numpy.ndarray) – Validation group labels.
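
Worst-group accuracy is the minimum of the per-group accuracies. The sketch below computes it for a linear classifier, assuming coef has one row per class so that predictions come from an argmax over logits. worst_group_accuracy is a hypothetical helper, and the data is a toy example.

```python
import numpy as np

def worst_group_accuracy(coef, intercept, X, y, g):
    """Minimum per-group accuracy of the linear classifier (coef, intercept)."""
    logits = X @ coef.T + intercept  # shape (n_samples, n_classes)
    preds = logits.argmax(axis=1)
    accs = [np.mean(preds[g == group] == y[g == group]) for group in np.unique(g)]
    return float(min(accs))

coef = np.array([[1.0, 0.0], [0.0, 1.0]])  # one row per class (assumption)
intercept = np.zeros(2)
X_val = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 1.0], [1.0, 2.0]])
y_val = np.array([0, 1, 1, 1])
g_val = np.array([0, 0, 1, 1])
wga = worst_group_accuracy(coef, intercept, X_val, y_val, g_val)
```

Here group 0 is classified perfectly while group 1 has one error out of two samples, so the worst-group accuracy is that of group 1.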

encode_dataset(dataset)

Encodes the training set using the DFR model.

Parameters:

dataset (torch.utils.data.Dataset) – The training dataset.

Returns:

The encoded features and labels of the training set.

Return type:

Tuple[torch.Tensor, torch.Tensor]

DISPEL

class DISPEL(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], s_range: List[float] = [1.0, 0.9, 0.8, 0.7], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, alpha_range: List[float] = [1.0], size_of_mixed: int | None = None, n_lin_models: int = 20, data_for_scaler: Dataset | None = None, group_unlabeled_set: Dataset | None = None, device: device = device(type='cpu'), verbose: bool = False)

Bases: DFR

__init__(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], s_range: List[float] = [1.0, 0.9, 0.8, 0.7], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, alpha_range: List[float] = [1.0], size_of_mixed: int | None = None, n_lin_models: int = 20, data_for_scaler: Dataset | None = None, group_unlabeled_set: Dataset | None = None, device: device = device(type='cpu'), verbose: bool = False)

Initializes the DISPEL object.

Parameters:
  • group_labeled_set (GroupLabeledDatasetWrapper) – The group-labeled dataset.

  • model (SpuCoModel) – The base model.

  • n_lin_models (int) – Number of linear models to average.

  • labeled_valset_size (float) – The ratio of the labeled data to be used for validation if no validation set is given.

  • C_range (list) – Candidate values of C, the inverse of the L1 regularization strength, as in sklearn.

  • s_range (list) – Candidate values of s, the weight assigned to the group-labeled data when generating mixed data.

  • alpha_range (list) – Candidate values of alpha, the probability of mixing data.

  • size_of_mixed (int) – Size of the mixed dataset.

  • group_unlabeled_set (Dataset) – Group-unlabeled dataset. If provided, it is included in the group-unbalanced dataset.

  • class_weight_options (list) – Candidate class weights to search over.

  • validation_set (GroupLabeledDatasetWrapper) – Data used for hyperparameter selection. If not provided, a labeled_valset_size fraction (half by default) of the group-labeled data is held out instead.

  • data_for_scaler (Dataset) – Data used for fitting the sklearn scaler. If not provided, the group-labeled data is used.

train_single_model(alpha, s, C, X_labeled, y_labeled, g_labeled, class_weight)

Trains a single model.

Parameters:
  • alpha (float) – Probability of mixing data (see alpha_range).

  • s (float) – Weight assigned to the group-labeled data when generating mixed data (see s_range).

  • C (float) – Inverse regularization strength for the linear model (see C_range).

  • X_labeled (numpy.ndarray) – Labeled training features.

  • y_labeled (numpy.ndarray) – Labeled training labels.

  • g_labeled (numpy.ndarray) – Labeled training group labels.

  • class_weight (dict or 'balanced', optional) – Weight associated with each class.
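
The exact mixing rule is not spelled out in this reference. The sketch below shows one plausible reading of alpha and s: with probability alpha, a group-labeled feature vector is replaced by a convex combination that puts weight s on it and weight 1 - s on a randomly drawn vector from the group-unbalanced data. mix_features and this combination rule are assumptions for illustration, not the spuco implementation.

```python
import numpy as np

def mix_features(X_labeled, X_other, alpha, s, rng):
    """Mix each labeled row with a random row of X_other with probability alpha."""
    partner = rng.integers(0, len(X_other), size=len(X_labeled))  # random partners
    do_mix = rng.random(len(X_labeled)) < alpha                   # which rows get mixed
    mixed = s * X_labeled + (1.0 - s) * X_other[partner]
    return np.where(do_mix[:, None], mixed, X_labeled)

rng = np.random.default_rng(0)
X_labeled = np.zeros((4, 3))  # toy group-labeled features
X_other = np.ones((10, 3))    # toy group-unbalanced features

always_half = mix_features(X_labeled, X_other, alpha=1.0, s=0.5, rng=rng)
never_mixed = mix_features(X_labeled, X_other, alpha=0.0, s=0.5, rng=rng)
full_weight = mix_features(X_labeled, X_other, alpha=1.0, s=1.0, rng=rng)
```

Under this reading, s = 1.0 or alpha = 0.0 leaves the group-labeled features unchanged, so the mixing only has an effect for s below 1 and alpha above 0.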

train_multiple_model(alpha, s, C, X_labeled_train, y_labeled_train, g_labeled_train, class_weight)

Trains the DISPEL model.

Parameters:
  • alpha (float) – Probability of mixing data (see alpha_range).

  • s (float) – Weight assigned to the group-labeled data when generating mixed data (see s_range).

  • C (float) – Inverse regularization strength for the linear models (see C_range).

  • X_labeled_train (numpy.ndarray) – Labeled training features.

  • y_labeled_train (numpy.ndarray) – Labeled training labels.

  • g_labeled_train (numpy.ndarray) – Labeled training group labels.

  • class_weight (dict or 'balanced', optional) – Weight associated with each class.

hyperparam_selection(X_labeled_train, y_labeled_train, g_labeled_train, X_labeled_val, y_labeled_val, g_labeled_val)

Performs hyperparameter selection for the DISPEL model.

Parameters:
  • X_labeled_train (numpy.ndarray) – Labeled training features.

  • y_labeled_train (numpy.ndarray) – Labeled training labels.

  • g_labeled_train (numpy.ndarray) – Labeled training group labels.

  • X_labeled_val (numpy.ndarray) – Labeled validation features.

  • y_labeled_val (numpy.ndarray) – Labeled validation labels.

  • g_labeled_val (numpy.ndarray) – Labeled validation group labels.

train()

Retrains the last layer.