spuco.last_layer_retrain
Last Layer Retraining methods.
Deep Feature Reweighting (DFR)
- class DFR(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, n_lin_models: int = 20, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, device: device = device(type='cpu'), verbose: bool = False, data_for_scaler: Dataset | None = None)
Bases: object
- __init__(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, n_lin_models: int = 20, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, device: device = device(type='cpu'), verbose: bool = False, data_for_scaler: Dataset | None = None)
Initializes the DFR object.
- Parameters:
group_labeled_set (GroupLabeledDatasetWrapper) – The group-labeled dataset.
model (SpuCoModel) – The base model.
n_lin_models (int) – Number of linear models to average.
labeled_valset_size (float) – Fraction of the group-labeled data held out for validation if no validation set is given.
C_range (list) – Options for C, the inverse of the L1 regularization strength, as in sklearn.
class_weight_options (list) – Options for the class weights.
validation_set (GroupLabeledDatasetWrapper) – Data used for hyperparameter selection. If not provided, a fraction labeled_valset_size (half by default) of the group-labeled data is used.
data_for_scaler (Dataset) – Data used for fitting the sklearn scaler. If not provided, the group-labeled data is used.
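The role of labeled_valset_size can be illustrated with a minimal sketch of a held-out split. The helper name split_labeled_indices and the seeded random shuffle are assumptions made for illustration; this reference does not state how the library performs the split (for example, whether it is stratified by group).

```python
import random


def split_labeled_indices(n_samples, labeled_valset_size=0.5, seed=0):
    """Hypothetical helper: hold out a fraction of sample indices for validation."""
    indices = list(range(n_samples))
    # Shuffle with a private, seeded generator so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * labeled_valset_size)
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])
    return train_idx, val_idx


train_idx, val_idx = split_labeled_indices(10, labeled_valset_size=0.5)
print(len(train_idx), len(val_idx))
```

Passing an explicit validation_set makes such a split unnecessary, since hyperparameters are then selected on the data provided.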
- train_single_model(C, X_train, y_train, g_train, class_weight)
Trains a single model.
- train_multiple_model(C, X_labeled_train, y_labeled_train, g_labeled_train, class_weight)
Trains the DFR model.
- Parameters:
C (float) – Inverse regularization strength C of the linear model.
X_labeled_train (numpy.ndarray) – Labeled training features.
y_labeled_train (numpy.ndarray) – Labeled training labels.
g_labeled_train (numpy.ndarray) – Labeled training group labels.
class_weight (dict or 'balanced', optional) – Weight associated with each class.
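Since n_lin_models is described as the number of linear models to average, the following sketch shows one way such averaging could work for a binary classifier. It is an assumption that the weights and intercepts are averaged element-wise (rather than, say, the predictions); average_linear_models is a hypothetical helper, and plain lists stand in for numpy arrays.

```python
def average_linear_models(coefs, intercepts):
    """Hypothetical helper: element-wise mean of weight vectors and intercepts."""
    n_models = len(coefs)
    n_features = len(coefs[0])
    mean_coef = [sum(c[j] for c in coefs) / n_models for j in range(n_features)]
    mean_intercept = sum(intercepts) / n_models
    return mean_coef, mean_intercept


# Two toy models fitted on different subsamples of the labeled data.
coefs = [[1.0, 0.0], [3.0, 2.0]]
intercepts = [0.5, 1.5]
mean_coef, mean_intercept = average_linear_models(coefs, intercepts)
print(mean_coef, mean_intercept)
```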
- hyperparam_selection(X_labeled_train, y_labeled_train, g_labeled_train, X_labeled_val, y_labeled_val, g_labeled_val)
Performs hyperparameter selection for the DFR model.
- Parameters:
X_labeled_train (numpy.ndarray) – Labeled training features.
y_labeled_train (numpy.ndarray) – Labeled training labels.
g_labeled_train (numpy.ndarray) – Labeled training group labels.
X_labeled_val (numpy.ndarray) – Labeled validation features.
y_labeled_val (numpy.ndarray) – Labeled validation labels.
g_labeled_val (numpy.ndarray) – Labeled validation group labels.
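Hyperparameter selection over C_range and class_weight_options can be pictured as a grid search that keeps the best-scoring combination. The sketch below assumes an exhaustive search and a caller-supplied scoring function (for instance, worst-group accuracy on the validation split); select_hyperparams and toy_score are hypothetical names, not part of the library.

```python
from itertools import product


def select_hyperparams(C_range, class_weight_options, score_fn):
    """Hypothetical helper: return the (C, class_weight) pair with the highest score."""
    best_score, best_params = float("-inf"), None
    for C, class_weight in product(C_range, class_weight_options):
        score = score_fn(C, class_weight)
        if score > best_score:
            best_score, best_params = score, (C, class_weight)
    return best_params, best_score


def toy_score(C, class_weight):
    # Stand-in for "train on the training split, score on the validation split".
    return 1.0 - abs(C - 0.3) + 0.01 * class_weight[1]


C_range = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01]
class_weight_options = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 2.0}]
best_params, best_score = select_hyperparams(C_range, class_weight_options, toy_score)
print(best_params)
```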
- train()
Retrains the last layer.
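A minimal usage sketch, relying only on the constructor signature and the train() method listed in this reference. The wrapper function retrain_last_layer_with_dfr is hypothetical, and the caller is assumed to already have a GroupLabeledDatasetWrapper and a trained SpuCoModel. The import is placed inside the function so that the snippet can be loaded without spuco installed.

```python
def retrain_last_layer_with_dfr(group_labeled_set, model, **dfr_kwargs):
    """Hypothetical wrapper: build a DFR object and retrain the last layer."""
    # Lazy import: spuco is only required when the function is actually called.
    from spuco.last_layer_retrain import DFR

    dfr = DFR(group_labeled_set=group_labeled_set, model=model, **dfr_kwargs)
    dfr.train()
    return dfr


# Example call (requires spuco and prepared inputs):
# dfr = retrain_last_layer_with_dfr(group_labeled_set, model, n_lin_models=20, verbose=True)
```

Any keyword from the signature, such as C_range, validation_set, or device, can be forwarded through dfr_kwargs.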
- evaluate_worstgroup_acc(C, coef, intercept, X_val, y_val, g_val)
Evaluates the worst-group accuracy for the DFR model.
- Parameters:
C (float) – Inverse regularization strength C of the linear model.
coef (numpy.ndarray) – Coefficients of the linear model.
intercept (numpy.ndarray) – Intercept of the linear model.
X_val (numpy.ndarray) – Validation features.
y_val (numpy.ndarray) – Validation labels.
g_val (numpy.ndarray) – Validation group labels.
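Worst-group accuracy is the minimum, over groups, of the accuracy measured within each group. The sketch below computes it for a binary linear classifier; worst_group_accuracy is a hypothetical helper, the decision rule (predict class 1 when the logit is positive) is an assumption, and plain lists stand in for numpy arrays. C is omitted because the computation here does not need it.

```python
def worst_group_accuracy(coef, intercept, X_val, y_val, g_val):
    """Hypothetical helper: return (worst accuracy, per-group accuracies)."""
    correct, total = {}, {}
    for x, y, g in zip(X_val, y_val, g_val):
        logit = sum(w * xi for w, xi in zip(coef, x)) + intercept
        pred = 1 if logit > 0 else 0
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(pred == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


X_val = [[1.0], [2.0], [-1.0], [0.5]]
y_val = [1, 1, 0, 0]
g_val = [0, 0, 1, 1]
worst, per_group = worst_group_accuracy([1.0], 0.0, X_val, y_val, g_val)
print(worst, per_group)  # 0.5 {0: 1.0, 1: 0.5}
```

Group 1 contains one misclassified example (the input 0.5 is predicted as class 1), so it determines the worst-group accuracy even though the overall accuracy is 0.75.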
- encode_dataset(dataset)
Encodes the training set using the DFR model.
- Parameters:
dataset (torch.utils.data.Dataset) – The training dataset.
- Returns:
The encoded features and labels of the training set.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
DISPEL
- class DISPEL(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], s_range: List[float] = [1.0, 0.9, 0.8, 0.7], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, alpha_range: List[float] = [1.0], size_of_mixed: int | None = None, n_lin_models: int = 20, data_for_scaler: Dataset | None = None, group_unlabeled_set: Dataset | None = None, device: device = device(type='cpu'), verbose: bool = False)
Bases: DFR
- __init__(group_labeled_set: GroupLabeledDatasetWrapper, model: SpuCoModel, labeled_valset_size: float = 0.5, C_range: List[float] = [1.0, 0.7, 0.3, 0.1, 0.07, 0.03, 0.01], s_range: List[float] = [1.0, 0.9, 0.8, 0.7], class_weight_options: List[Dict] | None = None, validation_set: GroupLabeledDatasetWrapper | None = None, alpha_range: List[float] = [1.0], size_of_mixed: int | None = None, n_lin_models: int = 20, data_for_scaler: Dataset | None = None, group_unlabeled_set: Dataset | None = None, device: device = device(type='cpu'), verbose: bool = False)
Initializes the DISPEL object.
- Parameters:
group_labeled_set (GroupLabeledDatasetWrapper) – The group-labeled dataset.
model (SpuCoModel) – The base model.
n_lin_models (int) – Number of linear models to average.
labeled_valset_size (float) – Fraction of the group-labeled data held out for validation if no validation set is given.
C_range (list) – Options for C, the inverse of the L1 regularization strength, as in sklearn.
s_range (list) – Options for s, the weight assigned to the group-labeled data when generating mixed data.
alpha_range (list) – Options for alpha, the probability of mixing data.
size_of_mixed (int) – Size of the mixed dataset.
group_unlabeled_set (Dataset) – Group-unlabeled dataset. If provided, it is included in the group-unbalanced dataset.
class_weight_options (list) – Options for the class weights.
validation_set (GroupLabeledDatasetWrapper) – Data used for hyperparameter selection. If not provided, a fraction labeled_valset_size (half by default) of the group-labeled data is used.
data_for_scaler (Dataset) – Data used for fitting the sklearn scaler. If not provided, the group-labeled data is used.
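The interplay of s and alpha can be sketched as follows. This reference only says that s weights the group-labeled data and that alpha is the probability of mixing, so the convex combination used below is an assumption rather than the library's confirmed rule; mix_features is a hypothetical helper operating on plain lists.

```python
import random


def mix_features(x_labeled, x_other, s, alpha, rng):
    """Hypothetical helper: with probability alpha, blend two feature vectors."""
    if rng.random() < alpha:
        # Assumed rule: weight s on the group-labeled sample, 1 - s on the other.
        return [s * a + (1.0 - s) * b for a, b in zip(x_labeled, x_other)]
    # Otherwise keep the group-labeled sample unchanged.
    return list(x_labeled)


rng = random.Random(0)
mixed = mix_features([1.0, 0.0], [0.0, 1.0], s=0.8, alpha=1.0, rng=rng)
unmixed = mix_features([1.0, 0.0], [0.0, 1.0], s=0.8, alpha=0.0, rng=rng)
print(mixed, unmixed)
```

Under this assumed rule, s = 1.0 or alpha = 0.0 leaves the group-labeled features untouched.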
- train_single_model(alpha, s, C, X_labeled, y_labeled, g_labeled, class_weight)
Trains a single model.
- Parameters:
alpha (float) – Probability of mixing data.
s (float) – Weight assigned to the group-labeled data when generating mixed data.
C (float) – Inverse regularization strength C of the linear model.
X_labeled (numpy.ndarray) – Labeled training features.
y_labeled (numpy.ndarray) – Labeled training labels.
g_labeled (numpy.ndarray) – Labeled training group labels.
class_weight (dict or 'balanced', optional) – Weight associated with each class.
- train_multiple_model(alpha, s, C, X_labeled_train, y_labeled_train, g_labeled_train, class_weight)
Trains the DISPEL model.
- Parameters:
alpha (float) – Probability of mixing data.
s (float) – Weight assigned to the group-labeled data when generating mixed data.
C (float) – Inverse regularization strength C of the linear model.
X_labeled_train (numpy.ndarray) – Labeled training features.
y_labeled_train (numpy.ndarray) – Labeled training labels.
g_labeled_train (numpy.ndarray) – Labeled training group labels.
class_weight (dict or 'balanced', optional) – Weight associated with each class.
- hyperparam_selection(X_labeled_train, y_labeled_train, g_labeled_train, X_labeled_val, y_labeled_val, g_labeled_val)
Performs hyperparameter selection for the DISPEL model.
- Parameters:
X_labeled_train (numpy.ndarray) – Labeled training features.
y_labeled_train (numpy.ndarray) – Labeled training labels.
g_labeled_train (numpy.ndarray) – Labeled training group labels.
X_labeled_val (numpy.ndarray) – Labeled validation features.
y_labeled_val (numpy.ndarray) – Labeled validation labels.
g_labeled_val (numpy.ndarray) – Labeled validation group labels.
- train()
Retrains the last layer.