evoaug.augment
Library of data augmentations for genomic sequence data.
To contribute a custom augmentation, use the following syntax:
class CustomAugmentation(AugmentBase):
    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # Perform augmentation
        return x_aug
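As a concrete illustration of the template above, here is a minimal sketch of a custom augmentation. `RandomMask` and its `mask_frac` parameter are hypothetical and not part of EvoAug, and the small `AugmentBase` class is only a stand-in so the example runs without the library installed; in real use you would subclass `evoaug.augment.AugmentBase` instead.

```python
import torch


class AugmentBase:
    """Stand-in for evoaug.augment.AugmentBase (assumption, for a standalone example)."""

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class RandomMask(AugmentBase):
    """Hypothetical augmentation: zero out a random fraction of positions."""

    def __init__(self, mask_frac: float = 0.1):
        self.mask_frac = mask_frac

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        N, A, L = x.shape
        # Keep-mask of shape (N, 1, L), broadcast over the alphabet axis.
        keep = torch.rand(N, 1, L, device=x.device) >= self.mask_frac
        return x * keep.to(x.dtype)
```

The only contract the sketch relies on is the one stated in the template: `__call__` takes a batch of shape (N, A, L) and returns a tensor of the same kind.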
Module Contents
Classes
AugmentBase | Base class for EvoAug augmentations for genomic sequences.
RandomDeletion | Randomly deletes a contiguous stretch of nucleotides from sequences in a training batch.
RandomInsertion | Randomly inserts a contiguous stretch of nucleotides into sequences in a training batch.
RandomTranslocation | Randomly cuts each sequence in a training batch into two pieces and swaps their order.
RandomInversion | Randomly inverts a contiguous stretch of nucleotides in sequences in a training batch.
RandomMutation | Randomly mutates sequences in a training batch according to a user-defined mutate_frac.
RandomRC | Randomly applies a reverse-complement transformation to each sequence in a training batch.
RandomNoise | Randomly adds Gaussian noise to a batch of sequences according to a user-defined noise_mean and noise_std.
- class evoaug.augment.AugmentBase
Base class for EvoAug augmentations for genomic sequences.
- abstract __call__(x)
Return an augmented version of x.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Batch of one-hot sequences with random augmentation applied.
- Return type:
torch.Tensor
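Every augmentation takes a batch of one-hot sequences of shape (N, A, L): N sequences, A alphabet channels, L positions. The sketch below builds such a batch from DNA strings. The helper `one_hot_batch` and the A, C, G, T channel order are assumptions made for illustration; they are not defined on this page.

```python
import torch

ALPHABET = "ACGT"  # assumed channel order


def one_hot_batch(seqs):
    """Encode equal-length DNA strings as a float tensor of shape (N, A, L)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    x = torch.zeros(len(seqs), len(ALPHABET), len(seqs[0]))
    for n, seq in enumerate(seqs):
        for pos, base in enumerate(seq):
            x[n, index[base], pos] = 1.0
    return x


x = one_hot_batch(["ACGT", "TTGA"])
print(x.shape)  # torch.Size([2, 4, 4])
```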
- class evoaug.augment.RandomDeletion(delete_min=0, delete_max=20)
Bases:
AugmentBase
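Randomly deletes a contiguous stretch of nucleotides from each sequence in a training batch. The length of the deletion is a random number between the user-defined delete_min and delete_max. A different deletion is applied to each sequence.
- Parameters:
delete_min (int, optional) – Minimum length of the random deletion, defaults to 0.
delete_max (int, optional) – Maximum length of the random deletion, defaults to 20.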
- __call__(x)
Randomly delete segments in a set of one-hot DNA sequences.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with randomly deleted segments (padded to correct shape with random DNA)
- Return type:
torch.Tensor
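The behaviour described above can be sketched in plain PyTorch. This is an illustrative reimplementation, not EvoAug's code: the function name `random_deletion_sketch`, the helper `random_dna`, and the choice to split the random padding between both ends are assumptions.

```python
import torch


def random_dna(A, length, dtype):
    """Return a random one-hot segment of shape (A, length)."""
    seg = torch.zeros(A, length, dtype=dtype)
    seg[torch.randint(0, A, (length,)), torch.arange(length)] = 1
    return seg


def random_deletion_sketch(x, delete_min=0, delete_max=20):
    """Delete one random segment per sequence, then pad back to length L."""
    N, A, L = x.shape  # requires delete_max <= L
    out = []
    for seq in x:
        # Deletion length and start are drawn independently for each sequence.
        length = int(torch.randint(delete_min, delete_max + 1, (1,)))
        start = int(torch.randint(0, L - length + 1, (1,)))
        kept = torch.cat([seq[:, :start], seq[:, start + length:]], dim=1)
        # Assumption: restore the shape with random DNA split across both ends.
        pad = random_dna(A, length, x.dtype)
        left = length // 2
        out.append(torch.cat([pad[:, :left], kept, pad[:, left:]], dim=1))
    return torch.stack(out)
```

The output keeps the input shape (N, A, L), and every position remains a valid one-hot column.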
- class evoaug.augment.RandomInsertion(insert_min=0, insert_max=20)
Bases:
AugmentBase
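Randomly inserts a contiguous stretch of nucleotides into each sequence in a training batch. The length of the insertion is a random number between the user-defined insert_min and insert_max. A different insertion is applied to each sequence. Each sequence is padded with random DNA to ensure the same shape.
- Parameters:
insert_min (int, optional) – Minimum length of the random insertion, defaults to 0.
insert_max (int, optional) – Maximum length of the random insertion, defaults to 20.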
- __call__(x)
Randomly inserts segments of random DNA into a set of one-hot DNA sequences.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with randomly inserted segments of random DNA. All sequences are padded with random DNA to ensure the same shape.
- Return type:
torch.Tensor
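A rough sketch of the insertion logic follows. It is not the library's implementation: the names `random_insertion_sketch` and `random_dna` are hypothetical, and the sketch assumes that every output sequence has length L + insert_max, with the unused insertion budget filled by random DNA appended at the right end.

```python
import torch


def random_dna(A, length, dtype):
    """Return a random one-hot segment of shape (A, length)."""
    seg = torch.zeros(A, length, dtype=dtype)
    seg[torch.randint(0, A, (length,)), torch.arange(length)] = 1
    return seg


def random_insertion_sketch(x, insert_min=0, insert_max=20):
    """Insert one random segment per sequence; output length is L + insert_max."""
    N, A, L = x.shape
    out = []
    for seq in x:
        # Insertion length and position are drawn independently per sequence.
        length = int(torch.randint(insert_min, insert_max + 1, (1,)))
        pos = int(torch.randint(0, L + 1, (1,)))
        insert = random_dna(A, length, x.dtype)
        # Assumption: pad on the right so all sequences share one length.
        pad = random_dna(A, insert_max - length, x.dtype)
        out.append(torch.cat([seq[:, :pos], insert, seq[:, pos:], pad], dim=1))
    return torch.stack(out)
```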
- class evoaug.augment.RandomTranslocation(shift_min=0, shift_max=20)
Bases:
AugmentBase
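Randomly cuts each sequence in a training batch into two pieces and swaps their order. This is implemented as a roll transformation whose shift size lies between the user-defined shift_min and shift_max. A different roll (positive or negative) is applied to each sequence. Each sequence is padded with random DNA to ensure the same shape.
- Parameters:
shift_min (int, optional) – Minimum size of the random shift, defaults to 0.
shift_max (int, optional) – Maximum size of the random shift, defaults to 20.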
- __call__(x)
Randomly shifts sequences in a batch, x.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with random translocations.
- Return type:
torch.Tensor
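The roll transformation mentioned above maps directly onto `torch.roll`. The following is a minimal sketch under the assumption that the sign of the shift is chosen with equal probability; the function name `random_translocation_sketch` is hypothetical and this is not EvoAug's own code.

```python
import torch


def random_translocation_sketch(x, shift_min=0, shift_max=20):
    """Roll each sequence along the length axis by its own random shift."""
    out = []
    for seq in x:
        shift = int(torch.randint(shift_min, shift_max + 1, (1,)))
        # Assumption: positive and negative rolls are equally likely.
        if torch.rand(1).item() < 0.5:
            shift = -shift
        out.append(torch.roll(seq, shifts=shift, dims=-1))
    return torch.stack(out)
```

Because a roll wraps positions around, it preserves both the sequence length and the base composition of each sequence.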
- class evoaug.augment.RandomInversion(invert_min=0, invert_max=20)
Bases:
AugmentBase
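Randomly inverts a contiguous stretch of nucleotides in each sequence in a training batch, with a length between the user-defined invert_min and invert_max. A different inversion is applied to each sequence. Each sequence is padded with random DNA to ensure the same shape.
- Parameters:
invert_min (int, optional) – Minimum length of the random inversion, defaults to 0.
invert_max (int, optional) – Maximum length of the random inversion, defaults to 20.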
- __call__(x)
Randomly inverts segments in a set of one-hot DNA sequences.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with randomly inverted segments.
- Return type:
torch.Tensor
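One way to picture the inversion is the sketch below, which reverses a random segment in place with `torch.flip`. It is an assumption-laden illustration rather than the library's code: the name `random_inversion_sketch` is hypothetical, and the sketch only reverses the order of the segment. Whether the real augmentation also complements the segment is not stated on this page; under an A, C, G, T channel order that would amount to flipping the alphabet axis as well.

```python
import torch


def random_inversion_sketch(x, invert_min=0, invert_max=20):
    """Reverse one random segment per sequence along the length axis."""
    N, A, L = x.shape  # requires invert_max <= L
    out = x.clone()
    for i in range(N):
        # Segment length and start are drawn independently per sequence.
        length = int(torch.randint(invert_min, invert_max + 1, (1,)))
        start = int(torch.randint(0, L - length + 1, (1,)))
        segment = x[i, :, start:start + length]
        out[i, :, start:start + length] = torch.flip(segment, dims=[1])
    return out
```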
- class evoaug.augment.RandomMutation(mutate_frac=0.05)
Bases:
AugmentBase
Randomly mutates sequences in a training batch according to a user-defined mutate_frac. A different set of mutations is applied to each sequence.
- Parameters:
mutate_frac (float, optional) – Probability of mutation for each nucleotide, defaults to 0.05.
- __call__(x)
Randomly introduces mutations to a set of one-hot DNA sequences.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with randomly mutated DNA.
- Return type:
torch.Tensor
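A simple way to realise a per-nucleotide mutation probability is an independent coin flip at every position, sketched below. The function name `random_mutation_sketch` is hypothetical, and the Bernoulli scheme is an assumption; the library may select mutated positions differently.

```python
import torch


def random_mutation_sketch(x, mutate_frac=0.05):
    """Redraw each position with probability mutate_frac."""
    N, A, L = x.shape
    # Positions to redraw, chosen independently for every sequence.
    mutate = torch.rand(N, L) < mutate_frac
    # A random replacement base for every position, as one-hot of shape (N, A, L).
    new_idx = torch.randint(0, A, (N, L))
    replacement = torch.nn.functional.one_hot(new_idx, num_classes=A)
    replacement = replacement.permute(0, 2, 1).to(x.dtype)
    # Broadcast the (N, 1, L) mask over the alphabet axis.
    return torch.where(mutate.unsqueeze(1), replacement, x)
```

Note that in this sketch a redrawn base can match the original one, so the fraction of positions that actually change is on average mutate_frac * (A - 1) / A.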
- class evoaug.augment.RandomRC(rc_prob=0.5)
Bases:
AugmentBase
Randomly applies a reverse-complement transformation to each sequence in a training batch according to a user-defined probability, rc_prob. This is applied to each sequence independently.
- Parameters:
rc_prob (float, optional) – Probability to apply a reverse-complement transformation, defaults to 0.5.
- __call__(x)
Randomly transforms sequences in a batch with a reverse-complement transformation.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with random reverse-complements applied.
- Return type:
torch.Tensor
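For one-hot DNA, a reverse complement can be written as a flip over both the alphabet and length axes, provided the channels are ordered so that complementary bases mirror each other (for example A, C, G, T). The sketch below assumes that channel order; the name `random_rc_sketch` is hypothetical and the code is illustrative rather than the library's own.

```python
import torch


def random_rc_sketch(x, rc_prob=0.5):
    """Reverse-complement each sequence independently with probability rc_prob."""
    out = x.clone()
    # One independent decision per sequence in the batch.
    apply = torch.rand(x.shape[0]) < rc_prob
    # Flipping dim 1 complements (A<->T, C<->G); flipping dim 2 reverses.
    out[apply] = torch.flip(x[apply], dims=[1, 2])
    return out
```

Under the assumed channel order, applying the transformation twice to the same sequence returns the original sequence.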
- class evoaug.augment.RandomNoise(noise_mean=0.0, noise_std=0.2)
Bases:
AugmentBase
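Randomly adds Gaussian noise to a batch of sequences according to a user-defined noise_mean and noise_std. Different noise is applied to each sequence.
- Parameters:
noise_mean (float, optional) – Mean of the Gaussian noise, defaults to 0.0.
noise_std (float, optional) – Standard deviation of the Gaussian noise, defaults to 0.2.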
- __call__(x)
Randomly adds Gaussian noise to a set of one-hot DNA sequences.
- Parameters:
x (torch.Tensor) – Batch of one-hot sequences (shape: (N, A, L)).
- Returns:
Sequences with random noise.
- Return type:
torch.Tensor
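Adding Gaussian noise amounts to sampling a tensor of the same shape as the batch and summing. The sketch below is a hypothetical stand-alone version (`random_noise_sketch` is not a library function) and assumes the input is a floating-point tensor.

```python
import torch


def random_noise_sketch(x, noise_mean=0.0, noise_std=0.2):
    """Add independent Gaussian noise to every entry of a float batch."""
    # randn_like samples from N(0, 1); scale and shift to N(mean, std^2).
    noise = torch.randn_like(x) * noise_std + noise_mean
    return x + noise
```

The result has the same shape as the input but is no longer strictly one-hot, since every channel at every position receives its own noise value.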