ahcore.data package#
Submodules#
ahcore.data.dataset module#
Utilities to construct datasets and DataModule’s from manifests.
- class ahcore.data.dataset.DlupDataModule(data_description: DataDescription, pre_transform: Callable[[bool], Callable[[dict[str, Any]], dict[str, Any]]], batch_size: int = 32, validate_batch_size: int | None = None, num_workers: int = 16, persistent_workers: bool = False, pin_memory: bool = False)[source]#
Bases: LightningDataModule
Datamodule for the Ahcore framework. This datamodule is based on dlup.
Construct a DataModule based on a manifest.
- Parameters:
- data_description : DataDescription
See ahcore.utils.data.DataDescription for more information.
- pre_transform : Callable
A pre-transform is a callable applied directly to the output of the dataset before collation in the dataloader. Such transforms typically convert the image in the output to a tensor, convert the WsiAnnotations to a mask, or similar.
- batch_size : int
The batch size of the data loader.
- validate_batch_size : int, optional
Sometimes the batch size for validation can be larger. If so, set this variable. It will also be used for prediction.
- num_workers : int
The number of workers used to fetch tiles.
- persistent_workers : bool
Whether to use persistent workers. Check the PyTorch documentation for more information.
- pin_memory : bool
Whether to use CUDA pinned memory. Check the PyTorch documentation for more information.
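The pre_transform contract described above can be sketched as follows. This is a minimal illustration, not ahcore's implementation: the name build_pre_transform and the sample keys "image" and "annotations" are assumptions, and a real transform would produce tensors and masks rather than normalized lists.

```python
from typing import Any, Callable

def build_pre_transform(requires_target: bool) -> Callable[[dict[str, Any]], dict[str, Any]]:
    # The signature Callable[[bool], Callable[[dict], dict]] means: given a flag
    # (e.g. whether targets are needed), return the per-sample transform.
    def _transform(sample: dict[str, Any]) -> dict[str, Any]:
        # In practice this would convert sample["image"] to a torch.Tensor and
        # rasterize WsiAnnotations into a mask; here we only normalize pixel
        # values to [0, 1] to keep the sketch dependency-free.
        sample = dict(sample)
        sample["image"] = [value / 255.0 for value in sample["image"]]
        if not requires_target:
            sample.pop("annotations", None)
        return sample

    return _transform
```

The outer callable is evaluated once per stage; the inner callable runs on every sample before the dataloader collates the batch.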
- property data_manager: DataManager#
- predict_dataloader() DataLoader[dict[str, Any]] | None [source]#
An iterable or collection of iterables specifying prediction samples.
For more information about multiple dataloaders, see this section.
It’s recommended that all data downloads and preparation happen in prepare_data().
See also: predict(), prepare_data().
- Note:
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.
- setup(stage: str) None [source]#
Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.
- Args:
stage: either 'fit', 'validate', 'test', or 'predict'
Example:
class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this
        self.something = else

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
- teardown(stage: str | None = None) None [source]#
Called at the end of fit (train + validate), validate, test, or predict.
- Args:
stage: either 'fit', 'validate', 'test', or 'predict'
- test_dataloader() DataLoader[dict[str, Any]] | None [source]#
An iterable or collection of iterables specifying test samples.
For more information about multiple dataloaders, see this section.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
- Warning:
Do not assign state in prepare_data(); in distributed settings it runs on a single process, so assigned state is not available to the others.
See also: test(), prepare_data().
- Note:
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Note:
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
- train_dataloader() DataLoader[dict[str, Any]] | None [source]#
An iterable or collection of iterables specifying training samples.
For more information about multiple dataloaders, see this section.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
- Warning:
Do not assign state in prepare_data(); in distributed settings it runs on a single process, so assigned state is not available to the others.
See also: fit(), prepare_data().
- Note:
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- property uuid: UUID#
This property is used to create a unique cache file for each dataset. The constructor of this dataset is completely determined by the data description, including the pre_transforms. Therefore, we can use the data description to create a UUID that is unique for each datamodule.
The UUID is computed by hashing the data description using the basemodel_to_uuid function, which takes a SHA-256 hash of the pickled object and converts it to a UUID. As pickles can change between Python versions, this UUID will differ when different Python versions are used.
- Returns:
- UUID
A unique identifier for this datamodule.
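The hashing scheme described above can be sketched as follows. Here object_to_uuid is a hypothetical stand-in for ahcore's basemodel_to_uuid (which operates on pydantic models); the sketch only demonstrates the pickle-then-SHA-256-then-UUID idea.

```python
import hashlib
import pickle
import uuid

def object_to_uuid(obj: object) -> uuid.UUID:
    # Hash the pickled object with SHA-256 and fold the first 16 bytes of the
    # digest into a UUID. The result is deterministic for a given pickle,
    # which is why it can serve as a cache key for the datamodule.
    digest = hashlib.sha256(pickle.dumps(obj)).digest()
    return uuid.UUID(bytes=digest[:16])
```

Because the pickle format can vary across Python versions, the same object may map to different UUIDs under different interpreters, exactly as the note above warns.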
- val_dataloader() DataLoader[dict[str, Any]] | None [source]#
An iterable or collection of iterables specifying validation samples.
For more information about multiple dataloaders, see this section.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
See also: fit(), validate(), prepare_data().
- Note:
Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Note:
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
ahcore.data.samplers module#
Module implementing the samplers. These are used for instance to create batches of the same WSI.
- class ahcore.data.samplers.WsiBatchSampler(dataset: ConcatDataset[TiledROIsSlideImageDataset], batch_size: int)[source]#
Bases: Sampler[List[int]]
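The batching behaviour this sampler provides can be sketched without torch. The helper batches_per_wsi below is illustrative, not ahcore's implementation; it assumes access to the cumulative_sizes list that a ConcatDataset maintains (one entry per underlying slide dataset).

```python
from typing import Iterator, List

def batches_per_wsi(cumulative_sizes: List[int], batch_size: int) -> Iterator[List[int]]:
    # Yield index batches that never mix tiles from different slides: each
    # slide's index range [start, end) is batched independently, so the last
    # batch of a slide may be smaller than batch_size.
    start = 0
    for end in cumulative_sizes:
        indices = list(range(start, end))
        for i in range(0, len(indices), batch_size):
            yield indices[i : i + batch_size]
        start = end
```

For example, with cumulative sizes [3, 5] and batch size 2, the batches are [0, 1], [2], [3, 4]: the boundary between slides is never crossed.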
- class ahcore.data.samplers.WsiBatchSamplerPredict(sampler: SequentialSampler | None = None, batch_size: int | None = None, drop_last: bool = False, dataset: ConcatDataset[TiledROIsSlideImageDataset] | None = None)[source]#
Bases: Sampler[List[int]]
This Sampler is identical to the WsiBatchSampler, but its signature is changed for compatibility with the predict phase of Lightning.
Module contents#
General module for datasets, samplers and lightning modules.
A generic dataset generated from a manifest, which can handle classification, detection, and segmentation.
Samplers that, for instance, perform adaptive sampling or assign different weights per sample.
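Per-sample weighting as mentioned above can be sketched with the standard library. The helper weighted_sample is illustrative only; in a real pipeline one would typically use torch.utils.data.WeightedRandomSampler instead.

```python
import random

def weighted_sample(weights: list[float], k: int, seed: int = 0) -> list[int]:
    # Draw k indices with replacement, each index chosen with probability
    # proportional to its weight. A seeded Random keeps the draw reproducible.
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)
```

Samples with zero weight are never selected, so weights can be used both to oversample rare classes and to exclude tiles entirely.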