ahcore.data package#


ahcore.data.dataset module#

Utilities to construct datasets and DataModules from manifests.

class ahcore.data.dataset.DlupDataModule(data_description: DataDescription, pre_transform: Callable[[bool], Callable[[dict[str, Any]], dict[str, Any]]], batch_size: int = 32, validate_batch_size: int | None = None, num_workers: int = 16, persistent_workers: bool = False, pin_memory: bool = False)[source]#

Bases: LightningDataModule

Datamodule for the Ahcore framework. This datamodule is based on dlup.

Construct a DataModule based on a manifest.

data_description : DataDescription

See ahcore.utils.data.DataDescription for more information.

pre_transform : Callable

A pre-transform is a callable that is applied directly to the output of the dataset, before collation in the dataloader. The transforms typically convert the image in the output to a tensor, convert the WsiAnnotations to a mask, or similar.
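The signature types pre_transform as a factory: it takes a bool and returns the function that is applied to each sample dictionary. A minimal sketch of that shape follows. The name make_pre_transform, the meaning given to the flag (whether targets are kept), and the "image" and "annotations" keys are assumptions for illustration, and a plain list stands in for the tensor conversion.

```python
from typing import Any, Callable

Sample = dict[str, Any]


def make_pre_transform(requires_target: bool) -> Callable[[Sample], Sample]:
    """Hypothetical factory; what the bool flag means is an assumption."""

    def transform(sample: Sample) -> Sample:
        out = dict(sample)  # shallow copy so the input sample is left untouched
        # Stand-in for the tensor conversion: scale 8-bit values to [0, 1].
        out["image"] = [value / 255 for value in sample["image"]]
        if not requires_target:
            # Without targets there is nothing to convert to a mask.
            out.pop("annotations", None)
        return out

    return transform


train_transform = make_pre_transform(True)
predict_transform = make_pre_transform(False)
sample = {"image": [0, 255], "annotations": ["polygon"]}
```

Returning a new dictionary rather than mutating the input keeps the transform safe to apply more than once to the same sample.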


batch_size : int

The batch size of the data loader.

validate_batch_size : int, optional

The batch size for validation, which can sometimes be larger than the training batch size. If set, it is also used for prediction.

num_workers : int

The number of workers used to fetch tiles.

persistent_workers : bool

Whether to use persistent workers. Check the PyTorch documentation for more information.

pin_memory : bool

Whether the data loader should use pinned memory. Check the PyTorch documentation for more information.
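How batch_size and validate_batch_size could interact is sketched below. The helper resolve_batch_size is hypothetical and not part of ahcore. It assumes that a validate_batch_size of None falls back to batch_size, and that the test stage, which the description above does not mention, uses batch_size.

```python
from typing import Optional


def resolve_batch_size(
    stage: str,
    batch_size: int = 32,
    validate_batch_size: Optional[int] = None,
) -> int:
    """Pick the batch size for a stage (hypothetical helper, see assumptions above)."""
    if stage in ("validate", "predict") and validate_batch_size is not None:
        return validate_batch_size
    # Assumed fallback: every other case uses the regular batch size.
    return batch_size


fit_size = resolve_batch_size("fit", batch_size=32, validate_batch_size=64)
predict_size = resolve_batch_size("predict", batch_size=32, validate_batch_size=64)
```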

property data_manager: DataManager#

predict_dataloader() DataLoader[dict[str, Any]] | None[source]#

An iterable or collection of iterables specifying prediction samples.

For more information about multiple dataloaders, see the Lightning documentation.

It’s recommended that all data downloads and preparation happen in prepare_data().


Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.


A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.

setup(stage: str) None[source]#

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.


stage: either 'fit', 'validate', 'test', or 'predict'


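import pytorch_lightning as pl
from torch import nn


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = None

    def prepare_data(self):
        # Don't do this: prepare_data() is not called on every process,
        # so state assigned here is missing on the other processes.
        self.something = "else"

    def setup(self, stage):
        # load_data is a placeholder for your own loading code.
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)

teardown(stage: str | None = None) None[source]#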

Called at the end of fit (train + validate), validate, test, or predict.


stage: either 'fit', 'validate', 'test', or 'predict'

test_dataloader() DataLoader[dict[str, Any]] | None[source]#

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the Lightning documentation.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.


Do not assign state in prepare_data().
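The pattern above can be sketched without Lightning. SketchDataModule below is a plain-Python stand-in that only mimics the two hooks. The class name, the raw.json file and the 8/2 split are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


class SketchDataModule:
    """Plain-Python stand-in for the two hooks; not a LightningDataModule."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)
        self.train_items = []
        self.test_items = []

    def prepare_data(self):
        # Write to disk only; no attributes are assigned in this hook.
        raw = list(range(10))  # stands in for a download
        (self.data_dir / "raw.json").write_text(json.dumps(raw))

    def setup(self, stage):
        # Read what prepare_data() left on disk and build the state here.
        raw = json.loads((self.data_dir / "raw.json").read_text())
        if stage == "fit":
            self.train_items = raw[:8]
        elif stage == "test":
            self.test_items = raw[8:]


with tempfile.TemporaryDirectory() as tmp:
    dm = SketchDataModule(tmp)
    dm.prepare_data()
    dm.setup("test")
    test_items = dm.test_items
```

Because prepare_data() only touches the disk, any process that later runs setup() can rebuild the same state without rerunning the download.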


Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.


If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

train_dataloader() DataLoader[dict[str, Any]] | None[source]#

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.


Do not assign state in prepare_data().


Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

property uuid: UUID#

This property is used to create a unique cache file for each dataset. The constructor of this dataset is completely determined by the data description, including the pre_transforms. Therefore, we can use the data description to create a UUID that is unique for each datamodule.

The UUID is computed by hashing the data description using the basemodel_to_uuid function, which takes a SHA-256 hash of the pickled object and converts it to a UUID. As pickles can change between Python versions, this UUID will differ when a different Python version is used.


A unique identifier for this datamodule.
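The hashing scheme described for this property can be sketched with the standard library. This is not the body of basemodel_to_uuid: the name object_to_uuid is hypothetical, and keeping the first 16 bytes of the digest is an assumption about how the hash is turned into a UUID.

```python
import hashlib
import pickle
import uuid


def object_to_uuid(obj: object) -> uuid.UUID:
    """Derive a UUID from the SHA-256 hash of a pickled object."""
    digest = hashlib.sha256(pickle.dumps(obj)).digest()
    # Assumption: keep the first 16 of the 32 digest bytes (a UUID holds 128 bits).
    return uuid.UUID(bytes=digest[:16])


description = {"manifest": "example", "tile_size": (512, 512)}
first = object_to_uuid(description)
again = object_to_uuid(description)
changed = object_to_uuid({"manifest": "example", "tile_size": (256, 256)})
```

Hashing the same description twice gives the same identifier, while changing any field gives a different one, which is what makes the value usable as a cache key.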

val_dataloader() DataLoader[dict[str, Any]] | None[source]#

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the Lightning documentation.

The dataloader you return will not be reloaded unless you set the Trainer argument reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

  • fit()

  • validate()

  • prepare_data()

  • setup()


Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.


If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

ahcore.data.samplers module#

Module implementing the samplers. These are used, for instance, to create batches of tiles from the same WSI.

class ahcore.data.samplers.WsiBatchSampler(dataset: ConcatDataset[TiledROIsSlideImageDataset], batch_size: int)[source]#

Bases: Sampler[List[int]]

class ahcore.data.samplers.WsiBatchSamplerPredict(sampler: SequentialSampler | None = None, batch_size: int | None = None, drop_last: bool = False, dataset: ConcatDataset[TiledROIsSlideImageDataset] | None = None)[source]#

Bases: Sampler[List[int]]

This Sampler is identical to the WsiBatchSampler, but its signature is changed for compatibility with the predict phase of Lightning.
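One way to picture a sampler that keeps each batch within a single WSI is sketched below. It is not the ahcore implementation. The function per_slide_batches is hypothetical, and it assumes the concatenated dataset can be described by a list of per-slide tile counts, with tiles indexed consecutively across slides.

```python
from typing import Iterator, List


def per_slide_batches(slide_lengths: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield batches of global tile indices that never mix two slides."""
    start = 0
    for length in slide_lengths:
        indices = range(start, start + length)
        for offset in range(0, length, batch_size):
            # The last batch of a slide may be shorter than batch_size.
            yield list(indices[offset:offset + batch_size])
        start += length


# Two slides with 5 and 3 tiles, batched two tiles at a time.
batches = list(per_slide_batches([5, 3], batch_size=2))
```

In this sketch the last batch of each slide is allowed to be short, so that no batch ever spans a slide boundary.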

Module contents#

General module for datasets, samplers and lightning modules.

  • Generic dataset generated by a manifest, which can handle classification, detection and segmentation.

  • Samplers that, for instance, perform adaptive sampling or define different weights per sample.