dlup.data package#

Submodules#

dlup.data.dataset module#

Dataset helpers that simplify the generation of a dataset made of tiles from a WSI. Dataset and ConcatDataset are taken from pytorch 1.8.0 under the BSD license.

class dlup.data.dataset.AnnotationData[source]#

Bases: dict

boxes: dict[str, list[tuple[tuple[int, int], tuple[int, int]]]]#
mask: numpy.ndarray[Any, numpy.dtype[numpy.int64]]#
points: dict[str, list[tuple[float, float]]]#
roi: Optional[numpy.ndarray[Any, numpy.dtype[numpy.int64]]]#
class dlup.data.dataset.BaseWsiDataset(path: Union[str, pathlib.Path], regions: collections.abc.Sequence[tuple[float, float, int, int, float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]], list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, apply_color_profile: bool = False, **kwargs: Any)[source]#

Bases: dlup.data.dataset.Dataset[Union[dlup.data.dataset.TileSample, Sequence[dlup.data.dataset.TileSample]]]

Generic dataset to iterate over regions of a SlideImage class.

This class features some logic to avoid instantiating too many slides, which for very large datasets can cause expensive allocations due to the internal caching of the image-reading backend.

This class is the superclass of TiledWsiDataset, which has a function, from_standard_tiling, to compute all the regions for specified tiling parameters on the fly.

Parameters
path : PathLike

Path to the image.

regions : collections.abc.Sequence[tuple[float, float, int, int, float]]

Sequence of rectangular regions as (x, y, h, w, mpp).

crop : bool

Whether to crop overflowing tiles.

mask : np.ndarray or SlideImage or WsiAnnotations

Binary mask used to filter each region together with a threshold.

mask_threshold : float, optional

Threshold to check against. The foreground percentage should be strictly larger than threshold. If None, anything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.

output_tile_size : tuple[int, int], optional

If set, this value will be used as the tile size of the output tiles. If it differs from the tile size of the underlying grid, the tile will be extracted around the center of the region.

annotations : WsiAnnotations

Annotation classes.

labels : list

Image-level labels. Will be added to each individual tile.

transform

Transforming function. To be used for augmentations or other model-specific preprocessing.

backend : ImageBackend or AbstractSlideBackend

Backend to pass to SlideImage.

apply_color_profile : bool

Whether to apply the ICC profile to the image if available.

**kwargs : Any

Passed to SlideImage.
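
A minimal sketch (not part of the upstream reference) of constructing a BaseWsiDataset directly from an explicit list of regions; the path and region values are placeholders, and in practice TiledWsiDataset.from_standard_tiling is usually the more convenient entry point.

>>> from dlup.data.dataset import BaseWsiDataset
>>> regions = [
...     (0.0, 0.0, 512, 512, 0.5),    # (x, y, h, w, mpp), as documented above
...     (512.0, 0.0, 512, 512, 0.5),
... ]
>>> dataset = BaseWsiDataset(path="/path/to/slide.svs", regions=regions, crop=False)
>>> number_of_tiles = len(dataset)    # all regions are kept, since no mask is given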

property crop: bool#

Returns True if the regions will be cropped at the boundaries.

property path: Union[str, pathlib.Path]#

Path of the whole slide image.

property slide_image: dlup._image.SlideImage#

Return the cached slide image instance associated with this dataset.

class dlup.data.dataset.ConcatDataset(datasets: Iterable[dlup.data.dataset.Dataset[dlup.data.dataset.T_co]])[source]#

Bases: dlup.data.dataset.Dataset[dlup.data.dataset.T_co]

Dataset as a concatenation of multiple datasets.

This class is useful to assemble different existing datasets.

Parameters
datasets : sequence

List of datasets to be concatenated

Notes

Taken and adapted from pytorch 1.8.0 torch.utils.data.Dataset under BSD license.
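
A minimal sketch, assuming two slides tiled with identical settings; the paths are placeholders.

>>> from pathlib import Path
>>> from dlup.data.dataset import ConcatDataset, TiledWsiDataset
>>> datasets = [
...     TiledWsiDataset.from_standard_tiling(
...         path=Path(p), mpp=0.5, tile_size=(512, 512), tile_overlap=(0, 0)
...     )
...     for p in ("/path/to/slide_1.svs", "/path/to/slide_2.svs")
... ]
>>> combined = ConcatDataset(datasets)
>>> total_tiles = len(combined)                        # sum of the lengths of both datasets
>>> dataset, local_idx = combined.index_to_dataset(0)  # map a global index back to its dataset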

static cumsum(sequence: list[dlup.data.dataset.Dataset[+ T_co]]) list[int][source]#
cumulative_sizes: list[int]#
datasets: list[dlup.data.dataset.Dataset[+T_co]]#
index_to_dataset(idx: int) tuple[dlup.data.dataset.Dataset[+T_co], int][source]#

Returns the dataset and the index of the sample in the dataset.

Parameters
idx : int

Index of the sample in the concatenated dataset.

Returns
tuple[Dataset, int]

Dataset and index of the sample in the dataset.

wsi_indices: dict[str, range]#
class dlup.data.dataset.Dataset[source]#

Bases: Generic[dlup.data.dataset.T_co], collections.abc.Sequence[dlup.data.dataset.T_co]

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.

Notes

Taken and adapted from pytorch 1.8.0 torch.utils.data.Dataset under BSD license. DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
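
A minimal sketch of a map-style subclass as described above; the toy data is a placeholder. Raising IndexError for out-of-range keys keeps iteration (provided by collections.abc.Sequence) finite.

>>> from dlup.data.dataset import Dataset
>>> class SquaresDataset(Dataset[int]):
...     """Toy subclass: __getitem__ and __len__ are the methods to override."""
...     def __init__(self, n: int) -> None:
...         self._n = n
...     def __getitem__(self, index: int) -> int:
...         if not 0 <= index < self._n:
...             raise IndexError(index)
...         return index * index
...     def __len__(self) -> int:
...         return self._n
...
>>> list(SquaresDataset(4))
[0, 1, 4, 9]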

class dlup.data.dataset.RegionFromWsiDatasetSample[source]#

Bases: dict

annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
coordinates: tuple[int | float, int | float]#
grid_index: int#
grid_local_coordinates: tuple[int, int]#
image: PIL.Image.Image#
labels: dict[str, Any] | None#
mpp: float#
path: Union[str, pathlib.Path]#
region_index: int#
class dlup.data.dataset.TileSample[source]#

Bases: TypedDict

annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
coordinates: tuple[int | float, int | float]#
image: PIL.Image.Image#
labels: dict[str, Any] | None#
mpp: float#
path: Union[str, pathlib.Path]#
region_index: int#
class dlup.data.dataset.TileSampleWithAnnotationData[source]#

Bases: TypedDict

annotation_data: dlup.data.dataset.AnnotationData#
class dlup.data.dataset.TiledROIsSlideImageDataset(*args: Any, **kwargs: Any)[source]#

Bases: dlup.data.dataset.TiledWsiDataset

Parameters
path : PathLike

Path to the image.

regions : collections.abc.Sequence[tuple[float, float, int, int, float]]

Sequence of rectangular regions as (x, y, h, w, mpp).

crop : bool

Whether to crop overflowing tiles.

mask : np.ndarray or SlideImage or WsiAnnotations

Binary mask used to filter each region together with a threshold.

mask_threshold : float, optional

Threshold to check against. The foreground percentage should be strictly larger than threshold. If None, anything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.

output_tile_size : tuple[int, int], optional

If set, this value will be used as the tile size of the output tiles. If it differs from the tile size of the underlying grid, the tile will be extracted around the center of the region.

annotations : WsiAnnotations

Annotation classes.

labels : list

Image-level labels. Will be added to each individual tile.

transform

Transforming function. To be used for augmentations or other model-specific preprocessing.

backend : ImageBackend or AbstractSlideBackend

Backend to pass to SlideImage.

apply_color_profile : bool

Whether to apply the ICC profile to the image if available.

**kwargs : Any

Passed to SlideImage.

class dlup.data.dataset.TiledWsiDataset(path: Union[str, pathlib.Path], grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.RegionFromWsiDatasetSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, **kwargs: Any)[source]#

Bases: dlup.data.dataset.BaseWsiDataset

Example dataset class that supports multiple ROIs.

This dataset can be used, for example, to tile your WSI on-the-fly using the from_standard_tiling function.

Examples

>>> dlup_dataset = TiledWsiDataset.from_standard_tiling(
...     path='/path/to/TCGA-WSI.svs',
...     mpp=0.5,
...     tile_size=(512, 512),
...     tile_overlap=(0, 0),
...     tile_mode='skip',
...     crop=True,
...     mask=None,
...     mask_threshold=0.5,
...     annotations=None,
...     labels=[("msi", True)],
...     transform=YourTransform(),
... )
>>> sample = dlup_dataset[5]
>>> image = sample["image"]
Parameters
path : PathLike

Path to the image.

regions : collections.abc.Sequence[tuple[float, float, int, int, float]]

Sequence of rectangular regions as (x, y, h, w, mpp).

crop : bool

Whether to crop overflowing tiles.

mask : np.ndarray or SlideImage or WsiAnnotations

Binary mask used to filter each region together with a threshold.

mask_threshold : float, optional

Threshold to check against. The foreground percentage should be strictly larger than threshold. If None, anything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.

output_tile_size : tuple[int, int], optional

If set, this value will be used as the tile size of the output tiles. If it differs from the tile size of the underlying grid, the tile will be extracted around the center of the region.

annotations : WsiAnnotations

Annotation classes.

labels : list

Image-level labels. Will be added to each individual tile.

transform

Transforming function. To be used for augmentations or other model-specific preprocessing.

backend : ImageBackend or AbstractSlideBackend

Backend to pass to SlideImage.

apply_color_profile : bool

Whether to apply the ICC profile to the image if available.

**kwargs : Any

Passed to SlideImage.

classmethod from_standard_tiling(path: pathlib.Path, mpp: float | None, tile_size: tuple[int, int], tile_overlap: tuple[int, int], output_tile_size: Optional[tuple[int, int]] = None, tile_mode: dlup.tiling.TilingMode = TilingMode.overflow, grid_order: dlup.tiling.GridOrder = GridOrder.C, crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, rois: Optional[list[tuple[tuple[int, int], tuple[int, int]]]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.TileSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, limit_bounds: bool = True, **kwargs: Any) dlup.data.dataset.TiledWsiDataset[source]#

Function to be used to tile a WSI on-the-fly.

Parameters
path :

Path to a single WSI.

mpp :

Float stating the microns per pixel at which you wish the tiles to be extracted.

tile_size :

Tuple of integers that represent the pixel size of output tiles

tile_overlap :

Tuple of integers that represents the overlap of tiles in the x and y direction

output_tile_size: tuple[int, int], optional

If set, this value will be used as the tile size of the output tiles. If it differs from the tile size of the underlying grid, the tile will be extracted around the center of the region.

tile_mode :

“skip” or “overflow”. See dlup.tiling.TilingMode for more information.

grid_order : GridOrder

Run through the grid either in C order or Fortran order.

crop : bool

If overflowing tiles should be cropped.

mask :

Binary mask used to filter each region together with a threshold.

mask_threshold : float, optional

Threshold to check against. The foreground percentage should be strictly larger than threshold. If None, anything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.

rois :

Regions of interest to restrict the grids to. Coordinates should be given at level 0.

annotations :

Annotation class

labels : list

Image-level labels. Will be added to each individual tile.

transform : Callable

Transform to be applied to the sample.

backend : ImageBackend

Backend to use to read the whole slide image.

limit_bounds : bool

Whether the bounds of the grid should be limited to the bounds of the slide as given by the slide_bounds property of the SlideImage class. If ROIs are given, this parameter is ignored.

**kwargs :

Gets passed to the SlideImage constructor.

Returns
Initialized TiledWsiDataset with all the regions as computed using the given tile size, mpp, and so on. Calling this dataset with an index will return a tile extracted straight from the WSI, so a separate tiling pre-processing step is not required.

Examples

See example of usage in the main class docstring
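
As an additional hedged sketch (the path is a placeholder), the dataset returned by from_standard_tiling can simply be iterated, yielding one tile dictionary at a time:

>>> from pathlib import Path
>>> import numpy as np
>>> from dlup.data.dataset import TiledWsiDataset
>>> dataset = TiledWsiDataset.from_standard_tiling(
...     path=Path("/path/to/slide.svs"),
...     mpp=1.0,
...     tile_size=(256, 256),
...     tile_overlap=(0, 0),
... )
>>> for sample in dataset:
...     tile = np.asarray(sample["image"])   # PIL.Image.Image converted to an array
...     x, y = sample["coordinates"]         # coordinates of the tile in the slide
...     # ... feed `tile` to a model or write it to disk
...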

property grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]]#
dlup.data.dataset.parse_rois(rois: list[tuple[tuple[int, int], tuple[int, int]]] | None, image_size: tuple[int, int], scaling: float = 1.0) list[tuple[tuple[int, int], tuple[int, int]]][source]#

dlup.data.transforms module#

class dlup.data.transforms.ContainsPolygonToLabel(*, roi_name: str | None, label: str, threshold: float)[source]#

Bases: object

Transform which converts annotations into a sample-level label indicating whether the label is present above a threshold.

The area of the label within the ROI (if given) is first computed. If the proportion of this label in the image itself is above the threshold, the [“labels”][“has <label>”] key is set to True, otherwise to False.

Parameters
roi_name : str

Name of the ROI key.

label : str

Which label to test.

threshold : float

Threshold as a number between 0 and 1 that denotes when we should consider the label to be present.
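
A minimal sketch of wiring this transform into a tiled dataset; the paths and the “tumor” label are placeholders, and loading the annotations through WsiAnnotations.from_geojson is an assumption about how the annotations were produced.

>>> from pathlib import Path
>>> from dlup.annotations import WsiAnnotations
>>> from dlup.data.dataset import TiledWsiDataset
>>> from dlup.data.transforms import ContainsPolygonToLabel
>>> annotations = WsiAnnotations.from_geojson([Path("/path/to/annotations.json")])  # assumed loader
>>> transform = ContainsPolygonToLabel(roi_name=None, label="tumor", threshold=0.1)
>>> dataset = TiledWsiDataset.from_standard_tiling(
...     path=Path("/path/to/slide.svs"),
...     mpp=0.5,
...     tile_size=(512, 512),
...     tile_overlap=(0, 0),
...     annotations=annotations,
...     transform=transform,
... )
>>> has_tumor = dataset[0]["labels"]["has tumor"]  # True when more than 10% of the tile carries "tumor"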

class dlup.data.transforms.ConvertAnnotationsToMask(*, roi_name: str | None, index_map: dict[str, int], default_value: int = 0)[source]#

Bases: object

Transform which converts polygons to masks. Will overwrite the annotations key.

Converts annotations given by dlup.annotations.Polygon or dlup.annotations.Point to a mask and a dictionary of points. The mask is initialized with default_value (i.e., background). The values in the mask are subsequently determined by index_map, where each value is written to the mask according to this map, in the order of the elements in the annotations. This means that if you have overlapping polygons, the last polygon will overwrite the previous one. The sorting can be handled in the dlup.annotations.WsiAnnotations class.

In case there are no annotations present (i.e., the “annotations” key is None), a ValueError is raised.

Parameters
roi_name : str, optional

Name of the ROI key.

index_map : dict

Dictionary mapping the label to the integer in the output.

default_value : int

The mask will be initialized with this value.
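
A minimal sketch, assuming tile_sample is a placeholder for a tile returned by one of the dataset classes above with annotations attached, that the transform is applied by calling it on the sample (as the datasets do with their transform argument), and that the output key follows the TileSampleWithAnnotationData structure documented above; the label names are also placeholders.

>>> from dlup.data.transforms import ConvertAnnotationsToMask
>>> transform = ConvertAnnotationsToMask(roi_name=None, index_map={"stroma": 1, "tumor": 2})
>>> output = transform(tile_sample)           # tile_sample: a tile with a non-None "annotations" key
>>> mask = output["annotation_data"]["mask"]  # integer mask; 0 (default_value) where nothing is annotated
>>> points = output["annotation_data"]["points"]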

class dlup.data.transforms.MajorityClassToLabel(*, roi_name: str | None, index_map: dict[str, int])[source]#

Bases: object

Transform which converts the majority class in the annotations to a label.

The function works as follows:

  • The total area for each label in the sample is computed; the label with the maximum area is determined.

  • The total area not covered by the ROI is computed.

  • If the area the ROI doesn’t cover is larger than the area of the label with the maximum area, the image is masked on the ROI.

  • The label is added to the output dictionary in [“labels”][“majority_label”].

Parameters
roi_name : str

Name of the ROI key.

index_map : dict

Dictionary mapping the label to the integer in the output.
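
A minimal sketch under the same assumptions as the previous ones (tile_sample is a placeholder tile with annotations; the label names are placeholders):

>>> from dlup.data.transforms import MajorityClassToLabel
>>> transform = MajorityClassToLabel(roi_name=None, index_map={"stroma": 1, "tumor": 2})
>>> output = transform(tile_sample)
>>> majority = output["labels"]["majority_label"]  # the label covering the largest area in the tile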

class dlup.data.transforms.RenameLabels(remap_labels: dict[str, str])[source]#

Bases: object

Remap the label names.

Parameters
remap_labels : dict

Dictionary mapping old name to new name.
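
A minimal sketch; the old and new label names are placeholders, and tile_sample again stands for a tile with annotations.

>>> from dlup.data.transforms import RenameLabels
>>> transform = RenameLabels(remap_labels={"tumour": "tumor", "benign tissue": "benign"})
>>> output = transform(tile_sample)  # the annotations in the sample now carry the new names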

dlup.data.transforms.convert_annotations(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], region_size: tuple[int, int], index_map: dict[str, int], roi_name: Optional[str] = None, default_value: int = 0) tuple[dict[str, list[tuple[float, float]]], dict[str, list[tuple[tuple[int, int], tuple[int, int]]]], numpy.ndarray[Any, numpy.dtype[numpy.int64]], numpy.ndarray[Any, numpy.dtype[numpy.int64]] | None][source]#

Convert the polygon and point annotations as output of a dlup dataset class, where:

  • In case of points the output is a dictionary mapping the annotation name to a list of locations.

  • In case of bounding boxes the output is a dictionary mapping the annotation name to a list of bounding boxes. Note that the internal representation of a bounding box is a polygon (AnnotationType is AnnotationType.BOX), so the bounding box of that polygon is computed to convert.

  • In case of polygons these are converted into a mask according to index_map.

BE AWARE: the polygon annotations are processed sequentially, and later annotations can overwrite earlier ones. This is useful when, for instance, you annotate “tumor associated stroma” on top of “stroma”. The dlup annotation classes return the polygons sorted by area from large to small.

When the polygon has holes, the previously written annotation is used to fill the holes.

BE AWARE: This function will silently ignore annotations which are written out of bounds.

Parameters
annotations : Iterable[_AnnotationsTypes]

The annotations as a list, e.g., as output from dlup.annotations.WsiAnnotations.read_region().

region_size : tuple[int, int]
index_map : dict[str, int]

Map mapping annotation name to index number in the output.

roi_name : str

Name of the region-of-interest key.

default_value : int

The mask will be initialized with this value.

Returns
dict, dict, np.ndarray, np.ndarray or None

Dictionary of points, dictionary of boxes, mask and roi_mask.
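
A minimal sketch of calling the function directly on the annotations of a tile; tile_sample and the label names are placeholders, and the four return values follow the signature above.

>>> from dlup.data.transforms import convert_annotations
>>> points, boxes, mask, roi_mask = convert_annotations(
...     tile_sample["annotations"],
...     region_size=tile_sample["image"].size,  # the size of the tile in pixels
...     index_map={"stroma": 1, "tumor": 2},
...     roi_name=None,
...     default_value=0,
... )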

dlup.data.transforms.rename_labels(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], remap_labels: dict[str, str]) list[dlup.annotations.Point | dlup.annotations.Polygon][source]#

Rename the labels in the annotations.

Parameters
annotations : Iterable[_AnnotationsTypes]

The annotations.

remap_labels : dict[str, str]

The renaming table.

Returns
list[_AnnotationsTypes]

The annotations with the labels renamed.

Module contents#