dlup.data package#
Submodules#
dlup.data.dataset module#
Dataset helpers to simplify the generation of a dataset made of tiles from a WSI. Dataset and ConcatDataset are taken from PyTorch 1.8.0 under the BSD license.
- class dlup.data.dataset.AnnotationData[source]#
Bases: dict
- boxes: dict[str, list[tuple[tuple[int, int], tuple[int, int]]]]#
- mask: numpy.ndarray[Any, numpy.dtype[numpy.int64]]#
- points: dict[str, list[tuple[float, float]]]#
- roi: Optional[numpy.ndarray[Any, numpy.dtype[numpy.int64]]]#
- class dlup.data.dataset.BaseWsiDataset(path: Union[str, pathlib.Path], regions: collections.abc.Sequence[tuple[float, float, int, int, float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]], list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, apply_color_profile: bool = False, **kwargs: Any)[source]#
Bases: dlup.data.dataset.Dataset[Union[dlup.data.dataset.TileSample, Sequence[dlup.data.dataset.TileSample]]], Generic
Dataset to iterate over regions of a SlideImage class. This class features some logic to avoid instantiating too many slides, which for very large datasets can cause expensive allocations due to internal caching of the image reading backend.
This class is the superclass of TiledWsiDataset, which has a function, from_standard_tiling, to compute all the regions for specified tiling parameters on the fly.
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
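A minimal sketch of iterating over explicitly specified regions; the slide path and region values below are placeholders, not part of dlup:
>>> from dlup.data.dataset import BaseWsiDataset
>>> # Two hand-picked 512 x 512 regions at 0.5 mpp (square tiles, so the
>>> # (h, w) ordering does not matter here).
>>> regions = [(0.0, 0.0, 512, 512, 0.5), (512.0, 0.0, 512, 512, 0.5)]
>>> dataset = BaseWsiDataset("/path/to/slide.svs", regions=regions)
>>> sample = dataset[0]  # a TileSample dictionary
>>> sample["coordinates"], sample["image"].size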
- property crop: bool#
Returns True if the regions will be cropped at the boundaries.
- property path: Union[str, pathlib.Path]#
Path of the whole-slide image.
- property slide_image: dlup._image.SlideImage#
Return the cached slide image instance associated with this dataset.
- class dlup.data.dataset.ConcatDataset(datasets: Iterable[dlup.data.dataset.Dataset[dlup.data.dataset.T_co]])[source]#
Bases: dlup.data.dataset.Dataset[dlup.data.dataset.T_co]
Dataset as a concatenation of multiple datasets.
This class is useful to assemble different existing datasets.
- Parameters
- datasets : sequence
List of datasets to be concatenated.
Notes
Taken and adapted from PyTorch 1.8.0 torch.utils.data.Dataset under the BSD license.
- cumulative_sizes: list[int]#
- datasets: list[dlup.data.dataset.Dataset[+T_co]]#
- index_to_dataset(idx: int) tuple[dlup.data.dataset.Dataset[+T_co], int] [source]#
Returns the dataset and the index of the sample in the dataset.
- Parameters
- idx : int
Index of the sample in the concatenated dataset.
- Returns
- tuple[Dataset, int]
Dataset and index of the sample in the dataset.
- wsi_indices: dict[str, range]#
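A minimal sketch of concatenating two tiled datasets and mapping a global sample index back to the slide it came from; the slide paths and tiling parameters below are placeholders:
>>> import pathlib
>>> from dlup.data.dataset import ConcatDataset, TiledWsiDataset
>>> paths = [pathlib.Path("/path/to/slide_0.svs"), pathlib.Path("/path/to/slide_1.svs")]
>>> datasets = [
...     TiledWsiDataset.from_standard_tiling(path, mpp=0.5, tile_size=(512, 512), tile_overlap=(0, 0))
...     for path in paths
... ]
>>> combined = ConcatDataset(datasets)
>>> dataset, local_index = combined.index_to_dataset(10)  # which slide does sample 10 belong to?
>>> dataset.path, local_index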
- class dlup.data.dataset.Dataset[source]#
Bases: Generic[dlup.data.dataset.T_co], collections.abc.Sequence[dlup.data.dataset.T_co]
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
Notes
Taken and adapted from PyTorch 1.8.0 torch.utils.data.Dataset under the BSD license.
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
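As a rough sketch of what a map-style subclass looks like, the following purely illustrative ListDataset (not part of dlup) wraps an in-memory list and implements the required __getitem__() plus the optional __len__():
>>> from dlup.data.dataset import Dataset
>>> class ListDataset(Dataset[int]):
...     def __init__(self, values: list[int]) -> None:
...         self._values = values
...     def __getitem__(self, index: int) -> int:
...         return self._values[index]
...     def __len__(self) -> int:
...         return len(self._values)
...
>>> dataset = ListDataset([3, 1, 4, 1, 5])
>>> dataset[2], len(dataset)
(4, 5)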
- class dlup.data.dataset.RegionFromWsiDatasetSample[source]#
Bases: dict
- annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
- coordinates: tuple[int | float, int | float]#
- grid_index: int#
- grid_local_coordinates: tuple[int, int]#
- image: PIL.Image.Image#
- labels: dict[str, Any] | None#
- mpp: float#
- path: Union[str, pathlib.Path]#
- region_index: int#
- class dlup.data.dataset.TileSample[source]#
Bases: TypedDict
- annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
- coordinates: tuple[int | float, int | float]#
- image: PIL.Image.Image#
- labels: dict[str, Any] | None#
- mpp: float#
- path: Union[str, pathlib.Path]#
- region_index: int#
- class dlup.data.dataset.TileSampleWithAnnotationData[source]#
Bases: TypedDict
- annotation_data: dlup.data.dataset.AnnotationData#
- class dlup.data.dataset.TiledROIsSlideImageDataset(*args: Any, **kwargs: Any)[source]#
Bases: dlup.data.dataset.TiledWsiDataset
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
- class dlup.data.dataset.TiledWsiDataset(path: Union[str, pathlib.Path], grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.RegionFromWsiDatasetSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, **kwargs: Any)[source]#
Bases: dlup.data.dataset.BaseWsiDataset
Example dataset class that supports multiple ROIs.
This dataset can be used, for example, to tile your WSI on-the-fly using the from_standard_tiling function.
Examples
>>> dlup_dataset = TiledWsiDataset.from_standard_tiling(
...     path='/path/to/TCGA-WSI.svs',
...     mpp=0.5,
...     tile_size=(512, 512),
...     tile_overlap=(0, 0),
...     tile_mode='skip',
...     crop=True,
...     mask=None,
...     mask_threshold=0.5,
...     annotations=None,
...     labels=[("msi", True)],
...     transform=YourTransform(),
... )
>>> sample = dlup_dataset[5]
>>> image = sample["image"]
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
- classmethod from_standard_tiling(path: pathlib.Path, mpp: float | None, tile_size: tuple[int, int], tile_overlap: tuple[int, int], output_tile_size: Optional[tuple[int, int]] = None, tile_mode: dlup.tiling.TilingMode = TilingMode.overflow, grid_order: dlup.tiling.GridOrder = GridOrder.C, crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, rois: Optional[list[tuple[tuple[int, int], tuple[int, int]]]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.TileSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, limit_bounds: bool = True, **kwargs: Any) dlup.data.dataset.TiledWsiDataset [source]#
Function to be used to tile a WSI on-the-fly.
- Parameters
- path :
Path to a single WSI.
- mpp :
The microns per pixel (mpp) at which the tiles will be extracted.
- tile_size :
Tuple of integers that represents the pixel size of the output tiles.
- tile_overlap :
Tuple of integers that represents the overlap of tiles in the x and y directions.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- tile_mode :
"skip" or "overflow"; see dlup.tiling.TilingMode for more information.
- grid_order : GridOrder
Run through the grid either in C order or Fortran order.
- crop : bool
Whether overflowing tiles should be cropped.
- mask :
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- rois :
Regions of interest to restrict the grids to. Coordinates should be given at level 0.
- annotations :
Annotation class.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform : Callable
Transform to be applied to the sample.
- backend : ImageBackend
Backend to use to read the whole slide image.
- limit_bounds : bool
Whether the bounds of the grid should be limited to the bounds of the slide, as given by the slide_bounds property of the SlideImage class. If ROIs are given, this parameter is ignored.
- **kwargs :
Gets passed to the SlideImage constructor.
- Returns
- TiledWsiDataset
Initialized TiledWsiDataset with all the regions as computed using the given tile size, mpp, and so on. Calling this dataset with an index will return a tile extracted straight from the WSI, which means that tiling as a pre-processing step is not required.
Examples
See the usage example in the main class docstring.
- property grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]]#
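A minimal sketch of inspecting the grids backing a dataset built with from_standard_tiling; the slide path and tiling parameters below are placeholders:
>>> import pathlib
>>> from dlup.data.dataset import TiledWsiDataset
>>> dataset = TiledWsiDataset.from_standard_tiling(
...     pathlib.Path("/path/to/slide.svs"), mpp=0.5, tile_size=(512, 512), tile_overlap=(0, 0)
... )
>>> len(dataset)  # number of tiles across all grids
>>> for grid, tile_size, mpp in dataset.grids:
...     print(tile_size, mpp)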
dlup.data.transforms module#
- class dlup.data.transforms.ContainsPolygonToLabel(*, roi_name: str | None, label: str, threshold: float)[source]#
Bases: object
Transform which converts annotations into a sample-level label indicating whether the label is present above a threshold.
The area of the label within the ROI (if given) is computed first. If the proportion of this label in the image is above the threshold, ["labels"]["has <label>"] is set to True, otherwise to False.
- Parameters
- roi_name : str
Name of the ROI key.
- label : str
Which label to test.
- threshold : float
Threshold as a number between 0 and 1 that denotes when the label should be considered present.
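A minimal sketch of applying this transform to a dataset sample; the label name and threshold are placeholders, and sample is assumed to come from a dlup dataset with annotations attached:
>>> from dlup.data.transforms import ContainsPolygonToLabel
>>> transform = ContainsPolygonToLabel(roi_name=None, label="tumor", threshold=0.1)
>>> sample = transform(sample)
>>> sample["labels"]["has tumor"]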
- class dlup.data.transforms.ConvertAnnotationsToMask(*, roi_name: str | None, index_map: dict[str, int], default_value: int = 0)[source]#
Bases: object
Transform which converts polygons to masks. Will overwrite the annotations key.
Converts annotations given by dlup.annotations.Polygon or dlup.annotations.Point to a mask and a dictionary of points. The mask is initialized with default_value (i.e., background). The values in the mask are subsequently determined by index_map, where each value is written to the mask according to this map, in the order of the elements in the annotations. This means that if you have overlapping polygons, the last polygon will overwrite the previous one. The sorting can be handled in the dlup.annotations.WsiAnnotations class.
In case no annotations are present (i.e., the "annotations" key is None), a ValueError is raised.
- Parameters
- roi_name : str, optional
Name of the ROI key.
- index_map : dict
Dictionary mapping the label to the integer in the output.
- default_value : int
The mask will be initialized with this value.
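A minimal sketch; the label names are placeholders, sample is assumed to come from a dlup dataset with annotations, and the converted output is expected under the annotation_data key (see TileSampleWithAnnotationData):
>>> from dlup.data.transforms import ConvertAnnotationsToMask
>>> transform = ConvertAnnotationsToMask(
...     roi_name=None, index_map={"stroma": 1, "tumor": 2}, default_value=0
... )
>>> sample = transform(sample)
>>> mask = sample["annotation_data"]["mask"]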
- class dlup.data.transforms.MajorityClassToLabel(*, roi_name: str | None, index_map: dict[str, int])[source]#
Bases: object
Transform which converts the majority class in the annotations to a label.
The function works as follows:
- The total area for each label in the sample is computed; the label with the maximum area is determined.
- The total area not covered by the ROI is computed.
- If the area not covered by the ROI is larger than the area of the label with the maximum area, the image is masked on the ROI.
- The label is added to the output dictionary in ["labels"]["majority_label"].
- Parameters
- roi_name : str
Name of the ROI key.
- index_map : dict
Dictionary mapping the label to the integer in the output.
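A minimal sketch; the label names are placeholders and sample is assumed to come from a dlup dataset with annotations:
>>> from dlup.data.transforms import MajorityClassToLabel
>>> transform = MajorityClassToLabel(roi_name=None, index_map={"stroma": 1, "tumor": 2})
>>> sample = transform(sample)
>>> sample["labels"]["majority_label"]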
- class dlup.data.transforms.RenameLabels(remap_labels: dict[str, str])[source]#
Bases: object
Remap the label names.
- Parameters
- remap_labels : dict
Dictionary mapping the old name to the new name.
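A minimal sketch; the old and new label names are placeholders and sample is assumed to come from a dlup dataset with annotations:
>>> from dlup.data.transforms import RenameLabels
>>> transform = RenameLabels(remap_labels={"tumour": "tumor"})
>>> sample = transform(sample)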
- dlup.data.transforms.convert_annotations(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], region_size: tuple[int, int], index_map: dict[str, int], roi_name: Optional[str] = None, default_value: int = 0) tuple[dict[str, list[tuple[float, float]]], dict[str, list[tuple[tuple[int, int], tuple[int, int]]]], numpy.ndarray[Any, numpy.dtype[numpy.int64]], numpy.ndarray[Any, numpy.dtype[numpy.int64]] | None] [source]#
Convert the polygon and point annotations as output of a dlup dataset class, where:
- In case of points, the output is a dictionary mapping the annotation name to a list of locations.
- In case of bounding boxes, the output is a dictionary mapping the annotation name to a list of bounding boxes.
Note that the internal representation of a bounding box is a polygon (AnnotationType is AnnotationType.BOX), so the bounding box of that polygon is computed to convert.
In case of polygons these are converted into a mask according to index_map.
BE AWARE: the polygon annotations are processed sequentially and later annotations can overwrite earlier ones. This is for instance useful when you would annotate “tumor associated stroma” on top of “stroma”. The dlup Annotation classes return the polygons with area from large to small.
When the polygon has holes, the previously written annotation is used to fill the holes.
BE AWARE: This function will silently ignore annotations which are written out of bounds.
- Parameters
- annotations : Iterable[_AnnotationsTypes]
The annotations as a list, e.g., as output from dlup.annotations.WsiAnnotations.read_region().
- region_size : tuple[int, int]
Size of the region, used as the shape of the output mask.
- index_map : dict[str, int]
Map mapping the annotation name to an index number in the output.
- roi_name : str
Name of the region-of-interest key.
- default_value : int
The mask will be initialized with this value.
- Returns
- dict, dict, np.ndarray, np.ndarray or None
Dictionary of points, dictionary of boxes, the mask, and the roi_mask (or None).
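A minimal sketch of calling the function directly; annotations is assumed to come from, e.g., dlup.annotations.WsiAnnotations.read_region(), and the label name and region size are placeholders:
>>> from dlup.data.transforms import convert_annotations
>>> points, boxes, mask, roi_mask = convert_annotations(
...     annotations, region_size=(512, 512), index_map={"tumor": 1}, roi_name=None
... )
>>> mask.shape
(512, 512)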
- dlup.data.transforms.rename_labels(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], remap_labels: dict[str, str]) list[dlup.annotations.Point | dlup.annotations.Polygon] [source]#
Rename the labels in the annotations.
- Parameters
- annotations : Iterable[_AnnotationsTypes]
The annotations.
- remap_labels : dict[str, str]
The renaming table.
- Returns
- list[_AnnotationsTypes]
The annotations with the labels renamed.
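A minimal sketch; the label names are placeholders and annotations is assumed to be an iterable of dlup annotation objects:
>>> from dlup.data.transforms import rename_labels
>>> renamed = rename_labels(annotations, remap_labels={"tumour": "tumor"})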