dlup.data package#
Submodules#
dlup.data.dataset module#
Dataset helpers to simplify the generation of a dataset made of tiles from a WSI. Dataset and ConcatDataset are taken from PyTorch 1.8.0 under the BSD license.
- class dlup.data.dataset.AnnotationData[source]#
Bases: dict
- boxes: dict[str, list[tuple[tuple[int, int], tuple[int, int]]]]#
- mask: numpy.ndarray[Any, numpy.dtype[numpy.int64]]#
- points: dict[str, list[tuple[float, float]]]#
- roi: Optional[numpy.ndarray[Any, numpy.dtype[numpy.int64]]]#
- class dlup.data.dataset.BaseWsiDataset(path: Union[str, pathlib.Path], regions: collections.abc.Sequence[tuple[float, float, int, int, float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]], list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, apply_color_profile: bool = False, **kwargs: Any)[source]#
Bases: dlup.data.dataset.Dataset[Union[dlup.data.dataset.TileSample, Sequence[dlup.data.dataset.TileSample]]], Generic
Dataset to iterate over regions of a SlideImage class. This class features some logic to avoid instantiating too many slides, which for very large datasets can cause expensive allocations due to internal caching of the image reading backend.
This class is the superclass of TiledWsiDataset, which has a function, from_standard_tiling, to compute all the regions for specified tiling parameters on the fly.
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
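A minimal sketch of iterating over explicitly specified regions; the slide path and region values below are placeholders, not part of dlup:
>>> from dlup.data.dataset import BaseWsiDataset
>>> # Two hand-picked 512 x 512 regions at 0.5 mpp (square tiles, so the
>>> # (h, w) ordering does not matter here).
>>> regions = [(0.0, 0.0, 512, 512, 0.5), (512.0, 0.0, 512, 512, 0.5)]
>>> dataset = BaseWsiDataset("/path/to/slide.svs", regions=regions)
>>> sample = dataset[0]  # a TileSample dictionary
>>> sample["coordinates"], sample["image"].size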
- property crop: bool#
Returns True if the regions will be cropped at the boundaries.
- property path: Union[str, pathlib.Path]#
Path of the whole-slide image.
- property slide_image: dlup._image.SlideImage#
Return the cached slide image instance associated with this dataset.
- class dlup.data.dataset.ConcatDataset(datasets: Iterable[dlup.data.dataset.Dataset[dlup.data.dataset.T_co]])[source]#
Bases: dlup.data.dataset.Dataset[dlup.data.dataset.T_co]
Dataset as a concatenation of multiple datasets.
This class is useful to assemble different existing datasets.
- Parameters
- datasets : sequence
List of datasets to be concatenated.
Notes
Taken and adapted from PyTorch 1.8.0 torch.utils.data.Dataset under the BSD license.
- cumulative_sizes: list[int]#
- datasets: list[dlup.data.dataset.Dataset[+T_co]]#
- index_to_dataset(idx: int) tuple[dlup.data.dataset.Dataset[+T_co], int] [source]#
Returns the dataset and the index of the sample in the dataset.
- Parameters
- idx : int
Index of the sample in the concatenated dataset.
- Returns
- tuple[Dataset, int]
Dataset and index of the sample in the dataset.
- wsi_indices: dict[str, range]#
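A minimal sketch of concatenating two tiled datasets and mapping a global sample index back to the slide it came from; the slide paths and tiling parameters below are placeholders:
>>> import pathlib
>>> from dlup.data.dataset import ConcatDataset, TiledWsiDataset
>>> paths = [pathlib.Path("/path/to/slide_0.svs"), pathlib.Path("/path/to/slide_1.svs")]
>>> datasets = [
...     TiledWsiDataset.from_standard_tiling(path, mpp=0.5, tile_size=(512, 512), tile_overlap=(0, 0))
...     for path in paths
... ]
>>> combined = ConcatDataset(datasets)
>>> dataset, local_index = combined.index_to_dataset(10)  # which slide does sample 10 belong to?
>>> dataset.path, local_index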
- class dlup.data.dataset.Dataset[source]#
Bases: Generic[dlup.data.dataset.T_co], collections.abc.Sequence[dlup.data.dataset.T_co]
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
Notes
Taken and adapted from PyTorch 1.8.0 torch.utils.data.Dataset under the BSD license.
DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
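As a rough sketch of what a map-style subclass looks like, the following purely illustrative ListDataset (not part of dlup) wraps an in-memory list and implements the required __getitem__() plus the optional __len__():
>>> from dlup.data.dataset import Dataset
>>> class ListDataset(Dataset[int]):
...     def __init__(self, values: list[int]) -> None:
...         self._values = values
...     def __getitem__(self, index: int) -> int:
...         return self._values[index]
...     def __len__(self) -> int:
...         return len(self._values)
...
>>> dataset = ListDataset([3, 1, 4, 1, 5])
>>> dataset[2], len(dataset)
(4, 5)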
- class dlup.data.dataset.RegionFromWsiDatasetSample[source]#
Bases: dict
- annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
- coordinates: tuple[int | float, int | float]#
- grid_index: int#
- grid_local_coordinates: tuple[int, int]#
- image: PIL.Image.Image#
- labels: dict[str, Any] | None#
- mpp: float#
- path: Union[str, pathlib.Path]#
- region_index: int#
- class dlup.data.dataset.TileSample[source]#
Bases: TypedDict
- annotations: Optional[Iterable[dlup.annotations.Point | dlup.annotations.Polygon]]#
- coordinates: tuple[int | float, int | float]#
- image: PIL.Image.Image#
- labels: dict[str, Any] | None#
- mpp: float#
- path: Union[str, pathlib.Path]#
- region_index: int#
- class dlup.data.dataset.TileSampleWithAnnotationData[source]#
Bases: TypedDict
- annotation_data: dlup.data.dataset.AnnotationData#
- class dlup.data.dataset.TiledROIsSlideImageDataset(*args: Any, **kwargs: Any)[source]#
Bases: dlup.data.dataset.TiledWsiDataset
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
- class dlup.data.dataset.TiledWsiDataset(path: Union[str, pathlib.Path], grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]], crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, output_tile_size: Optional[tuple[int, int]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.RegionFromWsiDatasetSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, **kwargs: Any)[source]#
Bases: dlup.data.dataset.BaseWsiDataset
Example dataset class that supports multiple ROIs.
This dataset can be used, for example, to tile your WSI on-the-fly using the from_standard_tiling function.
Examples
>>> dlup_dataset = TiledWsiDataset.from_standard_tiling(
...     path='/path/to/TCGA-WSI.svs',
...     mpp=0.5,
...     tile_size=(512, 512),
...     tile_overlap=(0, 0),
...     tile_mode='skip',
...     crop=True,
...     mask=None,
...     mask_threshold=0.5,
...     annotations=None,
...     labels=[("msi", True)],
...     transform=YourTransform(),
... )
>>> sample = dlup_dataset[5]
>>> image = sample["image"]
- Parameters
- path : PathLike
Path to the image.
- regions : collections.abc.Sequence[tuple[float, float, int, int, float]]
Sequence of rectangular regions as (x, y, h, w, mpp).
- crop : bool
Whether to crop overflowing tiles.
- mask : np.ndarray or SlideImage or WsiAnnotations
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- annotations : WsiAnnotations
Annotation classes.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform
Transforming function. To be used for augmentations or other model-specific preprocessing.
- backend : ImageBackend or AbstractSlideBackend
Backend to pass to SlideImage.
- apply_color_profile : bool
Whether to apply the ICC profile to the image if available.
- **kwargs : Any
Passed to SlideImage.
- classmethod from_standard_tiling(path: pathlib.Path, mpp: float | None, tile_size: tuple[int, int], tile_overlap: tuple[int, int], output_tile_size: Optional[tuple[int, int]] = None, tile_mode: dlup.tiling.TilingMode = TilingMode.overflow, grid_order: dlup.tiling.GridOrder = GridOrder.C, crop: bool = False, mask: Optional[Union[dlup._image.SlideImage, numpy.ndarray[Any, numpy.dtype[numpy.int64]], dlup.annotations.WsiAnnotations]] = None, mask_threshold: float | None = 0.0, rois: Optional[list[tuple[tuple[int, int], tuple[int, int]]]] = None, annotations: Optional[Union[list[tuple[str, Union[dlup._image.SlideImage, dlup.annotations.WsiAnnotations]]], dlup._image.SlideImage, dlup.annotations.WsiAnnotations]] = None, labels: Optional[list[tuple[str, Union[str, bool, int, float]]]] = None, transform: Optional[Callable[[dlup.data.dataset.TileSample], dlup.data.dataset.RegionFromWsiDatasetSample]] = None, backend: Union[dlup.experimental_backends.ImageBackend, Type[dlup.backends.common.AbstractSlideBackend]] = ImageBackend.OPENSLIDE, limit_bounds: bool = True, **kwargs: Any) dlup.data.dataset.TiledWsiDataset [source]#
Function to be used to tile a WSI on-the-fly.
- Parameters
- path :
Path to a single WSI.
- mpp :
The microns per pixel (mpp) at which the tiles will be extracted.
- tile_size :
Tuple of integers that represents the pixel size of the output tiles.
- tile_overlap :
Tuple of integers that represents the overlap of tiles in the x and y directions.
- output_tile_size : tuple[int, int], optional
If set, this value will be used as the tile size of the output tiles. If it differs from the underlying grid, the tile will be extracted around the center of the region.
- tile_mode :
"skip" or "overflow"; see dlup.tiling.TilingMode for more information.
- grid_order : GridOrder
Run through the grid either in C order or Fortran order.
- crop : bool
Whether overflowing tiles should be cropped.
- mask :
Binary mask used to filter each region together with a threshold.
- mask_threshold : float, optional
Threshold to check against. The foreground percentage should be strictly larger than the threshold. If None, everything is foreground. If 1, the region must be completely foreground. Other values are in between; for instance, if 0.5, the region must be at least 50% foreground.
- rois :
Regions of interest to restrict the grids to. Coordinates should be given at level 0.
- annotations :
Annotation class.
- labels : list
Image-level labels. Will be added to each individual tile.
- transform : Callable
Transform to be applied to the sample.
- backend : ImageBackend
Backend to use to read the whole slide image.
- limit_bounds : bool
Whether the bounds of the grid should be limited to the bounds of the slide, as given by the slide_bounds property of the SlideImage class. If ROIs are given, this parameter is ignored.
- **kwargs :
Gets passed to the SlideImage constructor.
- Returns
- TiledWsiDataset
Initialized TiledWsiDataset with all the regions as computed using the given tile size, mpp, and so on. Calling this dataset with an index will return a tile extracted straight from the WSI, which means that tiling as a pre-processing step is not required.
Examples
See the usage example in the main class docstring.
- property grids: list[tuple[dlup.tiling.Grid, tuple[int, int], float]]#
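A minimal sketch of inspecting the grids backing a dataset built with from_standard_tiling; the slide path and tiling parameters below are placeholders:
>>> import pathlib
>>> from dlup.data.dataset import TiledWsiDataset
>>> dataset = TiledWsiDataset.from_standard_tiling(
...     pathlib.Path("/path/to/slide.svs"), mpp=0.5, tile_size=(512, 512), tile_overlap=(0, 0)
... )
>>> len(dataset)  # number of tiles across all grids
>>> for grid, tile_size, mpp in dataset.grids:
...     print(tile_size, mpp)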
dlup.data.transforms module#
- class dlup.data.transforms.ContainsPolygonToLabel(*, roi_name: str | None, label: str, threshold: float)[source]#
Bases: object
Transform which converts annotations into a sample-level label indicating whether the label is present above a threshold.
The area of the label within the ROI (if given) is computed first. If the proportion of this label in the image is above the threshold, ["labels"]["has <label>"] is set to True, otherwise to False.
- Parameters
- roi_name : str
Name of the ROI key.
- label : str
Which label to test.
- threshold : float
Threshold as a number between 0 and 1 that denotes when the label should be considered present.
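A minimal sketch of applying this transform to a dataset sample; the label name and threshold are placeholders, and sample is assumed to come from a dlup dataset with annotations attached:
>>> from dlup.data.transforms import ContainsPolygonToLabel
>>> transform = ContainsPolygonToLabel(roi_name=None, label="tumor", threshold=0.1)
>>> sample = transform(sample)
>>> sample["labels"]["has tumor"]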
- class dlup.data.transforms.ConvertAnnotationsToMask(*, roi_name: str | None, index_map: dict[str, int], default_value: int = 0)[source]#
Bases: object
Transform which converts polygons to masks. Will overwrite the annotations key.
Converts annotations given by dlup.annotations.Polygon or dlup.annotations.Point to a mask and a dictionary of points. The mask is initialized with default_value (i.e., background). The values in the mask are subsequently determined by index_map, where each value is written to the mask according to this map, in the order of the elements in the annotations. This means that if you have overlapping polygons, the last polygon will overwrite the previous one. The sorting can be handled in the dlup.annotations.WsiAnnotations class.
In case no annotations are present (i.e., the "annotations" key is None), a ValueError is raised.
- Parameters
- roi_name : str, optional
Name of the ROI key.
- index_map : dict
Dictionary mapping the label to the integer in the output.
- default_value : int
The mask will be initialized with this value.
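A minimal sketch; the label names are placeholders, sample is assumed to come from a dlup dataset with annotations, and the converted output is expected under the annotation_data key (see TileSampleWithAnnotationData):
>>> from dlup.data.transforms import ConvertAnnotationsToMask
>>> transform = ConvertAnnotationsToMask(
...     roi_name=None, index_map={"stroma": 1, "tumor": 2}, default_value=0
... )
>>> sample = transform(sample)
>>> mask = sample["annotation_data"]["mask"]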
- class dlup.data.transforms.MajorityClassToLabel(*, roi_name: str | None, index_map: dict[str, int])[source]#
Bases: object
Transform which converts the majority class in the annotations to a label.
The function works as follows:
- The total area for each label in the sample is computed; the label with the maximum area is determined.
- The total area not covered by the ROI is computed.
- If the area not covered by the ROI is larger than the area of the label with the maximum area, the image is masked on the ROI.
- The label is added to the output dictionary in ["labels"]["majority_label"].
- Parameters
- roi_name : str
Name of the ROI key.
- index_map : dict
Dictionary mapping the label to the integer in the output.
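A minimal sketch; the label names are placeholders and sample is assumed to come from a dlup dataset with annotations:
>>> from dlup.data.transforms import MajorityClassToLabel
>>> transform = MajorityClassToLabel(roi_name=None, index_map={"stroma": 1, "tumor": 2})
>>> sample = transform(sample)
>>> sample["labels"]["majority_label"]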
- class dlup.data.transforms.RenameLabels(remap_labels: dict[str, str])[source]#
Bases: object
Remap the label names.
- Parameters
- remap_labels : dict
Dictionary mapping the old name to the new name.
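A minimal sketch; the old and new label names are placeholders and sample is assumed to come from a dlup dataset with annotations:
>>> from dlup.data.transforms import RenameLabels
>>> transform = RenameLabels(remap_labels={"tumour": "tumor"})
>>> sample = transform(sample)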
- dlup.data.transforms.convert_annotations(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], region_size: tuple[int, int], index_map: dict[str, int], roi_name: Optional[str] = None, default_value: int = 0) tuple[dict[str, list[tuple[float, float]]], dict[str, list[tuple[tuple[int, int], tuple[int, int]]]], numpy.ndarray[Any, numpy.dtype[numpy.int64]], numpy.ndarray[Any, numpy.dtype[numpy.int64]] | None] [source]#
Convert the polygon and point annotations as output of a dlup dataset class, where:
- In case of points, the output is a dictionary mapping the annotation name to a list of locations.
- In case of bounding boxes, the output is a dictionary mapping the annotation name to a list of bounding boxes.
Note that the internal representation of a bounding box is a polygon (AnnotationType is AnnotationType.BOX), so the bounding box of that polygon is computed to convert.
In case of polygons these are converted into a mask according to index_map.
BE AWARE: the polygon annotations are processed sequentially and later annotations can overwrite earlier ones. This is for instance useful when you would annotate “tumor associated stroma” on top of “stroma”. The dlup Annotation classes return the polygons with area from large to small.
When the polygon has holes, the previously written annotation is used to fill the holes.
BE AWARE: This function will silently ignore annotations which are written out of bounds.
- Parameters
- annotations : Iterable[_AnnotationsTypes]
The annotations as a list, e.g., as output from dlup.annotations.WsiAnnotations.read_region().
- region_size : tuple[int, int]
Size of the region, used as the shape of the output mask.
- index_map : dict[str, int]
Map mapping the annotation name to an index number in the output.
- roi_name : str
Name of the region-of-interest key.
- default_value : int
The mask will be initialized with this value.
- Returns
- dict, dict, np.ndarray, np.ndarray or None
Dictionary of points, dictionary of boxes, the mask, and the roi_mask (or None).
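A minimal sketch of calling the function directly; annotations is assumed to come from, e.g., dlup.annotations.WsiAnnotations.read_region(), and the label name and region size are placeholders:
>>> from dlup.data.transforms import convert_annotations
>>> points, boxes, mask, roi_mask = convert_annotations(
...     annotations, region_size=(512, 512), index_map={"tumor": 1}, roi_name=None
... )
>>> mask.shape
(512, 512)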
- dlup.data.transforms.rename_labels(annotations: Iterable[dlup.annotations.Point | dlup.annotations.Polygon], remap_labels: dict[str, str]) list[dlup.annotations.Point | dlup.annotations.Polygon] [source]#
Rename the labels in the annotations.
- Parameters
- annotations : Iterable[_AnnotationsTypes]
The annotations.
- remap_labels : dict[str, str]
The renaming table.
- Returns
- list[_AnnotationsTypes]
The annotations with the labels renamed.
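A minimal sketch; the label names are placeholders and annotations is assumed to be an iterable of dlup annotation objects:
>>> from dlup.data.transforms import rename_labels
>>> renamed = rename_labels(annotations, remap_labels={"tumour": "tumor"})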