mnist_dataset
HDF5-backed MNIST Dataset for PyTorch.
This module implements a lightweight dataset that lazily opens an HDF5 file and returns (image_tensor, label) pairs. If no transform is provided, images are returned as float tensors scaled to [0,1].
Classes
Module Contents
- class mnist_dataset.MnistH5Dataset(h5_path: str | pathlib.Path, transform: Callable | None = None, target_transform: Callable | None = None)
Bases: torch.utils.data.Dataset
- h5_path
- transform = None
- target_transform = None
- _h5 = None
- _ensure_open()
- __len__()
Return the number of samples in the dataset.
This reads the labels dataset’s first dimension to determine the length and therefore triggers lazy opening of the HDF5 file.
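The lazy-open behaviour described for `__len__` could look roughly like the sketch below. It is not the module's actual source: the class name `LazyH5Length` is hypothetical, and opening the file in read-only mode is an assumption. Only the `labels` dataset name, the `_h5` attribute and `_ensure_open()` come from the reference above.

```python
import os
import tempfile

import h5py
import numpy as np


class LazyH5Length:
    """Hypothetical stand-in showing only the lazy-open and length logic."""

    def __init__(self, h5_path):
        self.h5_path = str(h5_path)
        self._h5 = None  # no file handle until first access

    def _ensure_open(self):
        # Open the file on first use (read-only is an assumption) and reuse it.
        if self._h5 is None:
            self._h5 = h5py.File(self.h5_path, "r")
        return self._h5

    def __len__(self):
        # The first dimension of "labels" is the sample count.
        return self._ensure_open()["labels"].shape[0]


# Build a tiny throwaway file so the sketch runs end to end.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("labels", data=np.arange(5, dtype=np.int64))

ds = LazyH5Length(demo_path)
opened_before = ds._h5 is not None  # nothing has been opened yet
n = len(ds)                         # this call triggers the open
opened_after = ds._h5 is not None
ds._h5.close()
```

Constructing the object is cheap in this sketch because the file is only touched when a length or a sample is first requested.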
- __getitem__(idx)
Return the (image, label) pair for the given index.
Parameters
- idx : int
Index of the requested sample.
Returns
- (torch.Tensor, int)
A tuple of (image_tensor, label). The image tensor is a float tensor scaled to [0,1] with shape (C, H, W). Label is an int and may have been transformed by target_transform if provided.
Raises
- IndexError
If the index is out of bounds for the dataset.
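As an illustration of the `__getitem__` contract, the sketch below applies the documented default conversion to in-memory arrays. The helper `load_sample` is hypothetical, and several details are assumptions rather than documented behaviour: images are stored as uint8 with shape (H, W), negative indices are rejected, and `transform` receives the raw array.

```python
import numpy as np
import torch


def load_sample(images, labels, idx, transform=None, target_transform=None):
    """Hypothetical helper mirroring the documented __getitem__ contract."""
    n = len(labels)
    # Assumption: only 0 <= idx < n is valid; negative indices are rejected.
    if not 0 <= idx < n:
        raise IndexError(f"index {idx} out of range for dataset of size {n}")

    image = np.asarray(images[idx])  # assumed (H, W) uint8
    label = int(labels[idx])

    if transform is not None:
        image = transform(image)
    else:
        # Default path: add a channel axis and scale 0..255 to 0..1.
        image = torch.from_numpy(image).unsqueeze(0).float().div(255.0)

    if target_transform is not None:
        label = target_transform(label)
    return image, label


# NumPy arrays stand in for the HDF5 datasets here.
images = np.zeros((3, 28, 28), dtype=np.uint8)
images[1] = 255
labels = np.array([7, 2, 9])

img, lbl = load_sample(images, labels, 1)

try:
    load_sample(images, labels, 3)
    raised = False
except IndexError:
    raised = True
```

With no transform supplied, `img` is a float32 tensor of shape (1, 28, 28) and `lbl` is a plain Python int.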
- close()
Close the underlying HDF5 file handle if open.
This is safe to call multiple times. It is also invoked from __del__ to ensure resources are released when the dataset is garbage-collected.
- __del__()
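One way to get the idempotent `close()` and the `__del__` fallback described above is to reset the handle to `None` after closing. The sketch below is an assumption about how this might be done, not the module's source. `ClosableHandle` is a hypothetical name, and it opens the file eagerly only to keep the example short.

```python
import os
import tempfile

import h5py


class ClosableHandle:
    """Hypothetical stand-in showing only the close/__del__ logic."""

    def __init__(self, h5_path):
        self._h5 = h5py.File(h5_path, "r")

    def close(self):
        # Resetting the attribute to None makes repeated calls a no-op.
        if getattr(self, "_h5", None) is not None:
            self._h5.close()
            self._h5 = None

    def __del__(self):
        # Best-effort cleanup; errors during interpreter shutdown are ignored.
        try:
            self.close()
        except Exception:
            pass


# Build a tiny throwaway file so the sketch runs end to end.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("labels", data=[0, 1, 2])

handle = ClosableHandle(demo_path)
handle.close()
handle.close()  # second call does nothing
closed = handle._h5 is None
```

Calling `close()` explicitly remains the more predictable option, since the timing of `__del__` depends on when the object is garbage-collected.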