foundry.foundry_dataset
FoundryDataset
Representation of an individual dataset. Provides access to metadata as well as functions to instantiate data into memory in different formats.
Args:
dataset_name
(str): name of dataset (equivalent to source_id in MDF)
splits
(List[FoundrySplit]): list of splits in the dataset
use_globus
(bool): if True, use Globus to download the data; otherwise try HTTPS
interval
(int): How long to wait between checks of the Globus transfer status
parallel_https
(int): Number of files to download in parallel if using HTTPS
verbose
(bool): Produce more debug messages to screen
Desired functions:
- Get as pandas
- Get as tensorflow dataset
- Get as pytorch dataset
- Get file list
- Set metadata
- Attach datafiles
- Validate against schema
- Get citation
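The `parallel_https` argument caps how many files are downloaded at once. The library's internal download code is not shown in this reference, so the following is only a minimal sketch of that idea using a thread pool. The names `fetch_all`, `fetch_one`, and `fake_fetch`, the example URLs, and the default of 4 workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(urls: List[str],
              fetch_one: Callable[[str], bytes],
              parallel_https: int = 4) -> Dict[str, bytes]:
    """Fetch every URL using at most `parallel_https` worker threads.

    `fetch_one` is a hypothetical callable that downloads a single file.
    """
    with ThreadPoolExecutor(max_workers=parallel_https) as pool:
        # pool.map preserves input order, so each URL pairs with its payload
        payloads = list(pool.map(fetch_one, urls))
    return dict(zip(urls, payloads))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTPS request so the sketch runs offline."""
    return url.encode("utf-8")


results = fetch_all(
    ["https://example.org/a.json", "https://example.org/b.json"],
    fake_fetch,
    parallel_https=2,
)
```

A thread pool suits this case because file downloads are I/O-bound; the worker count bounds the number of simultaneous connections.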
__init__
clear_dataset_cache
Deletes the cached data for this specific dataset
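This reference does not describe how the cache is laid out on disk. As a rough sketch only, assuming each dataset is cached in its own directory named after `dataset_name` under a common cache root, clearing one dataset without touching the others could look like this. The function `clear_cache_for` and the directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache_for(cache_root: Path, dataset_name: str) -> bool:
    """Delete <cache_root>/<dataset_name> if it exists.

    Returns True if a directory was removed, False if nothing was cached.
    The one-directory-per-dataset layout is an assumption.
    """
    target = cache_root / dataset_name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demonstration in a throwaway directory
cache_root = Path(tempfile.mkdtemp())
(cache_root / "dataset_a").mkdir()
(cache_root / "dataset_b").mkdir()

removed = clear_cache_for(cache_root, "dataset_a")
remaining = sorted(p.name for p in cache_root.iterdir())
```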
get_as_dict
Returns the data from the dataset as a dictionary
Arguments:
split
(string): Split to create dataset on.
Default: None
Returns: (dict) Dictionary of all the data from the specified split
get_as_tensorflow
Converts the Foundry dataset to a TensorFlow Sequence
Arguments:
split
(string): Split to create Tensorflow Sequence on.
Default: None
Returns: (TensorflowSequence) Tensorflow Sequence of all the data from the specified split
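The internals of `TensorflowSequence` are not documented here. The sketch below assumes it follows the Keras `Sequence` convention, in which `__len__` returns the number of batches and `__getitem__` returns one batch. `BatchSequence` is a hypothetical stand-in written without importing TensorFlow, and the sample data is made up.

```python
import math
from typing import List, Tuple


class BatchSequence:
    """Hypothetical stand-in that is indexable by batch, Keras-Sequence style."""

    def __init__(self, inputs: List, targets: List, batch_size: int = 2) -> None:
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.inputs = inputs
        self.targets = targets
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch
        return math.ceil(len(self.inputs) / self.batch_size)

    def __getitem__(self, index: int) -> Tuple[List, List]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        stop = start + self.batch_size
        return self.inputs[start:stop], self.targets[start:stop]


sequence = BatchSequence([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2)
last_inputs, last_targets = sequence[len(sequence) - 1]
```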
get_as_torch
Returns the data from the dataset as a TorchDataset
Arguments:
split
(string): Split to create PyTorch Dataset on.
Default: None
Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split
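A PyTorch map-style dataset is any object that implements `__len__` and `__getitem__`. How `TorchDataset` stores its data is not specified in this reference, so the sketch below is only an illustration of that protocol. It assumes column-oriented data (a dict of equal-length lists) and uses a hypothetical `ColumnDataset` class with made-up column names, without importing torch.

```python
from typing import Dict, List, Sequence, Tuple


class ColumnDataset:
    """Hypothetical map-style dataset over column-oriented data."""

    def __init__(self, columns: Dict[str, List],
                 input_keys: Sequence[str],
                 target_keys: Sequence[str]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = columns
        self.input_keys = list(input_keys)
        self.target_keys = list(target_keys)
        self._n_rows = lengths.pop() if lengths else 0

    def __len__(self) -> int:
        # Number of rows (samples), not number of columns
        return self._n_rows

    def __getitem__(self, index: int) -> Tuple[tuple, tuple]:
        if not 0 <= index < self._n_rows:
            raise IndexError(index)
        inputs = tuple(self.columns[key][index] for key in self.input_keys)
        targets = tuple(self.columns[key][index] for key in self.target_keys)
        return inputs, targets


data = {"x1": [0.1, 0.2, 0.3], "x2": [1, 2, 3], "y": [10.0, 20.0, 30.0]}
dataset = ColumnDataset(data, input_keys=["x1", "x2"], target_keys=["y"])
first_inputs, first_targets = dataset[0]
```

An object with this shape can be passed to a `torch.utils.data.DataLoader`, which handles batching and shuffling on top of the per-item access.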
get_citation
validate_metadata
Validate the JSON message against the FoundryDataset model
Arguments:
metadata
(dict): Metadata information provided by the user.
Raises:
ValidationError
: if metadata supplied by user does not meet the specification of a FoundryDataset object.
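The validate-then-raise pattern described above can be sketched with the standard library alone. This is not the library's implementation: the `ValidationError` class below is a local stand-in, `check_metadata` is a hypothetical name, and the required fields are assumptions chosen only to make the example concrete.

```python
from typing import Any, Dict


class ValidationError(ValueError):
    """Local stand-in for the ValidationError named in this reference."""


# Hypothetical schema: field name -> expected Python type
REQUIRED_FIELDS = {"dataset_name": str, "splits": list}


def check_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValidationError listing every missing or wrongly typed field."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected_type):
            problems.append(f"{field} must be of type {expected_type.__name__}")
    if problems:
        raise ValidationError("; ".join(problems))


# Valid metadata passes silently
check_metadata({"dataset_name": "example_dataset", "splits": []})

# Invalid metadata raises, and the message names each problem
try:
    check_metadata({"dataset_name": 42})
    error_message = ""
except ValidationError as exc:
    error_message = str(exc)
```

Collecting all problems before raising lets the user fix the metadata in one pass instead of one error at a time.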
This file was automatically generated via lazydocs.