module foundry.foundry_dataset
class FoundryDataset
Representation of an individual dataset. Provides access to metadata as well as functions to instantiate data into memory in different formats.
Args:
dataset_name (str): name of the dataset (equivalent to source_id in MDF)
splits (List[FoundrySplit]): list of splits in the dataset
use_globus (bool): if True, use Globus to download the data; otherwise try HTTPS
interval (int): how long to wait between checks of the Globus transfer status
parallel_https (int): number of files to download in parallel when using HTTPS
verbose (bool): print more debug messages to the screen
Desired functions:
- Get as pandas
- Get as tensorflow dataset
- Get as pytorch dataset
- Get file list
- Set metadata
- Attach datafiles
- Validate against schema
- Get citation
method __init__
__init__(
dataset_name: str,
datacite_entry: dict,
transfer_client: Any,
foundry_schema: FoundrySchema,
use_globus: bool = False,
interval: int = 10,
parallel_https: int = 4,
verbose: bool = False,
forge_client: Forge = None,
local_cache_dir: str = None
)
method clear_dataset_cache
clear_dataset_cache()
Deletes the cached data for this specific dataset.
method get_as_dict
get_as_dict(split: str = None, as_hdf5: bool = False)
Returns the data from the dataset as a dictionary.
Arguments:
split (string): Split to create dataset on. Default: None
Returns: (dict) Dictionary of all the data from the specified split
method get_as_tensorflow
get_as_tensorflow(split: str = None)
Converts the Foundry dataset to a TensorFlow Sequence.
Arguments:
split (string): Split to create the TensorFlow Sequence on. Default: None
Returns: (TensorflowSequence) TensorFlow Sequence of all the data from the specified split
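The returned object can presumably be consumed batch by batch. The sketch below assumes it follows the Keras Sequence protocol, that is, len() gives the number of batches and integer indexing returns one batch; this is an assumption, not something the text above confirms. The helper iter_batches is hypothetical, and a plain list stands in for the sequence so the example runs without TensorFlow installed.

```python
def iter_batches(sequence):
    """Yield each batch of an indexable, sized sequence in order."""
    for index in range(len(sequence)):
        yield sequence[index]


# Stand-in for dataset.get_as_tensorflow(split=...): two made-up
# (inputs, targets) batches held in a list.
fake_sequence = [([0.1, 0.2], [0]), ([0.3, 0.4], [1])]
collected = list(iter_batches(fake_sequence))
```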
method get_as_torch
get_as_torch(split: str = None)
Returns the data from the dataset as a TorchDataset.
Arguments:
split (string): Split to create PyTorch Dataset on. Default: None
Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split
method get_citation
get_citation() → str
method validate_metadata
validate_metadata(metadata)
Validate the JSON message against the FoundryDataset model.
Arguments:
metadata(dict): Metadata information provided by the user.
Raises:
ValidationError: if the metadata supplied by the user does not meet the specification of a FoundryDataset object.
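A caller may prefer a message over an exception when checking metadata. The sketch below wraps validate_metadata accordingly. The text above does not say which package defines ValidationError, so the exception class is passed in as a parameter. The helper metadata_problem, the local ValidationError class, the stub, and its "title" rule are all hypothetical.

```python
def metadata_problem(dataset, metadata, error_type):
    """Return None if `metadata` validates, else the error message.

    `error_type` is the exception class raised on invalid metadata;
    it is a parameter because its defining package is not stated.
    """
    try:
        dataset.validate_metadata(metadata)
    except error_type as exc:
        return str(exc)
    return None


class ValidationError(Exception):
    """Hypothetical stand-in for the real ValidationError class."""


class _StubDataset:
    """Hypothetical stand-in for a FoundryDataset (not part of foundry)."""

    def validate_metadata(self, metadata):
        # Made-up rule for illustration; the real schema differs.
        if "title" not in metadata:
            raise ValidationError("missing required field: title")


ok = metadata_problem(_StubDataset(), {"title": "demo"}, ValidationError)
bad = metadata_problem(_StubDataset(), {}, ValidationError)
```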
This file was automatically generated via lazydocs.