module foundry.foundry_dataset
class FoundryDataset
Representation of an individual dataset. Provides access to metadata as well as functions to instantiate data into memory in different formats.
Args:
dataset_name (str): name of the dataset (equivalent to source_id in MDF)
splits (List[FoundrySplit]): list of splits in the dataset
use_globus (bool): if True, use Globus to download the data; otherwise, try HTTPS
interval (int): time to wait between checks of the Globus transfer status
parallel_https (int): number of files to download in parallel when using HTTPS
verbose (bool): if True, print additional debug messages to the screen
Desired functions:
- Get as pandas
- Get as TensorFlow dataset
- Get as PyTorch dataset
- Get file list
- Set metadata
- Attach data files
- Validate against schema
- Get citation
method __init__
__init__(
dataset_name: str,
datacite_entry: dict,
transfer_client: Any,
foundry_schema: FoundrySchema,
use_globus: bool = False,
interval: int = 10,
parallel_https: int = 4,
verbose: bool = False,
forge_client: Forge = None,
local_cache_dir: str = None
)
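The interval and parallel_https arguments control how downloads are waited on and parallelized. The sketch below illustrates the two ideas with the standard library only; it is not Foundry's implementation. The helper names download_all and wait_for_transfer are hypothetical, fetch and get_status are caller-supplied callables, interval is assumed to be in seconds, and "ACTIVE" is assumed to be the status string a Globus transfer reports while still running.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(urls: Iterable[str], fetch: Callable[[str], bytes],
                 parallel_https: int = 4) -> Dict[str, bytes]:
    """Fetch every URL with at most `parallel_https` concurrent worker threads."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=parallel_https) as pool:
        # pool.map yields results in input order, so they zip back onto url_list
        results = list(pool.map(fetch, url_list))
    return dict(zip(url_list, results))


def wait_for_transfer(get_status: Callable[[], str], interval: float = 10) -> str:
    """Poll get_status() every `interval` seconds until it stops reporting ACTIVE."""
    status = get_status()
    while status == "ACTIVE":  # assumption: ACTIVE means "still transferring"
        time.sleep(interval)
        status = get_status()
    return status
```

For real HTTPS downloads, fetch could be something like lambda u: urllib.request.urlopen(u).read(). Threads suit this case because each download spends most of its time waiting on the network.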
method clear_dataset_cache
clear_dataset_cache()
Deletes the cached data for this specific dataset.
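The following is a minimal sketch of per-dataset cache clearing. It assumes a hypothetical cache layout in which each dataset lives in its own subdirectory, <local_cache_dir>/<dataset_name>. The standalone function shown here is illustrative and is not the method's actual code.

```python
import shutil
from pathlib import Path


def clear_dataset_cache(local_cache_dir: str, dataset_name: str) -> bool:
    """Remove <local_cache_dir>/<dataset_name>; return True if anything was deleted."""
    if not dataset_name:
        # An empty name would resolve to the cache root and wipe every dataset
        raise ValueError("dataset_name must be non-empty")
    # Hypothetical layout: one subdirectory per dataset under the cache root
    target = Path(local_cache_dir) / dataset_name
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True
```

Scoping the deletion to one subdirectory leaves the cached copies of other datasets untouched.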
method get_as_dict
get_as_dict(split: str = None, as_hdf5: bool = False)
Returns the data from the dataset as a dictionary
Arguments:
split (string): Split to return data from. Default: None
Returns: (dict) Dictionary of all the data from the specified split
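The following sketch shows how a split argument might select data from a per-split dictionary. It rests on two assumptions that this page does not confirm: the data is held as a dict keyed by split name, and split=None means "all splits". The helper select_split is hypothetical.

```python
from typing import Any, Dict, Optional


def select_split(data_by_split: Dict[str, Dict[str, Any]],
                 split: Optional[str] = None) -> Dict[str, Dict[str, Any]]:
    """Return {split_name: data} for one split, or for every split when split is None."""
    if split is None:
        # Assumption: no split requested means "return all splits"
        return dict(data_by_split)
    if split not in data_by_split:
        raise KeyError(f"unknown split {split!r}; available: {sorted(data_by_split)}")
    return {split: data_by_split[split]}
```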
method get_as_tensorflow
get_as_tensorflow(split: str = None)
Converts the Foundry dataset to a TensorFlow Sequence.
Arguments:
split (string): Split to create the TensorFlow Sequence from. Default: None
Returns: (TensorflowSequence) TensorFlow Sequence of all the data from the specified split
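A Keras-style Sequence is an indexable collection of batches. __len__ returns the number of batches, and __getitem__(i) returns batch i as an (inputs, targets) pair. The plain-Python sketch below mirrors that protocol so it runs without TensorFlow installed. BatchSequence is a hypothetical name, and the batching scheme is an assumption rather than TensorflowSequence's actual behavior. In real code, such a class would subclass tf.keras.utils.Sequence.

```python
import math
from typing import Sequence, Tuple


class BatchSequence:
    """Plain-Python stand-in for the keras.utils.Sequence protocol (no TensorFlow import)."""

    def __init__(self, inputs: Sequence, targets: Sequence, batch_size: int = 32):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.inputs = inputs
        self.targets = targets
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches per epoch; the final batch may be smaller
        return math.ceil(len(self.inputs) / self.batch_size)

    def __getitem__(self, idx: int) -> Tuple[list, list]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        stop = start + self.batch_size
        return list(self.inputs[start:stop]), list(self.targets[start:stop])
```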
method get_as_torch
get_as_torch(split: str = None)
Returns the data from the dataset as a TorchDataset
Arguments:
split (string): Split to create the PyTorch Dataset from. Default: None
Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split
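A map-style PyTorch dataset only has to answer two questions: how many items there are (__len__) and which item sits at a given index (__getitem__). A DataLoader then handles shuffling and batching. The sketch below follows that protocol in plain Python so it runs without PyTorch. MapStyleDataset is a hypothetical name, and the (input, target) item layout is an assumption rather than TorchDataset's documented behavior. In real code, such a class would subclass torch.utils.data.Dataset.

```python
from typing import Any, Sequence, Tuple


class MapStyleDataset:
    """Plain-Python stand-in for a map-style torch.utils.data.Dataset."""

    def __init__(self, inputs: Sequence[Any], targets: Sequence[Any]):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.inputs = inputs
        self.targets = targets

    def __len__(self) -> int:
        return len(self.inputs)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        # One (input, target) pair per index; batching is left to a DataLoader
        return self.inputs[idx], self.targets[idx]
```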
method get_citation
get_citation() → str
Returns the citation for this dataset as a string.
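The class is constructed with a datacite_entry dict, so a citation string could plausibly be assembled from it. The sketch below does that under stated assumptions. The helper format_citation is hypothetical. The keys it reads (creators[].creatorName, titles[].title, publisher, publicationYear, identifier.identifier) follow DataCite naming but are assumed, and the output style is illustrative, not the format get_citation actually returns.

```python
from typing import Any, Dict


def format_citation(datacite_entry: Dict[str, Any]) -> str:
    """Build a simple author-year citation string from a DataCite-style dict."""
    # Assumed keys: creators[].creatorName, titles[].title, publisher,
    # publicationYear, identifier.identifier (treated as a DOI)
    authors = "; ".join(c.get("creatorName", "") for c in datacite_entry.get("creators", []))
    title = (datacite_entry.get("titles") or [{}])[0].get("title", "")
    year = datacite_entry.get("publicationYear", "n.d.")
    publisher = datacite_entry.get("publisher", "")
    doi = datacite_entry.get("identifier", {}).get("identifier")
    parts = [f"{authors} ({year}). {title}."]
    if publisher:
        parts.append(f"{publisher}.")
    if doi:
        parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)
```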
method validate_metadata
validate_metadata(metadata)
Validate the JSON message against the FoundryDataset model
Arguments:
metadata (dict): Metadata provided by the user.
Raises:
ValidationError: if the metadata supplied by the user does not meet the specification of a FoundryDataset object.
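The following is a minimal sketch of schema-style validation that raises on nonconforming metadata. It does not reproduce the real FoundryDataset model. The required fields in REQUIRED_FIELDS are hypothetical, and ValidationError is defined locally as a stand-in for the exception this method documents. Collecting every problem before raising lets the user fix all of them in one pass.

```python
from typing import Any, Dict


class ValidationError(ValueError):
    """Local stand-in for the ValidationError raised by validate_metadata."""


# Hypothetical schema: field name -> expected Python type
REQUIRED_FIELDS = {"keys": list, "splits": list, "data_type": str}


def validate_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValidationError listing every missing or mistyped field."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in metadata:
            errors.append(f"missing field {field!r}")
        elif not isinstance(metadata[field], expected):
            errors.append(
                f"{field!r} should be {expected.__name__}, "
                f"got {type(metadata[field]).__name__}"
            )
    if errors:
        raise ValidationError("; ".join(errors))
```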
This file was automatically generated via lazydocs.