foundry.foundry_cache
FoundryCache
The FoundryCache manages the local storage of FoundryDataset objects.
__init__
Initializes a FoundryCache object.
Args:
forge_client
(Forge): The Forge client object.
transfer_client
(Any): The transfer client object.
local_cache_dir
(str, optional): The local cache directory. Defaults to None. If not specified, defaults to either environmental variable ('FOUNDRY_LOCAL_CACHE_DIR') or './data/'.
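The fallback order described for local_cache_dir can be sketched as follows. This is a minimal illustration based only on the description above, not the library's own code: resolve_cache_dir is a hypothetical helper, and it assumes that an explicit argument takes precedence over the environment variable, which in turn takes precedence over './data/'.

```python
import os


def resolve_cache_dir(local_cache_dir=None):
    """Hypothetical helper: choose the cache directory in the documented fallback order."""
    if local_cache_dir is not None:
        return local_cache_dir
    # Assumed precedence: the environment variable first, then the './data/' default.
    return os.environ.get("FOUNDRY_LOCAL_CACHE_DIR", "./data/")
```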
clear_cache
Deletes all locally stored datasets, or a single dataset if dataset_name is given.
Arguments:
dataset_name
(str): Optional name of a specific dataset. If omitted, all datasets will be erased.
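The two modes of clear_cache (one dataset versus everything) might look like the sketch below. It rests on an assumption that is not stated in this reference: that each dataset is stored in a subdirectory of the cache named after the dataset. clear_cache_sketch is a hypothetical stand-in, not the method itself.

```python
import shutil
from pathlib import Path


def clear_cache_sketch(cache_dir, dataset_name=None):
    """Hypothetical helper: remove one dataset folder, or every folder in the cache.

    Returns the sorted names of the folders that were removed.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []  # nothing to clear
    # Assumption: each dataset lives in a subdirectory named after the dataset.
    if dataset_name is not None:
        targets = [root / dataset_name]
    else:
        targets = [p for p in root.iterdir() if p.is_dir()]
    removed = []
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target.name)
    return sorted(removed)
```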
download_to_cache
Checks if the data is downloaded, and if not, downloads the data from source to local storage.
Args:
dataset_name
(str): Name of the dataset (equivalent to source_id in MDF).
splits
(List[FoundrySplit], optional): List of splits in the dataset. Defaults to None.
use_globus
(bool, optional): If True, use Globus to download the data; otherwise, try HTTPS. Defaults to False.
interval
(int, optional): How long to wait between checks of the Globus transfer status. Defaults to 10.
parallel_https
(int, optional): Number of files to download in parallel if using HTTPS. Defaults to 4.
verbose
(bool, optional): Produce more debug messages to screen. Defaults to False.
transfer_client
(Any, optional): The transfer client object. Defaults to None.
Returns:
FoundryCache
: The FoundryCache object.
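The check-then-download behavior described for download_to_cache can be illustrated with a small sketch. The names ensure_downloaded, is_cached, and download are hypothetical; the two callables stand in for whatever validation and transfer steps the library actually performs.

```python
def ensure_downloaded(dataset_name, is_cached, download):
    """Hypothetical helper: call `download` only when `is_cached` reports a miss.

    Returns True if a download was triggered, False if the data was already present.
    """
    if is_cached(dataset_name):
        return False
    download(dataset_name)
    return True
```

Because the check runs first, calling the helper repeatedly for the same dataset triggers at most one download, provided the download makes the check succeed.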
download_via_globus
Downloads the selected dataset over Globus.
Args:
dataset_name
(str): Name of the dataset (equivalent to source_id in MDF).
interval
(int): How long to wait between checks of the Globus transfer status.
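The role of interval can be shown with a generic polling loop. This sketch makes no Globus SDK calls: is_done is a hypothetical callable that reports whether the transfer has finished, max_checks is an added safety limit, and treating interval as a number of seconds is an assumption.

```python
import time


def wait_for_transfer(is_done, interval=10, max_checks=None):
    """Hypothetical helper: poll `is_done()` until it returns True.

    Returns the number of status checks made. Raises TimeoutError if
    `max_checks` checks pass without the transfer finishing.
    """
    checks = 0
    while True:
        checks += 1
        if is_done():
            return checks
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError(f"transfer not finished after {checks} checks")
        time.sleep(interval)  # assumed unit: seconds
```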
download_via_http
Downloads the selected dataset from MDF over HTTP.
Args:
dataset_name
(str): Name of the dataset (equivalent to source_id in MDF).
parallel_https
(int): Number of threads to use for downloading.
verbose
(bool): Produce more debug messages to screen.
transfer_client
(Any): The transfer client object.
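To illustrate what parallel_https controls, here is a minimal thread-pool sketch. It is not the library's implementation: download_all is a hypothetical helper, and fetch is a caller-supplied function that retrieves one file, so no particular HTTP client is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, parallel_https=4):
    """Hypothetical helper: apply `fetch` to each URL using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=parallel_https) as pool:
        # pool.map yields results in input order; an exception raised in a
        # worker is re-raised here when its result is retrieved.
        return list(pool.map(fetch, urls))
```

A larger parallel_https value allows more files to be in flight at once; the order of the returned results does not depend on which download finishes first.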
get_keys
Gets the keys for a Foundry dataset.
Arguments:
foundry_schema
(FoundrySchema): The schema from MDF that contains the keys
type
(str): The type of key to return, e.g., "input" or "target".
as_object
(bool): When False, returns the keys as a list of strings. When True, returns the full key objects. Defaults to False.
Returns: (list) String representations of the keys if as_object is False; otherwise, the full key objects.
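The effect of type and as_object can be sketched with stand-in objects. KeySketch and get_keys_sketch are hypothetical: the real FoundrySchema layout is not shown in this reference, so the sketch assumes each key entry carries a list of names and a type label, and it takes the list of entries directly rather than a schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeySketch:
    """Hypothetical stand-in for one key entry in a schema."""
    key: List[str]
    type: str


def get_keys_sketch(keys, type=None, as_object=False):
    """Filter key entries by type; return names as strings unless as_object is True."""
    selected = [k for k in keys if type is None or k.type == type]
    if as_object:
        return selected
    # Flatten each entry's list of names into a single list of strings.
    return [name for k in selected for name in k.key]
```

For example, with one "input" entry holding two names and one "target" entry holding one name, asking for "input" returns the two input names as strings, while as_object=True returns the matching entries unchanged.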
load_as_dict
Load in the data associated with the prescribed dataset.
Args:
dataset_name
(str): Name of the dataset (equivalent to source_id in MDF).
foundry_schema
(FoundrySchema, optional): Schema element as obtained from MDF. Defaults to None.
use_globus
(bool, optional): If True, use Globus to download the data; otherwise, try HTTPS. Defaults to False.
interval
(int, optional): How long to wait between checks of the Globus transfer status. Defaults to 10.
parallel_https
(int, optional): Number of files to download in parallel if using HTTPS. Defaults to 4.
verbose
(bool, optional): Produce more debug messages to screen. Defaults to False.
transfer_client
(Any, optional): The transfer client object. Defaults to None.
as_hdf5
(bool, optional): If True and dataset is in HDF5 format, keep data in HDF5 format. Defaults to False.
Returns:
dict
: A labeled dictionary of tuples.
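The return value is described only as a labeled dictionary of tuples. One plausible reading, assumed here and not confirmed by this reference, is that each label is a split name and each tuple holds (inputs, targets). The data and the split_sizes helper below are hypothetical and only show how such a structure could be unpacked.

```python
# Hypothetical result with the assumed shape {split_label: (inputs, targets)}.
result = {
    "train": ([[0.1, 0.2], [0.3, 0.4]], [1.0, 2.0]),
    "test": ([[0.5, 0.6]], [3.0]),
}


def split_sizes(data):
    """Return the number of samples per split from a dict of (inputs, targets) tuples."""
    sizes = {}
    for label, (inputs, targets) in data.items():
        if len(inputs) != len(targets):
            raise ValueError(f"split {label!r} has mismatched inputs and targets")
        sizes[label] = len(inputs)
    return sizes
```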
load_as_tensorflow
Converts a Foundry dataset to a TensorFlow Sequence.
Arguments:
split
(str): Split to create the TensorFlow Sequence from. Defaults to None.
Returns: (TensorflowSequence) TensorFlow Sequence of all the data from the specified split.
load_as_torch
Converts a Foundry dataset to a PyTorch Dataset.
Arguments:
split
(str): Split to create the PyTorch Dataset from. Defaults to None.
Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split.
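A PyTorch map-style dataset is defined by its __len__ and __getitem__ methods. The plain-Python sketch below shows that protocol for one split without importing torch. SplitDatasetSketch is a hypothetical illustration of what a wrapper over a split's inputs and targets could look like; it is not the TorchDataset class this method returns.

```python
class SplitDatasetSketch:
    """Hypothetical map-style dataset over one split's inputs and targets."""

    def __init__(self, inputs, targets):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        # Number of samples in the split.
        return len(self.inputs)

    def __getitem__(self, idx):
        # One (input, target) pair, as a map-style dataset provides.
        return self.inputs[idx], self.targets[idx]
```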
validate_local_dataset_storage
Verifies that the local storage location exists and all expected files are present.
Args:
dataset_name
(str): Name of the dataset (equivalent to source_id in MDF).
splits
(List[FoundrySplit], optional): Labels of splits to be loaded. Defaults to None.
Returns:
bool
: True if the dataset exists and contains all the desired files; False otherwise.
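The two checks described above (the storage location exists, and every expected file is present) can be sketched as follows. validate_storage_sketch is a hypothetical helper. It assumes the dataset is stored in a folder named after it, and it takes the expected file names from the caller rather than deriving them from FoundrySplit objects.

```python
from pathlib import Path


def validate_storage_sketch(cache_dir, dataset_name, expected_files):
    """Hypothetical helper: True only if the dataset folder and all expected files exist."""
    # Assumption: the dataset lives in a subdirectory named after the dataset.
    root = Path(cache_dir) / dataset_name
    if not root.is_dir():
        return False
    return all((root / name).is_file() for name in expected_files)
```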
This file was automatically generated via lazydocs.