foundry.foundry_cache
module foundry.foundry_cache
foundry.foundry_cacheclass FoundryCache
FoundryCacheThe FoundryCache manages the local storage of FoundryDataset objects
method __init__
__init____init__(forge_client: Forge, transfer_client: Any, local_cache_dir: str = None)Initializes a FoundryCache object.
Args:
forge_client(Forge): The Forge client object.transfer_client(Any): The transfer client object.local_cache_dir(str, optional): The local cache directory. Defaults to None. If not specified, defaults to either environmental variable ('FOUNDRY_LOCAL_CACHE_DIR') or './data/'.
method clear_cache
clear_cacheclear_cache(dataset_name: str = None)Deletes all of the locally stored datasets
Arguments:
dataset_name(str): Optional name of a specific dataset. If omitted, all datsets will be erased
method download_to_cache
download_to_cachedownload_to_cache(
dataset_name: str,
splits: List[FoundrySplit] = None,
use_globus: bool = False,
interval: int = 10,
parallel_https: int = 4,
verbose: bool = False,
transfer_client=None
)Checks if the data is downloaded, and if not, downloads the data from source to local storage.
Args:
dataset_name(str): Name of the dataset (equivalent to source_id in MDF).splits(List[FoundrySplit], optional): List of splits in the dataset. Defaults to None.use_globus(bool, optional): If True, use Globus to download the data; otherwise, try HTTPS. Defaults to False.interval(int, optional): How often to wait before checking Globus transfer status. Defaults to 10.parallel_https(int, optional): Number of files to download in parallel if using HTTPS. Defaults to 4.verbose(bool, optional): Produce more debug messages to screen. Defaults to False.transfer_client(Any, optional): The transfer client object. Defaults to None.
Returns:
FoundryCache: The FoundryCache object.
method download_via_globus
download_via_globusdownload_via_globus(dataset_name: str, interval: int)Downloads selected dataset over Globus.
Args:
dataset_name(str): Name of the dataset (equivalent to source_id in MDF).interval(int): How often to wait before checking Globus transfer status.
method download_via_http
download_via_httpdownload_via_http(
dataset_name: str,
parallel_https: int,
verbose: bool,
transfer_client: Any
)Downloads selected dataset from MDF over HTTP.
Args:
dataset_name(str): Name of the dataset (equivalent to source_id in MDF).parallel_https(int): Number of threads to use for downloading.verbose(bool): Produce more debug messages to screen.transfer_client(Any): The transfer client object.
method get_keys
get_keysget_keys(
foundry_schema: FoundrySchema,
type: str = None,
as_object: bool = False
)Get keys for a Foundry dataset
Arguments:
foundry_schema(FoundrySchema): The schema from MDF that contains the keystype(str): The type of key to be returned e.g., "input", "target"as_object(bool): WhenFalse, will return a list of keys in as strings WhenTrue, will return the full key objects**Default: **FalseReturns: (list) String representations of keys or ifas_objectis False otherwise returns the full key objects.
method load_as_dict
load_as_dictload_as_dict(
split: str,
dataset_name: str,
foundry_schema: FoundrySchema,
use_globus: bool,
interval: int,
parallel_https: int,
verbose: bool,
transfer_client: Any,
as_hdf5: bool
)Load in the data associated with the prescribed dataset.
Args:
dataset_name(str): Name of the dataset (equivalent to source_id in MDF).foundry_schema(FoundrySchema, optional): Schema element as obtained from MDF. Defaults to None.use_globus(bool, optional): If True, use Globus to download the data; otherwise, try HTTPS. Defaults to False.interval(int, optional): How often to wait before checking Globus transfer status. Defaults to 10.parallel_https(int, optional): Number of files to download in parallel if using HTTPS. Defaults to 4.verbose(bool, optional): Produce more debug messages to screen. Defaults to False.transfer_client(Any, optional): The transfer client object. Defaults to None.as_hdf5(bool, optional): If True and dataset is in HDF5 format, keep data in HDF5 format. Defaults to False.
Returns:
dict: A labeled dictionary of tuples.
method load_as_tensorflow
load_as_tensorflowload_as_tensorflow(
split: str,
dataset_name: str,
foundry_schema: FoundrySchema,
use_globus: bool,
interval: int,
parallel_https: int,
verbose: bool,
transfer_client: Any,
as_hdf5: bool
)Convert Foundry Dataset to a Tensorflow Sequence
Arguments:
split(string): Split to create Tensorflow Sequence on.**Default: **None
Returns: (TensorflowSequence) Tensorflow Sequence of all the data from the specified split
method load_as_torch
load_as_torchload_as_torch(
split: str,
dataset_name: str,
foundry_schema: FoundrySchema,
use_globus: bool,
interval: int,
parallel_https: int,
verbose: bool,
transfer_client: Any,
as_hdf5: bool
)Convert Foundry Dataset to a PyTorch Dataset
Arguments:
split(string): Split to create PyTorch Dataset on.**Default: **None
Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split
method validate_local_dataset_storage
validate_local_dataset_storagevalidate_local_dataset_storage(
dataset_name: str,
splits: List[FoundrySplit] = None
)Verifies that the local storage location exists and all expected files are present.
Args:
dataset_name(str): Name of the dataset (equivalent to source_id in MDF).splits(List[FoundrySplit], optional): Labels of splits to be loaded. Defaults to None.
Returns:
bool: True if the dataset exists and contains all the desired files; False otherwise.
This file was automatically generated via lazydocs.