foundry package — Foundry_test 1.1 documentation - HTML AUTOGENERATION
class foundry.foundry.Foundry(no_browser=False, no_local_server=False, search_index='mdf-test', *, dc: Dict = {}, mdf: Dict = {}, dataset: = {}, config: = FoundryConfig(dataframe_file='foundry_dataframe.json', data_file='foundry.hdf5', metadata_file='foundry_metadata.json', destination_endpoint=None, local=False, metadata_key='foundry', organization='foundry', local_cache_dir='./data'), dlhub_client: Any = None, forge_client: Any = None, connect_client: Any = None, xtract_tokens: Any = None)
Bases:
Foundry Client Base Class

TODO: Add Docstring

build(spec, globus=False, interval=3, file=False)
Build a Foundry Data Package

Parameters
spec (dict or str) – The data package specification, either as a dict or as a relative filename
globus (bool) – If True, use Globus to fetch datasets
interval (int) – Polling interval, in seconds, for checking task status
type (str) – One of “file” or None
Returns
(Foundry) – self, for chaining

check_model_status(res)
Check the status of a model or function publication to DLHub

TODO: currently broken on the DLHub side

check_status(source_id, short=False, raw=False)
Check the status of your submission.

Parameters
source_id (str) – The source_id (source_name + version information) of the submission to check. Returned in the res result from publish() via MDF Connect Client.
short (bool) – When False, will print a status summary containing all of the status steps for the dataset. When True, will print a short finished/processing message, useful for checking the status of many datasets at once. Default: False
raw (bool) – When False, will print a nicely formatted status summary. When True, will return the full status result. For direct human consumption, False is recommended. Default: False
Returns
The full status result, if raw is True.
Return type
dict

collect_dataframes(packages=[])
Collect dataframes of local data packages

Parameters
packages (list) – List of packages to collect; defaults to all
Returns
(tuple) – Tuple of X (pandas.DataFrame), y (pandas.DataFrame)
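The short and raw flags of check_status lend themselves to two small helpers, sketched below. This is only an illustration: client is assumed to be an already constructed Foundry instance, the helper names are hypothetical, and the only call used is check_status with the keyword arguments documented above.

```python
from typing import Any, Dict, Iterable


def summarize_submissions(client: Any, source_ids: Iterable[str]) -> None:
    """Print a short finished/processing message for each submission."""
    for source_id in source_ids:
        # short=True asks for the one-line message instead of every status step.
        client.check_status(source_id, short=True)


def collect_raw_statuses(client: Any, source_ids: Iterable[str]) -> Dict[str, Any]:
    """Return the full status result for each submission, keyed by source_id."""
    # raw=True makes check_status return the status result instead of printing it.
    return {source_id: client.check_status(source_id, raw=True) for source_id in source_ids}
```

The first helper is meant for a quick look at many datasets; the second is for code that needs to inspect the returned status results itself.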
configure(**kwargs)
Set Foundry config

Keyword Arguments
file (str) – Path to the file containing (default: self.config.metadata_file)
dataframe_file (str) – Filename for the dataframe file. Default: “foundry_dataframe.json”
data_file (str) – Filename for the data file. Default: “foundry.hdf5”
destination_endpoint (str) – Globus endpoint UUID where Foundry data should move
local_cache_dir (str) – Where to place collected data. Default: “./data”
Returns
(Foundry) – self, for chaining
download(globus=True, verbose=False, **kwargs)
Download a Foundry dataset

Parameters
globus (bool) – If True, use Globus to download the data; otherwise try HTTPS
verbose (bool) – If True, print debug information during the download
Returns
(Foundry) – self, for chaining
get_keys(type, as_object=False)
Get keys for a Foundry dataset

Parameters
type (str) – The type of key to be returned, e.g., “input”, “target”
as_object (bool) – When False, will return a list of keys as strings. When True, will return the full key objects. Default: False
Returns
(list) – String representations of keys if as_object is False; otherwise the full key objects
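The difference between the two as_object modes can be illustrated with a small stand-alone sketch. It does not use the library: KeySketch is a hypothetical stand-in carrying only the key and type fields of foundry.models.FoundryKey, and the filtering and flattening logic is an assumption about how get_keys might behave, not its actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeySketch:
    """Hypothetical stand-in for FoundryKey, with only the `key` and `type` fields."""
    key: List[str] = field(default_factory=list)
    type: str = ""


def get_keys_sketch(keys: List[KeySketch], type: str, as_object: bool = False) -> list:
    """Select keys of the requested type, as objects or as plain strings."""
    matching = [k for k in keys if k.type == type]
    if as_object:
        return matching  # the full key objects
    # Assumed behaviour: flatten each object's `key` list into one list of names.
    return [name for k in matching for name in k.key]


keys = [KeySketch(key=["a", "b"], type="input"), KeySketch(key=["y"], type="target")]
input_names = get_keys_sketch(keys, "input")
target_objects = get_keys_sketch(keys, "target", as_object=True)
```

Here input_names is a flat list of strings, while target_objects holds the matching KeySketch instances.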
get_packages(paths=False)
Get available local data packages

Parameters
paths (bool) – If True, return paths in addition to package names; if False, return package names only
Returns
(list) – List describing local Foundry packages
list()
List available Foundry data packages

Returns
(pandas.DataFrame) – DataFrame with a summary list of Foundry data packages, including name, title, and publication year
load(name, download=True, globus=True, verbose=False, metadata=None, **kwargs)
Load the metadata for a Foundry dataset into the client

Parameters
name (str) – Name of the Foundry dataset
download (bool) – If True, download the data associated with the package (default is True)
globus (bool) – If True, download using Globus; otherwise use HTTPS
verbose (bool) – If True, print additional debug information
metadata (dict) – For debug purposes. A search result analog to prepopulate metadata.
Keyword Arguments
interval (int) – How often to poll Globus to check if transfers are complete
Returns
self
load_data(source_id=None, globus=True)
Load in the data associated with the prescribed dataset

Tabular Data Type: Data are arranged in a standard data frame stored in self.dataframe_file. The contents are read, and
File Data Type: <<Add desc>>

For more complicated data structures, users should subclass Foundry and override the load_data function.

Parameters
inputs (list) – List of strings for input columns
targets (list) – List of strings for output columns
Returns
(tuple) – Tuple of X, y values
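Since load is documented as returning self and load_data as returning a tuple of X, y values, the two calls can be chained. The helper below is a minimal sketch of that pattern: client is assumed to be a Foundry instance, load_xy and use_globus are hypothetical names, and the dataset name is whatever the caller supplies.

```python
from typing import Any, Tuple


def load_xy(client: Any, name: str, use_globus: bool = False) -> Tuple[Any, Any]:
    """Load a dataset's metadata and data, then return its (X, y) values."""
    # load() returns the client itself, so load_data() can be called on the result.
    loaded = client.load(name, download=True, globus=use_globus)
    X, y = loaded.load_data(globus=use_globus)
    return X, y
```

Passing use_globus=False requests the HTTPS download path described for the globus parameter.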
publish(foundry_metadata, data_source, title, authors, update=False, publication_year=None, **kwargs)
Submit a dataset for publication

Parameters
foundry_metadata (dict) – Dict of metadata describing the data package
data_source (string) – URL for the Globus endpoint
title (string) – Title of the data package
authors – List of data package author names, e.g., Jack Black or Nunez, Victoria
update (bool) – True if this is an update to a prior data package (default: False)
publication_year (int) – Year of dataset publication. If None, will be set to the current calendar year by MDF Connect Client. (default: $current_year)
Keyword Arguments
short_name (string) – Shortened/abbreviated name of the data package
publisher (string) – Data publishing entity (e.g., MDF, Zenodo)
Returns
(dict) MDF Connect Response – Response from MDF Connect to allow tracking of the dataset. Contains source_id, which can be used to check the status of the submission
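A submission is typically followed by a status check using the source_id from the response. The sketch below strings the two documented calls together. It assumes that client is a Foundry instance and that source_id is a top-level key of the returned dict; the response layout is not spelled out here, so treat that lookup as an assumption. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def publish_and_check(
    client: Any,
    foundry_metadata: Dict[str, Any],
    data_source: str,
    title: str,
    authors: List[str],
    **kwargs: Any,
) -> str:
    """Submit a dataset, print a short status message, and return its source_id."""
    res = client.publish(foundry_metadata, data_source, title, authors, **kwargs)
    # Assumption: the MDF Connect response exposes source_id at the top level.
    source_id = res["source_id"]
    client.check_status(source_id, short=True)
    return source_id
```

Extra keyword arguments such as short_name or publisher are passed through to publish unchanged.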
publish_model(options)
Submit a model or function for publication

Parameters
options – dict of all possible options

Options keys:
title (req)
authors (req)
short_name (req)
servable_type (req) (“static method”, “class method”, “keras”, “pytorch”, “tensorflow”, “sklearn”)
affiliations
domains
abstract
references
requirements (dict of library:version keypairs)
module (if Python method)
function (if Python method)
inputs (not needed for TF) (dict of options)
outputs (not needed for TF)
methods (e.g. research methods)
DOI
publication_year (advanced)
version (advanced)
visibility (dict of users and groups, each a list)
funding reference
rights
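Because four of the option keys are marked as required and servable_type has a fixed set of values, it can help to check an options dict before calling publish_model. The function below is a hypothetical pre-flight check built only from the key list above; it is not part of the library, and the library may validate differently.

```python
from typing import Any, Dict, List

# Keys marked "(req)" in the options list.
REQUIRED_KEYS = ("title", "authors", "short_name", "servable_type")
# Values listed for servable_type.
SERVABLE_TYPES = ("static method", "class method", "keras", "pytorch", "tensorflow", "sklearn")


def check_model_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a publish_model options dict."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in options]
    servable_type = options.get("servable_type")
    if servable_type is not None and servable_type not in SERVABLE_TYPES:
        problems.append(f"unknown servable_type: {servable_type!r}")
    return problems
```

An empty list means the required keys are present and servable_type is one of the listed values; the optional keys are not checked.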
run(name, inputs, **kwargs)
Run a model on data

Parameters
name (str) – DLHub model name
inputs – Data to send to DLHub as inputs (should be JSON serializable)
Returns
Results after invocation via the DLHub service
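Since inputs should be JSON serializable, one option is to check that locally before invoking the model. The wrapper below is a sketch under that assumption: client is taken to be a Foundry instance, run_checked is a hypothetical name, and the only library call used is run(name, inputs, **kwargs).

```python
import json
from typing import Any


def run_checked(client: Any, name: str, inputs: Any, **kwargs: Any) -> Any:
    """Verify that inputs can be JSON-encoded, then run the named DLHub model."""
    try:
        json.dumps(inputs)  # raises TypeError or ValueError if inputs cannot be encoded
    except (TypeError, ValueError) as exc:
        raise ValueError(f"inputs for {name!r} are not JSON serializable") from exc
    return client.run(name, inputs, **kwargs)
```

Objects such as NumPy arrays or sets would fail this check and would need converting to lists first.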
Bases: pydantic.main.BaseModel

Foundry Configuration

Configuration information for a Foundry Dataset

Parameters
dataframe_file (str) – Filename to read dataframe contents from
metadata_file (str) – Filename to read metadata contents from; defaults to reading from MDF Discover
destination_endpoint (str) – Globus endpoint ID to transfer data to (defaults to local GCP installation)
local_cache_dir (str) – Path to local Foundry package cache
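foundry.models.FoundryDataset – Bases: pydantic.main.BaseModel
foundry.models.FoundryDatasetType – Bases: enum.Enum
foundry.models.FoundrySpecification – Bases: pydantic.main.BaseModel
foundry.models.FoundrySpecificationDataset – Bases: pydantic.main.BaseModel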
Foundry members, with return values where documented:

collect_dataframes(packages=[]) – returns a tuple of X (pandas.DataFrame), y (pandas.DataFrame)
configure(**kwargs) – returns self, for chaining
connect_client: Any
describe_model()
dlhub_client: Any
download(globus=True, verbose=False, **kwargs) – returns self, for chaining
forge_client: Any
get_keys(type, as_object=False) – returns string representations of keys if as_object is False; otherwise returns the full key objects
get_packages(paths=False) – returns a list describing local Foundry packages
list() – returns a DataFrame with a summary list of Foundry data packages, including name, title, and publication year
load(name, download=True, globus=True, verbose=False, metadata=None, **kwargs) – returns self
load_data(source_id=None, globus=True)
publish(foundry_metadata, data_source, title, authors, update=False, publication_year=None, **kwargs) – returns the response from MDF Connect, to allow tracking
    affiliations – List of author affiliations
    tags – List of tags to apply to the data package
publish_model(options)
    TODO: alternate identifier (to add an identifier of this artifact in another service); add file; add directory; add files
run(name, inputs, **kwargs)
    TODO: Pass kwargs through to DLHub client and document kwargs
xtract_tokens: Any
class foundry.models.FoundryConfig(*, dataframe_file: str = 'foundry_dataframe.json', data_file: str = 'foundry.hdf5', metadata_file: str = 'foundry_metadata.json', destination_endpoint: str = None, local: bool = False, metadata_key: str = 'foundry', organization: str = 'foundry', local_cache_dir: str = './data')
data_file: Optional[str]
dataframe_file: Optional[str]
destination_endpoint: Optional[str]
local: Optional[bool]
metadata_file: Optional[str]
metadata_key: Optional[str]
organization: Optional[str]

class foundry.models.FoundryDataset(*, keys: List[] = None, splits: List[] = None, type: = None, short_name: str = '', dataframe: Any = None)
Foundry Dataset

Schema for Foundry Datasets. This includes specifications of inputs, outputs, type, version, and more.

class Config
Bases: object
arbitrary_types_allowed = True

dataframe: Optional[Any]
keys: List[]
short_name: Optional[str]
splits: Optional[List[]]
type:

class foundry.models.FoundryDatasetType(value)
Foundry Dataset Types

Enumeration of the possible Foundry dataset types
files = 'files'
hdf5 = 'hdf5'
other = 'other'
tabular = 'tabular'

class foundry.models.FoundryKey(*, key: List[str] = [], type: str = '', filter: str = '', units: str = '', description: str = '', classes: List[] = None)
Bases: pydantic.main.BaseModel
classes: Optional[List[]]
description: Optional[str]
filter: Optional[str]
key: List[str]
type: str
units: Optional[str]

class foundry.models.FoundryKeyClass(*, label: str = '', name: str = '')
Bases: pydantic.main.BaseModel
label: str
name: str

class foundry.models.FoundryMetadata(*, dc: Dict = {}, mdf: Dict = {}, dataset: = {}, config: = FoundryConfig(dataframe_file='foundry_dataframe.json', data_file='foundry.hdf5', metadata_file='foundry_metadata.json', destination_endpoint=None, local=False, metadata_key='foundry', organization='foundry', local_cache_dir='./data'))
Bases: pydantic.main.BaseModel
class Config
Bases: object
arbitrary_types_allowed = True

config:
dataset:
dc: Optional[Dict]
mdf: Optional[Dict]

class foundry.models.FoundrySpecification(*, name: str = '', version: str = '', description: str = '', private: bool = False, dependencies: Any = None)
Pydantic base class for interacting with the Foundry data package specification

The specification provides a way to group datasets and manage versions.

add_dependency(name, version)
clear_dependencies()
dependencies: Any
description: str
name: str
private: bool
remove_duplicate_dependencies()
version: str

class foundry.models.FoundrySpecificationDataset(*, name: str = None, provider: str = 'MDF', version: str = None)
Pydantic base class for datasets within the Foundry data package specification

name: Optional[str]
provider: Optional[str]
version: Optional[str]

class foundry.models.FoundrySplit(*, type: str = '', path: str = '', label: str = '')
Bases: pydantic.main.BaseModel
label: Optional[str]
path: Optional[str]
type: str