foundry.foundry
HiddenColumnDataFrame
A subclass of pd.DataFrame that supports hiding a specific column. It is intended to mimic the display of search results from earlier versions while providing access to the FoundryDataset object associated with each entry in the dataframe through the get_dataset_by_name() and get_dataset_by_doi() methods.
Parameters:
*args: Positional arguments passed to the parent class constructor.
hidden_column (str, optional): The name of the column to be hidden.
**kwargs: Keyword arguments passed to the parent class constructor.

Attributes:
hidden_column (str or None): The name of the hidden column.

Methods:
_repr_html_(): Overrides the _repr_html_ method of the parent class to hide the specified column in the HTML representation.
get_dataset_by_name(dataset_name): Returns the FoundryDataset associated with the given dataset name. Can also handle a DOI.
get_dataset_by_doi(doi): Returns the FoundryDataset associated with the given DOI.
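To make the idea of a hidden column concrete, here is a minimal sketch of how a pd.DataFrame subclass could omit one column from its HTML representation while keeping it in the data. It is not Foundry's implementation: the class name HiddenColumnSketch and the column names are invented for illustration.

```python
import pandas as pd


class HiddenColumnSketch(pd.DataFrame):
    """Hypothetical DataFrame subclass that omits one column from its HTML repr."""

    # Class-level default so the attribute always exists.
    hidden_column = None
    # Register the attribute as subclass metadata rather than a column.
    _metadata = ["hidden_column"]

    def __init__(self, *args, hidden_column=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.hidden_column = hidden_column

    def _repr_html_(self):
        # Build a plain DataFrame view so the parent renderer is used.
        plain = pd.DataFrame(self)
        if self.hidden_column is not None and self.hidden_column in plain.columns:
            plain = plain.drop(columns=[self.hidden_column])
        return plain._repr_html_()


# Illustrative data: the "dataset_object" column stays accessible but is not rendered.
results = HiddenColumnSketch(
    {"dataset_name": ["example_dataset"], "dataset_object": ["<object placeholder>"]},
    hidden_column="dataset_object",
)
html = results._repr_html_()
```

The hidden column is still part of results.columns, so code that looks up an entry by name or DOI can read the stored object from it.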
__init__
property T
The transpose of the DataFrame.
Returns:
DataFrame: The transposed DataFrame.

See Also:
DataFrame.transpose: Transpose index and columns.

Examples:
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.T
      0  1
col1  1  2
col2  3  4
property at
Access a single value for a row/column label pair.
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.
Raises:
KeyError: If getting a value and 'label' does not exist in a DataFrame or Series.
ValueError: If row/column label pair is not a tuple or if any label from the pair is not a scalar for DataFrame. If label is list-like (excluding NamedTuple) for Series.

See Also:
DataFrame.at: Access a single value for a row/column pair by label.
DataFrame.iat: Access a single value for a row/column pair by integer position.
DataFrame.loc: Access a group of rows and columns by label(s).
DataFrame.iloc: Access a group of rows and columns by integer position(s).
Series.at: Access a single value by label.
Series.iat: Access a single value by integer position.
Series.loc: Access a group of rows by label(s).
Series.iloc: Access a group of rows by integer position(s).

Notes:
See "Fast scalar value getting and setting" in the pandas user guide for more details.

Examples:
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

Get value at specified row/column pair
>>> df.at[4, 'B']
2

Set value at specified row/column pair
>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10

Get value within a Series
>>> df.loc[5].at['B']
4
property attrs
Dictionary of global attributes of this dataset.
Warning: attrs is experimental and may change without warning.

See Also:
DataFrame.flags: Global flags applying to this object.

Notes:
Many operations that create new datasets will copy attrs. Copies are always deep so that changing attrs will only affect the present dataset. pandas.concat copies attrs only if all input datasets have the same attrs.
Examples -------- For Series:
ser = pd.Series([1, 2, 3])
ser.attrs = {"A": [10, 20, 30]}
ser.attrs
{'A': [10, 20, 30]}
For DataFrame:
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.attrs = {"A": [10, 20, 30]}
df.attrs
{'A': [10, 20, 30]}
property axes
Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members. They are returned in that order.
Examples -------- df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='object')]
property dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types are stored with the object dtype. See the pandas user guide (basics, dtypes) for more.

Returns:
pandas.Series: The data type of each column.

Examples:
>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
property empty
Indicator whether Series/DataFrame is empty.
True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
Returns ------- bool If Series/DataFrame is empty, return True, if not return False.
See Also -------- Series.dropna : Return series without null values. DataFrame.dropna : Return DataFrame with labels on given axis omitted where (all or any) data are missing.
Notes ----- If Series/DataFrame contains only NaNs, it is still not considered empty. See the example below.
Examples:
An example of an actual empty DataFrame. Notice the index is empty:
>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:
>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True

>>> ser_empty = pd.Series({'A' : []})
>>> ser_empty
A    []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True
property flags
Get the properties associated with this pandas object.
The available flags are:
Flags.allows_duplicate_labels

See Also:
Flags: Flags that apply to pandas objects.
DataFrame.attrs: Global metadata applying to this dataset.

Notes:
"Flags" differ from "metadata". Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.

Examples:
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>

Flags can be read or set using attribute access (the . operator):
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False

Or by indexing with a key:
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
property iat
Access a single value for a row/column pair by integer position.
Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises:
IndexError: When integer position is out of bounds.

See Also:
DataFrame.at: Access a single value for a row/column label pair.
DataFrame.loc: Access a group of rows and columns by label(s).
DataFrame.iloc: Access a group of rows and columns by integer position(s).

Examples:
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

Get value at specified row/column pair
>>> df.iat[1, 2]
1

Set value at specified row/column pair
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10

Get value within a series
>>> df.loc[0].iat[1]
2
property iloc
Purely integer-location based indexing for selection by position.
Deprecated since version 2.2.0: Returning a tuple from a callable is deprecated.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don't have a reference to the calling object, but would like to base your selection on some value.
A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).

See more at "Selection by Position" in the pandas user guide.
See Also -------- DataFrame.iat : Fast integer location scalar accessor. DataFrame.loc : Purely label-location based indexer for selection by label. Series.iloc : Purely integer-location based indexing for selection by position.
Examples -------- mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400}, ... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}] df = pd.DataFrame(mydict)
df
a b c d 0 1 2 3 4 1 100 200 300 400 2 1000 2000 3000 4000
Indexing just the rows
With a scalar integer.
type(df.iloc[0])
<class 'pandas.core.series.Series'> df.iloc[0]
a 1 b 2 c 3 d 4 Name: 0, dtype: int64
With a list of integers.
df.iloc[[0]]
a b c d 0 1 2 3 4 type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
df.iloc[[0, 1]]
a b c d 0 1 2 3 4 1 100 200 300 400
With a slice
object.
df.iloc[:3]
a b c d 0 1 2 3 4 1 100 200 300 400 2 1000 2000 3000 4000
With a boolean mask the same length as the index.
df.iloc[[True, False, True]]
a b c d 0 1 2 3 4 2 1000 2000 3000 4000
With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label is even.
df.iloc[lambda x: x.index % 2 == 0]
a b c d 0 1 2 3 4 2 1000 2000 3000 4000
Indexing both axes
You can mix the indexer types for the index and columns. Use :
to select the entire axis.
With scalar integers.
df.iloc[0, 1]
2
With lists of integers.
df.iloc[[0, 2], [1, 3]]
b d 0 2 4 2 2000 4000
With slice
objects.
df.iloc[1:3, 0:3]
a b c 1 100 200 300 2 1000 2000 3000
With a boolean array whose length matches the columns.
df.iloc[:, [True, False, True, False]]
a c 0 1 3 1 100 300 2 1000 3000
With a callable function that expects the Series or DataFrame.
df.iloc[:, lambda df: [0, 2]]
a c 0 1 3 1 100 300 2 1000 3000
property loc
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:
A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'. Warning: contrary to usual Python slices, both the start and the stop are included.
A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
An alignable boolean Series. The index of the key will be aligned before masking.
An alignable Index. The Index of the returned selection will be the input.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).

See more at "Selection by Label" in the pandas user guide.
Raises ------ KeyError If any items are not found. IndexingError If an indexed key is passed and its index is unalignable to the frame index.
See Also -------- DataFrame.at : Access a single value for a row/column label pair. DataFrame.iloc : Access group of rows and columns by integer position(s). DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. Series.loc : Access group of values using labels.
Examples -------- Getting values
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'], ... columns=['max_speed', 'shield']) df
max_speed shield cobra 1 2 viper 4 5 sidewinder 7 8
Single label. Note this returns the row as a Series.
df.loc['viper']
max_speed 4 shield 5 Name: viper, dtype: int64
List of labels. Note using [[]]
returns a DataFrame.
df.loc[['viper', 'sidewinder']]
max_speed shield viper 4 5 sidewinder 7 8
Single label for row and column
df.loc['cobra', 'shield']
2
Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.
df.loc['cobra':'viper', 'max_speed']
cobra 1 viper 4 Name: max_speed, dtype: int64
Boolean list with the same length as the row axis
df.loc[[False, False, True]]
max_speed shield sidewinder 7 8
Alignable boolean Series:
df.loc[pd.Series([False, True, False],
... index=['viper', 'sidewinder', 'cobra'])] max_speed shield sidewinder 7 8
Index (same behavior as df.reindex
)
df.loc[pd.Index(["cobra", "viper"], name="foo")]
max_speed shield foo cobra 1 2 viper 4 5
Conditional that returns a boolean Series
df.loc[df['shield'] > 6]
max_speed shield sidewinder 7 8
Conditional that returns a boolean Series with column labels specified
df.loc[df['shield'] > 6, ['max_speed']]
max_speed sidewinder 7
Multiple conditional using &
that returns a boolean Series
df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]
max_speed shield viper 4 5
Multiple conditional using |
that returns a boolean Series
df.loc[(df['max_speed'] > 4) | (df['shield'] < 5)]
max_speed shield cobra 1 2 sidewinder 7 8
Please ensure that each condition is wrapped in parentheses (). See the pandas user guide section on Boolean indexing for more details and explanations.
Callable that returns a boolean Series
df.loc[lambda df: df['shield'] == 8]
max_speed shield sidewinder 7 8
Setting values
Set value for all items matching the list of labels
df.loc[['viper', 'sidewinder'], ['shield']] = 50
df
max_speed shield cobra 1 2 viper 4 50 sidewinder 7 50
Set value for an entire row
df.loc['cobra'] = 10
df
max_speed shield cobra 10 10 viper 4 50 sidewinder 7 50
Set value for an entire column
df.loc[:, 'max_speed'] = 30
df
max_speed shield cobra 30 10 viper 30 50 sidewinder 30 50
Set value for rows matching callable condition
df.loc[df['shield'] > 35] = 0
df
max_speed shield cobra 30 10 viper 0 0 sidewinder 0 0
Add value matching location
df.loc["viper", "shield"] += 5
df
max_speed shield cobra 30 10 viper 0 5 sidewinder 0 0
Setting using a Series
or a DataFrame
sets the values matching the index labels, not the index positions.
shuffled_df = df.loc[["viper", "cobra", "sidewinder"]]
df.loc[:] += shuffled_df
df
max_speed shield cobra 60 20 viper 0 10 sidewinder 0 0
Getting values on a DataFrame with an index that has integer labels
Another example using integers for the index
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield']) df
max_speed shield 7 1 2 8 4 5 9 7 8
Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.
df.loc[7:9]
max_speed shield 7 1 2 8 4 5 9 7 8
Getting values with a MultiIndex
A number of examples using a DataFrame with a MultiIndex
tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'), ... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'), ... ('viper', 'mark ii'), ('viper', 'mark iii') ... ] index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]] df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df
max_speed shield cobra mark i 12 2 mark ii 0 4 sidewinder mark i 10 20 mark ii 1 4 viper mark ii 7 1 mark iii 16 36
Single label. Note this returns a DataFrame with a single index.
df.loc['cobra']
max_speed shield mark i 12 2 mark ii 0 4
Single index tuple. Note this returns a Series.
df.loc[('cobra', 'mark ii')]
max_speed 0 shield 4 Name: (cobra, mark ii), dtype: int64
Single label for row and column. Similar to passing in a tuple, this returns a Series.
df.loc['cobra', 'mark i']
max_speed 12 shield 2 Name: (cobra, mark i), dtype: int64
Single tuple. Note using [[]]
returns a DataFrame.
df.loc[[('cobra', 'mark ii')]]
max_speed shield cobra mark ii 0 4
Single tuple for the index with a single label for the column
df.loc[('cobra', 'mark i'), 'shield']
2
Slice from index tuple to single label
df.loc[('cobra', 'mark i'):'viper']
max_speed shield cobra mark i 12 2 mark ii 0 4 sidewinder mark i 10 20 mark ii 1 4 viper mark ii 7 1 mark iii 16 36
Slice from index tuple to index tuple
df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
max_speed shield cobra mark i 12 2 mark ii 0 4 sidewinder mark i 10 20 mark ii 1 4 viper mark ii 7 1
Please see the pandas user guide section on advanced hierarchical indexing for more details and explanations of advanced indexing.
property ndim
Return an int representing the number of axes / array dimensions.
Return 1 if Series. Otherwise return 2 if DataFrame.
See Also -------- ndarray.ndim : Number of array dimensions.
Examples -------- s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s.ndim
1
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df.ndim
2
property shape
Return a tuple representing the dimensionality of the DataFrame.
See Also -------- ndarray.shape : Tuple of array dimensions.
Examples -------- df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df.shape
(2, 2)
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
... 'col3': [5, 6]}) df.shape
(2, 3)
property size
Return an int representing the number of elements in this object.
Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.
See Also -------- ndarray.size : Number of elements in the array.
Examples -------- s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s.size
3
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df.size
4
property style
Returns a Styler object.
Contains methods for building a styled HTML representation of the DataFrame.
See Also -------- io.formats.style.Styler : Helps style a DataFrame or Series according to the data with HTML and CSS.
Examples -------- df = pd.DataFrame({'A': [1, 2, 3]})
df.style # doctest: +SKIP
Please see the Table Visualization section of the pandas user guide for more examples.
property values
Return a Numpy representation of the DataFrame.
Warning: We recommend using DataFrame.to_numpy instead.

Only the values in the DataFrame will be returned; the axes labels will be removed.

Returns:
numpy.ndarray: The values of the DataFrame.

See Also:
DataFrame.to_numpy: Recommended alternative to this method.
DataFrame.index: Retrieve the index labels.
DataFrame.columns: Retrieving the column names.

Notes:
The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
For example, if the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a float64 dtype.
Examples:
A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.
>>> df = pd.DataFrame({'age':    [ 3,  29],
...                    'height': [94, 170],
...                    'weight': [31, 115]})
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).
>>> df2 = pd.DataFrame([('parrot',   24.0, 'second'),
...                     ('lion',     80.5, 1),
...                     ('monkey', np.nan, None)],
...                   columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name          object
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 1],
       ['monkey', nan, None]], dtype=object)
get_dataset_by_doi
get_dataset_by_name
Foundry
Foundry Client Base Class
This class represents a client for interacting with the Foundry service. It provides methods for searching and accessing datasets, as well as publishing new datasets.
Attributes:
dlhub_client
(Any): The DLHub client.
forge_client
(Any): The Forge client.
connect_client
(Any): The MDF Connect client.
transfer_client
(Any): The Globus transfer client.
auth_client
(Any): The authentication client.
index
(str): The index to use for search and data publication.
auths
(Any): The authorizers used for authentication.
__init__
Initialize a Foundry client
Args:
no_browser
(bool): If True, do not open a browser for the Globus Auth URL.
no_local_server
(bool): If True, do not start a local server for the authentication flow. This should be True
when on a remote server (e.g., Google Colab).
index
(str): Index to use for search and data publication. Choices are mdf
or mdf-test
.
authorizers
(dict): A dictionary of authorizers to use, following the mdf_toolbox
format.
use_globus
(bool): If True, download using Globus, otherwise use HTTPS.
verbose
(bool): If True, print additional debug information.
interval
(int): How often to poll Globus to check if transfers are complete.
local_cache_dir
(str): Optional location to store downloaded data. If not specified, defaults to the value of the FOUNDRY_LOCAL_CACHE_DIR environment variable if it is set, otherwise './data'.
data
(dict): Other arguments, e.g., results from an MDF search result that are used to populate Foundry metadata fields.
Returns: An initialized and authenticated Foundry client.
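The fallback order described for local_cache_dir (explicit argument, then environment variable, then './data') can be sketched as follows. The helper name resolve_cache_dir is hypothetical and is not part of the Foundry API; only the environment variable name and the './data' default come from the description above.

```python
import os


def resolve_cache_dir(local_cache_dir=None):
    """Hypothetical helper: explicit argument, then env var, then './data'."""
    if local_cache_dir is not None:
        return local_cache_dir
    # Fall back to the environment variable, and finally to './data'.
    return os.environ.get("FOUNDRY_LOCAL_CACHE_DIR", "./data")
```

Setting FOUNDRY_LOCAL_CACHE_DIR once in the shell is a convenient way to share a cache location across scripts without passing the argument each time.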
check_status
Check the status of your submission.
Arguments:
source_id
(str): The source_id
(source_name
+ version information) of the submission to check. Returned in the res
result from publish()
via MDF Connect Client.
short
(bool): When False, will print a status summary containing all of the status steps for the dataset. When True, will print a short finished/processing message, useful for checking many datasets' status at once. Default: False
raw
(bool): When False, will print a nicely-formatted status summary. When True, will return the full status result. For direct human consumption, False is recommended. Default: False
Returns:
dict
: The full status result, returned only if raw is True.
dataset_from_metadata
Converts the result of a forge query to a FoundryDataset object
Args:
metadata
(dict): result from a forge query
Returns:
FoundryDataset
: a FoundryDataset object created from the metadata
Raises:
Exception
: If the mdf entry is missing a section, cannot generate a foundry dataset object
filter_datasets_by_query
Filters the given metadata based on the provided query string.
Args:
query_string
(str): The query string to filter the metadata.
metadata
(list): The list of metadata to be filtered.
Returns:
list[dict]
: A list of dicts that match the query string.
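The source does not say how entries are matched against the query string. As one plausible reading, the sketch below keeps every metadata dict in which any nested value contains the query as a case-insensitive substring. The function name, the matching rule, and the example metadata layout are all assumptions made for illustration.

```python
def filter_by_query_sketch(query_string, metadata):
    """Hypothetical filter: keep dicts with any value containing the query text."""
    needle = query_string.lower()

    def contains(value):
        # Recurse through nested dicts and lists; compare leaves as strings.
        if isinstance(value, dict):
            return any(contains(v) for v in value.values())
        if isinstance(value, (list, tuple)):
            return any(contains(v) for v in value)
        return needle in str(value).lower()

    return [entry for entry in metadata if contains(entry)]


# Illustrative records; real search results will have a different structure.
records = [
    {"dc": {"titles": [{"title": "Band gaps of perovskites"}]}},
    {"dc": {"titles": [{"title": "Zeolite structures"}]}},
]
matches = filter_by_query_sketch("PEROVSKITE", records)
```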
get_metadata_by_doi
Query foundry datasets by DOI
Should only return a single result.
Args:
doi
(str): doi of desired dataset
Returns:
metadata
(dict): result from a forge query
get_metadata_by_query
Submit query to forge client and return results
Args:
q
(str): The query string to be submitted to the forge client.
limit
(int): The maximum number of results to be returned by the foundry client.
Returns:
metadata
(dict): The result from the forge query, represented as a dictionary.
list
List available Foundry datasets
Args:
limit
(int): maximum number of results to return
Returns:
List[FoundryDataset]
: List of FoundryDataset objects
publish_dataset
Submit a dataset for publication; can choose to submit via HTTPS using https_data_path
or via Globus Transfer using the globus_data_source
argument. Only one upload method may be specified.
Args:
foundry_metadata
(dict): Dict of metadata describing data package
title
(string): Title of data package
authors
(list): List of data package author names, e.g., "Jack Black" or "Nunez, Victoria"
https_data_path
(str): Path to the local dataset to publish to Foundry via HTTPS. Creates an HTTPS PUT request to upload the data specified to a Globus endpoint (default is NCSA endpoint) before it is transferred to MDF. If None, the user must specify a 'globus_data_source' URL to the location of the data on their own Globus endpoint. User must choose either globus_data_source
or https_data_path
to publish their data.
globus_data_source
(str): Url path for a data folder on a Globus endpoint; url can be obtained through the Globus Web UI or SDK. If None, the user must specify an 'https_data_path' pointing to the location of the data on their local machine. User must choose either globus_data_source
or https_data_path
to publish their data.
update
(bool): True if this is an update to a prior data package
(default: self.config.metadata_file)
publication_year
(int): Year of dataset publication. If None, will be set to the current calendar year by MDF Connect Client.
(default: $current_year)
test
(bool): If True, do not submit the dataset for publication (i.e., do not transfer it to the MDF endpoint). Default is False.
Keyword Args:
affiliations
(list): List of author affiliations
tags
(list): List of tags to apply to the data package
short_name
(string): Shortened/abbreviated name of the data package
publisher
(string): Data publishing entity (e.g. MDF, Zenodo, etc.)
description
(str): A description of the dataset.
dataset_doi
(str): The DOI for this dataset (not an associated paper).
related_dois
(list): DOIs related to this dataset, not including the dataset's own DOI (for example, an associated paper's DOI).
Returns:
(dict)
: MDF Connect Response: Response from MDF Connect to allow tracking of dataset. Contains source_id, which can be used to check the status of the submission.
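The rule that exactly one of https_data_path and globus_data_source must be given can be checked before calling publish_dataset. The following is a minimal sketch of such a pre-flight check; the helper name check_upload_source and the error messages are hypothetical, and Foundry may validate these arguments differently.

```python
def check_upload_source(https_data_path=None, globus_data_source=None):
    """Hypothetical pre-flight check: exactly one upload method must be set."""
    if https_data_path is not None and globus_data_source is not None:
        raise ValueError("Specify only one of https_data_path or globus_data_source")
    if https_data_path is None and globus_data_source is None:
        raise ValueError("Specify either https_data_path or globus_data_source")
    # Report which upload method would be used.
    return "https" if https_data_path is not None else "globus"
```

Running a check like this locally gives an immediate error instead of a failed submission.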
publish_model
Simplified publishing method for servables
Args:
title
(string): title for the servable
creators
(string | list): either the creator's name (FamilyName, GivenName) or a list of the creators' names
short_name
(string): shorthand name for the servable
servable_type
(string): the type of the servable, must be a member of ("static_method", "class_method", "keras", "pytorch", "tensorflow", "sklearn")
serv_options
(dict): The servable_type-specific arguments that are necessary for publishing. Arguments can be found at https://dlhub-sdk.readthedocs.io/en/latest/source/dlhub_sdk.models.servables.html under the appropriate create_model signature. Use the argument names as keys and their values as the values.
affiliations
(list): list of affiliations for each author
paper_doi
(str): DOI of a paper that describes the servable
Returns:
(string)
: task id of this submission, can be used to check for success
Raises:
ValueError
: If the given servable_type is not in the list of acceptable types
Exception
: If the serv_options are incomplete or the request to publish results in an error
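Two of the argument rules above are easy to check before submitting: servable_type must be one of the listed values, and creators may be a single name or a list of names. The sketch below illustrates both; the helper names are hypothetical and only the tuple of allowed types is taken from the description above.

```python
ALLOWED_SERVABLE_TYPES = (
    "static_method", "class_method", "keras", "pytorch", "tensorflow", "sklearn",
)


def validate_servable_type(servable_type):
    """Hypothetical check mirroring the documented ValueError condition."""
    if servable_type not in ALLOWED_SERVABLE_TYPES:
        raise ValueError(
            f"servable_type must be one of {ALLOWED_SERVABLE_TYPES}, got {servable_type!r}"
        )
    return servable_type


def normalize_creators(creators):
    """Hypothetical helper: accept one name or a list of names, return a list."""
    if isinstance(creators, str):
        return [creators]
    return list(creators)
```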
search
Search available Foundry datasets
This method searches for available Foundry datasets based on the provided query string. If a DOI is provided as the query, it retrieves the metadata for that specific dataset. If a query string is provided, it retrieves the metadata for datasets that match the query. The limit parameter can be used to specify the maximum number of results to return.
Args:
query
(str): The query string to match. If a DOI is provided, it retrieves the metadata for that specific dataset.
limit
(int): The maximum number of results to return.
as_list
(bool): If True, the search results will be returned as a list instead of a DataFrame.
Returns:
List[FoundryDataset] or DataFrame
: A list of FoundryDataset objects if as_list is True, otherwise a DataFrame of search results.
Raises:
Exception
: If no results are found for the provided query.
Example:
>>> foundry = Foundry()
>>> results = foundry.search(query="materials science", limit=10)
>>> print(len(results))
10
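The DOI-versus-query behavior described for search can be pictured as a small dispatch step. The sketch below is an assumption about how such a dispatch might look, not Foundry's code: the DOI regular expression is a rough heuristic, and by_doi and by_query are stand-in callables for the real metadata lookups.

```python
import re

# Assumed heuristic for recognizing a DOI such as "10.xxxx/suffix".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(query):
    return bool(DOI_PATTERN.match(query.strip()))


def search_sketch(query, by_doi, by_query, limit=None):
    """Hypothetical dispatch: DOI lookup for DOIs, free-text search otherwise."""
    if looks_like_doi(query):
        hit = by_doi(query)
        results = [hit] if hit is not None else []
    else:
        results = by_query(query, limit)
    if not results:
        # Mirrors the documented behavior of raising when nothing matches.
        raise Exception(f"No results found for the query '{query}'")
    return results
```

Passing the lookups in as callables keeps the sketch runnable without network access or authentication.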
search_results_to_dataframe
Convert a list of results into a pandas DataFrame.
Args:
results
(list): A list of results.
Returns:
DataFrame
: A pandas DataFrame containing the converted results.
This file was automatically generated via lazydocs.