Loading Data

Once you've found a dataset, here's how to load and use it.

Basic Loading

from foundry import Foundry

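# Create a client and search for datasets by keyword
f = Foundry()
results = f.search("band gap", limit=1)

# Take the dataset object from the first search result
dataset = results.iloc[0].FoundryDataset

# Load all data
data = dataset.get_as_dict()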

Understanding the Data Structure

Most datasets have this structure:

data = {
    'train': (X_train, y_train),  # Inputs and targets
    'test': (X_test, y_test),
}

Access training data:
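
A minimal sketch of unpacking the training split. It uses a small stand-in dictionary in place of the result of `dataset.get_as_dict()`. The column names (`composition`, `band_gap`) and the values are hypothetical, and the inner dictionaries assume the tabular layout described under Data Types.

```python
# Stand-in for the dictionary returned by dataset.get_as_dict().
# Column names and values are hypothetical, for illustration only.
data = {
    "train": (
        {"composition": ["Si", "GaAs", "ZnO"]},  # X_train: inputs
        {"band_gap": [1.12, 1.42, 3.37]},        # y_train: targets
    ),
    "test": (
        {"composition": ["Ge"]},
        {"band_gap": [0.67]},
    ),
}

# Each split is an (inputs, targets) pair, so it unpacks directly.
X_train, y_train = data["train"]

print("input columns:", list(X_train))
print("training samples:", len(y_train["band_gap"]))
```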

Loading Specific Splits
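
One way to work with a single split is to select it from the loaded dictionary. The sketch below assumes the `{'train': ..., 'test': ...}` layout shown above. `get_split` is a hypothetical helper written for this example, not part of the library. The loading call itself may also accept a split name in some versions, but that is not confirmed here, so check the documentation for your installed version.

```python
def get_split(data, name):
    """Return the (inputs, targets) pair for one split of a loaded dataset."""
    if name not in data:
        available = ", ".join(sorted(data))
        raise KeyError(f"no split named {name!r}; available splits: {available}")
    return data[name]

# Stand-in for dataset.get_as_dict(); the values are hypothetical.
data = {
    "train": ({"x": [1, 2, 3]}, {"y": [2, 4, 6]}),
    "test": ({"x": [4]}, {"y": [8]}),
}

X_test, y_test = get_split(data, "test")
print(X_test, y_test)
```

Listing the available split names in the error message makes a typo such as `"validation"` easier to diagnose than a bare `KeyError`.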

Loading with Schema

Get data and metadata together:
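
The original example for this step is not shown here, so the following is only a sketch of how data and a schema might be used together. The `"data"` and `"schema"` keys, and the schema fields (`inputs`, `targets`, `name`, `units`), are assumptions made for illustration, not a confirmed return layout.

```python
# Hypothetical bundle of data plus metadata. The "data"/"schema" keys and the
# schema fields are assumptions made for this sketch, not a confirmed layout.
bundle = {
    "data": {
        "train": ({"composition": ["Si", "GaAs"]}, {"band_gap": [1.12, 1.42]}),
    },
    "schema": {
        "inputs": [{"name": "composition", "units": None}],
        "targets": [{"name": "band_gap", "units": "eV"}],
    },
}

data, schema = bundle["data"], bundle["schema"]
X_train, y_train = data["train"]

# Look up the units recorded for each target column.
units = {field["name"]: field["units"] for field in schema["targets"]}
for column, values in y_train.items():
    print(column, values, units.get(column))
```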

Data Types

Tabular Data

The most common format is a dictionary of arrays, one per column:
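
A short sketch of what such a dictionary can look like and how to read it by column or by row. The column names and values are hypothetical, and plain lists stand in for arrays.

```python
# Hypothetical tabular inputs: one equal-length array per column.
X_train = {
    "composition": ["Si", "GaAs", "ZnO"],
    "temperature_K": [300, 300, 77],
}

# Column access: one array per feature.
temperatures = X_train["temperature_K"]

# Row access: zip the columns together to rebuild one record per sample.
columns = list(X_train)
rows = [dict(zip(columns, values)) for values in zip(*X_train.values())]

print(temperatures)
print(rows[0])
```

A dictionary of equal-length columns is also a shape that the `pandas.DataFrame(...)` constructor accepts, if a DataFrame is more convenient for analysis.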

Working with DataFrames

HDF5 Data

For large datasets, use lazy loading:
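
The exact call for lazy loading is not shown here. Once you have a lazily loaded array, a common pattern is to read it in batches instead of all at once. The sketch below assumes the object supports `len()` and slicing, as an `h5py.Dataset` does. `iter_batches` is a hypothetical helper, and a plain list stands in for the on-disk array.

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive slices of `dataset`, one batch at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        # With an h5py.Dataset this slice reads only these rows from disk.
        yield dataset[start:start + batch_size]

# A plain list stands in for a lazily loaded array in this sketch.
values = list(range(10))
for batch in iter_batches(values, batch_size=4):
    print(len(batch), batch)
```

Because each slice is read only when requested, peak memory use depends on the batch size and not on the size of the whole dataset.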

Caching

Data is cached locally after first download:
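
The default cache path is not stated here. As a sketch, the helper below reports how much disk space a cache directory uses once you know where it is. `cache_size_bytes` is a hypothetical helper, and a temporary directory stands in for the real cache.

```python
import tempfile
from pathlib import Path

def cache_size_bytes(cache_dir):
    """Total size of all files under `cache_dir` (0 if it does not exist)."""
    root = Path(cache_dir)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

# Demonstration on a temporary directory standing in for the real cache.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dataset.bin").write_bytes(b"\x00" * 1024)
    size_with_file = cache_size_bytes(tmp)
    size_missing = cache_size_bytes(Path(tmp) / "does_not_exist")

print(size_with_file, size_missing)
```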

Custom Cache Location

Clear Cache
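
If the library offers its own cache-clearing call, prefer that. Otherwise, deleting the cache directory forces a fresh download on the next load. The sketch below is a generic approach using the standard library. `clear_cache` is a hypothetical helper, and it runs here against a throwaway directory, not a real cache.

```python
import shutil
import tempfile
from pathlib import Path

def clear_cache(cache_dir):
    """Delete `cache_dir` and everything in it; return True if it existed."""
    root = Path(cache_dir)
    if not root.is_dir():
        return False
    shutil.rmtree(root)
    return True

# Demonstration on a throwaway directory, not the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo_cache = Path(tmp) / "cache_demo"
    demo_cache.mkdir()
    (demo_cache / "dataset.bin").write_bytes(b"cached bytes")

    first = clear_cache(demo_cache)   # directory existed and was removed
    second = clear_cache(demo_cache)  # nothing left to remove

print(first, second)
```

Double-check the path before calling a helper like this, since `shutil.rmtree` deletes the whole directory tree without confirmation.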

Common Patterns

Train/Test Split
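
A sketch of the usual pattern: fit on the training split only, then evaluate on the held-out test split. A mean-value baseline stands in for a real model so the example stays self-contained. The column names and numbers are made up, not real measurements.

```python
from statistics import mean

# Stand-in for dataset.get_as_dict(); names and values are hypothetical.
data = {
    "train": ({"composition": ["Si", "GaAs", "ZnO"]}, {"band_gap": [1.0, 1.5, 3.5]}),
    "test": ({"composition": ["Ge", "GaN"]}, {"band_gap": [1.0, 3.0]}),
}

X_train, y_train = data["train"]
X_test, y_test = data["test"]

# "Fit" on the training split only: the baseline predicts the training mean.
baseline = mean(y_train["band_gap"])

# Evaluate on the held-out test split with mean absolute error.
mae = mean(abs(y - baseline) for y in y_test["band_gap"])
print(f"baseline={baseline:.2f}, test MAE={mae:.2f}")
```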

Single Target Column
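
When the targets contain several columns but a model needs only one, select it by name. This sketch assumes the targets are a dictionary of arrays. The column names and values are hypothetical.

```python
# Hypothetical targets with more than one column.
y_train = {
    "band_gap": [1.12, 1.42, 3.37],
    "formation_energy": [-0.1, -0.4, -1.8],
}

target_name = "band_gap"  # hypothetical column name
if target_name not in y_train:
    raise KeyError(f"{target_name!r} not in targets: {sorted(y_train)}")

# A single flat array of target values, one per sample.
y = y_train[target_name]
print(target_name, y)
```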

Multiple Inputs
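
When several input columns feed one model, they can be combined into one feature vector per sample. The sketch below assumes the inputs are a dictionary of equal-length arrays. The column names and values are hypothetical.

```python
# Hypothetical inputs with several columns.
X_train = {
    "temperature_K": [300, 300, 77],
    "pressure_GPa": [0.0, 1.5, 0.0],
    "sample_id": ["a1", "a2", "a3"],  # not used as a feature
}

feature_names = ["temperature_K", "pressure_GPa"]

# One feature vector per sample, with columns in the order of feature_names.
features = [list(row) for row in zip(*(X_train[name] for name in feature_names))]
print(features)
```

Keeping an explicit `feature_names` list fixes the column order, so the same list can be reused to build test-split features in a matching layout.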

Error Handling
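
The exceptions the library raises are not listed here, so the sketch below catches only built-in exception families that a failed download or malformed dataset could plausibly produce. `try_load` and `failing_loader` are hypothetical names. Adjust the caught exception types to what your installed version actually raises.

```python
def try_load(loader):
    """Call `loader()` and return (data, None), or (None, message) on failure."""
    try:
        return loader(), None
    except (OSError, KeyError, ValueError) as exc:
        # OSError covers ConnectionError, TimeoutError, and file-system errors.
        return None, f"{type(exc).__name__}: {exc}"

def failing_loader():
    # Hypothetical failure standing in for an interrupted download.
    raise ConnectionError("network unreachable")

# A loader that succeeds: the data comes back with no error message.
ok_data, ok_error = try_load(lambda: {"train": ([1, 2], [3, 4])})
print(ok_data is not None, ok_error)

# A loader that fails: no data, and a readable error message.
bad_data, bad_error = try_load(failing_loader)
print(bad_data, bad_error)
```

With a real dataset object, the loader argument could be the bound method itself, for example `try_load(dataset.get_as_dict)`.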
