Loading Data
Once you've found a dataset, here's how to load and use it.
Basic Loading
from foundry import Foundry
f = Foundry()
results = f.search("band gap", limit=1)
dataset = results.iloc[0].FoundryDataset
# Load all data
data = dataset.get_as_dict()
Understanding the Data Structure
Most datasets have this structure:
data = {
    'train': (X_train, y_train),  # Inputs and targets
    'test': (X_test, y_test),
}
Access training data:
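A minimal sketch of unpacking the training split. It uses a small stand-in dictionary in place of the real `get_as_dict()` result; the column names (`composition`, `band_gap`) and all values are placeholders, since each dataset defines its own keys.

```python
# Stand-in for dataset.get_as_dict(); keys and values are placeholders.
data = {
    "train": (
        {"composition": ["A", "B", "C"]},  # inputs
        {"band_gap": [1.0, 2.0, 3.0]},     # targets
    ),
    "test": (
        {"composition": ["D"]},
        {"band_gap": [4.0]},
    ),
}

# Each split is an (inputs, targets) pair, so tuple unpacking works directly.
X_train, y_train = data["train"]

print(list(X_train))             # input column names
print(len(y_train["band_gap"]))  # number of training samples
```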
Loading Specific Splits
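One way to work with a single split is to select it from the dictionary that has already been loaded. The sketch below assumes the `{'train': ..., 'test': ...}` structure described above; `get_split` is a hypothetical helper, not part of Foundry.

```python
def get_split(data, name):
    """Return the (inputs, targets) pair for one split.

    `data` is assumed to be a dict keyed by split name, as described above.
    Raises KeyError listing the available splits if `name` is missing.
    """
    if name not in data:
        available = ", ".join(sorted(data))
        raise KeyError(f"split {name!r} not found; available splits: {available}")
    return data[name]


# Placeholder data standing in for a loaded dataset.
data = {
    "train": ({"x": [1, 2, 3]}, {"y": [10, 20, 30]}),
    "test": ({"x": [4]}, {"y": [40]}),
}

X_test, y_test = get_split(data, "test")
```

Asking for a split that does not exist (for example `"validation"` here) raises a `KeyError` that names the splits that are present.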
Loading with Schema
Get data and metadata together:
Data Types
Tabular Data
The most common format is a dictionary of arrays, with one entry per column:
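As a rough illustration, the snippet below builds a column-oriented dictionary and checks that every column has the same number of rows. Plain lists stand in for arrays, and the column names are invented for the example.

```python
# Column name -> values; plain lists stand in for arrays here.
inputs = {
    "temperature": [300.0, 350.0, 400.0],
    "pressure": [1.0, 1.5, 2.0],
}

# In tabular data every column should have the same length.
lengths = {name: len(values) for name, values in inputs.items()}
n_rows = set(lengths.values())
if len(n_rows) != 1:
    raise ValueError(f"columns have different lengths: {lengths}")

print(f"{len(inputs)} columns, {n_rows.pop()} rows")
```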
Working with DataFrames
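Because tabular splits are dictionaries of equal-length columns, they convert naturally to a pandas DataFrame. This sketch assumes pandas is installed and uses placeholder columns in place of a real split.

```python
import pandas as pd

# Placeholder inputs and targets standing in for data['train'].
X_train = {"temperature": [300.0, 350.0, 400.0], "pressure": [1.0, 1.5, 2.0]}
y_train = {"band_gap": [1.0, 2.0, 3.0]}

# Build one frame per dictionary, then join them side by side on the row index.
df = pd.DataFrame(X_train).join(pd.DataFrame(y_train))

print(df.head())
```

Keeping inputs and targets in one frame makes filtering and summary statistics convenient; split them back out by column name before training.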
HDF5 Data
For large datasets, use lazy loading:
Caching
Data is cached locally after first download:
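The download-once behaviour can be pictured with the generic sketch below. It is not Foundry's cache implementation or on-disk format; `load_cached`, the JSON file layout, and the `cache_dir` argument are assumptions made purely for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_cached(name, download, cache_dir):
    """Return data for `name`, calling `download()` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.json"
    if path.exists():
        # Cache hit: read the local copy instead of downloading again.
        return json.loads(path.read_text())
    data = download()
    path.write_text(json.dumps(data))
    return data


calls = []


def fake_download():
    """Stand-in for a network download; records how often it is called."""
    calls.append(1)
    return {"band_gap": [1.0, 2.0]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("demo", fake_download, tmp)   # downloads
    second = load_cached("demo", fake_download, tmp)  # served from disk
```

Passing a different `cache_dir` is all it takes to relocate the cache in this sketch.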
Custom Cache Location
Clear Cache
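This section does not say where the cache lives or whether the library has a built-in command for clearing it, so the following only sketches the generic approach of deleting a cache directory. `clear_cache` is a hypothetical helper, and the example runs against a throwaway temporary directory; confirm the real location before deleting anything.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir):
    """Delete `cache_dir` and its contents; return True if it existed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return False
    shutil.rmtree(cache_dir)
    return True


with tempfile.TemporaryDirectory() as tmp:
    demo_cache = Path(tmp) / "cache_demo"
    demo_cache.mkdir()
    (demo_cache / "dataset.json").write_text("{}")

    removed = clear_cache(demo_cache)        # directory existed and is deleted
    removed_again = clear_cache(demo_cache)  # nothing left to delete
    still_there = demo_cache.exists()
```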
Common Patterns
Train/Test Split
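A sketch of one common pattern: use the dataset's own `train` and `test` splits when both exist, and otherwise hold out part of `train`. The fallback, the `test_fraction` default, and the helper name `train_test` are assumptions for illustration, not library behaviour.

```python
import random


def train_test(data, test_fraction=0.2, seed=0):
    """Return ((X_train, y_train), (X_test, y_test)) from a loaded dataset dict."""
    if "test" in data:
        return data["train"], data["test"]

    # No test split provided: hold out a reproducible random subset of train.
    X, y = data["train"]
    n = len(next(iter(y.values())))
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(n * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def take(columns, rows):
        """Select the given row indices from every column."""
        return {k: [v[i] for i in rows] for k, v in columns.items()}

    return (take(X, train_idx), take(y, train_idx)), (take(X, test_idx), take(y, test_idx))


# Placeholder dataset with only a train split.
data = {"train": ({"x": list(range(10))}, {"y": list(range(10))})}
(X_train, y_train), (X_test, y_test) = train_test(data)
```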
Single Target Column
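When the targets are a dictionary of columns, a model that predicts one quantity needs just one of them. The helper below is a hypothetical convenience, and the column names are placeholders.

```python
def single_target(targets, name=None):
    """Return one target column.

    If `name` is omitted, the dictionary must contain exactly one column.
    """
    if name is None:
        if len(targets) != 1:
            raise ValueError(f"expected one target column, found: {sorted(targets)}")
        name = next(iter(targets))
    return targets[name]


# Placeholder targets with two columns.
y_train = {"band_gap": [1.0, 2.0, 3.0], "formation_energy": [-0.5, -0.7, -0.9]}

y = single_target(y_train, "band_gap")
```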
Multiple Inputs
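Models usually expect a row-per-sample feature matrix, while the inputs arrive as separate columns. A minimal sketch of combining selected columns, assuming equal-length columns and using a hypothetical helper `feature_matrix`:

```python
def feature_matrix(inputs, columns):
    """Stack the named input columns into a list of rows, one row per sample."""
    return [list(row) for row in zip(*(inputs[c] for c in columns))]


# Placeholder inputs; column names are illustrative.
X_train = {
    "temperature": [300.0, 350.0, 400.0],
    "pressure": [1.0, 1.5, 2.0],
    "label": ["a", "b", "c"],
}

# Pick only the numeric columns, in a fixed order.
features = feature_matrix(X_train, ["temperature", "pressure"])
```

Listing the columns explicitly keeps the feature order stable between training and test data.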
Error Handling
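Loading can fail for reasons outside your code, such as a lost network connection. The exceptions Foundry raises are not listed here, so this sketch catches `Exception` broadly and logs the cause; `load_or_none` is a hypothetical wrapper that takes any zero-argument loader.

```python
import logging

logger = logging.getLogger(__name__)


def load_or_none(load_fn):
    """Call `load_fn()` and return its result, or None if it raises."""
    try:
        return load_fn()
    except Exception as exc:  # broad on purpose: exact exception types are unknown
        logger.warning("Could not load dataset: %s", exc)
        return None


def failing_loader():
    """Stand-in for a loader that hits a network problem."""
    raise ConnectionError("network unavailable")


data = load_or_none(failing_loader)
if data is None:
    print("Dataset unavailable; try again later.")
```

With a real dataset you would pass the bound method itself, for example `load_or_none(dataset.get_as_dict)`.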