Dataset Schemas

A schema describes what a dataset contains, so you can understand its structure before you load any data.

Getting the Schema

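from foundry import Foundry

# Create a client and take the first search result for "band gap"
f = Foundry()
dataset = f.search("band gap", limit=1).iloc[0].FoundryDataset

# Retrieve the schema for that dataset
schema = dataset.get_schema()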

Schema Structure

{
    'name': 'foundry_oqmd_band_gaps_v1.1',
    'title': 'OQMD Band Gaps Dataset',
    'doi': '10.18126/abc123',
    'description': 'Band gaps calculated using DFT...',
    'data_type': 'tabular',
    'fields': [...],
    'splits': [...]
}

Fields

Fields describe each column/feature in the dataset:
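A minimal sketch of walking the fields list. It works on a hand-written stand-in for the schema dictionary, and the per-field keys used here (name, role, type, units) are assumptions made for illustration, not confirmed keys of the real schema:

```python
# Hypothetical stand-in for the dict returned by dataset.get_schema().
# The keys inside each field entry are assumptions for illustration.
schema = {
    "name": "foundry_oqmd_band_gaps_v1.1",
    "fields": [
        {"name": "composition", "role": "input", "type": "string", "units": None},
        {"name": "band_gap", "role": "target", "type": "float", "units": "eV"},
    ],
}


def describe_fields(schema):
    """Return one summary line per field in the schema."""
    lines = []
    for field in schema.get("fields", []):
        units = field.get("units") or "no units"
        role = field.get("role", "unknown")
        dtype = field.get("type", "unknown")
        lines.append(f"{field['name']} ({role}, {dtype}, {units})")
    return lines


for line in describe_fields(schema):
    print(line)
```

Using .get() with defaults keeps the loop working when a field entry omits an optional key.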

Example output:

Splits

Splits show how data is divided:
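As a rough sketch, the splits list can be summarized as the fraction of rows in each split. The entry keys name and n_rows are assumptions for illustration, as are the row counts; the real schema may label these differently:

```python
# Hypothetical stand-in for the 'splits' part of a schema.
# The keys 'name' and 'n_rows' and the counts are illustrative assumptions.
schema = {
    "splits": [
        {"name": "train", "n_rows": 8000},
        {"name": "test", "n_rows": 2000},
    ],
}


def split_fractions(schema):
    """Map each split name to its share of the total row count."""
    splits = schema.get("splits", [])
    total = sum(s["n_rows"] for s in splits)
    if total == 0:
        return {}
    return {s["name"]: s["n_rows"] / total for s in splits}


fractions = split_fractions(schema)
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%}")
```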

Example output:

Data Types

The data_type field indicates the format:

Type            Description
tabular         Rows and columns (most common)
hierarchical    Nested/tree structure
image           Image data
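One way to use data_type is to branch on it before deciding how to handle the data. The sketch below assumes only the three values listed above; loading_hint is a hypothetical helper, not part of the library:

```python
def loading_hint(schema):
    """Return a short handling hint based on the schema's data_type.

    Hypothetical helper; only the three documented values are recognized.
    """
    hints = {
        "tabular": "treat as rows and columns",
        "hierarchical": "walk the nested structure level by level",
        "image": "treat as image data",
    }
    data_type = schema.get("data_type")
    if data_type not in hints:
        raise ValueError(f"Unrecognized data_type: {data_type!r}")
    return hints[data_type]


print(loading_hint({"data_type": "tabular"}))
```

Raising on an unrecognized value makes an unexpected format visible early instead of silently falling back to a default.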

Using Schema Information

Filter by Field Role
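A minimal sketch of selecting field names by role. It assumes each field entry carries a role key with values such as "input" and "target"; both the key and the values are assumptions for illustration:

```python
# Hypothetical schema fragment; the 'role' key and its values are assumptions.
schema = {
    "fields": [
        {"name": "composition", "role": "input"},
        {"name": "structure", "role": "input"},
        {"name": "band_gap", "role": "target"},
    ],
}


def fields_by_role(schema, role):
    """Return the names of all fields whose role matches `role`."""
    return [f["name"] for f in schema.get("fields", []) if f.get("role") == role]


input_names = fields_by_role(schema, "input")
target_names = fields_by_role(schema, "target")
print("inputs:", input_names)
print("targets:", target_names)
```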

Check Units
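A short sketch of collecting units per field and flagging fields that have none. The units key is an assumption for illustration; the real schema may record units under a different name:

```python
# Hypothetical schema fragment; the 'units' key is an assumption.
schema = {
    "fields": [
        {"name": "composition", "units": None},
        {"name": "band_gap", "units": "eV"},
        {"name": "volume"},  # no 'units' key at all
    ],
}


def units_by_field(schema):
    """Map each field name to its units, or None when no units are given."""
    return {f["name"]: f.get("units") for f in schema.get("fields", [])}


units = units_by_field(schema)
missing_units = [name for name, unit in units.items() if not unit]
print(units)
print("fields without units:", missing_units)
```

Not every field needs units (a composition string, for example), so the list of missing units is a prompt to check, not an error report.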

Include Schema with Data

CLI Schema

Output:

Best Practices

Always Check Schema First

Validate Data Against Schema
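A minimal sketch of one possible check: comparing the keys of loaded records against the field names declared in the schema. It assumes the data is available as a list of dictionaries and that each field entry has a name key; validate_rows is a hypothetical helper, not a library function:

```python
# Hypothetical schema fragment and records, for illustration only.
schema = {"fields": [{"name": "composition"}, {"name": "band_gap"}]}
rows = [
    {"composition": "NaCl", "band_gap": 5.0},
    {"composition": "Si"},                        # missing band_gap
    {"composition": "GaAs", "band_gap": 1.4, "note": "x"},  # extra key
]


def validate_rows(rows, schema):
    """Return a list of (row_index, problem) tuples; empty means all rows match."""
    expected = {f["name"] for f in schema.get("fields", [])}
    problems = []
    for i, row in enumerate(rows):
        missing = sorted(expected - row.keys())
        extra = sorted(row.keys() - expected)
        if missing:
            problems.append((i, f"missing fields: {missing}"))
        if extra:
            problems.append((i, f"unexpected fields: {extra}"))
    return problems


problems = validate_rows(rows, schema)
for index, problem in problems:
    print(f"row {index}: {problem}")
```

This checks field names only; type and range checks would need additional information from the schema.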

Document Your Usage
