FAQ
General
What is Foundry?
Foundry-ML is a Python library for discovering and loading machine learning-ready datasets in materials science and chemistry. It provides standardized access to curated scientific datasets with rich metadata.
Is Foundry free?
Yes. Foundry is open source and the datasets are freely available. Some datasets carry their own licenses; check the citation information for details.
Do I need to create an account?
No account is required for basic usage with HTTPS download. Some features (like Globus transfers) may require authentication.
Installation
What Python version do I need?
Python 3.8 or higher.
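To confirm that the interpreter you are running meets this requirement, a small check along these lines may help. It uses only the standard library; the helper name `python_is_supported` is made up for this example.

```python
import sys

# Minimum version stated in this FAQ.
MIN_PYTHON = (3, 8)

def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

status = "supported" if python_is_supported() else "too old for Foundry"
print("Python", ".".join(map(str, sys.version_info[:3])), "is", status)
```

Run it with the same interpreter you use for `pip`, since a machine can have several Python installations.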
How do I install Foundry?
pip install foundry-ml
I get import errors after installing
Try upgrading:
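Import errors are often caused by an outdated install, or by installing into a different interpreter than the one running your code. The sketch below reports which interpreter is active, which version of the `foundry-ml` distribution it sees (if any), and the matching `pip` upgrade command. The helper names are made up for this example; only the package name comes from this page.

```python
import sys
from importlib import metadata

def installed_version(dist_name="foundry-ml"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def upgrade_command(dist_name="foundry-ml"):
    """Build a pip upgrade command bound to the *current* interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", dist_name]

print("interpreter:      ", sys.executable)
print("installed version:", installed_version())
print("to upgrade, run:  ", " ".join(upgrade_command()))
```

Calling pip as `python -m pip` ties the upgrade to the interpreter that will import the package, which avoids the most common mismatch.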
Data Loading
Why is my first download slow?
Data is downloaded on first access and cached locally. Subsequent loads are fast.
Where is data cached?
By default, in your home directory. To change:
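A minimal sketch of pointing Foundry at a custom cache directory follows. Two things here are assumptions rather than confirmed by this page: the import path `from foundry import Foundry`, and the `local_cache_dir` keyword, which existed in earlier releases. Check the signature in your installed version before relying on it.

```python
from pathlib import Path

def prepare_cache_dir(path):
    """Create the directory if needed and return its absolute path."""
    cache = Path(path).expanduser().resolve()
    cache.mkdir(parents=True, exist_ok=True)
    return cache

def make_client(cache_dir):
    """Build a Foundry client that caches under cache_dir.

    ASSUMPTION: both the import path and the `local_cache_dir` keyword
    are taken from earlier Foundry releases and may differ in yours.
    """
    from foundry import Foundry
    return Foundry(local_cache_dir=str(cache_dir))

# Example usage (the path is a placeholder):
# f = make_client(prepare_cache_dir("~/foundry_cache"))
```

Putting the cache on a large or fast disk is the usual reason to move it.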
How do I clear the cache?
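One way to do this, assuming the cache is an ordinary directory of downloaded files whose location you know, is to delete that directory; the data is downloaded again on the next access. The helper below is a hypothetical sketch, not part of Foundry, and takes the path as an argument because the exact default location depends on your setup.

```python
import shutil
from pathlib import Path

def clear_cache(cache_dir):
    """Delete cache_dir and everything under it.

    Returns True if a directory was removed, False if none existed.
    WARNING: this removes the whole tree, so double-check the path.
    """
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True

# Example usage (the path is a placeholder):
# clear_cache("~/foundry_cache")
```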
Can I use Foundry offline?
You need internet to search and download datasets. Once cached, data loads locally.
Cloud Environments
How do I use Foundry in Google Colab?
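Colab has no local browser or local callback server, so any authentication has to run headless. The sketch below detects Colab and builds constructor arguments for that case. The keyword names `no_browser` and `no_local_server` come from earlier Foundry releases and are an assumption here, as is the import path; verify both against your installed version.

```python
import sys

def running_in_colab():
    """Colab preloads the google.colab package into every runtime."""
    return "google.colab" in sys.modules

def headless_kwargs():
    """Constructor arguments for environments without a local browser.

    ASSUMPTION: keyword names taken from earlier Foundry releases.
    """
    return {"no_browser": True, "no_local_server": True}

kwargs = headless_kwargs() if running_in_colab() else {}
print("Foundry keyword arguments to use:", kwargs)

# Example usage (assumed import path):
# from foundry import Foundry
# f = Foundry(**kwargs)
```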
Does it work with Jupyter on a remote server?
Yes. Use the same settings as for Google Colab.
Data Format
What format is the data in?
Most datasets use a dictionary format:
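The exact keys vary by dataset, so it can help to print the layout of whatever dictionary you get back. The sketch below walks a nested dictionary and reports each key with its leaf type and length. The `example` dictionary is invented for illustration: its keys and values are placeholders, not a real Foundry dataset.

```python
def describe(obj, indent=0):
    """Return lines summarizing nested dict keys and leaf types/lengths."""
    pad = "  " * indent
    if not isinstance(obj, dict):
        return [f"{pad}{type(obj).__name__}"]
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(describe(value, indent + 1))
        else:
            size = f", len={len(value)}" if hasattr(value, "__len__") else ""
            lines.append(f"{pad}{key}: {type(value).__name__}{size}")
    return lines

# Placeholder data: keys and values are made up for this example.
example = {
    "train": {
        "input": {"feature_a": [1.0, 2.0, 3.0]},
        "target": {"label": [0, 1, 0]},
    }
}
print("\n".join(describe(example)))
```

The same helper is useful when a dataset's format is not what you expected.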
How do I get a pandas DataFrame?
Does it work with PyTorch?
Yes:
Does it work with TensorFlow?
Yes:
Publishing
How do I publish my own dataset?
See Publishing Datasets for the full workflow.
What metadata format is required?
Foundry uses DataCite-compliant metadata. See Metadata Reference.
Can I update a published dataset?
Create a new version with an updated source_id (e.g., my_dataset_v2).
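If you follow the `_v2` naming pattern from the example above, a small helper can produce the next identifier. This is a hypothetical convenience function: the `_vN` suffix convention is inferred from the single example here, and Foundry may not require it.

```python
import re

def next_source_id(source_id):
    """Return source_id with its trailing _vN bumped, or _v2 appended."""
    match = re.fullmatch(r"(.+)_v(\d+)", source_id)
    if match is None:
        # No version suffix yet: treat the original as version 1.
        return f"{source_id}_v2"
    base, version = match.group(1), int(match.group(2))
    return f"{base}_v{version + 1}"

print(next_source_id("my_dataset"))
print(next_source_id("my_dataset_v2"))
```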
Globus
Do I need Globus?
No. HTTPS download is the default and works for most use cases.
When should I use Globus?
For very large datasets (>10GB) or if you have institutional Globus endpoints.
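The rule of thumb above can be written as a small decision helper. The 10 GB threshold comes from this FAQ; reading it as 10 × 10^9 bytes, and the helper name itself, are assumptions of this sketch.

```python
# Threshold from the FAQ, interpreted here as decimal gigabytes.
GLOBUS_THRESHOLD_BYTES = 10 * 10**9

def should_use_globus(dataset_size_bytes, have_globus_endpoint=False):
    """Prefer Globus for very large datasets or when an endpoint is available."""
    return have_globus_endpoint or dataset_size_bytes > GLOBUS_THRESHOLD_BYTES

print(should_use_globus(2 * 10**9))    # small dataset, no endpoint
print(should_use_globus(50 * 10**9))   # very large dataset

# Example usage (assumed import path; `use_globus` appears elsewhere in this FAQ):
# from foundry import Foundry
# f = Foundry(use_globus=should_use_globus(50 * 10**9))
```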
How do I enable Globus?
You'll need Globus Connect Personal running.
AI Integration
How do I use Foundry with Claude?
Install the MCP server:
Restart Claude Code. You can now ask Claude to find and load datasets.
What AI assistants are supported?
Any MCP-compatible assistant. Currently tested with Claude Code.
HuggingFace
Can I export to HuggingFace Hub?
Yes:
Who is listed as author on HuggingFace?
The original dataset creators from the DataCite metadata, not the person pushing.
Troubleshooting
I get "Dataset not found"
Check:
The DOI is correct
Try a broader search term
List available datasets:
f.list()
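A quick syntactic check can catch the most common DOI mistakes, such as a stray `doi:` prefix or a missing suffix. The sketch below tests only the general DOI shape (`10.<registrant>/<suffix>`); it cannot tell whether the DOI exists, and the sample strings are made up.

```python
import re

# General DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(text):
    """Return True if text has the general shape of a DOI."""
    return DOI_PATTERN.match(text.strip()) is not None

for candidate in ["10.12345/example-dataset", "doi:10.12345/example-dataset", "10.12345"]:
    print(candidate, "->", looks_like_doi(candidate))
```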
Download keeps failing
Try:
Check your internet connection
Try again (transient errors)
If using Globus, switch to HTTPS:
f = Foundry(use_globus=False)
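The "try again" step can be automated with a small retry wrapper that waits longer after each failure. This is a generic sketch, not a Foundry feature: `retry` and `load_my_dataset` are hypothetical names, and in real code you would probably catch only network-related exceptions rather than every `Exception`.

```python
import time

def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2x, 4x, ... between tries and re-raises
    the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # narrow this in real code
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error

# Example usage (load_my_dataset is a placeholder for your own loading call):
# data = retry(load_my_dataset, attempts=3, base_delay=2.0)
```

If the download still fails after a few attempts, the problem is probably not transient, and the other checks in this list apply.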
The data format is unexpected
Check the schema first:
More Help