Developer Documentation
This page provides information for developers who want to extend or contribute to Goldilocks.
Project Structure
goldilocks/
├── src/
│ └── qe_input/
│ ├── QE_input.py # Main Streamlit app entry point
│ ├── pages/ # Streamlit pages
│ │ ├── Intro.py # Data input page
│ │ ├── Chatbot_generator.py
│ │ ├── Deterministic_generator.py
│ │ └── Documentation.py
│ ├── models/ # ML model implementations
│ │ ├── alignn.py # ALIGNN model
│ │ ├── cgcnn.py # CGCNN model
│ │ ├── alignn_graph.py # ALIGNN graph construction
│ │ ├── cgcnn_graph.py # CGCNN graph construction
│ │ ├── atom_features_utils.py
│ │ └── compound_features_utils.py
│ ├── kspacing_model.py # Main prediction function
│ ├── data_utils.py # Database lookup utilities
│ ├── utils.py # Input file generation utilities
│ └── embeddings/ # Atomic feature embeddings
├── tests/ # Test suite
└── docs/ # Documentation
Key API Functions
K-Point Prediction
predict_kspacing(structure, model_name, confidence_level=0.95)
Predicts k-point spacing for a crystal structure using machine learning models.
Parameters:
- structure (pymatgen.core.structure.Structure): Crystal structure to predict k-point spacing for
- model_name (str): Model to use, either 'RF' (Random Forest) or 'ALIGNN'
- confidence_level (float): Confidence level for prediction intervals (0.95, 0.9, or 0.85)
Returns:
- tuple: (kdist, kdist_upper, kdist_lower) where:
- kdist: Predicted k-point spacing (median)
- kdist_upper: Upper bound of confidence interval
- kdist_lower: Lower bound of confidence interval
Example:
from pymatgen.core.structure import Structure
from kspacing_model import predict_kspacing
structure = Structure.from_file("structure.cif")
kdist, kdist_upper, kdist_lower = predict_kspacing(
structure,
model_name='ALIGNN',
confidence_level=0.95
)
Structure Lookup
StructureLookup(mp_api_key=None)
Class for looking up crystal structures from materials databases.
Parameters:
- mp_api_key (str, optional): Materials Project API key (required for MP lookups)
Methods:
- get_jarvis_table(formula): Search JARVIS database
- get_mp_structure_table(formula): Search Materials Project database
- get_mc3d_structure_table(formula): Search MC3D database
- get_oqmd_structure_table(formula): Search OQMD database
- select_structure_from_table(result_df, id_lookup_func): Select structure from search results
Example:
from data_utils import StructureLookup
lookup = StructureLookup(mp_api_key="your_key")
results = lookup.get_jarvis_table("SiO2")
structure, primitive = lookup.select_structure_from_table(
results,
lookup.get_jarvis_structure_by_id
)
Input File Generation
generate_input_file(structure, functional, mode, kdist, ...)
Generates a Quantum Espresso input file.
Parameters:
- structure: pymatgen Structure object
- functional (str): XC functional ('PBEsol' or 'PBE')
- mode (str): Pseudopotential mode ('efficiency' or 'precision')
- kdist (float): K-point spacing
- Additional parameters for magnetic configuration, etc.
Returns:
- str: Quantum Espresso input file content
list_of_pseudos(pseudo_potentials_folder, functional, mode, compound, target_folder)
Determines and copies required pseudopotential files for a compound.
Returns:
- tuple: (chosen_subfolder, list_of_element_files)
cutoff_limits(pseudo_potentials_cutoffs_folder, functional, mode, compound)
Determines maximum energy and density cutoffs from SSSP tables.
Returns:
- dict: Dictionary with 'ecutwfc' and 'ecutrho' keys
Machine Learning Models
ALIGNN Model
The ALIGNN (Atomistic Line Graph Neural Network) model captures both bond and angle information in crystal structures.
Key Components:
- ALIGNN_PyG: Main model class
- build_alignn_graph_with_angles_from_structure(): Constructs atomic and line graphs
- atom_features_from_structure(): Generates atomic features
Model Architecture:
- Uses both atomic graph (nodes=atoms, edges=bonds) and line graph (nodes=bonds, edges=angles)
- Supports quantile regression for uncertainty prediction
- Models are stored on Hugging Face Hub: STFC-SCD/kpoints-goldilocks-ALIGNNd
Random Forest Model
The Random Forest model provides fast k-point predictions.
Key Features:
- Uses composition and structure features
- Quantile regression for confidence intervals
- Models stored on Hugging Face Hub: STFC-SCD/kpoints-goldilocks-QRF
CGCNN Model
CGCNN (Crystal Graph Convolutional Neural Network) model is used to predict metallicity featuers which are used as input for both ALIGNN and RF models. This CGCNN model was trained on Materials Project 'is_metal' dataset (version October 2025).
Graph Construction
ALIGNN Graphs
from models.alignn_graph import build_alignn_graph_with_angles_from_structure
from models.atom_features_utils import atom_features_from_structure
atom_features = atom_features_from_structure(structure, atomic_features_config)
data_g, data_lg = build_alignn_graph_with_angles_from_structure(
structure,
atom_features,
radius=10.0,
max_neighbors=12
)
CGCNN Graphs
from models.cgcnn_graph import build_radius_cgcnn_graph_from_structure
atom_features = atom_features_from_structure(structure, atomic_features_config)
data = build_radius_cgcnn_graph_from_structure(
structure,
atom_features,
radius=10.0,
max_neighbors=12
)
Testing
The test suite uses pytest and includes:
- Unit tests for model components
- Integration tests for prediction functions
- UI tests using Streamlit Testing Library
Running tests:
pytest tests/
With coverage:
pytest --cov=src/qe_input tests/
Adding New Models
To add a new ML model for k-point prediction:
- Implement the model in
src/qe_input/models/ - Add prediction logic to
predict_kspacing()inkspacing_model.py - Upload trained model to Hugging Face Hub
- Add model option to UI in
pages/Intro.py - Write tests in
tests/
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Dependencies
Key dependencies:
- pymatgen: Materials analysis
- torch: Deep learning framework
- torch_geometric: Graph neural networks
- matminer: Materials features
- dscribe: SOAP descriptors
- streamlit: Web application framework
- huggingface_hub: Model storage and download
See pyproject.toml or requirements.txt for complete dependency list.