Contributing to CA Biositing Data Models
See the main project's CONTRIBUTING.md for general
contribution guidelines (branching, PRs, commit style).
This document covers everything specific to the ca-biositing-datamodels
package.
Package Structure
src/ca_biositing/datamodels/
├── ca_biositing/
│   └── datamodels/
│       ├── __init__.py  # Package initialization and version
│       ├── config.py  # Model configuration (Pydantic Settings)
│       ├── database.py  # SQLModel engine and session management
│       ├── views.py  # Materialized view definitions (7 views)
│       ├── models/  # Hand-written SQLModel classes
│       │   ├── __init__.py  # Central re-export of all 91 models
│       │   ├── base.py  # Base classes (BaseEntity, LookupBase, etc.)
│       │   ├── aim1_records/  # Aim 1 analytical records
│       │   ├── aim2_records/  # Aim 2 processing records
│       │   ├── core/  # ETL lineage and run tracking
│       │   ├── data_sources_metadata/  # Data source and dataset metadata
│       │   ├── experiment_equipment/  # Experiments and equipment
│       │   ├── external_data/  # LandIQ, USDA, Billion Ton records
│       │   ├── field_sampling/  # Field samples and collection methods
│       │   ├── general_analysis/  # Observations and analysis types
│       │   ├── infrastructure/  # Infrastructure facility records
│       │   ├── methods_parameters_units/  # Methods, parameters, units
│       │   ├── misc/  # Additional infrastructure models
│       │   ├── people/  # Contacts and providers
│       │   ├── places/  # Location and address models
│       │   ├── resource_information/  # Resources, availability, strains
│       │   └── sample_preparation/  # Prepared samples and methods
│       └── sql_schemas/  # Reference SQL files (for pgschema validation)
├── tests/
│   ├── conftest.py  # Pytest fixtures and configuration
│   ├── test_biomass.py  # Tests for biomass models
│   ├── test_geographic_locations.py  # Tests for location models
│   └── test_package.py  # Tests for package metadata
├── LICENSE
├── README.md
└── pyproject.toml
Model Categories
The 91 models span 15 domain subdirectories:
Core and Infrastructure
- base.py: Base classes (BaseEntity, LookupBase, Aim1RecordBase, Aim2RecordBase)
- core/: ETL run tracking and lineage (EtlRun, EntityLineage, LineageGroup)
- infrastructure/: Infrastructure facility records (biodiesel plants, landfills, ethanol biorefineries)
- misc/: Additional infrastructure models (MSW digesters, SAF plants, wastewater treatment)
- places/: Location and address models (Place, LocationAddress, LocationResolution)
- people/: Contact and provider information (Contact, Provider)
- data_sources_metadata/: Data source tracking (DataSource, Dataset, FileObjectMetadata)
Resources and Sampling
- resource_information/: Core resource entities (Resource, ResourceClass, ResourceSubclass, ResourceAvailability, Strain)
- field_sampling/: Field sampling data (FieldSample, HarvestMethod, CollectionMethod, SoilType)
- sample_preparation/: Sample processing (PreparedSample, PreparationMethod, ProcessingMethod)
Experiments and Analysis
- experiment_equipment/: Experimental setup (Experiment, Equipment, ExperimentAnalysis)
- methods_parameters_units/: Methods, parameters, and units (Method, Parameter, Unit, MethodCategory)
- general_analysis/: Observations and analysis results (Observation, AnalysisType, PhysicalCharacteristic)
- aim1_records/: Aim 1 analytical records (proximate, ultimate, compositional, ICP, XRD, XRF)
- aim2_records/: Aim 2 processing records (autoclave, fermentation, gasification, pretreatment)
External Data
- external_data/: Integration with external datasets (LandIQ, USDA Census, USDA Survey, Billion Ton 2023, USDA Market)
Development Setup
Install the package in editable mode from the project root using Pixi:
pixi install
Or standalone for just this package:
cd src/ca_biositing/datamodels
pip install -e .
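After either install path, a quick check from Python can confirm that the interpreter sees the package. The sketch below uses only the standard library. The distribution name "ca-biositing-datamodels" is taken from the package name used in this document and is an assumption about what pyproject.toml declares; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is named "ca-biositing-datamodels".
    found = installed_version("ca-biositing-datamodels")
    print(found if found is not None else "not installed in this environment")
```

If this prints "not installed in this environment", the install most likely ran against a different environment than the one running the script.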
Adding New Models
- Create model: Add a new SQLModel class in the appropriate subdirectory under models/, or create a new subdirectory if needed.
- Re-export: Add the import to models/__init__.py so the model is available from ca_biositing.datamodels.models.
- Generate migration: Run pixi run migrate-autogenerate -m "Add new model".
- Review: Check the generated migration in alembic/versions/.
- Apply: Run pixi run migrate to update the database.
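The re-export step is easy to forget, so a small check can help before generating a migration. The following is a minimal sketch, not part of this package: the helper names are hypothetical, and it assumes models/__init__.py re-exports classes with explicit "from ... import Name" statements (wildcard imports are not resolved). It parses source files with the standard ast module, so nothing is imported or executed.

```python
import ast
from pathlib import Path
from typing import Set


def class_names(py_file: Path) -> Set[str]:
    """Collect top-level class names defined in a Python source file."""
    tree = ast.parse(py_file.read_text(encoding="utf-8"))
    return {node.name for node in tree.body if isinstance(node, ast.ClassDef)}


def imported_names(init_file: Path) -> Set[str]:
    """Collect names bound by top-level `from ... import ...` statements."""
    tree = ast.parse(init_file.read_text(encoding="utf-8"))
    names: Set[str] = set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            names.update(alias.asname or alias.name for alias in node.names)
    return names


def missing_reexports(models_dir: Path) -> Set[str]:
    """Return classes defined under models_dir that its __init__.py does not import."""
    exported = imported_names(models_dir / "__init__.py")
    defined: Set[str] = set()
    for py_file in models_dir.rglob("*.py"):
        if py_file.name != "__init__.py":
            defined |= class_names(py_file)
    return defined - exported


if __name__ == "__main__":
    # Path assembled from the package structure shown in this document.
    models = Path("src/ca_biositing/datamodels/ca_biositing/datamodels/models")
    if models.is_dir():
        print(sorted(missing_reexports(models)))
```

The result may include base or helper classes that are intentionally not re-exported, so treat it as a prompt for review rather than a hard failure.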
Modifying Existing Models
- Edit the SQLModel class in its domain subdirectory.
- Run pixi run migrate-autogenerate -m "Describe change".
- Review the migration script for accuracy.
- Run pixi run migrate.
Schema Management
All schema changes are managed through Alembic migrations generated from SQLModel class definitions.
Materialized Views
Views are defined in ca_biositing/datamodels/views.py and managed via manual
Alembic migration scripts (not autogenerated). After loading new data, refresh
them:
pixi run refresh-views
Check view status:
pixi run schema-analytics-list
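For orientation, refreshing a PostgreSQL materialized view comes down to one REFRESH MATERIALIZED VIEW statement per view. The sketch below only builds those statements as strings; it is not the implementation behind pixi run refresh-views. The schema name "analytics" is inferred from the pgschema commands in this document, and the view name in the usage line is a placeholder.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def refresh_statements(
    view_names: Iterable[str],
    schema: str = "analytics",  # assumption: views live in the analytics schema
    concurrently: bool = False,
) -> List[str]:
    """Build one REFRESH MATERIALIZED VIEW statement per view name."""
    keyword = "CONCURRENTLY " if concurrently else ""
    return [
        f"REFRESH MATERIALIZED VIEW {keyword}{quote_ident(schema)}.{quote_ident(name)};"
        for name in view_names
    ]


if __name__ == "__main__":
    # "mv_example" is a hypothetical view name used only for illustration.
    for statement in refresh_statements(["mv_example"]):
        print(statement)
```

PostgreSQL accepts CONCURRENTLY only when the view has at least one unique index, so the default here leaves it off.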
Validation with pgschema (Optional)
# Diff public schema
pixi run schema-plan
# Diff analytics schema (materialized views)
pixi run schema-analytics-plan
Testing
# Run all tests
pixi run pytest src/ca_biositing/datamodels -v
# Run a specific test file
pixi run pytest src/ca_biositing/datamodels/tests/test_biomass.py -v
# Run with coverage
pixi run pytest src/ca_biositing/datamodels --cov=ca_biositing.datamodels --cov-report=html
Code Quality
Before committing, run pre-commit checks:
pixi run pre-commit run --files src/ca_biositing/datamodels/**/*
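The **/* pattern recurses only in shells with recursive globbing enabled (for example zsh, or bash after shopt -s globstar); in plain bash it matches a single directory level. Where that is a concern, the file list can be built in Python instead. This is a sketch with a hypothetical helper name; it relies only on the --files option of pre-commit run and skips __pycache__ directories.

```python
import subprocess
from pathlib import Path
from typing import List


def precommit_command(package_dir: Path) -> List[str]:
    """Build a `pre-commit run --files ...` argv covering every file under package_dir."""
    files = sorted(
        str(path)
        for path in package_dir.rglob("*")
        if path.is_file() and "__pycache__" not in path.parts
    )
    return ["pixi", "run", "pre-commit", "run", "--files", *files]


if __name__ == "__main__":
    package = Path("src/ca_biositing/datamodels")
    if package.is_dir():
        # check=False: a failing hook is reported through the return code.
        completed = subprocess.run(precommit_command(package), check=False)
        raise SystemExit(completed.returncode)
```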