3.3. Database

Up: 3. Developer guide

Prev: 3.2. Overview of the code

Next: 3.4. Storage interface

Sections:

Introduction
Core models
Other features
Schema changes and migrations

Introduction

ATR stores all of its data in a SQLite database. The database schema is defined in models.sql using SQLModel, which uses Pydantic for data validation and SQLAlchemy for database operations. This page explains the main features of the database schema to help you understand how data is structured in ATR.

Core models

The three most important models in ATR are Committee, Project, and Release.

A Committee represents a PMC or PPMC at the ASF. Each committee has a name, which is the primary key, and a full name for display purposes. Committees can have child committees, which is used for the relationship between the Incubator PMC and individual podling PPMCs. Committees also have lists of committee members and committers stored as JSON arrays.

A Project belongs to a committee and can have multiple releases. Projects have a name as the primary key, along with metadata such as a description and category and programming language tags. Each project can optionally have a ReleasePolicy that defines how releases should be handled, including e.g. vote templates and GitHub workflow configuration.

A Release belongs to a project and represents a specific version of software which is voted on by a committee. The primary key is a name derived from the project name and version. Releases have a phase that indicates their current state in the release process, from draft composition to final publication. Each release can have multiple Revision instances before final publication, representing iterations of the underlying files.

Other features

The models themselves are the most important components, but to support those models we need other components such as enumerations, column types, automatically populated fields, computed properties, and constraints.

Enumerations

ATR uses Python enumerations to ensure that certain fields only contain valid values. The most important enumeration is ReleasePhase, which defines the four phases of a release: RELEASE_CANDIDATE_DRAFT for composing, RELEASE_CANDIDATE for voting, RELEASE_PREVIEW for finishing, and RELEASE for completed releases.

The TaskStatus enumeration defines the states a task can be in: QUEUED, ACTIVE, COMPLETED, or FAILED. The TaskType enumeration lists all the different types of background tasks that ATR can execute, from signature checks to SBOM generation.

The DistributionPlatform enumeration is more complex, as each value contains not just a name but a DistributionPlatformValue with template URLs and configuration for different package distribution platforms like PyPI, npm, and Maven Central.

Special column types

SQLite does not support all the data types we need, so we use SQLAlchemy type decorators to handle conversions. The UTCDateTime type ensures that all datetime values are stored in UTC and returned as timezone-aware datetime objects. When Python code provides a datetime with timezone information, the type decorator converts it to UTC before storing. When reading from the database, it adds back the UTC timezone information.

The ResultsJSON type handles storing task results. It automatically serializes Pydantic models to JSON when writing to the database, and deserializes them back to the appropriate result model when reading.

Automatic field population

Some fields are populated automatically using SQLAlchemy event listeners. When a new Revision is created, the populate_revision_sequence_and_name function runs before the database insert. This function queries for the highest existing sequence number for the release, increments it, and sets both the seq and number fields. It also constructs the revision name by combining the release name with the revision number.

The check_release_name function runs before inserting a release. If the release name is empty, it automatically generates it from the project name and version using the release_name helper function.

Computed properties

Some properties are computed dynamically rather than stored in the database. The Release.latest_revision_number property is implemented as a SQLAlchemy column property using a correlated subquery. This means that when you access release.latest_revision_number, SQLAlchemy automatically executes a query to find the highest revision number for that release. The query is defined once in RELEASE_LATEST_REVISION_NUMBER and attached to the Release class.

Projects have many computed properties that provide access to release policy settings with appropriate defaults. For example, Project.policy_start_vote_template returns the custom vote template if one is configured, or falls back to Project.policy_start_vote_default if not. This pattern allows projects to customize their release process while providing sensible defaults.

Constraints and validation

Database constraints ensure data integrity. The Task model includes a check constraint that validates the status transitions. A task must start in QUEUED state, can only transition to ACTIVE when started and pid are set, and can only reach COMPLETED or FAILED when the completed timestamp is set. These constraints prevent invalid state transitions at the database level.

Unique constraints ensure that certain combinations of fields are unique. The Release model has a unique constraint on (project_name, version) to prevent creating duplicate releases for the same project version. The Revision model has two unique constraints: one on (release_name, seq) and another on (release_name, number), ensuring that revision numbers are unique within a release.

Schema changes and migrations

We often have to make changes to the database model in ATR, whether that be to add a whole new model or just to rename or change some existing properties. No matter the change, this involves creating a database migration. We use Alembic to perform migrations, and this allows migrations to be bidirectional: we can downgrade as well as upgrade. This can be very helpful when, for example, a migration didn't apply properly or is no longer needed due to having found a different solution.

To change the database, do not edit the SQLite directly. Instead, change the model file in atr/models/sql.py. If you're running ATR locally, you should see from its logs that the server is now broken due to having a mismatching database. That's fine! This is the point where you now create the migration. To do so, run:

uv run --frozen alembic revision -m "Description of changes" --autogenerate

Obviously, change "Description of changes" to an actual description of the changes that you made. Keep it short, around 50-60 characters. Then when you restart the server you should find that the migration is automatically applied. You should be careful, however, before restarting the server. Not all migrations apply successfully when autogenerated. Always review the automatically produced migrations in migrations/versions first, and ensure that they are correct before proceeding. One common problem is that the autogenerator leaves out server defaults. Please note that you do not need to include changes to enums in Alembic migrations, because they are not enforced in the SQLite schema.

It can be helpful to make a backup of the entire SQLite database before performing the migration, especially if the migration is particularly complex. This can help if, for example, the downgrade is broken, otherwise you may find yourself in a state from which there is no easy recovery. Always ensure that migrations are working locally before pushing them to GitHub, because we apply changes from GitHub directly to our deployed containers. Note that sometimes the deployed containers contain data that causes an error that was not caught locally. In that case there is usually no option but to provide special triage on the deployed containers.

If testing a migration in a PR, be sure to stop the server and run uv run --frozen alembic downgrade -1 before switching back to any branch not containing the migration.