This is Alpha 2 Software. You can test the process and download content, but releases must still be committed by hand to svn:dist/release (see svn-dist Transition Plan 1B).

3.14. Input validation

Overview

Input validation is critical for ATR's security posture. As a system that handles cryptographic signatures and release artifacts, ATR must ensure that all user input is properly validated before processing. This page documents the validation strategies and patterns used throughout the codebase.

Defense in depth

ATR employs multiple layers of validation:

  1. Transport layer: HTTPS required, enforced by httpd
  2. Request layer: Size limits enforced by httpd (MAX_CONTENT_LENGTH)
  3. Form layer: Pydantic models validate structure and types
  4. Application layer: Business logic validation in route handlers, interaction helpers, and storage writers
  5. Database layer: SQLAlchemy ORM with parameterized queries, plus constraints
  6. Markdown layer: Markdown is rendered to HTML by cmarkgfm
  7. Output layer: Jinja2 auto-escaping for HTML output

Each layer provides independent protection, so a failure in one layer does not by itself compromise the system.

Form validation with Pydantic

All form inputs in ATR are validated through Pydantic models defined in form.py. The base class for forms is Form, which extends Pydantic's BaseModel.

Defining form fields

Form fields are defined using Python type annotations and the label function:

class ExampleForm(Form):
    name: str = label("Project name", "Enter the project name")
    count: int = label("Count", widget=Widget.NUMBER)
    email: EmailStr = label("Contact email", widget=Widget.EMAIL)

The label function accepts a description (shown to users), optional documentation, and an optional widget hint for rendering.

Validation process

When a form is submitted, ATR:

  1. Extracts form data from the request via quart_request
  2. Passes the data to the Pydantic model for validation
  3. If validation fails, collects errors via flash_error_data
  4. Displays errors to the user with flash_error_summary
  5. If validation succeeds, proceeds with the validated data

Pydantic provides built-in validators for common types (strings, integers, emails, URLs) and supports custom validators via decorators.

Custom validators

For complex validation logic, use Pydantic's @model_validator decorator:

import re

from pydantic import model_validator

class ReleaseForm(Form):
    version: str = label("Version")

    @model_validator(mode="after")
    def validate_version_format(self):
        # Runs after field validation, so self.version is already a str
        if not re.match(r"^\d+\.\d+\.\d+", self.version):
            raise ValueError("Version must start with X.Y.Z")
        return self

CSRF protection

All POST forms must include a CSRF token. The token is generated by csrf_input and validated automatically by Quart-WTF:

def csrf_input() -> htm.VoidElement:
    csrf_token = utils.generate_csrf()
    return htpy.input(type="hidden", name="csrf_token", value=csrf_token)

In templates, include the CSRF token in every form:

<form method="post">
    {{ csrf_input() }}
    <!-- other form fields -->
</form>

The CSRF token is tied to the user's session and validated on form submission. Requests without a valid CSRF token are rejected. When using the form module renderer, the CSRF token is added automatically.
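
The session binding described above can be illustrated with a small standard-library sketch. It is not how Quart-WTF implements the check; `issue_csrf_token`, `csrf_token_is_valid`, and the plain `dict` standing in for the session are hypothetical names used only for illustration.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token(session: dict) -> str:
    """Create a random token and remember it in the (hypothetical) session."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def csrf_token_is_valid(session: dict, submitted: Optional[str]) -> bool:
    """Accept a request only if the submitted token matches the session's token."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # compare_digest takes time independent of where the two values differ
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A request whose token is missing, altered, or issued for a different session fails the comparison and would be rejected.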

Validation rules by input type

ASF User IDs

User IDs are validated against a strict pattern in principal.py:

if not re.match(r"^[-_a-z0-9]+$", user):
    raise CommitterError("Invalid characters in User ID")

Only lowercase alphanumeric characters, hyphens, and underscores are permitted.
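
One detail of Python's `re` module is worth keeping in mind with this kind of pattern: under `re.match`, `$` also matches just before a single trailing newline, so the string `"jdoe\n"` satisfies the pattern as written. The sketch below uses `re.fullmatch` instead, which is a substitution for illustration and not what principal.py does; `is_valid_user_id` is a hypothetical helper name.

```python
import re

# Same character class as the check above, without anchors because
# fullmatch already requires the whole string to match.
_USER_ID = re.compile(r"[-_a-z0-9]+")


def is_valid_user_id(user: str) -> bool:
    """Return True only if every character of user is in the allowed set."""
    return _USER_ID.fullmatch(user) is not None
```

With this variant, an ID with a trailing newline, an uppercase letter, or a space is rejected, and the empty string is rejected as well.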

Email addresses

Email validation uses Pydantic's EmailStr type, which checks address syntax through the email-validator package:

from pydantic import EmailStr

class ContactForm(Form):
    email: EmailStr = label("Email address")

URLs

URL validation uses Pydantic's HttpUrl type:

from pydantic import HttpUrl

class LinkForm(Form):
    website: HttpUrl = label("Website URL")

Version strings

Version strings are validated according to project-specific patterns. The general pattern requires a numeric major.minor prefix and accepts any suffix after it, so it admits semantic versions as well as looser forms:

VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")
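
To see what the general pattern admits, the following sketch applies it to a few candidate strings. `matches_version_pattern` is a hypothetical helper, and the candidates are made up for illustration.

```python
import re

VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")


def matches_version_pattern(version: str) -> bool:
    """Return True if version starts with digits, a dot, and more digits."""
    return VERSION_PATTERN.match(version) is not None


for candidate in ["1.0", "2.3.4", "1.2.3-rc1", "1", "v1.0", "1.x"]:
    print(candidate, matches_version_pattern(candidate))
```

The first three candidates match and the last three do not. Because the pattern ends in `.*`, anything may follow the numeric prefix, so stricter per-project patterns are where suffix rules would be enforced.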

Committee and project names

Committee and project names are validated against the set of known committees and projects from LDAP and the ASF project database. Unknown names are rejected.

File names

File names in uploads are sanitized to prevent path traversal:

  • Directory separators (/, \) and the path token .. are rejected or stripped
  • Null bytes are rejected
  • Only expected extensions are permitted per upload type
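
The rules above can be sketched as a single check. This is a minimal illustration and not ATR's implementation: `check_upload_name` and the `ALLOWED_EXTENSIONS` allowlist are hypothetical, and the sketch rejects bad names outright rather than stripping anything.

```python
# Hypothetical allowlist for one upload type
ALLOWED_EXTENSIONS = (".asc", ".sha512", ".tar.gz", ".zip")


def check_upload_name(name: str, allowed=ALLOWED_EXTENSIONS) -> str:
    """Return name unchanged if it passes every rule, else raise ValueError."""
    if not name:
        raise ValueError("File name is empty")
    if "\x00" in name:
        raise ValueError("File name contains a null byte")
    if "/" in name or "\\" in name:
        raise ValueError("File name contains a directory separator")
    if name in {".", ".."}:
        raise ValueError("File name is a path token")
    if not name.endswith(tuple(allowed)):
        raise ValueError("File extension is not permitted")
    return name
```

Rejecting instead of rewriting keeps the stored name identical to what the user submitted, which makes later audit easier.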

Data integrity validation

Beyond input validation, ATR performs data integrity validation on database records using validate.py. This catches inconsistencies that may have been introduced by bugs, migrations, or manual database edits.

Committee validation

The committee function checks:

  • child_committees must be empty (not used)
  • full_name must be set, trimmed, and not prefixed with "Apache "

Project validation

The project function checks:

  • category must use comma-separated labels without colons
  • committee_key must be set (project must be linked to a committee)
  • created timestamp must be in the past
  • full_name must be set and start with "Apache "
  • programming_languages must use comma-separated labels without colons
  • release_policy_id must be None (not used)
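
Two of these checks are sketched below under stated assumptions: that label fields hold a plain comma-separated string, and that an unset field means "no labels". `label_problems` and `full_name_problems` are hypothetical helpers, not functions from validate.py.

```python
from typing import List, Optional


def label_problems(field: str, value: Optional[str]) -> List[str]:
    """Report labels that are empty or contain a colon."""
    problems: List[str] = []
    if not value:
        return problems  # assumption: an unset field has no labels to check
    for label in value.split(","):
        label = label.strip()
        if not label:
            problems.append(f"{field} contains an empty label")
        elif ":" in label:
            problems.append(f"{field} label {label!r} contains a colon")
    return problems


def full_name_problems(full_name: Optional[str]) -> List[str]:
    """Report a missing project name or one without the 'Apache ' prefix."""
    if not full_name:
        return ["full_name must be set"]
    if not full_name.startswith("Apache "):
        return ["full_name must start with 'Apache '"]
    return []
```

Returning a list of problems rather than raising on the first one fits a validator whose job is to report every divergence it finds.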

Release validation

The release function checks:

  • created timestamp must be in the past
  • name must match the expected pattern for project and version
  • Release directory must exist on disk and contain files
  • package_managers must be empty (not used)
  • released timestamp must be in the past or None
  • sboms must be empty (not used)
  • Vote logic must be consistent (cannot have vote_resolved without vote_started)
  • votes must be empty (not used)
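
The timestamp and vote consistency rules lend themselves to a short sketch. `release_problems` and its parameters are hypothetical, and the sketch assumes timezone-aware timestamps; the directory and naming checks are omitted.

```python
import datetime
from typing import List, Optional


def release_problems(
    created: datetime.datetime,
    released: Optional[datetime.datetime],
    vote_started: Optional[datetime.datetime],
    vote_resolved: Optional[datetime.datetime],
    now: Optional[datetime.datetime] = None,
) -> List[str]:
    """Report timestamp and vote-state inconsistencies for one release."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    problems: List[str] = []
    if created > now:
        problems.append("created timestamp is in the future")
    if released is not None and released > now:
        problems.append("released timestamp is in the future")
    if vote_resolved is not None and vote_started is None:
        problems.append("vote_resolved is set without vote_started")
    return problems
```

Passing `now` explicitly keeps the check deterministic when it is exercised in tests.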

Running validation

Data integrity validation can be run via the admin interface or programmatically:

async for divergence in validate.everything(data):
    print(f"{divergence.source}: {divergence.divergence}")

These validators are complementary to the live checks described below. Data integrity validation inspects stored records for drift or corruption. Business logic validation stops inconsistent actions before ATR accepts them.

Business logic validation

Field validation is only the first step. ATR also checks whether an action still makes sense in the wider state of the release. These rules compare data across releases, revisions, committees, queued tasks, stored policy, and message delivery settings. They live mainly in interaction.py, storage writers, mail.py, and shared helpers in util.py.

Vote initiation

Before a vote can start, release_ready_for_vote checks that the release still has a latest revision, that the requested revision is that latest revision, and that the release is still attached to a committee. It also checks that the requested vote mode agrees with the stored project policy, so ATR does not let a user start a manual vote for a project configured for standard voting, or the reverse.

That same validation step then checks the surrounding release state. The user must be a committee member for the project or an ATR administrator. The selected revision must have no blocker results, and the release candidate draft must contain files. When ATR actually promotes the release into the voting phase, promote_to_candidate adds a task state check and refuses the transition while queued or active tasks still exist for that revision. This binds vote initiation to release phase, revision state, policy, committee membership, check results, file storage, and task execution rather than to form fields alone.
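
The combined preconditions can be pictured as one function over a snapshot of release state. The sketch below is an illustration only: `VoteRequest`, its fields, and `vote_blockers` are hypothetical, and ATR splits these checks between release_ready_for_vote and promote_to_candidate rather than running them in one place.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoteRequest:
    """Hypothetical snapshot of the state consulted before a vote starts."""
    latest_revision: Optional[str]
    requested_revision: str
    committee: Optional[str]
    policy_is_manual: bool
    requested_manual: bool
    user_is_member_or_admin: bool
    blocker_count: int
    file_count: int
    pending_task_count: int


def vote_blockers(req: VoteRequest) -> List[str]:
    """Return every reason the vote cannot start; an empty list means go."""
    reasons: List[str] = []
    if req.latest_revision is None:
        reasons.append("release has no latest revision")
    elif req.requested_revision != req.latest_revision:
        reasons.append("requested revision is not the latest revision")
    if req.committee is None:
        reasons.append("release is not attached to a committee")
    if req.requested_manual != req.policy_is_manual:
        reasons.append("vote mode does not match the project policy")
    if not req.user_is_member_or_admin:
        reasons.append("user is neither a committee member nor an admin")
    if req.blocker_count > 0:
        reasons.append("revision has blocker results")
    if req.file_count == 0:
        reasons.append("release candidate draft contains no files")
    if req.pending_task_count > 0:
        reasons.append("queued or active tasks exist for the revision")
    return reasons
```

None of these conditions can be expressed as a constraint on a single form field, which is why they sit in the business logic layer.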

Trusted Publishing

Trusted Publishing settings are validated when they are stored and again when they are used. On write, validate_trusted_publishing_constraints and policy.py normalize the configured repository name, branch, and workflow paths and reject incomplete or impossible combinations. A workflow path cannot be stored without a repository name. A branch cannot be stored without a repository name. Repository names are stored without a slash. Every workflow path must begin with .github/workflows/.
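
A minimal sketch of the write-time constraints follows. `trusted_publishing_problems` is a hypothetical function, not the body of validate_trusted_publishing_constraints, and it reports a slash in the repository name as a problem where ATR may instead normalize the value.

```python
from typing import List, Optional, Sequence

WORKFLOW_PREFIX = ".github/workflows/"


def trusted_publishing_problems(
    repository_name: Optional[str],
    branch: Optional[str],
    workflow_paths: Sequence[str],
) -> List[str]:
    """Report incomplete or impossible Trusted Publishing settings."""
    problems: List[str] = []
    repo = (repository_name or "").strip()
    if "/" in repo:
        problems.append("repository name must not contain a slash")
    if workflow_paths and not repo:
        problems.append("workflow paths require a repository name")
    if branch and not repo:
        problems.append("branch requires a repository name")
    for path in workflow_paths:
        if not path.startswith(WORKFLOW_PREFIX):
            problems.append(f"workflow path {path!r} must begin with {WORKFLOW_PREFIX}")
    return problems
```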

At request time, _trusted_project_checks and _trusted_project compare the GitHub token claims with the stored policy. The repository must be under apache. The workflow reference must begin with that same repository, must include a git ref, and must resolve to a workflow path under .github/workflows/. ATR then looks up the project by repository name and by the phase-specific workflow path that was stored for compose, vote, or finish. Distribution callbacks add one more contextual check in trusted_jwt_for_dist, which refuses the request unless the named release exists and is in the expected phase for the requested operation. The cryptographic validation of the token itself is described in authentication security.
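
The claim checks can be sketched as a parser over two strings. This assumes the token carries a repository claim of the form `apache/<name>` and a workflow reference of the form `<repository>/<path>@<git ref>`; `parse_workflow_ref` is a hypothetical helper and does not reproduce _trusted_project_checks.

```python
from typing import Tuple


def parse_workflow_ref(repository: str, workflow_ref: str) -> Tuple[str, str, str]:
    """Return (repository_name, workflow_path, git_ref) or raise ValueError."""
    owner, _, name = repository.partition("/")
    if owner != "apache" or not name or "/" in name:
        raise ValueError("Repository must be under apache")
    prefix = repository + "/"
    if not workflow_ref.startswith(prefix):
        raise ValueError("Workflow reference does not begin with the repository")
    # Everything after the repository is "<path>@<git ref>"
    path, sep, git_ref = workflow_ref[len(prefix):].partition("@")
    if not sep or not git_ref:
        raise ValueError("Workflow reference does not include a git ref")
    if not path.startswith(".github/workflows/"):
        raise ValueError("Workflow path is not under .github/workflows/")
    return name, path, git_ref
```

The returned repository name and workflow path are the two values that would then be used to look up the stored policy for the relevant phase.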

Email delivery

Email validation in ATR also depends on context. validate_email_recipients requires a primary recipient and rejects duplicate addresses across To, Cc, and Bcc. send then requires the sender to use @apache.org, and _validate_recipient rejects any envelope recipient outside @apache.org and its subdomains. This means that vote and release mail must go to ASF-controlled addresses even if the address itself would be syntactically valid.
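
The recipient rules can be sketched as follows. `recipient_problems` is a hypothetical function rather than the code in mail.py, and comparing addresses case-insensitively is an assumption of the sketch.

```python
from typing import List, Sequence


def recipient_problems(
    to: Sequence[str], cc: Sequence[str], bcc: Sequence[str]
) -> List[str]:
    """Report missing, duplicate, or non-ASF recipients."""
    problems: List[str] = []
    if not to:
        problems.append("a primary recipient is required")
    seen = set()
    for address in [*to, *cc, *bcc]:
        key = address.strip().lower()
        if key in seen:
            problems.append(f"duplicate recipient: {key}")
            continue
        seen.add(key)
        domain = key.rpartition("@")[2]
        # The leading dot keeps lookalikes such as notapache.org out
        if domain != "apache.org" and not domain.endswith(".apache.org"):
            problems.append(f"recipient outside apache.org: {key}")
    return problems
```

Matching on `.apache.org` with the leading dot, instead of on the bare suffix, is what separates a real subdomain from a domain that merely ends in the same letters.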

Output encoding

ATR uses Jinja2 for templating with auto-escaping enabled by default. All variables rendered in templates are automatically HTML-escaped:

<!-- This is safe; user_input is escaped -->
<p>Hello, {{ user_input }}</p>

When HTML output is intentionally generated (e.g., via htpy), it must be explicitly marked safe using markupsafe.Markup:

import markupsafe
safe_html = markupsafe.Markup("<strong>Bold</strong>")

For Markdown rendering, ATR uses markupsafe.Markup(cmarkgfm.github_flavored_markdown_to_html(markdown_text)). This relies on cmarkgfm's default safe mode, which omits raw HTML and unsafe link URLs from the rendered output.

Never mark user-controlled data as safe without proper sanitization.
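
The effect of escaping can be shown with the standard library alone. The sketch uses `html.escape` as a stand-in for what Jinja2 auto-escaping does to a rendered variable; `render_greeting` is a hypothetical function, and ATR itself relies on Jinja2 and markupsafe rather than this code.

```python
import html


def render_greeting(user_input: str) -> str:
    """Build the greeting by hand, escaping the untrusted value first."""
    return f"<p>Hello, {html.escape(user_input)}</p>"


print(render_greeting("<script>alert(1)</script>"))
```

The angle brackets in the input are emitted as `&lt;` and `&gt;`, so the browser displays the text instead of executing it.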

File upload security

File uploads are handled with several security measures:

Size limits

Maximum upload size is enforced at the httpd layer via MAX_CONTENT_LENGTH. This prevents denial-of-service attacks via large uploads.

Extension validation

Each upload type has an allowlist of permitted file extensions. Files with unexpected extensions are rejected.

Storage location

Uploaded files are stored outside the application in configured directories (e.g., state/unfinished/). They are not directly accessible via HTTP.

File handling

Files are processed via quart.datastructures.FileStorage and validated before being written to disk. Empty files (where the browser sends a file input with no selection) are filtered out.

Injection prevention

SQL injection

ATR uses SQLAlchemy ORM exclusively for database access. All queries use parameterized statements:

# Safe: parameterized query
result = await session.exec(
    select(Project).where(Project.key == project_name)
)

Direct SQL string concatenation is never used.
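
Why parameter binding matters can be demonstrated without SQLAlchemy. The sketch below uses the standard library's sqlite3 module as a stand-in; the `project` table, its columns, and `find_project` are hypothetical and do not reflect ATR's schema.

```python
import sqlite3


def find_project(conn: sqlite3.Connection, key: str) -> list:
    """Look up a project by key; the value is bound, never interpolated."""
    cursor = conn.execute(
        "SELECT project_key, full_name FROM project WHERE project_key = ?",
        (key,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (project_key TEXT PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO project VALUES (?, ?)", ("example", "Apache Example"))

print(find_project(conn, "example"))
print(find_project(conn, "example' OR '1'='1"))
```

The second lookup returns no rows, because the whole string, quotes included, is compared as a value instead of being parsed as SQL.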

Cross-site scripting (XSS)

XSS is prevented through:

  • Jinja2 auto-escaping (enabled by default)
  • markupsafe.Markup for trusted HTML only
  • Content Security Policy headers (configured in httpd)

Path traversal

Path traversal is prevented by:

  • Using pathlib.Path for all file operations
  • Validating that paths remain within expected directories
  • Rejecting file names containing path separators

For form fields that accept file or directory paths, always use form.RelPath (or form.RelPathList for multiple paths). These types automatically call to_relpath(), which rejects path traversal sequences, absolute paths, and empty values at the Pydantic validation layer. This is the preferred approach because it prevents path traversal before the handler code runs.
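
The kind of check such a type performs can be sketched with pathlib. This is not the implementation of form.to_relpath(); `to_relpath_sketch` is a hypothetical function that covers only the three rejections named above.

```python
import pathlib


def to_relpath_sketch(value: str) -> pathlib.PurePosixPath:
    """Return value as a relative path, or raise ValueError if it is unsafe."""
    if not value or not value.strip():
        raise ValueError("Path is empty")
    path = pathlib.PurePosixPath(value)
    if path.is_absolute():
        raise ValueError("Absolute paths are not allowed")
    if not path.parts:
        raise ValueError("Path is empty")  # e.g. "." has no components
    # PurePosixPath does not collapse "..", so it stays visible in parts
    if ".." in path.parts:
        raise ValueError("Path traversal sequences are not allowed")
    return path
```

Working on a pure path means no filesystem access is needed, which suits a validator that runs before the handler touches disk.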

For cases outside of form validation (e.g., URL route parameters), use form.to_relpath() directly, or validate manually:

import pathlib

base = pathlib.Path("/allowed/directory")
user_filename = "report.txt"  # placeholder for the untrusted value
user_path = base / user_filename
# Resolve both sides so ".." segments and symlinks are followed before comparing
if not user_path.resolve().is_relative_to(base.resolve()):
    raise ValueError("Path traversal detected")

Command injection

ATR avoids invoking external commands where it can. Where external commands are necessary (e.g., GPG operations), arguments are passed as lists, never as shell strings:

import subprocess

# Safe: arguments as list
subprocess.run(["gpg", "--verify", signature_file, data_file])

# Unsafe: never do this
subprocess.run(f"gpg --verify {signature_file} {data_file}", shell=True)
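
The difference can be observed without GPG. In the sketch below a child Python process echoes back its first argument; `echo_argument` is a hypothetical helper, and the child process stands in for any external command.

```python
import subprocess
import sys


def echo_argument(untrusted: str) -> str:
    """Pass untrusted text to a child process as a single argv entry."""
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")


print(echo_argument("file.asc; echo pwned"))
```

The child receives the whole string, semicolon included, as one argument. No shell is involved, so nothing after the semicolon is ever interpreted as a second command.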

Implementation references