3.14. Input validation
Sections:
- Overview
- Defense in depth
- Form validation with Pydantic
- CSRF protection
- Validation rules by input type
- Data integrity validation
- Business logic validation
- Output encoding
- File upload security
- Injection prevention
- Implementation references
Overview
Input validation is critical for ATR's security posture. As a system that handles cryptographic signatures and release artifacts, ATR must ensure that all user input is properly validated before processing. This page documents the validation strategies and patterns used throughout the codebase.
Defense in depth
ATR employs multiple layers of validation:
- Transport layer: HTTPS required, enforced by httpd
- Request layer: Size limits enforced by httpd (MAX_CONTENT_LENGTH)
- Form layer: Pydantic models validate structure and types
- Application layer: Business logic validation in route handlers, interaction helpers, and storage writers
- Database layer: SQLAlchemy ORM with parameterized queries, plus constraints
- Markdown layer: Via cmarkgfm
- Output layer: Jinja2 auto-escaping for HTML output
Each layer provides independent protection, so a failure in one layer does not by itself compromise the system.
Form validation with Pydantic
All form inputs in ATR are validated through Pydantic models defined in form.py. The base class for forms is Form, which extends Pydantic's BaseModel.
Defining form fields
Form fields are defined using Python type annotations and the label function:
class ExampleForm(Form):
    name: str = label("Project name", "Enter the project name")
    count: int = label("Count", widget=Widget.NUMBER)
    email: EmailStr = label("Contact email", widget=Widget.EMAIL)
The label function accepts a description (shown to users), optional documentation, and an optional widget hint for rendering.
Validation process
When a form is submitted, ATR:
- Extracts form data from the request via quart_request
- Passes the data to the Pydantic model for validation
- If validation fails, collects errors via flash_error_data
- Displays errors to the user with flash_error_summary
- If validation succeeds, proceeds with the validated data
Pydantic provides built-in validators for common types (strings, integers, emails, URLs) and supports custom validators via decorators.
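The submit-and-validate flow above can be sketched with plain Pydantic. This is a minimal illustration that assumes Pydantic v2. SketchForm and validate_submission are hypothetical names, and ATR's own Form base class, label helper, and flash functions are deliberately left out.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class SketchForm(BaseModel):
    """Hypothetical form model; ATR forms extend Form instead."""

    name: str
    count: int


def validate_submission(raw: dict) -> tuple[Optional[SketchForm], dict]:
    """Return (form, {}) on success, or (None, errors keyed by field name)."""
    try:
        return SketchForm.model_validate(raw), {}
    except ValidationError as exc:
        # Each error carries a location tuple and a human-readable message.
        errors = {
            ".".join(str(part) for part in err["loc"]): err["msg"]
            for err in exc.errors()
        }
        return None, errors
```

A handler would show the returned errors to the user on failure and continue with the typed form object on success.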
Custom validators
For complex validation logic, use Pydantic's @model_validator decorator:
import re

from pydantic import model_validator

class ReleaseForm(Form):
    version: str = label("Version")

    @model_validator(mode="after")
    def validate_version_format(self):
        # Require a numeric X.Y.Z prefix; any suffix after it is accepted
        if not re.match(r"^\d+\.\d+\.\d+", self.version):
            raise ValueError("Version must start with X.Y.Z")
        return self
CSRF protection
All POST forms must include a CSRF token. The token is generated by csrf_input and validated automatically by Quart-WTF:
def csrf_input() -> htm.VoidElement:
    csrf_token = utils.generate_csrf()
    return htpy.input(type="hidden", name="csrf_token", value=csrf_token)
In templates, include the CSRF token in every form:
<form method="post">
    {{ csrf_input() }}
    <!-- other form fields -->
</form>
The CSRF token is tied to the user's session and validated on form submission. Requests without a valid CSRF token are rejected. When using the form module renderer, the CSRF token is added automatically.
Validation rules by input type
ASF User IDs
User IDs are validated against a strict pattern in principal.py:
if not re.match(r"^[-_a-z0-9]+$", user):
    raise CommitterError("Invalid characters in User ID")
Only lowercase alphanumeric characters, hyphens, and underscores are permitted.
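A minimal standalone sketch of the same rule follows. It uses re.fullmatch with the same character class, because in Python a `$` anchor also matches just before a trailing newline. USER_ID_RE and is_valid_user_id are hypothetical names for illustration and are not taken from principal.py.

```python
import re

# Same character class as the pattern above; fullmatch anchors both ends.
USER_ID_RE = re.compile(r"[-_a-z0-9]+")


def is_valid_user_id(user: str) -> bool:
    """Return True only if every character is a permitted one."""
    return USER_ID_RE.fullmatch(user) is not None
```

With this helper, "jdoe-1" is accepted, while "JDoe", an empty string, and "jdoe" followed by a newline are all rejected.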
Email addresses
Email validation uses Pydantic's EmailStr type, which checks address syntax using the email-validator package:
from pydantic import EmailStr
class ContactForm(Form):
    email: EmailStr = label("Email address")
URLs
URL validation uses Pydantic's HttpUrl type:
from pydantic import HttpUrl
class LinkForm(Form):
    website: HttpUrl = label("Website URL")
Version strings
Version strings are validated according to project-specific patterns. The general pattern requires a numeric major.minor prefix and accepts any suffix after it:
VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")
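To make the behavior of this pattern concrete, the sketch below applies it to a few sample strings. The helper name looks_like_version is hypothetical, and the sample versions are illustrative only.

```python
import re

VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")


def looks_like_version(value: str) -> bool:
    """Return True if value starts with digits, a dot, and more digits."""
    return VERSION_PATTERN.match(value) is not None
```

Strings such as "1.2", "1.2.3-rc1", and "2.0-incubating" match. Strings such as "v1.2", "1", and "1." do not, because they lack the leading numeric major.minor pair.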
Committee and project names
Committee and project names are validated against the set of known committees and projects from LDAP and the ASF project database. Unknown names are rejected.
File names
File names in uploads are sanitized to prevent path traversal:
- Directory separators (/, \) and the path token .. are rejected or stripped
- Null bytes are rejected
- Only expected extensions are permitted per upload type
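The rules above can be sketched as a single check. This is an illustration under assumptions, not ATR's implementation: check_upload_name and ALLOWED_EXTENSIONS are hypothetical, the extension list is invented for the example, and the sketch rejects bad names rather than stripping anything.

```python
# Hypothetical allowlist; the real extensions depend on the upload type.
ALLOWED_EXTENSIONS = (".asc", ".sha512", ".tar.gz", ".zip")


def check_upload_name(name: str) -> str:
    """Return the name unchanged if it passes every rule, else raise ValueError."""
    if not name or "\x00" in name:
        raise ValueError("Empty file name or null byte")
    if "/" in name or "\\" in name or name in {".", ".."}:
        raise ValueError("Path separators and dot segments are not allowed")
    if not name.endswith(ALLOWED_EXTENSIONS):
        raise ValueError("Unexpected file extension")
    return name
```

Rejecting outright is the simpler choice for a sketch, since a stripped name can still collide with another file or surprise the uploader.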
Data integrity validation
Beyond input validation, ATR performs data integrity validation on database records using validate.py. This catches inconsistencies that may have been introduced by bugs, migrations, or manual database edits.
Committee validation
The committee function checks:
- child_committees must be empty (not used)
- full_name must be set, trimmed, and not prefixed with "Apache "
Project validation
The project function checks:
- category must use comma-separated labels without colons
- committee_key must be set (project must be linked to a committee)
- created timestamp must be in the past
- full_name must be set and start with "Apache "
- programming_languages must use comma-separated labels without colons
- release_policy_id must be None (not used)
Release validation
The release function checks:
- created timestamp must be in the past
- name must match the expected pattern for project and version
- Release directory must exist on disk and contain files
- package_managers must be empty (not used)
- released timestamp must be in the past or None
- sboms must be empty (not used)
- Vote logic must be consistent (cannot have vote_resolved without vote_started)
- votes must be empty (not used)
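A few of the release checks listed above can be sketched as a generator that yields one message per problem. ReleaseRecord and release_divergences are hypothetical stand-ins, not the types used by validate.py, and only the timestamp and vote consistency rules are covered.

```python
import datetime
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ReleaseRecord:
    """Hypothetical stand-in for a stored release row."""

    name: str
    created: datetime.datetime
    released: Optional[datetime.datetime] = None
    vote_started: Optional[datetime.datetime] = None
    vote_resolved: Optional[datetime.datetime] = None


def release_divergences(rel: ReleaseRecord, now: datetime.datetime) -> Iterator[str]:
    """Yield a description for every rule the record violates."""
    if rel.created > now:
        yield "created timestamp is in the future"
    if rel.released is not None and rel.released > now:
        yield "released timestamp is in the future"
    if rel.vote_resolved is not None and rel.vote_started is None:
        yield "vote_resolved is set without vote_started"
```

Yielding every divergence, instead of raising on the first, lets a report list all problems with a record in one pass.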
Running validation
Data integrity validation can be run via the admin interface or programmatically:
async for divergence in validate.everything(data):
    print(f"{divergence.source}: {divergence.divergence}")
These validators are complementary to the live checks described below. Data integrity validation inspects stored records for drift or corruption. Business logic validation stops inconsistent actions before ATR accepts them.
Business logic validation
Field validation is only the first step. ATR also checks whether an action still makes sense in the wider state of the release. These rules compare data across releases, revisions, committees, queued tasks, stored policy, and message delivery settings. They live mainly in interaction.py, storage writers, mail.py, and shared helpers in util.py.
Vote initiation
Before a vote can start, release_ready_for_vote checks that the release still has a latest revision, that the requested revision is that latest revision, and that the release is still attached to a committee. It also checks that the requested vote mode agrees with the stored project policy, so ATR does not let a user start a manual vote for a project configured for standard voting, or the reverse.
That same validation step then checks the surrounding release state. The user must be a committee member for the project or an ATR administrator. The selected revision must have no blocker results, and the release candidate draft must contain files. When ATR actually promotes the release into the voting phase, promote_to_candidate adds a task state check and refuses the transition while queued or active tasks still exist for that revision. This binds vote initiation to release phase, revision state, policy, committee membership, check results, file storage, and task execution rather than to form fields alone.
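The checks described in this section can be pictured as one function that inspects a snapshot of the surrounding state and reports every reason a vote cannot start. This is only a sketch: VoteContext and vote_blockers are hypothetical, and the real logic in release_ready_for_vote and promote_to_candidate reads this state from the database and task queue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteContext:
    """Hypothetical snapshot of the state consulted before a vote starts."""

    latest_revision: Optional[str]
    requested_revision: str
    committee: Optional[str]
    policy_manual_vote: bool
    requested_manual_vote: bool
    user_is_member_or_admin: bool
    blocker_count: int
    file_count: int
    pending_task_count: int


def vote_blockers(ctx: VoteContext) -> list:
    """Return a list of reasons the vote must not start; empty means ready."""
    problems = []
    if ctx.latest_revision is None:
        problems.append("release has no revision")
    elif ctx.requested_revision != ctx.latest_revision:
        problems.append("requested revision is not the latest")
    if ctx.committee is None:
        problems.append("release is not attached to a committee")
    if ctx.requested_manual_vote != ctx.policy_manual_vote:
        problems.append("vote mode does not match project policy")
    if not ctx.user_is_member_or_admin:
        problems.append("user is neither a committee member nor an administrator")
    if ctx.blocker_count > 0:
        problems.append("revision has blocker results")
    if ctx.file_count == 0:
        problems.append("release candidate draft contains no files")
    if ctx.pending_task_count > 0:
        problems.append("queued or active tasks still exist for the revision")
    return problems
```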
Trusted Publishing
Trusted Publishing settings are validated when they are stored and again when they are used. On write, validate_trusted_publishing_constraints and policy.py normalize the configured repository name, branch, and workflow paths and reject incomplete or impossible combinations. A workflow path cannot be stored without a repository name. A branch cannot be stored without a repository name. Repository names are stored without a slash. Every workflow path must begin with .github/workflows/.
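The write-time constraints in the previous paragraph can be sketched as follows. The function name check_publishing_settings and its parameters are hypothetical. The actual checks live in validate_trusted_publishing_constraints and policy.py and also normalize the values, which this sketch does not attempt.

```python
from typing import Optional, Sequence

WORKFLOW_PREFIX = ".github/workflows/"


def check_publishing_settings(
    repository: Optional[str],
    branch: Optional[str],
    workflow_paths: Sequence[str],
) -> None:
    """Raise ValueError for incomplete or impossible combinations."""
    if not repository:
        if branch:
            raise ValueError("A branch requires a repository name")
        if workflow_paths:
            raise ValueError("A workflow path requires a repository name")
    elif "/" in repository:
        raise ValueError("Repository name must not contain a slash")
    for path in workflow_paths:
        if not path.startswith(WORKFLOW_PREFIX):
            raise ValueError(f"Workflow path must begin with {WORKFLOW_PREFIX}")
```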
At request time, _trusted_project_checks and _trusted_project compare the GitHub token claims with the stored policy. The repository must be under apache. The workflow reference must begin with that same repository, must include a git ref, and must resolve to a workflow path under .github/workflows/. ATR then looks up the project by repository name and by the phase-specific workflow path that was stored for compose, vote, or finish. Distribution callbacks add one more contextual check in trusted_jwt_for_dist, which refuses the request unless the named release exists and is in the expected phase for the requested operation. The cryptographic validation of the token itself is described in authentication security.
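The claim checks can be illustrated with a small parser. This sketch assumes the workflow reference has the owner/repo/path@ref shape that GitHub places in its OIDC tokens. split_workflow_ref is a hypothetical helper, and the project lookup and phase check that follow in ATR are not shown.

```python
def split_workflow_ref(repository_claim: str, workflow_ref: str) -> tuple:
    """Return (repository name, workflow path, git ref) or raise ValueError."""
    owner, _, name = repository_claim.partition("/")
    if owner != "apache" or not name:
        raise ValueError("Repository must be under apache")
    prefix = repository_claim + "/"
    if not workflow_ref.startswith(prefix):
        raise ValueError("Workflow reference must begin with the repository")
    # Everything after the repository prefix is "path@ref".
    workflow_path, sep, git_ref = workflow_ref[len(prefix):].partition("@")
    if not sep or not git_ref:
        raise ValueError("Workflow reference must include a git ref")
    if not workflow_path.startswith(".github/workflows/"):
        raise ValueError("Workflow path must be under .github/workflows/")
    return name, workflow_path, git_ref
```

The returned repository name and workflow path are the two values a lookup against the stored policy would need.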
Email delivery
Email validation in ATR also depends on context. validate_email_recipients requires a primary recipient and rejects duplicate addresses across To, Cc, and Bcc. send then requires the sender to use @apache.org, and _validate_recipient rejects any envelope recipient outside @apache.org and its subdomains. This means that vote and release mail must go to ASF-controlled addresses, even when an address outside those domains is syntactically valid.
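The domain and duplicate rules can be sketched as two small helpers. Both names are hypothetical, and the case-insensitive comparison is an assumption made for this example, not a statement about how mail.py compares addresses.

```python
from typing import Iterable


def recipient_allowed(address: str) -> bool:
    """Return True for addresses at apache.org or one of its subdomains."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False
    domain = domain.lower()
    # The leading dot stops look-alike domains such as "notapache.org".
    return domain == "apache.org" or domain.endswith(".apache.org")


def duplicate_recipients(to: Iterable[str], cc: Iterable[str], bcc: Iterable[str]) -> set:
    """Return addresses that appear more than once across To, Cc, and Bcc."""
    seen = set()
    duplicates = set()
    for address in [*to, *cc, *bcc]:
        key = address.lower()  # Assumption: addresses compare case-insensitively.
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates
```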
Output encoding
ATR uses Jinja2 for templating with auto-escaping enabled by default. All variables rendered in templates are automatically HTML-escaped:
<!-- This is safe; user_input is escaped -->
<p>Hello, {{ user_input }}</p>
When HTML output is intentionally generated (e.g., via htpy), it must be explicitly marked safe using markupsafe.Markup:
import markupsafe
safe_html = markupsafe.Markup("<strong>Bold</strong>")
For Markdown rendering, ATR uses markupsafe.Markup(cmarkgfm.github_flavored_markdown_to_html(markdown_text)), which safely filters dangerous input before rendering.
Never mark user-controlled data as safe without proper sanitization.
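To show what escaping does to hostile input, the sketch below uses html.escape from the Python standard library. This is a substitute chosen to keep the example dependency-free. Jinja2 itself escapes through markupsafe, which encodes quote characters slightly differently but has the same effect in the browser.

```python
import html

user_input = '<script>alert("x")</script>'

# Angle brackets and quotes become entities, so the browser shows text
# instead of executing a script.
escaped = html.escape(user_input)
print(escaped)  # &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```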
File upload security
File uploads are handled with several security measures:
Size limits
Maximum upload size is enforced at the httpd layer via MAX_CONTENT_LENGTH. This prevents denial-of-service attacks via large uploads.
Extension validation
Each upload type has an allowlist of permitted file extensions. Files with unexpected extensions are rejected.
Storage location
Uploaded files are stored outside the application in configured directories (e.g., state/unfinished/). They are not directly accessible via HTTP.
File handling
Files are processed via quart.datastructures.FileStorage and validated before being written to disk. Empty files (where the browser sends a file input with no selection) are filtered out.
Injection prevention
SQL injection
ATR uses SQLAlchemy ORM exclusively for database access. All queries use parameterized statements:
# Safe: parameterized query
result = await session.exec(
    select(Project).where(Project.key == project_name)
)
Direct SQL string concatenation is never used.
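Why parameter binding is safe can be seen in a small standalone sketch. It uses the standard library sqlite3 module in place of SQLAlchemy so that it runs without dependencies. The table layout and the find_project helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (key TEXT PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO project VALUES (?, ?)", ("demo", "Apache Demo"))


def find_project(connection: sqlite3.Connection, key: str) -> list:
    """Look up a project; the ? placeholder sends key as data, not SQL text."""
    cursor = connection.execute(
        "SELECT key, full_name FROM project WHERE key = ?", (key,)
    )
    return cursor.fetchall()
```

Calling find_project(conn, "demo' OR '1'='1") returns no rows, because the whole string is compared against the key column instead of being parsed as SQL.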
Cross-site scripting (XSS)
XSS is prevented through:
- Jinja2 auto-escaping (enabled by default)
- markupsafe.Markup for trusted HTML only
- Content Security Policy headers (configured in httpd)
Path traversal
Path traversal is prevented by:
- Using pathlib.Path for all file operations
- Validating that paths remain within expected directories
- Rejecting file names containing path separators
For form fields that accept file or directory paths, always use form.RelPath (or form.RelPathList for multiple paths). These types automatically call to_relpath(), which rejects path traversal sequences, absolute paths, and empty values at the Pydantic validation layer. This is the preferred approach because it prevents path traversal before the handler code runs.
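The kind of check such a type performs can be sketched as follows. checked_relpath is a hypothetical helper written for this page and is not the implementation of to_relpath. It only covers the three rejections named above and treats paths as POSIX paths.

```python
import pathlib


def checked_relpath(value: str) -> pathlib.PurePosixPath:
    """Return a relative path, rejecting empty, absolute, and '..' inputs."""
    if not value.strip():
        raise ValueError("Path must not be empty")
    path = pathlib.PurePosixPath(value)
    if path.is_absolute():
        raise ValueError("Absolute paths are not allowed")
    # Pure paths keep ".." segments as-is, so they can be detected here.
    if ".." in path.parts:
        raise ValueError("Path traversal sequences are not allowed")
    return path
```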
For cases outside of form validation (e.g., URL route parameters), use form.to_relpath() directly, or validate manually:
import pathlib

base = pathlib.Path("/allowed/directory")
user_path = base / user_filename

# Verify the resolved path is still under base
if not user_path.resolve().is_relative_to(base.resolve()):
    raise ValueError("Path traversal detected")
Command injection
ATR safeguards against command injection as much as possible. Where external commands are necessary (e.g., GPG operations), arguments are passed as lists, never as shell strings:
import subprocess
# Safe: arguments as list
subprocess.run(["gpg", "--verify", signature_file, data_file])
# Unsafe: never do this
subprocess.run(f"gpg --verify {signature_file} {data_file}", shell=True)
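The effect of passing arguments as a list can be checked with a harmless command. The sketch below runs the current Python interpreter in place of gpg and passes it a hostile-looking argument. The variable names are illustrative only.

```python
import subprocess
import sys

hostile = "release.asc; echo INJECTED"

# The child process prints its first argument exactly as it received it.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Because no shell is involved, the semicolon has no special meaning and the child process receives the whole string as a single argument.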
Implementation references
- form.py - Form definitions, validation, and rendering
- db/interaction.py - Contextual validation for vote initiation and Trusted Publishing
- storage/writers/policy.py - Trusted Publishing policy validation and normalisation
- storage/writers/release.py - Release phase transition checks and task state guards
- validate.py - Data integrity validators
- util.py - Shared validation helpers including path handling, recipients, and Trusted Publishing
- mail.py - Sender and recipient domain enforcement
- htm.py - HTML generation utilities