3.13. Input validation
Sections:
- Overview
- Defense in depth
- Form validation with Pydantic
- CSRF protection
- Validation rules by input type
- Data integrity validation
- Output encoding
- File upload security
- Injection prevention
- Implementation references
Overview
Input validation is critical for ATR's security posture. As a system that handles cryptographic signatures and release artifacts, ATR must ensure that all user input is properly validated before processing. This page documents the validation strategies and patterns used throughout the codebase.
Defense in depth
ATR employs multiple layers of validation:
- Transport layer: HTTPS required, enforced by httpd
- Request layer: Size limits enforced by httpd (MAX_CONTENT_LENGTH)
- Form layer: Pydantic models validate structure and types
- Application layer: Business logic validation in route handlers
- Database layer: SQLAlchemy ORM with parameterized queries, plus constraints
- Markdown layer: Via cmarkgfm
- Output layer: Jinja2 auto-escaping for HTML output
Each layer provides independent protection, so a failure in one layer does not by itself compromise the system.
Form validation with Pydantic
All form inputs in ATR are validated through Pydantic models defined in form.py. The base class for forms is Form, which extends Pydantic's BaseModel.
Defining form fields
Form fields are defined using Python type annotations and the label function:
class ExampleForm(Form):
    name: str = label("Project name", "Enter the project name")
    count: int = label("Count", widget=Widget.NUMBER)
    email: EmailStr = label("Contact email", widget=Widget.EMAIL)
The label function accepts a description (shown to users), optional documentation, and an optional widget hint for rendering.
Validation process
When a form is submitted, ATR:
- Extracts form data from the request via quart_request
- Passes the data to the Pydantic model for validation
- If validation fails, collects errors via flash_error_data
- Displays errors to the user with flash_error_summary
- If validation succeeds, proceeds with the validated data
Pydantic provides built-in validators for common types (strings, integers, emails, URLs) and supports custom validators via decorators.
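The validate-then-branch step above can be sketched with plain Pydantic (v2 API). This is a minimal illustration, not ATR's code: ExampleForm here derives directly from BaseModel rather than from Form, and validate_form is a hypothetical helper that stands in for the error collection done by flash_error_data.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ExampleForm(BaseModel):
    # Hypothetical stand-in for an ATR form; the real base class is Form.
    name: str
    count: int


def validate_form(data: dict) -> tuple[Optional[ExampleForm], dict[str, str]]:
    """Return (form, errors); exactly one of the two is populated."""
    try:
        return ExampleForm.model_validate(data), {}
    except ValidationError as exc:
        # Each error carries a location tuple ("loc") and a message ("msg").
        errors = {
            ".".join(str(part) for part in err["loc"]): err["msg"]
            for err in exc.errors()
        }
        return None, errors


# Form data arrives as strings; Pydantic's default mode coerces "3" to 3.
form, errors = validate_form({"name": "example", "count": "3"})

# A value that cannot be parsed as an integer yields an error keyed by field.
bad_form, bad_errors = validate_form({"name": "example", "count": "many"})
```

Keying errors by field name makes it straightforward to show each message next to the input that caused it.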
Custom validators
For complex validation logic, use Pydantic's @model_validator decorator:
import re

from pydantic import model_validator

class ReleaseForm(Form):
    version: str = label("Version")

    @model_validator(mode="after")
    def validate_version_format(self):
        # Require a leading MAJOR.MINOR.PATCH; any suffix may follow
        if not re.match(r"^\d+\.\d+\.\d+", self.version):
            raise ValueError("Version must start with X.Y.Z")
        return self
CSRF protection
All POST forms must include a CSRF token. The token is generated by csrf_input and validated automatically by Quart-WTF:
def csrf_input() -> htm.VoidElement:
    csrf_token = utils.generate_csrf()
    return htpy.input(type="hidden", name="csrf_token", value=csrf_token)
In templates, include the CSRF token in every form:
<form method="post">
{{ csrf_input() }}
<!-- other form fields -->
</form>
The CSRF token is tied to the user's session and validated on form submission. Requests without a valid CSRF token are rejected. When using the form module renderer, the CSRF token is added automatically.
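To illustrate what "tied to the user's session" means, here is a minimal standard-library sketch of a session-bound token check. It is not Quart-WTF's implementation: the session is modeled as a plain dict, and issue_csrf_token and csrf_token_is_valid are hypothetical names.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token(session: dict) -> str:
    """Create a random token and remember it in the (hypothetical) session."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def csrf_token_is_valid(session: dict, submitted: Optional[str]) -> bool:
    """Accept the request only if the submitted token matches the session's."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session: dict = {}
token = issue_csrf_token(session)
accepted = csrf_token_is_valid(session, token)
forged = csrf_token_is_valid(session, "forged-value")
```

A token issued for one session is useless in another, which is what stops a third-party site from submitting a form on the user's behalf.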
Validation rules by input type
ASF User IDs
User IDs are validated against a strict pattern in principal.py:
if not re.match(r"^[-_a-z0-9]+$", user):
    raise CommitterError("Invalid characters in User ID")
Only lowercase alphanumeric characters, hyphens, and underscores are permitted.
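One subtlety of this style of check is that, with re.match, `$` also matches just before a trailing newline, so a value such as "alice\n" satisfies the pattern. Whether that can occur in practice depends on how the value reaches the check, which is not covered here. The sketch below uses a hypothetical helper, is_valid_user_id, to show how re.fullmatch closes that gap.

```python
import re

# Same character class as above; anchors are unnecessary with fullmatch.
USER_ID_PATTERN = re.compile(r"[-_a-z0-9]+")


def is_valid_user_id(user: str) -> bool:
    # fullmatch requires the entire string to match, trailing newline included.
    return USER_ID_PATTERN.fullmatch(user) is not None


# With match() and "$", a trailing newline slips through.
loose_accepts_newline = re.match(r"^[-_a-z0-9]+$", "alice\n") is not None

strict_accepts_newline = is_valid_user_id("alice\n")
```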
Email addresses
Email validation uses Pydantic's EmailStr type, which checks address syntax using the email-validator package:
from pydantic import EmailStr

class ContactForm(Form):
    email: EmailStr = label("Email address")
URLs
URL validation uses Pydantic's HttpUrl type:
from pydantic import HttpUrl

class LinkForm(Form):
    website: HttpUrl = label("Website URL")
Version strings
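Version strings are validated according to project-specific patterns. The general pattern is deliberately permissive: it requires numeric major and minor components separated by a dot, and accepts any suffix after them, which covers semantic versions with optional suffixes: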
VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")
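A quick way to see what this pattern admits is to run it over a few candidates. The candidate strings below are made up for illustration.

```python
import re

VERSION_PATTERN = re.compile(r"^[0-9]+\.[0-9]+.*$")

candidates = [
    "1.2",
    "1.2.3",
    "1.2.3-rc1",
    "2.0+build5",
    "1.2 beta",  # accepted: ".*" allows any suffix, including spaces
    "1",         # rejected: no minor component
    "v1.2",      # rejected: must start with a digit
    "1.x",       # rejected: minor component must be numeric
]

accepted = [c for c in candidates if VERSION_PATTERN.match(c)]
rejected = [c for c in candidates if not VERSION_PATTERN.match(c)]
```

Because the suffix is unconstrained, any stricter requirement (for example, a fixed set of suffix characters) has to come from the project-specific patterns.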
Committee and project names
Committee and project names are validated against the set of known committees and projects from LDAP and the ASF project database. Unknown names are rejected.
File names
File names in uploads are sanitized to prevent path traversal:
- Directory separators (/, \) and the path token .. are rejected or stripped
- Null bytes are rejected
- Only expected extensions are permitted per upload type
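The three rules above could be combined as in the following sketch. It takes the "reject" option rather than stripping, and both check_upload_name and the ALLOWED_EXTENSIONS set are hypothetical, not ATR's actual helper or configuration.

```python
import pathlib

# Hypothetical allowlist for one upload type.
ALLOWED_EXTENSIONS = frozenset({".gz", ".asc", ".sha512"})


def check_upload_name(name: str, allowed_extensions: frozenset) -> str:
    """Return the name unchanged if it is acceptable, else raise ValueError."""
    if not name or "\x00" in name:
        raise ValueError("File name is empty or contains a null byte")
    if "/" in name or "\\" in name or name in {".", ".."}:
        raise ValueError("File name must not contain path components")
    # Only the final suffix is checked, so "x.tar.gz" is judged by ".gz".
    suffix = pathlib.PurePosixPath(name).suffix.lower()
    if suffix not in allowed_extensions:
        raise ValueError(f"Extension {suffix!r} is not permitted")
    return name


ok_name = check_upload_name("apache-example-1.0.tar.gz", ALLOWED_EXTENSIONS)
```

Rejecting outright is simpler to reason about than stripping, because the stored name is always exactly what the client sent.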
Data integrity validation
Beyond input validation, ATR performs data integrity validation on database records using validate.py. This catches inconsistencies that may have been introduced by bugs, migrations, or manual database edits.
Committee validation
The committee function checks:
- child_committees must be empty (not used)
- full_name must be set, trimmed, and not prefixed with "Apache "
Project validation
The project function checks:
- category must use comma-separated labels without colons
- committee_name must be set (project must be linked to a committee)
- created timestamp must be in the past
- full_name must be set and start with "Apache "
- programming_languages must use comma-separated labels without colons
- release_policy_id must be None (not used)
Release validation
The release function checks:
- created timestamp must be in the past
- name must match the expected pattern for project and version
- Release directory must exist on disk and contain files
- package_managers must be empty (not used)
- released timestamp must be in the past or None
- sboms must be empty (not used)
- Vote logic must be consistent (cannot have vote_resolved without vote_started)
- votes must be empty (not used)
Running validation
Data integrity validation can be run via the admin interface or programmatically:
async for divergence in validate.everything(data):
    print(f"{divergence.source}: {divergence.divergence}")
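For orientation, a validator of this shape can be written as an async generator that yields one record per problem found. The sketch below is an assumption about the general pattern, not the contents of validate.py: Divergence, ProjectRecord, check_project, and collect are hypothetical, and only two of the project rules listed earlier are implemented.

```python
import asyncio
import dataclasses
from collections.abc import AsyncIterator
from typing import Optional


@dataclasses.dataclass(frozen=True)
class Divergence:
    # Hypothetical record type with the two attributes used in the loop above.
    source: str
    divergence: str


@dataclasses.dataclass
class ProjectRecord:
    # Hypothetical stand-in for a database row.
    name: str
    full_name: Optional[str]
    committee_name: Optional[str]


async def check_project(project: ProjectRecord) -> AsyncIterator[Divergence]:
    """Yield one Divergence for each rule the record violates."""
    source = f"project:{project.name}"
    if not project.full_name or not project.full_name.startswith("Apache "):
        yield Divergence(source, 'full_name must be set and start with "Apache "')
    if not project.committee_name:
        yield Divergence(source, "committee_name must be set")


async def collect(records: list) -> list:
    found = []
    for record in records:
        async for divergence in check_project(record):
            found.append(divergence)
    return found


records = [
    ProjectRecord("good", "Apache Good", "good"),
    ProjectRecord("bad", "Bad", None),
]
found = asyncio.run(collect(records))
```

Yielding divergences instead of raising lets a single run report every inconsistency rather than stopping at the first.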
Output encoding
ATR uses Jinja2 for templating with auto-escaping enabled by default. All variables rendered in templates are automatically HTML-escaped:
<!-- This is safe; user_input is escaped -->
<p>Hello, {{ user_input }}</p>
When HTML output is intentionally generated (e.g., via htpy), it must be explicitly marked safe using markupsafe.Markup:
import markupsafe
safe_html = markupsafe.Markup("<strong>Bold</strong>")
Never mark user-controlled data as safe without proper sanitization.
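The effect of escaping can be seen with the standard library's html.escape, used here as a stand-in for what Jinja2 auto-escaping does to untrusted values. The exact entity names differ slightly between html.escape and markupsafe, and render_greeting is a hypothetical helper.

```python
import html


def render_greeting(user_input: str) -> str:
    # Trusted markup is written by the developer; the untrusted value is escaped.
    return f"<p>Hello, {html.escape(user_input)}</p>"


rendered = render_greeting("<script>alert(1)</script>")
```

The markup characters in the input arrive in the page as text, so the browser displays them instead of executing them.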
File upload security
File uploads are handled with several security measures:
Size limits
Maximum upload size is enforced at the httpd layer via MAX_CONTENT_LENGTH. This mitigates denial-of-service attacks that rely on very large uploads.
Extension validation
Each upload type has an allowlist of permitted file extensions. Files with unexpected extensions are rejected.
Storage location
Uploaded files are stored outside the application in configured directories (e.g., state/unfinished/). They are not directly accessible via HTTP.
File handling
Files are processed via quart.datastructures.FileStorage and validated before being written to disk. Empty files (where the browser sends a file input with no selection) are filtered out.
Injection prevention
SQL injection
ATR uses SQLAlchemy ORM exclusively for database access. All queries use parameterized statements:
# Safe: parameterized query
result = await session.exec(
    select(Project).where(Project.name == project_name)
)
Direct SQL string concatenation is never used.
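The difference between a bound parameter and string concatenation can be demonstrated with the standard library's sqlite3 module, used here in place of SQLAlchemy so that the example is self-contained. The table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO project (name) VALUES (?)", [("tooling",), ("example",)]
)

hostile = "tooling' OR '1'='1"

# Bound parameter: the whole string is compared as a value, so nothing matches.
safe_rows = conn.execute(
    "SELECT name FROM project WHERE name = ?", (hostile,)
).fetchall()

# Counterexample only: concatenation lets the quote end the literal, and the
# injected OR clause then matches every row.
unsafe_rows = conn.execute(
    f"SELECT name FROM project WHERE name = '{hostile}'"
).fetchall()

conn.close()
```

An ORM expression such as Project.name == project_name compiles to the bound-parameter form.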
Cross-site scripting (XSS)
XSS is prevented through:
- Jinja2 auto-escaping (enabled by default)
- markupsafe.Markup for trusted HTML only
- Content Security Policy headers (configured in httpd)
Path traversal
Path traversal is prevented by:
- Using pathlib.Path for all file operations
- Validating that paths remain within expected directories
- Rejecting file names containing path separators
import pathlib

base = pathlib.Path("/allowed/directory")
user_path = base / user_filename

# Verify the resolved path is still under base
if not user_path.resolve().is_relative_to(base.resolve()):
    raise ValueError("Path traversal detected")
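Wrapped in a function, the same check can be exercised against a few hostile names. resolve_under is a hypothetical helper name, and the example uses a temporary directory as the allowed base. Note that joining an absolute path onto base discards base entirely, which is why the containment check runs on the resolved result. Path.is_relative_to requires Python 3.9 or later.

```python
import pathlib
import tempfile


def resolve_under(base: pathlib.Path, user_filename: str) -> pathlib.Path:
    """Resolve user_filename beneath base, or raise if it escapes base."""
    base_resolved = base.resolve()
    candidate = (base_resolved / user_filename).resolve()
    if not candidate.is_relative_to(base_resolved):
        raise ValueError("Path traversal detected")
    return candidate


results = {}
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    inside = resolve_under(base, "artifact.tar.gz")
    for name in ["../outside.txt", "/etc/passwd", "sub/../../outside.txt"]:
        try:
            resolve_under(base, name)
            results[name] = "accepted"
        except ValueError:
            results[name] = "rejected"
```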
Command injection
ATR avoids invoking external commands where it can. Where external commands are necessary (e.g., GPG operations), arguments are passed as lists, never as shell strings:
import subprocess

# Safe: arguments as list
subprocess.run(["gpg", "--verify", signature_file, data_file])

# Unsafe: never do this
# subprocess.run(f"gpg --verify {signature_file} {data_file}", shell=True)
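The following sketch shows why the list form is safe. It runs the current Python interpreter in place of gpg, so that it works anywhere, and passes a value containing shell metacharacters. The child process receives the value as one literal argument, because no shell is involved to interpret it.

```python
import subprocess
import sys

# A value that would run a second command if a shell ever parsed it.
hostile = "data.txt; echo injected"

completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", hostile],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit status
)

# The child saw exactly one argument, with the semicolon intact.
received = completed.stdout.strip()
```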
Implementation references
- form.py - Form definitions, validation, and rendering
- validate.py - Data integrity validators
- util.py - Utility functions including path handling
- htm.py - HTML generation utilities