Bulletproof Python: Unit Testing & Data Validation
This session is all about writing high-quality, testable code. We'll introduce unit testing and practice writing tests for a Python application using the `pytest` framework, emphasizing the principles of test-driven development (TDD) to build features that are robust from the start. Next, we'll introduce data validation and error handling: you'll learn how to write tests that specifically challenge Pydantic models to ensure that incoming data is always correctly formatted. We'll also cover best practices for creating custom validation logic.
Introduction: Why Test?
When building complex systems, whether orchestrating drones, chaining LLM agents, or processing large datasets, silent failures are your worst enemy. A badly formatted string or an unexpected null value can crash a pipeline hours into a run.
Unit testing allows us to verify that individual components of our code work exactly as intended in isolation.
Core Concept
A Unit Test exercises the smallest testable part of an application (like a single function or method) to ensure it behaves correctly under various conditions.
Test-Driven Development (TDD)
Test-Driven Development flips the traditional script: you write the test before you write the code. This forces you to think about the desired behavior and API design first.
```mermaid
graph TD
    A[🔴 RED: Write a failing test] --> B[🟢 GREEN: Write just enough code to pass]
    B --> C[🔵 REFACTOR: Clean up and optimize]
    C --> A
    style A fill:#ffcccc,stroke:#cc0000
    style B fill:#ccffcc,stroke:#009900
    style C fill:#ccccff,stroke:#0000cc
```
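To make the loop concrete, here is a minimal sketch of one RED/GREEN/REFACTOR pass; the `slugify` function and its file paths are hypothetical:

```python
# RED -- write the test first; it fails because slugify() doesn't exist yet.
# tests/utils/test_text.py
from my_app.utils.text import slugify

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"


# GREEN -- write just enough code to make it pass.
# src/my_app/utils/text.py
def slugify(text: str) -> str:
    return text.strip().lower().replace(" ", "-")

# REFACTOR -- with the test as a safety net, clean up the implementation
# (e.g., collapse repeated whitespace) without fear of regressions.
```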
Project Structure: Organizing for Scale
Before we write a single test, we need to know where they live. As your application grows, dumping all your tests into a single test.py file becomes completely unmanageable.
The industry standard for Python projects is the src layout. This explicitly separates your application code from your testing code. It prevents import bleed, forces you to test your code exactly as a user would import it, and ensures you don't accidentally package your test suite into your final production build.
Here is how a professional Python repository is structured:
```text
my_agent_project/
├── pyproject.toml            # Project metadata and dependencies
├── src/                      # ALL production code lives inside here
│   └── my_app/               # Your main package
│       ├── __init__.py
│       ├── agents/           # Submodule for agents
│       │   ├── __init__.py
│       │   └── router.py     # E.g., the agent routing logic
│       └── utils/            # Submodule for utilities
│           ├── __init__.py
│           └── text.py       # E.g., text cleaning functions
└── tests/                    # ALL testing code lives here
    ├── __init__.py
    ├── agents/
    │   └── test_router.py    # Tests for router.py
    └── utils/
        └── test_text.py      # Tests for text.py
```
The Golden Rules of Test Organization

- **Strict Separation:** Keep your `tests/` directory entirely outside of your `src/` directory.
- **Mirror the Architecture:** Your `tests/` folder should act as a perfect reflection of your `src/my_app/` folder. If you have a module at `src/my_app/utils/text.py`, its corresponding test file should live at `tests/utils/test_text.py`. This mirroring makes finding the relevant tests instant, even in massive codebases.
- **The `test_` Prefix Convention:** `pytest` relies on automatic discovery. It will recursively search your entire project and automatically run:
    - Any file named `test_*.py` or `*_test.py`
    - Any function inside those files starting with `test_`
Best Practice: `__init__.py` in Tests
Notice the `__init__.py` files inside the `tests/` subdirectories. While `pytest` doesn't strictly require them to find your tests, including them prevents name collisions if you happen to have two test files with the exact same name in different subdirectories (e.g., `tests/agents/test_helpers.py` and `tests/utils/test_helpers.py`).
The pytest Progression: From Simple to Production-Ready
We will use `pytest`, the industry standard for Python testing. As covered above, it automatically discovers `test_*.py` files and the `test_`-prefixed functions inside them. To master unit testing, we need to understand that tests scale in complexity alongside our application logic.
Level 1: The Absolute Basics (Testing Pure Functions)
A pure function always produces the same output for the same input and has no side effects. These are the easiest to test.
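Here is a minimal sketch of such a function and its tests; `count_words` is a hypothetical stand-in for whatever lives in `text_utils.py`:

```python
# src/my_app/utils/text_utils.py
def count_words(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(text.split())


# tests/utils/test_text_utils.py
from my_app.utils.text_utils import count_words

def test_count_words_basic():
    # Arrange: prepare the input
    sentence = "unit tests guard against regressions"
    # Act: call the function under test
    result = count_words(sentence)
    # Assert: verify the outcome
    assert result == 5

def test_count_words_empty_string():
    assert count_words("") == 0
```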
Pro Tip: Arrange, Act, Assert (AAA)
Notice the comments in the first test. This is the AAA pattern. Mentally following this pattern keeps your tests focused and readable.
Level 2: Testing Edge Cases and Exceptions
Production code needs to aggressively reject bad data. With `pytest.raises`, we can assert that the correct exceptions are raised, without letting them crash the test suite.
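A sketch of the pattern, assuming a hypothetical `parse_temperature` validator:

```python
import pytest

def parse_temperature(raw: str) -> float:
    """Parse a Celsius reading, rejecting physically impossible values."""
    value = float(raw)  # raises ValueError on non-numeric input
    if value < -273.15:
        raise ValueError(f"Temperature below absolute zero: {value}")
    return value

def test_rejects_non_numeric_input():
    with pytest.raises(ValueError):
        parse_temperature("not-a-number")

def test_rejects_impossible_temperature():
    # `match` checks the exception message against a regex
    with pytest.raises(ValueError, match="absolute zero"):
        parse_temperature("-500")
```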
Level 3: The Fake Out (Mocking External Services)
What happens when our function makes a network call to OpenAI or a database? The test becomes slow, requires the internet, and costs money. To solve this, we use Mocking.
```mermaid
sequenceDiagram
    participant Test
    participant Application
    participant MockedAPI as Mocked LLM API
    participant RealAPI as Real LLM API
    Test->>Application: Call generate_summary()
    Note over Application,MockedAPI: We "patch" the real API out
    Application->>MockedAPI: Request Summary
    MockedAPI-->>Application: Return "Fake Summary" (Instantly, Free)
    Application-->>Test: Assert == "Fake Summary"
    %% The real API is never reached
    RealAPI--xRealAPI: (Untouched)
```
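A minimal sketch of this in code, assuming a hypothetical `generate_summary` in `app.py` that wraps an LLM call:

```python
# src/my_app/app.py
def call_llm_api(prompt: str) -> str:
    """Stand-in for a slow, costly network call to an LLM provider."""
    raise RuntimeError("No network access during tests!")

def generate_summary(text: str) -> str:
    return call_llm_api(f"Summarize: {text}")


# tests/test_app.py
from unittest.mock import patch
from my_app.app import generate_summary

def test_generate_summary_uses_llm():
    # Patch the name where it is *looked up* (my_app.app), not where it is defined
    with patch("my_app.app.call_llm_api", return_value="Fake Summary") as mock_api:
        result = generate_summary("a very long document")
    assert result == "Fake Summary"
    mock_api.assert_called_once()
```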
Level 4: Fixtures and Parametrization
As tests grow, setups become complex. Fixtures handle reusable setup/teardown logic (like creating temporary files or database connections), while Parametrization lets you run the same test with different data.
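A sketch combining both ideas: a `yield` fixture built on pytest's built-in `tmp_path`, plus `@pytest.mark.parametrize` (the `load_config` helper is hypothetical):

```python
import json
import pytest

def load_config(path) -> dict:
    """Hypothetical helper that reads a JSON config file."""
    with open(path) as f:
        return json.load(f)

@pytest.fixture
def config_file(tmp_path):
    # Setup: create a temporary config file (tmp_path is cleaned up by pytest)
    path = tmp_path / "config.json"
    path.write_text('{"model": "demo-model", "temperature": 0.2}')
    yield path
    # Teardown: anything after `yield` runs even if the test fails

def test_load_config(config_file):
    assert load_config(config_file)["model"] == "demo-model"

@pytest.mark.parametrize("a, b, expected", [(1, 2, 3), (0, 0, 0), (-5, 5, 0)])
def test_addition(a, b, expected):
    assert a + b == expected
```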
Property-Based Testing with Hypothesis
Up until now, we've used Example-Based Testing. We hardcode exact inputs. But what if we forget an edge case?
Property-Based Testing solves this. You define the properties (rules) that should always hold true, and the framework throws thousands of randomly generated edge cases at it.
```mermaid
graph TB
    subgraph Example-Based Testing
        A[Developer writes inputs: 2 + 2] --> B[Test runs 1 time]
        B --> C[Assert result is 4]
    end
    subgraph Property-Based Testing
        D[Developer defines strategy: ANY integers] --> E[Framework generates 100+ edge cases]
        E --> F[Test runs 100+ times]
        F --> G[Assert property: a + b = b + a]
    end
    style A fill:#f9f2f4,stroke:#d04437
    style D fill:#e3fcef,stroke:#14892c
```
Hypothesis is the premier property-based testing library for Python. If it finds a failing input, it performs Shrinking: systematically reducing the bad data to the absolute minimum input required to trigger your bug.
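A short sketch of what this looks like with `hypothesis` (assuming it is installed, e.g., via `pip install hypothesis`):

```python
from hypothesis import given, strategies as st

@given(st.integers(), st.integers())
def test_addition_is_commutative(a, b):
    # The property from the diagram: argument order never matters
    assert a + b == b + a

@given(st.text())
def test_strip_is_idempotent(s):
    # Stripping twice must equal stripping once, for ANY string
    assert s.strip().strip() == s.strip()
```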
Structural Validation with Pydantic
When accepting dataβlike JSON payloads or API configsβwe need absolute certainty about its shape and types before we process it. Pydantic enforces type hints at runtime.
Warning
Don't just test the "happy path." Always write tests that deliberately feed Pydantic malicious, malformed, or out-of-bounds data.
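A minimal sketch of a Pydantic v2 model plus tests that deliberately attack it (the `JobConfig` model is hypothetical):

```python
import pytest
from pydantic import BaseModel, Field, ValidationError, field_validator

class JobConfig(BaseModel):
    job_name: str = Field(min_length=1)
    retries: int = Field(ge=0, le=10)

    @field_validator("job_name")
    @classmethod
    def no_whitespace(cls, v: str) -> str:
        # Custom validation logic beyond simple type checks
        if " " in v:
            raise ValueError("job_name must not contain spaces")
        return v

def test_valid_config_parses():
    assert JobConfig(job_name="nightly-etl", retries=3).retries == 3

def test_rejects_out_of_bounds_retries():
    with pytest.raises(ValidationError):
        JobConfig(job_name="nightly-etl", retries=-1)

def test_rejects_malformed_name():
    with pytest.raises(ValidationError):
        JobConfig(job_name="nightly etl", retries=3)
```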
Tabular Data Validation with Pandera
While Pydantic is brilliant for object-level data, data engineering relies heavily on tabular data. pandera allows us to define statistical schemas for pandas DataFrames.
```mermaid
graph LR
    A[Raw DataFrame] --> B{Pandera Schema}
    B -- Valid --> C[Data Pipeline]
    B -- Invalid --> D[SchemaError raised]
```
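A sketch of a schema acting as that gate (the column names are illustrative):

```python
import pandas as pd
import pandera as pa
import pytest

# A statistical contract for the DataFrame's columns
housing_schema = pa.DataFrameSchema({
    "price": pa.Column(float, pa.Check.gt(0)),
    "bedrooms": pa.Column(int, pa.Check.gt(0)),
})

def test_valid_frame_passes():
    df = pd.DataFrame({"price": [250_000.0], "bedrooms": [3]})
    housing_schema.validate(df)  # returns the DataFrame when valid

def test_negative_price_is_rejected():
    bad_df = pd.DataFrame({"price": [-1.0], "bedrooms": [2]})
    with pytest.raises(pa.errors.SchemaError):
        housing_schema.validate(bad_df)
```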
Suggested Practice Exercises
To solidify these concepts, work through these three exercises. They combine the tools we've learned to mirror real-world production scenarios, scaling from simple assertions to complex property validation.
Exercise 1: The Warm-Up & The Time Machine (Mocking)
Goal: Practice basic assertions, exception handling, and mocking a slow local function (no APIs required!).
Part A: Basic Assertions & Exceptions

- Write a pure function called `calculate_batch_size(total_records: int, workers: int) -> int`. It should return the floor division of records per worker.
- If `workers` is `0`, it should explicitly raise a `ValueError` with the message: `"Cannot divide work among 0 workers."`
- Write two `pytest` functions:
    - One that tests a standard input (e.g., 100 records, 4 workers) using standard `assert` statements.
    - One that uses `with pytest.raises(ValueError)` to ensure the exact error message is triggered when workers are 0.
Part B: Mocking a Slow Function
- Create a dummy function to simulate a heavy workload:

    ```python
    import time

    def _run_heavy_computation(data: list) -> int:
        """Simulates a massive, 10-second data transformation."""
        time.sleep(10)
        return len(data) * 42
    ```
- Write a main function called `process_dataset(data: list)` that simply calls `_run_heavy_computation(data)` and returns the result.
- The Challenge: If you test `process_dataset` normally, your test suite will pause for 10 seconds. Write a test using the `@patch` decorator to mock `_run_heavy_computation`. Configure your mock to instantly return `999`.
- Assert that your `process_dataset` function returns `999` and that your mock was called exactly once (`mock.assert_called_once()`). You just time-traveled past a 10-second wait!
Exercise 2: Bulletproofing with Hypothesis & Pydantic
Goal: Combine property-based testing with structural validation to catch bizarre edge cases.

- Create a Pydantic model called `UserAccount` with fields: `username` (string, min 3 chars), `age` (int, strictly greater than 18), and `email` (string).
- Write a custom Pydantic `@field_validator` for the email field that ensures it contains an `@` symbol.
- Use Hypothesis `@given` to generate entirely random strings and integers for these three fields.
- Write a test that attempts to instantiate `UserAccount` with this completely random Hypothesis data inside a `try/except ValidationError` block.
- The Logic: If a `ValidationError` is raised, the test should pass (because Pydantic successfully blocked bad data). If the model initializes successfully without an error, write `assert` statements to prove that the randomly generated data genuinely meets all your strict criteria (e.g., `assert account.age > 18`).
Exercise 3: The Data Pipeline Sandbox
Goal: Practice pytest setup/teardown lifecycle fixtures and Pandera DataFrame schemas.

- Define a Pandera schema for a CSV containing housing data: `price` (float, strictly > 0), `bedrooms` (int, strictly > 0), and `zipcode` (string, length exactly 5).
- Write a `pytest` fixture using the `yield` keyword that:
    - Creates a temporary CSV file.
    - Writes a few rows of valid data and one row of explicitly invalid data (e.g., a negative price) into the file.
    - Yields the temporary file path to the test.
    - Deletes the file in the teardown phase.
- Write a test that accepts the fixture, loads the temporary CSV via pandas, and attempts to validate it with your Pandera schema.
- Assert that a `SchemaError` is caught, proving your pipeline will safely reject bad files before they infect your database.