How tests are structured
Tests live in ~/projects/tiktok-army/tests/. The runner is pytest, configured in ~/projects/tiktok-army/pyproject.toml:
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
```

`asyncio_mode = "auto"` means async tests don't need `@pytest.mark.asyncio`; pytest discovers `async def test_*` functions and runs them under an event loop automatically.
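For illustration, a trivial async test like the one below runs as-is under this configuration. It's a hypothetical example, not a test that exists in the repo:

```python
import asyncio


# No @pytest.mark.asyncio needed: asyncio_mode = "auto" supplies the loop.
async def test_runs_under_an_event_loop():
    await asyncio.sleep(0)  # any awaitable works
    assert asyncio.get_running_loop().is_running()
```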
Run tests with:
```bash
uv run pytest tests/                          # everything
uv run pytest tests/test_models.py            # one file
uv run pytest tests/test_models.py::test_xxx  # one test
uv run pytest -k "trend"                      # by keyword match
uv run pytest -v                              # verbose
```

Lint and type check:

```bash
uv run ruff check .
uv run mypy tiktok_army
```

Mock mode means no API key needed
The defaults in ~/projects/tiktok-army/tiktok_army/config.py make local + test environments key-less:
- `CLAUDE_MODE=mock` → `lib.claude.call_claude_cached` routes to `lib.mock_claude.mock_call`.
- `TIKTOK_PROVIDER_MODE=mock` → providers return fixture data from `_mock_data.py`.
- All secret env vars (`ANTHROPIC_API_KEY`, `TIKTOK_APP_KEY`, etc.) fall back to the placeholder string `"mock-no-key-set-CLAUDE_MODE-to-real-to-use-real-keys"` when not set.
This means tests can instantiate agents, run them, and assert on output structure without touching Anthropic, TikTok, or any external service. Mock fixtures are deterministic per agent + prompt hash, so the same test produces the same output every time.
The exceptions: anything that touches `lib.db`, `lib.audit` (Pub/Sub publish), or, transitively, Studio requires its respective backend. Most existing tests work around this by mocking the DB / Pub/Sub layer or by using factory functions that don't persist.
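As an illustration of that workaround, a fixture along these lines can stand in for the DB layer. The patch target below is an assumption (point it at wherever `session_for_workspace` actually lives), and any session methods the code awaits may need extra `AsyncMock` configuration:

```python
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def no_db(monkeypatch):
    # MagicMock supports the async context-manager protocol (Python 3.8+), so
    # `async with session_for_workspace(...) as session:` yields a mock and
    # nothing is persisted. raising=False tolerates a wrong path guess.
    fake_factory = MagicMock()
    monkeypatch.setattr(
        "tiktok_army.lib.db.session_for_workspace", fake_factory, raising=False
    )
    return fake_factory
```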
What tests exist today
Inventory as of 2026-04-26:
| File | What it covers |
|---|---|
| `tests/test_models.py` | Pydantic round-trips for `TikTokAccount`, event payloads. ~120 lines. |
| `tests/test_transcoding.py` | `transcoding.to_tiktok_vertical` invokes ffmpeg with the right flags. ~40 lines. |
| `tests/test_account_health.py` | Account Health agent against mock provider data. ~300 lines. |
| `tests/test_audience_mapper.py` | Audience Mapper output shape + score thresholds. ~300 lines. |
| `tests/test_compliance.py` | Compliance findings, severity, blocked logic. ~430 lines. |
| `tests/test_creator_outreach.py` | Outreach DM drafting + audience overlap filter. ~450 lines. |
| `tests/test_listing_optimizer.py` | Variant generation, recommended index. ~540 lines. |
| `tests/test_trend_watcher.py` | Trend ingestion + fit scoring. ~490 lines. |
| `tests/test_ad_campaign_director.py` | Ad campaign planning, spend cap check. ~220 lines. |
| `tests/test_inventory_sync.py` | Stock level sync, low-stock flagging. ~210 lines. |
| `tests/test_performance_feedback.py` | Winners/losers split, diagnosis. ~210 lines. |
| `tests/test_skills.py` + `test_skill_wires.py` + `test_skill_wires_v2.py` | Skill helpers (creative, policy, etc.). |
| `tests/test_cli.py` | CLI entry point smoke. ~70 lines. |
What's missing
- End-to-end workflow tests. No test today exercises `WorkflowRunner.execute()` against a full mock-mode workflow. The runner has unit-testable pieces (`_topo_sort`, `_resolve_path`, `_resolve_step_input`), but they're not directly tested. Adding `tests/test_runner.py` would be high-value (a possible starting shape is sketched after this list).
- Trace persistence assertions. Tests don't assert that `tiktok_agent_steps` rows get written. Because tests run without a DB by default, `_persist_llm_step_trace` no-ops silently. Testing the trace pipeline would need a test DB fixture.
- Approval flow tests. No test exercises `routers/approvals.py`. The approve/reject paths are simple, but they touch the DB and need a fixture.
- SSE stream tests. The `orchestrator/events.py` bus is testable in isolation but not tested.
- Comment Triage tests. The `comment_triage.py` template is fully implemented but has no dedicated test file. (The mock fixture is wired up; there's just no test asserting on it.)
- Content Producer tests. Same story: fully implemented, no dedicated test. This one is harder because Content Producer requires Studio + transcoding + publisher, all of which need mocking.
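As a hedged starting point for `tests/test_runner.py`, a pure-function test of `_topo_sort` might look like the sketch below. The signature assumed here (mapping of step id to dependency ids, returning an execution order) is a guess, so check the real one in `orchestrator/runner.py` before copying:

```python
from tiktok_army.orchestrator.runner import _topo_sort


def test_topo_sort_puts_dependencies_before_dependents():
    # Assumed shape: {step_id: [ids it depends on]} -> ordered list of step ids.
    dag = {
        "synthesis": ["account_health", "audience_mapper"],
        "audience_mapper": ["account_health"],
        "account_health": [],
    }
    order = _topo_sort(dag)
    assert order.index("account_health") < order.index("audience_mapper")
    assert order[-1] == "synthesis"
```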
How to run a workflow end-to-end in mock mode
This is the most useful smoke test for a builder change. It exercises everything from `WorkflowDef` to synthesis, all in-memory.
From a Python script
```python
import asyncio
from uuid import uuid4

from tiktok_army.orchestrator.runner import run_workflow_inline
from tiktok_army.orchestrator.definitions import PROFILE_AUDIT


async def main():
    workflow_run_id = uuid4()
    workspace_id = uuid4()

    # Brief shape matches what routers/workflows_api.py:run_workflow constructs.
    brief = {
        "handle": "lakucosmetics",
        "target_type": "third_party",
        "outcome": "profile_audit",
        "notes": None,
    }

    report_md = await run_workflow_inline(
        workspace_id=workspace_id,
        workflow=PROFILE_AUDIT,
        brief=brief,
        workflow_run_id=workflow_run_id,
        brand_id=None,
    )

    print("=== Report ===")
    print(report_md)


if __name__ == "__main__":
    asyncio.run(main())
```

Run it:
```bash
cd ~/projects/tiktok-army
uv run python /tmp/run_audit.py
```

You'll get the synthesis Markdown printed to stdout. The trace inserts into `tiktok_workflow_steps` / `tiktok_agent_steps` happen if a DB is reachable (they fail-soft otherwise; check the logs for `claude.trace.persist_failed`).
Caveat: this requires Postgres
`run_workflow_inline` calls `_insert_step_row`, which uses `session_for_workspace`. If no DB is reachable, the script will fail at the first step's INSERT. To run end-to-end without a DB, you'd need to mock `session_for_workspace` or stub `_insert_step_row` / `_update_step_row`. For now, set up local Postgres (see Deploy Runbook).
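If you do want a DB-free run anyway, one untested sketch is to patch the two row helpers around the script above. The patch targets are assumptions and should point at wherever `_insert_step_row` / `_update_step_row` actually live:

```python
from unittest.mock import AsyncMock, patch

# Replaces the plain asyncio.run(main()) under the __main__ guard in the
# script above: both helpers become no-op async mocks.
with patch("tiktok_army.orchestrator.runner._insert_step_row", new=AsyncMock()), \
     patch("tiktok_army.orchestrator.runner._update_step_row", new=AsyncMock()):
    asyncio.run(main())
```

This assumes the runner doesn't depend on the returned row values downstream; if it does, the mocks need return values configured.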
From the dashboard
Easier alternative for visual verification:
```bash
# Backend
uv run uvicorn tiktok_army.main:app --reload --port 8000 &

# Dashboard
cd dashboard
TIKTOK_ARMY_API_URL=http://localhost:8000 npm run dev
```

Submit a brief through the UI. The dashboard shows the live DAG; per-step trace drilldown is at `/runs/<id>`, with each step expanding to show its `tiktok_agent_steps` rows.
Test patterns to copy
Look at `tests/test_audience_mapper.py` or `tests/test_account_health.py` for the reference shape:
```python
import pytest
from uuid import uuid4

from tiktok_army.agents import AudienceMapperAgent
from tiktok_army.models import AgentTriggerType


async def test_audience_mapper_happy_path():
    agent = AudienceMapperAgent()
    result = await agent.run(
        workspace_id=uuid4(),
        brand_id=None,
        trigger_type=AgentTriggerType.MANUAL,
        input_data={"handle": "lakucosmetics"},
    )
    assert "segments" in result.output
    assert all("score" in seg for seg in result.output["segments"])
    assert all(seg["score"] >= 0.5 for seg in result.output["segments"])  # min_score default
    assert result.cost_usd > 0
```

Key practices:
- Use `await agent.run(...)`, not `agent._execute(ctx)` directly. `run()` exercises the full lifecycle including contextvar setup; `_execute` alone won't trace.
- Use `uuid4()` for `workspace_id` and `brand_id` to avoid colliding with other tests.
- Assert on output structure (keys present, types correct) rather than exact values. Mock fixtures are deterministic, but readability of the test matters more than tight coupling to fixture values.
- Tests that go through `BaseAgent.run()` will try to write to `tiktok_agent_runs`. In dev environments without a DB, this fails. Either skip the test in CI without a DB (one way is sketched after this list), mock the DB layer, or fix the issue noted in `~/projects/tiktok-army/CLAUDE.md` ("anything that touches `lib.db`, `lib.audit`, or `lib.claude` will fail in test environments without setting required secrets via env vars").
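One way to implement the "skip without a DB" option is a module-level marker like the sketch below. The `DATABASE_URL` name is a guess; use whichever env var the project's settings actually read for the Postgres DSN:

```python
import os

import pytest

requires_db = pytest.mark.skipif(
    not os.environ.get("DATABASE_URL"),  # assumed env var name
    reason="needs a reachable Postgres; set DATABASE_URL to run",
)


@requires_db
async def test_agent_run_writes_tiktok_agent_runs_row():
    ...
```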
Adding a regression test for a bug
When you fix a bug, add a test that would have caught it. The pattern:
- Reproduce the bug locally (in mock mode where possible).
- Write a test that fails on the broken code (run it before the fix to confirm the assertion fires).
- Apply the fix.
- Run the test — it should pass now.
- Run the rest of the suite — nothing else should have regressed.
For workflow-level bugs (orchestrator, runner, approval flow), the missing `tests/test_runner.py` is where they should land.
Things to know
- `asyncio_mode = "auto"` lets you write `async def test_xxx()` without decorators. Pytest will run them in a fresh event loop per test.
- Tests don't reset the Claude wrapper's client. `_get_client()` lazily instantiates `AsyncAnthropic`. In mock mode this is never called; in tests that switch modes mid-run, the client is reused. Don't mix modes in a single test session.
- `get_settings()` is `lru_cache`d. Changes to environment variables mid-test won't take effect unless you call `get_settings.cache_clear()`. Most tests don't need to mutate config; if yours does, clear the cache (see the sketch after this list).
- Mock fixtures are seeded for the 13 known agents. A new agent without a fixture entry returns the generic `_DEFAULT_FIXTURE`. Tests that expect specific output keys will fail with `KeyError` if the fixture is missing.
- Lint and types are enforced. CI (when wired) will run `ruff check` and `mypy --strict`. Adding code without complete type hints will fail mypy.
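For the `get_settings()` point above, a test that mutates config might look like this sketch. The import path follows from `tiktok_army/config.py`, but the `claude_mode` attribute name is an assumption:

```python
from tiktok_army.config import get_settings


def test_with_env_override(monkeypatch):
    monkeypatch.setenv("CLAUDE_MODE", "mock")
    get_settings.cache_clear()  # drop the lru_cache'd Settings instance
    try:
        assert get_settings().claude_mode == "mock"  # attribute name assumed
    finally:
        get_settings.cache_clear()  # don't leak the override to other tests
```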