How tests are structured
Tests live in ~/projects/tiktok-army/tests/. The runner is pytest, configured in ~/projects/tiktok-army/pyproject.toml:
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
```

`asyncio_mode = "auto"` means async tests don't need `@pytest.mark.asyncio`; pytest discovers `async def test_*` functions and runs them under an event loop automatically.
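For illustration, a trivial async test like the one below runs as-is under this configuration. It's a hypothetical example, not a test that exists in the repo:

```python
import asyncio


# No @pytest.mark.asyncio needed: asyncio_mode = "auto" supplies the loop.
async def test_runs_under_an_event_loop():
    await asyncio.sleep(0)  # any awaitable works
    assert asyncio.get_running_loop().is_running()
```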
Run tests with:
```bash
uv run pytest tests/                          # everything
uv run pytest tests/test_models.py            # one file
uv run pytest tests/test_models.py::test_xxx  # one test
uv run pytest -k "trend"                      # by keyword match
uv run pytest -v                              # verbose
```

Lint and type check:

```bash
uv run ruff check .
uv run mypy tiktok_army
```

Mock mode means no API key needed
The defaults in ~/projects/tiktok-army/tiktok_army/config.py make local + test environments key-less:
- `CLAUDE_MODE=mock` → `lib.claude.call_claude_cached` routes to `lib.mock_claude.mock_call`.
- `TIKTOK_PROVIDER_MODE=mock` → providers return fixture data from `_mock_data.py`.
- All secret env vars (`ANTHROPIC_API_KEY`, `TIKTOK_APP_KEY`, etc.) fall back to the placeholder string `"mock-no-key-set-CLAUDE_MODE-to-real-to-use-real-keys"` when not set.
This means tests can instantiate agents, run them, and assert on output structure without touching Anthropic, TikTok, or any external service. Mock fixtures are deterministic per agent + prompt hash, so the same test produces the same output every time.
The exceptions: anything that touches `lib.db`, `lib.audit` (Pub/Sub publish), or, transitively, Studio requires its respective backend. Most existing tests work around this by mocking the DB / Pub/Sub layer or by using factory functions that don't persist.
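As an illustration of that workaround, a fixture along these lines can stand in for the DB layer. The patch target below is an assumption (point it at wherever `session_for_workspace` actually lives), and any session methods the code awaits may need extra `AsyncMock` configuration:

```python
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def no_db(monkeypatch):
    # MagicMock supports the async context-manager protocol (Python 3.8+), so
    # `async with session_for_workspace(...) as session:` yields a mock and
    # nothing is persisted. raising=False tolerates a wrong path guess.
    fake_factory = MagicMock()
    monkeypatch.setattr(
        "tiktok_army.lib.db.session_for_workspace", fake_factory, raising=False
    )
    return fake_factory
```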
What tests exist today
Inventory as of 2026-04-26:
| File | What it covers |
|---|---|
| `tests/test_models.py` | Pydantic round-trips for `TikTokAccount`, event payloads. ~120 lines. |
| `tests/test_transcoding.py` | `transcoding.to_tiktok_vertical` invokes ffmpeg with the right flags. ~40 lines. |
| `tests/test_account_health.py` | Account Health agent against mock provider data. ~300 lines. |
| `tests/test_audience_mapper.py` | Audience Mapper output shape + score thresholds. ~300 lines. |
| `tests/test_compliance.py` | Compliance findings, severity, blocked logic. ~430 lines. |
| `tests/test_creator_outreach.py` | Outreach DM drafting + audience overlap filter. ~450 lines. |
| `tests/test_listing_optimizer.py` | Variant generation, recommended index. ~540 lines. |
| `tests/test_trend_watcher.py` | Trend ingestion + fit scoring. ~490 lines. |
| `tests/test_ad_campaign_director.py` | Ad campaign planning, spend cap check. ~220 lines. |
| `tests/test_inventory_sync.py` | Stock level sync, low-stock flagging. ~210 lines. |
| `tests/test_performance_feedback.py` | Winners/losers split, diagnosis. ~210 lines. |
| `tests/test_skills.py` + `test_skill_wires.py` + `test_skill_wires_v2.py` | Skill helpers (creative, policy, etc.). |
| `tests/test_cli.py` | CLI entry point smoke. ~70 lines. |
What's missing
- End-to-end workflow tests. No test today exercises `WorkflowRunner.execute()` against a full mock-mode workflow. The runner has unit-testable pieces (`_topo_sort`, `_resolve_path`, `_resolve_step_input`), but they're not directly tested. Adding `tests/test_runner.py` would be high-value (a possible starting shape is sketched after this list).
- Trace persistence assertions. Tests don't assert that `tiktok_agent_steps` rows get written. Because tests run without a DB by default, `_persist_llm_step_trace` no-ops silently. Testing the trace pipeline would need a test DB fixture.
- Approval flow tests. No test exercises `routers/approvals.py`. The approve/reject paths are simple, but they touch the DB and need a fixture.
- SSE stream tests. The `orchestrator/events.py` bus is testable in isolation but not tested.
- Comment Triage tests. The `comment_triage.py` template is fully implemented but has no dedicated test file. (The mock fixture is wired up; there's just no test asserting on it.)
- Content Producer tests. Same story: fully implemented, no dedicated test. This one is harder because Content Producer requires Studio + transcoding + publisher, all of which need mocking.
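As a hedged starting point for `tests/test_runner.py`, a pure-function test of `_topo_sort` might look like the sketch below. The signature assumed here (mapping of step id to dependency ids, returning an execution order) is a guess, so check the real one in `orchestrator/runner.py` before copying:

```python
from tiktok_army.orchestrator.runner import _topo_sort


def test_topo_sort_puts_dependencies_before_dependents():
    # Assumed shape: {step_id: [ids it depends on]} -> ordered list of step ids.
    dag = {
        "synthesis": ["account_health", "audience_mapper"],
        "audience_mapper": ["account_health"],
        "account_health": [],
    }
    order = _topo_sort(dag)
    assert order.index("account_health") < order.index("audience_mapper")
    assert order[-1] == "synthesis"
```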
How to run a workflow end-to-end in mock mode
This is the most useful smoke test for a builder change. It exercises everything from `WorkflowDef` to synthesis, all in-memory.
From a Python script
```python
import asyncio
from uuid import uuid4

from tiktok_army.orchestrator.runner import run_workflow_inline
from tiktok_army.orchestrator.definitions import PROFILE_AUDIT


async def main():
    workflow_run_id = uuid4()
    workspace_id = uuid4()

    # Brief shape matches what routers/workflows_api.py:run_workflow constructs.
    brief = {
        "handle": "lakucosmetics",
        "target_type": "third_party",
        "outcome": "profile_audit",
        "notes": None,
    }

    report_md = await run_workflow_inline(
        workspace_id=workspace_id,
        workflow=PROFILE_AUDIT,
        brief=brief,
        workflow_run_id=workflow_run_id,
        brand_id=None,
    )

    print("=== Report ===")
    print(report_md)


if __name__ == "__main__":
    asyncio.run(main())
```

Run it:
```bash
cd ~/projects/tiktok-army
uv run python /tmp/run_audit.py
```

You'll get the synthesis Markdown printed to stdout. The trace inserts into `tiktok_workflow_steps` / `tiktok_agent_steps` happen if a DB is reachable (they fail-soft otherwise; check the logs for `claude.trace.persist_failed`).
Caveat: this requires Postgres
`run_workflow_inline` calls `_insert_step_row`, which uses `session_for_workspace`. If no DB is reachable, the script will fail at the first step's INSERT. To run end-to-end without a DB, you'd need to mock `session_for_workspace` or stub `_insert_step_row` / `_update_step_row`. For now, set up local Postgres (see Deploy Runbook).
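If you do want a DB-free run anyway, one untested sketch is to patch the two row helpers around the script above. The patch targets are assumptions and should point at wherever `_insert_step_row` / `_update_step_row` actually live:

```python
from unittest.mock import AsyncMock, patch

# Replaces the plain asyncio.run(main()) under the __main__ guard in the
# script above: both helpers become no-op async mocks.
with patch("tiktok_army.orchestrator.runner._insert_step_row", new=AsyncMock()), \
     patch("tiktok_army.orchestrator.runner._update_step_row", new=AsyncMock()):
    asyncio.run(main())
```

This assumes the runner doesn't depend on the returned row values downstream; if it does, the mocks need return values configured.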
From the dashboard
Easier alternative for visual verification:
```bash
# Backend
uv run uvicorn tiktok_army.main:app --reload --port 8000 &

# Dashboard
cd dashboard
TIKTOK_ARMY_API_URL=http://localhost:8000 npm run dev
```

Submit a brief through the UI. The dashboard shows the live DAG; per-step trace drilldown is at `/runs/<id>`, with each step expanding to show its `tiktok_agent_steps` rows.
Test patterns to copy
Look at `tests/test_audience_mapper.py` or `tests/test_account_health.py` for the reference shape:
```python
import pytest
from uuid import uuid4

from tiktok_army.agents import AudienceMapperAgent
from tiktok_army.models import AgentTriggerType


async def test_audience_mapper_happy_path():
    agent = AudienceMapperAgent()
    result = await agent.run(
        workspace_id=uuid4(),
        brand_id=None,
        trigger_type=AgentTriggerType.MANUAL,
        input_data={"handle": "lakucosmetics"},
    )
    assert "segments" in result.output
    assert all("score" in seg for seg in result.output["segments"])
    assert all(seg["score"] >= 0.5 for seg in result.output["segments"])  # min_score default
    assert result.cost_usd > 0
```

Key practices:
- Use `await agent.run(...)`, not `agent._execute(ctx)` directly. `run()` exercises the full lifecycle including contextvar setup; `_execute` alone won't trace.
- Use `uuid4()` for `workspace_id` and `brand_id` to avoid colliding with other tests.
- Assert on output structure (keys present, types correct) rather than exact values. Mock fixtures are deterministic, but readability of the test matters more than tight coupling to fixture values.
- Tests that go through `BaseAgent.run()` will try to write to `tiktok_agent_runs`. In dev environments without a DB, this fails. Either skip the test in CI without a DB (one way is sketched after this list), mock the DB layer, or fix the issue noted in `~/projects/tiktok-army/CLAUDE.md` ("anything that touches `lib.db`, `lib.audit`, or `lib.claude` will fail in test environments without setting required secrets via env vars").
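One way to implement the "skip without a DB" option is a module-level marker like the sketch below. The `DATABASE_URL` name is a guess; use whichever env var the project's settings actually read for the Postgres DSN:

```python
import os

import pytest

requires_db = pytest.mark.skipif(
    not os.environ.get("DATABASE_URL"),  # assumed env var name
    reason="needs a reachable Postgres; set DATABASE_URL to run",
)


@requires_db
async def test_agent_run_writes_tiktok_agent_runs_row():
    ...
```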
Adding a regression test for a bug
When you fix a bug, add a test that would have caught it. The pattern:
- Reproduce the bug locally (in mock mode where possible).
- Write a test that fails on the broken code (run it before the fix to confirm the assertion fires).
- Apply the fix.
- Run the test — it should pass now.
- Run the rest of the suite — nothing else should have regressed.
For workflow-level bugs (orchestrator, runner, approval flow), the missing `tests/test_runner.py` is where they should land.
Things to know
- `asyncio_mode = "auto"` lets you write `async def test_xxx()` without decorators. Pytest will run them in a fresh event loop per test.
- Tests don't reset the Claude wrapper's client. `_get_client()` lazily instantiates `AsyncAnthropic`. In mock mode this is never called; in tests that switch modes mid-run, the client is reused. Don't mix modes in a single test session.
- `get_settings()` is `lru_cache`d. Changes to environment variables mid-test won't take effect unless you call `get_settings.cache_clear()`. Most tests don't need to mutate config; if yours does, clear the cache (see the sketch after this list).
- Mock fixtures are seeded for the 13 known agents. A new agent without a fixture entry returns the generic `_DEFAULT_FIXTURE`. Tests that expect specific output keys will fail with `KeyError` if the fixture is missing.
- Lint and types are enforced. CI (when wired) will run `ruff check` and `mypy --strict`. Adding code without complete type hints will fail mypy.
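For the `get_settings()` point above, a test that mutates config might look like this sketch. The import path follows from `tiktok_army/config.py`, but the `claude_mode` attribute name is an assumption:

```python
from tiktok_army.config import get_settings


def test_with_env_override(monkeypatch):
    monkeypatch.setenv("CLAUDE_MODE", "mock")
    get_settings.cache_clear()  # drop the lru_cache'd Settings instance
    try:
        assert get_settings().claude_mode == "mock"  # attribute name assumed
    finally:
        get_settings.cache_clear()  # don't leak the override to other tests
```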