How to run a test
This is the high-level walkthrough. For every input option and the full output schema, see Test options reference.
The flow
A persona test is asynchronous. It is always the same four steps:
- Estimate. Call polis_estimate with the same inputs you plan to test. It returns the token cost, the USD cost, and how it splits against your remaining tokens and your spend limit. This never charges anything.
- Start. Call polis_test. It charges the estimated tokens up front and returns a run_id immediately. The test runs in the background.
- Poll. Call polis_status with the run_id every few seconds. It reports how many personas have reacted and a done flag.
- Read. Once done, call polis_report for the deliverable.
If a run fails before producing a report, the tokens are refunded automatically.
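The four steps can be sketched as a small driver. This is a minimal sketch under stated assumptions, not an official client: call_tool is a hypothetical function standing in for however your MCP client invokes a tool by name, and passing the run id as a "run_id" argument is assumed from the description above. It also does not handle a failed run, because the shape of a failed status is not described here.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
# Replace it with whatever your MCP client uses to invoke a tool.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_persona_test(
    call_tool: ToolCaller,
    inputs: Dict[str, Any],
    poll_seconds: float = 3.0,
    max_polls: int = 200,
) -> Dict[str, Any]:
    """Estimate, start, poll, then read the report for one persona test."""
    # 1. Estimate: same inputs as the real test; this never charges anything.
    estimate = call_tool("polis_estimate", inputs)
    print("estimate:", estimate)

    # 2. Start: charges the estimated tokens and returns a run_id immediately.
    run_id = call_tool("polis_test", inputs)["run_id"]

    # 3. Poll: check status every few seconds until the done flag is set.
    for _ in range(max_polls):
        status = call_tool("polis_status", {"run_id": run_id})
        if status.get("done"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"run {run_id} not done after {max_polls} polls")

    # 4. Read: fetch the deliverable.
    return call_tool("polis_report", {"run_id": run_id})
```

Taking call_tool as a parameter keeps the flow independent of any particular transport and makes it easy to exercise with a stub before spending tokens.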
Worked example
Test a tweet against 100 personas:
// polis_test
{
"content_type": "tweet",
"content": "Ship faster. Polis tells you what your audience thinks before you post.",
"persona_count": 100
}
// -> { "run_id": "9f3c...", "status": "pending", ... }
Poll polis_status until done: true, then call polis_report with that run_id.
What the report means
This is the important part. polis_report returns the deliverable plus the raw material behind it.
segments
Three clusters of how the audience reacted, named by behaviour, not demographics. Each has a name, a share (0 to 1, the rough fraction of the audience in that cluster), and a one- or two-sentence summary. Example:
{ "name": "Hooked but skeptical", "share": 0.38,
"summary": "Liked the speed promise but wanted proof. Several asked 'what does it actually do'." }
Read these to understand the shape of the room: who leaned in, who bounced, and why each group did.
topFriction
The three biggest reasons the content underperformed, ranked. Each has a point (the friction in one line), a severity (0 to 10, how much it cost you), and evidence (a real verbatim pattern from the reactions that proves the friction is real, not invented). This is your prioritized fix list.
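As an illustration of using topFriction as a fix list, the sketch below formats the items as a checklist and optionally drops low-severity ones. It assumes each item is a dict with the point, severity, and evidence keys described above; the example items and the min_severity cutoff are made up for illustration and are not part of the Polis API.

```python
from typing import Dict, List


def fix_list(top_friction: List[Dict], min_severity: float = 0) -> List[str]:
    """Format friction items as a checklist, most severe first."""
    ranked = sorted(top_friction, key=lambda f: f["severity"], reverse=True)
    return [
        f'[{f["severity"]}/10] {f["point"]} | evidence: {f["evidence"]}'
        for f in ranked
        if f["severity"] >= min_severity
    ]


# Made-up items for illustration only; real ones come from polis_report.
example = [
    {"point": "No proof behind the speed claim", "severity": 8,
     "evidence": "what does it actually do"},
    {"point": "Unclear who the product is for", "severity": 5,
     "evidence": "is this for marketers or devs"},
    {"point": "Generic opening line", "severity": 3,
     "evidence": "sounds like every other tool"},
]

for line in fix_list(example, min_severity=4):
    print(line)
```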
rewrite
A rewritten version of the content that addresses the top friction without losing the original voice, length, or format. For a tweet it stays tweet-sized; for a landing page it keeps the same structural slots. It is meant to be visibly better than the original and usable more or less as is.
structuralNotes (pages only)
For url and markdown content, 0 to 2 short notes about layout or ordering changes (for example "move the proof point above the fold"). Omitted for short feed content.
stats
Aggregate counts so you can see the distribution at a glance: how many personas chose each 5-second decision (stay, scroll, click, save), the mean purchase or engagement intent (0 to 10), and which swarm models produced the reactions.
sample_reactions
A sample of individual verbatim reactions, the raw material the synthesis is built from. Each persona returns: archetype (which audience segment they represent), first_glance (a three-word gut read), decision (stay / scroll / click / save), noticed_first, friction (their single objection), quote (what they would actually say, in their own voice), and intent (0 to 10). Read these when a segment summary surprises you and you want the actual words behind it.
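When you want a quick read of the sample itself, the decisions and intent can be tallied directly. This is a minimal sketch that assumes each reaction is a dict with the decision and intent keys listed above; the reactions shown are invented for illustration. It only summarizes the sample, so use stats for the aggregate over the whole run.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarize_reactions(
    reactions: List[Dict],
) -> Tuple[Counter, Optional[float]]:
    """Count decisions and average intent across a sample of reactions."""
    decisions = Counter(r["decision"] for r in reactions)
    # mean() raises on empty input, so return None for an empty sample.
    avg_intent = mean(r["intent"] for r in reactions) if reactions else None
    return decisions, avg_intent


# Invented reactions for illustration; real ones come from sample_reactions.
sample = [
    {"decision": "stay", "intent": 7},
    {"decision": "scroll", "intent": 2},
    {"decision": "click", "intent": 8},
    {"decision": "stay", "intent": 5},
]

decisions, avg_intent = summarize_reactions(sample)
print(dict(decisions), avg_intent)
```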
How to act on it
Use topFriction as the fix list, ship the rewrite (or adapt it), and re-run to confirm the friction dropped. segments and sample_reactions are there when you need to understand or defend a decision.
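One way to check that the friction dropped after a re-run is to compare the worst severity in each report. This is a sketch of one possible criterion, not something Polis defines: the function name is hypothetical, and it assumes both arguments are topFriction lists whose items carry a numeric severity as described above. Friction points may be worded differently between runs, so it compares severities rather than matching points by text.

```python
from typing import Dict, List


def friction_dropped(before: List[Dict], after: List[Dict]) -> bool:
    """True if the worst severity in the re-run is lower than in the original."""
    worst_before = max((f["severity"] for f in before), default=0)
    worst_after = max((f["severity"] for f in after), default=0)
    return worst_after < worst_before


# Invented severities for illustration only.
original_run = [{"severity": 8}, {"severity": 5}, {"severity": 3}]
rerun = [{"severity": 4}, {"severity": 3}, {"severity": 2}]

print(friction_dropped(original_run, rerun))
```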