Documentation

Groundr is an AI reliability and arbitration platform built on the Reality Negotiation Protocol (RNP). It answers one question for developers: "Can I safely show this AI output to my users?"

The system forces multiple AI models to debate each other using evidence, then uses mathematical arbitration to determine verified truth. Every disagreement is logged, every agent builds a reputation, and every claim is grounded in external proof.

Quick Start

Get Groundr running locally in under 2 minutes:

1. Clone & Install

git clone https://github.com/your-org/groundr.git
cd groundr
pip install -r requirements.txt

2. Configure Environment

Copy the example environment file and add your API keys:

cp .env.example .env
# Edit .env with your keys:
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GOOGLE_API_KEY=AI...

3. Start the Server

uvicorn api.main:app --reload --port 8000

Visit http://localhost:8000 for the landing page, or http://localhost:8000/docs for the interactive API explorer.

4. Make Your First Request (Verify v1)

curl -X POST http://localhost:8000/v1/verify \
  -H "Content-Type: application/json" \
  -H "X-API-Key: rnp_your_secret_key" \
  -d '{
    "query": "Does the EU Pro plan include feature X?",
    "domain": "policy",
    "mode": "balanced",
    "outputs": [
      {"model":"gpt-4o","provider":"openai","claim":"Yes, it is included.","confidence":0.90},
      {"model":"claude-3-5-sonnet","provider":"anthropic","claim":"No, it is not included.","confidence":0.86}
    ]
  }'
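
The same request from Python, as a minimal sketch using the third-party requests library (key and claims are placeholders):

import requests

payload = {
    "query": "Does the EU Pro plan include feature X?",
    "domain": "policy",
    "mode": "balanced",
    "outputs": [
        {"model": "gpt-4o", "provider": "openai",
         "claim": "Yes, it is included.", "confidence": 0.90},
        {"model": "claude-3-5-sonnet", "provider": "anthropic",
         "claim": "No, it is not included.", "confidence": 0.86},
    ],
}

resp = requests.post(
    "http://localhost:8000/v1/verify",
    json=payload,
    headers={"X-API-Key": "rnp_your_secret_key"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["recommended_action"])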

Authentication

Production: API key is required via X-API-Key.

Development: local environments may allow relaxed auth depending on GROUNDR_ENV.

All Verify v1 HTTP routes (POST /v1/verify, GET /v1/kpis, GET /v1/audit/{tenant_id}, and the rest) follow the same rule: when GROUNDR_ENV=production, a valid X-API-Key is required. In development the server may accept requests without a key, so local tools and the analytics page can call the KPI endpoint from the browser.

X-API-Key: rnp_your_secret_key

Provider feedback reports require a separate provider token:

X-Provider-Token: grv-prov-...

BYOK headers are supported per request:

X-BYOK-OpenAI: sk-...
X-BYOK-Anthropic: sk-ant-...
X-BYOK-Google: AI...
X-BYOK-XAI: xai-...
X-BYOK-Brave: brave-...
X-BYOK-Goggles: https://my-goggles-config
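
For example, a single verify call that brings its own OpenAI and Brave keys (a sketch; every key value is a placeholder):

import requests

# BYOK headers apply to this request only.
headers = {
    "X-API-Key": "rnp_your_secret_key",  # Groundr key, still required in production
    "X-BYOK-OpenAI": "sk-...",           # your OpenAI key, used for this call only
    "X-BYOK-Brave": "brave-...",         # your Brave Search key for evidence retrieval
}
payload = {
    "query": "Does the EU Pro plan include feature X?",
    "domain": "policy",
    "mode": "balanced",
    "outputs": [{"model": "gpt-4o", "provider": "openai",
                 "claim": "Yes, it is included.", "confidence": 0.90}],
}
resp = requests.post("http://localhost:8000/v1/verify",
                     json=payload, headers=headers, timeout=30)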

Generate keys using the included utility:

python manage_keys.py generate --name "my-app"
python manage_keys.py list
python manage_keys.py revoke rnp_abc123

System Architecture

Groundr exposes two related surfaces. Verify v1 (POST /v1/verify) is the production trust contract; the legacy negotiation engine (/agents, /negotiate, WebSocket) remains available for experiments and older integrations.

Verify v1 stack (production contract)

  • src/verify_orchestrator.py — end-to-end verify pipeline
  • src/verify_schemas.py — frozen request/response models
  • src/policy_engine.py and config/policy.yaml — declarative trust decisions
  • src/atoms.py — claim atomization
  • src/brave_research.py — evidence retrieval
  • api/v1_verify.py — public HTTP routes for Verify v1

Legacy negotiation engine

Four primary components backing the engine API and WebSocket:

Negotiation Engine

src/rnp_core.py — The brain. Collects assertions, detects conflicts via semantic-similarity analysis, and executes negotiation strategies.

Evidence Gatherer

src/evidence_gatherer.py — Interfaces with web search APIs to pull live evidence. Automatically identifies Truth Anchors from high-authority domains.

Model Broker

src/broker.py — Normalizes API calls to OpenAI, Anthropic, and Gemini so all models communicate using the same Assertion format.

Persistence Layer

api/database.py — SQLAlchemy-based storage for agents, assertions, conflicts, and results. Supports SQLite (dev) and PostgreSQL (prod).

Data Flow

User Query
  → Model Broker (queries GPT-4, Claude, Gemini in parallel)
    → Assertions submitted to Negotiation Engine
      → Evidence Gatherer pulls live web sources
        → Conflict Detection (semantic similarity)
          → Arbitration (40/60 Rule + Truth Anchors)
            → Shared Reality (verified answer returned)

Truth-Augmented Generation API

The /generate endpoint provides a complete "Truth-in, Content-out" pipeline. It takes a prompt, arbitrates facts behind the scenes, corrects any detected AI hallucinations using live web evidence, and synthesizes polished, fact-checked content in your requested format.

POST /generate

Generate polished, truth-grounded content.

{
  "prompt": "Write a short essay on Bitcoin",
  "format": "essay",           // essay, summary, bullet_points, report, answer
  "tone": "professional",      // professional, casual, academic
  "max_words": 500,
  "mode": "fast"               // fast, balanced, deep
}

Response:

{
  "content": "Bitcoin, a digital cryptocurrency...",
  "format": "essay",
  "grounding_status": "corrected", // verified, corrected, or unverifiable
  "corrections_made": [
    {
      "wrong_claim": "Bitcoin is backed by gold",
      "correct_fact": "Bitcoin is not backed by physical assets",
      "source": "https://investopedia.com/..."
    }
  ],
  "confidence_score": 0.95,
  "citations": [
    {
      "id": "src_1",
      "url": "https://investopedia.com/...",
      "title": "Bitcoin Definition",
      "trust_tier": "tier_2",
      "snippet": "..."
    }
  ],
  "sources_used": 4,
  "models_consulted": 3,
  "factual_queries": ["What is Bitcoin?", "What is Bitcoin backed by?"],
  "latency_ms": 14000
}

Key Concept: Hallucination Correction. If the internal arbitration models hallucinate a fact (e.g., they all claim the current US President is Joe Biden), the grounding engine detects the contradiction against live web evidence, flags the claim as CRITICAL, and extracts the true fact (Donald Trump). The synthesis engine is then forced to use the corrected fact in the final content. These corrections are returned in the corrections_made array for transparency.
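
A minimal client sketch (using the requests library and the fields shown above) that surfaces these corrections before displaying the content:

import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Write a short essay on Bitcoin", "format": "essay",
          "tone": "professional", "max_words": 500, "mode": "fast"},
    headers={"X-API-Key": "rnp_your_secret_key"},
    timeout=60,
)
result = resp.json()

# Surface any hallucination corrections before using the content.
if result["grounding_status"] == "unverifiable":
    print("Warning: content could not be grounded in live evidence.")
for fix in result.get("corrections_made", []):
    print(f'Corrected: "{fix["wrong_claim"]}" -> "{fix["correct_fact"]}" ({fix["source"]})')

print(result["content"])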

Verify v1 API (Primary)

The primary product surface for production integrations. The authoritative JSON models live in the interactive schema at /docs (OpenAPI) and in src/verify_schemas.py in the repo.

POST /v1/verify

Submit one or more model outputs and receive a structured trust verdict with explicit action guidance.

Optional request fields include domain_risk_class, tenant_id, allow_escalation, and policy_overrides; each output may include cited_sources (URLs).

{
  "query": "Does the EU Pro plan include feature X?",
  "domain": "policy",
  "mode": "balanced",
  "outputs": [
    {"model":"gpt-4o","provider":"openai","claim":"Yes...","confidence":0.90},
    {"model":"claude-3-5-sonnet","provider":"anthropic","claim":"No...","confidence":0.86}
  ]
}

Response (illustrative; arrays/objects may be longer in production):

{
  "schema_version": "1.0.0",
  "run_id": "v_ab12cd34ef56",
  "created_at": "2026-01-15T12:00:00+00:00",
  "safe_to_display": false,
  "risk_level": "HIGH",
  "recommended_action": "HUMAN_REVIEW",
  "confidence_score": 0.42,
  "consensus_claim": "",
  "warnings": ["Multiple models disagree on material facts."],
  "failure_modes": [
    {
      "kind": "search_outage",
      "description": "Live evidence fetch degraded.",
      "severity": "HIGH",
      "related_atom_ids": ["a_abc123def0"]
    }
  ],
  "blocked_atoms": [],
  "edit_suggestions": [],
  "atoms": [
    {
      "id": "a_abc123def0",
      "text": "The EU Pro plan includes feature X.",
      "type": "fact",
      "sensitivity": "medium",
      "source_models": ["gpt-4o"]
    }
  ],
  "atom_verdicts": [
    {
      "atom_id": "a_abc123def0",
      "status": "conflicting",
      "authority_summary": {"tier_1": 0, "tier_2": 1, "tier_3": 0, "tier_4": 0},
      "nli_summary": {"entailment": 0, "contradiction": 1, "neutral": 0},
      "diversity_groups": 1,
      "freshness_ok": true,
      "notes": []
    }
  ],
  "evidence": [
    {
      "id": "e_xyz789",
      "url": "https://example.com/pricing",
      "hostname": "example.com",
      "authority_tier": "tier_2",
      "diversity_group": "example.com",
      "title": "",
      "snippet": "...",
      "relevance": 0.71,
      "nli_label": "contradiction",
      "nli_confidence": 0.82,
      "is_outdated": false
    }
  ],
  "models_used": ["gpt-4o", "claude-3-5-sonnet"],
  "policy": {
    "version": "1.0.0",
    "rules_fired": ["evidence.search_outage"],
    "domain_risk_class": "HIGH",
    "mode_used": "balanced",
    "escalated": false,
    "escalation_reason": null
  },
  "uncertainty": {
    "disclaimer": "Groundr reduces risk by arbitrating AI outputs and checking sources; it does not guarantee factual truth.",
    "notes": ["Review cited sources before acting."]
  },
  "atomizer_version": "1.0.0",
  "latency_ms": 1840,
  "cost_estimate_cents": 0.0
}
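
A common host pattern is to branch on recommended_action before rendering anything. A sketch where render, apply_edits, queue_for_review, and suppress_output are hypothetical hooks in the host product:

def handle_verdict(verdict: dict) -> None:
    """Dispatch on Groundr's verdict. The called hooks are hypothetical."""
    action = verdict["recommended_action"]
    if action == "SHOW":
        render(verdict["consensus_claim"])
    elif action == "SHOW_WITH_WARNING":
        render(verdict["consensus_claim"], warnings=verdict["warnings"])
    elif action == "EDIT_SUGGESTED":
        apply_edits(verdict["edit_suggestions"])
    elif action == "HUMAN_REVIEW":
        queue_for_review(verdict["run_id"])
    else:  # BLOCK
        suppress_output(verdict["run_id"])
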
POST /v1/verify/{run_id}/override

Record what the user did after seeing the verdict. This powers the override-rate KPI.

{"user_action_taken": "HUMAN_REVIEW"}

Use user_action_taken to describe the action the host product actually took. The common pattern: when the user followed Groundr's guidance, send the same string as the verify response's recommended_action (SHOW, SHOW_WITH_WARNING, EDIT_SUGGESTED, HUMAN_REVIEW, or BLOCK); send IGNORED when the user dismissed or overrode that guidance (IGNORED is not an enum value on the verify response, but it is accepted here for analytics).
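
For example, reporting that the user followed the HUMAN_REVIEW recommendation (a sketch reusing the run_id from the sample response above):

import requests

run_id = "v_ab12cd34ef56"  # from the verify response

requests.post(
    f"http://localhost:8000/v1/verify/{run_id}/override",
    json={"user_action_taken": "HUMAN_REVIEW"},  # or "IGNORED" if the user dismissed the verdict
    headers={"X-API-Key": "rnp_your_secret_key"},
    timeout=10,
)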

GET /v1/kpis

Returns operational trust KPIs (false-safe rate, interception rate, action mix, override rate, p95 latency). See Authentication for when X-API-Key is required.

GET /v1/audit/{tenant_id}

Tenant-scoped decision replay export for governance and compliance review.

GET /v1/providers/me/report

Provider-isolated feedback report. Requires X-Provider-Token.

GET /v1/health

Health and schema/policy version metadata for the v1 stack.

Legacy API

Legacy compatibility endpoints are still available but are no longer the primary integration path.

POST /validate/query

Legacy query arbitration endpoint.

POST /validate/

Legacy multi-output validation endpoint.

GET /validate/health

Legacy health endpoint.

Migration guide

POST /validate/query             -> POST /v1/verify (supply outputs[] or host-generated drafts)
GET  /validate/health            -> GET  /v1/health
legacy UI verdict parsing        -> use recommended_action + policy.rules_fired + uncertainty.notes

Engine API

Low-level endpoints for direct interaction with the negotiation engine.

POST /agents

Register a new AI agent with the negotiation engine.

{
  "agent_id": "gpt-4",
  "name": "GPT-4 Turbo",
  "expertise": {"geography": 0.9, "finance": 0.85}
}

GET /agents

List all registered agents with reputation info, wins, losses, and active assertions.

GET /agents/{agent_id}/reputation

Get detailed reputation breakdown for a specific agent, including win rate and expertise scores.

POST /assertions

Submit an assertion from a registered agent.

{
  "agent_id": "gpt-4",
  "domain": "geography",
  "claim": "Lusaka is the capital of Zambia",
  "confidence": 0.98,
  "evidence": [
    {
      "source": "https://en.wikipedia.org/wiki/Zambia",
      "reliability": 0.85,
      "verification_tier": "verified",
      "is_truth_anchor": false
    }
  ]
}

POST /negotiate

Trigger the full negotiation pipeline: detect conflicts and resolve them. Supports all strategies.

{
  "strategy": "majority_vote"
}
// Valid strategies: "majority_vote", "bayesian_update", "weighted_consensus"

GET /shared-reality

Retrieve the current "Shared Reality" — all verified, arbitrated truths across all domains.

GET /conflicts

List all detected conflicts with semantic similarity scores between competing claims.

WebSocket API

Connect to ws://localhost:8000/ws/negotiate for real-time streaming of negotiation events.

Events Received

  • agent_registered — When a new agent joins
  • assertion_submitted — When a new claim is made
  • negotiation_complete — When arbitration finishes with results

Commands

// Trigger a negotiation
{"action": "negotiate", "strategy": "bayesian_update"}

// Get current system status
{"action": "status"}

JavaScript Example

const ws = new WebSocket('ws://localhost:8000/ws/negotiate');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.event === 'negotiation_complete') {
    console.log(`Resolved ${data.total_conflicts} conflicts`);
  }
};

// Trigger negotiation
ws.send(JSON.stringify({
  action: 'negotiate',
  strategy: 'majority_vote'
}));

Core Algorithms

Evidence-First Scoring

Score = 0.4 × Confidence + 0.6 × Evidence

This 40/60 weighting (the "40/60 Rule" in the data flow above) ensures evidence always outweighs raw model confidence. Unsupported claims are systematically down-ranked, regardless of how confident the model appears.
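
Illustratively, as a sketch of the weighting (not the repo's exact implementation):

def arbitration_score(confidence: float, evidence_weight: float) -> float:
    """Evidence-first scoring: evidence (60%) outweighs model confidence (40%)."""
    return 0.4 * confidence + 0.6 * evidence_weight

# A confident but unsupported claim loses to a modest, well-evidenced one:
arbitration_score(0.99, 0.0)   # 0.396
arbitration_score(0.70, 0.80)  # 0.76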

Truth Anchor Multiplier

Effective Reliability = Base Reliability × Authority Bonus

When evidence comes from a high-authority Truth Anchor domain, its base reliability is multiplied by 1.5 (see Truth Anchor Sources). This creates a "gravity well" for verified truth, allowing a single well-sourced minority claim to defeat an unsourced majority.
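
A sketch of the multiplier, using the 1.5× bonus from the Truth Anchor Sources list (the 1.0 cap is an assumption of this sketch):

TRUTH_ANCHOR_BONUS = 1.5  # multiplier from the Truth Anchor Sources list below

def effective_reliability(base_reliability: float, is_truth_anchor: bool) -> float:
    """Apply the authority bonus; capping at 1.0 is an assumption of this sketch."""
    if is_truth_anchor:
        return min(1.0, base_reliability * TRUTH_ANCHOR_BONUS)
    return base_reliability

# A 0.85-reliability citation from who.int saturates at the cap:
effective_reliability(0.85, True)  # 1.0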

Hallucination Snowball Detection

If the consensus cluster's average evidence weight < 0.3 → flag "Consensus Hallucination" warning

When all models agree but none have evidence, Groundr flags the result as a "Consensus Hallucination." This prevents the dangerous scenario where multiple AI models reinforce each other's lies because they share the same training-data bias.
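
A sketch of the check, using the 0.3 threshold cited under Majority Vote below:

CONSENSUS_EVIDENCE_THRESHOLD = 0.3  # threshold cited under Majority Vote below

def is_consensus_hallucination(evidence_weights: list[float]) -> bool:
    """Flag unanimous agreement that rests on little or no evidence."""
    if not evidence_weights:
        return True
    return sum(evidence_weights) / len(evidence_weights) < CONSENSUS_EVIDENCE_THRESHOLD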

Temporal Decay

Decayed Confidence = Confidence × decay_factor(time_elapsed)

Claims lose confidence over time, encouraging frequent re-validation. This prevents stale assertions from persisting in the Shared Reality when newer, more accurate information becomes available.
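
One plausible decay_factor is an exponential half-life (a sketch; the actual curve and constants live in the engine):

def decayed_confidence(confidence: float, hours_elapsed: float,
                       half_life_hours: float = 24.0) -> float:
    """Exponential half-life decay; the curve shape and 24h half-life are
    assumptions of this sketch, not the engine's actual constants."""
    return confidence * 0.5 ** (hours_elapsed / half_life_hours)

decayed_confidence(0.9, 48.0)  # 0.225 after two half-lives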

Negotiation Strategies

Majority Vote

The most common cluster of semantically similar claims wins. Enhanced with evidence-awareness: if the majority cluster's average evidence weight is below 0.3, a "Consensus Hallucination" warning is triggered. A minority with a Truth Anchor can override the majority.

Bayesian Update

Uses agent reputation as a prior and evidence as the likelihood function. Agents with higher win rates get more weight, but evidence can still override reputation. Best for domains requiring long-term expertise tracking.

Weighted Consensus

Direct weighting based on pre-defined expertise scores per domain. Useful when you know which models excel in specific areas (e.g., GPT-4 for code, Gemini for multimodal).

Evidence & Grounding

Groundr supports three tiers of evidence verification:

Verified

Evidence from external, authoritative sources that have been cross-referenced. Highest reliability.

Self-Reported

Evidence provided by the agent itself without external verification. Moderate reliability.

Unverified

No evidence provided. Lowest reliability — the agent is making a claim purely on internal knowledge.

Truth Anchor Sources

The following domains are automatically flagged as Truth Anchors with a 1.5× reliability bonus:

.gov domains     — Government sources
reuters.com      — Reuters news agency
nature.com       — Nature research journal
who.int          — World Health Organization
worldbank.org    — World Bank
un.org           — United Nations
nasa.gov         — NASA
esa.int          — ESA

Risk Levels

Every validation response includes a risk assessment:

Level      Confidence   Criteria                                               Recommendation
LOW        ≥ 0.8        High agreement, strong evidence (ratio ≤ 0.2)          Safe to display to users
MEDIUM     0.6 – 0.8    Partial agreement or moderate evidence (ratio ≤ 0.5)   Display with disclaimer
HIGH       0.4 – 0.6    Conflict detected, weak evidence                       Do not display without review
CRITICAL   < 0.4        Consensus hallucination suspected                      Block output entirely
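
The confidence-to-level mapping, as a sketch (boundary handling at the shared band edges is an assumption):

def risk_level(confidence: float) -> str:
    """Map a confidence score to the bands in the table above."""
    if confidence >= 0.8:
        return "LOW"
    if confidence >= 0.6:
        return "MEDIUM"
    if confidence >= 0.4:
        return "HIGH"
    return "CRITICAL"

risk_level(0.42)  # "HIGH", matching the sample verify response above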

Deployment

Docker

# Build and run with Docker Compose
docker-compose up --build

# The API will be available at http://localhost:8000

Environment Variables

Variable               Description                               Required
OPENAI_API_KEY         OpenAI API key for GPT models             Yes
ANTHROPIC_API_KEY      Anthropic API key for Claude              Yes
GOOGLE_API_KEY         Google API key for Gemini                 Yes
DATABASE_URL           PostgreSQL connection string              Production only
BRAVE_SEARCH_API_KEY   Brave Search key for evidence gathering   Optional
SERPAPI_KEY            SerpAPI key for evidence gathering        Optional