Methodology

This page documents how War-Tracker ingests, classifies, and publishes conflict events, the confidence levels we attach to each event, and the deliberate choices we make about what we do not claim.

Sources

Every event originates from a public social media post. We ingest post metadata and media from a curated set of public accounts (military bloggers, regional news desks, open-source intelligence aggregators, and official statements). Source URLs are kept in our internal pipeline for deduplication and quality control but are not redistributed via the public API. Every event is reviewed and classified before publication, and the rendered article is the canonical citation surface.

Classification pipeline

Each new post is sent to an LLM-based classifier, which extracts: event type (military strike, naval engagement, drone attack, ground assault, recruitment, etc.), location (free text plus a best-effort country code), perpetrator/actor, victim/target, fatalities, injured, non-human losses, and a one-paragraph human-readable description. A separate transcription pass extracts speech from video media. A subset of high-impact events is then human-reviewed by the editorial team.
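To make the extract easier to picture, here is a minimal sketch of how one classifier result could be represented. It is not the actual War-Tracker schema: the class name EventExtract, the helper missing_fields, and every field name other than perpetrator, fatalities, injured, and non_human_losses are assumptions made for illustration, as are the field types.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EventExtract:
    """Hypothetical shape of one classifier extract (illustrative names only)."""
    event_type: str                         # e.g. "drone attack"
    location_text: str                      # free-text location from the post
    description: str                        # one-paragraph human-readable summary
    country_code: Optional[str] = None      # best-effort; None when unresolved
    perpetrator: Optional[str] = None       # as stated by the source post
    victim: Optional[str] = None            # target as stated by the source post
    fatalities: Optional[int] = None        # None means "not stated", not zero
    injured: Optional[int] = None
    non_human_losses: Optional[str] = None  # type is an assumption
    transcript: Optional[str] = None        # filled by the transcription pass


def missing_fields(extract: EventExtract) -> list[str]:
    """Return the names of optional fields the classifier left empty."""
    return [f.name for f in fields(extract) if getattr(extract, f.name) is None]
```

Keeping "not stated" (None) distinct from an explicit zero matters for the casualty fields, since a post that reports no fatalities is a different signal from a post that does not mention them.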

Confidence levels

LOW — single-source or ambiguous classification. Treated as a tip, not as a finding.
MEDIUM — corroborated by classifier heuristics (location match, actor/victim plausibility, media presence) but not human-reviewed.
HIGH — multiple corroborating signals or human review. These are the events we are willing to defend in a published feed.
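The three levels above can be read as a small decision rule. The sketch below is one possible reading, not the production logic: the function name, its parameters, the two-source threshold for "multiple corroborating signals", and the precedence between the rules are all assumptions.

```python
from enum import Enum


class Confidence(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def assign_confidence(human_reviewed: bool, ambiguous: bool,
                      source_count: int, heuristics_ok: bool) -> Confidence:
    """Map review and corroboration signals to a confidence level.

    Assumed precedence: human review wins, then ambiguity forces LOW,
    then multi-source corroboration, then classifier heuristics alone.
    """
    if human_reviewed:
        return Confidence.HIGH
    if ambiguous:
        return Confidence.LOW
    if source_count >= 2 and heuristics_ok:
        # Assumption: two or more sources plus passing heuristics
        # count as "multiple corroborating signals".
        return Confidence.HIGH
    if heuristics_ok:
        # Location match, actor/victim plausibility, media presence.
        return Confidence.MEDIUM
    return Confidence.LOW
```

Under these assumptions, a single-source post that passes every heuristic lands at MEDIUM, and the same post with an ambiguous classification drops to LOW.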

Update cadence

The ingest pipeline runs continuously; the classifier processes new posts within 1 to 5 minutes of the original publication. Our sharded sitemaps reflect every classified event within about 5 minutes of classification, so search engines and AI crawlers can surface news-style events in near real time. The /sitemaps/events-recent.xml shard is the freshness frontier; monthly and weekly shards backfill the historical corpus.

What we do not claim

We do not claim military attribution. The perpetrator field reflects what the source post says, not a forensic attribution.
We do not claim casualty exactness. The fatalities, injured, and non_human_losses fields are best-effort extracts; round to the nearest order of magnitude when citing.
We do not endorse the source accounts. Account inclusion is a sampling decision, not an editorial endorsement.
We do not write event bodies with an LLM. Every paragraph rendered on /share/{id} is derived from the underlying classifier extract or from the source post; the LLM is used only to structure the post, not to narrate it.
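As an illustration of the "round to the nearest order of magnitude" advice for casualty figures, here is a small sketch. The function name is hypothetical, and it assumes "nearest" is measured on a logarithmic scale, which the page does not specify.

```python
import math
from typing import Optional


def round_to_order_of_magnitude(count: Optional[int]) -> Optional[int]:
    """Round a casualty count to the nearest power of ten (log scale).

    None (figure not stated) is passed through unchanged.
    """
    if count is None:
        return None
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return 0
    # Rounding the base-10 logarithm picks the closest power of ten.
    return 10 ** round(math.log10(count))


print(round_to_order_of_magnitude(12))   # 10
print(round_to_order_of_magnitude(37))   # 100
print(round_to_order_of_magnitude(250))  # 100
```

When citing, a phrasing such as "on the order of 100" conveys the same uncertainty without implying an exact count.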

Public JSON API and citation

The complete machine-readable feed is at https://war-tracker.com/api/v1/events. The OpenAPI 3.x spec is at https://war-tracker.com/api/v1/openapi.json. For canonical per-event citation, use the slugged URL: https://war-tracker.com/share/{event_id}/{slug}.
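A minimal sketch of consuming the feed and building a citation link, using only the Python standard library. The helper names are hypothetical, and the shape of the JSON response is not assumed here; consult the OpenAPI spec for the actual schema and any query parameters.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://war-tracker.com"


def citation_url(event_id, slug: str) -> str:
    """Build the canonical per-event URL: /share/{event_id}/{slug}."""
    safe_id = quote(str(event_id), safe="")
    safe_slug = quote(slug, safe="")
    return f"{BASE_URL}/share/{safe_id}/{safe_slug}"


def fetch_events(timeout: float = 10.0):
    """Fetch the public feed and return the parsed JSON as-is.

    No assumption is made about the response structure.
    """
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/events",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# The event id and slug below are placeholders, not real records.
print(citation_url(123, "example-slug"))
```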

Authority slot configuration

Future contributors registering a War-Tracker presence on Wikidata, X, LinkedIn, or any other authority platform should add the canonical URL to the ORGANIZATION_SAME_AS_URLS environment variable (comma-separated). The same environment variable is consumed by the FastAPI Organization JSON-LD on every server-rendered page and by the vite.config.ts build-time inline JSON-LD in index.html, so a single deploy publishes the new sameAs link to AI crawlers.
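The sketch below shows one way a comma-separated ORGANIZATION_SAME_AS_URLS value could be turned into a schema.org Organization JSON-LD object. It is an illustration only: the function names are hypothetical, and the whitespace trimming and de-duplication are assumptions about how the value is parsed, not a description of the actual server code.

```python
import json
import os


def parse_same_as(raw: str) -> list[str]:
    """Split a comma-separated value into clean, de-duplicated URLs."""
    urls: list[str] = []
    for part in raw.split(","):
        url = part.strip()
        if url and url not in urls:  # skip blanks and repeats, keep order
            urls.append(url)
    return urls


def organization_json_ld(name: str, url: str, same_as: list[str]) -> dict:
    """Build a schema.org Organization object; omit sameAs when empty."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
    }
    if same_as:
        doc["sameAs"] = same_as
    return doc


same_as = parse_same_as(os.environ.get("ORGANIZATION_SAME_AS_URLS", ""))
print(json.dumps(
    organization_json_ld("War-Tracker", "https://war-tracker.com", same_as),
    indent=2,
))
```

Reading the value from a single environment variable in both the server and the build step is what lets one deploy update every rendered sameAs list at once.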