Architecture & next steps

How the prototype works, and where it goes next

Today every verdict comes from transparent, deterministic heuristics that run inside the app's own server functions. That makes the prototype fast, free, and explainable. The roadmap below replaces each heuristic with a real model while keeping the same result contract.

The pipeline today

01 Upload: Text, image, or video in the browser

02 Encode: Files read as base64, sent to a server function

03 Inspect: Header parse + metadata + statistical features

04 Score: Weighted signals combine into 0-100

05 Explain: Findings returned with direction and weight
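
The score and explain steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's actual code: the `Finding` and `Verdict` shapes, the `"ai"` / `"human"` direction labels, and the formula (weighted net evidence mapped around a neutral 50) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """One explainable signal (hypothetical shape, not the real contract)."""
    label: str
    direction: str   # "ai" pushes the score up, "human" pushes it down
    weight: float    # relative importance, greater than zero
    strength: float  # how strongly the signal fired, 0.0 to 1.0


@dataclass(frozen=True)
class Verdict:
    score: int               # 0 = likely human, 100 = likely AI-generated
    findings: List[Finding]  # returned so the UI can explain the score


def score_findings(findings: List[Finding]) -> Verdict:
    """Combine weighted signals into a 0-100 score centred on 50."""
    total_weight = sum(f.weight for f in findings)
    if total_weight == 0:
        # No evidence either way: stay neutral.
        return Verdict(score=50, findings=list(findings))
    net = 0.0
    for f in findings:
        sign = 1.0 if f.direction == "ai" else -1.0
        net += sign * f.weight * f.strength
    # net / total_weight lies in [-1, 1]; map it onto [0, 100].
    score = round(50 + 50 * net / total_weight)
    return Verdict(score=max(0, min(100, score)), findings=list(findings))
```

Under these assumptions, a strong generator tag (weight 3) set against camera EXIF counter-evidence (weight 1) lands at 75, and the findings list travels with the score so each contribution stays visible.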

Text

Sentence-length burstiness, lexical variety, connective density, register cues, and stock-phrase matching approximate what a perplexity model measures.
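
Two of these proxies are simple enough to sketch. The definitions below are assumptions chosen for illustration (burstiness as the coefficient of variation of sentence length, lexical variety as a type-token ratio); the prototype's exact formulas are not given here and may differ.

```python
import re
from statistics import mean, pstdev
from typing import List


def sentence_lengths(text: str) -> List[int]:
    """Word counts per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (std dev / mean).

    Assumed definition: values near 0 mean very uniform sentences.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def lexical_variety(text: str) -> float:
    """Type-token ratio: distinct words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0
```

Both measures are length-sensitive, which is one reason heuristics like these misfire on short samples.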

Image

Byte-level scan for EXIF/XMP/C2PA generator fingerprints (DALL·E, Midjourney, Firefly), camera-EXIF counter-evidence, and common output resolutions.
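
A byte-level scan of this kind might look like the sketch below. The marker strings and the resolution list are illustrative placeholders, not the prototype's real tables, and a substring match on raw bytes is a deliberate simplification of proper EXIF/XMP/C2PA parsing.

```python
from typing import Dict, List

# Hypothetical marker tables: lowercase byte patterns mapped to display names.
GENERATOR_MARKERS: Dict[bytes, str] = {
    b"dall-e": "DALL·E",
    b"midjourney": "Midjourney",
    b"firefly": "Firefly",
}
CAMERA_MARKERS: List[bytes] = [b"canon", b"nikon", b"iphone"]

# Illustrative sizes only; real generator output sizes vary by tool and version.
COMMON_OUTPUT_RESOLUTIONS = {(512, 512), (1024, 1024), (1792, 1024), (1024, 1792)}


def scan_image_bytes(data: bytes) -> Dict[str, List[str]]:
    """Case-insensitive search of the raw file bytes for known markers."""
    haystack = data.lower()  # bytes.lower() only affects ASCII letters
    generators = [name for marker, name in GENERATOR_MARKERS.items() if marker in haystack]
    cameras = [marker.decode("ascii") for marker in CAMERA_MARKERS if marker in haystack]
    return {"generators": generators, "cameras": cameras}


def is_common_output_resolution(width: int, height: int) -> bool:
    """True when the dimensions match one of the assumed generator sizes."""
    return (width, height) in COMMON_OUTPUT_RESOLUTIONS
```

A generator hit is strong evidence, while a camera hit is only counter-evidence, since metadata can be stripped or forged.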

Video

Container detection (MP4/MOV, Matroska, AVI) plus a metadata scan for generator and capture-device tags. Frame analysis is stubbed and reported as pending.
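
Container detection can be done from the first few bytes of the file. The sketch below assumes the upload arrives as base64 (as in the encode step) and checks well-known signatures: an `ftyp` box for MP4/MOV, the EBML header for Matroska, and a RIFF header with an `AVI ` form type. The function names are hypothetical, and edge cases such as older MOV files without an `ftyp` box are not handled.

```python
import base64
from typing import Optional


def detect_container(header: bytes) -> Optional[str]:
    """Identify the container from the first 12 bytes, or return None."""
    # ISO base media files (MP4, MOV): 4-byte box size, then "ftyp", then brand.
    if len(header) >= 12 and header[4:8] == b"ftyp":
        brand = header[8:12]
        return "MOV" if brand == b"qt  " else "MP4"
    # Matroska/WebM: the file starts with the EBML header ID.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska"
    # AVI: RIFF chunk whose form type is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    return None


def detect_container_from_base64(payload: str) -> Optional[str]:
    """Decode a base64 upload and inspect only its leading bytes."""
    return detect_container(base64.b64decode(payload)[:12])
```

Returning `None` for an unknown header lets the caller report the container as unrecognised rather than guess.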

From heuristics to models

Text: swap heuristics for a fine-tuned classifier

Replace the statistical proxies with a RoBERTa-based detector fine-tuned on human-vs-generated pairs, and score real per-token perplexity and burstiness with a language model alongside it. Run the detector as a hosted inference endpoint and fall back to the current heuristics when the model is unavailable.

RoBERTa
DetectGPT
perplexity
ONNX Runtime
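
The fallback behaviour could be wired up along these lines. This is a sketch only: `score_text`, its callable parameters, and the choice of which exceptions count as "unavailable" are assumptions, and a real endpoint client would raise its own error types.

```python
from typing import Callable, Tuple

Scorer = Callable[[str], float]


def score_text(text: str, model_call: Scorer, heuristic_call: Scorer) -> Tuple[float, str]:
    """Try the hosted model first; fall back to heuristics if it is unreachable.

    Returns the score together with its source so the result can say
    which path produced the verdict.
    """
    try:
        return model_call(text), "model"
    except (ConnectionError, TimeoutError):
        # Assumed failure modes for a hosted endpoint; other errors propagate.
        return heuristic_call(text), "heuristic"
```

Returning the source alongside the score keeps the result contract unchanged while still telling the reader whether the model or the heuristics answered.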

Image: pixel-level model detection

Add a CLIP / ViT classifier and a frequency-domain (FFT) artifact detector to catch generated images that carry no metadata at all. Keep the metadata scan as a fast, high-precision first pass and let the model decide the ambiguous middle.

CLIP
ViT
FFT artifacts
diffusion fingerprints
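
The two-stage arrangement can be expressed as a small cascade. The thresholds (10 and 90) and the function name are hypothetical; the point of the sketch is only that the model is called when the metadata score falls in the ambiguous middle.

```python
from typing import Callable, Tuple


def classify_image(
    metadata_score: int,
    model_call: Callable[[], int],
    low: int = 10,
    high: int = 90,
) -> Tuple[int, str]:
    """Accept confident metadata verdicts; defer the rest to the model.

    metadata_score is a 0-100 score from the metadata scan. The low and
    high cut-offs are assumed values, not tuned thresholds.
    """
    if metadata_score <= low or metadata_score >= high:
        return metadata_score, "metadata"  # fast path, model never runs
    return model_call(), "model"
```

Passing the model as a zero-argument callable means the expensive inference is never triggered on the fast path.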

Video: frame-level consistency

Decode sampled frames, run the image model per frame, and add temporal checks for flicker, identity drift, and warping artifacts. Lip-sync and blink-rate analysis target face-swap and avatar deepfakes specifically.

ffmpeg
optical flow
temporal coherence
deepfake cues
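
Frame sampling and a first temporal check might be sketched as below. Both functions are simplified assumptions: timestamps are spaced evenly at segment midpoints, and flicker is approximated as the mean absolute change in per-frame brightness, which is a crude stand-in for optical-flow or identity-based checks.

```python
from typing import List


def sample_timestamps(duration_s: float, count: int) -> List[float]:
    """Evenly spaced timestamps at segment midpoints, in seconds."""
    if duration_s <= 0 or count <= 0:
        return []
    step = duration_s / count
    return [step * (i + 0.5) for i in range(count)]


def flicker_score(luminance: List[float]) -> float:
    """Mean absolute brightness change between consecutive sampled frames.

    luminance holds one mean-brightness value per decoded frame; how those
    values are computed is left to the decoder and is not shown here.
    """
    if len(luminance) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(luminance, luminance[1:])]
    return sum(diffs) / len(diffs)
```

Each timestamp would be handed to a decoder such as ffmpeg to extract a frame, and the per-frame image scores and flicker value would then become separate weighted signals.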

External detection APIs

For higher confidence and broader coverage, ensemble third-party detectors behind the same result contract. Each provider becomes one weighted signal alongside the in-house model, so a single vendor never owns the verdict.

Hive AI
Sensity
Reality Defender
C2PA verify
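
One way to ensemble providers is a weighted mean over whichever signals came back. In this sketch the provider names, the weights, and the `min_signals` rule (refuse to produce a verdict from fewer than two sources) are assumptions used to illustrate the idea that no single vendor decides alone.

```python
from typing import Dict, Optional


def ensemble_score(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
    min_signals: int = 2,
) -> Optional[float]:
    """Weighted mean of available 0-100 scores, or None if too few responded.

    A score of None marks a provider that failed or timed out; its weight
    is dropped and the remaining weights are renormalised.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and weights.get(name, 0.0) > 0
    }
    if len(available) < min_signals:
        return None  # assumed policy: never let one source own the verdict
    total = sum(weights[name] for name in available)
    return sum(weights[name] * score for name, score in available.items()) / total
```

With an in-house score of 80 at weight 2 and one vendor score of 60 at weight 1, the ensemble comes out at about 73.3, and a missing third vendor simply drops out of the average.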

Scaling file processing

Inline analysis works for text and small files, but model inference and frame decoding are slow and memory-hungry. The path to volume is a queue-and-worker split so uploads never block on processing.

1. Queue uploads instead of blocking. On upload, store the file in object storage and enqueue a job (e.g. a Redis/BullMQ or SQS queue). The request returns an analysis ID immediately.
2. Run inference on GPU workers. A pool of background workers pulls jobs, runs the model (and ffmpeg frame decoding for video), and writes results back to the database keyed by analysis ID.
3. Stream results to the client. The UI polls or subscribes (SSE/WebSocket) on the analysis ID and renders findings as each signal completes, with cached results for repeat files by content hash.
4. Rate-limit and observe. Per-key rate limits, size caps, content hashing for dedupe, and metrics on model latency and agreement keep cost and accuracy in view as volume grows.
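
The queue-and-worker split can be illustrated with an in-memory stand-in. Everything here is a simplification under stated assumptions: dictionaries replace object storage and the database, `queue.Queue` replaces Redis/BullMQ or SQS, and the function names are hypothetical.

```python
import hashlib
import queue
import uuid
from typing import Callable, Dict, Optional

jobs = queue.Queue()                 # stand-in for the job queue
object_store: Dict[str, bytes] = {}  # analysis ID -> uploaded bytes
results: Dict[str, dict] = {}        # analysis ID -> status and verdict
by_hash: Dict[str, str] = {}         # content hash -> analysis ID (dedupe)


def submit(data: bytes) -> str:
    """Store the upload, enqueue a job, and return an analysis ID at once."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in by_hash:
        return by_hash[digest]  # repeat file: reuse the earlier analysis
    analysis_id = uuid.uuid4().hex
    object_store[analysis_id] = data
    by_hash[digest] = analysis_id
    results[analysis_id] = {"status": "queued"}
    jobs.put(analysis_id)
    return analysis_id


def work_once(analyze: Callable[[bytes], object]) -> Optional[str]:
    """Worker step: pull one job, run the analysis, write the result back."""
    try:
        analysis_id = jobs.get_nowait()
    except queue.Empty:
        return None
    verdict = analyze(object_store[analysis_id])
    results[analysis_id] = {"status": "done", "verdict": verdict}
    return analysis_id


def poll(analysis_id: str) -> dict:
    """Client-side lookup keyed by analysis ID."""
    return results.get(analysis_id, {"status": "unknown"})
```

Because `submit` returns before any analysis runs, the upload request never blocks on inference, and the content hash gives repeat files a cached result for free.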

A note on honesty

No detector is perfect, and any tool that claims certainty is lying. Metadata can be stripped or forged, heuristics misfire on short or edited samples, and even state-of-the-art models carry real false-positive rates. Veridies reports a probability and its reasoning so a human stays in the loop on every call.
