Replicate

Lightweight, dependency-free, in-memory Replicate HTTP API fake for testing code that uses the real replicate Node.js SDK (and the language-agnostic Replicate REST API).

Default port: 4856

Quick start

import { ReplicateServer } from "./services/replicate/src/server.js";

const server = new ReplicateServer(4856);
await server.start();
// ... run your app/tests ...
await server.stop();

Point the real replicate client at it via baseUrl:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: "r8_parlel",
  baseUrl: "http://127.0.0.1:4856",
});

const output = await replicate.run("stability-ai/sdxl:version", {
  input: { prompt: "a cat" },
});
// output => deterministic array derived from the input hash

All generated output is deterministic: prediction outputs are derived from a hash of the input so tests are repeatable.

Access via MCP / preview URL

Base URL: http://127.0.0.1:4856
Health: GET /health → { "status": "ok" }
Root metadata: GET / → { name, version, protocol, documentation }
MCP / agent tooling can target the base URL directly; auth via Authorization: Token r8_... (any non-empty token accepted).

Implemented operations

All /v1/* routes require Authorization: Token <key> (or Bearer). Any non-empty token is accepted.

POST /v1/predictions — create a prediction. Returns 201 with { id, status: "starting", urls, ... }.
GET /v1/predictions/:id — poll a prediction. Resolves to status: "succeeded" with a deterministic output array on the first GET.
POST /v1/predictions/:id/cancel — cancel a running prediction (status: "canceled").
GET /v1/models/:owner/:name — retrieve model metadata including a deterministic latest_version.

Service & inspection operations (parlel extensions)

GET / — service metadata.
GET /health — health check.
POST /__parlel/reset — reset all in-memory state.
GET /__parlel/predictions — list captured predictions.
OPTIONS * — CORS preflight (204).

Surface coverage

This emulator faithfully replicates the API surface most application code and agents exercise. Anything below the supported lines is either an intentional design choice for a fast, zero-cost local emulator (✓ By design) or a candidate for a future release (⟳ Roadmap) — never a silent inaccuracy.

Legend: ✅ fully supported · ◐ accepted (stored, not strictly enforced) · ✓ by design · ⟳ on the roadmap.

Feature	Status
`predictions.create` / `get` / `cancel`	✅ Supported
`models.get`	✅ Supported
Deterministic, reproducible output	✅ Supported
Real model inference / GPU compute	✓ By design — Deterministic stub output — repeatable assertions, no API spend
Webhooks / streaming output URLs	⟳ Roadmap — poll-only
Training / fine-tunes / deployments	⟳ Roadmap
Token validity / quota enforcement	✓ By design — Never throttles — local tests run at full speed, zero cost

Error codes & shapes

Errors use the Replicate envelope: { "detail": "...", "status": <code> }.

Status	When
`401`	missing/invalid `Authorization`
`404`	unknown prediction or endpoint
`422`	malformed request body

Manifest

See services/replicate/manifest.json:

name: replicate, image: parlel/replicate:1.0
port: 4856, protocol: http, healthcheck: /health, startup ≈ 100ms
env: REPLICATE_API_TOKEN, REPLICATE_BASE_URL

Configuration — `test.env`

Copy these into your test.env (used by the bridge sidecar flow). Tokens are Parlel's seeded test credentials — any non-empty value is accepted by the emulator, so you rarely need to change them. Swap in real credentials only when pointing at the live service in prod.env.

REPLICATE_API_TOKEN=r8_parlel
REPLICATE_BASE_URL=http://parlel-bridge:4856