Cohere
Lightweight, dependency-free, in-memory Cohere v2 API fake for testing code that uses the real cohere-ai SDK (and the language-agnostic v2 REST API). All generated content is deterministic — text, embedding vectors, and rerank scores are derived from a hash of the input.
Default port: 4754
Quick start
import { CohereServer } from "./services/cohere/src/server.js";
const server = new CohereServer(4754);
await server.start();
// ... run your app/tests ...
await server.stop();
Point the real cohere-ai client at it via environment:
import { CohereClientV2 } from "cohere-ai";
const cohere = new CohereClientV2({
token: "parlel",
environment: "http://127.0.0.1:4754",
});
const response = await cohere.chat({
model: "command-r-plus",
messages: [{ role: "user", content: "Hello Cohere" }],
});
// response.message.content[0].text => deterministic text
Implemented operations
All /v2/* routes require an Authorization: Bearer <key> header (any non-empty bearer token is accepted). State is in-memory and ephemeral.
POST /v2/chat— chat. Returns{ id, message: { role, content: [{ type: "text", text }] }, finish_reason, usage }. Supportsstream: truevia the Cohere v2 SSE event types (message-start,content-start,content-delta,content-end,message-end), ending withdata: [DONE].POST /v2/embed— embeddings. Returns{ id, embeddings: { float: [...] }, texts, meta }(deterministic 1024-dim vectors). Requirestexts,model,input_type.POST /v2/rerank— rerank documents against a query. Returns{ id, results: [{ index, relevance_score }], meta }, sorted descending, honoringtop_n.
Service & inspection operations (parlel extensions)
GET /— service metadata.GET /health— health check ({ status: "ok" }).POST /__parlel/reset— reset all in-memory state.GET /__parlel/requests— list captured requests.DELETE /__parlel/requests— clear the captured request log.
SDK usage example
import cohere
co = cohere.ClientV2(api_key="parlel", base_url="http://127.0.0.1:4754")
res = co.rerank(
model="rerank-english-v3.0",
query="What is parlel?",
documents=["parlel is a tool", "unrelated", "another parlel mention"],
top_n=2,
)
for r in res.results:
print(r.index, r.relevance_score)
Access via MCP / preview URL
HTTP services are auto-exposed at the Daytona preview URL. Send requests to the preview URL with the x-daytona-preview-token header. Set COHERE_BASE_URL to the preview URL.
Surface coverage
This emulator faithfully replicates the API surface most application code and agents exercise. Anything below the supported lines is either an intentional design choice for a fast, zero-cost local emulator (✓ By design) or a candidate for a future release (⟳ Roadmap) — never a silent inaccuracy.
Legend: ✅ fully supported · ◐ accepted (stored, not strictly enforced) · ✓ by design · ⟳ on the roadmap.
| Feature | Status |
|---|---|
v2/chat (+ streaming SSE) | ✅ Supported |
v2/embed (deterministic float vectors) | ✅ Supported |
v2/rerank (sorted scores, top_n) | ✅ Supported |
| Request inspection | ✅ Supported (parlel extension) |
| Real model inference / quality | ✓ By design — Deterministic stub output — repeatable assertions, no API spend |
tools / connectors / web search | ◐ Accepted, not executed |
int8 / binary embedding types | ◐ Only float returned |
| Token / billing counts | ◐ Approximate word-based |
| Bearer-token validity / quota | ✓ By design — Never throttles — local tests run at full speed, zero cost |
Manifest
See services/cohere/manifest.json:
- name:
cohere, image:parlel/cohere:1 - port:
4754, protocol:http, healthcheck:/health, startup ≈ 100ms - env:
COHERE_API_KEY,CO_API_KEY,COHERE_BASE_URL
Configuration — test.env
Copy these into your test.env (used by the bridge sidecar flow). Tokens are Parlel's seeded test credentials — any non-empty value is accepted by the emulator, so you rarely need to change them. Swap in real credentials only when pointing at the live service in prod.env.
COHERE_API_KEY=parlel
CO_API_KEY=parlel
COHERE_BASE_URL=http://parlel-bridge:4754
<!-- parlel:testenv:end -->