Fireworks AI
Lightweight, dependency-free, in-memory Fireworks AI API fake. Fireworks AI is OpenAI-compatible and serves under /inference/v1. Works with the official openai SDK pointed at it. All output is deterministic (hash-derived); SSE streaming supported.
Default port: 4864
Quick start
import { FireworksAiServer } from "./services/fireworks-ai/src/server.js";
const server = new FireworksAiServer(4864);
await server.start();
// ... run your app/tests ...
await server.stop();
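In a test suite it can help to guarantee that `stop()` runs even when the code under test throws. The following is a minimal sketch that assumes only the `start()` and `stop()` methods shown above; `withServer` is a hypothetical helper name, not part of the emulator.

```javascript
// Hypothetical helper: runs `fn` between start() and stop(), and always stops.
// Assumes only that `server` exposes async start() and stop(), as in the quick start.
async function withServer(server, fn) {
  await server.start();
  try {
    return await fn(server);
  } finally {
    // Runs whether `fn` resolved or threw, so the port is released either way.
    await server.stop();
  }
}
```

It could be called as `await withServer(new FireworksAiServer(4864), async () => { /* run tests */ })`.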
Point the openai SDK at it:
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "parlel-fireworks", baseURL: "http://127.0.0.1:4864/inference/v1" });
const res = await client.chat.completions.create({
  model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
  messages: [{ role: "user", content: "hello" }],
});
Access via MCP / preview URL
- Base URL: `http://127.0.0.1:4864/inference/v1`
- Health: `GET /health` → `{ "status": "ok" }`
- Auth: `Authorization: Bearer <key>` (any non-empty token).
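The health endpoint appears to sit at the server root rather than under `/inference/v1`, so a client that only knows the base URL has to derive it. A small sketch of that, plus the bearer header; `healthUrl` and `authHeaders` are hypothetical helper names.

```javascript
// Derive the root-level health URL from an OpenAI-style base URL.
// Assumption: /health is served at the server root, not under /inference/v1.
function healthUrl(baseURL) {
  // A leading "/" replaces the whole path of the base URL.
  return new URL("/health", baseURL).toString();
}

// Build request headers. The emulator accepts any non-empty bearer token.
function authHeaders(apiKey) {
  if (!apiKey) throw new Error("apiKey must be non-empty");
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```

With Node 18 or later, `await fetch(healthUrl(baseURL))` should then return the `{ "status": "ok" }` body described above.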
Implemented operations
- `POST /inference/v1/chat/completions`: OpenAI-compatible chat. Supports `stream: true` (SSE, `data: [DONE]`).
- `POST /inference/v1/completions`: legacy text completion.
- `POST /inference/v1/embeddings`: deterministic 768-dim embeddings.
- `GET /inference/v1/models`: model catalog (`accounts/fireworks/models/...`).
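To illustrate the streaming format, here is a sketch that reads a fully buffered SSE body and concatenates the streamed text. It assumes OpenAI-style chunks, where each event is a `data: <json>` line carrying `choices[0].delta.content`; `collectStreamText` is a hypothetical helper name.

```javascript
// Parse a complete SSE response body into the text it streams.
// Assumes OpenAI-style chunks and a final `data: [DONE]` sentinel.
function collectStreamText(sseBody) {
  let text = "";
  for (const line of sseBody.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // Some chunks (for example the first, role-only one) carry no content.
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```

When using the openai SDK with `stream: true`, the SDK does this parsing itself and exposes the chunks as an async iterable, so a helper like this is mainly useful for raw HTTP clients.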
Service & inspection operations (parlel extensions)
`GET /`, `GET /health`, `POST /__parlel/reset`, `GET /__parlel/requests`.
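The reset and request-log endpoints lend themselves to per-test isolation. A sketch under stated assumptions: the endpoints live at the server root, no `Authorization` header is sent (whether one is required is not stated here), and the shape of the `/__parlel/requests` response is not documented here, so it is returned as parsed JSON without interpretation. `resetEmulator` and `recordedRequests` are hypothetical helper names.

```javascript
// `origin` is the server root (e.g. "http://127.0.0.1:4864"), not the /inference/v1 base URL.
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped for a fake in tests.
async function resetEmulator(origin, fetchImpl = fetch) {
  const res = await fetchImpl(new URL("/__parlel/reset", origin), { method: "POST" });
  if (!res.ok) throw new Error(`reset failed with HTTP ${res.status}`);
}

async function recordedRequests(origin, fetchImpl = fetch) {
  const res = await fetchImpl(new URL("/__parlel/requests", origin));
  if (!res.ok) throw new Error(`requests failed with HTTP ${res.status}`);
  // The response shape is an unknown here; inspect it before asserting on fields.
  return res.json();
}
```

A typical pattern would be to call `resetEmulator` in a `beforeEach` hook and `recordedRequests` at the end of a test to check what the application actually sent.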
Surface coverage
This emulator replicates the API surface that most application code and agents exercise. Every row not marked ✅ is an intentional design choice for a fast, zero-cost local emulator (✓ By design), a feature that is accepted but not specially handled (◐), or a candidate for a future release (⟳ Roadmap); none is a silent inaccuracy.
Legend: ✅ fully supported · ◐ accepted (stored, not strictly enforced) · ✓ by design · ⟳ on the roadmap.
| Feature | Status |
|---|---|
| chat.completions (non-stream + SSE) | ✅ Supported |
| completions (legacy) | ✅ Supported |
| embeddings | ✅ Supported |
| models list | ✅ Supported |
| Deterministic, reproducible output | ✅ Supported |
| Real model inference | ✓ By design: deterministic stub output gives repeatable assertions and no API spend |
| Image/audio models, fine-tuning | ⟳ Roadmap |
| Tool/function calling, grammar mode | ◐ Accepted, not specially handled |
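To make "deterministic, hash-derived output" concrete, here is an illustrative sketch of how a fixed-length vector can be derived from text with a seeded hash. This is not the emulator's actual algorithm, which is not described here; `fnv1a` and `stubEmbedding` are hypothetical names, and the hash is an FNV-1a-style variant over UTF-16 code units.

```javascript
// Illustrative only: NOT the emulator's actual algorithm.
// FNV-1a-style 32-bit hash over UTF-16 code units, with a per-dimension seed.
function fnv1a(str, seed) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, kept unsigned
  }
  return h;
}

// Same text in, same vector out: one hash per dimension, scaled into [-1, 1).
function stubEmbedding(text, dims = 768) {
  const vec = new Array(dims);
  for (let d = 0; d < dims; d++) {
    vec[d] = fnv1a(text, d) / 0x80000000 - 1;
  }
  return vec;
}
```

Because the output depends only on the input text, a test can assert exact equality between two embeddings of the same string instead of using a similarity threshold.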
Error codes & shapes
Errors use the OpenAI envelope: { "error": { "message", "type", "code" } }.
| Status | When |
|---|---|
| 401 | missing/invalid Authorization |
| 400 | missing model/messages/prompt/input or bad JSON |
| 404 | unknown endpoint |
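Client code can combine the status table with the error envelope to produce readable failures. A minimal sketch, assuming the `{ "error": { "message", "type", "code" } }` envelope described above; `describeError` is a hypothetical helper name.

```javascript
// Turn an HTTP status plus parsed JSON body into a readable summary.
// Assumes the envelope { error: { message, type, code } }.
function describeError(status, body) {
  const causes = {
    400: "bad request (missing field or invalid JSON)",
    401: "missing or invalid Authorization header",
    404: "unknown endpoint",
  };
  const cause = causes[status] ?? "unexpected status";
  // Fall back gracefully if the body is absent or not in the expected shape.
  const message = body?.error?.message ?? "(no message)";
  return `HTTP ${status}: ${cause}: ${message}`;
}
```

When using the openai SDK instead of raw HTTP, the SDK raises its own error objects for non-2xx responses, so a helper like this is mostly relevant for `fetch`-based clients.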
Manifest
See services/fireworks-ai/manifest.json:
- name: `fireworks-ai`, port: `4864`, protocol: `http`, healthcheck: `/health`, startup ≈ 100ms
- env: `FIREWORKS_API_KEY`, `FIREWORKS_BASE_URL`
Configuration — test.env
Copy these into your test.env (used by the bridge sidecar flow). Tokens are Parlel's seeded test credentials — any non-empty value is accepted by the emulator, so you rarely need to change them. Swap in real credentials only when pointing at the live service in prod.env.
FIREWORKS_API_KEY=parlel-fireworks
FIREWORKS_BASE_URL=http://parlel-bridge:4864/inference/v1
<!-- parlel:testenv:end -->
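Application code can read these two variables and fall back to the local defaults when they are unset. A minimal sketch; `loadFireworksConfig` is a hypothetical helper name, and the fallback values are the quick-start defaults from this document.

```javascript
// Read the two variables from an env-like object (pass process.env in Node),
// falling back to the local emulator defaults when a value is missing or empty.
function loadFireworksConfig(env) {
  return {
    apiKey: env.FIREWORKS_API_KEY || "parlel-fireworks",
    baseURL: env.FIREWORKS_BASE_URL || "http://127.0.0.1:4864/inference/v1",
  };
}
```

The returned object uses the same `apiKey` and `baseURL` option names as the SDK constructor shown earlier, so it can be passed along as `new OpenAI(loadFireworksConfig(process.env))`.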