API Reference — LiteRTLM Gateway

Base URL

http://<host>:<port>

All REST endpoints are prefixed with /api. The WebSocket endpoint is at /ws.

Authentication

Two credential types are accepted depending on the endpoint:

Type	Header	Endpoints
JWT	Authorization: Bearer <accessToken>	All `/api/conversations/*`, `/api/api-key/*`, WebSocket
API Key	Authorization: Bearer lrtlm_<key>	All `/api/conversations/*`, WebSocket

API key management endpoints (/api/api-key/*) require JWT only.
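As a minimal sketch of the rules above, the helpers below build the Authorization header for either credential type and refuse an API key on the key-management routes before the request is sent. The names `isApiKey` and `authHeader` are illustrative and not part of the gateway; only the header format and the `lrtlm_` prefix come from this reference.

```javascript
// Hypothetical client-side helpers (not provided by the gateway).
const API_KEY_PREFIX = 'lrtlm_';

// API keys are recognisable by their documented prefix.
function isApiKey(credential) {
  return credential.startsWith(API_KEY_PREFIX);
}

// Build the headers for a request to `path`.
// Throws early when an API key is used where only a JWT is accepted.
function authHeader(credential, path) {
  if (path.startsWith('/api/api-key/') && isApiKey(credential)) {
    throw new Error('API key management endpoints require a JWT access token');
  }
  return { Authorization: `Bearer ${credential}` };
}
```

Checking on the client is optional; the server presumably rejects the request as well, but failing early gives a clearer message.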

Auth

POST /api/auth/login

Authenticate and receive a JWT pair. No auth header required.

Request body

{
  "username": "admin",
  "password": "your-password"
}

Response 200

{
  "ok": true,
  "accessToken": "eyJ...",
  "refreshToken": "eyJ..."
}
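A login call might look like the sketch below. The `baseUrl` and `fetchImpl` parameters and the `Content-Type` header are assumptions of this example; `fetchImpl` defaults to the global `fetch` found in Node.js 18+ and browsers, and can be swapped for a stub in tests.

```javascript
// Sketch only: POST credentials and return the JWT pair from the response.
async function login(baseUrl, username, password, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // assumed; not stated in this reference
    body: JSON.stringify({ username, password }),
  });
  const body = await res.json();
  // Errors use the shared { ok: false, error } shape.
  if (!res.ok || !body.ok) {
    throw new Error(body.error || `Login failed with HTTP ${res.status}`);
  }
  return { accessToken: body.accessToken, refreshToken: body.refreshToken };
}
```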

POST /api/auth/refresh

Exchange a refresh token for a new JWT pair. Refresh tokens rotate on every use.

Request body

{
  "refreshToken": "eyJ..."
}

Response 200

{
  "ok": true,
  "accessToken": "eyJ...",
  "refreshToken": "eyJ..."
}
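Because refresh tokens rotate, a client has to replace both stored tokens after every refresh. The sketch below shows one way to do that; the `TokenStore` class, its fields, and the `Content-Type` header are hypothetical and not part of the gateway.

```javascript
// Hypothetical client-side token holder illustrating refresh-token rotation.
class TokenStore {
  constructor(baseUrl, tokens, fetchImpl = fetch) {
    this.baseUrl = baseUrl;
    this.tokens = tokens; // { accessToken, refreshToken }
    this.fetchImpl = fetchImpl;
  }

  // Exchange the current refresh token and overwrite BOTH stored tokens,
  // since the response carries a new refresh token as well.
  async refresh() {
    const res = await this.fetchImpl(`${this.baseUrl}/api/auth/refresh`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumed
      body: JSON.stringify({ refreshToken: this.tokens.refreshToken }),
    });
    const body = await res.json();
    if (!res.ok || !body.ok) {
      throw new Error(body.error || `Refresh failed with HTTP ${res.status}`);
    }
    this.tokens = { accessToken: body.accessToken, refreshToken: body.refreshToken };
    return this.tokens;
  }
}
```

Keeping only the new access token and reusing the old refresh token would most likely fail on the next refresh.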

POST /api/auth/logout

Invalidate the current session by revoking the supplied refresh token. The access token is not revoked; it expires naturally after 15 minutes.

Request body

{
  "refreshToken": "eyJ..."
}

Response 200

{ "ok": true }

Conversations

All conversation endpoints accept JWT or API key via Authorization: Bearer.

GET /api/conversations

List all conversations.

Response 200

{
  "ok": true,
  "conversations": ["my-chat", "code-review"]
}

POST /api/conversations

Create a new conversation. Choose a built-in preset or supply a custom system instruction. Optionally bind tools by name.

Request body — built-in preset

{
  "name": "my-chat",
  "config": "assistant"    // "assistant" | "coder" | "concise" | "creative"
}

Request body — with tools

{
  "name": "my-chat",
  "config": "assistant",
  "tools": ["datetime", "calculator"]    // optional — bind tools by name
}

Request body — custom instruction

{
  "name": "my-chat",
  "systemInstruction": "You are a pirate. Respond only in pirate speak.",
  "tools": ["datetime"]    // tools work with custom instructions too
}

Response 201

{
  "ok": true,
  "name": "my-chat",
  "config": "assistant"
}
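The three request shapes above can be produced by one small builder, sketched below. It treats `config` and `systemInstruction` as mutually exclusive, which is this example's reading of "preset or custom instruction" rather than a documented rule; the function name is hypothetical.

```javascript
// Preset names as listed in this reference.
const PRESETS = ['assistant', 'coder', 'concise', 'creative'];

// Build a POST /api/conversations body from either a preset or a custom instruction.
function buildConversationRequest({ name, config, systemInstruction, tools = [] }) {
  if (!name) throw new Error('name is required');
  // Assumption: exactly one of the two must be given.
  if (Boolean(config) === Boolean(systemInstruction)) {
    throw new Error('Provide exactly one of config or systemInstruction');
  }
  if (config && !PRESETS.includes(config)) {
    throw new Error(`Unknown preset: ${config}`);
  }
  const body = config ? { name, config } : { name, systemInstruction };
  if (tools.length > 0) body.tools = tools; // omit "tools" for a tool-free conversation
  return body;
}
```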

GET /api/conversations/{name}/messages

Retrieve full message history for a conversation, ordered oldest-first.

Response 200

{
  "ok": true,
  "messages": [
    { "role": "user",  "text": "Hello", "seq": 0, "createdAt": 1700000000000 },
    { "role": "model", "text": "Hi! How can I help?", "seq": 1, "createdAt": 1700000000000 }
  ]
}
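As a small illustration of the response shape, the hypothetical helper below renders a `messages` array as a plain-text transcript. It sorts by `seq` defensively, although the endpoint already returns messages oldest-first.

```javascript
// Render a message list as "role: text" lines, ordered by sequence number.
function formatTranscript(messages) {
  return [...messages] // copy so the caller's array is not reordered
    .sort((a, b) => a.seq - b.seq)
    .map((m) => `${m.role}: ${m.text}`)
    .join('\n');
}
```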

POST /api/conversations/{name}/messages

Send a message and receive the full reply in a single blocking response. For streaming, use the WebSocket endpoint instead.

Request body

{
  "message": "Explain binary search trees"
}

Response 200

{
  "ok": true,
  "reply": "A binary search tree is a data structure..."
}
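A blocking send could be wrapped as in the sketch below. The function name, the `fetchImpl` parameter, the `Content-Type` header, and URL-encoding the conversation name are assumptions of this example.

```javascript
// Sketch: send one message and return the full reply text.
// `credential` may be a JWT access token or an API key.
async function sendMessage(baseUrl, credential, name, message, fetchImpl = fetch) {
  const url = `${baseUrl}/api/conversations/${encodeURIComponent(name)}/messages`;
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json', // assumed
      Authorization: `Bearer ${credential}`,
    },
    body: JSON.stringify({ message }),
  });
  const body = await res.json();
  if (!res.ok || !body.ok) {
    throw new Error(body.error || `Request failed with HTTP ${res.status}`);
  }
  return body.reply;
}
```

The call resolves only after the whole reply has been generated, so long answers may take a while to return.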

DELETE /api/conversations/{name}

Permanently delete a conversation and all its message history.

Response 200

{ "ok": true }

Tools

Tools let the model call server-side functions during inference. When a tool is bound to a conversation, the model can invoke it automatically — the entire call/response loop happens inside the SDK before any token reaches the client. From the client's perspective, the reply arrives as normal streaming tokens.

GET /api/tools

List all tools currently registered on the server. No authentication required.

Response 200

{
  "ok": true,
  "tools": [
    {
      "name": "datetime",
      "description": "Get the current date and time...",
      "parameters": [
        { "name": "format",   "type": "STRING", "description": "Date format pattern", "required": false },
        { "name": "timezone", "type": "STRING", "description": "IANA timezone ID",    "required": false }
      ]
    },
    {
      "name": "calculator",
      "description": "Evaluate a mathematical expression...",
      "parameters": [
        { "name": "expression", "type": "STRING", "description": "A math expression", "required": true }
      ]
    }
  ]
}
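To show how the `parameters` array can be consumed, the hypothetical helper below turns one tool entry into a short signature string, marking optional parameters with `?`.

```javascript
// Format a tool entry from GET /api/tools as "name(param: TYPE, optional?: TYPE)".
function describeTool(tool) {
  const params = tool.parameters
    .map((p) => `${p.name}${p.required ? '' : '?'}: ${p.type}`)
    .join(', ');
  return `${tool.name}(${params})`;
}
```

Applied to the two entries in the sample response, this yields `datetime(format?: STRING, timezone?: STRING)` and `calculator(expression: STRING)`.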

POST /api/conversations — bind tools at creation

Pass "tools" when creating a conversation to bind tools by name. Tools are resolved from the server registry — unknown names are silently skipped. Omit "tools" or pass [] for a tool-free conversation.

Request body

{
  "name":   "research-chat",
  "config": "assistant",
  "tools":  ["datetime", "calculator"]
}
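Since unknown tool names are skipped without an error, a client may want to compare its requested names against the registry first. The helper below is a sketch of that check; its name is hypothetical, and `registry` is assumed to be the `tools` array from GET /api/tools.

```javascript
// Return the requested tool names that are missing from the server registry.
function findUnknownTools(requested, registry) {
  const known = new Set(registry.map((t) => t.name));
  return requested.filter((name) => !known.has(name));
}
```

An empty result means every requested tool will actually be bound; anything else is a name the server would drop silently.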

Built-in tools

Name	Description	Parameters
datetime	Returns the current date and time	`format` (optional) — Java date pattern, e.g. `yyyy-MM-dd HH:mm:ss`; `timezone` (optional) — IANA ID, e.g. `Asia/Tokyo`
calculator	Evaluates a math expression and returns the result	`expression` (required) — e.g. `(3 + 5) * 2`, `Math.sqrt(16)`

How it works

// 1. Create a conversation with tools bound
POST /api/conversations
{ "name": "helper", "config": "assistant", "tools": ["datetime", "calculator"] }

// 2. Send a message — tool calls happen on the server, transparently
WS /ws/conversations/helper?token=lrtlm_...
→ { "message": "What is 2 to the power of 10, and what time is it in Tokyo?" }

// Model calls: calculator("Math.pow(2,10)") → "1024"
//              datetime(timezone="Asia/Tokyo") → "2026-04-11 18:30:00"
// Then generates the final reply using both results.

← { "type": "token", "token": "2 to the power of 10 is 1024..." }
← { "type": "done" }

// 3. The client sees only the final answer — tool calls are invisible

Tool error handling

If a tool fails (bad parameters, runtime error), it returns a descriptive error string to the model — e.g. "Error: could not evaluate expression '1/0': Division by zero". The model reads this as the tool result and responds accordingly. Tool errors never interrupt streaming or cause a 503.
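The errors-as-results pattern described here can be illustrated in a few lines. This is not the gateway's implementation, only a sketch of the idea in JavaScript; `runTool` and the sample tool are hypothetical.

```javascript
// Run a tool and convert any thrown error into a result string,
// so the caller (the model loop) always receives text and never an exception.
function runTool(tool, args) {
  try {
    return String(tool(args));
  } catch (err) {
    return `Error: ${err.message}`;
  }
}

// Hypothetical tool used only to demonstrate both outcomes.
function divide({ a, b }) {
  if (b === 0) throw new Error('Division by zero');
  return a / b;
}
```

Here `runTool(divide, { a: 6, b: 3 })` returns `"2"`, while `runTool(divide, { a: 1, b: 0 })` returns `"Error: Division by zero"` instead of throwing.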

WebSocket — Streaming

WS /ws/conversations/{name}?token=<accessToken|apiKey>

Stream model replies token-by-token over a persistent WebSocket connection. Pass your JWT access token or API key as the token query parameter. The connection stays open across multiple turns — send a new message after receiving "done".
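The connection URL can be derived from the REST base URL, as sketched below. The function name is hypothetical, and mapping `https:` to `wss:` assumes the gateway is served over TLS at the same host and port.

```javascript
// Build the streaming URL for a conversation from the HTTP base URL.
function streamUrl(baseUrl, name, token) {
  const url = new URL(baseUrl);
  // Assumption: a TLS deployment exposes the WebSocket over wss on the same host.
  url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:';
  url.pathname = `/ws/conversations/${encodeURIComponent(name)}`;
  url.search = new URLSearchParams({ token }).toString();
  return url.toString();
}
```

Because the token travels in the query string, it may end up in proxy or server access logs; that is worth keeping in mind when choosing between a short-lived JWT and a long-lived API key.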

Client → Server (send message)

{ "message": "What are Kotlin coroutines?" }

Server → Client (streaming token)

{ "type": "token", "token": "Kotlin" }
{ "type": "token", "token": " coroutines" }
// ... one frame per token

Server → Client (turn complete)

{ "type": "done" }

Server → Client (error)

{ "type": "error", "error": "Conversation 'my-chat' not found" }
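The three frame types can be folded into complete replies with a small accumulator, sketched below. The function name and return shape are hypothetical, and treating an error frame as the end of the current turn is an assumption of this example.

```javascript
// Returns a handler that takes raw frame text and reports when a turn finishes.
function createTurnAccumulator() {
  let text = '';
  return function onFrame(raw) {
    const frame = JSON.parse(raw);
    switch (frame.type) {
      case 'token':
        text += frame.token;
        return { done: false };
      case 'done': {
        const reply = text;
        text = ''; // reset so the same connection can carry the next turn
        return { done: true, reply };
      }
      case 'error':
        text = ''; // assumption: an error ends the turn and discards partial text
        return { done: true, error: frame.error };
      default:
        return { done: false }; // ignore frame types this sketch does not know
    }
  };
}
```

Wiring it up is a matter of calling the handler from `ws.onmessage` with `evt.data` and sending the next message once it reports `done`.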

Example — JavaScript

const ws = new WebSocket(
  'ws://localhost:8080/ws/conversations/my-chat?token=lrtlm_...'
);

ws.onopen = () => {
  ws.send(JSON.stringify({ message: 'Hello!' }));
};

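ws.onmessage = (evt) => {
  const frame = JSON.parse(evt.data);
  // process.stdout exists only in Node.js (Node 22+ also provides a global WebSocket);
  // in a browser, append frame.token to the page instead.
  if (frame.type === 'token') process.stdout.write(frame.token);
  if (frame.type === 'done')  console.log('\n[done]');
  if (frame.type === 'error') console.error('[error]', frame.error);
};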

API Key Management

These endpoints require JWT only (Authorization: Bearer <accessToken>).

POST /api/api-key/generate

Generate a new API key. The raw key is returned once — store it immediately.

Request body

{ "name": "mobile-client" }

Response 201

{
  "ok": true,
  "key": "lrtlm_...",       // raw key — shown once only
  "id": "uuid",
  "prefix": "lrtlm_Xx",
  "name": "mobile-client"
}

GET /api/api-key/list

List all API keys (active and revoked). Raw keys are never returned.

Response 200

{
  "ok": true,
  "keys": [
    {
      "id": "uuid",
      "prefix": "lrtlm_Xx",
      "name": "mobile-client",
      "active": true,
      "createdAt": 1700000000000,
      "lastUsedAt": 1700000001000
    }
  ]
}
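One use for the list response is spotting active keys that have gone unused, for example as candidates for revocation. The helper below is a sketch; its name is hypothetical, and falling back to `createdAt` when `lastUsedAt` is missing is an assumption, since this reference does not say how never-used keys are reported.

```javascript
// Return active keys whose last use is older than maxIdleMs.
// Timestamps are treated as epoch milliseconds, matching the sample response.
function findIdleKeys(keys, nowMs, maxIdleMs) {
  return keys.filter((k) => {
    if (!k.active) return false; // revoked keys are already out of service
    const lastSeen = k.lastUsedAt ?? k.createdAt; // assumed fallback for never-used keys
    return nowMs - lastSeen > maxIdleMs;
  });
}
```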

GET /api/api-key/info?key=lrtlm_...

Look up metadata for a specific key by its raw value.

DELETE /api/api-key/revoke

Soft-revoke an API key. The record is kept for audit purposes.

Request body

{ "key": "lrtlm_..." }

Response 200

{ "ok": true }

Error Responses

All errors follow the same structure:

{ "ok": false, "error": "Human-readable message" }

Status	Meaning
400	Bad request — missing or invalid field
401	Unauthorized — missing, invalid, or expired token
404	Resource not found
409	Conflict — resource already exists
503	Engine not ready — model is still loading
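Of the statuses above, 503 is the one that usually resolves itself once the model has finished loading, so a client can reasonably retry it. The sketch below retries with exponential backoff; the function name, defaults, and backoff policy are choices of this example, not gateway requirements.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call doRequest() and retry while the response status is 503.
// Waits delayMs, then 2x, 4x, ... between attempts; returns the last response.
async function requestWithRetry(
  doRequest,
  { retries = 3, delayMs = 1000, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 503 || attempt >= retries) return res;
    await sleep(delayMs * 2 ** attempt);
  }
}
```

Other statuses are returned immediately: a 400, 404, or 409 will not change on retry, and a 401 calls for a token refresh rather than a wait.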