Use the REST and WebSocket API to integrate LiteRTLM into your application.
Base URL: http://<host>:<port>
All REST endpoints are prefixed with /api. The WebSocket endpoint is at /ws.
Two credential types are accepted depending on the endpoint:
| Type | Header | Endpoints |
|---|---|---|
| JWT | Authorization: Bearer <accessToken> | All /api/conversations/*, /api/api-key/*, WebSocket |
| API Key | Authorization: Bearer lrtlm_<key> | All /api/conversations/*, WebSocket |
API key management endpoints (/api/api-key/*) require JWT only.
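As a rough illustration of the credential rules above, the following sketch builds request headers and rejects an API key on a JWT-only route before the request is sent. The helper names (`isApiKey`, `authHeaders`) and the `Content-Type` header are assumptions made for this example, not part of the LiteRTLM API.

```javascript
// API keys carry this prefix; JWTs do not (per the credential table).
const API_KEY_PREFIX = 'lrtlm_';

// Returns true when the credential looks like an API key rather than a JWT.
function isApiKey(credential) {
  return credential.startsWith(API_KEY_PREFIX);
}

// Builds headers for a REST call to `path` (e.g. '/api/conversations').
// Throws early if an API key is used on a JWT-only management route.
function authHeaders(credential, path) {
  if (path.startsWith('/api/api-key') && isApiKey(credential)) {
    throw new Error('API key management endpoints require a JWT');
  }
  return {
    'Content-Type': 'application/json', // assumption: bodies are JSON
    Authorization: `Bearer ${credential}`,
  };
}
```

Failing on the client side is optional; the server is the authority and would presumably answer such a request with a 401.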
Authenticate and receive a JWT pair. No auth header required.
{
"username": "admin",
"password": "your-password"
}
{
"ok": true,
"accessToken": "eyJ...",
"refreshToken": "eyJ..."
}
Exchange a refresh token for a new JWT pair. Refresh tokens rotate on every use.
{
"refreshToken": "eyJ..."
}
{
"ok": true,
"accessToken": "eyJ...",
"refreshToken": "eyJ..."
}
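Because refresh tokens rotate on every use, a client has to replace both tokens after each refresh and never reuse the old refresh token. A minimal in-memory sketch of that bookkeeping follows; `TokenStore` is a hypothetical helper, and the HTTP call itself is left out because the refresh endpoint path is not shown in this section.

```javascript
// Hypothetical client-side holder for the current JWT pair.
class TokenStore {
  constructor(accessToken, refreshToken) {
    this.accessToken = accessToken;
    this.refreshToken = refreshToken;
  }

  // Request body for the refresh call, matching the documented shape.
  refreshRequestBody() {
    return { refreshToken: this.refreshToken };
  }

  // Applies a refresh response. Both tokens are replaced, since the
  // refresh token that was just sent is no longer valid after rotation.
  applyRefreshResponse(response) {
    if (!response || !response.ok) {
      throw new Error((response && response.error) || 'Token refresh failed');
    }
    this.accessToken = response.accessToken;
    this.refreshToken = response.refreshToken;
  }
}
```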
Invalidate the current session. The access token is not revoked immediately; it remains usable until it expires naturally, 15 minutes after it was issued.
{
"refreshToken": "eyJ..."
}
{ "ok": true }
All conversation endpoints accept JWT or API key via Authorization: Bearer.
List all conversations.
{
"ok": true,
"conversations": ["my-chat", "code-review"]
}
Create a new conversation. Choose a builtin preset or supply a custom system instruction. Optionally bind tools by name.
{
"name": "my-chat",
"config": "assistant" // "assistant" | "coder" | "concise" | "creative"
}
{
"name": "my-chat",
"config": "assistant",
"tools": ["datetime", "calculator"] // optional — bind tools by name
}
{
"name": "my-chat",
"systemInstruction": "You are a pirate. Respond only in pirate speak.",
"tools": ["datetime"] // tools work with custom instructions too
}
{
"ok": true,
"name": "my-chat",
"config": "assistant"
}
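The three request shapes above can be produced by one small builder. This is a sketch under two assumptions that the text implies but does not state outright: `name` is required, and exactly one of `config` or `systemInstruction` is supplied. The function name is hypothetical.

```javascript
// Builtin presets listed in the request example.
const PRESETS = ['assistant', 'coder', 'concise', 'creative'];

// Builds the JSON body for creating a conversation.
// Assumption: exactly one of `config` or `systemInstruction` is given.
function buildCreateConversationBody({ name, config, systemInstruction, tools = [] }) {
  if (!name) throw new Error('name is required');
  if (Boolean(config) === Boolean(systemInstruction)) {
    throw new Error('Supply exactly one of config or systemInstruction');
  }
  if (config && !PRESETS.includes(config)) {
    throw new Error(`Unknown preset: ${config}`);
  }
  const body = { name };
  if (config) body.config = config;
  else body.systemInstruction = systemInstruction;
  if (tools.length > 0) body.tools = tools; // tools are optional
  return body;
}
```

Serialize the result with `JSON.stringify` before sending it as the request body.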
Retrieve full message history for a conversation, ordered oldest-first.
{
"ok": true,
"messages": [
{ "role": "user", "text": "Hello", "seq": 0, "createdAt": 1700000000000 },
{ "role": "model", "text": "Hi! How can I help?", "seq": 1, "createdAt": 1700000000000 }
]
}
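A history response like the one above can be turned into a readable transcript in a few lines. The sketch assumes `createdAt` is a Unix timestamp in milliseconds (the sample value has 13 digits) and re-sorts by `seq` defensively, even though the server already returns messages oldest-first. `formatTranscript` is a hypothetical helper.

```javascript
// Renders a message-history response as one line per message.
// Assumption: createdAt is Unix epoch time in milliseconds.
function formatTranscript(historyResponse) {
  return [...historyResponse.messages]
    .sort((a, b) => a.seq - b.seq) // defensive; the server orders oldest-first
    .map((m) => `${new Date(m.createdAt).toISOString()} ${m.role}: ${m.text}`)
    .join('\n');
}
```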
Send a message and receive the full reply in a single blocking response. For streaming, use the WebSocket endpoint instead.
{
"message": "Explain binary search trees"
}
{
"ok": true,
"reply": "A binary search tree is a data structure..."
}
Permanently delete a conversation and all its message history.
{ "ok": true }
Tools let the model call server-side functions during inference. When a tool is bound to a conversation, the model can invoke it automatically — the entire call/response loop happens inside the SDK before any token reaches the client. From the client's perspective, the reply arrives as normal streaming tokens.
List all tools currently registered on the server. No authentication required.
{
"ok": true,
"tools": [
{
"name": "datetime",
"description": "Get the current date and time...",
"parameters": [
{ "name": "format", "type": "STRING", "description": "Date format pattern", "required": false },
{ "name": "timezone", "type": "STRING", "description": "IANA timezone ID", "required": false }
]
},
{
"name": "calculator",
"description": "Evaluate a mathematical expression...",
"parameters": [
{ "name": "expression", "type": "STRING", "description": "A math expression", "required": true }
]
}
]
}
Pass "tools" when creating a conversation to bind tools by name.
Tools are resolved from the server registry — unknown names are silently skipped.
Omit "tools" or pass [] for a tool-free conversation.
{
"name": "research-chat",
"config": "assistant",
"tools": ["datetime", "calculator"]
}
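Since unknown tool names are silently skipped, a typo in `"tools"` produces a conversation that simply lacks the tool. One way to catch this on the client is to compare the requested names against the tool listing response before creating the conversation. The sketch below assumes the listing has the shape shown earlier; `partitionTools` is a hypothetical helper.

```javascript
// Splits requested tool names into those the server has registered and
// those it would silently skip. `toolsResponse` is the tool listing body.
function partitionTools(requested, toolsResponse) {
  const registered = new Set(toolsResponse.tools.map((t) => t.name));
  return {
    known: requested.filter((name) => registered.has(name)),
    unknown: requested.filter((name) => !registered.has(name)),
  };
}
```

A caller might log or reject a non-empty `unknown` list instead of sending the request as-is.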
| Name | Description | Parameters |
|---|---|---|
| datetime | Returns the current date and time | format (optional): Java date pattern, e.g. yyyy-MM-dd HH:mm:ss; timezone (optional): IANA ID, e.g. Asia/Tokyo |
| calculator | Evaluates a math expression and returns the result | expression (required): e.g. (3 + 5) * 2, Math.sqrt(16) |
// 1. Create a conversation with tools bound
POST /api/conversations
{ "name": "helper", "config": "assistant", "tools": ["datetime", "calculator"] }

// 2. Send a message; tool calls happen inside the model, transparently
WS /ws/conversations/helper?token=lrtlm_...
→ { "message": "What is 2 to the power of 10, and what time is it in Tokyo?" }

// Model calls: calculator("Math.pow(2,10)") → "1024"
//              datetime(timezone="Asia/Tokyo") → "2026-04-11 18:30:00"
// Then generates the final reply using both results.
← { "type": "token", "token": "2 to the power of 10 is 1024..." }
← { "type": "done" }

// 3. The client sees only the final answer; tool calls are invisible
If a tool fails (bad parameters, runtime error), it returns a descriptive error string
to the model — e.g. "Error: could not evaluate expression '1/0': Division by zero".
The model reads this as the tool result and responds accordingly.
Tool errors never interrupt streaming or cause a 503.
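The server's tool runner is not shown in this document. Purely as an illustration of the pattern described above, where a failure becomes an ordinary string result instead of an exception, a wrapper might look like the following. `runToolSafely` and the shape of `toolFn` are assumptions for this sketch only.

```javascript
// Runs a tool function and always returns a string: either the tool's
// result or an "Error: ..." description that the model can read.
// Illustrative only; this is not the LiteRTLM server implementation.
function runToolSafely(toolFn, args) {
  try {
    return String(toolFn(args));
  } catch (err) {
    return `Error: ${err.message}`;
  }
}
```

Because the wrapper never throws, the surrounding generation loop keeps streaming regardless of what the tool does.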
Stream model replies token-by-token over a persistent WebSocket connection.
Pass your JWT access token or API key as the token query parameter.
The connection stays open across multiple turns — send a new message after receiving "done".
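The connection URL can be assembled from the host, the conversation name, and the credential. The sketch below follows the `/ws/conversations/<name>?token=...` pattern used in the examples in this document and percent-encodes both variable parts; `buildWsUrl` is a hypothetical helper, and the plain `ws://` scheme is an assumption that matches the localhost example.

```javascript
// Builds the WebSocket URL for a conversation.
// `hostAndPort` is e.g. 'localhost:8080'; `token` is a JWT or an API key.
function buildWsUrl(hostAndPort, conversationName, token) {
  const name = encodeURIComponent(conversationName);
  const tok = encodeURIComponent(token);
  return `ws://${hostAndPort}/ws/conversations/${name}?token=${tok}`;
}
```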
{ "message": "What are Kotlin coroutines?" }
{ "type": "token", "token": "Kotlin" }
{ "type": "token", "token": " coroutines" }
// ... one frame per token
{ "type": "done" }
{ "type": "error", "error": "Conversation 'my-chat' not found" }
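// Assumes Node.js 22+, where WebSocket is a global. In a browser,
// replace process.stdout.write with your own rendering logic.
const ws = new WebSocket(
  'ws://localhost:8080/ws/conversations/my-chat?token=lrtlm_...'
);
// Send the first message as soon as the connection opens.
ws.onopen = () => {
  ws.send(JSON.stringify({ message: 'Hello!' }));
};
// Every frame is a JSON object whose "type" is "token", "done", or "error".
ws.onmessage = (evt) => {
  const frame = JSON.parse(evt.data);
  if (frame.type === 'token') process.stdout.write(frame.token);
  if (frame.type === 'done') console.log('\n[done]');
  if (frame.type === 'error') console.error('[error]', frame.error);
};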
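For multi-turn use, it can help to separate frame handling from the socket so that each completed reply is collected as one string. The following sketch accumulates "token" frames and hands back the full reply on "done"; `createReplyAssembler` is a hypothetical helper built only on the three frame types documented above.

```javascript
// Returns a frame handler that buffers tokens for the current turn.
// The handler returns null while a reply is in progress and
// { reply } once a "done" frame closes the turn.
function createReplyAssembler() {
  let buffer = '';
  return function onFrame(frame) {
    switch (frame.type) {
      case 'token':
        buffer += frame.token;
        return null;
      case 'done': {
        const reply = buffer;
        buffer = ''; // reset so the same connection can serve the next turn
        return { reply };
      }
      case 'error':
        buffer = '';
        throw new Error(frame.error);
      default:
        return null; // ignore frame types this sketch does not know
    }
  };
}
```

In a socket handler, this would be called as `onFrame(JSON.parse(evt.data))`, with the next message sent only after a non-null result.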
These endpoints require JWT only (Authorization: Bearer <accessToken>).
Generate a new API key. The raw key is returned once — store it immediately.
{ "name": "mobile-client" }
{
"ok": true,
"key": "lrtlm_...", // raw key — shown once only
"id": "uuid",
"prefix": "lrtlm_Xx",
"name": "mobile-client"
}
List all API keys (active and revoked). Raw keys are never returned.
{
"ok": true,
"keys": [
{
"id": "uuid",
"prefix": "lrtlm_Xx",
"name": "mobile-client",
"active": true,
"createdAt": 1700000000000,
"lastUsedAt": 1700000001000
}
]
}
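The listing metadata is enough to spot keys that are candidates for revocation. This sketch filters active keys that have been idle longer than a chosen threshold. It assumes `createdAt` and `lastUsedAt` are Unix timestamps in milliseconds and that `lastUsedAt` may be missing for a key that was never used; neither assumption is confirmed by this document, and `findStaleKeys` is a hypothetical helper.

```javascript
// Returns active keys whose last activity is older than maxIdleMs.
// Assumption: timestamps are epoch milliseconds; a key with no
// lastUsedAt is measured from its creation time instead.
function findStaleKeys(listResponse, nowMs, maxIdleMs) {
  return listResponse.keys.filter((key) => {
    if (!key.active) return false; // already revoked
    const lastSeen = key.lastUsedAt ?? key.createdAt;
    return nowMs - lastSeen > maxIdleMs;
  });
}
```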
Look up metadata for a specific key by its raw value.
Soft-revoke an API key. The record is kept for audit purposes.
{ "key": "lrtlm_..." }
{ "ok": true }
All errors follow the same structure:
{ "ok": false, "error": "Human-readable message" }
| Status | Meaning |
|---|---|
| 400 | Bad request — missing or invalid field |
| 401 | Unauthorized — missing, invalid, or expired token |
| 404 | Resource not found |
| 409 | Conflict — resource already exists |
| 503 | Engine not ready — model is still loading |
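Client code can fold the status table and the shared error shape into a single check. The sketch below turns any non-success response into an exception and marks 503 as retryable, on the reasoning that the model may finish loading shortly; that retry policy, and the names `ApiError` and `unwrapResponse`, are assumptions of this example.

```javascript
// Error carrying the HTTP status and the server's "error" message.
class ApiError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
    this.retryable = status === 503; // engine still loading; try again later
  }
}

// Returns the parsed body on success, otherwise throws an ApiError.
// `status` is the HTTP status code; `body` is the parsed JSON response.
function unwrapResponse(status, body) {
  if (status >= 200 && status < 300 && body && body.ok) return body;
  const message = body && body.error ? body.error : `HTTP ${status}`;
  throw new ApiError(status, message);
}
```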