
Responses

Generate model output with 8080's OpenAI Responses–compatible API.

Use the Responses API when you want an OpenAI-style request/response flow built around input (string or structured items), optional instructions, tools, previous_response_id for multi-turn conversations, and a unified response object in the reply. The 8080 endpoint is designed to match the OpenAI Responses API.

Create a response: POST https://api.8080.io/v1/responses

curl https://api.8080.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "Tell me a short joke about APIs."
  }'

input can be a string or an array of items with role and content (similar to chat messages). content may be a string or a list of typed parts (e.g., input_text).
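For example, the string input from the curl request above could be sent as structured input items instead. This Python sketch only builds the request body; the item shapes follow the role/content description above, and the payload would be POSTed to /v1/responses exactly as in the curl example:

```python
import json

# Structured equivalent of the plain-string input in the curl example.
# "role" and "content" follow the chat-message-like item shape described above;
# "input_text" is the typed part for plain user text.
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Tell me a short joke about APIs."}
            ],
        }
    ],
}

print(json.dumps(payload, indent=2))
```

The two forms are interchangeable for simple text; the array form becomes necessary once you mix roles or typed parts in one request.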

model (string, required): Model ID for the response (same IDs as chat completions where supported).
input (string | array): User/system/developer content: plain string, or array of input items with role and content.
instructions (string): High-level system-style instructions kept separate from input.
conversation (string | object): Continue or scope a thread; may be a conversation id or { "id": "..." }.
previous_response_id (string): Chain from a prior response for multi-turn flows.
stream (boolean): Stream events with text/event-stream (OpenAI-style response events).
temperature (number): Sampling temperature.
top_p (number): Nucleus sampling.
max_output_tokens (integer): Cap on generated output tokens.
max_tool_calls (integer): Limit tool invocations per response.
parallel_tool_calls (boolean): Allow parallel tool calls when true.
tools (array): Tools the model may use (functions, file search, etc., as supported).
tool_choice (string | object): auto, none, required, or force a specific function.
text (object): Output shaping: format (text, json_object, json_schema) and verbosity.
reasoning (object): Reasoning controls, e.g., effort: minimal, medium, high.
truncation (string): auto or disabled.
metadata (object): Opaque key/value metadata stored with the response.
store (boolean): Whether the response is persisted for later retrieval.
background (boolean): Run the request in the background when supported.
user (string): End-user identifier for abuse tracking.
log (boolean): When true, enable request tracing (see Logging).

Field-level details match the OpenAPI ResponseCreateParams schema where implemented.
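As an illustration, a fuller create request combining several of the parameters above might look like this in Python. The sketch only assembles the request body (the parameter names come from the list above; the prompt text is made up for the example):

```python
import json

# A create-response request body exercising several optional parameters.
# "instructions" carries system-style guidance; "metadata" is opaque
# key/value data stored with the response; "store" controls persistence.
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "Summarize our refund policy in one sentence.",
    "instructions": "You are a concise support assistant.",
    "temperature": 0.2,
    "max_output_tokens": 128,
    "store": True,
    "metadata": {"ticket": "example-123"},
}

print(json.dumps(payload, indent=2))
```

Any parameter you omit falls back to its server-side default, so start minimal and add knobs as needed.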

Successful POST returns a response object with object: "response". Below is a representative JSON body (with // comments for documentation only — strip them if you paste into a strict JSON parser).

{
  // Unique id for this response; use in GET /v1/responses/{id} or as previous_response_id
  "id": "resp_01abc123",
  "object": "response",
  // Unix timestamp (seconds) when the response was created
  "created_at": 1735689600.0,
  "model": "8080/taalas/llama3.1-8b-instruct",
  // completed | in_progress | failed | incomplete | cancelled | queued
  "status": "completed",
  "parallel_tool_calls": true,
  "tool_choice": "auto",
  // Tools that were available for this turn (echo of request; may be empty)
  "tools": [],
  "conversation": {
    // Conversation thread id when using the conversation feature
    "id": "conv_01xyz789"
  },
  // Ordered list of model outputs: messages, function_call, reasoning, etc.
  "output": [
    {
      "type": "message",
      "id": "msg_01def456",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          // Assistant-visible text; read this for the user-facing answer
          "text": "Why did the API go to therapy? Too many unresolved callbacks."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 18,
    "total_tokens": 42,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  // Present when the run failed or was cut short
  "error": null,
  "incomplete_details": null
}

Reading the model’s text: walk output for items with type: "message", then each content entry with type: "output_text"; the text field is what you show to the user. Tool calls appear as separate output items (e.g., type: "function_call") with name, arguments, and call_id for follow-up requests.
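Assuming the response has been parsed into a Python dict, that walk can be sketched as follows (the helper names are illustrative, not part of any SDK):

```python
def output_text(response: dict) -> str:
    """Concatenate all assistant-visible text from a response object.

    Walks output for type == "message" items, then collects each
    content part with type == "output_text".
    """
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)


def function_calls(response: dict) -> list:
    """Collect function_call output items for follow-up requests."""
    return [i for i in response.get("output", [])
            if i.get("type") == "function_call"]
```

Using accessors like these keeps your code robust if new output item types appear alongside messages and tool calls.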

Other fields you may see include instructions, temperature, top_p, max_output_tokens, reasoning, text, metadata, previous_response_id, and background — often echoing the create request or state for in-progress responses.

With "stream": true, the API returns SSE events (see OpenAI’s response event types: response.created, response.output_text.delta, response.completed, etc.). Handle the stream the same way you would for the OpenAI Responses API.

Pass a tools array and handle function_call items in output, then call POST /v1/responses again with extended input (including function_call_output items) and the same tools, or use previous_response_id to continue the turn. See the Tool calling guide for patterns that map between chat-style tool loops and the Responses API.
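One way to build that extended input, assuming OpenAI-style function_call and function_call_output item shapes (the helper is a sketch, not an SDK function):

```python
def tool_followup_input(prior_input: list, function_call: dict,
                        result: str) -> list:
    """Extend the input list so the model can see a tool's result.

    prior_input:   the input items already sent this turn
    function_call: a function_call item taken from the response's output
    result:        your function's return value, serialized as a string
    """
    return list(prior_input) + [
        function_call,  # echo the call so the model keeps its context
        {
            "type": "function_call_output",
            "call_id": function_call["call_id"],  # ties output to the call
            "output": result,
        },
    ]
```

Send the returned list as input in the follow-up POST /v1/responses request (with the same tools array), and the model will produce its final answer incorporating the tool result.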