Responses
Generate model output with the OpenAI Responses–compatible API on 8080.
Use the Responses API when you want an OpenAI-style request/response flow built around input (string or structured items), optional instructions, tools, previous_response_id for multi-turn conversations, and a unified response object in the reply. The 8080 endpoint is designed to match the OpenAI Responses API.
Create a response: POST https://api.8080.io/v1/responses
```sh
curl https://api.8080.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "Tell me a short joke about APIs."
  }'
```

```python
import os
import requests

response = requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": "Tell me a short joke about APIs.",
    },
)

if not response.ok:
    raise Exception(f"API Error {response.status_code}: {response.json()}")

print(response.json())
```

input can be a string or an array of items with role and content (similar to chat messages). content may be a string or a list of typed parts (e.g., input_text).
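For example, a minimal sketch reusing the call above, but sending structured input items instead of a plain string:

```python
import os
import requests

# Structured input: a list of role/content items instead of a plain string.
# Content can itself be a list of typed parts such as input_text.
response = requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": [
            {"role": "system", "content": "You answer in one sentence."},
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Tell me a short joke about APIs."}
                ],
            },
        ],
    },
)
print(response.json())
```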
Request body
Core parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID for the response (same IDs as chat completions where supported). |
| input | string \| array | User/system/developer content: plain string, or array of input items with role and content. |
| instructions | string | High-level system-style instructions separate from input. |
| conversation | string \| object | Continue or scope a thread; may be a conversation id or { "id": "..." }. |
| previous_response_id | string | Chain from a prior response for multi-turn flows. |
| stream | boolean | Stream events with text/event-stream (OpenAI-style response events). |
| temperature | number | Sampling temperature. |
| top_p | number | Nucleus sampling. |
| max_output_tokens | integer | Cap on generated output tokens. |
| max_tool_calls | integer | Limit on tool invocations per response. |
| parallel_tool_calls | boolean | Allow parallel tool calls when true. |
| tools | array | Tools the model may use (functions, file search, etc., as supported). |
| tool_choice | string \| object | auto, none, required, or force a specific function. |
| text | object | Output shaping: format (text, json_object, json_schema) and verbosity. |
| reasoning | object | Reasoning controls, e.g., effort: minimal, medium, high. |
| truncation | string | auto or disabled. |
| metadata | object | Opaque key/value metadata stored with the response. |
| store | boolean | Whether the response is persisted for later retrieval. |
| background | boolean | Run the request in the background when supported. |
| user | string | End-user identifier for abuse tracking. |
| log | boolean | When true, enable request tracing (see Logging). |
Field-level details match the OpenAPI ResponseCreateParams schema where implemented.
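For instance, a follow-up request that chains onto an earlier turn and caps the output might look like the sketch below. It is illustrative only: parameter support varies by model, and the previous_response_id value is a placeholder.

```python
# Illustrative request body; send it exactly like the create example
# above, i.e. requests.post("https://api.8080.io/v1/responses", json=payload).
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "instructions": "Answer in one short sentence.",
    "input": "Now a joke about webhooks.",
    "previous_response_id": "resp_01abc123",  # placeholder id from a prior turn
    "temperature": 0.7,
    "max_output_tokens": 128,
    "store": True,  # persist the response for later retrieval
}
```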
Response object
A successful POST returns a response object with object: "response". Below is a representative JSON body (the // comments are documentation only; strip them before pasting into a strict JSON parser).
```json
{
  // Unique id for this response; use in GET /v1/responses/{id} or as previous_response_id
  "id": "resp_01abc123",
  "object": "response",
  // Unix timestamp (seconds) when the response was created
  "created_at": 1735689600.0,
  "model": "8080/taalas/llama3.1-8b-instruct",
  // completed | in_progress | failed | incomplete | cancelled | queued
  "status": "completed",
  "parallel_tool_calls": true,
  "tool_choice": "auto",
  // Tools that were available for this turn (echo of request; may be empty)
  "tools": [],
  "conversation": {
    // Conversation thread id when using the conversation feature
    "id": "conv_01xyz789"
  },
  // Ordered list of model outputs: messages, function_call, reasoning, etc.
  "output": [
    {
      "type": "message",
      "id": "msg_01def456",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          // Assistant-visible text; read this for the user-facing answer
          "text": "Why did the API go to therapy? Too many unresolved callbacks."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 18,
    "total_tokens": 42,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  // Present when the run failed or was cut short
  "error": null,
  "incomplete_details": null
}
```

Reading the model's text: walk output for items with type: "message", then each content entry with type: "output_text"; the text field is what you show to the user. Tool calls appear as separate output items (e.g., type: "function_call") with name, arguments, and call_id for follow-up requests.
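A minimal helper for that walk might look like this (a sketch over the shape above; it ignores tool-call items):

```python
def output_text(resp: dict) -> str:
    """Concatenate user-facing text from a response object's output list."""
    parts = []
    for item in resp.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

print(output_text(response.json()))  # e.g. "Why did the API go to therapy? ..."
```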
Other fields you may see include instructions, temperature, top_p, max_output_tokens, reasoning, text, metadata, previous_response_id, and background — often echoing the create request or state for in-progress responses.
Streaming
With "stream": true, the API returns SSE events (see OpenAI's response event types: response.created, response.output_text.delta, response.completed, etc.). Handle the stream the same way you would for the OpenAI Responses API.
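A sketch of consuming the stream with requests, assuming OpenAI-style SSE payloads where each data: line carries a JSON event with a type field:

```python
import json
import os
import requests

with requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Accept": "text/event-stream",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": "Tell me a short joke about APIs.",
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # Skip blank keep-alives and non-data SSE lines (e.g., "event: ...").
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "response.output_text.delta":
            print(event.get("delta", ""), end="", flush=True)
        elif event.get("type") == "response.completed":
            break
```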
Tool calling
Pass a tools array and handle function_call items in output, then call POST /v1/responses again with extended input (including function_call_output items) and the same tools, or use previous_response_id to continue the turn. See the Tool calling guide for patterns that map between chat-style tool loops and the Responses API.
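A sketch of one round trip using the previous_response_id variant, assuming the OpenAI Responses function-tool shape; the get_weather function and its schema are hypothetical, not part of the 8080 API:

```python
import json
import os
import requests

API_URL = "https://api.8080.io/v1/responses"
HEADERS = {"Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}"}

# Hypothetical local tool the model may call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = requests.post(API_URL, headers=HEADERS, json={
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "What's the weather in Lisbon?",
    "tools": tools,
}).json()

# Execute any function_call items, then send the results back as
# function_call_output items, chained via previous_response_id.
outputs = []
for item in resp.get("output", []):
    if item.get("type") == "function_call" and item.get("name") == "get_weather":
        args = json.loads(item["arguments"])
        outputs.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": get_weather(**args),
        })

if outputs:
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "previous_response_id": resp["id"],
        "input": outputs,
        "tools": tools,
    }).json()
```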
See also
- Text completions: Chat Completions (/v1/chat/completions) when you prefer a strict messages array.
- OpenAI Responses API reference