Responses
Generate model output with the OpenAI Responses–compatible API on 8080.
Use the Responses API when you want an OpenAI-style request/response flow built around input (string or structured items), optional instructions, tools, previous_response_id for multi-turn conversations, and a unified response object in the reply. The 8080 endpoint is designed to match the OpenAI Responses API.
Create a response: POST https://api.8080.io/v1/responses
```sh
curl https://api.8080.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "Tell me a short joke about APIs."
  }'
```

```python
import os
import requests

response = requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": "Tell me a short joke about APIs.",
    },
)

if not response.ok:
    raise Exception(f"API Error {response.status_code}: {response.json()}")

print(response.json())
```

input can be a string or an array of items with role and content (similar to chat messages). content may be a string or a list of typed parts (e.g., input_text).
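For example, a minimal sketch reusing the call above, but sending structured input items instead of a plain string:

```python
import os
import requests

# Structured input: a list of role/content items instead of a plain string.
# Content can itself be a list of typed parts such as input_text.
response = requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": [
            {"role": "system", "content": "You answer in one sentence."},
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Tell me a short joke about APIs."}
                ],
            },
        ],
    },
)
print(response.json())
```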
Request body
Core parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID for the response (same IDs as chat completions where supported). |
| input | string \| array | User/system/developer content: plain string, or array of input items with role and content. |
| instructions | string | High-level system-style instructions separate from input. |
| conversation | string \| object | Continue or scope a thread; may be a conversation id or { "id": "..." }. |
| previous_response_id | string | Chain from a prior response for multi-turn flows. |
| stream | boolean | Stream events with text/event-stream (OpenAI-style response events). |
| temperature | number | Sampling temperature. |
| top_p | number | Nucleus sampling. |
| max_output_tokens | integer | Cap on generated output tokens. |
| max_tool_calls | integer | Limit on tool invocations per response. |
| parallel_tool_calls | boolean | Allow parallel tool calls when true. |
| tools | array | Tools the model may use (functions, file search, etc., as supported). |
| tool_choice | string \| object | auto, none, required, or force a specific function. |
| text | object | Output shaping: format (text, json_object, json_schema) and verbosity. |
| reasoning | object | Reasoning controls, e.g., effort: minimal, medium, high. |
| truncation | string | auto or disabled. |
| metadata | object | Opaque key/value metadata stored with the response. |
| store | boolean | Whether the response is persisted for later retrieval. |
| background | boolean | Run the request in the background when supported. |
| user | string | End-user identifier for abuse tracking. |
| log | boolean | When true, enable request tracing (see Logging). |
Field-level details match the OpenAPI ResponseCreateParams schema where implemented.
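For instance, a follow-up request that chains onto an earlier turn and caps the output might look like the sketch below. It is illustrative only: parameter support varies by model, and the previous_response_id value is a placeholder.

```python
# Illustrative request body; send it exactly like the create example
# above, i.e. requests.post("https://api.8080.io/v1/responses", json=payload).
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "instructions": "Answer in one short sentence.",
    "input": "Now a joke about webhooks.",
    "previous_response_id": "resp_01abc123",  # placeholder id from a prior turn
    "temperature": 0.7,
    "max_output_tokens": 128,
    "store": True,  # persist the response for later retrieval
}
```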
Response object
A successful POST returns a response object with object: "response". Below is a representative JSON body (the // comments are documentation only; strip them before pasting into a strict JSON parser).
```json
{
  // Unique id for this response; use in GET /v1/responses/{id} or as previous_response_id
  "id": "resp_01abc123",
  "object": "response",
  // Unix timestamp (seconds) when the response was created
  "created_at": 1735689600.0,
  "model": "8080/taalas/llama3.1-8b-instruct",
  // completed | in_progress | failed | incomplete | cancelled | queued
  "status": "completed",
  "parallel_tool_calls": true,
  "tool_choice": "auto",
  // Tools that were available for this turn (echo of request; may be empty)
  "tools": [],
  "conversation": {
    // Conversation thread id when using the conversation feature
    "id": "conv_01xyz789"
  },
  // Ordered list of model outputs: messages, function_call, reasoning, etc.
  "output": [
    {
      "type": "message",
      "id": "msg_01def456",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          // Assistant-visible text; read this for the user-facing answer
          "text": "Why did the API go to therapy? Too many unresolved callbacks."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 18,
    "total_tokens": 42,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  // Present when the run failed or was cut short
  "error": null,
  "incomplete_details": null
}
```

Reading the model's text: walk output for items with type: "message", then each content entry with type: "output_text"; the text field is what you show to the user. Tool calls appear as separate output items (e.g., type: "function_call") with name, arguments, and call_id for follow-up requests.
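A minimal helper for that walk might look like this (a sketch over the shape above; it ignores tool-call items):

```python
def output_text(resp: dict) -> str:
    """Concatenate user-facing text from a response object's output list."""
    parts = []
    for item in resp.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

print(output_text(response.json()))  # e.g. "Why did the API go to therapy? ..."
```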
Other fields you may see include instructions, temperature, top_p, max_output_tokens, reasoning, text, metadata, previous_response_id, and background — often echoing the create request or state for in-progress responses.
Streaming
With "stream": true, the API returns SSE events (see OpenAI's response event types: response.created, response.output_text.delta, response.completed, etc.). Handle the stream the same way you would for the OpenAI Responses API.
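A sketch of consuming the stream with requests, assuming OpenAI-style SSE payloads where each data: line carries a JSON event with a type field:

```python
import json
import os
import requests

with requests.post(
    "https://api.8080.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
        "Accept": "text/event-stream",
    },
    json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "input": "Tell me a short joke about APIs.",
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # Skip blank keep-alives and non-data SSE lines (e.g., "event: ...").
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "response.output_text.delta":
            print(event.get("delta", ""), end="", flush=True)
        elif event.get("type") == "response.completed":
            break
```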
Tool calling
Pass a tools array and handle function_call items in output, then call POST /v1/responses again with extended input (including function_call_output items) and the same tools, or use previous_response_id to continue the turn. See the Tool calling guide for patterns that map between chat-style tool loops and the Responses API.
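A sketch of one round trip using the previous_response_id variant, assuming the OpenAI Responses function-tool shape; the get_weather function and its schema are hypothetical, not part of the 8080 API:

```python
import json
import os
import requests

API_URL = "https://api.8080.io/v1/responses"
HEADERS = {"Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}"}

# Hypothetical local tool the model may call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = requests.post(API_URL, headers=HEADERS, json={
    "model": "8080/taalas/llama3.1-8b-instruct",
    "input": "What's the weather in Lisbon?",
    "tools": tools,
}).json()

# Execute any function_call items, then send the results back as
# function_call_output items, chained via previous_response_id.
outputs = []
for item in resp.get("output", []):
    if item.get("type") == "function_call" and item.get("name") == "get_weather":
        args = json.loads(item["arguments"])
        outputs.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": get_weather(**args),
        })

if outputs:
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "8080/taalas/llama3.1-8b-instruct",
        "previous_response_id": resp["id"],
        "input": outputs,
        "tools": tools,
    }).json()
```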
See also
- Text completions: Chat Completions (/v1/chat/completions) when you prefer a strict messages array.
- OpenAI Responses API reference