
Text Generation

Create chat completions with the OpenAI-compatible chat API on 8080—requests, responses, and common options.

Chat completions are how you generate natural-language (or structured) output from a model by sending a conversation as a list of messages and receiving the model’s reply in a single HTTP response. On 8080, this matches the familiar POST /v1/chat/completions shape used by OpenAI’s Chat Completions API, so existing clients and patterns largely work unchanged.

To follow along, you need:

  • An 8080 API key (see Quickstart).
  • A model ID your project can use (see Models). Examples below use 8080/taalas/llama3.1-8b-instruct.

Set your key for shell snippets:

export _8080_API_KEY="your-api-key"

Endpoint: POST https://api.8080.io/v1/chat/completions
Headers: Content-Type: application/json, Authorization: Bearer <api_key>
Body (minimum): model (string) and messages (array of message objects with role and content).

Each message has:

  • role — Who “spoke” the message: typically user, assistant, developer, or system (see message types below).
  • content — The text of that turn (string, or structured content if your model supports it).
curl https://api.8080.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ]
  }'
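The same request can be reproduced from Python with only the standard library. This is a sketch, not an official client; build_request and send are illustrative helper names, and the endpoint and model ID are the ones used throughout this page.

```python
# Minimal sketch of the curl request in Python (standard library only).
# build_request() and send() are illustrative helpers, not part of any SDK.
import json
import os
import urllib.request

API_URL = "https://api.8080.io/v1/chat/completions"

def build_request(messages, model="8080/taalas/llama3.1-8b-instruct"):
    """Package a chat-completions payload as a urllib Request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ.get("_8080_API_KEY", ""),
        },
    )

def send(req):
    """Perform the HTTP round trip and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the key exported as above, `send(build_request([...]))` returns the decoded response body described in the next section.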

A successful JSON body includes:

  • choices — Usually one entry; use choices[0].message.content for the assistant’s text.
  • choices[0].finish_reason — Why generation stopped (e.g., stop, length, tool_calls).
  • usage — Token counts (prompt_tokens, completion_tokens, total_tokens) for billing and debugging.

Errors follow the same error object style as OpenAI (error.message, etc.).
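Putting those fields together, a small helper can pull out the assistant's text and surface API errors in one place. The response body below is illustrative, not a real capture.

```python
import json

# A trimmed example of a successful response body (values are illustrative).
sample = json.loads("""
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "Hello there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16}
}
""")

def extract_reply(body):
    """Return (text, finish_reason), or raise with the API's error message."""
    if "error" in body:
        raise RuntimeError(body["error"]["message"])
    choice = body["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

text, reason = extract_reply(sample)
print(text, reason)  # Hello there! stop
```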

Send the full history in messages: prior user / assistant turns in order, then the new user message. The model only sees what you pass in this request—there is no server-side session unless you build one.

{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "My name is Alex." },
    { "role": "assistant", "content": "Nice to meet you, Alex!" },
    { "role": "user", "content": "What is my name?" }
  ]
}
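The history-passing pattern amounts to a small loop that appends both sides of each exchange before resending. In this sketch, call_api is a stand-in for the real HTTP request, not an actual 8080 client.

```python
# Client-side conversation state: resend the whole `messages` list each turn.
# call_api() is a placeholder; swap in a real chat-completions request.
def call_api(messages):
    return {"role": "assistant", "content": f"(stub reply to {len(messages)} message(s))"}

messages = []

def ask(user_text):
    messages.append({"role": "user", "content": user_text})
    reply = call_api(messages)
    messages.append(reply)  # keep the assistant turn so later requests see it
    return reply["content"]

ask("My name is Alex.")
ask("What is my name?")
# messages now holds four turns: user, assistant, user, assistant.
```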

You can steer behavior without changing the endpoint:

  • temperature — Randomness (0–2); higher = more varied.
  • max_completion_tokens — Cap on tokens in the assistant reply.
  • stop — Stop when the model emits one of these strings (string or array).
  • stream — true to receive Server-Sent Events chunks instead of one JSON body.
  • tools — Declarative function calling; see Tool calling.
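For example, a request body combining several of these options might look like the following (the values are illustrative):

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "Write a haiku about rain." }
  ],
  "temperature": 0.7,
  "max_completion_tokens": 64,
  "stop": ["\n\n"]
}
```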

For the full parameter list (penalties, n, seed, reasoning_effort, etc.), see Text completions (reference-style table).
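When stream is true, chunks arrive as Server-Sent Events lines. This sketch assembles the text deltas, assuming the OpenAI-style `data: {json}` / `data: [DONE]` framing that OpenAI-compatible APIs use; the sample lines are illustrative, not a real capture.

```python
import json

# Illustrative SSE lines as they would arrive with stream=true.
sample_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

def collect_text(lines):
    """Concatenate the content deltas from a stream of SSE data lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_text(sample_stream))  # Hello!
```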

The supported message roles are:

  • user — End-user or application input.
  • assistant — Prior model outputs you include for context.
  • developer — Strong instructions (preferred on newer models where supported).
  • system — Legacy system-style instructions; prefer developer when the model supports it.
  • tool — Results of tool calls when using the tools API.
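For instance, a request pairing a developer instruction with a user turn might look like this (the body is illustrative):

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "developer", "content": "Answer in exactly one sentence." },
    { "role": "user", "content": "What is a chat completion?" }
  ]
}
```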