
Text Generation

Create chat completions with the OpenAI-compatible chat API on 8080—requests, responses, and common options.

Chat completions are how you generate natural-language (or structured) output from a model by sending a conversation as a list of messages and receiving the model’s reply in a single HTTP response. On 8080, this matches the familiar POST /v1/chat/completions shape used by OpenAI’s Chat Completions API, so existing clients and patterns largely work unchanged.

To follow along, you need:

  • An 8080 API key (see Quickstart).
  • A model ID your project can use (see Models). Examples below use 8080/taalas/llama3.1-8b-instruct.

Set your key for shell snippets:

export _8080_API_KEY="your-api-key"

Endpoint: POST https://api.8080.io/v1/chat/completions
Headers: Content-Type: application/json, Authorization: Bearer <api_key>
Body (minimum): model (string) and messages (array of message objects with role and content).

Each message has:

  • role — Who “spoke” the message: typically user, assistant, developer, or system (see message types below).
  • content — The text of that turn (string, or structured content if your model supports it).
curl https://api.8080.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ]
  }'
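The same request can be reproduced from Python with only the standard library. This is a sketch, not an official client; build_request and send are illustrative helper names, and the endpoint and model ID are the ones used throughout this page.

```python
# Minimal sketch of the curl request in Python (standard library only).
# build_request() and send() are illustrative helpers, not part of any SDK.
import json
import os
import urllib.request

API_URL = "https://api.8080.io/v1/chat/completions"

def build_request(messages, model="8080/taalas/llama3.1-8b-instruct"):
    """Package a chat-completions payload as a urllib Request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ.get("_8080_API_KEY", ""),
        },
    )

def send(req):
    """Perform the HTTP round trip and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the key exported as above, `send(build_request([...]))` returns the decoded response body described in the next section.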

A successful JSON body includes:

  • choices — Usually one entry; use choices[0].message.content for the assistant’s text.
  • choices[0].finish_reason — Why generation stopped (e.g., stop, length, tool_calls).
  • usage — Token counts (prompt_tokens, completion_tokens, total_tokens) for billing and debugging.

Errors follow the same error object style as OpenAI (error.message, etc.).
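Putting those fields together, a small helper can pull out the assistant's text and surface API errors in one place. The response body below is illustrative, not a real capture.

```python
import json

# A trimmed example of a successful response body (values are illustrative).
sample = json.loads("""
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "Hello there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16}
}
""")

def extract_reply(body):
    """Return (text, finish_reason), or raise with the API's error message."""
    if "error" in body:
        raise RuntimeError(body["error"]["message"])
    choice = body["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

text, reason = extract_reply(sample)
print(text, reason)  # Hello there! stop
```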

Send the full history in messages: prior user / assistant turns in order, then the new user message. The model only sees what you pass in this request—there is no server-side session unless you build one.

{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "My name is Alex." },
    { "role": "assistant", "content": "Nice to meet you, Alex!" },
    { "role": "user", "content": "What is my name?" }
  ]
}
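The history-passing pattern amounts to a small loop that appends both sides of each exchange before resending. In this sketch, call_api is a stand-in for the real HTTP request, not an actual 8080 client.

```python
# Client-side conversation state: resend the whole `messages` list each turn.
# call_api() is a placeholder; swap in a real chat-completions request.
def call_api(messages):
    return {"role": "assistant", "content": f"(stub reply to {len(messages)} message(s))"}

messages = []

def ask(user_text):
    messages.append({"role": "user", "content": user_text})
    reply = call_api(messages)
    messages.append(reply)  # keep the assistant turn so later requests see it
    return reply["content"]

ask("My name is Alex.")
ask("What is my name?")
# messages now holds four turns: user, assistant, user, assistant.
```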

You can steer behavior without changing the endpoint:

  • temperature — Randomness (0–2); higher = more varied.
  • max_completion_tokens — Cap on tokens in the assistant reply.
  • stop — Stop when the model emits one of these strings (string or array).
  • stream — true to receive Server-Sent Events chunks instead of one JSON body.
  • tools — Declarative function calling; see Tool calling.
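For example, a request body combining several of these options might look like the following (the values are illustrative):

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "Write a haiku about rain." }
  ],
  "temperature": 0.7,
  "max_completion_tokens": 64,
  "stop": ["\n\n"]
}
```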

For the full parameter list (penalties, n, seed, reasoning_effort, etc.), see Text completions (reference-style table).
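When stream is true, chunks arrive as Server-Sent Events lines. This sketch assembles the text deltas, assuming the OpenAI-style `data: {json}` / `data: [DONE]` framing that OpenAI-compatible APIs use; the sample lines are illustrative, not a real capture.

```python
import json

# Illustrative SSE lines as they would arrive with stream=true.
sample_stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

def collect_text(lines):
    """Concatenate the content deltas from a stream of SSE data lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_text(sample_stream))  # Hello!
```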

The supported message roles are:

  • user — End-user or application input.
  • assistant — Prior model outputs you include for context.
  • developer — Strong instructions (preferred on newer models where supported).
  • system — Legacy system-style instructions; prefer developer when the model supports it.
  • tool — Results of tool calls when using the tools API.
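For instance, a request pairing a developer instruction with a user turn might look like this (the body is illustrative):

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "developer", "content": "Answer in exactly one sentence." },
    { "role": "user", "content": "What is a chat completion?" }
  ]
}
```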