Text Generation
Create chat completions with the OpenAI-compatible chat API on 8080—requests, responses, and common options.
Chat completions are how you generate natural-language (or structured) output from a model by sending a conversation as a list of messages and receiving the model’s reply in a single HTTP response. On 8080, this matches the familiar POST /v1/chat/completions shape used by OpenAI’s Chat Completions API, so existing clients and patterns largely work unchanged.
Prerequisites
- An 8080 API key (see Quickstart).
- A model ID your project can use (see Models). Examples below use `8080/taalas/llama3.1-8b-instruct`.
Set your key for shell snippets:
```sh
export _8080_API_KEY="your-api-key"
```

Create a chat completion
Endpoint: `POST https://api.8080.io/v1/chat/completions`

Headers: `Content-Type: application/json`, `Authorization: Bearer <api_key>`

Body (minimum): `model` (string) and `messages` (array of message objects with `role` and `content`).
Each message has:
- `role` — Who “spoke” the message: typically `user`, `assistant`, `developer`, or `system` (see message roles below).
- `content` — The text of that turn (string, or structured content if your model supports it).
Example: first request
```sh
curl https://api.8080.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ]
  }'
```

```python
import os

import requests

url = "https://api.8080.io/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."},
    ],
}

response = requests.post(url, headers=headers, json=payload, timeout=120)
response.raise_for_status()
data = response.json()
print(data["choices"][0]["message"]["content"])
```

Read the response
A successful JSON body includes:
- `choices` — Usually one entry; use `choices[0].message.content` for the assistant’s text.
- `choices[0].finish_reason` — Why generation stopped (e.g., `stop`, `length`, `tool_calls`).
- `usage` — Token counts (`prompt_tokens`, `completion_tokens`, `total_tokens`) for billing and debugging.
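The fields above can be pulled out with a small helper. A minimal sketch: the helper name and the sample body below are illustrative, but the field paths match the response shape described here.

```python
# A trimmed response body in the shape described above (values illustrative).
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16},
}

def summarize_completion(data: dict) -> dict:
    """Pull out the fields most callers need from a chat completion body."""
    choice = data["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": data.get("usage", {}).get("total_tokens"),
    }

print(summarize_completion(sample))
```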
Errors follow the same error-object style as OpenAI (`error.message`, etc.).
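Since the error object follows the OpenAI shape, a client can branch on it generically. A sketch (the helper name is ours; the `error.message` field is the documented one):

```python
def extract_error(body: dict):
    """Return the error message if the body is an OpenAI-style error object,
    or None for a normal completion body."""
    err = body.get("error")
    if isinstance(err, dict):
        return err.get("message", "unknown error")
    return None

# Illustrative bodies; field names follow the OpenAI error-object shape.
print(extract_error({"error": {"message": "Invalid API key"}}))
print(extract_error({"choices": []}))
```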
Multi-turn conversations
Send the full history in messages: prior user / assistant turns in order, then the new user message. The model only sees what you pass in this request—there is no server-side session unless you build one.
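In code, this usually means keeping the history in a list you own and appending the new user turn before each request. A minimal sketch, with a hypothetical helper name:

```python
def build_chat_payload(model: str, history: list, user_text: str) -> dict:
    """Build a request body from prior turns plus the new user message.
    The server sees only this list -- no session state is kept for you."""
    messages = list(history) + [{"role": "user", "content": user_text}]
    return {"model": model, "messages": messages}

history = [
    {"role": "user", "content": "My name is Alex."},
    {"role": "assistant", "content": "Nice to meet you, Alex!"},
]
payload = build_chat_payload(
    "8080/taalas/llama3.1-8b-instruct", history, "What is my name?"
)
```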
{ "model": "8080/taalas/llama3.1-8b-instruct", "messages": [ { "role": "user", "content": "My name is Alex." }, { "role": "assistant", "content": "Nice to meet you, Alex!" }, { "role": "user", "content": "What is my name?" } ]}Common request options
You can steer behavior without changing the endpoint:
| Option | Purpose |
|---|---|
| `temperature` | Randomness (0–2); higher = more varied. |
| `max_completion_tokens` | Cap on tokens in the assistant reply. |
| `stop` | Stop when the model emits one of these strings (string or array). |
| `stream` | `true` to receive Server-Sent Events chunks instead of one JSON body. |
| `tools` | Declarative function calling; see Tool calling. |
For the full parameter list (penalties, `n`, `seed`, `reasoning_effort`, etc.), see Text completions (reference-style table).
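With `stream` set to `true`, the response arrives as Server-Sent Events lines. A minimal parsing sketch, assuming the OpenAI-style `data: {...}` framing with incremental text in `choices[0].delta.content` and a terminal `data: [DONE]` marker (check the upstream reference for the exact chunk format):

```python
import json

def delta_from_sse_line(line: str):
    """Return the incremental text from one SSE line, or None for blank
    lines, non-data lines, and the terminal [DONE] marker."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0].get("delta", {}).get("content")
```

In practice you would feed this each line from a streaming HTTP response (e.g. `requests.post(..., stream=True)` with `iter_lines()`), printing non-None deltas as they arrive.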
Message roles
- `user` — End-user or application input.
- `assistant` — Prior model outputs you include for context.
- `developer` — Strong instructions (preferred on newer models where supported).
- `system` — Legacy system-style instructions; prefer `developer` when the model supports it.
- `tool` — Results of tool calls when using the tools API.
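A typical messages array mixing an instruction role with user input might look like this (the content strings are illustrative; on models without `developer` support, use `system` instead):

```python
messages = [
    {"role": "developer", "content": "Answer in formal English, in one sentence."},
    {"role": "user", "content": "hey, what's the capital of France?"},
]
```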
Compatibility and further reading
- Compatibility — OpenAI client base URL and supported endpoints.
- OpenAI Chat Completions reference — Field-level behavior where 8080 matches the upstream API.