<SYSTEM>This is the full developer documentation for 8080 API & SDK</SYSTEM>

# Deploying code

> Configure Edge deployments with 8080.yaml—entrypoint, project, and resource limits for Python apps on 8080 Edge.

Edge projects use a small manifest file at the **repository root**—typically **`8080.yaml`** (sometimes `8080.yml`)—that tells the platform how to run your application. The CLI (`8080 init`, `8080 deploy`) reads this file when building and deploying.

## Runtime

[Section titled “Runtime”](#runtime)

Deployments today use a **Python** runtime: you provide an ASGI application (for example built with the `e80` SDK and `eighty80_app()`). **TypeScript and JavaScript** application runtimes are **coming soon**; this page documents the current Python-oriented manifest.

## Example `8080.yaml`

[Section titled “Example 8080.yaml”](#example-8080yaml)

```yaml
entrypoint: handlers.main:app
organization_slug: your-organization
project: examples
project_slug: examples
cpu_mhz: 500
gpu_count: 0
memory_size_mb: 1024
```

## Field reference

[Section titled “Field reference”](#field-reference)

| Field               | Type    | Description                                                                                                                                                                                                                                                                                                           |
| ------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `entrypoint`        | string  | Python import path to the ASGI app, in the form `module.path:variable`. The module path uses dots for packages (e.g., `handlers.main` → `handlers/main.py` or package layout). The variable is the app instance (e.g., FastAPI / Starlette / SDK app). Example: `handlers.main:app` loads `app` from `handlers.main`. |
| `organization_slug` | string  | Organization identifier in 8080 (matches your team/org in the dashboard and CLI context).                                                                                                                                                                                                                             |
| `project`           | string  | Human-readable project name (display and grouping).                                                                                                                                                                                                                                                                   |
| `project_slug`      | string  | Stable project slug used in URLs, routing, and hosting (e.g., `{project_slug}.hosted.8080.io` patterns where applicable).                                                                                                                                                                                             |
| `cpu_mhz`           | number  | CPU allocation for the deployment, in **megahertz** (e.g., `500`).                                                                                                                                                                                                                                                    |
| `gpu_count`         | integer | Number of GPUs attached to each instance. Use `0` for CPU-only workloads.                                                                                                                                                                                                                                             |
| `memory_size_mb`    | integer | RAM limit per instance in **megabytes** (e.g., `1024` for 1 GB).                                                                                                                                                                                                                                                      |

## Creating or editing the file

[Section titled “Creating or editing the file”](#creating-or-editing-the-file)

1. Run **`8080 init`** in an empty directory to scaffold a project, or add **`8080.yaml`** yourself next to your Python package and handler code.
2. Set **`entrypoint`** to the module and app object that exposes your HTTP API (must match how you run locally with `8080 dev`).
3. Align **`organization_slug`**, **`project`**, and **`project_slug`** with your org and project in the 8080 dashboard.
4. Tune **`cpu_mhz`**, **`gpu_count`**, and **`memory_size_mb`** to your latency and throughput needs; invalid or unsupported combinations may be rejected at deploy time.

For a full walkthrough of a minimal handler and local run, see [Custom endpoints](/guides/custom-endpoints).

# Sandboxes

> Isolated environments for code execution on 8080 Edge — create, list, connect, and stop sandboxes via the CLI or API.

Sandboxes are **short-lived, isolated compute environments** tied to your 8080 project. They sit **next to** inference and Edge infrastructure so you can **execute code**—including **AI-generated** or agent-driven Python—safely, with access to the file system and utilities, without running that workload on end-user devices or a distant server.

## CLI overview

[Section titled “CLI overview”](#cli-overview)

The `8080 sandbox` command group manages sandboxes from your terminal.

**Prerequisites:** Install and log in with the **8080 CLI** (see [Quickstart](/getting-started/quickstart#cli)). Run commands from a project directory that you initialized with `8080 init`.

### Typical workflow

[Section titled “Typical workflow”](#typical-workflow)

**`8080 sandbox create`** starts a new sandbox and prints its **ID**. Use that ID with **`8080 sandbox connect`** and **`8080 sandbox stop`** when you are done.

```bash
8080 sandbox create
8080 sandbox connect <sandbox_id>
8080 sandbox list
8080 sandbox stop <sandbox_id>
```

Run `8080 sandbox --help` for options (image, resources, labels, etc., as your CLI version exposes). Use `8080 sandbox connect` to open an **SSH-like interactive session** into a running sandbox; replace `<sandbox_id>` with the ID from `create` or `list`.

## Using the SDK

[Section titled “Using the SDK”](#using-the-sdk)

The 8080 SDK offers an easy-to-use interface for creating and running code in a sandbox. To get started adding sandbox functionality to your 8080 application, add code like the following that implements tool calling with Python code execution:

```python
from e80_sdk import eighty80_app


app = eighty80_app()
api = app.completion_sdk()


@app.tool
async def run_python(code: str) -> str:
    """
    Execute Python code in a sandboxed environment and return the result.
    Args:
        code: Python code to execute. Use print() to output results.
    """
    async with app.sandbox() as sandbox:
        return sandbox.run_python(code)


@app.endpoint("/chat")
async def chat(request: app.Request):
    body = await request.json()
    body["tools"] = app.get_tools()
    return api.chat.completions.create(**body)
```

## REST API

[Section titled “REST API”](#rest-api)

You can **create**, **list**, and **stop** sandboxes over HTTPS with your 8080 API key (same Bearer token as the inference API). Paths below follow the usual `https://api.8080.io/v1/...` pattern; field names may match the CLI’s JSON output—check the [API reference](/api-reference) for the latest schemas.

### Create a sandbox

[Section titled “Create a sandbox”](#create-a-sandbox)

`POST /v1/sandboxes` creates a sandbox for the project associated with your credentials (or the project you pass in the body, if supported).

```bash
curl -s -X POST "https://api.8080.io/v1/sandboxes" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
```

The response includes an `id` (and usually `status`). Use that `id` for connect-oriented flows and for stop.

### List sandboxes

[Section titled “List sandboxes”](#list-sandboxes)

`GET /v1/sandboxes` returns sandboxes for your project.

```bash
curl -s "https://api.8080.io/v1/sandboxes" \
  -H "Authorization: Bearer $_8080_API_KEY"
```

### Stop a sandbox

[Section titled “Stop a sandbox”](#stop-a-sandbox)

Stop a specific sandbox by ID (exact method and path may be `POST` with a `stop` action—align with the published OpenAPI spec).

```bash
curl -s -X POST "https://api.8080.io/v1/sandboxes/<sandbox_id>/stop" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
```

### Connecting via the API

[Section titled “Connecting via the API”](#connecting-via-the-api)

**Interactive `connect` sessions** are negotiated by the CLI (authentication, routing, and session tokens). If you need programmatic access, use the endpoints documented for **sandbox sessions** or **exec** in the API reference, or continue to use `8080 sandbox connect <sandbox_id>` for a terminal UI.

## See also

[Section titled “See also”](#see-also)

* [Custom endpoints](/guides/custom-endpoints) — Projects, `8080 dev`, and deploying handlers
* [Deploying code](/edge/deploying-code) — `8080.yaml` and deployment configuration
* [Logging and traces](/inference/logging) — Request logging for API and edge debugging

# Edge Compute

> Run your application next to 8080 inference—ASIC-backed LLMs and general compute on the same fabric for minimal end-to-end latency.

**Edge Compute** is 8080’s way of running **your code on the same network and infrastructure** that serves inference. Models run on **ASIC accelerators** tuned for LLM workloads; your handlers, gateways, and tools run on **general-purpose compute** provisioned **alongside** that stack—not in a distant region reached only over the public internet.

8080 **datacenters are placed in closest physical proximity** to major **AWS** and **GCP** cloud regions. That keeps the path between **applications you run in those clouds** and **8080 Edge and inference** as short as possible—fewer miles and fewer intermediate hops than if inference lived far from your existing regional footprint—so hybrid setups still get **minimal distance** for traffic to and from your hyperscaler workloads.

## Why it matters

[Section titled “Why it matters”](#why-it-matters)

When a user request hits your app and your app calls the inference API, every hop across the open internet adds milliseconds (often tens or hundreds) of delay you cannot fully control. On Edge Compute, orchestration keeps **your application logic and the accelerator-backed inference path** on a **co-located fabric**, so the round trip from your code to the model and back is **as short as the platform allows**. That makes **end-to-end latencies**—browser or client through your logic, into the model, and out again—**practically impossible to reproduce** if you host the app elsewhere and only call `api.8080.io` over the public internet.

Use Edge when you care about **agents**, **multi-step flows**, **RAG**, **custom routing**, or any pattern where **many small model calls** or **tight coupling** between business logic and inference would otherwise dominate latency.

## SDKs

[Section titled “SDKs”](#sdks)

* **Python** — The primary SDK today packages the FastAPI-style app builder, tools, sandboxes, and helpers to call completions from the same environment you deploy. See [Custom endpoints](/guides/custom-endpoints) to get started.
* **TypeScript** — A **TypeScript/JavaScript** SDK is **coming soon**; it will follow the same idea—define endpoints and deploy them next to inference—once the runtime support lands (see [Deploying code](/edge/deploying-code) for the current manifest and Python-oriented workflow).

## Next steps

[Section titled “Next steps”](#next-steps)

* **[Deploying code](/edge/deploying-code)** — `8080.yaml`, entrypoints, resources, and how deployments are configured.
* **[Sandboxes](/edge/sandboxes)** — Isolated environments for code execution and tooling next to your Edge app.
* **[Custom endpoints](/guides/custom-endpoints)** — Build and run a minimal Python service locally, then deploy with the CLI.

# Inference

> Overview of the 8080 Inference API—authentication, OpenAI compatibility, chat completions, responses, and batch jobs.

The **Inference API** is the hosted LLM surface at `https://api.8080.io`. It follows **OpenAI-compatible** paths and payloads where possible so you can use familiar clients, SDKs, and patterns. This page summarizes how to authenticate and which capabilities to use; each topic links to a dedicated guide.

## Authentication

[Section titled “Authentication”](#authentication)

Every request must include your API key as a **Bearer token** in the `Authorization` header:

```http
Authorization: Bearer YOUR_API_KEY
```

Create and manage keys in the [8080 dashboard](https://app.8080.io). For local examples in these docs, the env var `_8080_API_KEY` is used consistently:

```bash
export _8080_API_KEY="your-api-key-here"
```

All inference routes are under **`https://api.8080.io`** (HTTPS). There is no trailing path segment on the host; versioned routes start with **`/v1/...`**.

## Compatibility

[Section titled “Compatibility”](#compatibility)

8080 implements a **subset** of the OpenAI HTTP API: same general request shapes for chat completions, responses, models, files, and batches, but not every OpenAI-only feature may be available. You can point official OpenAI clients at 8080 by setting the base URL and API key—see **[Compatibility](/inference/compatibility)** for environment variables, the Python client, and the list of supported endpoints.

Quick configuration for tools that expect OpenAI’s variables:

```bash
export OPENAI_API_KEY="your_8080_api_key_here"
export OPENAI_BASE_URL="https://api.8080.io/v1"
```

## Chat completions

[Section titled “Chat completions”](#chat-completions)

**Chat completions** use the familiar **messages** array (`system` / `developer` / `user` / `assistant` / `tool`) and return chat completion objects, including optional **tool calling** and streaming.

* **Endpoint:** `POST /v1/chat/completions`
* **Guide:** [Text Generation](/guides/text-generation) — step-by-step: building requests, reading responses, multi-turn, and common options.
* **Reference:** [Completions](/inference/completions) — full parameter table, message types, and tool calling link.

Use this API when you want parity with OpenAI’s Chat Completions product and existing integrations that call `/v1/chat/completions`.

## Responses

[Section titled “Responses”](#responses)

The **Responses** API is an alternative interface built around **`input`** (string or structured items), optional **`instructions`**, **`tools`**, **`previous_response_id`** for multi-turn flows, and a unified **response** object (including **`output`** items such as `message` / `function_call`).

* **Endpoints:** `POST /v1/responses`, `GET /v1/responses/{response_id}`
* **Guide:** [Responses](/inference/responses) — request fields, annotated response JSON, streaming, and tool calling notes.

Use this API when you prefer OpenAI’s Responses model or need features that map naturally to the response/output item shape.

## Batch

[Section titled “Batch”](#batch)

**Batch** jobs let you submit many requests asynchronously (e.g., thousands of chat completion calls) via a **JSONL** input file uploaded through the **Files** API, then poll and download results—aligned with OpenAI’s batch workflow.

* **Flow:** upload file (`POST /v1/files`) → create batch (`POST /v1/batches`) → poll status → download output/error files.
* **Guide:** [Batch](/inference/batch) — JSONL format, examples, listing and cancelling batches.

Use batches for offline evaluation, backfills, or any workload that does not need an immediate per-request response.

## Next steps

[Section titled “Next steps”](#next-steps)

* Pick a model from **[Models](/getting-started/models)**.
* Follow **[Quickstart](/getting-started/quickstart)** to create a key and send a first request.
* Browse the **[API reference](/api-reference)** for full schemas.

# Models

> List of models available in 8080.

## Available Models

[Section titled “Available Models”](#available-models)

8080 is currently in private beta and is offering only a limited number of models in our API:

| Model Name                         | Description           | Max Tokens | Rate Limits (RPM / TPM) |
| ---------------------------------- | --------------------- | ---------- | ----------------------- |
| `8080/taalas/llama3.1-8b-instruct` | Smallest model, fast  | 25k        | 250 RPM / 100k TPM      |
| `8080/llm_server/gpt-oss-20b`      | Small model, balanced | 64k        | 250 RPM / 100k TPM      |

For pricing and quotas for each model, see **[8080 pricing](https://www.8080.io/pricing)**.

# Quickstart

> A guide to getting started with 8080.

This guide will walk you through the process of getting started with 8080, from creating your account to sending your first API request.

Begin by signing up for an account on 8080 by visiting the [sign-up page](https://app.8080.io/accounts/signup/). Registration is currently limited to an invite-only group of people. If you would like to request an invitation, please request one [here](https://www.8080.io/).

## REST API

[Section titled “REST API”](#rest-api)

Integrate directly with the [OpenAI-compatible](/inference/compatibility) API to generate chat completions, responses, and more. For more details on the API, refer to the [API Reference](/api-reference/) section.

1. ### Generate an API Key

   [Section titled “Generate an API Key”](#generate-an-api-key)

   Once you have signed up, go to your organization settings and generate a new key.

   Optionally, for convenience, set the following environment variable that’s used throughout example code snippets:

   ```bash
   export _8080_API_KEY="your new 8080 API key"
   ```

2. ### Send a test request

   [Section titled “Send a test request”](#send-a-test-request)

   With your API key, you can now cURL the 8080 API like so:

   ```bash
   curl https://api.8080.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $_8080_API_KEY" \
      -d '{
         "model": "8080/taalas/llama3.1-8b-instruct",
         "messages": [
            {
               "role": "user",
               "content": "why is the sky blue?"
            }
         ]
      }'
   ```

## CLI

[Section titled “CLI”](#cli)

For optimal performance, deploy your code to the 8080 Edge compute cloud. In doing so, your application will be optimized for the lowest latencies possible when calling into the 8080 Inference API. To get started in deploying code to the edge, follow the instructions below:

1. ### Install the 8080 CLI

   [Section titled “Install the 8080 CLI”](#install-the-8080-cli)

   We recommend using the package manager [uv](https://docs.astral.sh/uv/) for installing the [`e80`](https://pypi.org/project/e80) Python package:

   ```bash
   uv tool install e80
   ```

2. ### Configure

   [Section titled “Configure”](#configure)

   Log in to your account and fetch an API token to use:

   ```bash
   8080 login
   ```

3. ### Test commands

   [Section titled “Test commands”](#test-commands)

   Fetch a list of your available models to confirm your access token was saved successfully:

   ```bash
   8080 models
   ```

To learn more about deploying code to 8080, see [Custom endpoints](/guides/custom-endpoints) and [Deploying code](/edge/deploying-code).

# Custom Endpoints

> Deploy custom API endpoints to the 8080 Edge cloud (Edge deployment guide).

This guide will walk you through the process of getting started with deploying your first project to the 8080 Edge cloud, with a simple example of a chat completions API wrapper.

## Prerequisites

[Section titled “Prerequisites”](#prerequisites)

First you’ll need to have the 8080 CLI installed and authenticated. In order to successfully deploy your code, you’ll also need to have Docker installed on your system.

## Create a project

[Section titled “Create a project”](#create-a-project)

Next step is to initialize a project:

```bash
# first create a new directory and cd into it
mkdir chat && cd chat


# run the 8080 init to create project files
8080 init
```

## Local development

[Section titled “Local development”](#local-development)

You should now see some files created by the script to help you get started, but let’s direct our attention to the `main.py` file.

```python
from e80_sdk import Eighty80, eighty80_app
from fastapi import Request


app = eighty80_app()


# Get an OpenAI SDK-compatible object to talk to the 8080 API
api = Eighty80().completion_sdk()


@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    return api.chat.completions.create(**body)
```

In this example, you’ll see a very simple web application exposing a single endpoint `/v1/chat/completions`, essentially serving as a thin wrapper around the OpenAI-compatible API with your own custom code.

For example, try prepending a system prompt that gets added to every completion request:

```python
@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    body['messages'].insert(0, {
        "role": "developer",
        "content": "Only respond in pig latin no matter what the user prompts."
    })
    response = api.chat.completions.create(**body)
    return response
```

## Running locally

[Section titled “Running locally”](#running-locally)

To run this app locally you can do so with the `8080 dev` CLI command:

```text
❯ 8080 dev
Fetching secrets for local development for: chat
INFO:     Will watch for changes in these directories: ['~/projects/chat']
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     Started reloader process [4851] using StatReload
INFO:     Started server process [4854]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
```

If you already have a process running on port 8080, you can specify a different one with the `--port` option.

Try cURLing your application locally:

```bash
curl http://localhost:8080/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world"
      }
    ]
  }'
```

## Deploy to 8080

[Section titled “Deploy to 8080”](#deploy-to-8080)

Once ready, simply run the following to start a deploy to 8080:

```bash
8080 deploy
```

Watch the progress of your deployment by following the URL returned. Once the deployment successfully completes, your application will now be accessible at `{your_project_slug}.hosted.8080.io`.

## Making requests

[Section titled “Making requests”](#making-requests)

This application requires authentication using an API key. Connect to this app by including one of your project’s API keys:

```bash
curl https://{{ your_project_slug }}.hosted.8080.io/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world"
      }
    ]
  }'
```

Alternatively, send a request with the CLI:

```bash
8080 apikey set $_8080_API_KEY
8080 call v1/chat/completions '{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    {
      "role": "developer",
      "content": "Hello, world"
    }
  ]
}'
```

## OpenAI compatibility

[Section titled “OpenAI compatibility”](#openai-compatibility)

By wrapping the standard `/v1/chat/completions` endpoint, this service can be used as the base OpenAI URL in your applications and integrated easily:

```bash
export OPENAI_API_KEY=$_8080_API_KEY
export OPENAI_BASE_URL="https://{{ your_project_slug }}.hosted.8080.io/v1/"
```

# Text Generation

> Create chat completions with the OpenAI-compatible chat API on 8080—requests, responses, and common options.

**Chat completions** are how you generate natural-language (or structured) output from a model by sending a **conversation** as a list of **messages** and receiving the model’s reply in a single HTTP response. On 8080, this matches the familiar **`POST /v1/chat/completions`** shape used by OpenAI’s Chat Completions API, so existing clients and patterns largely work unchanged.

## Prerequisites

[Section titled “Prerequisites”](#prerequisites)

* An **8080 API key** (see [Quickstart](/getting-started/quickstart)).
* A **model ID** your project can use (see [Models](/getting-started/models)). Examples below use `8080/taalas/llama3.1-8b-instruct`.

Set your key for shell snippets:

```bash
export _8080_API_KEY="your-api-key"
```

## Create a chat completion

[Section titled “Create a chat completion”](#create-a-chat-completion)

**Endpoint:** `POST https://api.8080.io/v1/chat/completions`\
**Headers:** `Content-Type: application/json`, `Authorization: Bearer <api_key>`\
**Body (minimum):** `model` (string) and `messages` (array of message objects with `role` and `content`).

Each **message** has:

* **`role`** — Who “spoke” the message: typically `user`, `assistant`, `developer`, or `system` (see [message types](#message-roles) below).
* **`content`** — The text of that turn (string, or structured content if your model supports it).

### Example: first request

[Section titled “Example: first request”](#example-first-request)

* cURL

  ```bash
  curl https://api.8080.io/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $_8080_API_KEY" \
    -d '{
      "model": "8080/taalas/llama3.1-8b-instruct",
      "messages": [
        { "role": "user", "content": "Say hello in one short sentence." }
      ]
    }'
  ```

* Python

  ```python
  import os
  import requests


  url = "https://api.8080.io/v1/chat/completions"
  headers = {
      "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
      "Content-Type": "application/json",
  }
  payload = {
      "model": "8080/taalas/llama3.1-8b-instruct",
      "messages": [
          {"role": "user", "content": "Say hello in one short sentence."},
      ],
  }


  response = requests.post(url, headers=headers, json=payload, timeout=120)
  response.raise_for_status()
  data = response.json()
  print(data["choices"][0]["message"]["content"])
  ```

## Read the response

[Section titled “Read the response”](#read-the-response)

A successful JSON body includes:

* **`choices`** — Usually one entry; use **`choices[0].message.content`** for the assistant’s text.
* **`choices[0].finish_reason`** — Why generation stopped (e.g., `stop`, `length`, `tool_calls`).
* **`usage`** — Token counts (`prompt_tokens`, `completion_tokens`, `total_tokens`) for billing and debugging.

Errors follow the same error object style as OpenAI (`error.message`, etc.).

## Multi-turn conversations

[Section titled “Multi-turn conversations”](#multi-turn-conversations)

Send the **full history** in `messages`: prior `user` / `assistant` turns in order, then the new `user` message. The model only sees what you pass in this request—there is no server-side session unless you build one.

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "My name is Alex." },
    { "role": "assistant", "content": "Nice to meet you, Alex!" },
    { "role": "user", "content": "What is my name?" }
  ]
}
```

## Common request options

[Section titled “Common request options”](#common-request-options)

You can steer behavior without changing the endpoint:

| Option                  | Purpose                                                                        |
| ----------------------- | ------------------------------------------------------------------------------ |
| `temperature`           | Randomness (0–2); higher = more varied.                                        |
| `max_completion_tokens` | Cap on tokens in the assistant reply.                                          |
| `stop`                  | Stop when the model emits one of these strings (string or array).              |
| `stream`                | `true` to receive **Server-Sent Events** chunks instead of one JSON body.      |
| `tools`                 | Declarative **function calling**; see [Tool calling](/inference/tool-calling). |

For the full parameter list (penalties, `n`, `seed`, `reasoning_effort`, etc.), see **[Text completions](/inference/completions)** (reference-style table).

## Message roles

[Section titled “Message roles”](#message-roles)

* **`user`** — End-user or application input.
* **`assistant`** — Prior model outputs you include for context.
* **`developer`** — Strong instructions (preferred on newer models where supported).
* **`system`** — Legacy system-style instructions; prefer `developer` when the model supports it.
* **`tool`** — Results of tool calls when using the tools API.

## Compatibility and further reading

[Section titled “Compatibility and further reading”](#compatibility-and-further-reading)

* **[Compatibility](/inference/compatibility)** — OpenAI client base URL and supported endpoints.
* **[OpenAI Chat Completions reference](https://platform.openai.com/docs/api-reference/chat)** — Field-level behavior where 8080 matches the upstream API.

# Batch

> Run asynchronous batch jobs with the Batch API and Files API.

The Batch API lets you run large numbers of requests asynchronously without real-time responses. It is compatible with the [OpenAI Batch API](https://platform.openai.com/docs/guides/batch): same JSONL format, Files API for uploads, and batch create/retrieve/cancel/list endpoints. Use `https://api.8080.io` and your 8080 API key in place of OpenAI’s base URL and key.

Batches complete within the chosen completion window (e.g., 24 hours). Input files can contain up to 50,000 requests and be up to 200 MB.

1. ### Prepare your JSONL file

   [Section titled “Prepare your JSONL file”](#prepare-your-jsonl-file)

   Each line must be a single JSON object with:

   | Field       | Type   | Description                                                  |
   | ----------- | ------ | ------------------------------------------------------------ |
   | `custom_id` | string | Your unique ID for this request (used to match results).     |
   | `method`    | string | HTTP method, e.g., `"POST"`.                                 |
   | `url`       | string | API path, e.g., `"/v1/chat/completions"`.                    |
   | `body`      | object | Request body for that endpoint (same as the underlying API). |

   All requests in one file must target the same endpoint. The `body` must match what the endpoint expects (e.g., `model`, `messages` for chat completions).

   #### Example: chat completions input file

   [Section titled “Example: chat completions input file”](#example-chat-completions-input-file)

   Save as `batch_input.jsonl`:

   ```jsonl
   {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "8080/taalas/llama3.1-8b-instruct", "messages": [{"role": "user", "content": "Say hello in one word."}], "max_tokens": 50}}
   {"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "8080/taalas/llama3.1-8b-instruct", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 50}}
   {"custom_id": "req-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "8080/taalas/llama3.1-8b-instruct", "messages": [{"role": "user", "content": "Name a color."}], "max_tokens": 50}}
   ```

   Each line is one request. Use `custom_id` later to map output lines back to your inputs (output order may not match input order).

2. ### Upload the file (Files API)

   [Section titled “Upload the file (Files API)”](#upload-the-file-files-api)

   Upload the JSONL file with **purpose** `batch` so it can be used as batch input.

   * cURL

     ```bash
     curl https://api.8080.io/v1/files \
       -H "Authorization: Bearer $_8080_API_KEY" \
       -F purpose="batch" \
       -F file="@batch_input.jsonl"
     ```

   * Python

     ```python
     import os
     import requests


     API_KEY = os.environ.get("_8080_API_KEY")
     BASE = "https://api.8080.io"


     with open("batch_input.jsonl", "rb") as f:
         r = requests.post(
             f"{BASE}/v1/files",
             headers={"Authorization": f"Bearer {API_KEY}"},
             files={"file": ("batch_input.jsonl", f, "application/jsonl")},
             data={"purpose": "batch"},
         )
     r.raise_for_status()
     file_id = r.json()["id"]
     print("Uploaded file ID:", file_id)
     ```

   The response includes an `id` (e.g., `ae1a17d0-...`). Use this as `input_file_id` when creating the batch.

3. ### Create the batch

   [Section titled “Create the batch”](#create-the-batch)

   **Request:** `POST /v1/batches`

   | Parameter              | Type   | Required | Description                                                                             |
   | ---------------------- | ------ | -------- | --------------------------------------------------------------------------------------- |
   | `input_file_id`        | string | Yes      | File ID from the upload step.                                                           |
   | `endpoint`             | string | Yes      | Endpoint for all requests in the file, e.g., `"/v1/chat/completions"`.                  |
   | `completion_window`    | string | Yes      | Time window to complete the batch; e.g., `"24h"` if supported.                          |
   | `metadata`             | object | No       | Key-value pairs (e.g., for labeling).                                                   |
   | `output_expires_after` | object | No       | Expiration for output/error files (e.g., `{"anchor": "created_at", "seconds": 86400}`). |

   * cURL

     ```bash
     curl https://api.8080.io/v1/batches \
       -H "Authorization: Bearer $_8080_API_KEY" \
       -H "Content-Type: application/json" \
       -d '{
         "input_file_id": "ae1a17d0-...",
         "endpoint": "/v1/chat/completions",
         "completion_window": "24h"
       }'
     ```

   * Python

     ```python
     r = requests.post(
         f"{BASE}/v1/batches",
         headers={
             "Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json",
         },
         json={
             "input_file_id": file_id,
             "endpoint": "/v1/chat/completions",
             "completion_window": "24h",
             "metadata": {"job": "nightly-eval"},
         },
     )
     r.raise_for_status()
     batch = r.json()
     batch_id = batch["id"]
     print("Batch ID:", batch_id)
     ```

   The response is a batch object with `id`, `status` (e.g., `validating`), `input_file_id`, `output_file_id`, `error_file_id` (often null until the batch progresses), and timestamps.

4. ### Check batch status

   [Section titled “Check batch status”](#check-batch-status)

   **Request:** `GET /v1/batches/{batch_id}`

   Poll this until `status` is a terminal state.

   * cURL

     ```bash
     curl https://api.8080.io/v1/batches/batch_abc123 \
       -H "Authorization: Bearer $_8080_API_KEY"
     ```

   * Python

     ```python
     r = requests.get(
         f"{BASE}/v1/batches/{batch_id}",
         headers={"Authorization": f"Bearer {API_KEY}"},
     )
     r.raise_for_status()
     batch = r.json()
     print("Status:", batch["status"])
     print("Request counts:", batch.get("request_counts"))
     ```

   #### Status values

   [Section titled “Status values”](#status-values)

   | Status        | Description                                                              |
   | ------------- | ------------------------------------------------------------------------ |
   | `validating`  | Input file is being validated.                                           |
   | `failed`      | Validation failed; see `errors` on the batch.                            |
   | `in_progress` | Batch is running.                                                        |
   | `finalizing`  | Batch finished; results are being prepared.                              |
   | `completed`   | Done; use `output_file_id` (and optionally `error_file_id`) to download. |
   | `expired`     | Did not finish within the completion window.                             |
   | `cancelling`  | Cancel requested; waiting for in-flight work.                            |
   | `cancelled`   | Batch was cancelled.                                                     |

   When `status` is `completed`, `output_file_id` points to a JSONL file of successful results, and `error_file_id` (if set) points to a JSONL file of failed requests.

5. ### Retrieve results and errors

   [Section titled “Retrieve results and errors”](#retrieve-results-and-errors)

   **Request:** `GET /v1/files/{file_id}/content`

   Use the batch’s `output_file_id` and `error_file_id` from the completed batch object.

   * cURL

     ```bash
     # Output (successful requests)
     curl https://api.8080.io/v1/files/RESULTS_FILE_ID/content \
       -H "Authorization: Bearer $_8080_API_KEY" \
       -o batch_output.jsonl


     # Errors (failed requests)
     curl https://api.8080.io/v1/files/ERROR_FILE_ID/content \
       -H "Authorization: Bearer $_8080_API_KEY" \
       -o batch_errors.jsonl
     ```

   * Python

     ```python
     # After batch status is "completed"
     output_file_id = batch["output_file_id"]
     error_file_id = batch.get("error_file_id")


     r = requests.get(
         f"https://api.8080.io/v1/files/{output_file_id}/content",
         headers={"Authorization": f"Bearer {API_KEY}"},
     )
     r.raise_for_status()
     with open("batch_output.jsonl", "w") as f:
         f.write(r.text)


     if error_file_id:
         r = requests.get(
             f"https://api.8080.io/v1/files/{error_file_id}/content",
             headers={"Authorization": f"Bearer {API_KEY}"},
         )
         r.raise_for_status()
         with open("batch_errors.jsonl", "w") as f:
             f.write(r.text)
     ```

   #### Output file format (JSONL)

   [Section titled “Output file format (JSONL)”](#output-file-format-jsonl)

   One line per **successful** request. Each line is a JSON object with:

   | Field       | Description                                                                  |
   | ----------- | ---------------------------------------------------------------------------- |
   | `id`        | Batch request ID.                                                            |
   | `custom_id` | The `custom_id` from your input line.                                        |
   | `response`  | Object with `status_code`, `request_id`, and `body` (the API response body). |
   | `error`     | `null` for success lines.                                                    |

   Example output line (chat completions):

   ```jsonl
   {"id": "batch_req_abc", "custom_id": "req-1", "response": {"status_code": 200, "request_id": "req_xyz", "body": {"id": "chatcmpl-...", "object": "chat.completion", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello."}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 12, "completion_tokens": 1, "total_tokens": 13}}}, "error": null}
   ```

   Match results to inputs using `custom_id`; do not rely on line order.

   #### Error file format (JSONL)

   [Section titled “Error file format (JSONL)”](#error-file-format-jsonl)

   One line per **failed** request. Each line has `id`, `custom_id`, `response: null`, and `error` with `code` and `message`, for example:

   ```jsonl
   {"id": "batch_req_456", "custom_id": "req-2", "response": null, "error": {"code": "invalid_request", "message": "Invalid model."}}
   ```

## List batches

[Section titled “List batches”](#list-batches)

`GET /v1/batches` (optional query: `limit`, `after` for pagination).

* cURL

  ```bash
  curl https://api.8080.io/v1/batches?limit=20 \
    -H "Authorization: Bearer $_8080_API_KEY"
  ```

* Python

  ```python
  import os
  import requests


  API_KEY = os.environ.get("_8080_API_KEY")
  BASE = "https://api.8080.io"


  r = requests.get(
      f"{BASE}/v1/batches",
      params={"limit": 20},
      headers={"Authorization": f"Bearer {API_KEY}"},
  )
  r.raise_for_status()
  print(r.json())
  ```

## Cancel a batch

[Section titled “Cancel a batch”](#cancel-a-batch)

`POST /v1/batches/{batch_id}/cancel`

* cURL

  ```bash
  curl https://api.8080.io/v1/batches/batch_abc123/cancel \
    -H "Authorization: Bearer $_8080_API_KEY" \
    -X POST
  ```

* Python

  ```python
  import os
  import requests


  API_KEY = os.environ.get("_8080_API_KEY")
  BASE = "https://api.8080.io"
  batch_id = "batch_abc123"


  r = requests.post(
      f"{BASE}/v1/batches/{batch_id}/cancel",
      headers={"Authorization": f"Bearer {API_KEY}"},
  )
  r.raise_for_status()
  print(r.json())
  ```

After cancelling, status moves to `cancelling` then `cancelled` (may take a few minutes).

# Compatibility

> Use the OpenAI client with 8080 by setting the base URL and API key.

## OpenAI Compatibility

[Section titled “OpenAI Compatibility”](#openai-compatibility)

The 8080 API provides partial compatibility with the OpenAI API specification to facilitate easy integration with existing applications and commonly used tools.

### Environment variables

[Section titled “Environment variables”](#environment-variables)

Most applications that are compatible with the OpenAI API or require an OpenAI key to run can be easily configured to use the 8080 API by setting these two variables:

```bash
export OPENAI_API_KEY="your_8080_api_key_here"
export OPENAI_BASE_URL="https://api.8080.io/v1"
```

### Python Client

[Section titled “Python Client”](#python-client)

By replacing your API client’s base URL with `https://api.8080.io/v1` and your OpenAI access token with an 8080 API key, you can use the standard [OpenAI Python](https://github.com/openai/openai-python) client as you would normally:

```python
from openai import OpenAI
import os


client = OpenAI(
    base_url="https://api.8080.io/v1",
    # optionally set the OpenAI env var OPENAI_API_KEY and omit this line
    api_key=os.environ.get('_8080_API_KEY'),
)


completion = client.chat.completions.create(
    model="8080/taalas/llama3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
)


print(completion.choices[0].message.content)
```

### Currently supported

[Section titled “Currently supported”](#currently-supported)

The 8080 API currently supports the following endpoints that are compatible with the OpenAI-style API:

* `/v1/chat/completions` — Chat-based text completions, with support for tool calling and structured outputs using the same format options available in the OpenAI API
* `/v1/responses` — Generate text completions using the Responses format
* `/v1/models` — List models available for text completions
* `/v1/batches` — Create batch jobs for completions to be executed at a lower priority
* `/v1/files` — Upload batch input files, download results files, deploy file assets for Edge apps

8080 does **not** currently support any other API functionality not included above.

### Further reading

[Section titled “Further reading”](#further-reading)

* Read more about the [OpenAI Python library](https://platform.openai.com/docs/libraries/python)
* Check out the [OpenAI API reference](https://platform.openai.com/docs/api-reference/introduction)

# Text Completions

> A quick walkthrough on how to generate chat completions on 8080.

Generate chat completions using an OpenAI-style chat completions API or the `e80` SDK.

## API

[Section titled “API”](#api)

The 8080 chat completions endpoint is designed to match the OpenAI API for generating chat completions. To learn more, refer to the [OpenAI docs](https://platform.openai.com/docs/api-reference/chat).

* cURL

  ```bash
  curl https://api.8080.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $_8080_API_KEY" \
      -d '{
          "model": "8080/taalas/llama3.1-8b-instruct",
          "messages": [
              {
                  "role": "user",
                  "content": "Tell me a joke"
              }
          ]
      }'
  ```

* Python

  ```python
  import requests
  import json
  import os


  response = requests.post(
      "https://api.8080.io/v1/chat/completions",
      headers={
          "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
          "Content-Type": "application/json"
      },
      json={
          "model": "8080/taalas/llama3.1-8b-instruct",
          "messages": [
              {
                  "role": "developer",
                  "content": "why is the sky blue?"
              }
          ]
      }
  )


  if not response.ok:
      raise Exception(f"API Error {response.status_code}: {response.json()}")


  print(response.json())
  ```

## Request body

[Section titled “Request body”](#request-body)

| Parameter               | Type         | Description                                                                                                                                  |
| ----------------------- | ------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `messages` required     | array        | A list of messages comprising the conversation so far. Supports different message types like text, images, and audio depending on the model. |
| `model` required        | string       | Model ID used to generate the response, like `8080/taalas/llama3.1-8b-instruct`                                                              |
| `frequency_penalty`     | number       | 0                                                                                                                                            |
| `logit_bias`            | map          | null                                                                                                                                         |
| `max_completion_tokens` | integer      | null                                                                                                                                         |
| `n`                     | integer      | 1                                                                                                                                            |
| `presence_penalty`      | number       | 0                                                                                                                                            |
| `reasoning_effort`      | string       | ”medium”                                                                                                                                     |
| `seed`                  | integer      | null                                                                                                                                         |
| `stop`                  | string/array | null                                                                                                                                         |
| `stream`                | boolean      | false                                                                                                                                        |
| `temperature`           | number       | 1                                                                                                                                            |
| `top_p`                 | number       | 1                                                                                                                                            |
| `tools`                 | array        | null                                                                                                                                         |

### Message Types

[Section titled “Message Types”](#message-types)

The `messages` array can contain different types of messages:

* **Developer message**: Instructions for the model to follow, replacing system messages in newer models
* **System message**: Legacy instructions for the model (prefer developer messages for newer models)
* **User message**: Messages from end users containing prompts or context
* **Assistant message**: Model-generated responses
* **Tool message**: Messages related to tool/function calling

## Tool calling

[Section titled “Tool calling”](#tool-calling)

Pass a `tools` array in the request to let the model call your functions. When the model returns `tool_calls`, add tool messages with the results and call the API again until the model sends a final text response. See the [Tool calling](/inference/tool-calling) guide for the request format and a full Python example.

# Logging

> Enable request logging to inspect API and edge traces, latencies, and raw request/response data including prompts.

The 8080 API can record traces for your API and edge requests. Use them to understand latencies, inspect raw request and response payloads, and see prompts sent in each request.

## What you get

[Section titled “What you get”](#what-you-get)

When logging is enabled for a request, 8080 records:

* **Traces** — End-to-end traces for API and edge requests
* **Latency** — Timing data to debug slow or variable response times
* **Request and response data** — Raw request body and response body
* **Prompts** — The full prompt (messages) included in the request

Traces are available in the [8080 dashboard](https://app.8080.io) so you can correlate requests with your application logs.

## Enabling logging

[Section titled “Enabling logging”](#enabling-logging)

Add `"log": true` to the JSON body of your request. Logging is supported on endpoints that accept a request body (for example, chat completions and responses).

### Chat completions

[Section titled “Chat completions”](#chat-completions)

* cURL

  ```bash
  curl https://api.8080.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $_8080_API_KEY" \
      -d '{
          "model": "8080/taalas/llama3.1-8b-instruct",
          "messages": [{"role": "user", "content": "Hello"}],
          "log": true
      }'
  ```

* Python

  ```python
  import requests
  import os


  response = requests.post(
      "https://api.8080.io/v1/chat/completions",
      headers={
          "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
          "Content-Type": "application/json",
      },
      json={
          "model": "8080/taalas/llama3.1-8b-instruct",
          "messages": [{"role": "user", "content": "Hello"}],
          "log": True,
      },
  )
  ```

### Responses endpoint

[Section titled “Responses endpoint”](#responses-endpoint)

For the responses API, include `"log": true` in the request body as well:

```json
{
  "template": "my-template",
  "inputs": { "name": "World" },
  "log": true
}
```

## Viewing traces

[Section titled “Viewing traces”](#viewing-traces)

After sending requests with `"log": true`, open the [dashboard](https://app.8080.io) and use the logging or traces section to filter by time range, endpoint, or request ID. You can expand each trace to see latency breakdowns, the full request (including messages/prompts), and the response payload.

## Best practices

[Section titled “Best practices”](#best-practices)

* **Debugging** — Turn on `log: true` when diagnosing errors or unexpected behavior so you can see exactly what was sent and returned.
* **Sensitive data** — Logged requests store the full prompt and response. Avoid enabling logging for requests that contain secrets or highly sensitive content, or use it only in non-production environments.
* **Sampling** — For high-volume traffic, enable logging only for a subset of requests (for example, on a percentage of calls or when a debug flag is set) to control volume and cost.

# Responses

> Generate model output with the OpenAI Responses–compatible API on 8080.

Use the **Responses** API when you want an OpenAI-style request/response flow built around **`input`** (string or structured items), optional **`instructions`**, **`tools`**, **`previous_response_id`** for multi-turn conversations, and a unified **`response`** object in the reply. The 8080 endpoint is designed to match the [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses).

## API

[Section titled “API”](#api)

**Create a response:** `POST https://api.8080.io/v1/responses`

* cURL

  ```bash
  curl https://api.8080.io/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $_8080_API_KEY" \
      -d '{
          "model": "8080/taalas/llama3.1-8b-instruct",
          "input": "Tell me a short joke about APIs."
      }'
  ```

* Python

  ```python
  import os
  import requests


  response = requests.post(
      "https://api.8080.io/v1/responses",
      headers={
          "Authorization": f"Bearer {os.environ.get('_8080_API_KEY')}",
          "Content-Type": "application/json",
      },
      json={
          "model": "8080/taalas/llama3.1-8b-instruct",
          "input": "Tell me a short joke about APIs.",
      },
  )


  if not response.ok:
      raise Exception(f"API Error {response.status_code}: {response.json()}")


  print(response.json())
  ```

`input` can be a **string** or an **array** of items with `role` and `content` (similar to chat messages). `content` may be a string or a list of typed parts (e.g., `input_text`).

## Request body

[Section titled “Request body”](#request-body)

### Core parameters

[Section titled “Core parameters”](#core-parameters)

| Parameter              | Type             | Description                                                                                     |
| ---------------------- | ---------------- | ----------------------------------------------------------------------------------------------- |
| `model` required       | string           | Model ID for the response (same IDs as chat completions where supported).                       |
| `input`                | string \| array  | User/system/developer content: plain string, or array of input items with `role` and `content`. |
| `instructions`         | string           | High-level system-style instructions separate from `input`.                                     |
| `conversation`         | string \| object | Continue or scope a thread; may be a conversation `id` or `{ "id": "..." }`.                    |
| `previous_response_id` | string           | Chain from a prior response for multi-turn flows.                                               |
| `stream`               | boolean          | Stream events with `text/event-stream` (OpenAI-style response events).                          |
| `temperature`          | number           | Sampling temperature.                                                                           |
| `top_p`                | number           | Nucleus sampling.                                                                               |
| `max_output_tokens`    | integer          | Cap on generated output tokens.                                                                 |
| `max_tool_calls`       | integer          | Limit tool invocations per response.                                                            |
| `parallel_tool_calls`  | boolean          | Allow parallel tool calls when `true`.                                                          |
| `tools`                | array            | Tools the model may use (functions, file search, etc., as supported).                           |
| `tool_choice`          | string \| object | `auto`, `none`, `required`, or force a specific function.                                       |
| `text`                 | object           | Output shaping: `format` (`text`, `json_object`, `json_schema`) and `verbosity`.                |
| `reasoning`            | object           | Reasoning controls, e.g., `effort`: `minimal`, `medium`, `high`.                                |
| `truncation`           | string           | `auto` or `disabled`.                                                                           |
| `metadata`             | object           | Opaque key/value metadata stored with the response.                                             |
| `store`                | boolean          | Whether the response is persisted for later retrieval.                                          |
| `background`           | boolean          | Run the request in the background when supported.                                               |
| `user`                 | string           | End-user identifier for abuse tracking.                                                         |
| `log`                  | boolean          | When `true`, enable request tracing (see [Logging](/inference/logging)).                        |

Field-level details match the [OpenAPI](/api-reference) `ResponseCreateParams` schema where implemented.

## Response object

[Section titled “Response object”](#response-object)

Successful `POST` returns a **response** object with `object: "response"`. Below is a representative JSON body (with `//` comments for documentation only — strip them if you paste into a strict JSON parser).

```jsonc
{
  // Unique id for this response; use in GET /v1/responses/{id} or as previous_response_id
  "id": "resp_01abc123",
  "object": "response",
  // Unix timestamp (seconds) when the response was created
  "created_at": 1735689600.0,
  "model": "8080/taalas/llama3.1-8b-instruct",
  // completed | in_progress | failed | incomplete | cancelled | queued
  "status": "completed",
  "parallel_tool_calls": true,
  "tool_choice": "auto",
  // Tools that were available for this turn (echo of request; may be empty)
  "tools": [],
  "conversation": {
    // Conversation thread id when using the conversation feature
    "id": "conv_01xyz789"
  },
  // Ordered list of model outputs: messages, function_call, reasoning, etc.
  "output": [
    {
      "type": "message",
      "id": "msg_01def456",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          // Assistant-visible text; read this for the user-facing answer
          "text": "Why did the API go to therapy? Too many unresolved callbacks."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 18,
    "total_tokens": 42,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  // Present when the run failed or was cut short
  "error": null,
  "incomplete_details": null
}
```

**Reading the model’s text:** walk `output` for items with `type: "message"`, then each `content` entry with `type: "output_text"`; the `text` field is what you show to the user. Tool calls appear as separate `output` items (e.g., `type: "function_call"`) with `name`, `arguments`, and `call_id` for follow-up requests.

Other fields you may see include **`instructions`**, **`temperature`**, **`top_p`**, **`max_output_tokens`**, **`reasoning`**, **`text`**, **`metadata`**, **`previous_response_id`**, and **`background`** — often echoing the create request or state for in-progress responses.

## Streaming

[Section titled “Streaming”](#streaming)

With `"stream": true`, the API returns **SSE** events (see OpenAI’s response event types: `response.created`, `response.output_text.delta`, `response.completed`, etc.). Handle the stream the same way you would for the OpenAI Responses API.

## Tool calling

[Section titled “Tool calling”](#tool-calling)

Pass a `tools` array and handle `function_call` items in `output`, then call **`POST /v1/responses`** again with extended `input` (including `function_call_output` items) and the same `tools`, or use `previous_response_id` to continue the turn. See the [Tool calling](/inference/tool-calling) guide for patterns that map between chat-style tool loops and the Responses API.

## See also

[Section titled “See also”](#see-also)

* [Text completions](/inference/completions) — Chat Completions (`/v1/chat/completions`) when you prefer a strict messages array.
* [OpenAI Responses API reference](https://platform.openai.com/docs/api-reference/responses)

# Tool Calling

> Use the chat completions API to let the model call your functions (tools).

You can pass a `tools` array in chat completion requests. The model may respond with `tool_calls` instead of text; your client runs the requested functions and sends the results back as tool messages, then requests again until the model returns a final text response.

## Request format

[Section titled “Request format”](#request-format)

Include a `tools` array in the request body. Each tool is an object with `type: "function"` and a `function` object that has:

* **`name`** (required): Function name the model will use when calling.
* **`description`** (optional): Description for the model; helps it decide when to call the tool.
* **`parameters`** (optional): JSON Schema for the arguments the function accepts.

```json
{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a location by latitude and longitude.",
        "parameters": {
          "type": "object",
          "properties": {
            "latitude": {"type": "number", "description": "Latitude"},
            "longitude": {"type": "number", "description": "Longitude"}
          },
          "required": ["latitude", "longitude"]
        }
      }
    }
  ]
}
```

## Response and the tool-call loop

[Section titled “Response and the tool-call loop”](#response-and-the-tool-call-loop)

1. **First request**: Send `messages` and `tools`. The response may be:

   * Normal text: `choices[0].message.content` is set and `finish_reason` is `"stop"`. You’re done.
   * Tool calls: `choices[0].message.tool_calls` is set and `finish_reason` is `"tool_calls"`. Each item has `id`, `type: "function"`, and `function` with `name` and `arguments` (JSON string).

2. **Append assistant and tool messages**: Add the assistant message (including `tool_calls`) to your conversation. For each tool call, append a message with `role: "tool"`, `tool_call_id` (same as in the assistant’s `tool_calls`), and `content` set to the result of running that function (string, e.g., JSON).

3. **Second request**: Send the updated `messages` (user + assistant + tool messages) with the same `tools`. Repeat until `finish_reason` is `"stop"` or you hit a max-turns limit.

## Python example: get\_weather

[Section titled “Python example: get\_weather”](#python-example-get_weather)

This example defines a `get_weather` tool, sends a user message, and runs the tool-call loop until the model returns a final answer.

```python
import os
import json
import requests


API_KEY = os.environ.get("_8080_API_KEY")
BASE_URL = "https://api.8080.io"


def get_weather(latitude: float, longitude: float) -> str:
    """Get current temperature for a location (mock implementation)."""
    # In production you might call a real weather API
    return json.dumps({"temperature_c": 18, "conditions": "Partly cloudy"})


def run_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "get_weather":
        return get_weather(args["latitude"], args["longitude"])
    return json.dumps({"error": f"Unknown tool: {name}"})


def chat_with_tools():
    messages = [
        {"role": "user", "content": "What's the weather in Paris right now?"}
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current temperature for a location by latitude and longitude.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "latitude": {"type": "number", "description": "Latitude"},
                        "longitude": {"type": "number", "description": "Longitude"}
                    },
                    "required": ["latitude", "longitude"]
                }
            }
        }
    ]


    while True:
        resp = requests.post(
            f"{BASE_URL}/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json"
            },
            json={"model": "8080/taalas/llama3.1-8b-instruct", "messages": messages, "tools": tools}
        )
        resp.raise_for_status()
        data = resp.json()
        choice = data["choices"][0]
        message = choice["message"]


        messages.append(message)


        if choice.get("finish_reason") == "stop":
            print(message.get("content", ""))
            return


        if choice.get("finish_reason") == "tool_calls" and message.get("tool_calls"):
            for tc in message["tool_calls"]:
                fn = tc["function"]
                result = run_tool(fn["name"], fn["arguments"])
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc["id"],
                    "content": result
                })
        else:
            print(message.get("content", ""))
            return


if __name__ == "__main__":
    chat_with_tools()
```

Run it (after setting `_8080_API_KEY`):

```bash
export _8080_API_KEY="your-api-key"
python chat_with_tools.py
```

## Using the eighty80 SDK

[Section titled “Using the eighty80 SDK”](#using-the-eighty80-sdk)

The `e80` Python SDK simplifies tool calling by decorating your functions and passing them as `tools`:

```python
import requests
from eighty80 import chat, tool, Message


@tool
def get_weather(latitude: float, longitude: float) -> str:
    """Get the current temperature for a location by latitude and longitude."""
    response = requests.get(
        f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m"
    )
    data = response.json()
    return str(data["current"]["temperature_2m"])


result = chat(
    model="8080/taalas/llama3.1-8b-instruct",
    messages=[Message("user", "What's the weather in San Francisco?")],
    tools=[get_weather]
)
```

The SDK handles the tool-call loop and argument parsing for you.