
Inference

Overview of the 8080 Inference API—authentication, OpenAI compatibility, chat completions, responses, and batch jobs.

The Inference API is the hosted LLM surface at https://api.8080.io. It follows OpenAI-compatible paths and payloads where possible so you can use familiar clients, SDKs, and patterns. This page summarizes how to authenticate and which capabilities to use; each topic links to a dedicated guide.

Every request must include your API key as a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Create and manage keys in the 8080 dashboard. Local examples throughout these docs consistently use the environment variable _8080_API_KEY:

export _8080_API_KEY="your-api-key-here"
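As a sketch of the auth pattern above, the following builds an authenticated request object with only the standard library; the /v1/models path is used purely as an illustration, and the placeholder key is an assumption for when the variable is unset:

```python
import os
import urllib.request

def authed_request(path: str) -> urllib.request.Request:
    """Build a request to the 8080 API with the Bearer token attached."""
    # Falls back to a placeholder so the sketch runs without the env var set.
    api_key = os.environ.get("_8080_API_KEY", "your-api-key-here")
    return urllib.request.Request(
        "https://api.8080.io" + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = authed_request("/v1/models")
```

Sending the request is then a matter of `urllib.request.urlopen(req)` or handing the same header to any HTTP client you prefer.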

All inference routes are served over HTTPS from https://api.8080.io. The host itself carries no path prefix; versioned routes begin at /v1/....

8080 implements a subset of the OpenAI HTTP API: the same general request shapes for chat completions, responses, models, files, and batches, though some OpenAI-specific features may be unavailable. You can point official OpenAI clients at 8080 by setting the base URL and API key; see Compatibility for environment variables, the Python client, and the list of supported endpoints.

Quick configuration for tools that expect OpenAI’s variables:

export OPENAI_API_KEY="your_8080_api_key_here"
export OPENAI_BASE_URL="https://api.8080.io/v1"
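Tools that honor these variables resolve every endpoint against the configured base URL; the joining logic below is an illustrative sketch of that convention, not the code any particular client uses:

```python
import os

def resolve_endpoint(path: str) -> str:
    """Join an API path onto the configured base URL, OpenAI-client style."""
    base = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
    return f"{base}/{path.lstrip('/')}"

os.environ["OPENAI_BASE_URL"] = "https://api.8080.io/v1"
print(resolve_endpoint("chat/completions"))
# → https://api.8080.io/v1/chat/completions
```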

Chat completions use the familiar messages array (system / developer / user / assistant / tool) and return chat completion objects, including optional tool calling and streaming.

  • Endpoint: POST /v1/chat/completions
  • Guide: Text Generation — step-by-step: building requests, reading responses, multi-turn, and common options.
  • Reference: Completions — full parameter table, message types, and tool calling link.

Use this API when you want parity with OpenAI’s Chat Completions product and existing integrations that call /v1/chat/completions.
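A minimal sketch of a chat completions request body, following the message-array shape described above; the model id is a placeholder, not a real 8080 model:

```python
import json

# Minimal request body for POST /v1/chat/completions.
payload = {
    "model": "example-model",  # placeholder: substitute an id from GET /v1/models
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize HTTP status 429 in one sentence."},
    ],
    "stream": False,  # set True to receive server-sent event chunks instead
}

body = json.dumps(payload)
# In the OpenAI-compatible response object, the assistant turn is at
# choices[0].message.content.
```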

The Responses API is an alternative interface built around input (string or structured items), optional instructions, tools, previous_response_id for multi-turn flows, and a unified response object (including output items such as message / function_call).

  • Endpoints: POST /v1/responses, GET /v1/responses/{response_id}
  • Guide: Responses — request fields, annotated response JSON, streaming, and tool calling notes.

Use this API when you prefer OpenAI’s Responses model or need features that map naturally to the response/output item shape.
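A sketch of two Responses API request bodies illustrating the fields above; the model id and the resp_123 response id are illustrative placeholders, not real values:

```python
import json

# First turn: a string input plus standalone instructions.
first = {
    "model": "example-model",  # placeholder id
    "input": "What is a JSONL file?",
    "instructions": "Answer in two sentences.",
}

# Follow-up turn: chain onto the prior turn via previous_response_id,
# using the id returned by the first response.
followup = {
    "model": "example-model",
    "input": "Give an example line.",
    "previous_response_id": "resp_123",  # illustrative id
}

serialized = json.dumps([first, followup])
```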

Batch jobs let you submit many requests asynchronously (e.g., thousands of chat completion calls) via a JSONL input file uploaded through the Files API, then poll and download results—aligned with OpenAI’s batch workflow.

  • Flow: upload file (POST /v1/files) → create batch (POST /v1/batches) → poll status → download output/error files.
  • Guide: Batch — JSONL format, examples, listing and cancelling batches.
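The flow above starts from a JSONL input file in which every line is one self-contained request. A sketch of writing such a file, where custom_id, method, url, and body follow the OpenAI batch line shape and the model id is a placeholder:

```python
import json

prompts = ["Define latency.", "Define throughput."]

# One JSON object per line; custom_id ties each output line back to its input.
lines = [
    json.dumps({
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "example-model",  # placeholder id
            "messages": [{"role": "user", "content": p}],
        },
    })
    for i, p in enumerate(prompts)
]

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

This file is what you upload via POST /v1/files before creating the batch; the output file uses the same custom_id values to match results to requests.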

Use batches for offline evaluation, backfills, or any workload that does not need an immediate per-request response.