Edge Deployment
For the fastest experience, 8080 provides a CLI and SDK for easily deploying applications to our
mix-compute cloud of inference and general compute infrastructure. This is an ideal solution for any
application requiring chained prompts, multi-step agentic workflows, retrieval-based use cases, and
much more. Co-locating your application’s business logic alongside the 8080 Inference API enables the
lowest possible end-to-end latencies for your LLM needs.
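As a rough sketch of the “chain of prompts” pattern mentioned above (the helper below is illustrative only, not part of the 8080 SDK; `complete` stands in for any function that sends a chat-completions request and returns the assistant’s reply as a string):

```python
def run_chain(complete, user_text):
    """Run a two-step prompt chain: summarize the text, then translate the summary.

    `complete(messages)` is any callable that sends a chat-completions
    request and returns the assistant's reply as a string.
    """
    # Step 1: summarize the user's text.
    summary = complete([
        {"role": "developer", "content": "Summarize the user's text in one sentence."},
        {"role": "user", "content": user_text},
    ])
    # Step 2: feed the first step's output into the next request.
    translation = complete([
        {"role": "developer", "content": "Translate the user's text into French."},
        {"role": "user", "content": summary},
    ])
    return translation
```

In a deployed 8080 app, `complete` would wrap a call to the Inference API, so both steps run co-located with inference.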
This guide will walk you through the process of getting started with deploying your first project, an example of a simple chat wrapper application.
Prerequisites
First you’ll need to have the 8080 CLI installed and authenticated. To deploy your code successfully, you’ll also need to have Docker installed on your system.
Create a project
The next step is to initialize a project:
```sh
# first create a new directory and cd into it
mkdir chat && cd chat

# run 8080 init to create project files
8080 init
```

Local development
You should now see some files created by the script to help you get started, but let’s direct our
attention to the main.py file.
```python
from e80_sdk import Eighty80, eighty80_app
from fastapi import Request

app = eighty80_app()

# Get an OpenAI SDK-compatible object to talk to the 8080 API
api = Eighty80().completion_sdk()

@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    return api.chat.completions.create(**body)
```

In this example, you’ll see a very simple web application exposing a single endpoint, /v1/chat/completions,
which essentially wraps the OpenAI-compatible 8080 API with your own custom code. For example, try
prepending a system prompt to every request:
```python
@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    body['messages'].insert(0, {
        "role": "developer",
        "content": "Only respond in pig latin no matter what the user prompts."
    })
    return api.chat.completions.create(**body)
```
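To see what this mutation does to an incoming request body, you can exercise it on its own in plain Python (no server required):

```python
# A request body like the one the handler receives.
body = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world"}],
}

# The same mutation the handler performs: prepend the developer message.
body["messages"].insert(0, {
    "role": "developer",
    "content": "Only respond in pig latin no matter what the user prompts.",
})

print([m["role"] for m in body["messages"]])  # → ['developer', 'user']
```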
Running locally
To run the app locally, use the 8080 dev CLI command:
```
❯ 8080 dev
Fetching secrets for local development for: chat
INFO:     Will watch for changes in these directories: ['~/projects/chat']
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     Started reloader process [4851] using StatReload
INFO:     Started server process [4854]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
```

If you already have a process running on port 8080, you can specify a different one
with the --port argument.
Try cURLing your application locally:
```sh
curl http://localhost:8080/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      { "role": "user", "content": "Hello, world" }
    ]
  }'
```

Deploy to 8080
Once you’re ready, run the following to start a deploy to 8080:
```sh
8080 deploy
```

Watch the progress of your deployment by following the URL returned. Once the deployment
completes successfully, your application will be accessible at {your_project_slug}.hosted.8080.io.
Making requests
This application requires authentication using an API key. Connect to this app by including one of your project’s API keys:
```sh
curl https://{{ your_project_slug }}.hosted.8080.io/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      { "role": "user", "content": "Hello, world" }
    ]
  }'
```

Alternatively, send a request with the CLI:
```sh
8080 apikey set $_8080_API_KEY
8080 call v1/chat/completions '{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    { "role": "user", "content": "Hello, world" }
  ]
}'
```
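You can also make the same request from Python using only the standard library (a sketch; the hostname below is a placeholder for your real project slug, and the API key is assumed to be in the `_8080_API_KEY` environment variable):

```python
import json
import os
import urllib.request

# Placeholder host -- substitute your project's actual slug.
url = "https://your-project-slug.hosted.8080.io/v1/chat/completions"

payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world"}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + os.environ.get("_8080_API_KEY", ""),
    },
    method="POST",
)

# Uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```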