
Edge Deployment

For the fastest experience, 8080 provides a CLI and SDK for easily deploying applications to our mix-compute cloud of inference and general compute infrastructure. This is an ideal solution for applications involving chains of prompts, multi-step agentic workflows, retrieval-based use cases, and much more. Co-locating your application’s business logic alongside the 8080 Inference API enables the lowest possible end-to-end latencies for your LLM needs.

This guide will walk you through the process of getting started with deploying your first project, an example of a simple chat wrapper application.

First, you’ll need the 8080 CLI installed and authenticated. To successfully deploy your code, you’ll also need Docker installed on your system.

The next step is to initialize a project:

# first create a new directory and cd into it
mkdir chat && cd chat
# run the 8080 init to create project files
8080 init

You should now see some files created by the script to help you get started, but let’s direct our attention to the main.py file.

from e80_sdk import Eighty80, eighty80_app
from fastapi import Request

app = eighty80_app()

# Get an OpenAI SDK-compatible object to talk to the 8080 API
api = Eighty80().completion_sdk()

@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    return api.chat.completions.create(**body)

In this example, you’ll see a very simple web application exposing a single endpoint, /v1/chat/completions, essentially serving as a thin wrapper around the OpenAI-compatible API where you can layer in your own custom code. For example, try prepending a system prompt to every request:

@app.post("/v1/chat/completions")
async def completions(request: Request):
    body = await request.json()
    body['messages'].insert(0, {
        "role": "developer",
        "content": "Only respond in pig latin no matter what the user prompts."
    })
    return api.chat.completions.create(**body)
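The prompt-injection step above is plain dictionary manipulation, so it can be factored out and unit-tested independently of FastAPI and the 8080 SDK. A minimal sketch (the `with_system_prompt` helper is our own name for illustration, not part of the SDK):

```python
import copy

PIG_LATIN_PROMPT = "Only respond in pig latin no matter what the user prompts."

def with_system_prompt(body: dict, prompt: str = PIG_LATIN_PROMPT) -> dict:
    """Return a copy of an OpenAI-style request body with a developer
    message prepended to the messages list. The input body is not mutated."""
    patched = copy.deepcopy(body)
    patched["messages"].insert(0, {"role": "developer", "content": prompt})
    return patched

# Example: the original body is untouched; the copy gains the prompt.
body = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world"}],
}
patched = with_system_prompt(body)
```

Keeping the transformation pure like this makes it easy to test the request-shaping logic without spinning up the dev server.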

To run this app locally, use the 8080 dev CLI command:

$❯ 8080 dev
Fetching secrets for local development for: chat
INFO: Will watch for changes in these directories: ['~/projects/chat']
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO: Started reloader process [4851] using StatReload
INFO: Started server process [4854]
INFO: Waiting for application startup.
INFO: Application startup complete.

If you already have a process running on port 8080, you can specify a different one with the --port argument.

Try cURLing your application locally:

curl http://localhost:8080/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world"
      }
    ]
  }'
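If you prefer Python to cURL, the same request can be issued with the standard library alone; everything below simply mirrors the cURL call above:

```python
import json
import urllib.request

# Same payload as the cURL example.
payload = {
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Send it (requires `8080 dev` to be running locally):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```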

Once you’re ready, run the following to start a deploy to 8080:

8080 deploy

Watch the progress of your deployment by following the URL returned. Once the deployment completes successfully, your application will be accessible at {your_project_slug}.hosted.8080.io.

This application requires authentication using an API key. Connect to this app by including one of your project’s API keys:

curl https://{{ your_project_slug }}.hosted.8080.io/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $_8080_API_KEY" \
  -d '{
    "model": "8080/taalas/llama3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world"
      }
    ]
  }'

Alternatively, send a request with the CLI:

8080 apikey set $_8080_API_KEY
8080 call v1/chat/completions '{
  "model": "8080/taalas/llama3.1-8b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world"
    }
  ]
}'