Migrating from OpenAI to Muna
Run open-source model inference with an OpenAI-compatible client, and choose where each request runs: on H100s, B200s, or the local device.

Installing Muna

We provide SDKs for common development frameworks:
# Run this in Terminal
$ npm install muna
Most of our client SDKs are open-source. Star them on GitHub!

Run Your First Inference

Generate an access key, then create a chat completion locally with @openai/gpt-oss-20b.
import { Muna } from "muna"

// πŸ’₯ Create an OpenAI client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// πŸ”₯ Create a chat completion
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }]
});

// πŸš€ Print the result
console.log(completion.choices[0]);
Our OpenAI-style client at muna.beta.openai has the same interface as the official OpenAI client, so you can migrate by changing just two lines of code.
The first time you run the code above, it might take a few minutes while we download the (rather large) model weights. Subsequent runs should take only a few seconds.
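Since the response mirrors the official OpenAI completion shape, you can pull out the assistant's reply the same way you would with the OpenAI SDK. A minimal sketch, where the ChatCompletion type below is a simplified stand-in for the real response type (not Muna's actual type definition):

```typescript
// Simplified stand-in for the OpenAI-style chat completion response
type ChatCompletion = {
  choices: { message: { role: string; content: string } }[];
};

// Extract the first assistant message, defaulting to an empty string
function firstMessage(completion: ChatCompletion): string {
  return completion.choices[0]?.message?.content ?? "";
}

// Example with a hand-written mock response (not real Muna output)
const mock: ChatCompletion = {
  choices: [{ message: { role: "assistant", content: "Paris." } }],
};
console.log(firstMessage(mock)); // "Paris."
```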

Run on a Datacenter GPU

Muna’s central feature is the ability to choose, per request, where inference runs. Let’s run the same model on a datacenter GPU:
// πŸ”₯ Create a chat completion with a datacenter GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_a100"
});
The first time you run the code above, it might take a few minutes while we spin up a container on a cloud GPU. Subsequent runs take only a second.
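Because acceleration is set per request, you can decide at call time whether a prompt is worth a datacenter GPU. A small sketch of that decision, assuming the only values are the "remote_a100" shown above and omitting the field for the local default; the length threshold is purely illustrative, not a Muna recommendation:

```typescript
// Pick an acceleration per request: long prompts go to a datacenter GPU,
// short ones stay on the local device (field omitted = local default).
// The 2000-character cutoff is an arbitrary example heuristic.
function pickAcceleration(promptLength: number): "remote_a100" | undefined {
  return promptLength > 2000 ? "remote_a100" : undefined;
}

console.log(pickAcceleration(5000)); // "remote_a100"
console.log(pickAcceleration(100)); // undefined
```

You would then spread the result into the request, e.g. `{ ..., acceleration: pickAcceleration(prompt.length) }`, since `undefined` fields are simply omitted.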

Next Steps

This works because of Muna’s compiler platform, which transpiles a Python function into portable C++ that is compiled to run natively on server, desktop, mobile, and web.