Run open-source model inference with an OpenAI-compatible client, and choose where each request runs: on H100s, B200s, or the local device.

Installing Muna

We provide SDKs for common development frameworks:
# Run this in Terminal
$ npm install muna
Most of our client SDKs are open-source. Star them on GitHub!

Run Your First Inference

Create an account on Muna to generate an access key. Then create a chat completion with @openai/gpt-oss-20b.
import { Muna } from "muna"

// πŸ’₯ Create an OpenAI-compatible client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// πŸ”₯ Create a chat completion with an Nvidia B200 GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_b200"
});

// πŸš€ Print the result
console.log(completion.choices[0]);
We provide an OpenAI-compatible client at muna.beta.openai with the same interface as the official OpenAI client, so you can migrate existing OpenAI code by changing a single line.
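
For example, if your code already constructs the official OpenAI client, only that construction changes; a minimal sketch (the commented "before" line shows the stock OpenAI client for comparison):
// Before: the official OpenAI client
// const openai = new OpenAI({ apiKey: "..." });

// After: Muna's OpenAI-compatible client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// Everything else, including chat.completions.create(...) calls and response handling, stays the same.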

Run Inference Locally

Muna also supports running inference locally. In this case, Muna downloads a self-contained binary that runs the model on-device:
// πŸ”₯ Create a chat completion with the local NPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "npu"
});
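
Because acceleration is a per-request option, the same client can mix remote and on-device execution. Below is a minimal sketch using the two acceleration values shown above; the helper function and its flag are illustrative, not part of the SDK:
// πŸ”₯ Route each request to a B200 in the cloud or the local NPU
async function ask (question: string, runLocally: boolean) {
  return openai.chat.completions.create({
    model: "@openai/gpt-oss-20b",
    messages: [{ role: "user", content: question }],
    acceleration: runLocally ? "npu" : "remote_b200"
  });
}

const local = await ask("What is the capital of France?", true);
console.log(local.choices[0]);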

Next Steps

This works because of Muna’s compiler platform, which transpiles a Python function into portable C++ that is compiled to run natively on server, desktop, mobile, and web.