Run open-source model inference with an OpenAI-compatible client, and choose where each request runs: on H100s, B200s, or on the local device.
```typescript
import { Muna } from "muna";

// Create an OpenAI client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// Create a chat completion
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }]
});

// Print the result
console.log(completion.choices[0]);
```
Our OpenAI-style client lives in `muna.beta.openai` and has the same interface as the official OpenAI client, which lets you migrate in two lines of code.
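As a minimal sketch of that migration (assuming you are coming from the official `openai` npm package), only the import and the client construction change; the rest of your code, including every `chat.completions.create` call, stays the same:

```typescript
// Before: the official OpenAI client
// import OpenAI from "openai";
// const openai = new OpenAI({ apiKey: "..." });

// After: Muna's OpenAI-compatible client
import { Muna } from "muna";
const openai = new Muna({ accessKey: "..." }).beta.openai;
```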
The first time you run the code above might take a few minutes, because we have to download the (rather large) model weights. Subsequent runs should take a few seconds.
Muna's central feature is the ability to choose where inference runs on each request. Let's run the same model on a datacenter GPU:
```typescript
// Create a chat completion with a datacenter GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_a100"
});
```
The first time you run the code above might take a few minutes, because we have to spin up a container on the cloud GPUs. Subsequent runs take only a second.
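Putting the two together, a single script can route the same request to different destinations. The sketch below assumes that omitting `acceleration` runs the model on the local device, as in the first example; identifiers for other datacenter GPUs (such as H100s or B200s) are not shown in this section and are assumed to follow the same pattern as `remote_a100`, so confirm them against the acceleration reference before relying on them:

```typescript
// On-device inference: no `acceleration` specified, as in the first example
const onDevice = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }]
});

// Datacenter GPU inference: the same request, routed with `acceleration`
const onCloud = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_a100"
});

console.log(onDevice.choices[0]);
console.log(onCloud.choices[0]);
```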
This works because of Muna's compiler platform, which transpiles a Python function into portable C++ that is compiled to run natively on server, desktop, mobile, and web.