Run open-source model inference with an OpenAI-compatible client, and choose
where each request runs: on remote H100 or B200 GPUs, or on the local device.
Create an account on Muna to generate an access key.
Then create a chat completion with @openai/gpt-oss-20b.
import { Muna } from "muna";

// Create an OpenAI client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// Create a chat completion with an Nvidia B200 GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_b200"
});

// Print the result
console.log(completion.choices[0]);
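Because the client is OpenAI-compatible, the response follows the familiar chat completion shape, so the assistant's reply text can be read from the first choice. A minimal sketch, assuming the standard message.content field:

// Read the assistant's reply text from the first choice
const reply = completion.choices[0].message.content;
console.log(reply); // e.g. "The capital of France is Paris."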
We provide an OpenAI-style client in muna.beta.openai that has the same interface
as the official OpenAI client, so you can migrate by changing a single line of code.
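Concretely, the migration is the client construction. A minimal sketch of the swap, assuming the rest of your chat completion calls are left unchanged:

// Before: the official OpenAI client
// import OpenAI from "openai";
// const openai = new OpenAI({ apiKey: "..." });

// After: Muna's OpenAI-compatible client
import { Muna } from "muna";
const openai = new Muna({ accessKey: "..." }).beta.openai;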
Muna also supports running inference locally. In this case, Muna will download a
self-contained executable binary that executes the model:
// Create a chat completion with the local NPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "npu"
});
This works because of Muna's compiler platform, which transpiles a Python function into portable C++ that is compiled
to run natively on server, desktop, mobile, and web.
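Putting the two together: since acceleration is chosen per request, one client can route some calls to a remote GPU and others to the on-device NPU. A rough sketch, using the two acceleration values shown above and a hypothetical ask helper (an H100 target would presumably use a similarly named value, not shown here):

// Hypothetical helper: send the same prompt to different hardware per request
async function ask(question: string, acceleration: "remote_b200" | "npu") {
  const completion = await openai.chat.completions.create({
    model: "@openai/gpt-oss-20b",
    messages: [{ role: "user", content: question }],
    acceleration
  });
  return completion.choices[0].message.content;
}

// Same client, different target per call
console.log(await ask("What is the capital of France?", "remote_b200")); // remote Nvidia B200 GPU
console.log(await ask("What is the capital of France?", "npu"));         // local NPU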