Run open-source model inference with an OpenAI-compatible client, and specify
where the inference runs per-request: H100s, B200s, or on the local device.
Create an account on Muna to generate an access key.
Then create a chat completion with @openai/gpt-oss-20b.
import { Muna } from "muna"

// 🔥 Create an OpenAI client
const openai = new Muna({ accessKey: "..." }).beta.openai;

// 🔥 Create a chat completion with an Nvidia B200 GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "remote_b200"
});

// 🔍 Print the result
console.log(completion.choices[0]);
Our OpenAI-style client is exposed at muna.beta.openai and has the same interface
as the official OpenAI client. This allows you to migrate in two lines of code.
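For example, if your app already uses the official openai package, the migration might look like the sketch below (the access key is a placeholder):

// Before (official OpenAI client):
//   import OpenAI from "openai";
//   const openai = new OpenAI({ apiKey: "sk-..." });

// After (Muna's OpenAI-compatible client):
import { Muna } from "muna";
const openai = new Muna({ accessKey: "..." }).beta.openai;

// Everything else, e.g. openai.chat.completions.create(...), stays the same.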
The first time you run the code above might take a few minutes, because we have to find a cloud GPU and download the (rather large) model weights. Subsequent runs take only a second.
Muna also supports running inference locally. In this case, Muna downloads a
self-contained executable that runs the model on-device:
// 🔥 Create a chat completion with the local GPU
const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration: "local_gpu"
});
The first time you run the code above might take a few minutes, because we have to download the (rather large) model weights. Subsequent runs should take a few seconds.
Local inference works because of Muna's compiler platform, which transpiles a Python function into portable C++ that is compiled
to run natively on server, desktop, mobile, and web.
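Because acceleration is specified per request, you can decide at call time where each completion runs. Below is a minimal sketch, assuming the acceleration values shown above ("local_gpu" and "remote_b200") and a hypothetical USE_LOCAL_GPU environment variable as the switch:

import { Muna } from "muna";

const openai = new Muna({ accessKey: "..." }).beta.openai;

// Hypothetical switch: use the local GPU during development,
// otherwise run on a remote B200.
const acceleration = process.env.USE_LOCAL_GPU === "1" ? "local_gpu" : "remote_b200";

const completion = await openai.chat.completions.create({
  model: "@openai/gpt-oss-20b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  acceleration
});

console.log(completion.choices[0]);

The request shape is identical in both cases; only the acceleration string changes.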