OpenAI’s client is widely used by developers who consume AI inference in their applications. Muna’s OpenAI client lets developers access millions of open-source AI models with a single line of code.
Unlike the official OpenAI client, Muna lets you specify where inference runs per request: on H100s, B200s, or on the local device.
You can easily compile your own custom models to be compatible with Muna’s OpenAI client. See the guide.

Creating Chat Completions

Muna supports using predictors that run Llama.cpp models via the OpenAI client’s openai.chat.completions.create API:
import { Muna } from "muna"

// 💥 Create an OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a chat completion with an Nvidia A10 GPU
const completion = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  messages: [{ role: "user", content: "What is life?" }],
  acceleration: "remote_a10"
});

// 🚀 Print the result
console.log(completion.choices[0]);
The mock OpenAI client also supports creating streaming completions:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Stream a chat completion
const stream = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  messages: [{ role: "user", content: "What is life?" }],
  stream: true
});

// 🚀 Use completion chunks
for await (const chunk of stream)
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");

Creating Embeddings

Muna supports using text embedding predictors via the OpenAI client’s openai.embeddings.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a text embedding
const embedding = await openai.embeddings.create({
    model: "@nomic/nomic-embed-text-v1.5",
    input: "What is the capital of France?"
});

// 🚀 Use the embedding
console.log(embedding.data[0].embedding);
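
As a quick usage sketch, the returned vectors can be compared with cosine similarity. This assumes the response mirrors the official OpenAI embeddings shape (a data array whose entries carry an embedding vector), and it uses a small hand-written cosineSimilarity helper:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Embed two pieces of text (one request per input, as in the example above)
const a = await openai.embeddings.create({
    model: "@nomic/nomic-embed-text-v1.5",
    input: "What is the capital of France?"
});
const b = await openai.embeddings.create({
    model: "@nomic/nomic-embed-text-v1.5",
    input: "Paris is the capital of France."
});

// 🚀 Compare the two embedding vectors with cosine similarity
function cosineSimilarity (x: number[], y: number[]): number {
    let dot = 0, nx = 0, ny = 0;
    for (let i = 0; i < x.length; i++) {
        dot += x[i] * y[i];
        nx += x[i] * x[i];
        ny += y[i] * y[i];
    }
    return dot / (Math.sqrt(nx) * Math.sqrt(ny));
}
console.log(cosineSimilarity(a.data[0].embedding, b.data[0].embedding));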

Creating Speech

Muna supports using text-to-speech predictors via the OpenAI client’s openai.audio.speech.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create speech
const response = await openai.audio.speech.create({
    model: "@kitten-ml/kitten-tts",
    input: "The quick brown fox jumped over the lazy dog.",
    voice: "expr-voice-2-m",
    response_format: "pcm"
});

// 🚀 Use the speech
console.log(response);
Currently, only pcm is supported for the speech generation response_format. The response headers contain information about the sample rate and channel count of the PCM data.
Currently, only audio is supported for the speech generation stream_format.
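
As a rough sketch of consuming the PCM output in Node.js, assuming the mock client returns a fetch-style Response with headers and an arrayBuffer() method (as the official OpenAI client does for speech), you can log the headers to find the sample rate and channel count, then write the raw bytes to disk:
import { writeFile } from "node:fs/promises"
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create speech as raw PCM
const response = await openai.audio.speech.create({
    model: "@kitten-ml/kitten-tts",
    input: "The quick brown fox jumped over the lazy dog.",
    voice: "expr-voice-2-m",
    response_format: "pcm"
});

// 🚀 Log the response headers to find the PCM sample rate and channel count
// (the exact header names are not listed here, so inspect the output)
response.headers.forEach((value, key) => console.log(`${key}: ${value}`));

// 🚀 Read the raw PCM bytes and write them to disk
const pcm = Buffer.from(await response.arrayBuffer());
await writeFile("speech.pcm", pcm);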