OpenAI's client is widely used by developers who consume AI inference in their applications. As such, we provide a mock OpenAI client that lets developers access millions of open-source AI models by changing only two lines of code:
ai.js
import { Muna } from "muna"

// 💥 Create a Muna client
const muna = new Muna();

// 🔥 Retrieve the mock OpenAI client
const openai = muna.beta.openai;

// 🚀 Create a chat completion
const completion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",  // use an LLM predictor from Muna!
    acceleration: "remote_a100",    // use an Nvidia GPU in the ☁️
    messages: [{ role: "developer", content: "You are a helpful assistant." }]
});
Unlike OpenAI's own client, every operation in Muna's OpenAI client lets you run the model either locally or in the cloud, accelerated by powerful GPUs.
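For instance, the same request can run fully on-device by changing the `acceleration` value. The sketch below assumes that omitting `acceleration` runs the predictor locally, matching the examples that follow; check the predictor's documentation for the exact values it accepts:

// A minimal sketch: we assume local execution is the default
// when `acceleration` is not specified.
const localCompletion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "Hello!" }]
});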
Creating a predictor that is compatible with our mock OpenAI client is as easy as adding a few annotations to your Python function before compiling it. See the guide.
The mock OpenAI client is experimental, with many features subject to change.

Creating Chat Completions

Muna supports using predictors that run Llama.cpp models via the OpenAI client's openai.chat.completions.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a chat completion
const completion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "What is life?" }]
});

// 🚀 Use the chat completion
console.log(completion);
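Since the client mirrors OpenAI's response types, the generated text can be read from the first choice, as in the sketch below:

// Read the generated text from the first choice
console.log(completion.choices[0].message.content);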
The mock OpenAI client also supports creating streaming completions:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a streaming chat completion
const stream = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "What is life?" }],
    stream: true
});

// 🚀 Stream the generated text as it arrives
for await (const chunk of stream)
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
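Each chunk mirrors OpenAI's chat completion chunk shape, so the full completion can also be reconstructed by accumulating the deltas. A sketch; note that a stream can only be iterated once:

// Accumulate the streamed deltas into the full completion
let text = "";
for await (const chunk of stream)
    text += chunk.choices[0]?.delta?.content ?? "";
console.log(text);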

Creating Embeddings

Muna supports using text embedding predictors via the OpenAI client's openai.embeddings.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a text embedding
const embedding = await openai.embeddings.create({
    model: "@nomic/nomic-embed-text-v1.5",
    input: "What is the capital of France?"
});

// 🚀 Use the embedding
console.log(embedding);
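The result follows OpenAI's embeddings response shape, so the vector itself lives in the `data` array. A minimal sketch, assuming the mock client matches OpenAI's types:

// Extract the embedding vector and inspect its dimensionality
const vector = embedding.data[0].embedding;
console.log(`Embedding has ${vector.length} dimensions`);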

Creating Speech

Muna supports using text-to-speech predictors via the OpenAI client's openai.audio.speech.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create speech
const response = await openai.audio.speech.create({
    model: "@kitten-ml/kitten-tts",
    input: "The quick brown fox jumped over the lazy dog.",
    voice: "expr-voice-2-m",
    response_format: "pcm"
});

// 🚀 Use the speech response
console.log(response);
Currently, only pcm is supported for the speech generation response_format. The response headers contain information about the sample rate and channel count of the PCM data.
Currently, only audio is supported for the speech generation stream_format.
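Since only raw PCM is returned, the audio must be decoded manually. A minimal sketch, assuming the mock client returns a fetch-style Response (as OpenAI's client does) and using hypothetical header names; inspect the actual response headers for the keys that carry the sample rate and channel count:

// Read the raw PCM bytes from the response body
const pcmBytes = await response.arrayBuffer();
// Assume 16-bit signed samples, which is typical for PCM
const samples = new Int16Array(pcmBytes);

// "x-sample-rate" and "x-channel-count" are hypothetical header names
const sampleRate = Number(response.headers.get("x-sample-rate"));
const channels = Number(response.headers.get("x-channel-count"));
console.log(`${samples.length} samples at ${sampleRate} Hz, ${channels} channel(s)`);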