OpenAI's client is widely used by developers who consume AI inference in their applications. As such, we provide a mock OpenAI client for developers to access any of the millions of open-source AI models by changing only two lines of code:
import { Muna } from "muna"

// Create a Muna client
const muna = new Muna();

// Retrieve the mock OpenAI client
const openai = muna.beta.openai;

// Create a chat completion
const completion = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",  // use an LLM predictor from Muna!
  acceleration: "remote_a100",    // use an Nvidia GPU in the cloud
  messages: [{ role: "developer", content: "You are a helpful assistant." }]
});
Unlike OpenAI's own client, every operation in Muna's OpenAI client allows for running the model either locally or in the cloud, accelerated by powerful GPUs.
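For example, the same chat completion request can target cloud or local execution just by changing the acceleration option. The following is a minimal sketch: the "remote_a100" value is taken from the snippet above, while the assumption that omitting acceleration falls back to local execution should be verified against the Muna docs:

// Run on a cloud Nvidia A100 GPU
const remoteCompletion = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  acceleration: "remote_a100",
  messages: [{ role: "user", content: "Hello!" }]
});

// Omit `acceleration` to run locally (assumed default; check the Muna docs)
const localCompletion = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  messages: [{ role: "user", content: "Hello!" }]
});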
Creating a predictor that is compatible with our mock OpenAI client is as easy as adding a few annotations
to your Python function before compiling it. See the guide.
The mock OpenAI client is experimental, with many features subject to change.
Creating Chat Completions
Muna supports using predictors that run Llama.cpp models via the OpenAI client's openai.chat.completions.create API:
import { Muna } from "muna"

// Create a mock OpenAI client
const openai = new Muna().beta.openai;

// Create a chat completion
const completion = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  messages: [{ role: "user", content: "What is life?" }]
});

// Use the chat completion
console.log(completion);
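Because the mock client mirrors the OpenAI client, the generated text should be available on the first choice's message, assuming the standard OpenAI chat completion response shape:

// Extract the generated text from the first choice
const reply = completion.choices[0].message.content;
console.log(reply);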
The mock OpenAI client also supports creating streaming completions:
import { Muna } from "muna"

// Create a mock OpenAI client
const openai = new Muna().beta.openai;

// Create a streaming chat completion
const stream = await openai.chat.completions.create({
  model: "@google/gemma-3-270m",
  messages: [{ role: "user", content: "What is life?" }],
  stream: true
});

// Use completion chunks as they arrive
for await (const chunk of stream) {
  // Each chunk follows the OpenAI streaming shape, with new tokens in the delta
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Creating Embeddings
Muna supports using text embedding predictors via the OpenAI client's openai.embeddings.create API:
import { Muna } from "muna"

// Create a mock OpenAI client
const openai = new Muna().beta.openai;

// Create a text embedding
const embedding = await openai.embeddings.create({
  model: "@nomic/nomic-embed-text-v1.5",
  input: "What is the capital of France?"
});

// Use the embedding
console.log(embedding);
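Embeddings are most useful for comparing texts. The sketch below computes the cosine similarity between two inputs; it assumes the standard OpenAI embeddings response shape, where each vector lives at data[i].embedding, and that the mock client accepts an array input like the OpenAI API does:

// Embed two texts in one request
const response = await openai.embeddings.create({
  model: "@nomic/nomic-embed-text-v1.5",
  input: ["What is the capital of France?", "Paris is the capital of France."]
});
const [a, b] = response.data.map(d => d.embedding);

// Cosine similarity: dot(a, b) / (|a| * |b|)
const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
console.log(dot / (norm(a) * norm(b)));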
Creating Speech
Muna supports using text-to-speech predictors via the OpenAI client's openai.audio.speech.create API:
import { Muna } from "muna"

// Create a mock OpenAI client
const openai = new Muna().beta.openai;

// Create speech
const response = await openai.audio.speech.create({
  model: "@kitten-ml/kitten-tts",
  input: "The quick brown fox jumped over the lazy dog.",
  voice: "expr-voice-2-m",
  response_format: "pcm"
});

// Use the response
console.log(response);
Currently, only pcm is supported for the speech generation response_format. The response headers contain information about the sample rate and channel count of the PCM data.
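To consume the audio, the raw PCM bytes can be read from the response body and the audio parameters picked up from the headers. A minimal sketch, assuming the response behaves like a fetch Response as in the OpenAI client; the header names here are hypothetical, so inspect the actual response headers:

// Read the raw PCM bytes from the response body
const pcm = Buffer.from(await response.arrayBuffer());

// Read the sample rate and channel count headers (names are hypothetical)
const sampleRate = response.headers.get("x-sample-rate");
const channels = response.headers.get("x-channel-count");
console.log(`Received ${pcm.length} bytes of PCM audio at ${sampleRate} Hz, ${channels} channel(s)`);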
Currently, only audio is supported for the speech generation stream_format.