OpenAI's client is widely used by developers who consume AI inference in their applications. As such, we provide a mock OpenAI client that lets developers access millions of open-source AI models by changing only two lines of code:
ai.js
import { Muna } from "muna"

// 💥 Create a Muna client
const muna = new Muna();

// 🔥 Retrieve the mock OpenAI client
const openai = muna.beta.openai;

// 🚀 Create a chat completion
const completion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",  // use an LLM predictor from Muna!
    acceleration: "remote_a100",    // use an Nvidia GPU in the ☁️
    messages: [{ role: "developer", content: "You are a helpful assistant." }]
});
Unlike OpenAI's own client, every operation in Muna's OpenAI client lets you run the model either locally or in the cloud, accelerated by powerful GPUs.
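For instance, the same request can run fully on-device by changing the `acceleration` value. The sketch below assumes that omitting `acceleration` runs the predictor locally, matching the examples that follow; check the predictor's documentation for the exact values it accepts:

// A minimal sketch: we assume local execution is the default
// when `acceleration` is not specified.
const localCompletion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "Hello!" }]
});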
Creating a predictor that is compatible with our mock OpenAI client is as easy as adding a few annotations to your Python function before compiling it. See the guide.
The mock OpenAI client is experimental, with many features subject to change.

Creating Chat Completions

Muna supports using predictors that run Llama.cpp models via the OpenAI client's openai.chat.completions.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a chat completion
const completion = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "What is life?" }]
});

// 🚀 Use the chat completion
console.log(completion);
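Since the client mirrors OpenAI's response types, the generated text can be read from the first choice, as in the sketch below:

// Read the generated text from the first choice
console.log(completion.choices[0].message.content);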
The mock OpenAI client also supports creating streaming completions:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a streaming chat completion
const stream = await openai.chat.completions.create({
    model: "@google/gemma-3-270m",
    messages: [{ role: "user", content: "What is life?" }],
    stream: true
});

// 🚀 Stream the generated text as it arrives
for await (const chunk of stream)
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
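Each chunk mirrors OpenAI's chat completion chunk shape, so the full completion can also be reconstructed by accumulating the deltas. A sketch; note that a stream can only be iterated once:

// Accumulate the streamed deltas into the full completion
let text = "";
for await (const chunk of stream)
    text += chunk.choices[0]?.delta?.content ?? "";
console.log(text);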

Creating Embeddings

Muna supports using text embedding predictors via the OpenAI client's openai.embeddings.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create a text embedding
const embedding = await openai.embeddings.create({
    model: "@nomic/nomic-embed-text-v1.5",
    input: "What is the capital of France?"
});

// 🚀 Use the embedding
console.log(embedding);
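The result follows OpenAI's embeddings response shape, so the vector itself lives in the `data` array. A minimal sketch, assuming the mock client matches OpenAI's types:

// Extract the embedding vector and inspect its dimensionality
const vector = embedding.data[0].embedding;
console.log(`Embedding has ${vector.length} dimensions`);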

Creating Speech

Muna supports using text-to-speech predictors via the OpenAI client's openai.audio.speech.create API:
import { Muna } from "muna"

// 💥 Create a mock OpenAI client
const openai = new Muna().beta.openai;

// 🔥 Create speech
const response = await openai.audio.speech.create({
    model: "@kitten-ml/kitten-tts",
    input: "The quick brown fox jumped over the lazy dog.",
    voice: "expr-voice-2-m",
    response_format: "pcm"
});

// 🚀 Use the speech response
console.log(response);
Currently, only pcm is supported for the speech generation response_format. The response headers contain information about the sample rate and channel count of the PCM data.
Currently, only audio is supported for the speech generation stream_format.
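Since only raw PCM is returned, the audio must be decoded manually. A minimal sketch, assuming the mock client returns a fetch-style Response (as OpenAI's client does) and using hypothetical header names; inspect the actual response headers for the keys that carry the sample rate and channel count:

// Read the raw PCM bytes from the response body
const pcmBytes = await response.arrayBuffer();
// Assume 16-bit signed samples, which is typical for PCM
const samples = new Int16Array(pcmBytes);

// "x-sample-rate" and "x-channel-count" are hypothetical header names
const sampleRate = Number(response.headers.get("x-sample-rate"));
const channels = Number(response.headers.get("x-channel-count"));
console.log(`${samples.length} samples at ${sampleRate} Hz, ${channels} channel(s)`);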