# OpenAI Compatibility

> Compiling OpenAI-compatible models.

The OpenAI client is widely used by developers who consume AI inference in their applications. This guide explains how to compile models that can be used via Muna's OpenAI-compatible client by leveraging [parameter annotations](/predictors/requirements#using-parameter-annotations).

<Tip>
  Muna's OpenAI-compatible client allows developers to use millions of open-source AI
  models without changing their existing code.
</Tip>

## Compiling Chat Completion Models

You can compile chat completion models compatible with Muna's
`openai.chat.completions.create` interface.

<Steps>
  <Step title="Accepting Chat Messages">
    Chat completion functions should accept a list of input messages with type `list[muna.beta.openai.Message]`:

    ```py llm.py icon="python" focus={2,7-10} theme={null}
    from muna import compile, Parameter
    from muna.beta.openai import Message
    from typing import Annotated

    @compile(...)
    def create_chat_completion(
        messages: Annotated[
            list[Message],
            Parameter.Generic(description="Messages comprising the conversation so far.")
        ]
    ):
        ...
    ```
  </Step>

  <Step title="Returning Chat Completion Chunks">
    Chat completion functions must return an iterator of completion chunks, with type
    `Iterator[muna.beta.openai.ChatCompletionChunk]`:

    ```py llm.py icon="python" focus={2,11} theme={null}
    from muna import compile, Parameter
    from muna.beta.openai import ChatCompletionChunk, Message
    from typing import Annotated, Iterator

    @compile(...)
    def create_chat_completion(
        messages: Annotated[
            list[Message],
            Parameter.Generic(description="Messages comprising the conversation so far.")
        ]
    ) -> Iterator[ChatCompletionChunk]:
        ...
    ```
  </Step>

  <Step title="Creating Chat Completions">
    We recommend using the [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) package to
    create chat completions using [`Llama.cpp`](https://github.com/ggml-org/llama.cpp):

    ```py llm.py icon="python" focus={3,6,15-21} theme={null}
    from muna import compile, Parameter
    from muna.beta.openai import ChatCompletionChunk, Message
    from llama_cpp import Llama
    from typing import Annotated, Iterator

    model = Llama(model_path=model_path)  # `model_path`: path to a GGUF model file

    @compile(...)
    def create_chat_completion(
        messages: Annotated[
            list[Message],
            Parameter.Generic(description="Messages comprising the conversation so far.")
        ]
    ) -> Iterator[ChatCompletionChunk]:
        stream = model.create_chat_completion(
            messages=messages,
            max_tokens=1_000,
            stream=True
        )
        for chunk in stream:
            yield chunk
    ```
  </Step>
</Steps>
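
The streaming contract above can be easier to picture from the consumer's side. The following is a minimal sketch, not Muna's implementation: it uses plain dictionaries shaped like OpenAI's `chat.completion.chunk` objects as stand-ins for `ChatCompletionChunk`, and `fake_stream` and `collect_text` are hypothetical helpers introduced only for illustration. Whether Muna's type exposes these fields as keys or attributes is an assumption to verify against the library.

```python
from typing import Iterator

def fake_stream(tokens: list[str]) -> Iterator[dict]:
    """Yield dictionaries shaped like OpenAI `chat.completion.chunk` objects."""
    for index, token in enumerate(tokens):
        # Assumption: attach the assistant role to the first delta,
        # as OpenAI-style streams commonly do.
        delta = {"content": token}
        if index == 0:
            delta["role"] = "assistant"
        yield {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta, "finish_reason": None}],
        }
    # The final chunk carries an empty delta and a finish reason.
    yield {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }

def collect_text(stream: Iterator[dict]) -> str:
    """Concatenate the content deltas of a chunk stream into the full reply."""
    parts = []
    for chunk in stream:
        delta = chunk["choices"][0]["delta"]
        # A delta may omit `content` (for example, the final chunk).
        parts.append(delta.get("content") or "")
    return "".join(parts)

reply = collect_text(fake_stream(["Hello", ", ", "world"]))
print(reply)
```

Because the function returns an iterator, a caller can render each delta as it arrives instead of waiting for the full reply.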

## Compiling Embedding Models

You can compile text embedding models compatible with Muna's
`openai.embeddings.create` interface.

<Steps>
  <Step title="Accepting Input Texts">
    Embedding functions should accept a list of input texts to embed, as a `list[str]`:

    ```py embed_text.py icon="python" focus={7-10} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def embed_text(
        texts: Annotated[
            list[str],
            Parameter.Generic(description="Input texts to embed.")
        ]
    ) -> ndarray:
        ...
    ```
  </Step>

  <Step title="Returning the Embeddings">
    Embedding functions must return an embedding matrix as a NumPy `ndarray`.
    The array must have a [`Parameter.Embedding`](/predictors/requirements#embedding-annotation)
    annotation:

    ```py embed_text.py icon="python" focus={11-14} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def embed_text(
        texts: Annotated[
            list[str],
            Parameter.Generic(description="Input texts to embed.")
        ]
    ) -> Annotated[
        ndarray,
        Parameter.Embedding(description="Embedding matrix.")
    ]:
        ...
    ```

    <Note>
      The returned `ndarray` must have a `float32` data type.
    </Note>

    <Note>
      The returned `ndarray` must be a 2D array with shape `(N,D)`, where
      `N` is the number of input texts and `D` is the embedding dimension.
    </Note>
  </Step>

  <Step title="(Optional) Supporting Matryoshka Embeddings">
    Some embedding models are trained with Matryoshka representation learning, which lets callers choose the number of embedding dimensions.
    To expose this setting, add an `int` parameter with the
    [`Parameter.EmbeddingDims`](/predictors/requirements#embedding-dimensions-annotation) annotation:

    ```py embed_text.py icon="python" focus={11-18} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def embed_text(
        texts: Annotated[
            list[str],
            Parameter.Generic(description="Input texts to embed.")
        ],
        dimensions: Annotated[
            int,
            Parameter.EmbeddingDims(
                description="The number of dimensions the embeddings should have.",
                min=256,
                max=768
            )
        ] = 768
    ) -> Annotated[
        ndarray,
        Parameter.Embedding(description="Embedding matrix.")
    ]:
        ...
    ```

    <Warning>
      To remain compatible with the OpenAI embeddings interface, the function **must have only one required parameter**. Give every other parameter a default value.
    </Warning>
  </Step>
</Steps>
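
The shape, data type, and dimension requirements above can be sketched with plain NumPy. This is a hedged illustration rather than part of the Muna API: `to_embedding_matrix` is a hypothetical helper, the per-text vectors stand in for whatever your model produces, and re-normalizing after truncation is a common convention for Matryoshka-style embeddings, not something this guide requires.

```python
import numpy as np

def to_embedding_matrix(vectors: list[list[float]], dimensions: int) -> np.ndarray:
    """Stack per-text vectors into an (N, D) float32 matrix with D == dimensions."""
    matrix = np.asarray(vectors, dtype=np.float32)
    if matrix.ndim != 2:
        raise ValueError("expected one equal-length vector per input text")
    if not 0 < dimensions <= matrix.shape[1]:
        raise ValueError("dimensions must be between 1 and the model's full size")
    # Matryoshka-style truncation: keep the leading `dimensions` components.
    matrix = matrix[:, :dimensions]
    # Assumption: re-normalize each row to unit length after truncation.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, np.float32(1e-12))

embeddings = to_embedding_matrix([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]], dimensions=2)
print(embeddings.shape, embeddings.dtype)
```

Truncating to the leading components is only meaningful for models trained to support it, so check the model card before exposing a `dimensions` parameter.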

## Compiling Speech Models

You can compile text-to-speech models compatible with Muna's
`openai.audio.speech.create` interface.

<Steps>
  <Step title="Accepting Input Text">
    Text-to-speech functions should accept the input text as a `str`:

    ```py generate_speech.py icon="python" focus={7-10} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def generate_speech(
        text: Annotated[
            str,
            Parameter.Generic(description="Input text.")
        ]
    ) -> ndarray:
        ...
    ```
  </Step>

  <Step title="Accepting a Generation Voice">
    Text-to-speech functions must also accept a generation voice argument. We recommend using a
    [`Literal`](https://typing.python.org/en/latest/spec/literal.html) or
    [`StrEnum`](https://docs.python.org/3/library/enum.html#enum.StrEnum) type.
    Regardless of the type you choose, the parameter must have a
    [`Parameter.AudioVoice`](/predictors/requirements#audio-voice-annotation) annotation:

    ```py generate_speech.py icon="python" focus={11-14} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated, Literal

    @compile(...)
    def generate_speech(
        text: Annotated[
            str,
            Parameter.Generic(description="Input text.")
        ],
        voice: Annotated[
            Literal["voice_a", "voice_b"],
            Parameter.AudioVoice(description="Voice to use in generating audio.")
        ]
    ) -> ndarray:
        ...
    ```

    <Note>
      The generation voice must be a required parameter, because developers are required to specify
      the voice in the OpenAI interface.
    </Note>
  </Step>

  <Step title="Returning the Generated Audio">
    Speech generation functions must return the generated audio as a NumPy `ndarray` containing
    linear PCM samples. The array must have a [`Parameter.Audio`](/predictors/requirements#audio-annotation)
    annotation:

    ```py generate_speech.py icon="python" focus={15-18} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated, Literal

    @compile(...)
    def generate_speech(
        text: Annotated[
            str,
            Parameter.Generic(description="Input text.")
        ],
        voice: Annotated[
            Literal["voice_a", "voice_b"],
            Parameter.AudioVoice(description="Voice to use in generating audio.")
        ]
    ) -> Annotated[
        ndarray,
        Parameter.Audio(description="Generated speech.", sample_rate=24_000)
    ]:
        ...
    ```

    <Note>
      The returned `ndarray` must have a `float32` data type.
    </Note>

    <Note>
      The returned `ndarray` must be either a 1D array with shape `(F,)` for single-channel audio, or a
      2D array with shape `(F,C)` for interleaved multi-channel audio, where `F` is the number of frames
      and `C` is the channel count.
    </Note>
  </Step>

  <Step title="(Optional) Supporting Audio Speed">
    Some text-to-speech functions support configuring the speed of the generated audio. To expose this
    setting, add a `float` parameter with a [`Parameter.AudioSpeed`](/predictors/requirements#audio-speed-annotation) annotation:

    ```py generate_speech.py icon="python" focus={12-19} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated, Literal

    @compile(...)
    def generate_speech(
        text: Annotated[str, Parameter.Generic(description="Input text.")],
        voice: Annotated[
            Literal["voice_a", "voice_b"],
            Parameter.AudioVoice(description="Voice to use in generating audio.")
        ],
        speed: Annotated[
            float,
            Parameter.AudioSpeed(
                description="The speed of the generated audio.",
                min=0.25,
                max=4.0
            )
        ] = 1.0
    ) -> Annotated[
        ndarray,
        Parameter.Audio(description="Generated speech.", sample_rate=24_000)
    ]:
        ...
    ```

    <Warning>
      The audio speed parameter **must** have a default value, because it is an optional setting in the OpenAI interface.
    </Warning>
  </Step>
</Steps>
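
Many speech engines emit 16-bit integer PCM, which does not satisfy the `float32` requirement above. The sketch below shows one way to convert it with NumPy. It makes two assumptions that this guide does not state: that your engine produces `int16` samples, and that the expected floating-point range is the conventional `[-1, 1)`. `pcm16_to_float32` is a hypothetical helper name.

```python
import numpy as np

def pcm16_to_float32(samples: np.ndarray) -> np.ndarray:
    """Convert int16 PCM with shape (F,) or (F, C) to float32 samples in [-1, 1)."""
    if samples.dtype != np.int16:
        raise TypeError("expected int16 PCM samples")
    if samples.ndim not in (1, 2):
        raise ValueError("expected shape (F,) or (F, C)")
    # Dividing by 32768 maps the int16 range [-32768, 32767] into [-1, 1).
    return samples.astype(np.float32) / np.float32(32768.0)

mono = pcm16_to_float32(np.array([0, 16384, -32768], dtype=np.int16))
print(mono.dtype, mono.shape)
```

The conversion preserves the array shape, so mono `(F,)` and multi-channel `(F,C)` audio both pass through unchanged in layout.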

## Compiling Transcription Models

You can compile speech-to-text models compatible with Muna's
`openai.audio.transcriptions.create` interface.

<Steps>
  <Step title="Accepting Input Audio">
    Transcription functions should accept input audio as a NumPy `ndarray` with a
    [`Parameter.Audio`](/predictors/requirements#audio-annotation) annotation:

    ```py moonshine_base.py icon="python" focus={7-13} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def moonshine_base(
        audio: Annotated[
            ndarray,
            Parameter.Audio(
                description="Audio to transcribe with shape (F,C).",
                sample_rate=24_000
            )
        ]
    ) -> str:
        ...
    ```

    <Tip>
      When a user runs your compiled model with an audio file (MP3, WAV, etc.), the
      Muna client will decode it and resample it to your required `sample_rate`.
    </Tip>
  </Step>

  <Step title="Returning the Transcribed Text">
    Transcription functions should return the transcribed text as a string:

    ```py moonshine_base.py icon="python" focus={14-17} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def moonshine_base(
        audio: Annotated[
            ndarray,
            Parameter.Audio(
                description="Audio to transcribe with shape (F,C).",
                sample_rate=24_000
            )
        ]
    ) -> Annotated[
        str,
        Parameter.Generic(description="Transcribed text.")
    ]:
        ...
    ```
  </Step>
</Steps>
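
The examples above describe the input audio as having shape `(F,C)`, while many speech-to-text models expect a single channel. As a rough sketch under that assumption, the hypothetical `to_mono` helper below averages the channels with NumPy before the samples are handed to the model. Whether your model needs this step depends on the model itself.

```python
import numpy as np

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Downmix (F, C) audio to a mono (F,) float32 signal by averaging channels."""
    if audio.ndim == 1:
        # Already mono: only make sure the data type is float32.
        return audio.astype(np.float32, copy=False)
    if audio.ndim != 2:
        raise ValueError("expected shape (F,) or (F, C)")
    # Average across the channel axis to get one sample per frame.
    return audio.mean(axis=1).astype(np.float32, copy=False)

stereo = np.array([[1.0, -1.0], [0.5, 0.5]], dtype=np.float32)
print(to_mono(stereo).shape)
```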
