
# Defining your Functions

> Ensuring that your functions can be compiled successfully.

export const OperatorTable = () => {
  const groupAndSortOperators = operators => {
    if (!operators) return [];
    const operatorMap = operators.reduce((acc, operator) => {
      const module = operator.name.split(".")[0];
      if (!acc[module]) acc[module] = [];
      acc[module].push(operator);
      return acc;
    }, {});
    const sortedOperatorMap = Object.entries(operatorMap).sort(([a], [b]) => a.localeCompare(b));
    const result = sortedOperatorMap.map(([module, ops]) => [module, ops.sort((a, b) => a.kind.localeCompare(b.kind) || a.name.localeCompare(b.name))]);
    return result;
  };
  const [operators, setOperators] = useState(null);
  const operatorMap = useMemo(() => groupAndSortOperators(operators), [operators]);
  useEffect(() => {
    (async () => {
      const response = await fetch("https://api.muna.ai/v1/operators");
      const data = await response.json();
      setOperators(data);
    })();
  }, []);
  return <div className="">
      {operatorMap.map(([module, ops]) => <div key={module}>
          <h3>
            <code className="!text-indigo-500">
              {module}
            </code>
          </h3>
          <table>
            <thead>
              <tr>
                <th></th>
                <th>
                  Name
                </th>
              </tr>
            </thead>
            <tbody>
              {ops.map((operator, index) => <tr key={index}>
                  <td>
                    <Icon icon={operator.kind === "method" ? "cube" : "function"} />
                  </td>
                  <td>
                    <code className="!text-sm">
                      {operator.name}
                    </code>
                  </td>
                </tr>)}
            </tbody>
          </table>
        </div>)}
    </div>;
};

Muna supports compiling a tiny-but-growing subset of Python language constructs. Below are requirements
and guidelines for compiling a Python function with Muna:

## Specifying the Function Signature

The prediction function **must** be a module-level function, and **must** have parameter and return
type annotations:

```py icon="python" theme={null}
from muna import compile

@compile(...)
def greeting(name: str) -> str:
    return f"Hello {name}"
```

<Warning>
  The prediction function **must not** have any variable-length positional or keyword arguments.
</Warning>
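
The signature rules above can be checked locally with the standard `inspect` module before you compile. The sketch below is illustrative only: `signature_problems` is a hypothetical helper, not part of the Muna SDK, and the compiler performs its own validation.

```python
import inspect
from typing import Callable

def signature_problems(func: Callable) -> list[str]:
    """Return the ways `func` breaks the signature rules (hypothetical helper)."""
    problems = []
    signature = inspect.signature(func)
    for name, parameter in signature.parameters.items():
        if parameter.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD
        ):
            # `*args` and `**kwargs` are variable-length parameters.
            problems.append(f"variable-length parameter: {name}")
        elif parameter.annotation is inspect.Parameter.empty:
            problems.append(f"missing annotation: {name}")
    if signature.return_annotation is inspect.Signature.empty:
        problems.append("missing return annotation")
    return problems

def greeting(name: str) -> str:
    return f"Hello {name}"

def broken(name, *args, **kwargs):
    return name

print(signature_problems(greeting))  # []
print(signature_problems(broken))
```

An empty list means the function has parameter and return annotations and no variable-length arguments. It does not guarantee that the function body will compile.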

### Supported Parameter Types

Muna supports a [fixed set](/predictions/create#using-prediction-values) of predictor
input and output value types. Below are supported type annotations:

<AccordionGroup>
  <Accordion title="Floating Point Values" icon="pi">
    Floating-point input and return values should be annotated with the `float` built-in type.

    ```py icon="python" theme={null}
    from muna import compile

    @compile(...)
    def square(number: float) -> float:
        return number ** 2
    ```

    <Warning>
      Unlike Python, which uses 64-bit floats, Muna will always lower a Python `float` to 32 bits.
    </Warning>

    For control over the binary width of the number, use the `numpy.float[16,32,64]` types:

    ```py icon="python" theme={null}
    from muna import compile
    import numpy as np

    @compile(...)
    def square(number: np.float64) -> np.float64:
        return number ** 2
    ```
  </Accordion>

  <Accordion title="Integer Values" icon="hundred-points">
    Integer input and return values should be annotated with the `int` built-in type.

    ```py icon="python" theme={null}
    from muna import compile

    @compile(...)
    def square(number: int) -> int:
        return number ** 2
    ```

    <Warning>
      Unlike Python, which supports arbitrary-precision integers, Muna will always lower a Python `int` to 32 bits.
    </Warning>

    For control over the binary width of the integer, use the `numpy.int[8,16,32,64]` types:

    ```py icon="python" theme={null}
    from muna import compile
    import numpy as np

    @compile(...)
    def square(number: np.int16) -> np.int16:
        return number ** 2
    ```
  </Accordion>

  <Accordion title="Boolean Values" icon="toggle-on">
    Boolean input and return values must be annotated with the `bool` built-in type.

    ```py icon="python" theme={null}
    from muna import compile

    @compile(...)
    def invert(on: bool) -> bool:
        return not on
    ```
  </Accordion>

  <Accordion title="Tensor Values" icon="fire-flame-curved">
    Tensor input and return values must be annotated with the NumPy `numpy.typing.NDArray[T]` type, where `T` is
    the tensor element type.

    ```py icon="python" theme={null}
    from muna import compile
    import numpy as np
    from numpy.typing import NDArray

    @compile(...)
    def cholesky_decompose(tensor: NDArray[np.float64]) -> np.ndarray:
        return np.linalg.cholesky(tensor).astype("float32")
    ```

    <Tip>
      You can also annotate with the `np.ndarray` type, but doing so will always assume a `float32` element type (following
      [PyTorch semantics](https://pytorch.org/docs/stable/generated/torch.get_default_dtype.html)).
    </Tip>

    Below are the supported element types:

    | NumPy data type | Muna data type |
    | :-------------- | :------------- |
    | `np.float16`    | `float16`      |
    | `np.float32`    | `float32`      |
    | `np.float64`    | `float64`      |
    | `np.int8`       | `int8`         |
    | `np.int16`      | `int16`        |
    | `np.int32`      | `int32`        |
    | `np.int64`      | `int64`        |
    | `np.uint8`      | `uint8`        |
    | `np.uint16`     | `uint16`       |
    | `np.uint32`     | `uint32`       |
    | `np.uint64`     | `uint64`       |
    | `bool`          | `bool`         |

    <Warning>
      Muna does not yet support complex numbers or complex-valued tensors.
    </Warning>

    <Warning>
      Muna only supports, and will always assume, little-endian ordering for multi-byte element types.
    </Warning>
  </Accordion>

  <Accordion title="String Values" icon="quote-right" iconType="solid">
    String input and return values must be annotated with the `str` built-in type.

    ```py icon="python" theme={null}
    from muna import compile

    @compile(...)
    def uppercase(text: str) -> str:
        return text.upper()
    ```
  </Accordion>

  <Accordion title="List Values" icon="brackets-square">
    List input and return values must be annotated with the `list[T]` built-in type, where `T` is the element type.

    ```py icon="python" theme={null}
    from muna import compile

    @compile(...)
    def slice(items: list[str]) -> list[str]:
        return items[:3]
    ```

    <Tip>
      When the list element type `T` is a Pydantic `BaseModel`, a full JSON schema will be generated.
    </Tip>

    <Note>
      Providing an element type `T` is optional but strongly recommended because it is used to generate a schema for the parameter or
      return value.
    </Note>
  </Accordion>

  <Accordion title="Dictionary Values" icon="brackets-curly">
    Dictionary input and return values can be annotated in one of two ways:

    1. Using a Pydantic [`BaseModel`](https://docs.pydantic.dev/latest/concepts/models) subclass.
    2. Using the `dict[str, T]` built-in type.

    ```py icon="python" theme={null}
    from muna import compile
    from pydantic import BaseModel
    from typing import Literal

    class Person(BaseModel):
        city: str
        age: int

    class Pet(BaseModel):
        sound: Literal["bark", "meow"]
        legs: int

    @compile(...)
    def choose_favorite_pet(person: Person) -> Pet:
        return Pet(sound="meow", legs=6)
    ```

    <Tip>
      We strongly recommend the Pydantic `BaseModel` annotation, as it allows us to generate a full JSON schema.
    </Tip>

    <Warning>
      When using the `dict` annotation, the key type **must** be `str`. The value type `T` can be any type.
    </Warning>
  </Accordion>

  <Accordion title="Image Values" icon="image">
    Image input and return values must be annotated with the Pillow
    [`PIL.Image.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image) type.

    ```py icon="python" theme={null}
    from muna import compile
    from PIL import Image

    @compile(...)
    def resize(image: Image.Image) -> Image.Image:
        return image.resize((512, 512))
    ```
  </Accordion>

  <Accordion title="Binary Values" icon="binary">
    Binary input and return values can be annotated in one of three ways:

    1. Using the `bytes` built-in type.
    2. Using the `bytearray` built-in type.
    3. Using the `io.BytesIO` type.

    ```py icon="python" theme={null}
    from muna import compile
    from PIL import Image

    @compile(...)
    def resize_pixels(pixels: bytes) -> bytes:
        return Image.frombytes("L", (4,4), pixels).resize((8,8)).tobytes()
    ```
  </Accordion>
</AccordionGroup>
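
The 32-bit lowering of `float` and `int` described above can be previewed in plain Python. The sketch below uses only the standard library; `as_float32` and `fits_int32` are hypothetical helper names. It shows what survives a round trip through 32 bits. It does not show how Muna treats out-of-range values, which this page does not specify.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def as_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    # "<f" is a little-endian IEEE 754 single-precision float.
    return struct.unpack("<f", struct.pack("<f", value))[0]

def fits_int32(value: int) -> bool:
    """Check whether a Python int is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX

print(as_float32(0.1) == 0.1)  # False: 0.1 loses precision at 32 bits
print(as_float32(0.5) == 0.5)  # True: 0.5 is exactly representable
print(fits_int32(2**31 - 1))   # True
print(fits_int32(2**31))       # False
```

If a value needs more range or precision than this, annotate the parameter with an explicit NumPy type such as `np.float64` or `np.int64`.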

### Using Parameter Annotations

Muna supports attaching additional annotations to the function's parameter and return types:

```py icon="python" focus={6-13} theme={null}
from muna import compile, Parameter
from typing import Annotated

@compile(...)
def area(
    radius: Annotated[
        float,
        Parameter.Generic(description="Radius of the circle.")
    ]
) -> Annotated[
    float,
    Parameter.Generic(description="Area of the circle.")
]:
    ...
```

These annotations serve multiple important purposes:

* They help users know what input data to provide to the predictor and how to use output data from the predictor, via the parameter `description`.
* They help users search for predictors using highly detailed queries (e.g. MCP clients).
* They help the Muna client automatically provide familiar interfaces around your prediction function, e.g. with the [OpenAI interface](/predictions/openai).
* They help the Muna website automatically create interactive [`visualizers`](https://github.com/muna-ai/visualizers) for
  your prediction function.

<Tip>
  While not required, we highly recommend using parameter annotations on your compiled functions.
</Tip>
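
To see how metadata attached with `Annotated` travels with a function, the sketch below reads it back using the standard `typing` module. `Description` is a stand-in class defined here for illustration. It is not Muna's `Parameter`, and nothing here reflects how the Muna compiler consumes annotations internally.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass(frozen=True)
class Description:
    """Stand-in annotation object; NOT Muna's `Parameter` class."""
    text: str

def area(
    radius: Annotated[float, Description("Radius of the circle.")]
) -> Annotated[float, Description("Area of the circle.")]:
    return 3.141592653589793 * radius ** 2

def describe(func) -> dict[str, str]:
    """Collect the description attached to each parameter and the return value."""
    # `include_extras=True` keeps the `Annotated` metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    descriptions = {}
    for name, hint in hints.items():
        for extra in getattr(hint, "__metadata__", ()):
            if isinstance(extra, Description):
                descriptions[name] = extra.text
    return descriptions

print(describe(area))
# {'radius': 'Radius of the circle.', 'return': 'Area of the circle.'}
```

The annotated function still behaves like a normal Python function, so `area(2.0)` runs unchanged. The metadata is only visible to tools that ask for it.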

Below are currently supported annotations:

<AccordionGroup>
  <Accordion title="Generic Annotation" icon="binary">
    Use the `Parameter.Generic` annotation to provide information about a general input or output parameter:

    ```py predictor.py icon="python" focus={6-9} theme={null}
    from muna import compile, Parameter
    from typing import Annotated

    @compile(...)
    def area(
        radius: Annotated[
            float,
            Parameter.Generic(description="Radius of the circle.")
        ]
    ) -> float:
        ...
    ```

    Below is the full `Parameter.Generic` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def Generic(
        cls,
        *,
        description: str  # Parameter description.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Numeric Annotation" icon="hundred-points">
    Use the `Parameter.Numeric` annotation to specify numeric input or output parameters:

    ```py calculate_area.py icon="python" focus={6-13} theme={null}
    from muna import compile, Parameter
    from typing import Annotated

    @compile(...)
    def area(
        radius: Annotated[
            float,
            Parameter.Numeric(
                description="Circle radius.",
                min=1.,
                max=12.
            )
        ]
    ) -> float:
        ...
    ```

    Below is the full `Parameter.Numeric` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def Numeric(
        cls,
        *,
        description: str,         # Parameter description.
        min: float | None=None,   # Minimum value.
        max: float | None=None    # Maximum value.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Audio Annotation" icon="volume">
    Use the `Parameter.Audio` annotation to specify audio parameters:

    ```py transcribe_audio.py icon="python" focus={7-13} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def transcribe_audio(
        audio: Annotated[
            ndarray,
            Parameter.Audio(
                description="Input audio.",
                sample_rate=24_000
            )
        ]
    ) -> str:
        ...
    ```

    <Tip>
      The `Parameter.Audio` annotation allows the compiled predictor to be used by our
      [OpenAI speech client](/predictions/openai#creating-speech).
    </Tip>

    Below is the full `Parameter.Audio` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def Audio(
        cls,
        *,
        description: str, # Parameter description.
        sample_rate: int  # Audio sample rate in Hertz.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Audio Speed Annotation" icon="gauge-max">
    Use the `Parameter.AudioSpeed` annotation to specify audio speed parameters in audio generation predictors:

    ```py generate_speech.py icon="python" focus={8-15} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def generate_speech(
        text: str,
        speed: Annotated[
            float,
            Parameter.AudioSpeed(
                description="The speed of the generated audio.",
                min=0.25,
                max=4.0
            )
        ] = 1.0
    ) -> ndarray:
        ...
    ```

    Below is the full `Parameter.AudioSpeed` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def AudioSpeed(
        cls,
        *,
        description: str,       # Parameter description.
        min: float | None=None, # Minimum audio speed.
        max: float | None=None  # Maximum audio speed.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Audio Voice Annotation" icon="phone-volume" iconType="solid">
    Use the `Parameter.AudioVoice` annotation to specify audio voice parameters in audio generation predictors:

    ```py generate_speech.py icon="python" focus={10-13} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated, Literal

    Voice = Literal["almas", "parv", "rhea", "sam"]

    @compile(...)
    def generate_speech(
        text: str,
        voice: Annotated[
            Voice,
            Parameter.AudioVoice(description="Voice to use when generating audio.")
        ],
        speed: float=1.0
    ) -> ndarray:
        ...
    ```

    Below is the full `Parameter.AudioVoice` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def AudioVoice(
        cls,
        *,
        description: str    # Parameter description.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Bounding Box Annotation" icon="rectangle-wide">
    Use the `Parameter.BoundingBox` or `Parameter.BoundingBoxes` annotations to specify
    bounding box parameters in object detection predictors:

    <CodeGroup>
      ```py Single icon="rectangle-wide" focus={8-11} theme={null}
      from muna import compile, Parameter
      from PIL import Image
      from typing import Annotated, Literal

      @compile(...)
      def detect_object(
          image: Image.Image
      ) -> Annotated[
          Detection,
          Parameter.BoundingBox(description="Detected object.")
      ]:
          ...
      ```

      ```py Multiple icon="rectangles-mixed" focus={8-11} theme={null}
      from muna import compile, Parameter
      from PIL import Image
      from typing import Annotated, Literal

      @compile(...)
      def detect_objects(
          image: Image.Image
      ) -> Annotated[
          list[Detection],
          Parameter.BoundingBoxes(description="Detected objects.")
      ]:
          ...
      ```
    </CodeGroup>

    Below are the full `Parameter.BoundingBox` and `Parameter.BoundingBoxes` annotation definitions:

    <CodeGroup>
      ```py Single icon="rectangle-wide" theme={null}
      @classmethod
      def BoundingBox(
          cls,
          *,
          description: str    # Parameter description.
      ) -> Parameter: ...
      ```

      ```py Multiple icon="rectangles-mixed" theme={null}
      @classmethod
      def BoundingBoxes(
          cls,
          *,
          description: str    # Parameter description.
      ) -> Parameter: ...
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="Depth Map Annotation" icon="camera" iconType="solid">
    Use the `Parameter.DepthMap` annotation to specify depth map parameters in depth estimation predictors:

    ```py estimate_depth.py icon="python" focus={9-12} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from PIL import Image
    from typing import Annotated

    @compile(...)
    def estimate_depth(
        image: Image.Image
    ) -> Annotated[
        ndarray,
        Parameter.DepthMap(description="Metric depth tensor.")
    ]:
        ...
    ```

    Below is the full `Parameter.DepthMap` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def DepthMap(
        cls,
        *,
        description: str    # Parameter description.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Embedding Annotation" icon="wand-sparkles" iconType="solid">
    Use the `Parameter.Embedding` annotation to specify vector embedding parameters in embedding predictors:

    ```py embed_text.py icon="python" focus={8-11} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def embed_text(
        text: str
    ) -> Annotated[
        ndarray,
        Parameter.Embedding(description="Embedding vector.")
    ]:
        ...
    ```

    <Tip>
      The `Parameter.Embedding` annotation allows the compiled predictor to be used by our
      [OpenAI embedding client](/predictions/openai#creating-embeddings).
    </Tip>

    Below is the full `Parameter.Embedding` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def Embedding(
        cls,
        *,
        description: str  # Parameter description.
    ) -> Parameter: ...
    ```
  </Accordion>

  <Accordion title="Embedding Dimensions Annotation" icon="chart-scatter-3d" iconType="solid">
    Use the `Parameter.EmbeddingDims` annotation to specify an embedding
    Matryoshka dimension parameter in embedding predictors:

    ```py embed_text.py icon="python" focus={8-11} theme={null}
    from muna import compile, Parameter
    from numpy import ndarray
    from typing import Annotated

    @compile(...)
    def embed_text(
        text: str,
        dims: Annotated[
            int,
            Parameter.EmbeddingDims(description="Embedding dimensions.")
        ]
    ) -> ndarray:
        ...
    ```

    Below is the full `Parameter.EmbeddingDims` annotation definition:

    ```py icon="python" theme={null}
    @classmethod
    def EmbeddingDims(
        cls,
        *,
        description: str,       # Parameter description.
        min: int | None=None,   # Minimum embedding dimensions.
        max: int | None=None    # Maximum embedding dimensions.
    ) -> Parameter: ...
    ```
  </Accordion>
</AccordionGroup>
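
As background for the embedding dimensions annotation: models trained with Matryoshka representation learning are usually consumed by keeping the first `dims` components of the full vector and rescaling the result to unit length. The sketch below shows that convention in plain Python. `truncate_embedding` is a hypothetical helper, and how a given predictor actually applies its `dims` parameter is up to its implementation.

```python
import math

def truncate_embedding(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 1 <= dims <= len(embedding):
        raise ValueError(f"dims must be between 1 and {len(embedding)}")
    head = embedding[:dims]
    norm = math.sqrt(sum(component * component for component in head))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return head if norm == 0.0 else [component / norm for component in head]

full = [3.0, 4.0, 12.0, 84.0]
print(truncate_embedding(full, 2))  # [0.6, 0.8]
```

The bounds check mirrors the optional `min` and `max` fields of `Parameter.EmbeddingDims`: a caller should not request more dimensions than the model produces.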

## Writing the Function Body

The function body can contain arbitrary Python code. However, because the Muna compiler is currently a
proof of concept, its coverage of Python language features is limited. Below is a list of Python
language features that we either partially support or do not support at all:

<AccordionGroup>
  <Accordion title="Functions" icon="function">
    | Statement           | Status | Notes                                                                                                                                            |
    | :------------------ | :----: | :----------------------------------------------------------------------------------------------------------------------------------------------- |
    | Recursive functions |   🔨   | Recursive functions **must** have a return type annotation.                                                                                      |
    | Lambda expressions  |   🚧   | Lambda expressions [can be invoked](https://github.com/muna-ai/compiler/blob/main/predictors/language/lambda.py), but cannot be used as objects. |
  </Accordion>

  <Accordion title="Literals" icon="quote-right" iconType="solid">
    | Collection          | Status | Notes                                                          |
    | :------------------ | :----: | :------------------------------------------------------------- |
    | List literals       |   🚧   | List must contain primitive members (e.g. `int`, `str`).       |
    | Dictionary literals |   🚧   | Dictionary must contain primitive members (e.g. `int`, `str`). |
    | Set literals        |   🚧   | Set must contain primitive members (e.g. `int`, `str`).        |
    | Tuple literals      |   🚧   | Tuple must contain primitive members (e.g. `int`, `str`).      |
  </Accordion>

  <Accordion title="Classes" icon="cube">
    Tracing through classes is not yet supported.
  </Accordion>

  <Accordion title="Exceptions" icon="triangle-exclamation" iconType="solid">
    | Statement                | Status | Notes |
    | :----------------------- | :----: | :---- |
    | `raise` statements       |   🔨   |       |
    | `try..except` statements |   🔨   |       |
  </Accordion>
</AccordionGroup>

<Note>
  Over time, the list of unsupported language features will shrink and will eventually be empty.
</Note>
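
The lambda restriction noted above (a lambda can be invoked, but not used as an object) is easiest to see side by side. The sketch below is plain Python with hypothetical function names. That the explicit loop compiles with the current compiler is an assumption, so treat this as a pattern to try and not a guarantee.

```python
def double(value: float) -> float:
    # The lambda is invoked on the spot and never stored or passed around.
    return (lambda v: v * 2.0)(value)

def double_all(values: list[float]) -> list[float]:
    # Avoids `list(map(lambda v: v * 2.0, values))`, which passes a lambda as an object.
    result = []
    for value in values:
        result.append(value * 2.0)
    return result

print(double(1.5))             # 3.0
print(double_all([1.0, 2.5]))  # [2.0, 5.0]
```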

## Using Compiler Sandboxes

Muna supports defining custom sandboxes that can be used to reconstruct your Python environment before compiling your function.

<Warning>
  Sandboxes are very much experimental, and will likely see major changes, additions, and revisions in the near future.
</Warning>

<AccordionGroup>
  <Accordion title="Installing Python Packages">
    Use the `Sandbox.pip_install` method to install Python packages from the PyPI registry:

    ```py predictor.py icon="python" theme={null}
    from muna import compile, Sandbox
    import numpy as np

    # Install NumPy and scikit-learn
    sandbox = (Sandbox()
      .pip_install("numpy", "scikit-learn")
    )

    # Compile your function with the sandbox
    @compile(..., sandbox=sandbox)
    def predict() -> np.ndarray:
        ...
    ```

    <Warning>
      We highly recommend pinning the specific versions of Python packages in use, so as to
      prevent incompatibilities when creating the sandbox.
    </Warning>
  </Accordion>

  <Accordion title="Installing Debian Packages">
    Use the `Sandbox.apt_install` method to install Debian system packages:

    ```py predictor.py icon="python" theme={null}
    from io import BytesIO
    from muna import compile, Sandbox

    # Install git and wget
    sandbox = (Sandbox()
      .apt_install("git", "wget")
    )

    # Compile your function with the sandbox
    @compile(..., sandbox=sandbox)
    def predict() -> BytesIO:
        ...
    ```
  </Accordion>

  <Accordion title="Defining Environment Variables">
    Use the `Sandbox.env` method to define plaintext environment variables:

    ```py predictor.py icon="python" theme={null}
    from muna import compile, Sandbox

    # Define an environment variable
    sandbox = (Sandbox()
      .env({ "MUNA_WEBSITE": "https://muna.ai" })
    )

    # Compile your function with the sandbox
    @compile(..., sandbox=sandbox)
    def predict(prompt: str) -> str:
        ...
    ```

    <Warning>
      Muna does not yet support defining secrets. **Do not** provide secrets
      using sandbox environment variables as they are not designed for storing secrets.
    </Warning>
  </Accordion>

  <Accordion title="Uploading Files">
    Use the `Sandbox.upload_file` method to upload a file to a path in the sandbox:

    ```py predictor.py icon="python" theme={null}
    from muna import compile, Sandbox

    # Upload a model weight to the sandbox
    sandbox = (Sandbox()
      .upload_file("DeepSeek-R1.gguf", "/Deepseek-R1.gguf")
    )

    # Compile your function with the sandbox
    @compile(..., sandbox=sandbox)
    def predict(prompt: str) -> str:
        ...
    ```
  </Accordion>

  <Accordion title="Uploading Directories">
    Use the `Sandbox.upload_directory` method to upload a directory and all its contents to a path in the sandbox:

    ```py predictor.py icon="python" theme={null}
    from muna import compile, Sandbox

    # Upload a directory to the sandbox
    sandbox = (Sandbox()
      .upload_directory("resources/", "/resources")
    )

    # Compile your function with the sandbox
    @compile(..., sandbox=sandbox)
    def predict(prompt: str) -> str:
        ...
    ```
  </Accordion>
</AccordionGroup>
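
One way to follow the version-pinning recommendation for Python packages is to build the pinned specifiers from whatever is installed in your local environment. The sketch below uses the standard `importlib.metadata` module; `pinned_requirements` is a hypothetical helper. It assumes `Sandbox.pip_install` accepts pip-style specifiers such as `name==version`, which this page does not confirm, and the version numbers in the example are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

def pinned_requirements(
    *packages: str,
    lookup: Callable[[str], str] = version
) -> list[str]:
    """Build `name==version` specifiers from the locally installed versions."""
    requirements = []
    for package in packages:
        try:
            requirements.append(f"{package}=={lookup(package)}")
        except PackageNotFoundError:
            # Not installed locally, so there is no version to pin to.
            requirements.append(package)
    return requirements

# A stand-in lookup keeps the output independent of this machine.
fake_versions = {"numpy": "2.1.0", "scikit-learn": "1.5.2"}
print(pinned_requirements("numpy", "scikit-learn", lookup=fake_versions.__getitem__))
# ['numpy==2.1.0', 'scikit-learn==1.5.2']
```

Under that assumption, the resulting list could be unpacked into the sandbox call, for example `Sandbox().pip_install(*pinned_requirements("numpy", "scikit-learn"))`.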

## Using Compiler Metadata

Muna's compiler supports specifying metadata, allowing you to configure the compiler or provide additional information.

<AccordionGroup>
  <Accordion title="TensorRT Inference Metadata">
    Use the `TensorRTInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) to [TensorRT](https://developer.nvidia.com/tensorrt):

    ```py ai.py icon="python" focus={2,6-9,13-21} theme={null}
    from muna import compile
    from muna.beta import TensorRTInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use TensorRT for model inference
            TensorRTInferenceMetadata(
                model=model,
                model_args=example_args,
                cuda_arch="sm_100",
                precision="int8"
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Note>
      The TensorRT inference backend is only available on Linux and Windows devices with compatible Nvidia GPUs.
    </Note>

    <Tip>
      We are working on adding support for consumer RTX GPUs with [TensorRT for RTX](https://developer.nvidia.com/blog/nvidia-tensorrt-for-rtx-introduces-an-optimized-inference-ai-library-on-windows/).
    </Tip>

    #### Target CUDA Architectures

    TensorRT engines must be compiled for specific target CUDA architectures. Below are CUDA architectures that our compiler supports:

    | CUDA Architecture | GPU Family               |
    | :---------------- | :----------------------- |
    | `sm_80`           | Ampere (e.g. A100)       |
    | `sm_86`           | Ampere                   |
    | `sm_87`           | Ampere                   |
    | `sm_89`           | Ada Lovelace (e.g. L40S) |
    | `sm_90`           | Hopper (e.g. H100)       |
    | `sm_100`          | Blackwell (e.g. B200)    |

    #### TensorRT Inference Precision

    TensorRT allows for specifying the inference engine's precision. Below are supported precision modes:

    | Precision | Notes                              |
    | :-------- | :--------------------------------- |
    | `fp32`    | 32-bit single precision inference. |
    | `fp16`    | 16-bit half precision inference.   |
    | `int8`    | 8-bit quantized integer inference. |
  </Accordion>

  <Accordion title="OnnxRuntime Inference Metadata">
    Use the `OnnxRuntimeInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) for inference with [ONNXRuntime](https://onnxruntime.ai/):

    ```py ai.py icon="python" focus={2,6-9,13-19} theme={null}
    from muna import compile
    from muna.beta import OnnxRuntimeInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use ONNXRuntime for model inference
            OnnxRuntimeInferenceMetadata(
                model=model,
                model_args=example_args
            )
        ]
    )
    def predict() -> None:
        pass
    ```
  </Accordion>

  <Accordion title="OnnxRuntime Inference Session Metadata">
    Use the `OnnxRuntimeInferenceSessionMetadata` metadata type to compile an OnnxRuntime [`InferenceSession`](https://onnxruntime.ai/docs/api/python/api_summary.html#inferencesession):

    ```py ai.py icon="python" focus={2,5-7,11-17} theme={null}
    from muna import compile
    from muna.beta import OnnxRuntimeInferenceSessionMetadata
    from onnxruntime import InferenceSession

    # Given an ONNXRuntime inference session...
    model_path = "/path/to/model.onnx"
    session = InferenceSession(model_path)

    @compile(
        ...,
        metadata=[
            # Use ONNXRuntime for model inference
            OnnxRuntimeInferenceSessionMetadata(
                session=session,
                model_path=model_path
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Warning>
      The ONNX model file must exist at the provided `model_path` **within the compiler sandbox**.
    </Warning>
  </Accordion>

  <Accordion title="CoreML Inference Metadata">
    Use the `CoreMLInferenceMetadata` metadata type to compile a PyTorch
    [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) to
    [CoreML](https://developer.apple.com/documentation/coreml):

    ```py ai.py icon="python" focus={2,6-9,13-19} theme={null}
    from muna import compile
    from muna.beta import CoreMLInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use CoreML for model inference
            CoreMLInferenceMetadata(
                model=model,
                model_args=example_args
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Note>
      The CoreML inference backend is only available on iOS, macOS, and visionOS devices.
    </Note>
  </Accordion>

  <Accordion title="Llama.cpp Inference Metadata">
    Use the `LlamaCppInferenceMetadata` metadata type to compile a [`Llama`](https://github.com/abetlen/llama-cpp-python)
    instance:

    ```py llm.py icon="python" focus={10-16} theme={null}
    from muna import compile
    from muna.beta import LlamaCppInferenceMetadata
    from llama_cpp import Llama

    # Given an LLM
    llm = Llama(...)

    @compile(
        ...,
        metadata=[
            # Specify Llama.cpp inference metadata
            LlamaCppInferenceMetadata(
                model=llm,
                backends=["cuda"]
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    #### Llama.cpp Hardware Backends

    Llama.cpp supports several hardware backends to accelerate model inference.
    Below are targets that are currently supported by Muna:

    | Backend | Notes                                                                                                    |
    | :------ | :------------------------------------------------------------------------------------------------------- |
    | `cuda`  | [Nvidia CUDA backend](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda). Linux only. |
  </Accordion>

  <Accordion title="ExecuTorch Inference Metadata">
    Use the `ExecuTorchInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) for inference with [ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html):

    ```py ai.py icon="python" focus={1,5-8,12-19} theme={null}
    from muna.beta import ExecuTorchInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use ExecuTorch for model inference
            ExecuTorchInferenceMetadata(
                model=model,
                model_args=example_args,
                backend="xnnpack"
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Note>
      The ExecuTorch inference backend is only available on Android.
    </Note>

    #### ExecuTorch Hardware Backends

    ExecuTorch supports several [hardware backends](https://docs.pytorch.org/executorch/stable/backends-overview.html) to
    accelerate model inference. Below are the backends currently supported by Muna:

    | Backend   | Notes                                                                                                             |
    | :-------- | :---------------------------------------------------------------------------------------------------------------- |
    | `xnnpack` | [XNNPACK CPU backend](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html). Always enabled.          |
    | `vulkan`  | [Vulkan GPU backend](https://docs.pytorch.org/executorch/stable/backends-vulkan.html). Only supported on Android. |
  </Accordion>

  <Accordion title="LiteRT Inference Metadata">
    Use the `LiteRTInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) for inference with [LiteRT](https://ai.google.dev/edge/litert):

    ```py ai.py icon="python" focus={1,5-8,12-18} theme={null}
    from muna.beta import LiteRTInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use LiteRT for model inference
            LiteRTInferenceMetadata(
                model=model,
                model_args=example_args
            )
        ]
    )
    def predict() -> None:
        pass
    ```
  </Accordion>

  <Accordion title="TensorFlow Lite Interpreter Metadata">
    Use the `TFLiteInterpreterMetadata` metadata type to compile a TensorFlow Lite
    [`Interpreter`](https://ai.google.dev/edge/api/tflite/python/tf/lite/Interpreter):

    ```py ai.py icon="python" focus={1,4-6,10-16} theme={null}
    from muna.beta import TFLiteInterpreterMetadata
    from tensorflow import lite

    # Given a TFLite interpreter...
    model_path = "/path/to/model.tflite"
    interpreter = lite.Interpreter(model_path)

    @compile(
        ...,
        metadata=[
            # Use TensorFlow Lite for model inference
            TFLiteInterpreterMetadata(
                interpreter=interpreter,
                model_path=model_path
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Warning>
      The TensorFlow Lite model file must exist at the provided `model_path` **within the compiler sandbox**.
    </Warning>
  </Accordion>

  <Accordion title="QNN Inference Metadata">
    Use the `QnnInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) to a [Qualcomm QNN](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/introduction.html?product=1601111740009302) context binary:

    ```py ai.py icon="python" focus={1,5-8,12-20} theme={null}
    from muna.beta import QnnInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use QNN for model inference
            QnnInferenceMetadata(
                model=model,
                model_args=example_args,
                backend="gpu",
                quantization=None
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Note>
      The QNN inference backend is only available on Android and Windows devices with Qualcomm processors.
    </Note>

    #### QNN Hardware Backends

    QNN requires you to specify a hardware device `backend` ahead of time. Below are the supported backends:

    | Backend | Notes                                      |
    | :------ | :----------------------------------------- |
    | `cpu`   | Reference `aarch64` CPU backend.           |
    | `gpu`   | Adreno GPU backend, accelerated by OpenCL. |
    | `htp`   | Hexagon NPU backend.                       |

    <Info>
      Learn more about [QNN hardware backends](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/backend.html?product=1601111740009302).
    </Info>

    #### QNN Model Quantization

    When using the `htp` backend, you **must** specify a model `quantization` mode, because the Hexagon NPU can only
    run integer-quantized models. Below are the supported quantization modes:

    | Quantization | Notes                                                                         |
    | :----------- | :---------------------------------------------------------------------------- |
    | `w8a8`       | Weights and activations are quantized to `uint8`.                             |
    | `w8a16`      | Weights are quantized to `uint8` while activations are quantized to `uint16`. |
    | `w4a8`       | Weights are quantized to `uint4` while activations are quantized to `uint8`.  |
    | `w4a16`      | Weights are quantized to `uint4` while activations are quantized to `uint16`. |
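
    The backend and quantization rules above can be checked before compiling. Below is a minimal sketch, assuming
    quantization modes are passed as strings named as in the table. `qnn_metadata_kwargs` is a hypothetical helper
    and is not part of Muna:

    ```python qnn_options.py icon="python" theme={null}
    from typing import Optional

    # Values taken from the backend and quantization tables above
    QNN_BACKENDS = {"cpu", "gpu", "htp"}
    QNN_QUANTIZATION_MODES = {"w8a8", "w8a16", "w4a8", "w4a16"}

    def qnn_metadata_kwargs(backend: str, quantization: Optional[str] = None) -> dict:
        """Hypothetical helper (not part of Muna): validate QNN backend and quantization choices."""
        if backend not in QNN_BACKENDS:
            raise ValueError(f"Unsupported QNN backend: {backend!r}")
        if quantization is not None and quantization not in QNN_QUANTIZATION_MODES:
            raise ValueError(f"Unsupported quantization mode: {quantization!r}")
        # The Hexagon NPU only runs integer-quantized models
        if backend == "htp" and quantization is None:
            raise ValueError("The `htp` backend requires a quantization mode")
        return {"backend": backend, "quantization": quantization}

    # Assumed usage: QnnInferenceMetadata(model=model, model_args=example_args, **htp_kwargs)
    htp_kwargs = qnn_metadata_kwargs("htp", "w8a16")
    ```

    The returned dictionary can then be unpacked into `QnnInferenceMetadata` alongside `model` and `model_args`.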
  </Accordion>

  <Accordion title="OpenVINO Inference Metadata">
    Use the `OpenVINOInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) to [OpenVINO](https://docs.openvino.ai/2025/index.html) IR:

    ```py ai.py icon="python" focus={1,5-8,12-18} theme={null}
    from muna.beta import OpenVINOInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use OpenVINO for model inference
            OpenVINOInferenceMetadata(
                model=model,
                model_args=example_args
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    At runtime, the OpenVINO IR will be used for inference with the [OpenVINO toolkit](https://github.com/openvinotoolkit/openvino).

    <Note>
      The OpenVINO inference backend is only available on Linux and Windows `x86_64` devices with Intel processors.
    </Note>
  </Accordion>

  <Accordion title="IREE Inference Metadata">
    Use the `IREEInferenceMetadata` metadata type to compile a PyTorch [`nn.Module`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html) for inference with [IREE](https://iree.dev/):

    ```py ai.py icon="python" focus={1,5-8,12-19} theme={null}
    from muna.beta import IREEInferenceMetadata
    from torch import randn, Tensor
    from torch.nn import Module

    # Given a PyTorch model...
    model: Module = ...
    # With some example arguments...
    example_args: list[Tensor] = [randn(1, 3, 224, 224)]

    @compile(
        ...,
        metadata=[
            # Use IREE for model inference
            IREEInferenceMetadata(
                model=model,
                model_args=example_args,
                backend="vulkan"
            )
        ]
    )
    def predict() -> None:
        pass
    ```

    <Note>
      The IREE inference backend is only available on Android devices.
    </Note>

    #### IREE HAL Target Backends

    IREE supports several HAL target backends that the `model` can be compiled against.
    Below are the targets currently supported by Muna:

    | Target   | Notes                                                                                                           |
    | :------- | :-------------------------------------------------------------------------------------------------------------- |
    | `vulkan` | [Vulkan GPU backend](https://iree.dev/guides/deployment-configurations/gpu-vulkan/). Only supported on Android. |
  </Accordion>

  <Accordion title="MIGraphX Inference Metadata">
    *Coming soon* 🤫.
  </Accordion>
</AccordionGroup>

## Library Coverage

We are adding support for popular libraries across tensor frameworks, scientific computing, and more:

<Accordion title="Supported Libraries" icon="landmark">
  Below are libraries currently supported by our compiler:

  <OperatorTable />
</Accordion>
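
The table above groups operators by their top-level module. A minimal sketch of that grouping follows, assuming each operator record has a `name` and a `kind` field. The sample records are placeholders for illustration only, not actual compiler output, and `group_operators` is a hypothetical helper:

```python operators.py icon="python" theme={null}
from collections import defaultdict

def group_operators(operators: list[dict]) -> list[tuple[str, list[dict]]]:
    """Group operator records by top-level module, sorted by module, then by kind and name."""
    by_module: dict[str, list[dict]] = defaultdict(list)
    for operator in operators:
        # The module is the first dotted component of the operator name
        module = operator["name"].split(".")[0]
        by_module[module].append(operator)
    return [
        (module, sorted(ops, key=lambda op: (op["kind"], op["name"])))
        for module, ops in sorted(by_module.items())
    ]

# Placeholder records for illustration only (not actual API output)
sample = [
    {"name": "torch.Tensor.view", "kind": "method"},
    {"name": "numpy.add", "kind": "function"},
    {"name": "torch.add", "kind": "function"},
]
grouped = group_operators(sample)
```

Python's default string ordering is case-sensitive, so the result may differ slightly from a locale-aware sort when names mix upper and lower case.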

<Tip>
  If you need a specific library to be supported by the Muna compiler, [reach out to us](mailto:hi@muna.ai).
</Tip>
