Because Muna runs prediction functions locally, you can make realtime predictions: running one or more prediction functions repeatedly at interactive rates, often at 30 or 60 frames per second.

Using realtime mode on muna.ai

Making Predictions in Realtime

Before calling muna.predictions.create on every frame, you must first ensure that the predictor has been preloaded on the current device.
Failing to preload a predictor before using it in realtime will cause your Muna client to make network requests on every frame as it attempts to load the predictor. This will lead to your app hanging or crashing.

Preloading the Predictor

To preload a predictor, make a prediction and pass in empty inputs:
await muna.predictions.create({
    tag: "@vision-co/object-detector",
    inputs: { }
});
This works by forcing the Muna client to fetch and initialize the predictor. The empty inputs will cause the prediction to fail due to missing inputs, but the resulting error can be safely ignored.
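If your Muna client surfaces this failure as a thrown error, here is a minimal sketch of a preload helper that swallows it (the preloadPredictor name is illustrative):

// Preload a predictor, ignoring the expected missing-input failure
async function preloadPredictor(tag) {
    try {
        await muna.predictions.create({ tag, inputs: { } });
    } catch {
        // Expected: the prediction fails because the inputs are empty,
        // but the predictor is now fetched and initialized.
    }
}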

Making Realtime Predictions

After preloading the predictor, you can make predictions in realtime from your app’s update loop or a similar mechanism.
async function startRealtimePredictions() {
    // Preload the predictor
    await muna.predictions.create({
        tag: "@vision-co/object-detector",
        inputs: { }
    });
    // Make predictions in realtime
    while (true)
        await doPredictions();
}
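In a browser, for instance, the update loop is typically driven by requestAnimationFrame rather than a bare while loop. A sketch using the same doPredictions helper:

// Drive predictions from the browser render loop
async function onFrame() {
    // Make predictions for the current frame
    await doPredictions();
    // Schedule the next frame
    requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);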

Performance Considerations

Muna automatically optimizes the runtime performance of predictors on a given device by leveraging aggregated performance data. While this means that developers have little control over performance, there are several ways to ensure a smooth user experience in your application:
We are always working to find and bring newer, faster predictors to Muna. To this end, we are adding a ‘Performance’ tab to predictors on the Muna explore page. This tab will provide granular performance statistics collected from the millions of devices that use that predictor.
If you would like us to bring an open-source AI model or function to Muna, let us know.
When loading a predictor, our platform informs the Muna client about the best hardware primitive to use for accelerating predictions:
| Acceleration | Notes |
| --- | --- |
| cpu | Use the CPU to accelerate predictions. This is always enabled. |
| gpu | Use the GPU to accelerate predictions. |
| npu | Use the neural processor to accelerate predictions. |
Muna currently does not support multi-GPU acceleration. This is planned for the future.
Some of our client SDKs allow you to override the acceleration used to power predictions:
import { Acceleration } from "muna";

await muna.predictions.create({
    tag: "@vision-co/object-detector",
    inputs: { },
    acceleration: "gpu"
});
You can opt to use multiple acceleration types by combining them with a bitwise OR:
acceleration: Acceleration.GPU | Acceleration.NPU
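For example, a sketch of the full call using the Acceleration enum imported above:

// Allow the client to use either the GPU or the neural processor
await muna.predictions.create({
    tag: "@vision-co/object-detector",
    inputs: { },
    acceleration: Acceleration.GPU | Acceleration.NPU
});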
The prediction acceleration only applies when preloading a predictor. Once a predictor has been loaded, the acceleration is ignored.
The prediction acceleration is merely a hint, which the Muna client will try its best to honor. Setting an acceleration does not guarantee that any or all operations in the prediction function will actually use that acceleration type.
First, you should absolutely (absolutely) never ever do this unless you know what the hell you’re doing. With that out of the way, some Muna clients allow you to specify the acceleration device used to make predictions. Our clients expose this field as an untyped integer or pointer. The underlying type depends on the current operating system:
| OS | Device type | Notes |
| --- | --- | --- |
| Android | - | Currently unsupported. |
| iOS | id<MTLDevice> | Metal device. |
| Linux | int* | CUDA device ID pointer. |
| macOS | id<MTLDevice> | Metal device. |
| visionOS | id<MTLDevice> | Metal device. |
| Web | GPUDevice | WebGPU device. |
| Windows | ID3D12Device* | DirectX 12 device. |
// C# client: select CUDA device 2 (`cuda:2`) on Linux
int* cudaDevice = stackalloc int[] { 2 };
await muna.Predictions.Create(
    tag: "@vision-co/object-detector",
    inputs: new(),
    device: new IntPtr(cudaDevice)
);
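On the Web, the device type is a WebGPU GPUDevice. A minimal sketch in JavaScript, assuming the browser client accepts the same device field shown in the C# example above:

// Hypothetical: pass a WebGPU device to the Muna client
const adapter = await navigator.gpu.requestAdapter();
if (!adapter)
    throw new Error("WebGPU is not supported in this browser");
const gpuDevice = await adapter.requestDevice();
await muna.predictions.create({
    tag: "@vision-co/object-detector",
    inputs: { },
    device: gpuDevice
});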
The prediction device only applies when preloading a predictor. Once a predictor has been loaded, the device is ignored.
The prediction device is merely a hint, which the Muna client will try its best to honor. Setting a device does not guarantee that any or all operations in the prediction function will actually use that acceleration device.
If your development environment exposes a threading model, it is often beneficial to maintain a dedicated thread for making predictions. Furthermore, you might benefit from making predictions at a lower rate than realtime. While this approach does not directly improve performance, it can alleviate system pressure, thereby enhancing the interactivity of your application.
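For example, a sketch that throttles predictions to roughly ten per second instead of every frame (the 10 Hz rate and the doPredictions helper are illustrative):

// Make predictions at ~10 Hz instead of on every frame
const PREDICTION_INTERVAL_MS = 100;
let lastPredictionTime = 0;

async function onFrame(time) {
    if (time - lastPredictionTime >= PREDICTION_INTERVAL_MS) {
        lastPredictionTime = time;
        await doPredictions();
    }
    requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);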
Muna clients are not thread-safe. Never use a single Muna client across multiple threads.