Muna’s signature feature is allowing developers to choose where inference runs, per-request.

Running with Adaptive Placement

Muna can adaptively search for the best hardware to run models, depending on your cost, latency, and throughput requirements. Use the muna.predictions.create method, and specify your constraints in natural language:
// 🔥 Run inference with the lowest latency
const prediction = await muna.predictions.create({
  tag: "@openai/gpt-oss-120b",
  inputs: { messages },
  acceleration: "lowest latency"
});
This feature is in early alpha and is only available to select teams. Request access on our Slack.

Specifying Placement Constraints

We strongly recommend anchoring your placement constraints around these three canonical intents:
Intent       Examples
Cost         cheapest, lowest cost in the cloud, under $0.02
Latency      fastest, minimize latency, lowest latency that runs locally
Throughput   highest throughput, at least 100 requests per second
You can also combine cost, latency, and throughput intents in a single constraint, e.g. lowest cost under 200ms.
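A combined constraint is passed the same way as a single intent. A minimal sketch, reusing the request shape from the example above (the tag and inputs are placeholders, and the client call is commented out so the snippet stands alone):

```typescript
// 🔥 Combine cost and latency intents in a single constraint
// Sketch only: the tag and inputs are borrowed from the example above.
const request = {
  tag: "@openai/gpt-oss-120b",
  inputs: { messages: [{ role: "user", content: "Hello!" }] },
  acceleration: "lowest cost under 200ms", // cost + latency combined
};
// const prediction = await muna.predictions.create(request);
```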

Running on Datacenter GPUs

Use the muna.beta.predictions.remote.create method to run inference on a datacenter GPU:
// 🔥 Run inference with an Nvidia B200 GPU
const prediction = await muna.beta.predictions.remote.create({
  tag: "@bytedance/depth-anything-3",
  inputs: { image },
  acceleration: "remote_b200"
});

Supported Datacenter GPUs

Below are the currently supported cloud GPUs:
Acceleration     Notes
remote_auto      Run inference on the ideal datacenter hardware.
remote_cpu       Run inference on AMD CPU servers.
remote_a10       Run inference on an Nvidia A10 GPU.
remote_a100      Run inference on an Nvidia A100 GPU.
remote_h100      Run inference on an Nvidia H100 GPU.
remote_b200      Run inference on an Nvidia B200 GPU.
remote_mi350x    Run inference on an AMD MI350X GPU. Coming soon.
remote_mi355x    Run inference on an AMD MI355X GPU. Coming soon.
remote_qaic100   Run inference on a Qualcomm Cloud AI 100 accelerator. Coming soon.
If you want to self-host the GPU servers in your VPC or on-prem, reach out to us.
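To defer hardware selection to Muna entirely, pass remote_auto from the table above. A minimal sketch (the tag is borrowed from the example above; the client call is commented out so the snippet stands alone):

```typescript
// 🔥 Let Muna pick the ideal datacenter hardware
// Sketch only: the tag is a placeholder from the example above.
const request = {
  tag: "@bytedance/depth-anything-3",
  inputs: {}, // e.g. { image }
  acceleration: "remote_auto",
};
// const prediction = await muna.beta.predictions.remote.create(request);
```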

Running Locally

Use the muna.predictions.create method to run inference locally:
// 🔥 Run inference with the local NPU
const prediction = await muna.predictions.create({
  tag: "@bytedance/depth-anything-3",
  inputs: { image },
  acceleration: "local_npu"
});

Supported Local Processors

Below are the currently supported local processors:
Acceleration   Notes
local_cpu      Use the CPU to accelerate predictions. This is always enabled.
local_gpu      Use the GPU to accelerate predictions.
local_npu      Use the neural processor to accelerate predictions.
Muna currently does not support multi-GPU acceleration. This is planned for the future.
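Because local_cpu is always enabled, it makes a natural fallback when a faster local processor is unavailable. A hypothetical helper sketching that selection (the availability flags are assumptions for illustration; Muna clients may expose processor availability differently, or not at all):

```typescript
// Hypothetical helper: pick the fastest available local processor.
// The `hasNpu` and `hasGpu` flags are assumptions for illustration;
// per the table above, local_cpu is always enabled, so it is the final fallback.
function pickLocalAcceleration(hasNpu: boolean, hasGpu: boolean): string {
  if (hasNpu) return "local_npu";
  if (hasGpu) return "local_gpu";
  return "local_cpu";
}
```

For example, pickLocalAcceleration(false, true) resolves to "local_gpu", while pickLocalAcceleration(false, false) falls back to "local_cpu".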

Specifying the Local GPU

Some Muna clients allow you to specify the acceleration device used to make predictions. Our clients expose this field as an untyped integer or pointer. The underlying type depends on the current operating system:
OS         Device type     Notes
Android    -               Currently unsupported.
iOS        id<MTLDevice>   Metal device.
Linux      int*            Pointer to a CUDA device ID.
macOS      id<MTLDevice>   Metal device.
visionOS   id<MTLDevice>   Metal device.
Web        GPUDevice       WebGPU device.
Windows    ID3D12Device*   DirectX 12 device.
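As a sketch of what passing the hint can look like in TypeScript, the request below carries an optional device field. The field name and shape are assumptions for illustration, not the actual client API; consult your Muna client's documentation for the real field:

```typescript
// Hypothetical request shape for illustration only; the real field name for the
// device hint depends on the Muna client you are using.
interface PredictionRequest {
  tag: string;
  inputs: Record<string, unknown>;
  device?: unknown; // GPUDevice on the web, id<MTLDevice> on Apple platforms, etc.
}

const request: PredictionRequest = {
  tag: "@bytedance/depth-anything-3",
  inputs: {},
  // device: gpuDevice, // e.g. a GPUDevice obtained via navigator.gpu on the web
};
```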
The prediction device is merely a hint: setting a device does not guarantee that any (or all) operations in the prediction function will actually run on that device.
Do not set this field unless you know exactly what you are doing.