One advantage of remote predictions is access to orders of magnitude more compute than your local device provides. Muna supports specifying a `RemoteAcceleration` when creating remote predictions:
```js
// Create a remote prediction on an Nvidia A100 GPU
const prediction = await muna.beta.predictions.remote.create({
  tag: "@meta/llama-3.1-70b",
  inputs: { ... },
  acceleration: "remote_a100"
});
```
Below are the currently supported `RemoteAcceleration` values:

| Remote Acceleration | Notes |
| --- | --- |
| `remote_auto` | Automatically use the ideal remote acceleration. |
| `remote_cpu` | Predictions are run on AMD CPU servers. |
| `remote_a40` | Predictions are run on an Nvidia A40 GPU. |
| `remote_a100` | Predictions are run on an Nvidia A100 GPU. |
Remote predictions are priced based on the remote acceleration, per second of prediction time (i.e. `prediction.latency`). See our pricing page for more information.
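To make the billing model concrete, here is a minimal sketch of how a per-second cost estimate could be computed from a prediction's latency. The rates below are hypothetical placeholders (the pricing page has the actual figures), and this sketch assumes `prediction.latency` is reported in milliseconds:

```typescript
// Hypothetical per-second rates in USD, keyed by remote acceleration.
// These numbers are illustrative only; consult the pricing page for real rates.
const RATES_USD_PER_SECOND: Record<string, number> = {
  remote_cpu: 0.0001,
  remote_a40: 0.001,
  remote_a100: 0.002,
};

// Estimate the cost of a single prediction from its latency (milliseconds)
// and the remote acceleration it ran on.
function estimateCostUsd(latencyMs: number, acceleration: string): number {
  const rate = RATES_USD_PER_SECOND[acceleration];
  if (rate === undefined) {
    throw new Error(`No rate known for acceleration: ${acceleration}`);
  }
  // Convert latency to seconds, then multiply by the per-second rate.
  return (latencyMs / 1000) * rate;
}
```

For example, a prediction that took 2 seconds on `remote_a100` would be billed for 2 seconds at the A100 rate.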
If you want to self-host the remote acceleration servers in your VPC or on-premises, reach out to us.