POST /cloud/v3/inference/{project_id}/deployments
Python
import os
from gcore import Gcore

client = Gcore(
    api_key=os.environ.get("GCORE_API_KEY"),  # This is the default and can be omitted
)
task_id_list = client.cloud.inference.deployments.create(
    project_id=1,
    containers=[{
        "region_id": 1,
        "scale": {
            "max": 3,
            "min": 1,
        },
    }],
    flavor_name="inference-16vcpu-232gib-1xh100-80gb",
    image="nginx:latest",
    listening_port=80,
    name="my-instance",
)
print(task_id_list.tasks)
{
  "tasks": [
    "d478ae29-dedc-4869-82f0-96104425f565"
  ]
}

Authorizations

Authorization
string
header
required

API key for authentication. Make sure to include the word apikey, followed by a single space and then your token. Example: apikey 1234$abcdef
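The header value is just the literal word apikey, one space, then the token; a minimal sketch assembling it in Python (the token below is the doc's example value, not a real key):

```python
# Assemble the required Authorization header: the literal word "apikey",
# a single space, then your token. "1234$abcdef" is the doc's example token.
token = "1234$abcdef"
headers = {"Authorization": f"apikey {token}"}
print(headers["Authorization"])  # apikey 1234$abcdef
```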

Path Parameters

project_id
integer
required

Project ID

Examples:

1

Body

application/json
containers
ContainerInSerializerV3 · object[]
required

List of containers for the inference instance.

Minimum length: 1
Examples:
[
  {
    "region_id": 1,
    "scale": {
      "cooldown_period": 60,
      "max": 3,
      "min": 1,
      "triggers": {
        "cpu": { "threshold": 80 },
        "memory": { "threshold": 70 }
      }
    }
  }
]
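The scale block can be sanity-checked client-side before sending; a minimal sketch (field names mirror the example payload above; the validation logic itself is illustrative, not part of the Gcore SDK):

```python
# Illustrative client-side check of a containers entry before calling the
# API. Field names come from the documented example; the checks are an
# assumption about sensible values, not an official validation routine.
container = {
    "region_id": 1,
    "scale": {
        "cooldown_period": 60,   # seconds between scaling actions
        "max": 3,
        "min": 1,
        "triggers": {
            "cpu": {"threshold": 80},     # scale up above 80% CPU
            "memory": {"threshold": 70},  # scale up above 70% memory
        },
    },
}

scale = container["scale"]
assert 0 <= scale["min"] <= scale["max"], "min must not exceed max"
assert all(0 < t["threshold"] <= 100 for t in scale["triggers"].values())
print("containers payload looks valid")
```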
flavor_name
string
required

Flavor name for the inference instance.

Minimum length: 1
Examples:

"inference-16vcpu-232gib-1xh100-80gb"

image
string
required

Docker image for the inference instance. This field should contain the image name and tag in the format 'name:tag', e.g., 'nginx:latest'. It defaults to Docker Hub as the image registry, but any accessible Docker image URL can be specified.

Examples:

"nginx:latest"

listening_port
integer
required

Listening port for the inference instance.

Required range: 1 <= x <= 65535
Examples:

80

name
string
required

Inference instance name.

Required string length: 4 - 30
Examples:

"my-instance"

auth_enabled
boolean
default:false

Set to true to enable API key authentication for the inference instance. If enabled, requests to the instance must include an "Authorization": "Bearer *****" or "X-Api-Key": "*****" header.

Examples:

false
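When auth_enabled is true, requests to the deployed instance must carry one of the two documented headers; a minimal sketch building either form (the key value is a placeholder):

```python
# Either header form satisfies instance-level authentication when
# auth_enabled is true. "****" is a placeholder for the instance API key.
api_key = "****"
bearer_headers = {"Authorization": f"Bearer {api_key}"}
x_api_key_headers = {"X-Api-Key": api_key}
print(bearer_headers, x_api_key_headers)
```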

command
string[] | null

Command to be executed when running a container from an image.

Examples:
["nginx", "-g", "daemon off;"]
credentials_name
string | null
default:""

Registry credentials name

Examples:

"dockerhub"

description
string | null
default:""

Inference instance description.

Examples:

"My first instance"

envs
object

Environment variables for the inference instance.

Examples:
{ "DEBUG_MODE": "False", "KEY": "12345" }
ingress_opts
object | null

Ingress options for the inference instance

Examples:
{ "disable_response_buffering": true }
logging
object | null

Logging configuration for the inference instance

Examples:
{
  "destination_region_id": 1,
  "enabled": true,
  "retention_policy": { "period": 42 },
  "topic_name": "my-log-name"
}
{ "enabled": false }
probes
object | null

Probes configured for all containers of the inference instance. If probes are not provided and the image is from the Model Catalog registry, the default probes will be used.

timeout
integer | null
default:120

Duration in seconds without any requests after which the containers are downscaled to the minimum scale value defined by scale.min. This helps optimize resource usage by reducing the number of container instances during periods of inactivity. Defaults to 120 when not set.

Required range: x >= 0
Examples:

120
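The interaction between timeout and scale.min can be sketched in a few lines (scaling min down to 0 is an illustrative assumption, not a recommendation):

```python
# After `timeout` seconds with no incoming requests, replica count drops
# to scale["min"]. With min=0 the instance scales to zero while idle
# (illustrative assumption; pick min to match your latency needs).
timeout = 120                 # seconds of inactivity before downscaling
scale = {"min": 0, "max": 3}
idle_replicas = scale["min"]
print(f"after {timeout}s idle -> {idle_replicas} replica(s)")
```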

Response

200 - application/json

OK

tasks
string[]
required

List of task IDs

Examples:
["d478ae29-dedc-4869-82f0-96104425f565"]