How does the On-Premise AI solution scale?

On-Premise AI scales flexibly and incrementally: you can scale horizontally (add more servers) or vertically (use more powerful hardware). In a Kubernetes cluster, workloads can also be scaled automatically across the available servers.

Scaling strategies

Horizontal scaling (more servers)

Advantages:

Servers can be added incrementally
No replacement of existing hardware required
Flexible expansion

Example scaling:

Start: 1× SMALL
Growth: +1× MEDIUM
Further growth: +1× LARGE

Important:

With consumer hardware, the weakest component determines overall performance
All servers must meet minimum requirements
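
The incremental growth path and the minimum-requirements rule above can be sketched in a few lines of Python. This is only an illustration: the `Server` fields, the threshold values, and the node names are hypothetical and do not come from the product specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    tier: str     # "SMALL", "MEDIUM" or "LARGE", as in the example above
    ram_gb: int   # hypothetical spec
    vram_gb: int  # hypothetical spec


# Hypothetical minimum requirements; real values depend on the models you run.
MIN_RAM_GB = 32
MIN_VRAM_GB = 16


def meets_minimum(server: Server) -> bool:
    """Check a single server against the assumed minimum requirements."""
    return server.ram_gb >= MIN_RAM_GB and server.vram_gb >= MIN_VRAM_GB


def add_server(cluster: list[Server], new: Server) -> list[Server]:
    """Return a new cluster list with the server appended, or raise if it is too weak."""
    if not meets_minimum(new):
        raise ValueError(f"{new.name} is below the minimum requirements")
    return cluster + [new]


# Start small, then grow without replacing existing hardware.
cluster: list[Server] = []
cluster = add_server(cluster, Server("node-1", "SMALL", ram_gb=32, vram_gb=16))
cluster = add_server(cluster, Server("node-2", "MEDIUM", ram_gb=64, vram_gb=24))
cluster = add_server(cluster, Server("node-3", "LARGE", ram_gb=128, vram_gb=48))
print([s.tier for s in cluster])
```

Existing servers are never modified when a new one is added, which mirrors the point that no hardware has to be replaced.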

Vertical scaling (more powerful hardware)

Advantages:

Higher performance per server
Less administrative overhead
Simpler management

Hybrid approach

Combination:

Base servers for standard workloads
Powerful servers for critical applications
Cluster management with Kubernetes

Kubernetes cluster

Automatic scaling

For larger setups:

Automatic scaling of workloads with Kubernetes
Load balancing for efficient resource use
Self-healing on failures
Central management of multiple servers

Advantages:

Capacity adjusts automatically to demand
Efficient resource use
High availability
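
To make "automatic scaling" more concrete, here is a minimal Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). Whether this solution relies on that particular autoscaler is an assumption, and the function name, limits, and sample values are hypothetical. Note that this scales the number of running service instances, not the number of physical servers.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Sketch of the documented autoscaler formula with a tolerance band and clamping."""
    ratio = current_metric / target_metric
    # Inside the tolerance band no scaling happens, which avoids constant flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum replica count.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical example: 2 instances at 90 % average load, target is 60 %.
print(desired_replicas(current_replicas=2, current_metric=90, target_metric=60))
```

In a real cluster the upper limit is bounded by the hardware that is actually installed, so automatic scaling complements adding servers rather than replacing it.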

Cluster management

Functions:

Central management of multiple servers
Automatic load balancing
Rolling updates without downtime
Self-healing on failures
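
The interplay of load balancing and failure handling can be illustrated with a toy model in Python. This is not how Kubernetes implements it internally; the class and method names are hypothetical, and health changes are set by hand here instead of by real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips servers marked as unhealthy."""

    def __init__(self, servers: list[str]) -> None:
        self.healthy = {name: True for name in servers}
        self._order = cycle(servers)
        self._count = len(servers)

    def mark(self, name: str, healthy: bool) -> None:
        """Record the result of a (simulated) health check."""
        self.healthy[name] = healthy

    def next_server(self) -> str:
        """Return the next healthy server, trying each server at most once."""
        for _ in range(self._count):
            candidate = next(self._order)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy server available")


balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
balancer.mark("node-2", False)  # simulate a failure
print([balancer.next_server() for _ in range(4)])
balancer.mark("node-2", True)   # server is back after recovery
```

Requests continue to be served by the remaining servers while one is down, which is the basic idea behind high availability in a cluster.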

Scaling without data loss

Modular architecture

Advantages:

Servers can be added without changing existing configurations
Models remain available on all servers
Data can be managed centrally
No data migration required

Docker Compose to Kubernetes

Migration path:

Start with Docker Compose (simple)
Gradually move to Kubernetes (when needed)
Migration can proceed service by service, because both environments run the same container images
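
As a rough illustration of this migration path, the following Python sketch maps a single Compose-style service definition to a Kubernetes Deployment manifest. It assumes that ports use the short `HOST:CONTAINER` syntax and that environment variables are given as a mapping; the service name and image are hypothetical. A real migration also has to cover volumes, networks, and secrets, which are left out here.

```python
import json


def compose_service_to_deployment(name: str, service: dict, replicas: int = 1) -> dict:
    """Build a minimal apps/v1 Deployment from a Compose-style service dict."""
    labels = {"app": name}
    container = {"name": name, "image": service["image"]}

    # Compose short syntax "HOST:CONTAINER": only the container port is relevant here.
    ports = [
        {"containerPort": int(str(mapping).split(":")[-1])}
        for mapping in service.get("ports", [])
    ]
    if ports:
        container["ports"] = ports

    # Assumes the mapping form of "environment" (not the list form).
    env = service.get("environment", {})
    if env:
        container["env"] = [{"name": k, "value": str(v)} for k, v in env.items()]

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# Hypothetical Compose service; the image name is a placeholder.
compose_service = {
    "image": "example/inference:1.0",
    "ports": ["8080:8000"],
    "environment": {"MODEL_DIR": "/models"},
}
manifest = compose_service_to_deployment("inference", compose_service, replicas=2)
print(json.dumps(manifest, indent=2))
```

The container image stays the same in both worlds; only the description of how it is run changes.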

Costs when scaling

Predictable costs

On-Premise:

Additional hardware is purchased only when needed
Costs do not depend on usage volume
Costs remain predictable

Cloud:

Each additional user generates additional token costs
Costs grow with usage and are harder to predict
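
The difference between the two cost models can be expressed as simple arithmetic. The Python sketch below uses entirely hypothetical figures (hardware price, amortization period, operating costs, token volume, and token price) and only shows the shape of the two cost curves, not real prices.

```python
def on_premise_monthly_cost(
    hardware_cost: float,
    amortization_months: int,
    operating_cost_per_month: float,
) -> float:
    """Fixed monthly cost: amortized hardware plus operations, independent of user count."""
    return hardware_cost / amortization_months + operating_cost_per_month


def cloud_monthly_cost(
    users: int,
    tokens_per_user_per_month: int,
    price_per_million_tokens: float,
) -> float:
    """Usage-based monthly cost: grows linearly with users and token volume."""
    return users * tokens_per_user_per_month * price_per_million_tokens / 1_000_000


# All numbers below are hypothetical placeholders.
fixed = on_premise_monthly_cost(
    hardware_cost=12_000, amortization_months=36, operating_cost_per_month=150
)
for users in (10, 50, 200):
    variable = cloud_monthly_cost(
        users, tokens_per_user_per_month=2_000_000, price_per_million_tokens=10.0
    )
    print(f"{users:>4} users: on-premise {fixed:8.2f} vs. cloud {variable:8.2f}")
```

The on-premise figure only changes when hardware is added, while the cloud figure moves with every additional user.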

Next steps

Would you like to know more about scaling?

Contact us – Get advice on scaling options

Sources and further information:

Start small and scale up – Detailed scaling strategies
On-Premise AI for SMEs – Scaling and clusters
