On-Premise AI scales incrementally: you can scale horizontally (add more servers) or vertically (use more powerful hardware). For larger setups, a Kubernetes cluster can scale workloads automatically.

Scaling strategies

Horizontal scaling (more servers)

Advantages:

  • Servers can be added incrementally
  • No replacement of existing hardware required
  • Different server sizes can be combined as demand grows

Example scaling:

  • Start: 1× SMALL
  • Growth: +1× MEDIUM
  • Further growth: +1× LARGE

Important:

  • With consumer hardware, the weakest component determines overall performance
  • All servers must meet minimum requirements
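The minimum-requirements rule above can be checked before a server joins the pool. The following is a minimal sketch: the `ServerSpec` fields, the server names, and the threshold values are assumptions chosen for illustration, not actual product requirements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical description of one server; the field names are illustrative."""
    name: str
    ram_gb: int
    vram_gb: int
    cpu_cores: int


# Assumed minimum requirements; replace with the values that apply to your setup.
MINIMUM = ServerSpec(name="minimum", ram_gb=32, vram_gb=16, cpu_cores=8)


def shortfalls(server: ServerSpec, minimum: ServerSpec = MINIMUM) -> list[str]:
    """Return the resources for which the server is below the minimum."""
    fields = ("ram_gb", "vram_gb", "cpu_cores")
    return [f for f in fields if getattr(server, f) < getattr(minimum, f)]


def admissible(servers: list[ServerSpec]) -> dict[str, list[str]]:
    """Map each server name to its shortfalls; an empty list means it qualifies."""
    return {s.name: shortfalls(s) for s in servers}


if __name__ == "__main__":
    pool = [
        ServerSpec("small-1", ram_gb=32, vram_gb=16, cpu_cores=8),
        ServerSpec("old-box", ram_gb=16, vram_gb=24, cpu_cores=4),
    ]
    for name, missing in admissible(pool).items():
        status = "OK" if not missing else f"below minimum: {missing}"
        print(name, status)
```

Reporting every shortfall, rather than stopping at the first one, makes it easier to see which component would become the weakest link.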

Vertical scaling (more powerful hardware)

Advantages:

  • Higher performance per server
  • Less administrative overhead, since there are fewer machines to manage

Hybrid approach

Combination:

  • Base servers for standard workloads
  • Powerful servers for critical applications
  • Cluster management with Kubernetes

Kubernetes cluster

Automatic scaling

For larger setups, Kubernetes provides:

  • Automatic scaling of workloads as load changes
  • Load balancing across servers for better resource use

Advantages:

  • Capacity follows demand without manual intervention
  • Hardware is used more evenly
  • High availability
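To make "automatic scaling" more concrete: the Kubernetes documentation describes the core rule of the Horizontal Pod Autoscaler as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces only that rule. The metric values and replica limits are example assumptions, and the real autoscaler additionally handles missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified version of the Horizontal Pod Autoscaler scaling rule."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Small deviations from the target are ignored to avoid constant rescaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # The result is always kept within the configured replica limits.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Example: 2 replicas at 90 % average CPU with a 60 % target.
    print(desired_replicas(2, current_metric=90, target_metric=60))
```

On an on-premise cluster the upper replica limit is bounded by the installed hardware, so automatic scaling distributes existing capacity rather than creating new capacity.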

Cluster management

Functions:

  • Central management of multiple servers
  • Automatic load balancing
  • Rolling updates without downtime
  • Self-healing on failures

Scaling without data loss

Modular architecture

Advantages:

  • Servers can be added without changing existing configurations
  • Models remain available on all servers
  • Data can be managed centrally
  • No data migration required

Docker Compose to Kubernetes

Migration path:

  • Start with Docker Compose (simple)
  • Gradually move to Kubernetes (when needed)
  • Container images carry over unchanged, so services can be migrated one at a time
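The migration path above can be illustrated by translating one Compose-style service definition into a Kubernetes Deployment manifest. This is a minimal sketch, not a full converter: the service name, the image `example/ai-server:1.0`, and the environment variable are hypothetical, the `environment` section is assumed to use the mapping form, and volumes, networks, and health checks are ignored.

```python
import json


def compose_service_to_deployment(name: str, service: dict, replicas: int = 1) -> dict:
    """Translate one Compose-style service dict into a Deployment manifest dict."""
    labels = {"app": name}
    container = {"name": name, "image": service["image"]}

    # Compose ports look like "host:container" (optionally with "/tcp");
    # only the container-side port is carried over here.
    ports = [int(str(p).split(":")[-1].split("/")[0])
             for p in service.get("ports", [])]
    if ports:
        container["ports"] = [{"containerPort": p} for p in ports]

    # Assumes the mapping form of "environment"; values become strings.
    env = service.get("environment", {})
    if env:
        container["env"] = [{"name": k, "value": str(v)} for k, v in env.items()]

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service, as it might appear under "services:" in a Compose file.
    service = {
        "image": "example/ai-server:1.0",
        "ports": ["8080:8000"],
        "environment": {"WORKERS": 2},
    }
    manifest = compose_service_to_deployment("ai-server", service, replicas=2)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts manifests in JSON as well as YAML. The container image itself stays the same, which is why the move can happen service by service.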

Costs when scaling

Predictable costs

On-Premise:

  • Additional hardware only when needed
  • Costs do not depend on usage
  • Predictable costs

Cloud:

  • Each additional user generates additional token costs
  • Costs vary with usage and are harder to predict
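The difference between the two cost models can be shown with a small calculation. Every figure in this sketch (hardware price, amortization period, operating cost, token volume, token price) is a made-up assumption for illustration only. Substitute your own numbers before drawing conclusions.

```python
import math


def on_premise_monthly_cost(hardware_cost: float,
                            amortization_months: int,
                            operating_cost_per_month: float) -> float:
    """Fixed monthly cost: amortized hardware plus operation, independent of users."""
    return hardware_cost / amortization_months + operating_cost_per_month


def cloud_monthly_cost(users: int,
                       tokens_per_user_per_month: float,
                       price_per_million_tokens: float) -> float:
    """Usage-based monthly cost: grows linearly with the number of users."""
    return users * tokens_per_user_per_month / 1_000_000 * price_per_million_tokens


def break_even_users(on_prem_cost: float,
                     tokens_per_user_per_month: float,
                     price_per_million_tokens: float) -> int:
    """Smallest user count at which cloud cost reaches the on-premise cost."""
    per_user = tokens_per_user_per_month / 1_000_000 * price_per_million_tokens
    return math.ceil(on_prem_cost / per_user)


if __name__ == "__main__":
    # All values below are hypothetical.
    on_prem = on_premise_monthly_cost(hardware_cost=10_800,
                                      amortization_months=36,
                                      operating_cost_per_month=200)
    for users in (5, 10, 20):
        cloud = cloud_monthly_cost(users,
                                   tokens_per_user_per_month=5_000_000,
                                   price_per_million_tokens=10)
        print(f"{users} users: on-premise {on_prem:.2f}, cloud {cloud:.2f}")
    print("break-even users:", break_even_users(on_prem, 5_000_000, 10))
```

The on-premise figure stays flat until additional hardware is bought, at which point it rises by one step. The cloud figure changes with every additional user.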
