On-Premise AI scales flexibly and incrementally: you can scale horizontally (adding more servers) or vertically (upgrading to more powerful hardware). Kubernetes clusters then distribute and scale workloads automatically across those servers.
Scaling strategies
Horizontal scaling (more servers)
Advantages:
- Servers can be added incrementally
- No replacement of existing hardware required
- Flexible expansion
Example scaling path:
- Start: 1× SMALL
- Growth: +1× MEDIUM
- Further growth: +1× LARGE
Important:
- With consumer hardware, the weakest component (for example the slowest server or GPU) can limit overall cluster performance
- All servers must meet the minimum hardware requirements
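The note about the weakest component can be made concrete with a little arithmetic. The sketch below assumes requests are spread over the servers either in equal shares (plain round-robin) or in proportion to each server's capacity (weighted load balancing). The throughput figures are hypothetical and are not measured values for the SMALL, MEDIUM and LARGE tiers.

```python
# Minimal sketch: aggregate throughput of a mixed cluster under two
# load-balancing assumptions. Capacities are hypothetical requests/second.

def equal_share_capacity(capacities: list[float]) -> float:
    """Every server receives the same share of requests (plain round-robin).
    The slowest server saturates first, so the cluster is capped at
    n * min(capacities)."""
    return len(capacities) * min(capacities)


def weighted_share_capacity(capacities: list[float]) -> float:
    """Each server receives a share proportional to its capacity
    (weighted load balancing), so all servers saturate together."""
    return sum(capacities)


if __name__ == "__main__":
    cluster = [10.0, 25.0, 60.0]  # hypothetical SMALL, MEDIUM, LARGE
    print(equal_share_capacity(cluster))     # 30.0
    print(weighted_share_capacity(cluster))  # 95.0
```

With equal shares, the LARGE server sits partly idle while the SMALL one is already saturated. Capacity-aware (weighted) load balancing avoids this, which is why mixed clusters need more than naive round-robin distribution.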
Vertical scaling (more powerful hardware)
Advantages:
- Higher performance per server
- Less administrative overhead, since fewer machines need to be managed
Hybrid approach
A hybrid setup combines:
- Base servers for standard workloads
- Powerful servers for critical applications
- Cluster management with Kubernetes
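How a hybrid setup might route requests can be sketched as follows. The pool names, capacities and the rule that critical requests go to the powerful pool are illustrative assumptions, not a prescribed configuration; within a pool, the least-utilised server is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    pool: str        # hypothetical pool label: "base" or "powerful"
    capacity: int    # hypothetical max concurrent requests
    active: int = 0  # requests currently being processed


def pick_server(servers: list[Server], critical: bool) -> Optional[Server]:
    """Route critical requests to the powerful pool and standard requests
    to the base pool; inside the pool, pick the least-utilised server.
    Returns None if every server in the pool is at capacity."""
    pool = "powerful" if critical else "base"
    candidates = [s for s in servers if s.pool == pool and s.active < s.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active / s.capacity)


if __name__ == "__main__":
    servers = [
        Server("base-1", "base", capacity=4, active=3),
        Server("base-2", "base", capacity=4, active=1),
        Server("gpu-1", "powerful", capacity=8, active=2),
    ]
    print(pick_server(servers, critical=False).name)  # base-2
    print(pick_server(servers, critical=True).name)   # gpu-1
```

In a Kubernetes cluster the same split is usually expressed declaratively, for example by labelling the powerful nodes and giving critical workloads a matching nodeSelector, rather than in application code.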
Kubernetes cluster
Automatic scaling
For larger setups:
- Automatic scaling of workloads (pods) across the available servers
- Load balancing for optimal resource use
- Self-healing on failures
- Central management of multiple servers
Advantages:
- Automatic scaling when needed
- Optimal resource use
- High availability
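Kubernetes' Horizontal Pod Autoscaler, as documented, computes the target replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) and skips scaling while that ratio stays within a tolerance (0.1 by default). The sketch below reproduces this rule; the metric (here average requests per second per pod) and the min/max bounds are hypothetical example values. Note that this scales pods on existing servers; physical capacity still has to be added by hand.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Replica count following the documented HPA rule:
    ceil(current_replicas * current_metric / target_metric),
    unchanged while the ratio is within `tolerance` of 1.0,
    then clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical metric: average requests per second per pod, target 60.
    print(desired_replicas(2, 90, 60))   # 3  (load 50% above target: scale out)
    print(desired_replicas(4, 20, 60))   # 2  (load well below target: scale in)
    print(desired_replicas(3, 63, 60))   # 3  (within tolerance: no change)
    print(desired_replicas(8, 120, 60))  # 10 (capped at max_replicas)
```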
Cluster management
Functions:
- Central management of multiple servers
- Automatic load balancing
- Rolling updates without downtime
- Self-healing on failures
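Why rolling updates can avoid downtime is easiest to see in a simplified model. A Kubernetes Deployment replaces pods gradually, bounded by maxSurge (extra pods allowed during the update) and maxUnavailable (pods allowed to be missing), both 25% by default; the documentation rounds maxSurge up and maxUnavailable down when converting percentages. The sketch assumes new pods are ready instantly, whereas the real controller waits for readiness checks.

```python
import math


def rollout_steps(replicas: int, max_surge_pct: int = 25,
                  max_unavailable_pct: int = 25) -> list[tuple[int, int]]:
    """Simplified rolling update. Returns (old_pods, new_pods) after each
    step. Assumes new pods become ready immediately."""
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")

    old, new, steps = replicas, 0, []
    while old > 0 or new < replicas:
        # Start new pods within the surge budget (total <= replicas + surge).
        new += min(replicas + surge - (old + new), replicas - new)
        # Stop old pods while keeping >= replicas - unavailable pods serving.
        old -= min(old, old + new - (replicas - unavailable))
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rollout_steps(4):
        print(f"old={old} new={new} serving={old + new}")
    # old=2 new=1 serving=3
    # old=0 new=3 serving=3
    # old=0 new=4 serving=4
```

With 3 replicas, 25% of maxUnavailable rounds down to 0, so in this model all 3 pods keep serving throughout the update.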
Scaling without data loss
Modular architecture
Advantages:
- Servers can be added without changing existing configurations
- Models remain available on all servers
- Data can be managed centrally
- No data migration required
Docker Compose to Kubernetes
Migration path:
- Start with Docker Compose (simple)
- Gradually move to Kubernetes (when needed)
- The same container images run under both, so the migration can happen step by step
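To illustrate what the move from Docker Compose to Kubernetes involves, here is a minimal sketch that translates one simplified Compose service into a Kubernetes apps/v1 Deployment manifest. It handles only image, short-syntax ports and mapping-style environment; the service name, image and variable are hypothetical. In practice, the Kompose tool automates this kind of conversion.

```python
import json


def compose_service_to_deployment(name: str, service: dict, replicas: int = 1) -> dict:
    """Translate a simplified Docker Compose service into a Kubernetes
    apps/v1 Deployment. Handles only `image`, short-syntax `ports` and
    mapping-style `environment`; volumes, networks etc. are ignored."""
    labels = {"app": name}
    container: dict = {"name": name, "image": service["image"]}

    ports = []
    for mapping in service.get("ports", []):
        # "HOST:CONTAINER" or "HOST:CONTAINER/tcp": keep the container port.
        container_port = str(mapping).split(":")[-1].split("/")[0]
        ports.append({"containerPort": int(container_port)})
    if ports:
        container["ports"] = ports

    env = service.get("environment", {})
    if env:
        container["env"] = [{"name": k, "value": str(v)} for k, v in env.items()]

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service as it might appear in docker-compose.yml.
    service = {
        "image": "registry.example.com/llm-server:1.0",
        "ports": ["8000:8000"],
        "environment": {"MODEL_NAME": "example-model"},
    }
    print(json.dumps(compose_service_to_deployment("llm-server", service), indent=2))
```

kubectl accepts JSON manifests as well as YAML. A Deployment alone does not expose the port outside the pods; a Service object would be needed for that.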
Costs when scaling
Predictable costs
On-Premise:
- Additional hardware only when needed
- Costs do not depend on usage volume
- Predictable costs
Cloud:
- Each additional user increases token costs
- Costs are harder to predict because they grow with usage
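The difference between fixed and usage-based costs can be sketched as a break-even calculation. All figures below (hardware price, monthly running costs, tokens per user, price per million tokens) are hypothetical, and the model assumes the existing hardware can serve the extra users; once it cannot, adding hardware introduces a further one-off cost.

```python
import math
from typing import Optional


def monthly_cloud_cost(users: int, tokens_per_user: int, price_per_million: float) -> float:
    """Usage-based cloud cost: grows linearly with users and tokens."""
    return users * tokens_per_user / 1_000_000 * price_per_million


def break_even_month(hardware_cost: float, onprem_monthly: float,
                     cloud_monthly: float) -> Optional[int]:
    """First month in which cumulative cloud spend exceeds cumulative
    on-premise spend (hardware up front + fixed running costs).
    Returns None if the cloud is never more expensive."""
    saving = cloud_monthly - onprem_monthly
    if saving <= 0:
        return None
    return math.floor(hardware_cost / saving) + 1


if __name__ == "__main__":
    # All figures hypothetical, in one currency unit.
    for users in (50, 100):
        cloud = monthly_cloud_cost(users, tokens_per_user=2_000_000, price_per_million=5.0)
        print(users, cloud, break_even_month(12_000, 150, cloud))
    # 50 500.0 35
    # 100 1000.0 15
```

Doubling the number of users doubles the cloud bill in this model, while the on-premise costs stay flat, so the break-even point moves much closer.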
Next steps
Would you like to know more about scaling?
- Contact us – Get advice on scaling options
Sources and further information:
- Start small and scale up – Detailed scaling strategies
- On-Premise AI for SMEs – Scaling and clusters