Selecting the right LLM hosting strategy
When setting up AI Assistant, one of the key decisions is whether to use hosted models from providers like OpenAI or to host open source models locally. This guide will help you understand the tradeoffs between these approaches.
Key decision factors
Before choosing a hosting strategy, consider these fundamental questions:
- Data privacy — Are you comfortable with your data leaving your infrastructure?
- Cost model — Does pay-per-token pricing align with your usage patterns and budget?
- Operational control — Do you need direct control over your infrastructure and model versions?
Based on your answers, you can choose between two main approaches, outlined below.
Hosted provider models
If you prioritize ease of use and quick setup, hosted providers like OpenAI, Azure OpenAI, or Amazon Bedrock offer:
- Quick setup — You only need an API key to get started (see the sketch after this list).
- No infrastructure management — The provider handles all compute resources and scaling.
- Usage-based costs — You pay per token for what you use, with no upfront infrastructure investment.
- State-of-the-art performance — Access to leading proprietary models like GPT-4.
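For example, with OpenAI the entire setup amounts to exporting an API key and making a call. A minimal sketch using the official openai Python package; the model name is illustrative:

```python
# A minimal sketch using the official openai Python package (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # any chat model available on your plan
    messages=[{"role": "user", "content": "Summarize the tradeoffs of self-hosting LLMs."}],
)
print(response.choices[0].message.content)
```

No servers, drivers, or model weights are involved on your side; the provider runs everything behind the API.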
Self-hosted solutions
If you need more control over your data and infrastructure, you have two options.
Before adopting any open source model, carefully review its license terms (typically found on Hugging Face). Not all open models permit commercial use, and some impose specific restrictions.
Provider-hosted open source models
Services like Anyscale, Together AI, or Hugging Face offer two deployment options.
Shared instances
- Multiple customers share the same GPU infrastructure.
- You pay per token used, similar to OpenAI’s pricing model (see the sketch after this list).
- Response times may vary based on overall platform load.
- Ideal for development, testing, or low-volume production use.
- Lower costs, since you only pay for actual usage.
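Most of these services expose an OpenAI-compatible API, so pointing a pay-per-token workload at a shared open source instance is often just a matter of changing the base URL and model name. A sketch under that assumption; the base URL and model id below follow Together AI’s published conventions and should be treated as placeholders for your provider’s actual values:

```python
from openai import OpenAI

# Placeholder endpoint and model id; substitute your provider's values.
client = OpenAI(
    api_key="YOUR_PROVIDER_API_KEY",
    base_url="https://api.together.xyz/v1",  # provider's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",  # an open source model served on shared GPUs
    messages=[{"role": "user", "content": "Hello from a shared instance."}],
)
print(response.choices[0].message.content)
```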
Private instances
- Dedicated GPU resources exclusively for your workload.
- Fixed hourly cost regardless of usage.
- Consistent performance and response times.
- Better for high-volume production deployments.
- More cost-effective for constant, heavy workloads (see the break-even sketch after this list).
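Whether a private instance is worth it comes down to simple arithmetic: compare the dedicated instance’s hourly rate against what the same traffic would cost per token. A back-of-the-envelope sketch with hypothetical prices:

```python
# Break-even point between shared (per-token) and private (hourly) pricing.
# Both prices are hypothetical placeholders; plug in your provider's real rates.
price_per_1k_tokens = 0.0005  # USD, shared instance
price_per_hour = 4.00         # USD, dedicated GPU instance

break_even_tokens_per_hour = price_per_hour / price_per_1k_tokens * 1_000
print(f"A private instance pays off above {break_even_tokens_per_hour:,.0f} tokens/hour")
# With these placeholder rates: 8,000,000 tokens/hour
```

If your sustained throughput sits well above the break-even point, the fixed hourly cost wins; below it, shared pay-per-token stays cheaper.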
Local self-hosted models
Running models on your own infrastructure provides the following benefits:
- Complete data privacy and control over information flow.
- No external dependencies or vendor lock-in.
- One-time infrastructure cost instead of ongoing API fees.
- Predictable latency for consistent performance.
Key requirements for self-hosting:
- Significant GPU resources for model computation.
- DevOps expertise for deployment and maintenance.
- Infrastructure management overhead for monitoring and scaling.
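Operationally, popular local serving stacks such as Ollama and vLLM expose the same OpenAI-compatible API as the hosted options above, so client code only needs a different base URL. A minimal sketch assuming an Ollama server on its default port; the URL and model name are placeholders for whatever you actually run:

```python
from openai import OpenAI

# Point the client at a locally running, OpenAI-compatible server.
# The URL matches Ollama's default port; vLLM's server defaults to port 8000.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # local servers typically accept any non-empty key
)

response = client.chat.completions.create(
    model="llama3",  # a model pulled onto the local machine
    messages=[{"role": "user", "content": "Ping from inside the firewall."}],
)
print(response.choices[0].message.content)
```

Because nothing leaves your machine, this pattern satisfies strict data privacy requirements while keeping client code identical to the hosted setup.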
Choosing the right approach
Choose hosted providers when you want:
- Quick time to market.
- Minimal infrastructure management.
- Flexibility to scale up/down.
- Access to proprietary models.
Consider self-hosting when you have:
- Strict data privacy requirements.
- Sufficient GPU resources.
- DevOps capabilities.
- Consistent, high workload.
For most users starting with AI Assistant, hosted provider models offer the best balance of capabilities and operational simplicity. Consider self-hosting only when specific requirements around data privacy or cost at scale make it necessary.
Once you’ve decided on a hosting strategy, see our guide on configuring AI providers for detailed, provider-specific setup instructions.