OpenAI API-compatible provider

Information

While we provide these recommendations as a starting point, finding the optimal settings for OpenAI-compatible models often requires trial and error. Every use case is unique, and you may need to experiment with different configurations to achieve the best results for your specific needs. If you’re working on a custom implementation or need help fine-tuning these settings, our Support team is here to help — just reach out and we’ll guide you through the process.

For complete data security, you can self-host models on your own infrastructure. While we generally recommend one of the cloud model providers for cost-effectiveness and ease of use, you can host your own models and connect them to AI Assistant instead.

While this guide focuses on connecting inference frameworks (model hosting) to AI Assistant, self-hosting setups require specialized AI/machine learning infrastructure knowledge and experience. Refer to the vLLM, Hugging Face TGI, or Ollama documentation for in-depth guidance on setting up these environments.
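As a quick sanity check before connecting anything to AI Assistant, you can confirm that your self-hosted framework really exposes an OpenAI-compatible endpoint. The sketch below is only an illustration: it assumes the openai Python package and Ollama's default endpoint (http://localhost:11434/v1), so substitute the base URL and API key of your own deployment:

# Sanity check: confirm the inference framework exposes an OpenAI-compatible API.
# The base URL below is Ollama's default; adjust it for vLLM, Hugging Face TGI, or your own setup.
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',   # your framework's OpenAI-compatible endpoint
    api_key='not-needed-for-local-ollama',  # many local frameworks ignore the key, but the client requires a value
)

# /v1/models lists whatever the framework is currently serving.
for model in client.models.list():
    print(model.id)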

Service configuration file

To connect an OpenAI API-compatible provider to AI Assistant, you’ll need to create a service configuration file, as explained in the model-provider configuration guide.

Currently, we suggest using the Llama 3.1 70B Instruct model. If this model isn't available in your region, you can use Mistral AI's Mistral Large instead.

For an embedding model, we suggest using the Mixedbread large v1 model or the Nomic v1.5 model:

version: '1'

aiServices:
   chat:
      provider:
         name: 'openai-compat'
         baseUrl: 'your-inference-framework-url'
         apiKey: 'your-api-key' # Optional
      model: 'llama3.1:70b'
   textEmbeddings:
      provider:
         name: 'openai-compat'
         baseUrl: 'your-inference-framework-url'
         apiKey: 'your-api-key' # Optional
      model: 'mxbai-embed-large'

  • provider:

    • name: The name of the provider. Set this to openai-compat.

    • baseUrl: The base URL of your inference framework. For example, Ollama hosts its OpenAI API-compatible endpoint at http://localhost:11434/v1.

    • apiKey: An optional API key if your inference framework requires it. You can set it here or as an environment variable (OPENAI_COMPAT_API_KEY).

  • model: The name of the model you want to use. For example, llama3.1:70b for the chat service, or mxbai-embed-large for the embedding service; the sketch below shows a quick way to exercise these values.
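Before pointing AI Assistant at the provider, it can be useful to exercise the same baseUrl, apiKey, and model values with a short script. This is a hedged sketch, not part of AI Assistant itself: it assumes the openai Python package, the Ollama endpoint, and the example model names from the configuration above, and it reuses the OPENAI_COMPAT_API_KEY environment variable purely for convenience:

import os

from openai import OpenAI

# Sketch: exercise the chat and embedding models referenced in the configuration above.
client = OpenAI(
    base_url='http://localhost:11434/v1',                        # same value as baseUrl
    api_key=os.environ.get('OPENAI_COMPAT_API_KEY', 'unused'),   # same value as apiKey, if your framework needs one
)

# Chat model used by the 'chat' service.
chat = client.chat.completions.create(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Reply with the single word: ok'}],
)
print(chat.choices[0].message.content)

# Embedding model used by the 'textEmbeddings' service.
embedding = client.embeddings.create(model='mxbai-embed-large', input='Hello, world')
print(len(embedding.data[0].embedding))

If both calls succeed, the same values should work in the service configuration file.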