HuggingFace LLM

Warning

Deploying HuggingFace LLM requires an existing Kubernetes cluster with a GPU-enabled node group.

Introduction

A generative AI chatbot service backed by a HuggingFace model, exposed via a convenient web interface.

Launch configuration

To get started, in the Platforms tab, press the New Platform button, and select HuggingFace LLM.

You will then be presented with launch configuration options to fill in:

| Option | Explanation |
| --- | --- |
| Platform name | A name to identify the HuggingFace LLM platform. |
| Kubernetes cluster | The Kubernetes platform on which to deploy HuggingFace LLM. If one hasn't already been created, check out the Kubernetes Overview. |
| App version | The version of the HuggingFace LLM Azimuth Application to use. |
| Model | The model to deploy from HuggingFace. vLLM is used for model serving, so any of its supported models should work. |
| Access Token | A HuggingFace access token (https://huggingface.co/docs/hub/security-tokens), required for some gated models. A sketch for checking token access follows this table. |
| Instruction | The initial system prompt, hidden from the user, which is used when generating responses. |
| Page Title | The title displayed at the top of the chat interface. |
| Backend vLLM Version | The version of vLLM to use, selected from the available versions. |
| LLM Sampling Parameters | Sampling options such as temperature and frequency penalty; see the vLLM documentation. The sketch after this table shows how these map onto a request. |
| Max Tokens | The maximum number of tokens to generate per response. Use this to moderate compute cost. |
| Model Context Length | An override for the model's maximum context length. |
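
Because gated models reject requests from tokens that have not been granted access, it can save a failed deployment to verify the token before launching. Below is a minimal sketch using the huggingface_hub Python library; the model ID and token value are placeholders, and this check is an optional pre-flight step rather than part of the platform itself.

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

# Placeholder token: substitute the access token you plan to supply at launch.
api = HfApi(token="hf_...")

try:
    # Placeholder model ID: substitute the gated model you intend to deploy.
    info = api.model_info("meta-llama/Meta-Llama-3-8B-Instruct")
    print(f"Token can read {info.id}")
except GatedRepoError:
    print("Token is valid but has not been granted access to this gated model.")
except RepositoryNotFoundError:
    print("Model not found, or the token cannot see it.")
```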
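
vLLM serves models through an OpenAI-compatible API, so the sampling options above correspond to standard chat-completion request fields. The sketch below assumes the backend is reachable at a placeholder `base_url`; whether, and where, your deployment exposes this endpoint depends on your cluster's networking, so treat this as an illustration of how the launch options map onto a request rather than a guaranteed interface.

```python
from openai import OpenAI

# Placeholder base_url: wherever your deployment exposes the vLLM backend.
client = OpenAI(base_url="http://<your-deployment>/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-4k-instruct",  # the "Model" launch option
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},  # "Instruction"
        {"role": "user", "content": "Hello!"},
    ],
    temperature=0.7,        # an "LLM Sampling Parameters" option
    frequency_penalty=0.5,  # an "LLM Sampling Parameters" option
    max_tokens=512,         # the "Max Tokens" launch option
)
print(response.choices[0].message.content)
```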