HuggingFace Models (Experimental)

Experimental Feature

HuggingFace integration is currently in beta and under active development. We're working on adding support for more model types and improving the integration. The API and configuration format might change in future releases.

Current examples showcase basic integration with three models (GPT-2, T5, and Phi-2). More examples and model types will be added soon.

Configuration

To use a HuggingFace model, add it to your config.yaml with type huggingface. Here's the basic structure:

- name: model-name
  type: huggingface
  description: Description of your model
  endpoint: your/desired/endpoint
  hf_model_name: "model-id-on-huggingface"
  hf_task: "task-type"
  hf_generation_kwargs:
    parameter1: value1
    parameter2: value2

Parameters

hf_model_name: The model ID from HuggingFace Hub (e.g., "gpt2", "microsoft/phi-2")
hf_task: Task type ("generation" or "seq2seq")
hf_generation_kwargs: Model-specific generation parameters
endpoint: Your custom API endpoint path

Example Models

GPT-2 Text Generation

This example serves the GPT-2 model for text generation:

- name: gpt2-model
  type: huggingface
  description: GPT-2 text generation model
  endpoint: models/gpt2/generate
  hf_model_name: "gpt2"
  hf_task: "generation"
  hf_generation_kwargs:
    max_length: 100
    temperature: 0.7
    top_p: 0.9

Usage Example:

curl -X 'POST' \
  'http://localhost:5000/models/gpt2/generate' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Once upon a time",
    "generation_params": {
      "max_length": 200,
      "temperature": 0.7
    }
  }'

T5 Translation

This example uses T5-small for text translation:

- name: t5-translator
  type: huggingface
  description: T5 translation model
  endpoint: models/t5/translate
  hf_model_name: "t5-small"
  hf_task: "seq2seq"
  hf_generation_kwargs:
    max_length: 50

Usage Example:

curl -X 'POST' \
  'http://localhost:5000/models/t5/translate' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "translate English to German: Hello, how are you?"
  }'

Microsoft's Phi-2

This example serves the Phi-2 model for efficient text generation:

- name: phi2-generate
  type: huggingface
  description: Microsoft's Phi-2 small language model
  endpoint: models/phi2/generate
  hf_model_name: "microsoft/phi-2"
  hf_task: "generation"
  hf_generation_kwargs:
    max_length: 200
    temperature: 0
    top_p: 0.9
    do_sample: true

Usage Example:

curl -X 'POST' \
  'http://localhost:5000/models/phi2/generate' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Write a short story about a robot:",
    "generation_params": {
      "max_length": 150,
      "temperature": 0.7
    }
  }'

Generation Parameters

Common Parameters

Parameter	Description	Default
max_length	Maximum length of generated text	Model specific
temperature	Controls randomness (0.0 to 1.0)	1.0
top_p	Nucleus sampling parameter	1.0
do_sample	Whether to use sampling	true

You can set default parameters in hf_generation_kwargs and override them per request using generation_params in the API call.

Configuration​

Parameters​

Example Models​

GPT-2 Text Generation​

T5 Translation​

Microsoft's Phi-2​

Generation Parameters​

Common Parameters​

Configuration

Parameters

Example Models

GPT-2 Text Generation

T5 Translation

Microsoft's Phi-2

Generation Parameters

Common Parameters