
Language Models

DataCrunch Inference LLM endpoints documentation
Our inference service provides Language Model endpoints compatible with the Text Generation Inference (TGI) schema.
The LLM endpoints come in both streaming and non-streaming variants. Both have one required parameter, inputs, which contains the prompt. Optional parameters are passed in the parameters object.

Curl examples

Non-streaming example, /generate endpoint:
curl -X POST https://<ENDPOINT_URL>/generate \
  -H "Content-Type: application/json" \
  -H <AUTH_HEADERS> \
  -d \
'{
  "model": "<MODEL_NAME>",
  "inputs": "My name is Olivier and I",
  "parameters": {
    "best_of": 1,
    "decoder_input_details": true,
    "details": true,
    "do_sample": false,
    "max_new_tokens": 20,
    "repetition_penalty": 1.03,
    "return_full_text": false,
    "seed": null,
    "stop": [
      "photographer"
    ],
    "temperature": 0.5,
    "top_k": 10,
    "top_p": 0.95,
    "truncate": null,
    "typical_p": 0.95,
    "watermark": true
  }
}'
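For reference, the same non-streaming request can be issued from Python. This is a minimal sketch using the requests library; <ENDPOINT_URL>, <MODEL_NAME> and the authorization header are placeholders, exactly as in the curl example above.
import requests

ENDPOINT_URL = "https://<ENDPOINT_URL>"   # replace with your endpoint URL
HEADERS = {
    "Content-Type": "application/json",
    # add your auth header here, e.g. "Authorization": "Bearer <API_KEY>"
}

payload = {
    "model": "<MODEL_NAME>",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "max_new_tokens": 20,
        "do_sample": False,
        "details": True,
    },
}

# POST to the non-streaming /generate endpoint and print the JSON response
response = requests.post(f"{ENDPOINT_URL}/generate", headers=HEADERS, json=payload)
response.raise_for_status()
print(response.json())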
Streaming example, /generate_stream endpoint:
curl -N -X POST https://<ENDPOINT_URL>/generate_stream \
  -H "Content-Type: application/json" \
  -H <AUTH_HEADERS> \
  -d \
'{
  "model": "<MODEL_NAME>",
  "inputs": "My name is Olivier and I",
  "parameters": {
    "best_of": 1,
    "decoder_input_details": false,
    "details": true,
    "do_sample": false,
    "max_new_tokens": 20,
    "repetition_penalty": 1.03,
    "return_full_text": false,
    "seed": null,
    "stop": [
      "photographer"
    ],
    "temperature": 0.5,
    "top_k": 10,
    "top_p": 0.95,
    "truncate": null,
    "typical_p": 0.95,
    "watermark": true
  }
}'
Note: the decoder_input_details parameter must be set to false for the streaming endpoint.
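The streaming endpoint returns server-sent events, one per generated token. Below is a sketch of consuming that stream from Python with requests; it assumes the TGI-style event format (a data: prefix followed by a JSON object with a token field) and uses the same placeholder URL and headers as above.
import json
import requests

ENDPOINT_URL = "https://<ENDPOINT_URL>"   # replace with your endpoint URL
HEADERS = {
    "Content-Type": "application/json",
    # add your auth header here, e.g. "Authorization": "Bearer <API_KEY>"
}

payload = {
    "model": "<MODEL_NAME>",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "max_new_tokens": 20,
        "decoder_input_details": False,   # must be false when streaming
        "details": True,
    },
}

# POST to /generate_stream and read the event stream incrementally
with requests.post(f"{ENDPOINT_URL}/generate_stream",
                   headers=HEADERS, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            # each event carries one token; the final event also carries
            # the full generated_text (and details, when requested)
            print(event["token"]["text"], end="", flush=True)
print()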

Parameters

List of optional parameters for TGI-based endpoints (an illustrative example combining several of them follows the list):
  • do_sample (bool, optional): Activate logits sampling. Defaults to False.
  • max_new_tokens (int, optional): Maximum number of generated tokens. Defaults to 20.
  • repetition_penalty (float, optional): The parameter for repetition penalty. A value of 1.0 means no penalty. See the CTRL paper (arXiv:1909.05858) for more details. Defaults to None.
  • return_full_text (bool, optional): Whether to prepend the prompt to the generated text. Defaults to False.
  • stop (List[str], optional): Stop generating tokens if one of these sequences is generated. Defaults to an empty list.
  • seed (int, optional): Random sampling seed. Defaults to None.
  • temperature (float, optional): The value used to modulate the logits distribution. Defaults to None.
  • top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k-filtering. Defaults to None.
  • top_p (float, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Defaults to None.
  • truncate (int, optional): Truncate input tokens to the given size. Defaults to None.
  • typical_p (float, optional): Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to None.
  • best_of (int, optional): Generate best_of sequences and return the one with the highest token logprobs. Defaults to None.
  • watermark (bool, optional): Watermarking with A Watermark for Large Language Models. Defaults to False.
  • details (bool, optional): Get generation details. Defaults to False.
  • decoder_input_details (bool, optional): Get decoder input token logprobs and ids. Defaults to False.
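As an illustration of how these parameters combine, here is a sketch of two hypothetical parameters objects: one for greedy decoding and one for sampling. The values are arbitrary examples, not recommendations.
# Greedy decoding: logits sampling disabled
greedy_parameters = {
    "do_sample": False,
    "max_new_tokens": 64,
    "repetition_penalty": 1.03,
}

# Sampling: do_sample activates logits sampling, shaped by the parameters below
sampling_parameters = {
    "do_sample": True,
    "temperature": 0.7,      # modulates the logits distribution
    "top_k": 50,             # keep the 50 highest-probability tokens
    "top_p": 0.95,           # nucleus (top-p) mass
    "seed": 42,              # fixed seed for reproducible sampling
    "max_new_tokens": 64,
    "stop": ["\n\n"],        # stop once a blank line is generated
}

payload = {
    "model": "<MODEL_NAME>",
    "inputs": "My name is Olivier and I",
    "parameters": sampling_parameters,
}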