Language Models
DataCrunch Inference LLM endpoints documentation
The LLM endpoints are available in both streaming and non-streaming variants. Both endpoints have one required parameter, `inputs`, corresponding to the prompt. Optional parameters are specified in the `parameters` object.

Non-streaming example, `/generate` endpoint:

```bash
curl -X POST https://<ENDPOINT_URL>/generate \
-H "Content-Type: application/json" \
-H <AUTH_HEADERS> \
-d \
'{
"model": "<MODEL_NAME>",
"inputs": "My name is Olivier and I",
"parameters": {
"best_of": 1,
"decoder_input_details": true,
"details": true,
"do_sample": false,
"max_new_tokens": 20,
"repetition_penalty": 1.03,
"return_full_text": false,
"seed": null,
"stop": [
"photographer"
],
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"truncate": null,
"typical_p": 0.95,
"watermark": true
}
}'
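The same request can also be issued from code. The example below is a minimal Python sketch using the `requests` library; the endpoint URL, model name, and token are placeholders just as in the curl example, and the bearer-token auth header and `generated_text` response field are assumptions in the style of text-generation-inference APIs, not something this page specifies.

```python
# Minimal sketch of the non-streaming /generate call from Python.
# Placeholders (<ENDPOINT_URL>, <MODEL_NAME>, <TOKEN>) must be filled in;
# bearer-token auth and the "generated_text" field are assumptions.
import requests

ENDPOINT_URL = "https://<ENDPOINT_URL>"
HEADERS = {"Authorization": "Bearer <TOKEN>"}

payload = {
    "model": "<MODEL_NAME>",
    "inputs": "My name is Olivier and I",
    "parameters": {
        "max_new_tokens": 20,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
    },
}

resp = requests.post(f"{ENDPOINT_URL}/generate", json=payload, headers=HEADERS)
resp.raise_for_status()
print(resp.json()["generated_text"])
```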
Streaming example, `/generate_stream` endpoint:

```bash
curl -N -X POST https://<ENDPOINT_URL>/generate_stream \
-H "Content-Type: application/json" \
-H <AUTH_HEADERS> \
-d \
'{
"model": "<MODEL_NAME>",
"inputs": "My name is Olivier and I",
"parameters": {
"best_of": 1,
"decoder_input_details": false,
"details": true,
"do_sample": false,
"max_new_tokens": 20,
"repetition_penalty": 1.03,
"return_full_text": false,
"seed": null,
"stop": [
"photographer"
],
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"truncate": null,
"typical_p": 0.95,
"watermark": true
}
}'
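The streaming endpoint returns the completion token by token. The sketch below assumes a server-sent-events response in which each line has the form `data: {json}` and each event carries a `token` object, as in text-generation-inference-style streaming APIs; adjust the parsing if the actual wire format differs.

```python
# Minimal sketch of consuming /generate_stream from Python.
# Assumes SSE lines of the form "data: {...}" with a "token" object per event.
import json
import requests

ENDPOINT_URL = "https://<ENDPOINT_URL>"
HEADERS = {"Authorization": "Bearer <TOKEN>"}  # assumption: bearer-token auth

payload = {
    "model": "<MODEL_NAME>",
    "inputs": "My name is Olivier and I",
    # decoder_input_details must be false when streaming (see the note below).
    "parameters": {"max_new_tokens": 20, "decoder_input_details": False},
}

with requests.post(
    f"{ENDPOINT_URL}/generate_stream",
    json=payload,
    headers=HEADERS,
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data:"):
            continue  # skip blank keep-alive lines
        event = json.loads(line[len(b"data:"):])
        print(event["token"]["text"], end="", flush=True)
```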
Note: the `decoder_input_details` parameter must be set to `false` for the streaming endpoint.

The `parameters` object accepts the following fields; a short configuration sketch follows the list.

- `do_sample` (`bool`, optional): Activate logits sampling. Defaults to `False`.
- `max_new_tokens` (`int`, optional): Maximum number of generated tokens. Defaults to 20.
- `repetition_penalty` (`float`, optional): The parameter for repetition penalty. A value of 1.0 means no penalty. See this paper for more details. Defaults to `None`.
- `return_full_text` (`bool`, optional): Whether to prepend the prompt to the generated text. Defaults to `False`.
- `stop` (`List[str]`, optional): Stop generating tokens if a member of `stop_sequences` is generated. Defaults to an empty list.
- `seed` (`int`, optional): Random sampling seed. Defaults to `None`.
- `temperature` (`float`, optional): The value used to modulate the logits distribution. Defaults to `None`.
- `top_k` (`int`, optional): The number of highest probability vocabulary tokens to keep for top-k filtering. Defaults to `None`.
- `top_p` (`float`, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation. Defaults to `None`.
- `truncate` (`int`, optional): Truncate input tokens to the given size. Defaults to `None`.
- `typical_p` (`float`, optional): Typical decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to `None`.
- `best_of` (`int`, optional): Generate `best_of` sequences and return the one with the highest token logprobs. Defaults to `None`.
- `watermark` (`bool`, optional): Watermarking with A Watermark for Large Language Models. Defaults to `False`.
- `details` (`bool`, optional): Get generation details. Defaults to `False`.
- `decoder_input_details` (`bool`, optional): Get decoder input token logprobs and ids. Defaults to `False`.
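To make the interaction between these parameters concrete, here are two illustrative `parameters` objects, one greedy and one sampling-based. This is a sketch with arbitrary example values, not recommended settings.

```python
# Two illustrative "parameters" objects (example values, not recommendations).
greedy = {
    "do_sample": False,        # greedy decoding
    "max_new_tokens": 20,
    "repetition_penalty": 1.03,
    "stop": ["photographer"],  # halt once a stop sequence is generated
}

sampled = {
    "do_sample": True,         # activate logits sampling
    "temperature": 0.5,        # values below 1 sharpen the distribution
    "top_k": 10,               # keep only the 10 most likely tokens...
    "top_p": 0.95,             # ...restricted to the 0.95 probability nucleus
    "seed": 42,                # fixed seed for reproducible sampling
    "max_new_tokens": 20,
}
```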