Language Models
DataCrunch Inference LLM endpoints documentation
Our inference service provides Language Model endpoints compatible with the Text Generation Inference (TGI) schema. Both streaming and non-streaming endpoints are available. Each has one required parameter, `inputs`, corresponding to the prompt; optional parameters are passed in the `parameters` object.
Curl examples
Non-streaming example, `/generate` endpoint:
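A minimal sketch of a non-streaming request; the deployment URL and bearer token are placeholders, not actual DataCrunch values — substitute your own endpoint and credentials:

```shell
# Hypothetical deployment URL and API token; substitute your own.
curl https://<your-deployment-url>/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <your-api-token>' \
    -d '{
          "inputs": "What is deep learning?",
          "parameters": {"max_new_tokens": 50, "temperature": 0.7}
        }'
```

The response is a single JSON object containing the completed `generated_text`.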
Streaming example, `/generate_stream` endpoint:
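A corresponding streaming sketch, under the same placeholder assumptions for URL and token. Note that `decoder_input_details` is set explicitly to `false`, as the streaming endpoint requires:

```shell
# Hypothetical deployment URL and API token; substitute your own.
curl https://<your-deployment-url>/generate_stream \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <your-api-token>' \
    -d '{
          "inputs": "What is deep learning?",
          "parameters": {"max_new_tokens": 50, "decoder_input_details": false}
        }'
```

Instead of one JSON object, the response arrives as a stream of server-sent events, one token per event, ending with an event that carries the full `generated_text`.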
Note: the `decoder_input_details` parameter must be set to `false` for the streaming endpoint.
Parameters
List of optional parameters for TGI-based endpoints:
- `do_sample` (`bool`, optional): Activate logits sampling. Defaults to `False`.
- `max_new_tokens` (`int`, optional): Maximum number of generated tokens. Defaults to `20`.
- `repetition_penalty` (`float`, optional): The parameter for repetition penalty. A value of 1.0 means no penalty. See this paper for more details. Defaults to `None`.
- `return_full_text` (`bool`, optional): Whether to prepend the prompt to the generated text. Defaults to `False`.
- `stop` (`List[str]`, optional): Stop generating tokens if a member of `stop_sequences` is generated. Defaults to an empty list.
- `seed` (`int`, optional): Random sampling seed. Defaults to `None`.
- `temperature` (`float`, optional): The value used to modulate the logits distribution. Defaults to `None`.
- `top_k` (`int`, optional): The number of highest-probability vocabulary tokens to keep for top-k filtering. Defaults to `None`.
- `top_p` (`float`, optional): If set to a value less than 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation. Defaults to `None`.
- `truncate` (`int`, optional): Truncate input tokens to the given size. Defaults to `None`.
- `typical_p` (`float`, optional): Typical decoding mass. See Typical Decoding for Natural Language Generation for more information. Defaults to `None`.
- `best_of` (`int`, optional): Generate `best_of` sequences and return the one with the highest token logprobs. Defaults to `None`.
- `watermark` (`bool`, optional): Watermarking with A Watermark for Large Language Models. Defaults to `False`.
- `details` (`bool`, optional): Get generation details. Defaults to `False`.
- `decoder_input_details` (`bool`, optional): Get decoder input token logprobs and ids. Defaults to `False`.
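All of these options go inside the `parameters` object of the request body. A sketch combining several of them in one request (the deployment URL and token are placeholders, and the specific values are illustrative, not recommendations):

```shell
# Hypothetical deployment URL and API token; substitute your own.
curl https://<your-deployment-url>/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <your-api-token>' \
    -d '{
          "inputs": "Write a haiku about GPUs.",
          "parameters": {
            "do_sample": true,
            "max_new_tokens": 60,
            "temperature": 0.8,
            "top_p": 0.95,
            "stop": ["\n\n"],
            "seed": 42,
            "details": true
          }
        }'
```

With `details` set to `true`, the response includes per-token logprobs and the finish reason alongside `generated_text`.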