-m, --model
: The model to use (e.g., llama2, mistral)

-p, --prompt
: The input text prompt

-s, --stream
: Stream the response token-by-token instead of waiting for the complete response

-t, --temperature
: Controls randomness (0.0 = deterministic, 1.0 = maximum creativity)

--top-p
: Controls diversity through nucleus sampling (0.0-1.0)
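To make the last two options concrete, here is a minimal Python sketch of what temperature and top-p do to a token distribution. The logit values and helper names are illustrative only, not part of any CLI's implementation: temperature rescales logits before softmax, and top-p (nucleus sampling) keeps only the most probable tokens whose cumulative probability reaches the threshold.

```python
import math

def apply_temperature(logits, temperature):
    # Temperature divides the logits before softmax: values near 0.0 make
    # the distribution peaky (near-deterministic), higher values flatten it.
    t = max(temperature, 1e-6)  # guard against division by zero at 0.0
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, zero out the rest, and renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered

# Toy example: 4 candidate tokens with made-up logits.
logits = [2.0, 1.0, 0.1, -1.0]
probs = apply_temperature(logits, 0.7)
nucleus = top_p_filter(probs, 0.9)  # low-probability tail is dropped
```

At temperature 0.1 the top token dominates almost completely, while at 1.0 the raw distribution is sampled as-is; lowering top-p trims more of the unlikely tail before sampling.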