API reference
- class gseai.GSEAIServer(api_token, host='gseai.gse.buffalo.edu', port=11434)
  Bases: object

  Client for the GSE AI LMStudio server.

  - Parameters:
    - api_token (str) – Bearer token for authentication.
    - host (str) – Hostname of the server.
    - port (int) – Port the server listens on.
- close()
  Close the client.
  - Return type:
    None
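For example, a minimal sketch of constructing a client and closing it when finished. The environment variable name and token handling are illustrative, not part of the library:

```python
import os

from gseai import GSEAIServer

# Reads the bearer token from an environment variable; the variable
# name GSEAI_API_TOKEN is a placeholder chosen for this example.
client = GSEAIServer(api_token=os.environ["GSEAI_API_TOKEN"])

try:
    ...  # make requests here
finally:
    client.close()  # release the client when done
```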
- list_models()
  GET /api/v1/models — list available models.
  - Return type:
    dict
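A quick sketch, assuming a client constructed as above with a placeholder token:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

models = client.list_models()
print(models)  # raw dict from GET /api/v1/models
client.close()
```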
- load_model(model, ttl=None)
  POST /api/v1/models/load — load a model into memory.
  - Parameters:
    - model (str) – Model identifier.
    - ttl (int | None) – Idle time-to-live in seconds before the model is evicted.
  - Return type:
    dict
- unload_model(model)
  POST /api/v1/models/unload — unload a model from memory.
  - Parameters:
    - model (str) – Model identifier.
  - Return type:
    dict
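A sketch of the load/unload pair. The model identifier "my-model" is a placeholder; use a name returned by list_models():

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

# Load with a 10-minute idle TTL so the server evicts it if unused.
info = client.load_model("my-model", ttl=600)
print(info)

# Or free the memory explicitly without waiting for the TTL.
client.unload_model("my-model")
client.close()
```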
- download_model(model)
  POST /api/v1/models/download — start downloading a model.
  - Parameters:
    - model (str) – Model identifier to download.
  - Return type:
    dict
  - Returns:
    Dict containing a job_id for tracking download progress.
- get_download_status(job_id)
  GET /api/v1/models/download/status/{job_id} — check download progress.
  - Parameters:
    - job_id (str) – Job ID returned by download_model().
  - Return type:
    dict
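A polling sketch that ties the two download methods together. The job_id key is documented; the status field and its values are assumptions, so inspect the returned dict to see what the server actually reports:

```python
import time

from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

# Kick off a download; the model name is a placeholder.
job = client.download_model("my-model")
job_id = job["job_id"]  # documented: the response carries a job_id

# Poll until the job finishes. The "status" key and the
# "completed"/"failed" values are assumed, not documented.
while True:
    status = client.get_download_status(job_id)
    print(status)
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(5)

client.close()
```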
- chat(model, input, *, system_prompt=None, temperature=None, top_p=None, top_k=None, min_p=None, repeat_penalty=None, max_output_tokens=None, reasoning=None, context_length=None, store=True, previous_response_id=None, ttl=None, integrations=None, stream=False)
  POST /api/v1/chat — generate a chat response.
  - Parameters:
    - model (str) – Model identifier.
    - input (str | list[dict]) – A string prompt or a list of message dicts.
    - system_prompt (str | None) – Optional system message.
    - temperature (float | None) – Sampling temperature (0–2).
    - top_p (float | None) – Nucleus sampling probability (0–1).
    - top_k (int | None) – Top-k sampling limit.
    - min_p (float | None) – Minimum probability threshold (0–1).
    - repeat_penalty (float | None) – Repetition penalty.
    - max_output_tokens (int | None) – Maximum number of tokens to generate.
    - reasoning (str | None) – Reasoning effort — “off”, “low”, “medium”, “high”, or “on”.
    - context_length (int | None) – Context window size override.
    - store (bool) – Whether to persist the conversation server-side (default True).
    - previous_response_id (str | None) – ID of a prior response to continue.
    - ttl (int | None) – Model idle time-to-live in seconds.
    - integrations (list[dict] | None) – MCP servers or plugin configs.
    - stream (bool) – If True, return a generator of Server-Sent Event dicts.
  - Return type:
    dict | Generator[dict, None, None]
  - Returns:
    Response dict, or a generator of SSE event dicts when stream=True.
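A sketch of a non-streaming call followed by a streaming one. The model name and prompts are placeholders, and the shape of the returned dict is not specified here, so the example just prints it:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

# Single-turn request with a few of the documented sampling knobs.
resp = client.chat(
    "my-model",  # placeholder identifier
    "Summarize the history of UB in two sentences.",
    system_prompt="You are a concise assistant.",
    temperature=0.2,
    max_output_tokens=200,
)
print(resp)  # inspect the dict for the response text and its id

# With stream=True the method yields SSE event dicts instead.
for event in client.chat("my-model", "Now in one sentence.", stream=True):
    print(event)  # event structure is server-defined; print to explore

client.close()
```

Because store defaults to True, the id of a stored response can be passed as previous_response_id on a later call to continue the same conversation.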
- list_models_openai()
  GET /v1/models — list models (OpenAI-compatible format).
  - Return type:
    dict
- chat_completions(model, messages, *, temperature=None, max_tokens=None, stream=False, top_p=None, top_k=None, stop=None, presence_penalty=None, frequency_penalty=None, repeat_penalty=None, logit_bias=None, seed=None, response_format=None, tools=None, tool_choice=None, ttl=None)
  POST /v1/chat/completions — OpenAI-compatible chat completions.
  - Parameters:
    - model (str) – Model identifier.
    - messages (list[dict]) – List of message dicts with role and content.
    - temperature (float | None) – Sampling temperature (0–2).
    - max_tokens (int | None) – Maximum tokens to generate.
    - stream (bool) – If True, return a generator of SSE event dicts.
    - top_p (float | None) – Nucleus sampling (0–1).
    - top_k (int | None) – Top-k sampling limit.
    - stop (str | list[str] | None) – Stop sequence(s).
    - presence_penalty (float | None) – Presence penalty (-2 to 2).
    - frequency_penalty (float | None) – Frequency penalty (-2 to 2).
    - repeat_penalty (float | None) – Repetition penalty.
    - logit_bias (dict | None) – Token probability bias adjustments.
    - seed (int | None) – Random seed for reproducibility.
    - response_format (dict | None) – JSON schema for structured output.
    - tools (list[dict] | None) – Function definitions for tool/function calling.
    - tool_choice (str | None) – Tool selection mode — “auto”, “none”, or “required”.
    - ttl (int | None) – Model idle time-to-live in seconds.
  - Return type:
    dict | Generator[dict, None, None]
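A sketch pairing list_models_openai() with a completion request. The model name is a placeholder, and the final line assumes the response follows the standard OpenAI chat-completions shape, which the endpoint advertises but this reference does not spell out:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

print(client.list_models_openai())  # OpenAI-style model listing

resp = client.chat_completions(
    "my-model",  # placeholder identifier
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a neural network?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
# Assumes the usual OpenAI chat-completions response shape.
print(resp["choices"][0]["message"]["content"])

client.close()
```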
- completions(model, prompt, *, max_tokens=None, temperature=None, top_p=None, top_k=None, stop=None, frequency_penalty=None, presence_penalty=None, stream=False, seed=None, ttl=None)
  POST /v1/completions — legacy text completions.
  - Parameters:
    - model (str) – Model identifier.
    - prompt (str | list) – Input text or list of texts.
    - max_tokens (int | None) – Maximum tokens to generate.
    - temperature (float | None) – Sampling temperature (0–2).
    - top_p (float | None) – Nucleus sampling (0–1).
    - top_k (int | None) – Top-k sampling limit.
    - stop (str | list[str] | None) – Stop sequence(s).
    - frequency_penalty (float | None) – Frequency penalty (-2 to 2).
    - presence_penalty (float | None) – Presence penalty (-2 to 2).
    - stream (bool) – If True, return a generator of SSE event dicts.
    - seed (int | None) – Random seed.
    - ttl (int | None) – Model idle time-to-live in seconds.
  - Return type:
    dict | Generator[dict, None, None]
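A sketch of a legacy completion. The model name is a placeholder, and the last line assumes the OpenAI legacy-completions response shape:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

resp = client.completions(
    "my-model",  # placeholder identifier
    "The three branches of the US government are",
    max_tokens=64,
    temperature=0.0,
    stop=["\n"],  # stop at the first newline
)
# Assumes the usual OpenAI legacy-completions response shape.
print(resp["choices"][0]["text"])

client.close()
```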
- embeddings(model, input, *, encoding_format=None, dimensions=None)
  POST /v1/embeddings — generate text embeddings.
  - Parameters:
    - model (str) – Model identifier.
    - input (str | list[str]) – Text or list of texts to embed.
    - encoding_format (str | None) – Output format — “float” or “base64”.
    - dimensions (int | None) – Target embedding dimensionality.
  - Return type:
    dict
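A sketch of embedding a small batch. The model name is a placeholder, and the unpacking assumes the OpenAI embeddings response shape (a data list of items with an embedding field):

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

resp = client.embeddings(
    "my-embedding-model",  # placeholder identifier
    ["first document", "second document"],
)
# Assumes the usual OpenAI embeddings response shape.
vectors = [item["embedding"] for item in resp["data"]]
print(len(vectors), len(vectors[0]))  # batch size and dimensionality

client.close()
```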
- responses(model, messages, **kwargs)
  POST /v1/responses — stateful chat responses (OpenAI-compatible).
  - Parameters:
    - model (str) – Model identifier.
    - messages (list[dict]) – List of message dicts with role and content.
    - **kwargs (Any) – Additional parameters forwarded to the endpoint.
  - Return type:
    dict
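A sketch showing the keyword pass-through. The model name is a placeholder, and whether the endpoint accepts any particular extra keyword (temperature here) is an assumption; unrecognized parameters are simply forwarded:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

resp = client.responses(
    "my-model",  # placeholder identifier
    [{"role": "user", "content": "Hello!"}],
    temperature=0.5,  # forwarded as-is; acceptance is endpoint-defined
)
print(resp)  # inspect the dict to see the response structure

client.close()
```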
- messages(model, messages, max_tokens, *, system=None, temperature=None, top_p=None, top_k=None)
  POST /v1/messages — Anthropic-compatible messages API.
  - Parameters:
    - model (str) – Model identifier.
    - messages (list[dict]) – List of message dicts with role and content.
    - max_tokens (int) – Maximum tokens to generate (required by the API).
    - system (str | None) – System prompt.
    - temperature (float | None) – Sampling temperature.
    - top_p (float | None) – Nucleus sampling (0–1).
    - top_k (int | None) – Top-k sampling limit.
  - Return type:
    dict
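A sketch of the Anthropic-style call. Note that max_tokens is positional and required. The model name is a placeholder, and the final line assumes the standard Anthropic messages response shape:

```python
from gseai import GSEAIServer

client = GSEAIServer(api_token="...")  # placeholder token

resp = client.messages(
    "my-model",  # placeholder identifier
    [{"role": "user", "content": "Name one Great Lake."}],
    512,  # max_tokens is required by this API
    system="Answer with a single word.",
    temperature=0.0,
)
# Assumes the usual Anthropic messages response shape.
print(resp["content"][0]["text"])

client.close()
```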