API reference

class gseai.GSEAIServer(api_token, host='gseai.gse.buffalo.edu', port=11434)

Bases: object

Client for the GSE AI LMStudio server.

Parameters:
  • api_token (str) – Bearer token for authentication.

  • host (str) – Hostname of the server.

  • port (int) – Port the server listens on.

close()

Close the client and release its underlying network connection.

Return type:

None

list_models()

GET /api/v1/models — list available models.

Return type:

dict
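A minimal usage sketch. The response fields read below ("data", "id") are assumptions about the payload shape, not documented by this client, and "YOUR_TOKEN" is a placeholder:

```python
def model_ids(resp: dict) -> list:
    """Extract model identifiers from a list_models() response dict.

    Assumes an OpenAI-like {"data": [{"id": ...}, ...]} payload.
    """
    return [m["id"] for m in resp.get("data", [])]


def list_available_models() -> list:
    """Connect, list models, and close the client. Needs a live server."""
    from gseai import GSEAIServer  # requires the gseai package

    server = GSEAIServer(api_token="YOUR_TOKEN")
    try:
        return model_ids(server.list_models())
    finally:
        server.close()
```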

load_model(model, ttl=None)

POST /api/v1/models/load — load a model into memory.

Parameters:
  • model (str) – Model identifier.

  • ttl (int | None) – Idle time-to-live in seconds before the model is evicted.

Return type:

dict

unload_model(model)

POST /api/v1/models/unload — unload a model from memory.

Parameters:

model (str) – Model identifier.

Return type:

dict
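The load/unload pair fits naturally into a context manager so a model is always evicted when you are done. The `loaded_model()` helper below is illustrative, not part of the client, and the model name in the usage line is a placeholder:

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(server, model, ttl=None):
    """Load `model` on entry and unload it on exit, even if an error occurs."""
    server.load_model(model, ttl=ttl)
    try:
        yield model
    finally:
        server.unload_model(model)


# Usage against a live GSEAIServer instance:
#   with loaded_model(server, "some-model-id", ttl=3600):
#       ...  # run inference while the model stays resident
```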

download_model(model)

POST /api/v1/models/download — start downloading a model.

Parameters:

model (str) – Model identifier to download.

Return type:

dict

Returns:

Dict containing a job_id for tracking download progress.

get_download_status(job_id)

GET /api/v1/models/download/status/{job_id} — check download progress.

Parameters:

job_id (str) – Job ID returned by download_model().

Return type:

dict
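download_model() and get_download_status() combine into a simple polling loop. The status values checked below ("completed", "failed") are assumptions about the job payload, not documented behavior:

```python
import time


def is_terminal(status: dict) -> bool:
    """True once a download job dict reports a finished state (assumed names)."""
    return status.get("status") in ("completed", "failed")


def download_and_wait(server, model, poll_seconds=5.0) -> dict:
    """Start a download and block until the job finishes. Needs a live server."""
    job = server.download_model(model)
    while True:
        status = server.get_download_status(job["job_id"])
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```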

chat(model, input, *, system_prompt=None, temperature=None, top_p=None, top_k=None, min_p=None, repeat_penalty=None, max_output_tokens=None, reasoning=None, context_length=None, store=True, previous_response_id=None, ttl=None, integrations=None, stream=False)

POST /api/v1/chat — generate a chat response.

Parameters:
  • model (str) – Model identifier.

  • input (str | list[dict]) – A string prompt or a list of message dicts.

  • system_prompt (str | None) – Optional system message.

  • temperature (float | None) – Sampling temperature (0–2).

  • top_p (float | None) – Nucleus sampling probability (0–1).

  • top_k (int | None) – Top-k sampling limit.

  • min_p (float | None) – Minimum probability threshold (0–1).

  • repeat_penalty (float | None) – Repetition penalty.

  • max_output_tokens (int | None) – Maximum number of tokens to generate.

  • reasoning (str | None) – Reasoning effort — “off”, “low”, “medium”, “high”, or “on”.

  • context_length (int | None) – Context window size override.

  • store (bool) – Whether to persist the conversation server-side (default True).

  • previous_response_id (str | None) – ID of a prior response to continue.

  • ttl (int | None) – Model idle time-to-live in seconds.

  • integrations (list[dict] | None) – MCP servers or plugin configs.

  • stream (bool) – If True, return a generator of Server-Sent Event dicts.

Return type:

dict | Generator[dict, None, None]

Returns:

Response dict, or a generator of SSE event dicts when stream=True.
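A sketch of both call styles. The "delta" field read while accumulating streamed text is an assumption about the SSE event shape, not documented by this client:

```python
def collect_deltas(events) -> str:
    """Concatenate text deltas from an iterable of SSE event dicts (assumed field)."""
    return "".join(e.get("delta", "") for e in events)


def ask(server, model, prompt) -> dict:
    """One-shot, non-streaming chat call. Needs a live server."""
    return server.chat(
        model,
        prompt,
        system_prompt="Answer briefly.",
        temperature=0.2,
        max_output_tokens=256,
    )


def ask_streaming(server, model, prompt) -> str:
    """Streaming call: chat() returns a generator of SSE event dicts when stream=True."""
    return collect_deltas(server.chat(model, prompt, stream=True))
```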

list_models_openai()

GET /v1/models — list models (OpenAI-compatible format).

Return type:

dict

chat_completions(model, messages, *, temperature=None, max_tokens=None, stream=False, top_p=None, top_k=None, stop=None, presence_penalty=None, frequency_penalty=None, repeat_penalty=None, logit_bias=None, seed=None, response_format=None, tools=None, tool_choice=None, ttl=None)

POST /v1/chat/completions — OpenAI-compatible chat completions.

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • temperature (float | None) – Sampling temperature (0–2).

  • max_tokens (int | None) – Maximum tokens to generate.

  • stream (bool) – If True, return a generator of SSE event dicts.

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

  • stop (str | list[str] | None) – Stop sequence(s).

  • presence_penalty (float | None) – Presence penalty (-2 to 2).

  • frequency_penalty (float | None) – Frequency penalty (-2 to 2).

  • repeat_penalty (float | None) – Repetition penalty.

  • logit_bias (dict | None) – Token probability bias adjustments.

  • seed (int | None) – Random seed for reproducibility.

  • response_format (dict | None) – JSON schema for structured output.

  • tools (list[dict] | None) – Function definitions for tool/function calling.

  • tool_choice (str | None) – Tool selection mode — “auto”, “none”, or “required”.

  • ttl (int | None) – Model idle time-to-live in seconds.

Return type:

dict | Generator[dict, None, None]
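A multi-turn sketch. The ["choices"][0]["message"] path follows the OpenAI chat-completions schema, which this endpoint advertises as compatible:

```python
def extend_history(history: list, resp: dict) -> list:
    """Return a new history list with the assistant reply from `resp` appended."""
    return history + [resp["choices"][0]["message"]]


def two_turn_chat(server, model) -> list:
    """Ask a question, then a follow-up that sees the first answer. Needs a live server."""
    history = [{"role": "user", "content": "Name one Great Lake."}]
    history = extend_history(
        history, server.chat_completions(model, history, max_tokens=64)
    )
    history.append({"role": "user", "content": "Name another one."})
    return extend_history(
        history, server.chat_completions(model, history, max_tokens=64)
    )
```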

completions(model, prompt, *, max_tokens=None, temperature=None, top_p=None, top_k=None, stop=None, frequency_penalty=None, presence_penalty=None, stream=False, seed=None, ttl=None)

POST /v1/completions — legacy text completions.

Parameters:
  • model (str) – Model identifier.

  • prompt (str | list) – Input text or list of texts.

  • max_tokens (int | None) – Maximum tokens to generate.

  • temperature (float | None) – Sampling temperature (0–2).

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

  • stop (str | list[str] | None) – Stop sequence(s).

  • frequency_penalty (float | None) – Frequency penalty (-2 to 2).

  • presence_penalty (float | None) – Presence penalty (-2 to 2).

  • stream (bool) – If True, return a generator of SSE event dicts.

  • seed (int | None) – Random seed.

  • ttl (int | None) – Model idle time-to-live in seconds.

Return type:

dict | Generator[dict, None, None]
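A legacy-completions sketch. The ["choices"][0]["text"] path follows the OpenAI completions schema, which this endpoint advertises as compatible:

```python
def first_text(resp: dict) -> str:
    """Extract the first completion's text from a completions response."""
    return resp["choices"][0]["text"]


def complete(server, model, prompt) -> str:
    """Deterministic text completion with a stop sequence. Needs a live server."""
    resp = server.completions(
        model,
        prompt,
        max_tokens=64,
        temperature=0.0,
        stop=["\n\n"],
        seed=42,
    )
    return first_text(resp)
```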

embeddings(model, input, *, encoding_format=None, dimensions=None)

POST /v1/embeddings — generate text embeddings.

Parameters:
  • model (str) – Model identifier.

  • input (str | list[str]) – Text or list of texts to embed.

  • encoding_format (str | None) – Output format — “float” or “base64”.

  • dimensions (int | None) – Target embedding dimensionality.

Return type:

dict
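A sketch that embeds two texts and compares them. The "data"/"embedding" fields follow the OpenAI embeddings schema, which this endpoint advertises as compatible:

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity(server, model, text_a, text_b) -> float:
    """Embed two texts and return their cosine similarity. Needs a live server."""
    resp = server.embeddings(model, [text_a, text_b], encoding_format="float")
    vecs = [d["embedding"] for d in resp["data"]]
    return cosine(vecs[0], vecs[1])
```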

responses(model, messages, **kwargs)

POST /v1/responses — stateful chat responses (OpenAI-compatible).

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • **kwargs (Any) – Additional parameters forwarded to the endpoint.

Return type:

dict
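Since responses() takes the same message-dict list as the other chat endpoints, a small builder keeps call sites tidy. The `as_messages()` helper is illustrative, not part of the client, and whether /v1/responses honors a forwarded `temperature` is an assumption:

```python
def as_messages(*turns) -> list:
    """Turn ("role", "content") pairs into the message-dict list the API expects."""
    return [{"role": role, "content": content} for role, content in turns]


def stateful_call(server, model) -> dict:
    """Call the stateful responses endpoint. Needs a live server; extra
    keyword arguments are forwarded verbatim to /v1/responses."""
    return server.responses(
        model,
        as_messages(("user", "Hello!")),
        temperature=0.3,
    )
```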

messages(model, messages, max_tokens, *, system=None, temperature=None, top_p=None, top_k=None)

POST /v1/messages — Anthropic-compatible messages API.

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • max_tokens (int) – Maximum tokens to generate (required by the API).

  • system (str | None) – System prompt.

  • temperature (float | None) – Sampling temperature.

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

Return type:

dict
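An Anthropic-style sketch; note that max_tokens is a required positional argument here. The ["content"][...]["text"] path follows the Anthropic messages schema, which this endpoint advertises as compatible:

```python
def reply_text(resp: dict) -> str:
    """Join the text blocks of an Anthropic-style messages response."""
    return "".join(block.get("text", "") for block in resp.get("content", []))


def anthropic_style_call(server, model, question) -> str:
    """Call the Anthropic-compatible endpoint. Needs a live server."""
    resp = server.messages(
        model,
        [{"role": "user", "content": question}],
        512,  # max_tokens is required by the API
        system="Answer in one sentence.",
        temperature=0.2,
    )
    return reply_text(resp)
```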