API reference

class gseai.GSEAIServer(api_token, host='gseai.gse.buffalo.edu', port=11434, timeout=None)

Bases: object

Client for the GSE AI LocalAI server.

Parameters:
  • api_token (str) – Bearer token for authentication.

  • host (str) – Hostname of the server.

  • port (int) – Port the server listens on.

  • timeout (float | None) – Request timeout in seconds. None (default) means no timeout, which is recommended for slow models.

close()
Return type:

None

list_models()

GET /v1/models — list available models.

Return type:

dict

chat(model, prompt, *, system_prompt=None, temperature=None, max_tokens=None, stream=False)

Convenience wrapper for single-turn chat.

Parameters:
  • model (str) – Model identifier.

  • prompt (str) – User message as a plain string.

  • system_prompt (str | None) – Optional system message.

  • temperature (float | None) – Sampling temperature (0–2).

  • max_tokens (int | None) – Maximum tokens to generate.

  • stream (bool) – If True, return a generator of SSE event dicts.

Return type:

dict | Generator[dict, None, None]

Returns:

Response dict, or a generator of SSE event dicts when stream=True.

chat_completions(model, messages, *, temperature=None, max_tokens=None, stream=False, top_p=None, top_k=None, stop=None, presence_penalty=None, frequency_penalty=None, repeat_penalty=None, logit_bias=None, seed=None, response_format=None, tools=None, tool_choice=None)

POST /v1/chat/completions — OpenAI-compatible chat completions.

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • temperature (float | None) – Sampling temperature (0–2).

  • max_tokens (int | None) – Maximum tokens to generate.

  • stream (bool) – If True, return a generator of SSE event dicts.

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

  • stop (str | list[str] | None) – Stop sequence(s).

  • presence_penalty (float | None) – Presence penalty (-2 to 2).

  • frequency_penalty (float | None) – Frequency penalty (-2 to 2).

  • repeat_penalty (float | None) – Repetition penalty.

  • logit_bias (dict | None) – Token probability bias adjustments.

  • seed (int | None) – Random seed for reproducibility.

  • response_format (dict | None) – JSON schema for structured output.

  • tools (list[dict] | None) – Function definitions for tool/function calling.

  • tool_choice (str | None) – Tool selection mode — “auto”, “none”, or “required”.

Return type:

dict | Generator[dict, None, None]

completions(model, prompt, *, max_tokens=None, temperature=None, top_p=None, top_k=None, stop=None, frequency_penalty=None, presence_penalty=None, stream=False, seed=None)

POST /v1/completions — legacy text completions.

Parameters:
  • model (str) – Model identifier.

  • prompt (str | list) – Input text or list of texts.

  • max_tokens (int | None) – Maximum tokens to generate.

  • temperature (float | None) – Sampling temperature (0–2).

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

  • stop (str | list[str] | None) – Stop sequence(s).

  • frequency_penalty (float | None) – Frequency penalty (-2 to 2).

  • presence_penalty (float | None) – Presence penalty (-2 to 2).

  • stream (bool) – If True, return a generator of SSE event dicts.

  • seed (int | None) – Random seed.

Return type:

dict | Generator[dict, None, None]

embeddings(model, input, *, encoding_format=None, dimensions=None)

POST /v1/embeddings — generate text embeddings.

Parameters:
  • model (str) – Model identifier.

  • input (str | list[str]) – Text or list of texts to embed.

  • encoding_format (str | None) – Output format — “float” or “base64”.

  • dimensions (int | None) – Target embedding dimensionality.

Return type:

dict

responses(model, messages, **kwargs)

POST /v1/responses — stateful chat responses (OpenAI-compatible).

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • **kwargs (Any) – Additional parameters forwarded to the endpoint.

Return type:

dict

messages(model, messages, max_tokens, *, system=None, temperature=None, top_p=None, top_k=None)

POST /v1/messages — Anthropic-compatible messages API.

Parameters:
  • model (str) – Model identifier.

  • messages (list[dict]) – List of message dicts with role and content.

  • max_tokens (int) – Maximum tokens to generate (required by the API).

  • system (str | None) – System prompt.

  • temperature (float | None) – Sampling temperature.

  • top_p (float | None) – Nucleus sampling (0–1).

  • top_k (int | None) – Top-k sampling limit.

Return type:

dict

transcribe(model, file_path, *, language=None, prompt=None, response_format='json')

POST /v1/audio/transcriptions — transcribe audio to text.

Parameters:
  • model (str) – Whisper model identifier.

  • file_path (str) – Path to the audio file.

  • language (str | None) – Source language code (e.g. "en"); auto-detected if omitted.

  • prompt (str | None) – Optional context hint passed to the model.

  • response_format (str) – One of "json", "verbose_json", "text", "srt", or "vtt" (default "json").

Return type:

dict | str

Returns:

Parsed dict for JSON formats, plain text string otherwise.

translate(model, file_path, *, prompt=None, response_format='json')

POST /v1/audio/translations — transcribe audio and translate to English.

Parameters:
  • model (str) – Whisper model identifier.

  • file_path (str) – Path to the audio file.

  • prompt (str | None) – Optional context hint passed to the model.

  • response_format (str) – One of "json", "verbose_json", "text", "srt", or "vtt" (default "json").

Return type:

dict | str

Returns:

Parsed dict for JSON formats, plain text string otherwise.

speech(model, input, *, voice=None, speed=None)

POST /v1/audio/speech — synthesize speech from text.

Parameters:
  • model (str) – TTS model identifier.

  • input (str) – Text to synthesize.

  • voice (str | None) – Voice identifier.

  • speed (float | None) – Playback speed multiplier (default 1.0).

Return type:

bytes

Returns:

Raw audio bytes.

generate_image(model, prompt, *, n=None, size=None, steps=None, seed=None)

POST /v1/images/generations — generate images from a text prompt.

Parameters:
  • model (str) – Image generation model identifier.

  • prompt (str) – Text description of the desired image.

  • n (int | None) – Number of images to generate.

  • size (str | None) – Output dimensions, e.g. "512x512".

  • steps (int | None) – Diffusion steps.

  • seed (int | None) – Random seed for reproducibility.

Return type:

dict

edit_image(model, image_path, prompt, *, mask_path=None, n=None, size=None)

POST /v1/images/edits — edit an image guided by a text prompt.

Parameters:
  • model (str) – Image model identifier.

  • image_path (str) – Path to the source image.

  • prompt (str) – Edit instruction.

  • mask_path (str | None) – Optional greyscale mask (white = region to edit).

  • n (int | None) – Number of variants to generate.

  • size (str | None) – Output dimensions, e.g. "512x512".

Return type:

dict

image_variation(model, image_path, *, n=None, size=None)

POST /v1/images/variations — generate variations of an existing image.

Parameters:
  • model (str) – Image model identifier.

  • image_path (str) – Path to the source image.

  • n (int | None) – Number of variations to generate.

  • size (str | None) – Output dimensions, e.g. "512x512".

Return type:

dict