Running open-source models
GLBNXT Platform makes a broad range of open-source language models available within your environment through a managed inference layer. Open-source models are served through Ollama, which handles model loading, runtime configuration, and endpoint management on your behalf. Your team connects to a stable model endpoint and works with open-source models exactly as it works with any other model in the Model Hub, with no additional infrastructure or configuration required.
This section explains how open-source models are made available on GLBNXT Platform, how to connect to them, and the considerations that apply when choosing and working with open-source models in production environments.
How Open-Source Models Are Served
Open-source models on GLBNXT Platform are served through Ollama, a managed inference runtime that supports a wide range of models from the open-source ecosystem. When a model is added to your environment, GLBNXT pulls the model, configures the serving runtime, allocates the appropriate compute resources, and exposes a stable API endpoint through the Model Hub. Your team does not interact with Ollama directly. It is a platform-managed component that operates transparently behind the model endpoints your applications call.
All open-source models run entirely within your platform environment, within EU infrastructure. Model weights are loaded and served on compute resources dedicated to your environment and do not pass through any external system. Inference requests and responses stay within your sovereign boundary at all times.
Connecting to an Open-Source Model
Open-source models are accessed through the same endpoint structure as all other models in the Model Hub. To connect to a model, navigate to the Model Hub in your platform console and locate the model you want to use. The endpoint URL and authentication requirements are listed in the model details.
Endpoints for open-source models served through Ollama are compatible with the OpenAI API format, meaning that applications and libraries built against the OpenAI API can connect to GLBNXT-hosted open-source models with minimal configuration changes. Replace the base URL with the endpoint URL from your Model Hub and provide the appropriate authentication credential. The rest of your application code requires no modification.
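As a minimal sketch of such a connection, the following Python snippet assembles an OpenAI-format chat completion request using only the standard library. The base URL, model name, and API key are hypothetical placeholders; substitute the endpoint URL and credential listed for the model in your Model Hub.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Assemble an OpenAI-format chat completion request for an
    open-source model endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # credential from the Model Hub
    }
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values -- replace with the endpoint URL and credential
# shown in the model details of your Model Hub.
req = build_chat_request(
    "https://models.example-env.glbnxt.eu",  # hypothetical endpoint URL
    "YOUR_API_KEY",
    "example-model",                         # hypothetical model name
    [{"role": "user", "content": "Summarise this ticket in one sentence."}],
)
# response = urllib.request.urlopen(req)    # sends the request when uncommented
```

Applications already built against an OpenAI client library can achieve the same effect by overriding the client's base URL rather than constructing requests by hand.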
Choosing the Right Open-Source Model
Open-source models vary significantly in capability, size, and performance characteristics. The right choice for a given use case depends on the nature of the task, the quality requirements of the output, and the latency and throughput constraints of the application.
As a general guide:
Larger models generally produce higher-quality outputs, handle complex reasoning tasks more reliably, and generalise better across a wide range of inputs. They consume more compute resources and have higher inference latency than smaller models.
Smaller models respond faster, consume fewer compute resources, and are well suited for high-volume use cases where a simpler task is being performed at scale. They are less capable on complex reasoning tasks but often perform well on focused, well-defined tasks such as classification, extraction, or summarisation with a clear prompt.
Instruction-tuned models are fine-tuned to follow natural language instructions and are the most appropriate choice for conversational assistants, question answering, and task-directed applications. Base models without instruction tuning are better suited to specialised fine-tuning use cases.
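The guidance above can be captured in application code as a simple task-to-model routing table. This is an illustrative sketch only; the model names are hypothetical placeholders, and the names available to you are those listed in your Model Hub.

```python
# Illustrative routing table -- model names are hypothetical
# placeholders, not actual entries in any Model Hub.
MODEL_BY_TASK = {
    "classification": "small-instruct",  # focused task at scale: smaller model
    "extraction":     "small-instruct",
    "summarisation":  "small-instruct",
    "chat":           "large-instruct",  # conversational: instruction-tuned
    "reasoning":      "large-instruct",  # complex reasoning: larger model
}

def pick_model(task: str) -> str:
    """Return a model name for the task type, defaulting to the
    larger instruction-tuned model for unknown task types."""
    return MODEL_BY_TASK.get(task, "large-instruct")
```

Keeping this mapping in one place makes it straightforward to swap models per task as your quality and latency requirements evolve.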
If you are unsure which model is the best fit for your use case, your GLBNXT contact can advise based on your specific requirements and the models available in your environment.
Model Availability and Requests
The open-source models available in your environment are configured during onboarding. If your team requires access to a model that is not currently in your Model Hub, submit a request to your GLBNXT contact. GLBNXT reviews the request, validates the model, and deploys it into your environment before making the endpoint available in the Hub.
GLBNXT maintains responsibility for validating that models added to your environment are appropriate for production use, correctly configured, and operating on compute resources that meet the performance requirements of your workloads.
Model Updates and Versioning
Open-source models are updated regularly by their maintainers. New versions may offer improved quality, expanded capabilities, or better performance characteristics compared to the version currently deployed in your environment. Model updates are managed by GLBNXT. When a new version of a model in your Hub is available and appropriate for production, GLBNXT can update the deployment following a review and validation process.
Because a model update may affect the outputs your application produces, version changes are communicated in advance so your team can test against the new version before it is promoted to the production endpoint. If your application has strict consistency requirements, discuss model versioning and update policies with your GLBNXT contact during onboarding.
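One way to test against a new version before promotion, sketched here under the assumption that you have collected outputs from both the current and candidate endpoints on a shared prompt set, is to measure how many outputs changed:

```python
def changed_fraction(current_outputs, candidate_outputs):
    """Fraction of test prompts whose output differs between the
    current model version and the candidate version."""
    if len(current_outputs) != len(candidate_outputs):
        raise ValueError("output lists must be the same length")
    changed = sum(1 for a, b in zip(current_outputs, candidate_outputs) if a != b)
    return changed / len(current_outputs)

# Example with stand-in outputs from a shared prompt set:
drift = changed_fraction(["yes", "no", "42"], ["yes", "no", "41"])
```

A high changed fraction is not necessarily a regression, but it signals that the affected prompts warrant manual review before the new version is promoted to the production endpoint.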
Prompt Engineering for Open-Source Models
Open-source models vary in how they respond to different prompting approaches. A prompt that works well with one model may produce different results with another, even for the same task. When building applications on GLBNXT Platform that use open-source models, it is worth investing time in prompt design and testing with your target model to understand how it responds to your specific inputs.
Key prompt engineering practices that apply across most open-source models include being explicit about the task and the expected output format, providing examples where the desired behaviour is specific or nuanced, using system prompts to set consistent context and behavioural boundaries for conversational applications, and testing with a representative set of real inputs rather than idealised examples.
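The practices above can be combined in a single message structure. The following sketch is illustrative; the wording and task are invented for the example and are not platform defaults.

```python
# A minimal prompt structure applying the practices above:
# explicit task and output format, a system prompt for consistent
# behaviour, and a worked example of the desired output.
messages = [
    # System prompt: sets context and behavioural boundaries.
    {"role": "system", "content": (
        "You are a support triage assistant. Classify each ticket as "
        "'billing', 'technical', or 'other'. Reply with the label only."
    )},
    # Few-shot example: demonstrates the exact expected output format.
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": "billing"},
    # Real input to classify.
    {"role": "user", "content": "The dashboard shows a 500 error."},
]
```

When testing, run structures like this against a representative sample of real tickets rather than the idealised examples used here.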
For a broader introduction to effective prompting practices, refer to the Anthropic prompting documentation at docs.claude.ai.
Observability and Usage
All inference requests made to open-source model endpoints are logged by the platform and visible through the Monitoring and Observability area of the platform console. Request volumes, response latency, token consumption, and error rates are available for each model in the Hub. This data supports capacity planning, cost management, and compliance reporting for your environment.
For guidance on how model usage data integrates with the broader observability and audit capabilities of the platform, see the Observability and Monitoring section.