LLM Monitoring in UbiAI
Monitoring is a critical step in the lifecycle of a fine-tuned LLM. Even after successful fine-tuning, a model’s performance can drift over time due to changes in data distributions, new user inputs, or evolving requirements. To ensure that your model remains effective, UbiAI offers a comprehensive LLM monitoring feature that logs and evaluates every interaction.
The UbiAI Monitoring Feature is a comprehensive tracking and evaluation system that allows users to monitor, assess, and improve their fine-tuned LLMs. Every time you use a locally fine-tuned model in the UbiAI playground or via API requests, the monitoring feature automatically logs the interaction so you can supervise your model and make improvements efficiently.
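For context, an API call to a fine-tuned model is the kind of interaction that gets logged. The sketch below builds such a request payload; the endpoint shape and field names are illustrative assumptions, not UbiAI's actual API schema.

```python
import json

def build_completion_request(model_id: str, version: str, prompt: str) -> dict:
    """Assemble a request body for a fine-tuned model call.

    Field names here are hypothetical; consult the UbiAI API reference
    for the real schema. The version and prompt are what later appear
    in the monitoring table as Model Version and Input Data.
    """
    return {
        "model": model_id,
        "version": version,
        "input": prompt,
    }

payload = build_completion_request("my-finetuned-llm", "v2", "Summarize this support ticket: ...")
print(json.dumps(payload, indent=2))
```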
To access the monitoring section:
Navigate to the Model Library in UbiAI.
Select any fine-tuned model you have recently trained.
Click on More Details to open the model details page.
Locate and click on the Monitoring tab at the top left of the page.
The monitoring section logs each interaction with your model and presents it in a structured table format. Each row in the table represents a single request-response interaction and contains the following details:
Model Version: Indicates which version of the model was used for the request.
Input Data: The text input that was sent to the model.
Output Data: The generated response from the model.
Evaluation: An automated, system-generated assessment that determines whether the model’s response was correct or incorrect (you can also define your own evaluation keys, such as factual / not factual).
User Rating: A manual rating where you can mark the response as Pending, Accurate, or Inaccurate.
Start Time: The timestamp of when the request was processed.
Latency: The time it took for the model to generate a response.
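To make the table columns concrete, here is a minimal sketch of what one logged interaction might look like as a record. The field names mirror the columns described above but are an assumption for illustration, not UbiAI's internal schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InteractionLog:
    """One row of the monitoring table (illustrative shape only)."""
    model_version: str
    input_data: str
    output_data: str
    evaluation: str                 # e.g. "correct" / "incorrect", or a custom key
    user_rating: str = "Pending"    # "Pending" | "Accurate" | "Inaccurate"
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    latency_ms: float = 0.0         # time to generate the response

row = InteractionLog(
    model_version="v3",
    input_data="Classify: 'refund not received'",
    output_data="billing_issue",
    evaluation="correct",
    latency_ms=412.5,
)
print(row.user_rating)  # stays "Pending" until you rate it manually
```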
Clicking on any row expands the details, allowing you to inspect the full input and output data. You can also navigate to the Evaluation tab to see an automated assessment of the model’s response.
As the number of logged interactions grows, finding specific requests can become challenging. To streamline this process, UbiAI provides search and sorting functionalities:
Use the search bar at the top to find specific queries based on keywords or phrases.
Sort rows based on various attributes to analyze patterns and trends in model performance.
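If you export logs for offline analysis, the same search-and-sort operations are easy to reproduce in plain Python. The record structure below is an assumption, not UbiAI's export format.

```python
# Toy exported logs (structure assumed for illustration).
logs = [
    {"input": "reset my password", "latency_ms": 120, "evaluation": "correct"},
    {"input": "refund status", "latency_ms": 480, "evaluation": "incorrect"},
    {"input": "password expired", "latency_ms": 95, "evaluation": "correct"},
]

# Keyword search, analogous to the monitoring search bar.
matches = [r for r in logs if "password" in r["input"]]

# Sort by latency to surface the slowest responses first.
slowest = sorted(logs, key=lambda r: r["latency_ms"], reverse=True)

print(len(matches), slowest[0]["input"])
```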
The UbiAI LLM Evaluation feature automates the evaluation of your fine-tuned models using an LLM as a judge. This system reviews each request and its corresponding output, providing a structured way to monitor performance.
To enable automatic evaluations, click on the LLM Eval Settings button to open the configuration panel. Here, you can configure two key components:
LLM Parameters: Define the LLM that will act as the evaluator.
Evaluation Parameters: Customize how the evaluation process works.
Within the Evaluation Parameters, you can:
Define the Evaluation Template, which serves as the prompt for the judging LLM.
Set Output Rails, providing stricter control over acceptable responses.
Enable or disable Explanations, allowing the evaluator to justify its decisions.
Activate Function Calling, if needed for more advanced evaluations.
Once these settings are applied, UbiAI will analyze and rate model outputs based on your configured parameters.
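The core of an LLM-as-a-judge setup is the evaluation template: a prompt that wraps each input/output pair before sending it to the evaluator model. The sketch below shows the general pattern; the wording, placeholders, and label set are illustrative assumptions, not UbiAI's built-in template.

```python
# Hypothetical evaluation template -- UbiAI's actual template syntax may differ.
EVAL_TEMPLATE = """You are an impartial evaluator.
Question: {input}
Model answer: {output}
Label the answer as exactly one of: correct, incorrect.
Explanation: justify your label in one sentence."""

def build_judge_prompt(input_text: str, output_text: str) -> str:
    """Fill the template with one logged interaction."""
    return EVAL_TEMPLATE.format(input=input_text, output=output_text)

prompt = build_judge_prompt("What is 2 + 2?", "4")
print(prompt)
```

Output rails then constrain the judge's reply to the allowed labels, and enabling explanations keeps the one-sentence justification in the log.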
Once you have collected evaluation data, the next step is to use it to refine and improve your model.
Review the Evaluations: Look through the monitoring table and assess the evaluation accuracy.
Select Key Data: Identify rows with inaccurate responses to correct, or accurate ones worth reinforcing.
Append Data to a Dataset: Use the Append button to add these interactions to any dataset.
Retrain Your Model: Incorporate the updated dataset into your next fine-tuning process.
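The steps above can be sketched as a simple selection loop: pull out interactions rated inaccurate and append them, with corrected targets, to a fine-tuning dataset. The record and dataset fields here are assumptions that mirror the monitoring table, not UbiAI's actual export or dataset format.

```python
# Toy logged interactions (fields assumed for illustration).
logged = [
    {"input": "Cancel my plan", "output": "ok", "user_rating": "Inaccurate"},
    {"input": "Upgrade my tier", "output": "Done, upgraded to Pro.", "user_rating": "Accurate"},
]

dataset = []
for row in logged:
    if row["user_rating"] == "Inaccurate":
        # In practice, supply the corrected target output here before retraining.
        dataset.append({"prompt": row["input"], "completion": "<corrected answer>"})

print(len(dataset))
```

In UbiAI itself, the Append button performs this selection for you; the loop just makes the logic explicit.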
Even if a model initially performs well, data drift can cause accuracy to degrade over time. User behavior evolves, new data patterns emerge, and expectations shift. By continuously monitoring, evaluating, and retraining based on logged interactions, you can ensure that your fine-tuned LLM remains accurate, reliable, and optimized.