Prompt-Response Datasets
The Prompt-Response dataset type in UbiAI is designed to streamline the creation and management of datasets for training text generation models, such as Large Language Models. This comprehensive page of the documentation outlines how to work with Prompt-Response datasets.
Getting Started with Prompt-Response Datasets
To create your Prompt-Response dataset, UbiAI provides you with two primary options:
Generate Synthetic Dataset Using UbiAI
By clicking on the "Generate Dataset" button, you can use UbiAI’s dataset generator to create a synthetic dataset from scratch. Here are the steps:

click on the "Generate Dataset" option when configuring your prompt-response dataset.
Add your "Dataset Details" such as Name, Language and description.
Upload existing variables or add new ones to be used in the prompt to generate text based on their variants.
Set the parameters like the model you would like to use and its Temperature, and write the prompts to generate your text.
generate your dataset.
This is a quick and efficient way to produce a dataset without manually collecting data, offering flexibility and scalability for various use cases.
Upload an Existing Dataset
If you already have a dataset, you can upload it in CSV format. Ensure that your CSV file contains the

System Prompt: The initial instruction or context provided by the system.
User Prompt: The input or query from the user.
Input: Additional context or data relevant to generating a response.
Response: The output generated by the model based on the provided prompts.
After uploading your CSV file, UbiAI will guide you through a column mapping process. This step ensures that the platform understands how to interpret your dataset.
During mapping:
Assign each column in your CSV file to one of the four required fields: System Prompt, User Prompt, Input, and Response.
Once mapping is complete, click on the "Finish" button to finalize the process.
The platform will then start processing your data, transforming each row in the CSV file into an individual document in your dataset.
Editing Your Dataset
Once your dataset is processed, you can review and refine it as needed. UbiAI offers robust tools for dataset refinement. After processing, each document is accessible for review and modification:
To edit a document, click the "Edit" button in the top-right corner. This allows you to make adjustments to any field, ensuring data accuracy and consistency.
If needed, access the integrated Response Generator to create or modify responses dynamically without having to rely on an external source.
UbiAI’s integrated Response Generator is a powerful tool relies on LLMs to suggest new responses based on your dataset. Here’s how you can use it:

Step 1: Select an LLM
In the left-hand parameters menu, choose the model you want to use for response generation. UbiAI supports a variety of pre-trained models, enabling you to select one that fits your requirements.
Step 2: Adjust the Temperature
The temperature setting controls the creativity and variability of the model’s output:
A low temperature (e.g., 0.2) results in more predictable and deterministic responses.
A high temperature (e.g., 0.8) generates more diverse and creative responses.
Adjust this setting based on your project’s needs.
Step 3: Generate New Responses
After selecting the model and setting the temperature, click the "Generate" button in the top-right corner. The model will generate a new response for the selected prompt, which you can review and edit further if needed.
Validating Your Dataset
Validation ensures dataset quality and readiness for training. UbiAI provides two methods for validating documents:

Individual Validation
Manually validate each document one at a time, reviewing the prompt-response pairs to ensure they meet your quality standards.
Bulk Validation
For larger datasets, you can validate multiple documents simultaneously:
Navigate to the Dataset Versions menu.
Click the drop-down menu next to the name feild and select "Select All".
Click "Validate" to approve all selected documents.
Adding to Your Dataset
You can continue to expand your dataset even after initial creation. UbiAI provides two options:

Upload Additional CSV Files: Simply upload new CSV files, map their columns as before, and merge them into your existing dataset.
Generate New Responses: Click on the "Generate New Response" button to use UbiAI to create additional documents. The new documents will be based on the data you have already provided, ensuring consistency and relevance.
Using Your Dataset
When your dataset is ready, you can either use it to finetune on the platform or export it for external use. Exporting your dataset is very simple:

Go to the Dataset Versions menu.
Click on the "Export" button.
The platform will generate a CSV file containing all validated data. This file is immediately downloadable and compatible with various tools and frameworks.
Using the UbiAI API for Prompt-Response Datasets
For easy integration with your applications, UbiAI offers an API that supports dataset management. With the API, you can programmatically upload datasets, retrieve validated data, and interact with projects. Below are examples of API usage:

Upload files with API
You can select File Type (only CSV files are available for the Prompt-Response Datasets) and Upload using this API code:
import requests
import json
import mimetypes
import os
url ="https://api.ubiai.tools:8443/api_v1/upload"
my_token = "Your_Token: you can find ths token simply by clicking on the API button in the dataset versions menu"
""" types : csv """
file_type = "/csv"
list_of_file_path = []
urls = []
files = []
for file_path in list_of_file_path :
mime_type = mimetypes.guess_type(file_path)[0]
if os.path.splitext(file_path)[1].lower() == '.csv':
mime_type = 'text/csv'
files.append(('file',(os.path.basename(file_path ),open(file_path, 'rb'), mime_type)))
# Put the columns index the index (start from 1) in this order (user_prompt, system_prompt, response, input(Optional column))
text_generation_columns = []
data = {
'filesUrls' : urls,
'textGenerationColumns': text_generation_columns
}
response = requests.post(url+ my_token + file_type, files=files, data=data)
print(response.status_code)
res = json.loads(response.content.decode("utf-8"))
print(res)
Export files with API
Select Export Type (only CSV files are available for the Prompt-Response Datasets) and Export using this API code:
import requests
url ="https://api.ubiai.tools:8443/api_v1/download"
my_token = "Your_Token"
response = requests.get(url+ my_token)
print(response.status_code)
res = response.content.decode("utf-8")
print(res)
Last updated