Prompt-Response Datasets

The Prompt-Response dataset type in UbiAI is designed to streamline the creation and management of datasets for training text generation models, such as Large Language Models. This comprehensive page of the documentation outlines how to work with Prompt-Response datasets.

Getting Started with Prompt-Response Datasets

To create your Prompt-Response dataset, UbiAI provides you with two primary options:

Generate Synthetic Dataset Using UbiAI

By clicking on the "Generate Dataset" button, you can use UbiAI’s dataset generator to create a synthetic dataset from scratch. Here are the steps:

  • click on the "Generate Dataset" option when configuring your prompt-response dataset.

  • Add your "Dataset Details" such as Name, Language and description.

  • Upload existing variables or add new ones to be used in the prompt to generate text based on their variants.

  • Set the parameters like the model you would like to use and its Temperature, and write the prompts to generate your text.

  • generate your dataset.

This is a quick and efficient way to produce a dataset without manually collecting data, offering flexibility and scalability for various use cases.

Upload an Existing Dataset

If you already have a dataset, you can upload it in CSV format. Ensure that your CSV file contains the

  • System Prompt: The initial instruction or context provided by the system.

  • User Prompt: The input or query from the user.

  • Input: Additional context or data relevant to generating a response.

  • Response: The output generated by the model based on the provided prompts.

After uploading your CSV file, UbiAI will guide you through a column mapping process. This step ensures that the platform understands how to interpret your dataset.

During mapping:

  • Assign each column in your CSV file to one of the four required fields: System Prompt, User Prompt, Input, and Response.

  • Once mapping is complete, click on the "Finish" button to finalize the process.

  • The platform will then start processing your data, transforming each row in the CSV file into an individual document in your dataset.

To get started, you can download publicly available datasets in CSV format from platforms like Hugging Face.

Editing Your Dataset

Once your dataset is processed, you can review and refine it as needed. UbiAI offers robust tools for dataset refinement. After processing, each document is accessible for review and modification:

  • To edit a document, click the "Edit" button in the top-right corner. This allows you to make adjustments to any field, ensuring data accuracy and consistency.

  • If needed, access the integrated Response Generator to create or modify responses dynamically without having to rely on an external source.

UbiAI’s integrated Response Generator is a powerful tool relies on LLMs to suggest new responses based on your dataset. Here’s how you can use it:

Step 1: Select an LLM

In the left-hand parameters menu, choose the model you want to use for response generation. UbiAI supports a variety of pre-trained models, enabling you to select one that fits your requirements.

Step 2: Adjust the Temperature

The temperature setting controls the creativity and variability of the model’s output:

  • A low temperature (e.g., 0.2) results in more predictable and deterministic responses.

  • A high temperature (e.g., 0.8) generates more diverse and creative responses.

Adjust this setting based on your project’s needs.

Step 3: Generate New Responses

After selecting the model and setting the temperature, click the "Generate" button in the top-right corner. The model will generate a new response for the selected prompt, which you can review and edit further if needed.

Validating Your Dataset

Validation ensures dataset quality and readiness for training. UbiAI provides two methods for validating documents:

Individual Validation

Manually validate each document one at a time, reviewing the prompt-response pairs to ensure they meet your quality standards.

Bulk Validation

For larger datasets, you can validate multiple documents simultaneously:

  • Navigate to the Dataset Versions menu.

  • Click the drop-down menu next to the name feild and select "Select All".

  • Click "Validate" to approve all selected documents.

Adding to Your Dataset

You can continue to expand your dataset even after initial creation. UbiAI provides two options:

  • Upload Additional CSV Files: Simply upload new CSV files, map their columns as before, and merge them into your existing dataset.

  • Generate New Responses: Click on the "Generate New Response" button to use UbiAI to create additional documents. The new documents will be based on the data you have already provided, ensuring consistency and relevance.

Using Your Dataset

When your dataset is ready, you can either use it to finetune on the platform or export it for external use. Exporting your dataset is very simple:

  • Go to the Dataset Versions menu.

  • Click on the "Export" button.

  • The platform will generate a CSV file containing all validated data. This file is immediately downloadable and compatible with various tools and frameworks.

Using the UbiAI API for Prompt-Response Datasets

For easy integration with your applications, UbiAI offers an API that supports dataset management. With the API, you can programmatically upload datasets, retrieve validated data, and interact with projects. Below are examples of API usage:

Upload files with API

You can select File Type (only CSV files are available for the Prompt-Response Datasets) and Upload using this API code:

If you would like to pre-annotate your files, check the "Auto annotate while uploading" box and select the method of pre-annotation as shown below:

import requests
import json
import mimetypes
import os

url ="https://api.ubiai.tools:8443/api_v1/upload"
my_token = "Your_Token: you can find ths token simply by clicking on the API button in the dataset versions menu"
"""  types : csv  """
file_type = "/csv"

list_of_file_path = []
urls = []
files = []
for file_path in list_of_file_path :
    mime_type = mimetypes.guess_type(file_path)[0]
    if os.path.splitext(file_path)[1].lower() == '.csv':         
        mime_type = 'text/csv'
    files.append(('file',(os.path.basename(file_path ),open(file_path, 'rb'), mime_type)))

# Put the columns index the index (start from 1) in this order (user_prompt, system_prompt, response, input(Optional column))
text_generation_columns = []

data = {
  'filesUrls' : urls,
  'textGenerationColumns': text_generation_columns
}

response = requests.post(url+ my_token + file_type, files=files, data=data)
print(response.status_code)
res = json.loads(response.content.decode("utf-8"))
print(res)

Export files with API

Select Export Type (only CSV files are available for the Prompt-Response Datasets) and Export using this API code:

import requests
url ="https://api.ubiai.tools:8443/api_v1/download"
my_token = "Your_Token"

response = requests.get(url+ my_token)
print(response.status_code)
res = response.content.decode("utf-8")
print(res)

Last updated