UBIAI LLM Fine-tuning

Image-Based Datasets


Last updated 5 months ago

In an Image-Based dataset, you assign classes to individual images for image classification.

Getting Started with Image-Based Datasets

Creating an Image-Based dataset in UbiAI is straightforward, with clear steps to guide you through the entire process.

Defining Labels

Start by defining labels for your dataset. This step is optional but highly recommended.

Classification labels help categorize images into different classes. You can choose from the following classification types:

  • Binary Classification: Used when classifying images into two categories, such as "Positive" and "Negative." For example, classifying images as either "Cat" or "Not Cat."

  • Single Classification: Applied when each image fits into only one category (e.g., classifying images as "Beach," "Mountain," "Desert," etc.).

  • Multi-Classification: Used when an image can belong to multiple categories at once. For example, an image might be classified as both "Nature" and "Landscape."
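To make the difference between the three types concrete, here is a minimal sketch that checks whether a set of assigned classes is valid for a given classification type. The `validate_labels` helper and its arguments are hypothetical illustrations, not part of UbiAI or its export schema.

```python
# Hypothetical illustration only: this is not UbiAI's internal or export schema.

def validate_labels(kind, classes, assigned):
    """Return True if `assigned` is a valid labeling for the classification kind.

    kind     -- "binary", "single", or "multi"
    classes  -- the labels defined for the dataset
    assigned -- the labels given to one image
    """
    if kind not in ("binary", "single", "multi"):
        raise ValueError(f"unknown classification kind: {kind}")
    # Binary classification works with exactly two classes.
    if kind == "binary" and len(classes) != 2:
        return False
    # Every assigned label must be defined, and none may repeat.
    if any(label not in classes for label in assigned):
        return False
    if len(set(assigned)) != len(assigned):
        return False
    # Multi-classification allows several labels; the others allow exactly one.
    if kind == "multi":
        return len(assigned) >= 1
    return len(assigned) == 1


print(validate_labels("binary", ["Cat", "Not Cat"], ["Cat"]))
print(validate_labels("single", ["Beach", "Mountain", "Desert"], ["Beach", "Desert"]))
print(validate_labels("multi", ["Nature", "Landscape"], ["Nature", "Landscape"]))
```

The second call returns False because a single-classification image can carry only one label, while the same pair of labels would be accepted under multi-classification.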

Uploading Files

Once you've defined your labels, it's time to upload your files.

The supported file formats include:

  • PNG

  • JPEG

  • ZIP

Maximum File Size: The maximum size for each uploaded file is 500MB.
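Checking files locally before uploading can save a failed upload. The sketch below applies the format and size limits listed above; the `check_upload` helper is hypothetical, and treating `.jpg` as JPEG and 1 MB as 1024 × 1024 bytes are assumptions rather than documented platform behavior.

```python
import os

# Extensions from the list above; ".jpg" is ASSUMED to be accepted as JPEG.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".zip"}
# 500 MB limit from the text above; the byte conversion is an assumption.
MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024


def check_upload(path, size_bytes=None):
    """Return a list of problems for `path`; an empty list means it passes.

    `size_bytes` may be passed explicitly; otherwise it is read from disk.
    """
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(check_upload("photo.png", size_bytes=2_000_000))
print(check_upload("scan.gif", size_bytes=2_000_000))
```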

Upload and Pre-processing

Click the "Upload" button, wait for the platform to process your files, and then start annotating. Since this is an Image-Based dataset, the main focus is classification, so there’s no need for entity or relation labels.

There is no pre-annotation for the Image-Based dataset type.

Object Detection

With object detection, you can draw a bounding box around a region of an image and assign it a label. This is useful for supplementing your OCR annotation with non-textual entities such as signatures, figures, and images.

To enable object detection, check the Object Detection box, draw a bounding box around the area of interest, and assign a label from the labels list. The annotated objects are included in the OCR JSON export format.

Export or Expand Your Dataset

Once your dataset is validated and ready, you can choose to either add more documents or export your annotated dataset.

  • Add More Data: If you wish to expand your dataset, simply upload new documents and add them to your existing dataset.

  • Export the Dataset: To use your annotated dataset outside of UbiAI, click on Export. You can filter the data based on specific labels and select a split ratio.

Supported formats include:

  • Amazon Comprehend

  • JSON

  • Spacy

  • Image Classification format

  • Relations format

  • Stanford CoreNLP

  • IOB format

The export downloads a ZIP file containing the annotations along with the documents used during annotation. Unzip the file before using the annotations to train a model.

For macOS users, it is recommended to unzip the file using WinZip in order to preserve file names.

Using the UbiAI API for Image-Based Datasets
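If you prefer to unpack the export from a script, Python's standard `zipfile` module can do it. This is a minimal sketch; the `extract_export` helper and the example file names are hypothetical, and the actual contents of the archive depend on the export format you chose.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, target_dir):
    """Extract the archive into `target_dir` and return the sorted member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Example usage (hypothetical file names):
# names = extract_export("ubiai_export.zip", "ubiai_export")
# print(names)
```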

Upload Files with API

If you would like to pre-annotate your files, check the "Auto annotate while uploading" box and select the method of pre-annotation.

To upload files using the UbiAI API, you can use the following code:

import json
import mimetypes
import os

import requests

url = "https://api.ubiai.tools:8443/api_v1/upload"
# Your API token, including the leading "/" (it is concatenated into the URL).
my_token = ""
# Supported types: json, image, csv, zip, text_docs
file_type = "/image"

# Only used when uploading a CSV file: selects the column to read.
column_number = 1

# Local paths of the files to upload.
list_of_file_path = ['']
# Sent in the 'filesUrls' field; left empty in this example.
urls = []

files = []
for file_path in list_of_file_path:
    files.append((
        'file',
        (
            os.path.basename(file_path),
            open(file_path, 'rb'),
            mimetypes.guess_type(file_path)[0],
        ),
    ))

data = {
    'train_type': 'Normal',
    'autoAssignToCollab': False,
    'taskType': 'TASK',
    'nbUsersPerDoc': '',
    'selectedUsers': '',
    'column_number': column_number,
    'filesUrls': urls,
}

# The final URL has the form <url>/<token>/<file type>.
response = requests.post(url + my_token + file_type, files=files, data=data)
print(response.status_code)
res = json.loads(response.content.decode("utf-8"))
print(res)
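Because the request URL is built by plain string concatenation, a missing or doubled "/" in the token or file type produces a wrong endpoint. The following sketch normalizes the three parts before joining them; `build_endpoint` is a hypothetical helper, not part of the UbiAI API, and the token shown is a placeholder.

```python
def build_endpoint(base_url, token, file_type):
    """Join base URL, token, and file type with exactly one "/" between them."""
    token = token.strip("/")
    file_type = file_type.strip("/")
    if not token:
        raise ValueError("token must not be empty")
    if not file_type:
        raise ValueError("file_type must not be empty")
    return "/".join([base_url.rstrip("/"), token, file_type])


# "my-token" is a placeholder, not a real token.
print(build_endpoint("https://api.ubiai.tools:8443/api_v1/upload", "/my-token", "/image"))
```

The result can be passed to `requests.post` in place of `url + my_token + file_type`.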

Export Files with API

To export files using the UbiAI API, you can use this code:

import requests

url = "https://api.ubiai.tools:8443/api_v1/download"
# Your API token, including the leading "/".
my_token = "/10d8df0c-ce93-11ef-9f1a-0242ac110009"

# Export format options:
# ('aws', 'Lists')
# ('spacy', 'Json')
# ('DocBin_NER', 'Json')
# ('spacy_training', 'Json')
# ('classification', 'Json')
# ('ocr1', '') ('ocr2', '') ('ocr3', '')
# ('stanford', '')
# ('iob', '')
# ('iob_pos', '')
# ('iob_chatbot', '')
file_type = "/json"

# Leave empty to export without a split ratio.
split_ratio = ""
params = {'splitRatio': split_ratio}

# The final URL has the form <url>/<token>/<file type>.
response = requests.get(url + my_token + file_type, params=params)
print(response.status_code)
res = response.content.decode("utf-8")
print(res)
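Printing the response is handy for a quick check, but you will usually want to keep the export on disk. The sketch below writes the response body to a file only when the request succeeded. The `save_export` helper is hypothetical, and treating status code 200 as the only success value is an assumption about the API.

```python
from pathlib import Path


def save_export(response, out_path):
    """Write `response.content` to `out_path` if the request succeeded.

    `response` is any object with `status_code` and `content` attributes,
    such as the object returned by `requests.get`.
    Returns True when the file was written, False otherwise.
    """
    # ASSUMPTION: only HTTP 200 is treated as a successful export.
    if response.status_code != 200:
        return False
    Path(out_path).write_bytes(response.content)
    return True


# Example usage with the `response` object from the export code above
# (the output file name is hypothetical):
# saved = save_export(response, "ubiai_export.json")
```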