Image-Based Datasets

In an Image-Based dataset, you can assign classes to individual images for image classification purposes.

Getting Started with Image-Based Datasets

Creating an Image-Based dataset in UbiAI is straightforward, with clear steps to guide you through the entire process.

Defining Labels

You will first need to define labels for your dataset. This step is optional but highly recommended.

Classification labels help categorize images into different classes. There are several types of classification you can choose from:

  • Binary Classification: Used when classifying images into two categories, such as "Positive" and "Negative." For example, classifying images as either "Cat" or "Not Cat."

  • Single Classification: Applied when each image fits into only one category (e.g., classifying images as "Beach," "Mountain," "Desert," etc.).

  • Multi-Classification: Used when an image can belong to multiple categories at once. For example, an image might be classified as both "Nature" and "Landscape."
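The three classification types above differ only in how many labels each image may carry. A minimal Python sketch of the distinction (the dictionary structures and file names here are purely illustrative, not UbiAI's internal or export format):

```python
# Illustrative only: hypothetical annotation structures, not UbiAI's format.

# Binary classification: exactly two possible classes.
binary_annotation = {"image": "photo_00.png", "label": "Cat"}  # or "Not Cat"

# Single classification: one class per image, chosen from a larger label set.
single_annotation = {"image": "photo_01.png", "label": "Beach"}

# Multi-classification: an image may carry several labels at once.
multi_annotation = {"image": "photo_02.png", "labels": ["Nature", "Landscape"]}

print(single_annotation["label"])
print(multi_annotation["labels"])
```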

Uploading Files

Once you've defined your labels, it's time to upload your files.

The supported file formats include:

  • PNG

  • JPEG

  • ZIP
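Before uploading a large batch, it can help to filter out files the platform will not accept. A small pre-flight check, assuming the extension set from the list above (with ".jpg" treated as equivalent to JPEG):

```python
import os

# Hypothetical pre-flight check; extensions taken from the supported-format
# list above, with ".jpg" assumed equivalent to JPEG.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".zip"}

def is_supported(file_path):
    """Return True if the file's extension matches a supported upload format."""
    return os.path.splitext(file_path)[1].lower() in SUPPORTED_EXTENSIONS

print(is_supported("dogs.PNG"))   # case-insensitive match
print(is_supported("notes.txt"))  # unsupported format
```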

Upload and Pre-processing

Click the "Upload" button and wait for the platform to process your files, then start annotating. Since this is an Image-Based dataset, the main focus will be on classification, so there's no need for entity or relation labels.

Object Detection

With object detection, you can draw a bounding box around a region of an image and assign it a label. This is useful for supplementing your OCR annotations with non-textual entities such as signatures, figures, and images.

To enable object detection, simply check the "Object Detection" box, draw a bounding box around the area of interest, and assign a label from the labels list. Object annotations will be included in the OCR JSON export format.

Export or Expand Your Dataset

Once your dataset is validated and ready, you can choose to either add more documents or export your annotated dataset.

  • Add More Data: If you wish to expand your dataset, simply upload new documents and add them to your existing dataset.

  • Export the Dataset: To use your annotated dataset outside of UbiAI, click on Export. You can filter the data based on specific labels and select a split ratio.
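The split ratio determines what fraction of the annotated documents goes into the training set versus the test set. UbiAI applies this server-side at export time; the following hypothetical helper only sketches the idea locally:

```python
import random

def split_by_ratio(items, split_ratio, seed=0):
    """Shuffle and split annotated documents into train/test sets.
    Hypothetical illustration of a split ratio; UbiAI performs the
    actual split server-side when you export."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]

docs = [f"doc_{i}" for i in range(10)]
train, test = split_by_ratio(docs, 0.8)
print(len(train), len(test))  # 8 2
```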

Supported formats include:

  • Amazon Comprehend

  • JSON

  • Spacy

  • Image Classification format

  • Relations format

  • Stanford CoreNLP

  • IOB format

A zip file containing the annotations, along with the documents used during annotation, will be downloaded. You will need to unzip the file before using the annotations to train a model.
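Unzipping the export can be done with Python's standard library. A minimal sketch, demonstrated here on a synthetic archive standing in for a real export (the file names inside are hypothetical):

```python
import os
import tempfile
import zipfile

def extract_export(zip_path, dest_dir):
    """Unzip an exported annotation archive into dest_dir and list its contents."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))

# Demo with a synthetic archive standing in for a real UbiAI export;
# the member names below are hypothetical.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "export.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("annotations.json", "[]")
    zf.writestr("images/photo_01.png", "")

print(extract_export(zip_path, os.path.join(tmp, "unzipped")))
```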

Using the UbiAI API for Image-Based Datasets

Upload Files with API

If you would like to pre-annotate your files, check the "Auto annotate while uploading" box in the interface and select the pre-annotation method before starting the upload.

To upload files using the UbiAI API, you can use the following code:

import json
import mimetypes
import os

import requests

url = "https://api.ubiai.tools:8443/api_v1/upload"
my_token = ""

# Supported types: json, image, csv, zip, text_docs
file_type = "/image"

# Used when uploading a CSV file, to select the column to import
column_number = 1

list_of_file_path = ['']
urls = []

files = []
for file_path in list_of_file_path:
    files.append((
        'file',
        (os.path.basename(file_path), open(file_path, 'rb'),
         mimetypes.guess_type(file_path)[0]),
    ))

data = {
    'train_type': 'Normal',
    'autoAssignToCollab': False,
    'taskType': 'TASK',
    'nbUsersPerDoc': '',
    'selectedUsers': '',
    'column_number': column_number,
    'filesUrls': urls,
}

response = requests.post(url + my_token + file_type, files=files, data=data)
print(response.status_code)
res = json.loads(response.content.decode("utf-8"))
print(res)

Export Files with API

To export files using the UbiAI API, you can use this code:

import requests

url = "https://api.ubiai.tools:8443/api_v1/download"
my_token = "/10d8df0c-ce93-11ef-9f1a-0242ac110009"

# Available (format, type) pairs:
# ('aws', 'Lists')
# ('spacy', 'Json')
# ('DocBin_NER', 'Json')
# ('spacy_training', 'Json')
# ('classification', 'Json')
# ('ocr1', '') ('ocr2', '') ('ocr3', '')
# ('stanford', '')
# ('iob', '')
# ('iob_pos', '')
# ('iob_chatbot', '')
file_type = "/json"

split_ratio = ""
params = {'splitRatio': split_ratio}

response = requests.get(url + my_token + file_type, params=params)
print(response.status_code)
res = response.content.decode("utf-8")
print(res)
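For export formats that return a binary archive rather than JSON text, decoding the body as UTF-8 is not appropriate; instead, write the raw bytes to disk before unzipping. A minimal sketch (the helper name and output path are hypothetical):

```python
import os
import tempfile

def save_export(content, out_path):
    """Write raw export bytes (e.g. response.content) to disk.
    Hypothetical helper; returns the number of bytes written."""
    with open(out_path, "wb") as f:
        f.write(content)
    return len(content)

# Demo with placeholder bytes standing in for a real response body.
out_path = os.path.join(tempfile.mkdtemp(), "export.zip")
print(save_export(b"demo", out_path))
```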
