About annotation in machine learning

Hello, this is RS.
I am engaged in annotation work, which is essential for AI technology.
Over the past few years, I feel that AI has shifted from being a cutting-edge technology still under development to one that is actively used in daily life. Many people have likely seen or heard about various AI-powered products and services, such as self-driving car technology and robot vacuums that automatically avoid obstacles. Some may also be using ChatGPT for work or personal use.
As AI technology continues to advance, I have noticed an increase in annotation service providers and more job postings for annotation-related work, indicating that annotation tasks themselves are becoming more widespread.
This time, I would like to introduce the world of annotation by explaining what annotation work involves and showcasing free annotation tools. At the end, I conducted a time attack using multiple annotation tools, and I will share my findings from that experience.
What is annotation?
Now that we have built up the anticipation, let's get to the point.
So, what exactly is annotation? The word itself means "notes" or "comments." Machine learning includes various approaches, one of which is "supervised learning": training AI on data such as text, audio, and images that has been tagged with metadata. The process of creating this training data is called annotation. Simply put, annotation is the task of creating the correct-answer data used for supervised machine learning.
For example, if you want AI to learn about "flowers," you prepare various images of flowers. You then tag each image with the label "flower" so that the AI knows the correct answer. In other words, you provide both the items you want the AI to recognize and the corresponding correct answers.
However, if you mistakenly tag an image with "nose" instead of "flower" (the two words are pronounced the same in Japanese), the AI will learn that a flower image represents "nose." This is why a large amount of accurate "correct data" is needed. In annotation work, it is crucial to follow the guidelines accurately and to complete a large amount of data correctly.
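To make this concrete, here is a minimal sketch of what labeled training data can look like, written as plain Python. The file names, labels, and check below are made up for illustration and are not tied to any particular tool.

```python
# A toy illustration of "correct data" for supervised learning:
# each image is paired with the label the AI should learn (file names are hypothetical).
training_data = [
    {"image": "flower_001.jpg", "label": "flower"},
    {"image": "flower_002.jpg", "label": "flower"},
    {"image": "flower_003.jpg", "label": "nose"},  # mislabeled: the AI would learn the wrong answer
]

# A simple sanity check: flag any label that is not in the agreed label set.
allowed_labels = {"flower"}
for item in training_data:
    if item["label"] not in allowed_labels:
        print(f"Label mismatch in {item['image']}: {item['label']!r}")
```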
Types of Annotation and Their Relationship with AI Machine Learning
Before explaining the types of annotation, let's first discuss the types of data involved. In machine learning, the data subject to annotation varies depending on the content being taught. The main types include:
- Image and video data
- Audio data
- Text data
This time, we will explain the types of annotation for image data.
A. Image Classification
This is the task of tagging what is depicted in a single image.
B. Object Detection
The task of enclosing objects that appear in an image or video with rectangles (bounding boxes) and adding tags.
This allows obtaining information about the location of the object within the image.
C. Region Detection
The task of enclosing and tagging objects or areas within an image or video.
Instead of using rectangles, objects are enclosed in their actual shape, allowing for more precise identification of the subject.
D. Coordinate Detection
Mainly used for the human body, this is the task of marking coordinate points on parts such as the face and body.
It is used for applications like facial recognition and posture estimation.
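To make the differences between these four types more concrete, here is a minimal sketch of what the resulting annotation data might look like as plain Python. Real tools export their own formats, and all values below are made-up examples.

```python
# Hypothetical annotation records for illustration (real export formats vary by tool).

# A. Image classification: one tag for the whole image.
classification = {"image": "flower_001.jpg", "label": "flower"}

# B. Object detection: rectangles (bounding boxes) plus tags,
#    with boxes given as [x_min, y_min, x_max, y_max] in pixels.
detection = {
    "image": "flower_001.jpg",
    "objects": [
        {"label": "flower", "bbox": [120, 80, 260, 230]},
        {"label": "flower", "bbox": [300, 150, 420, 290]},
    ],
}

# C. Region detection: the object's actual outline as a polygon,
#    so the shape is captured more precisely than with a rectangle.
region = {
    "image": "flower_001.jpg",
    "objects": [
        {"label": "flower", "polygon": [(130, 90), (250, 85), (255, 225), (125, 220)]},
    ],
}

# D. Coordinate detection: marked points, typically on the face or body.
keypoints = {
    "image": "person_001.jpg",
    "points": [
        {"name": "left_eye", "xy": (210, 140)},
        {"name": "right_eye", "xy": (250, 142)},
    ],
}
```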
Challenges of Annotation Work: Machine Learning Requires a Large Amount of Correct Data
We've explored various types of annotation, but in supervised machine learning, a large amount of "correct data" is required. Therefore, improving work efficiency is crucial. In my experience with annotation work, I have mostly used specialized annotation tools. However, even with dedicated tools, if their usage is complicated, the efficiency may not improve significantly.
Additionally, there are many annotation tools available. Each tool has its own features, and it's important to consider aspects such as workflow, ease of use, and overall user experience.
One Indicator of Usability: Trying a Time Attack!
Since annotation work involves handling large amounts of data efficiently, speed is just as important as accuracy. To evaluate this, I will introduce the actual workflow and conduct a time attack under the same conditions to test how much different tools impact work speed.
Now, let's see what kind of results we get—let's start measuring!
Tools to Be Used
For this test, I selected three tools that share the following characteristics:
- Free to use
- Capable of annotating various types of data
- Supports progress management
- Web-based, allowing easy trial use
The tools are:
- Labelbox
- ANNOFAB
- CVAT
Annotation Target
In regular tasks, the goal is to create "correct data" based on the client's intentions. This involves communicating with the client through guidelines and inquiries to clarify the criteria before proceeding.
For this test, I prepared images of flowers. The task is object detection, identifying "flowers" and "buds" in 10 selected images and measuring the time required to complete the annotation work.
Annotation Work Process and Time Measurement Method
In regular tasks, annotation work involves preparation before the task and a review process afterward. During preparation, the guidelines and data content are checked to prevent issues during execution. The review process ensures that the annotated data conforms to the guidelines. This is an essential step for annotation, where "accuracy is everything." By reviewing the data, either by oneself or by another person, inconsistencies can be caught before they make it into the final data. This time attack was conducted following the same procedure as regular work.
Time Measurement Method
Since different tools have varying workflows and settings, the time taken for each task was measured in three categories to make it easier to understand how long each step takes.
Preparation Time was measured from image upload to the completion of annotation preparation. This step involved uploading images, creating label names, assigning work to team members, and setting up the necessary configurations to start annotation work.
Execution Time was measured for the actual annotation work. This time, "object detection" was performed.
Review Time was measured from the start to the end of the review process after annotation work was completed. Once annotation is finished, checking for mistakes is mandatory. This is also an important step in actual tasks.
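As a rough illustration of the kind of mechanical checks a review can include, here is a minimal sketch that validates exported object-detection results. The record format, image sizes, and values are hypothetical and will differ between Labelbox, ANNOFAB, and CVAT; a real review also confirms that each box actually matches a flower or bud in the image.

```python
# Hypothetical exported boxes for the "flower"/"bud" task (format made up for illustration).
annotations = [
    {"image": "img_01.jpg", "width": 640, "height": 480, "label": "flower", "bbox": [50, 60, 200, 210]},
    {"image": "img_01.jpg", "width": 640, "height": 480, "label": "bad", "bbox": [300, 100, 280, 160]},
]

allowed_labels = {"flower", "bud"}

for ann in annotations:
    x_min, y_min, x_max, y_max = ann["bbox"]
    problems = []
    if ann["label"] not in allowed_labels:
        problems.append(f"unknown label {ann['label']!r}")
    if not (0 <= x_min < x_max <= ann["width"] and 0 <= y_min < y_max <= ann["height"]):
        problems.append("box coordinates are outside the image or inverted")
    if problems:
        print(f"{ann['image']}: " + "; ".join(problems))
```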
Results and Tool Usability
The tool with the fastest total time was CVAT.
Work Preparation
Among the three tools, ANNOFAB took the most time due to having more configuration options compared to the other two.
Work Execution
CVAT was the fastest to complete the annotation process. Its simple and intuitive operation seemed to be the key factor. In ANNOFAB, there was one additional required step, and skipping it made it impossible to complete the annotation. I often made the mistake of skipping this step, which resulted in a longer completion time.
Data Review
In ANNOFAB, the mistakes mentioned above created a sense of urgency, which in turn caused more errors than with the other tools. As a result, reviewing the data took longer. Although the same data was being annotated, accuracy fluctuated depending on timing and human factors.
Now, I will share my experience with each tool individually.
Labelbox
When uploading images, some vertical images were displayed horizontally. I checked whether this could be fixed within the tool, but I couldn't find a way to do so. I left them as they were this time, but depending on the data and situation, it might be necessary to correct the image orientation, so handling such cases is a bit of a dilemma.
Other than that, the tool was intuitive to use. Shortcut keys were displayed in the menu when performing operations. Since it takes some time to get used to the workflow when starting for the first time, having shortcuts displayed frequently helps in learning them quickly.
ANNOFAB
ANNOFAB offers various features for both progress management and annotation work. It allows detailed monitoring of work time and statistical analysis. While these features seem useful at first glance, they also make the settings more complex, making it harder to locate necessary functions. If mastered, the tool can be very powerful, but since I prefer simplicity, I found it challenging to use effectively.
CVAT
Compared to the other two tools, both the annotation preparation and the annotation process felt simpler. This contributed to CVAT being the fastest tool in the time attack test.
Overall Impressions of the Three Tools
If I were to summarize my impression of each tool in one sentence:
- If you want detailed progress management and insights to improve work speed → ANNOFAB
- If you prefer simple operations and management for annotation work → CVAT
- If you want something in between ANNOFAB and CVAT → Labelbox
All three tools had various features designed to improve efficiency.
For example:
- When creating rectangles, all three tools displayed vertical and horizontal guide lines to help place bounding boxes accurately in one attempt.
- When copying and pasting a bounding box, Labelbox and CVAT slightly offset the copied rectangle to prevent multiple bounding boxes from overlapping on a single object.
- All three tools displayed shortcut keys in the menu for easier accessibility.
Some might think, "Is such a small feature really that important?" However, annotation work is fundamentally repetitive. In the case of object detection, for example, the process consists of "finding an object in an image → creating a bounding box (or copying an existing one) → adjusting the size to fit the object." Once familiar with the process, this can be done in just a few seconds, but since this is repeated throughout the day, even small usability improvements can eliminate a few seconds of wasted time and reduce minor frustrations, ultimately impacting overall progress.
Further Efficiency Improvements: Automatic Annotation
To further improve efficiency, instead of manually annotating everything from scratch, some tools offer "automatic annotation" features. With this method, the tool annotates objects automatically, and the user simply verifies the results and makes corrections where needed. The more accurate the automatic annotation, the less time is spent on corrections, allowing workers to process more images in a day.
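As one way to picture how automatic annotation works behind the scenes (independent of the three tools above), here is a minimal sketch that uses a generic pretrained detector from torchvision to propose candidate boxes for a human to review. The image path and score threshold are assumptions, a recent torchvision is assumed, and a real setup would use a model trained on the target labels such as "flower" and "bud."

```python
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a generic pretrained detector; a real pre-annotation setup would use
# a model fine-tuned on the labels you actually need.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Read one image as a float tensor in [0, 1]; the path is hypothetical.
image = torchvision.io.read_image("images/flower_01.jpg").float() / 255.0

with torch.no_grad():
    prediction = model([image])[0]  # dict with "boxes", "labels", and "scores"

# Keep only confident detections as pre-annotations; the annotator then adjusts
# or deletes them instead of drawing every box from scratch.
threshold = 0.7
for box, score in zip(prediction["boxes"], prediction["scores"]):
    if score >= threshold:
        x_min, y_min, x_max, y_max = box.tolist()
        print(f"candidate box: ({x_min:.0f}, {y_min:.0f}, {x_max:.0f}, {y_max:.0f}) score={score:.2f}")
```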
All three tools mentioned in this comparison had an automatic annotation feature, but since some of them required a paid version, I did not test them. If you are interested, I encourage you to try them out.
In Conclusion
We have explained the annotation process. Were you able to form a clear picture of it? As we have seen, annotation is the process of adding tags or metadata to the data we want AI to learn from. Depending on the data required, various types of annotation work are performed, and in the case of images, we introduced four types of annotation tasks. If this has sparked even a little interest in annotation for you, we would be delighted.
Additionally, there are various tools available for annotation work, so we conducted a time attack challenge to provide some insight into their usability. For this test, we learned the minimum required operations and shortcut keys for object detection in each of the three tools and then attempted the time attack. The results were as shown above, and the time required may of course change depending on one's familiarity with each tool. Still, one new discovery was that the choice of tool produces real differences in work time: even when performing the same task, small differences in steps lead to differences in efficiency. We also reaffirmed that a tool's ease of use significantly impacts progress.
We hope this test serves as a useful reference for you in selecting the annotation tool that best fits your needs!
Thank you for reading until the end.
This article was originally published in Japanese on Sqripts and has been translated with minor edits for clarity.