How to Train EasyOCR on a Custom Dataset

Ali Mustofa
4 min read · Jan 24, 2023
https://github.com/JaidedAI/EasyOCR

The EasyOCR package is created and maintained by Jaided AI, a company that specializes in Optical Character Recognition services. EasyOCR is implemented in Python on top of the PyTorch library. If you have a CUDA-capable GPU, PyTorch can speed up text detection and recognition considerably. As of this writing, EasyOCR can OCR text in 80+ languages, including English, German, Hindi, and Russian. The EasyOCR maintainers plan to add more languages in the future.
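For context before training, here is a minimal sketch of how the pretrained model is typically called. `easyocr.Reader` and `readtext` are the library's public entry points; the helper names `read_text` and `keep_confident`, the default language list, and the 0.5 threshold are my own choices for illustration, not part of EasyOCR.

```python
def read_text(image_path, languages=("en",), use_gpu=False):
    """Run the pretrained EasyOCR model on one image.

    Returns a list of (bounding_box, text, confidence) entries.
    """
    # Imported here so the rest of the module works without easyocr installed.
    import easyocr

    reader = easyocr.Reader(list(languages), gpu=use_gpu)
    return reader.readtext(image_path)


def keep_confident(results, min_confidence=0.5):
    """Keep only (text, confidence) pairs at or above the threshold.

    `results` is expected in the (bounding_box, text, confidence) shape
    returned by readtext; the threshold value is an arbitrary example.
    """
    return [(text, conf) for _bbox, text, conf in results if conf >= min_confidence]
```

Low-confidence results on your own images are usually the first sign that a custom-trained model is worth the effort.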

In this post I share a step-by-step tutorial for training EasyOCR on your own dataset.

  1. Prepare dataset

The dataset here is custom data, so you have to label it yourself. I have a tool for that, EasyOCRLabel. EasyOCRLabel is a semi-automatic graphical annotation tool for OCR, with a built-in EasyOCR model that automatically detects and re-recognizes text. It is written in Python 3 and PyQt5 and supports rectangular box, table, irregular text, and key information annotation modes. The annotations can be used directly to train EasyOCR detection and recognition models.

To install and run EasyOCRLabel:

# clone repository
git clone https://github.com/Alimustoofaa/EasyOCRLabel.git
# move to directory
cd EasyOCRLabel
# install the requirements
pip3 install -r requirements.txt
# run
python3 EasyOCRLabel.py
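After labelling, it can help to sanity-check the recognition data before training. The sketch below is only an illustration under an assumption: that the data sits in one folder of images plus a `labels.csv` file with a `filename,words` header row, as in the sample dataset of the EasyOCR trainer. The function name `check_labels` and the wording of the messages are hypothetical, and the exact files EasyOCRLabel exports are not covered here.

```python
from pathlib import Path


def check_labels(dataset_dir, labels_name="labels.csv"):
    """Return a list of problems found in an assumed images + labels.csv folder."""
    dataset_dir = Path(dataset_dir)
    problems = []
    lines = (dataset_dir / labels_name).read_text(encoding="utf-8").splitlines()

    # Skip the assumed header row (filename,words); data starts on line 2.
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        # Split on the first comma only, so commas inside the text are kept.
        filename, _, words = line.partition(",")
        if not (dataset_dir / filename).is_file():
            problems.append(f"line {line_no}: missing image {filename}")
        if not words.strip():
            problems.append(f"line {line_no}: empty label for {filename}")
    return problems
```

An empty list means every labelled row points at an existing image and has non-empty text; anything else is worth fixing in the labelling tool before you start a training run.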
