How to Extract Text from an Image and Video with pytesseract and OpenCV in Python






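Make sure you have installed pytesseract, and the Tesseract OCR program it calls, before running this code.

Note: the following line tells pytesseract where the Tesseract executable is installed on my laptop, so that it can be used when our commands run. Change the path to match your own installation.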

pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
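
The hard-coded path above only works on a machine where Tesseract is installed in the same place. As a rough sketch, assuming Tesseract may instead be on the system PATH, the lookup could be done as follows. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical and not part of pytesseract; only `shutil.which` and the `tesseract_cmd` attribute are existing APIs.

```python
import os
import shutil

# Fallback location taken from this post; adjust it for your own machine.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the Tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # look on the system PATH first
    if on_path:
        return on_path
    for path in candidates:  # then try the known install locations
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable found by find_tesseract()."""
    import pytesseract  # imported here so find_tesseract() works without it

    exe = find_tesseract()
    if exe is None:
        raise FileNotFoundError("Tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling `configure_pytesseract()` once at the top of a script would then replace the hard-coded assignment.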


Image as input 



First, we will see how to extract text from an image that contains English text.

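import cv2
import pytesseract

# Path to the Tesseract executable (adjust for your machine)
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# IMREAD_GRAYSCALE loads the image as a single-channel grayscale image
img = cv2.imread('scanned.png', cv2.IMREAD_GRAYSCALE)
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)  # wait for a key press before closing the window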

Output : NEW DELHI: The Central Board of Secondary Education (CBSE) and Council for the
Indian School Certificate Examinations (CISCE) on Thursday submitted the
assessment system for class 12 students in the Supreme Court and said the results

will be declared by July 31.

Attorney general (AG) K K Venugopal placed the scheme for CBSE which said
performance of students in class 10, 11 and 12 examinations will be considered.


Extracting non-English text from an image. In this case, I will extract Hindi text from an image.

Before we start, make sure you have installed the language data for the language you want to extract; in my case it is Hindi. Notice that the code below calls pytesseract.image_to_string(Image.open('erw.jpg'), lang='hin'). Here lang means language and 'hin' is the code for Hindi.

To find the code for another language, run print(pytesseract.get_languages(config='')). It lists all the languages you have installed. If your desired language is not in that list, install it first, otherwise this will not work.

Image as input




import cv2
import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread('erw.jpg')  # loaded with OpenCV only to display it
# lang='hin' selects the Hindi language data
print(pytesseract.image_to_string(Image.open('erw.jpg'), lang='hin'))

cv2.imshow('Result', img)
cv2.waitKey(0)

Output : मकड़ियों की चौंकाने वाली तस्वीरें: ऑस्ट्रेलिया के बाढ़ग्रस्त
इलाके में मकड़ियों ने जाल की चादर डाली, पानी से बचने के
लिए दूर तक बड़ा और ट्रांसपेरेंट जाल बनाया
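
The language check described above can also be done in code before running OCR. This is a minimal sketch, not the only way to do it: `missing_languages` and `ocr_in_language` are hypothetical helper names, while `pytesseract.get_languages` and the `lang` argument are the calls already used in this post. Tesseract accepts several codes joined with `+` (for example `'hin+eng'`), so the helper splits on that character.

```python
def missing_languages(requested, installed):
    """Return the codes in `requested` (e.g. 'hin' or 'hin+eng') not found in `installed`."""
    return [code for code in requested.split("+") if code not in installed]


def ocr_in_language(image_path, lang="hin"):
    """OCR an image file, failing early if the language data is missing."""
    import pytesseract  # lazy imports: the check above needs no extra packages
    from PIL import Image

    missing = missing_languages(lang, pytesseract.get_languages(config=""))
    if missing:
        raise RuntimeError("Tesseract language data not installed: " + ", ".join(missing))
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

With this in place, `print(ocr_in_language('erw.jpg', 'hin'))` would either print the Hindi text or name the language data that still needs to be installed.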



You can also use the same code to read the details on a car number plate.
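
For a number plate, the raw OCR result often contains stray spaces and punctuation. The sketch below assumes the image is already cropped to the plate and that the plate holds only Latin letters and digits. The file name `plate.jpg` and the helpers `clean_plate_text` and `read_plate` are hypothetical; `--psm 7` is Tesseract's page segmentation mode for treating the image as a single line of text.

```python
import re


def clean_plate_text(raw):
    """Keep only letters and digits from the OCR output, in upper case."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def read_plate(image_path="plate.jpg"):
    """OCR a cropped number plate image and return the cleaned text."""
    import cv2  # lazy imports so clean_plate_text() can be used on its own
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(image_path)
    # --psm 7: treat the whole image as one line of text
    raw = pytesseract.image_to_string(img, config="--psm 7")
    return clean_plate_text(raw)
```

For example, `clean_plate_text("mh 12-ab 1234\n")` returns `"MH12AB1234"`.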



Extracting text from a webcam:
Warning: on a slow system the output from this code will lag.

 
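import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
cap = cv2.VideoCapture(0)  # 0 is the default webcam
while True:
    ok, frame = cap.read()
    if not ok:  # no frame available (camera missing or disconnected)
        break
    imgH, imgW, _ = frame.shape  # shape is (height, width, channels)
    x1, y1, w1, h1 = 0, 0, imgW, imgH
    imgchar = pytesseract.image_to_string(frame)
    imgboxes = pytesseract.image_to_boxes(frame)  # one box per character in this frame
    for boxes in imgboxes.splitlines():
        # Each line is: character, left, bottom, right, top, page
        boxes = boxes.split(' ')
        x, y, w, h = int(boxes[1]), int(boxes[2]), int(boxes[3]), int(boxes[4])
        # Tesseract measures y from the bottom edge, OpenCV from the top
        cv2.rectangle(frame, (x, imgH - y), (w, imgH - h), (0, 0, 255), 3)
    # Draw the recognised text once per frame, near the top-left corner
    cv2.putText(frame, imgchar, (x1 + int(w1 / 50), y1 + int(h1 / 50)), cv2.FONT_HERSHEY_COMPLEX, 0.7, (0, 0, 255), 2)
    # Show the frame and check for 'q' once per frame, so break leaves the while loop
    cv2.imshow('text', frame)
    if cv2.waitKey(2) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()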
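
One way to reduce the lag mentioned in the warning is to run OCR on only some of the frames and reuse the last result in between. The sketch below shows that idea under stated assumptions: `ocr_every_nth`, `read_frames` and `show_with_text` are hypothetical helpers, `every_n=10` is an arbitrary choice, and `'video.mp4'` is a hypothetical file name. `cv2.VideoCapture` accepts either a camera index or a video file path, so the same loop would also cover a video file. `tesseract_cmd` still has to be set first, as in the code above.

```python
def ocr_every_nth(frames, ocr, every_n=10):
    """Yield (frame, text), calling `ocr` only on every `every_n`-th frame.

    Frames in between reuse the most recent OCR result.
    """
    text = ""
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            text = ocr(frame)
        yield frame, text


def read_frames(source=0):
    """Yield frames from a webcam index (0) or a video file path."""
    import cv2

    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or camera unavailable
                break
            yield frame
    finally:
        cap.release()


def show_with_text(source=0, every_n=10):
    """Display frames with the first line of recognised text drawn on top."""
    import cv2
    import pytesseract

    frames = read_frames(source)
    for frame, text in ocr_every_nth(frames, pytesseract.image_to_string, every_n):
        first_line = text.strip().split("\n")[0]
        cv2.putText(frame, first_line, (10, 30), cv2.FONT_HERSHEY_COMPLEX, 0.7, (0, 0, 255), 2)
        cv2.imshow('text', frame)
        if cv2.waitKey(2) & 0xFF == ord('q'):
            break
    cv2.destroyAllWindows()
```

`show_with_text(0)` would read from the webcam and `show_with_text('video.mp4')` from a file. A larger `every_n` gives smoother video at the cost of text that updates less often.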


Shorter version of this code, for a single image:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
# IMREAD_GRAYSCALE loads the image as a single-channel grayscale image
img = cv2.imread('scanned.png', cv2.IMREAD_GRAYSCALE)
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
