Image to Text - OCR - Tesseract - Linux - Tutorial

http://filmsbykris.com
For help: http://filmsbykris.com/irc
FaceBook: https://www.facebook.com/pages/Films-By-Kris/225113590836253
Closed Caption:

Today we're going to be looking at the command line, and we're going to be looking at OCR applications. OCR, I believe, stands for optical character recognition. Basically, this allows you to take images that have text in them and convert them to actual text that you can highlight, copy, and search through.

It's not a perfect solution, because there are so many different fonts out there and text can be so artistic in some ways, but if you have very simple text you can get a halfway decent output with some of these programs. We're going to go through them today, look at ways of getting better results out of them, and then pick the one that I think is best and that, from what I've read, people seem to like the best.

All three are open source, and all three are in the repositories for probably all Debian-based distros. Right now I'm working in Linux Mint, an older version, I think I'm on 10 or 11 right now, so I'm a little bit behind on updating my desktop. But I've already got them installed. Just use your package manager to install them if you have not already installed them.

The first one we'll look at is GOCR, and I believe there is also a graphical interface for this one.

Let's have a quick look at the man file for this and just glance through it real quick. Right here it says that the file should be in a PNM format, but it also says that, depending on your system, it works with PNM, PGM, PBM, PPM, and PCX, formats I'm not too familiar with. Luckily, with ImageMagick, which is installed on most Linux distros by default (and if not, it's definitely in the repositories for almost all of them), you can convert to those formats. It also says that, if you have the right dependencies installed, this program can work with more common formats such as GIF, JPEG, TIFF, bitmap, PNG, and so forth. And just looking at this quick little synopsis here, we've got the name of the program, the options you want, -i for input, and then the input file.

Let's go ahead and give that a try real quick. First off, I'm going to show you the image I have. It's not very high resolution, actually rather low. It's an advertisement I got off Facebook for free sundaes over at Chick-fil-A, which are delicious. So let's see what we can get out of this image here. It's called 1.jpg in this case. Let's run gocr, and we'll say -i for input and then the image, and there we go: we get not very good results.

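The GOCR run described above might look like the following sketch. The file name `1.jpg` is the one used in the video; `run_gocr` is a hypothetical wrapper added here only for illustration, and whether `gocr` can read a JPEG directly depends on the helper programs installed on your system, as the man page notes.

```shell
# run_gocr IMAGE
# Hypothetical helper (not part of GOCR): run gocr on one image and print
# the recognized text to standard output.
run_gocr() {
    image=$1
    if ! command -v gocr >/dev/null 2>&1; then
        echo "gocr is not installed" >&2
        return 127
    fi
    # -i names the input file; the recognized text goes to standard output.
    gocr -i "$image"
}

# Usage, with the image from the video:
#   run_gocr 1.jpg
```
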
OK, not great, it's completely unreadable. But in a minute, after we look at the other two, we'll look at ways of increasing our output accuracy.

The next one we'll look at is Ocrad. I'm not sure how you pronounce that, "oh-crad" maybe, or maybe something else, I don't know. Anyway, let's look at the man file for that. Basically we've got the name of the program, the options, and the file name. Looking through here, it says it only reads PNM files, so let's go ahead and use convert, which is part of the ImageMagick package. If you don't already have it installed, just install ImageMagick, but you probably already have it. We'll give it the original file, and then we'll say the output file will be 1.pnm; PNM, I think we said it was. If we display that, you'll see that the image pretty much looks the same. It is now in a format that this program can read, so we'll just run ocrad and give it 1.pnm, hit Enter, and we still get rather poor results.

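A minimal sketch of the convert-then-Ocrad step just described. `ocr_with_ocrad` is a hypothetical helper name; the sketch assumes the ImageMagick command is called `convert`, as in the video (newer ImageMagick 7 installs may use `magick` instead).

```shell
# ocr_with_ocrad IMAGE
# Hypothetical helper: convert an image to PNM with ImageMagick, then run
# ocrad on the converted copy. Recognized text goes to standard output.
ocr_with_ocrad() {
    src=$1
    pnm="${src%.*}.pnm"          # e.g. 1.jpg -> 1.pnm
    convert "$src" "$pnm" || return 1
    ocrad "$pnm"
}

# Usage, with the image from the video:
#   ocr_with_ocrad 1.jpg
```
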
The next one is called Tesseract. It's tesseract-ocr, I think, in the repositories; once you install it, the command is just tesseract. Let's look at the man file for that. This one says it works with TIFF files, which is a more common format, but probably not as common as PNG or JPG, so we're still probably going to be converting most of the time. As it says here, and also on this little page on Google Code, it was a proprietary program back in the mid-nineties and was rated as one of the top three engines back in '95 for doing this, converting images to text. After that its development kind of slowed down, then it went open source and kind of came alive again. That happens quite often. Look at Blender: Blender was a proprietary program that was dead (I forget what it was originally called, I believe it was owned by NeoGeo), then it went open source, and now it's growing like crazy.

So let's see what results we can get from this. First, on this page here I just want to mention, somewhere, here we go, right here: it says it can read a wide variety of image formats and convert them to text. The version I have, I believe, only reads TIFF. That's what the man file says, and I think I've tried other formats that didn't work. Once again, not a problem: we'll just convert our original JPG image to TIFF format. Once again, we could display that, and you'll see that it looks pretty much the same as it did before. Now we'll use tesseract. I should look at the man file in case I get this wrong, but I think it's the input file, and I'm not sure if there's a way to send the result to standard output, so I'll just put "output" as the output name and hit Enter here. Now let's cat out the output file. OK, we've got some readable words here, "free", "sunday", "when you say", and "phrase", mixed in with junk. So we've got a little bit better results with this already, but still not really usable.

So what can we do about this? I found that if you simply increase the resolution of the image, which doesn't make too much sense to me, you get better results, even though you're just taking a lower-quality image and increasing the resolution. I don't see why that makes a difference, but it seems to, and of course we can use convert for this too.

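The basic Tesseract run described above (convert to TIFF, run tesseract, then cat the result), before any resizing, can be sketched as follows. `ocr_with_tesseract` is a hypothetical helper name and the `.tif` extension is an assumption. Tesseract appends `.txt` to the output name it is given, which is why the sketch reads `output.txt`.

```shell
# ocr_with_tesseract IMAGE [OUTPUT_BASE]
# Hypothetical helper: convert an image to TIFF, run tesseract on it, and
# print the recognized text.
ocr_with_tesseract() {
    src=$1
    base=${2:-output}            # output name used in the video
    tif="${src%.*}.tif"          # e.g. 1.jpg -> 1.tif
    convert "$src" "$tif" || return 1
    # tesseract INPUT OUTPUT_BASE writes its text to OUTPUT_BASE.txt
    tesseract "$tif" "$base" || return 1
    cat "$base.txt"
}

# Usage, with the image from the video:
#   ocr_with_tesseract 1.jpg
```
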
Let's run convert, and we'll give it the original file, 1.jpg, and we'll say resize, that's -resize, and we'll just give it 1000 to start off. So we're just increasing its size. I'm not sure what the original size is, but it's not 1000, probably more like two or three hundred as far as the width and height. Giving it just one number will increase the size without distorting the aspect ratio. We'll give it an output of 1.tif, which will automatically overwrite the last one, which means we can run the same command again: tesseract 1.tif output. Then we can cat out our output file, and you can see we're getting some better results.

So here it was at the original resolution, where we were getting some words, and here it is at the higher resolution: "free brownie sundae when you say the", then we still get a little something, then "secret phrase". I think there's a new line there. You do get some excess characters here, and these are from non-text things in the image. Then also here we almost have "top all sundaes"; it put a 0 in there and a 7, and then we kind of lose it here a little bit.

So let's try it again. Let's increase the resolution to 5000. It takes a little bit longer to convert. We'll now run the tesseract command again, and we will cat it out. Good. We got "top" correct here, "top all sundaes", then something about the brownie, "one per person", "from three to seven p.m."

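The resize-and-retry steps above can be sketched as a small loop. The widths 1000 and 5000 and the file names are the ones used in the video; `upscale_and_ocr` is a hypothetical helper name.

```shell
# upscale_and_ocr WIDTH IMAGE
# Hypothetical helper: enlarge the image to WIDTH pixels wide, save it as
# TIFF, run tesseract on it, and print the recognized text.
upscale_and_ocr() {
    width=$1
    src=$2
    tif="${src%.*}.tif"
    # A single number for -resize sets the width; ImageMagick picks the
    # height that keeps the aspect ratio.
    convert "$src" -resize "$width" "$tif" || return 1
    tesseract "$tif" output || return 1
    cat output.txt
}

# Compare the two widths tried in the video (only runs if 1.jpg exists).
if [ -f 1.jpg ]; then
    for w in 1000 5000; do
        echo "== width $w =="
        upscale_and_ocr "$w" 1.jpg
    done
fi
```

Each pass overwrites `1.tif` and `output.txt`, matching the way the same command is simply rerun in the video.
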
So let's display our original image again, and lo and behold, besides the extra little things in here that are there because it's trying to read things that aren't text, we got pretty decent results. You can now script through this and remove anything that isn't an actual word, or you can crop the excess stuff out of the image, but it depends on what you're doing and what the whole purpose of this is.

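One possible way to "script through" the output and drop anything that is not an actual word is to keep only tokens found in a word list. This is a sketch, not the method used in the video: `keep_dictionary_words` is a hypothetical helper, and the word list path in the usage comment is an assumption (many Debian-based systems provide one at `/usr/share/dict/words`, but it may not be installed).

```shell
# keep_dictionary_words WORDLIST
# Hypothetical helper: read OCR text on standard input, split it into
# runs of letters, and print only the tokens that appear in WORDLIST
# (one word per line), ignoring case.
keep_dictionary_words() {
    dict=$1
    # Turn every run of non-letters into a single newline, giving one
    # token per line, then keep exact whole-line matches from the list.
    tr -cs 'A-Za-z' '\n' | grep -Fxi -f "$dict"
}

# Usage (the word list path is an assumption):
#   keep_dictionary_words /usr/share/dict/words < output.txt
```
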
So now that we've increased the resolution, let's try that with the other applications and see if we get results as good as Tesseract's. What I'll do here is say convert 1.jpg -resize 5000, and I'll make it 2.jpg. Then let's go to gocr and run the same command, which is gocr -i, and we'll give it the new JPG image. Yeah, see, still not great results. OK, so let's do this again: we'll convert the original, and this time we'll output 2.pnm, and we'll run ocrad, and this time we'll give it the new higher-resolution image. See how that outputs slightly better results than we originally got with that program. But as you can see, so far Tesseract seems to be the best as far as accuracy, by far, so I recommend checking it out.

It is open source. I want to say it's under the Apache license, not GPL. Oh no, wait, that's a different program it's mentioning there. Yes, it's under the Apache 2 license. It's open source, not GPL, but still a good thing. Check it out. And as I showed you, increasing the resolution using convert, which you're probably using for converting anyway if it only takes TIFFs, which it seems to, gives you great results.

If you know of any other open-source OCR programs (I think there are a few others, but those are the top three I came across), go ahead and post your response in the comments. Also, if you have any other ideas for using ImageMagick or something else from the command line to automate this and get better results, that would be great. But right now I think that's the way I'm doing it. I'm getting all the words, just with some excess from the rest of the image, so you definitely could take a whole library full of images and do a word search through them.

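The "whole library of images" idea might be sketched like this: OCR every image once, keep the text next to it, then search the text files. Both function names are hypothetical, the sketch only looks at `.jpg` files, and the width of 5000 is simply the value that worked in the video.

```shell
# index_images DIR
# Hypothetical helper: for every .jpg in DIR, write an enlarged TIFF copy
# and let tesseract write the recognized text to a .txt file beside it.
index_images() {
    dir=$1
    for img in "$dir"/*.jpg; do
        [ -f "$img" ] || continue
        base="${img%.*}"
        convert "$img" -resize 5000 "$base.tif" || continue
        # tesseract appends .txt to the output name, giving "$base.txt"
        tesseract "$base.tif" "$base" >/dev/null 2>&1 || continue
    done
}

# search_images WORD DIR
# Hypothetical helper: list the text files in DIR that contain WORD,
# ignoring case.
search_images() {
    grep -il -- "$1" "$2"/*.txt
}

# Usage (the directory name is an assumption):
#   index_images ~/Pictures/ads
#   search_images brownie ~/Pictures/ads
```
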
So I hope you found this tutorial useful. If you're looking for something like this, I recommend you give all three programs a look, but for me, and from what I've read, Tesseract is usually the best out of the three.

I thank you for watching. Please visit my website, it's filmsbykris.com, that's Kris with a K. There should be a link in the description. There you can find videos, scripts, programs, and music by me, and also some funny images under Photos. Also, if you have any comments, feel free to comment below. If you have any questions, don't put them in the comments below, or at least I won't be answering them there; maybe someone else will answer. Go to filmsbykris.com/irc, or on the filmsbykris.com main page there's a Help button that will take you to the IRC channel. That's a great place to ask me and other people questions. Don't come in there, ask a question, and expect an answer right away; sometimes it takes a little bit for one of us to answer. So don't come in and exit; come and stay around a while and ask some questions. And also, if you like my videos, if you enjoy them, and you like my site and everything I have on there, there is a donate button on my website to help support the site. Thank you for watching, and I hope that you have a great day.

Video Length: 13:26
Uploaded By: Kris Occhipinti
View Count: 44,582
