Complex PDF Table Data Extraction and XML Export on ChronoScan

In this video we will learn how to extract data from a complex table in a PDF file using several grids with custom fields and triggers, and how to export the captured data to XML format.

http://chronoscan.org/
https://twitter.com/chronoscan
https://www.linkedin.com/company/chronoscan-capture
https://www.facebook.com/Chronoscan

Chronoscan@chronoscan.org
Support@chronoscan.org
Closed Caption:

Hello everyone. Today we're going to learn how to extract the data from this kind of complex table. The first thing we can see is that we have several records here. That's one record: it starts with this kind of text up here and ends with that kind of text. Then here you have another one, and to make everything a little more complex, it ends on the other page. Then we have a third record here, and it also ends on the other page. So we're going to learn how to extract the data for each record in each of those tables. We're going to divide it into three tables so we can extract the data correctly, and we're also going to learn how to export all the data into XML.
The first thing is to open ChronoScan and create a new batch and a new job for the batch. Let's name it Complex Table, there you go, and click Next.
We don't need barcode reading, so we just deselect it. If you have anything else selected, just deselect it too, and then click Next.
We have multi-page image documents and we're going to split them manually, so we select those two options and then click Next.
Here on the fields, we are not going to be using the invoice fields; let's just use generic fields. We'll delete all of those and leave only that one; if we try to delete it, it will give an error. Let's match the fields up here at the top of the document. Those are the constant data and we need them. That's how you do it: let's just match the same layout here. And since we are exporting to XML, we want to replace all the spaces with underscores, so we just put an underscore here, or you just don't use spaces or special characters: Driver_Name.
For the date, we can make sure that instead of type alphanumeric we have field type date; that will reduce the possibility of errors. Then we have the origin and the destination, creating a new field where we need one. Now we're ready to click on Next. This step won't matter because we want an XML export, so just click Next.
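The underscore rule for field names can be sketched in a few lines of Python. This is only an illustration of the naming convention, not something ChronoScan runs; the function name to_xml_field_name and the exact cleanup rules are assumptions.

```python
import re


def to_xml_field_name(label: str) -> str:
    """Turn a human-readable label into a name that is safe as an XML element name."""
    # Collapse each run of whitespace into a single underscore.
    name = re.sub(r"\s+", "_", label.strip())
    # Drop anything that is not an ASCII letter, a digit or an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "", name)
    # XML names cannot start with a digit, so prefix an underscore if needed.
    if name and name[0].isdigit():
        name = "_" + name
    return name


print(to_xml_field_name("Driver Name"))  # Driver_Name
```

Naming the fields this way up front means the exported element names need no further cleanup later.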
Let's make sure everything is right: several images, manual split, correct; no barcode, because we don't need barcodes, there you go; and our fields are all correct. So now let's just click Finish. Let's name our first batch Complex Table 1 and create the batch.
Now we are ready to import our files. The first thing to know here is that we don't want to import all of our files; we just want to import a sample, one or two of them. So we just drag and drop from Windows Explorer and click Yes. Here we have to make sure we have convert pages to image selected, with the resolution around 200, 300 or 400; that's the optimal resolution. We also have to make sure we have extract text from PDF selected, and then click OK. And let's not forget that this should be only the first one or two documents, not the whole batch, because we just want to set things up before we start importing and reading all the other documents.
The first and easiest thing to do is to capture those fields up here. We have to make sure we have the OCR zones mode selected here, then we select the field on the right and draw the area where the value shows up. There you go. For Driver_Name, since names can be very lengthy, you might want to create a slightly bigger area to fit other names. Date, same thing: you might want to make it a little bigger than the actual date here, because you might have two digits for the month. And for origin and destination, same thing: you might want to be generous with the area you create for those two. There you go, and you can see everything was read correctly.
Now the other thing we have to do: in order to read all the records separately, we have to actually split them. ChronoScan can split them based on text, and it will create several documents with only the parts of the files, of the text, that you need. To do that we go up here and click on the OCR triggers button, and here you can see New OCR trigger. We're going to use that "CEO" text with the two dots to tell ChronoScan every time a new record appears. So every time it reads "CEO", it will take that part of the document until another one appears, to isolate it. Let's do it. Since we just want the "CEO", let's just create an area here. If we look at the other documents, the layout is similar, so the area doesn't need to be large, because the text always appears in the same place, on the same line. Just click OK.
In the properties panel that just opened, we name it Report Split. We want it to show on all pages and we want it to run on all types. Down here on the action, we want to change the action to report split text. We have to make sure that use constant as report split key is set to true, because that's the only kind of text we are going to use to split our reports. Now, on text exists, under the trigger conditions, you click it, you'll see a drop-down menu, you click that and we have this "CEO". So every time ChronoScan reads that "CEO", and as you can see it already read that one, it will split the document.
To see that, we go to the scan/input tab up here, select all documents and then click on process selection. Here we have to make sure we have OCR triggers selected, and then click OK.
Now you're going to see that instead of one document we have several, as you can see. Here is the first one, and you can see those grayed-out areas here. Let's take a look at this document, for instance, at this page of this document, and click on view text. Here you can see that the program is ignoring everything from the other report, so we know that it was correctly split. And if we go back to the previous page, on that lower area here, you can see that it is ignoring all of that part of the document; it is only reading this part of the document. Now we know we're good to go.
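The idea behind splitting a report on a marker text can be sketched as follows. This is a minimal Python illustration of the concept, not ChronoScan's implementation; the function split_reports and the marker "REC:" are hypothetical placeholders, and the sample lines are invented.

```python
def split_reports(lines, marker):
    """Group text lines into reports; a new report starts at each line containing the marker."""
    reports = []
    current = []
    for line in lines:
        # A marker line closes the report collected so far and opens a new one.
        if marker in line and current:
            reports.append(current)
            current = []
        current.append(line)
    if current:
        reports.append(current)
    return reports


# The lines of all pages are fed in order, so a record may continue across a page break.
sample = [
    "REC: first record",
    "row a",
    "row b",
    "REC: second record",
    "row c",
]
for number, report in enumerate(split_reports(sample, "REC:"), start=1):
    print(number, report)
```

Because the grouping only looks at the marker, a record that ends on the following page still lands in a single group, which is the behaviour checked above with view text.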
The first thing is to go to the first page to set things up. We make sure we are on the first page, and we don't want the text showing up here, so we just click the button again, and now it's much easier to work. We're going to set up three grids, so we go up here to the data panels button and activate the first three of them. Close. Now you can see we have a grid control here, so we have three grid controls, one for each of the grids we're going to use.
The first information we want to extract is that information here, and in order to even start creating our grid we have to set up two reference lines. So with the OCR triggers function, we want to create a little box up here and another one down here to tell ChronoScan where the table starts and where it ends, and we have to make it work for every page. So we come up here, let's make sure it is on that line, and we can create the trigger outside of the document. Let's create a box here; we don't want to set up anything here, so let's just click OK. Let's call it Table Start. We want it to show on all pages and also to run on all types. That might not be your case, but here we want it to run on all types. Down here on the action, we want to change it to create horizontal reference line. Now you can see there is a green line here.
We want to do the same thing down here. Let's just create a box around here, that should be good, click OK, and now let's rename it Table End, there you go. We want it to show on all pages and to run on all types, there you go, and down here again we change the action to create horizontal reference line. There is our reference line.
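What the two reference lines achieve can be pictured as a simple vertical filter: only text between the upper and lower line belongs to the table. The sketch below assumes each recognized word carries a vertical position; the Word type, the coordinates and the function words_between are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    y: float  # vertical position on the page, growing downwards


def words_between(words, top_y, bottom_y):
    """Keep only the words that lie between the two horizontal reference lines."""
    return [w for w in words if top_y <= w.y <= bottom_y]


page = [
    Word("Page header", 20.0),
    Word("Sender", 120.0),
    Word("Address", 180.0),
    Word("Page footer", 760.0),
]
# top_y plays the role of Table Start, bottom_y the role of Table End.
table = words_between(page, top_y=100.0, bottom_y=700.0)
print([w.text for w in table])  # ['Sender', 'Address']
```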
And now we are finally ready to start setting up our grids. The first thing is to go to the panel down here; we make sure we have the first grid control tab selected and click on the options button. If we look closer at that column of our table, we can see we have the type of the sender, we have the name of the sender on the second line, and the other lines are just the address of the sender. So we need three capture fields here, even though on the grid we are only going to be using one, because that's only a single column. What we have to do is click on delete all, click Yes, and create the fields we want, because that's the information we want to extract. Let's call the first one Sender_Type.
This has to be a custom field because we want to get only the first line, so we click here on custom and you can see all those options show up. We don't want to remove carriage returns, because otherwise the program will read everything as a single line, and we don't want that on any of the fields. We go down here and on the function we want cut first line. The column we're going to be using is the last one, Sender_Address, because that will be the column that reads everything. It isn't created yet, so we'll set that parameter at a later point.
Next we have Sender_Name. It will also be custom, and we don't want to remove carriage returns here. We need the first line again, so the function will also be cut first line: the first line was already cut by the previous field, so now it is cutting the second one. For the third field, let's call it Sender_Address. We're going to leave it as alphanumeric, and once again we don't want to remove carriage returns, otherwise the whole extraction won't work.
Now we can set the parameter for the first two custom columns. We click on the drop-down menu, but we can't see Sender_Address yet, so let's just click OK and then options again so it reloads. The parameter is down here, there we go, now we have Sender_Address. We want it for the first and the second field, since their functions take a parameter, so let's just use Sender_Address for both.
Now we are ready to create our grid. Let's click on edit grid and draw the grid here. We just need that part, so we go from here down to the last line of the table, the Table End reference line, and we just want to match the same column size. You can see it reads this text, but it doesn't read that one, because it's not part of this document, so the program just ignores it.
We have to make sure things are OK. We want that information here, so we push the header size up a little bit, there you go. We right-click the column and make it the master by clicking on split rows based on this column; now it's green, so it's good. It's already set to Sender_Address, so that's also good. We right-click again and select detect carriage return in the text, and you can see it's looking good already. We also want split overlapped; now it's looking even better.
We want to make sure that it's reading all of that area, so we left- or right-click here and choose value must meet a regular expression. We want the program to split, to start a new cell, every time it reads two capital letters and two dots. We click on use for line and build the expression: square brackets with A to Z inside, so we have capital letters; then curly brackets with a 2 inside, so that happens two times; and after closing the curly brackets we add the two dots. That tells the program that every time it reads two capital letters and two dots, it will split. Now we know that everything will work.
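The expression built above can be tried outside ChronoScan. The sketch below assumes that "two dots" means a colon, which gives the pattern [A-Z]{2}: and it assumes the pattern is tested at the start of each line; both are assumptions, the function split_rows is hypothetical, and the sample column values are invented.

```python
import re

# Two capital letters followed by a colon (assumed meaning of "two dots").
ROW_START = re.compile(r"[A-Z]{2}:")


def split_rows(lines):
    """Start a new row whenever a line begins with the ROW_START pattern."""
    rows = []
    for line in lines:
        if ROW_START.match(line) or not rows:
            rows.append([line])
        else:
            # Lines without the pattern belong to the row that is currently open.
            rows[-1].append(line)
    return rows


column = [
    "SH: Shipper",
    "Example Company",
    "1 Example Street",
    "CN: Consignee",
    "Another Company",
]
print(split_rows(column))
```

Each inner list corresponds to one grid cell of the master column: the line that matched, plus the lines that follow it until the next match.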
The other thing we have to do is go up here to the grid control button. We can left-click or right-click, it doesn't matter, and we click on use it on all pages. We want the upper limit to be Table Start and the lower limit to be Table End. One other thing: since, as you can see here, the table can start at the bottom of the page, we also want to go to the grid menu and make sure the top position is relative to the Report Split trigger. Now you can see we have the trigger here. Let's read it to make sure we have everything right, and you can see everything is perfect: we have the sender type, we have the sender name and we have the sender address.
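The way the two cut first line fields peel lines off the address cell can be sketched like this. It is a plain Python illustration that assumes each cut removes the line from the source column, as the reading above suggests; the helper cut_first_line and the sample cell are hypothetical.

```python
def cut_first_line(text):
    """Return (first_line, remainder) of a multi-line cell, keeping the remaining line breaks."""
    first, _, rest = text.partition("\n")
    return first, rest


# One cell of the master column, with its carriage returns preserved.
cell = "SH: Shipper\nExample Company\n1 Example Street\nExample City"

sender_type, rest = cut_first_line(cell)            # first cut takes the type line
sender_name, sender_address = cut_first_line(rest)  # second cut takes the name line

print(sender_type)
print(sender_name)
print(sender_address)
```

This is also why the carriage returns must not be removed: without the line breaks there would be no first line to cut.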
If we go to the next document and read it, you can see some information is missing. That's because the record continues on the next page, and that sign here shows there is a problem with the trigger: it's not triggering, so the program doesn't know where the table is. What we have to do is click again on the edit grid button here on the right, open the grid menu again, and look at position relative to trigger. You can see here it is not relative to anything, so we just click on the one we need. If you read now, you can see everything was read perfectly: we have the name and we have the address. So we have to make sure all those warning signs are attended to, and we have to make sure the grid is always relative to some kind of trigger.
Now that we have our first grid set up and ready to go, we're going to move on to the second table, the second grid. To start working on it, we go down here to the Grid Control 2 tab, and we're going to use that part of the table. You can see there is a lot of information, but it's not hard to make it work.
The first thing to do is make sure you are on the Grid Control 2 tab, go to the options button, click delete all and click Yes. Then let's make the fields match the table. We're also going to have custom fields: in that cell here you can see there are two lines, the container number and the container serial number, so we have to separate those two.
Let's create the first field. It will be called Number_of_Packages, and don't forget to always use the underscore. Alphanumeric is all right. Let's add another one: we have to create the first of the two container fields and then the second one, which is the column we are going to use. Let's call this one Container_Number. We know that we want to either cut the first line or cut the last line here. We are going to leave Container_Number as alphanumeric, and we don't want to remove carriage returns. For Container_Serial_Number, instead of alphanumeric let's make it custom, again without removing carriage returns, there you go, and we make sure the function is set to cut last line.
But if you set it to cut last line here, it will only get that last line. If you want the first one, which would make more sense, you would have to make Container_Number custom instead, like this, with the function set to cut first line, and then leave Container_Serial_Number as it is, always making sure the carriage return is not removed. Up here, in the parameter of the function, we want to cut from Container_Serial_Number, so it reads from that field, and what remains there is the container serial number.
Now that those are good, the other fields: Description we can just leave as it is; Deposit, there you go; and we have the gross weight in kilograms, Gross_KG, there you go. Then we just click OK.
Now we are ready to create our grid, which is the hardest step. We click on the edit grid button, and you can see we have a new capture grid cursor here. Let's select from here to here; we just match the table size and the reference lines again, and release. The first thing is to move the header: as you can see, it's taking up some of our data, so let's grab it around here and push it up. Let's first delete those useless columns: left- or right-click and delete column, and delete column again. Then let's make sure the columns line up: we have the gross weight here, that's good, though you might push it a little further; and Deposit again here.
We always want to match the original. We have the container number and the container serial number. If we look at the options, our custom field is Container_Number, cutting the first line, and the column we read here is Container_Serial_Number, so let's just delete the Container_Number column: we go to that header, left-click it and choose delete column. Just make sure everything matches.
As with the other grids, we are going to use the last column as master. You can see it's already green, so it is the master, but we have to make sure we have a few things right. You have to make sure it splits the rows, and you can see everything is split correctly. We also want detect carriage return in the text. Now we go to the grid menu and make sure we use it on all pages again, the upper limit is Table Start, the lower limit is Table End, and the top position is relative to Report Split, so this grid is also relative to that trigger. Then we make sure everything is working by clicking on the read button. We can see there is an extra line here, but that shouldn't worry us. 54 bales, there you go.
You can see here that, since we cut only the first line, that is what shows up here, and here we have that extra character; I don't know what it is. But if you want it here, you know how to do it, and if you want it there, you also know how to do that. You can see everything else was extracted perfectly, and you can also see that those cells here were read as a single line of text. You should be able to get each of the lines, but I don't know how useful that would be.
Now we are ready to go to the third one. In order to create the table we have to make sure we are on the Grid Control 3 tab, so we select it and click on options. The data we want to extract are the bill number, the bill date and the bill type. So we delete all the fields that we have here, click Yes, and create the three fields we need: the first one will be Bill_Number, the second one Bill_Date and the third one Bill_Type.
Here we have a single line, and we need to make sure we extract only a part of that line, of that cell, for each field. The first one will be custom; in fact all of them will end up being custom. We don't want to remove the carriage return, there you go, and for the function we want to make sure we cut the first line. The column is the one whose header will be showing up here.
The second one is also custom, and here is where we are going to use tags. We don't want to remove the carriage return, and for the function we choose read text between tags. You can see it asks for a start and an end tag. We go to the parameters: you select our column here, add a comma, and since the value starts after "Date" and ends at "Type", we have to type those texts in. Let's type "Date" and the two dots, then a slash, then "Type" with a capital T and the two dots. That should do it. There you go, it is working.
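Reading the text between a start tag and an end tag amounts to two substring searches. The sketch below is a generic Python illustration, not ChronoScan's function; read_between is a hypothetical name, "two dots" is again assumed to mean a colon, and the sample cell is invented.

```python
def read_between(text, start_tag, end_tag):
    """Return the text found between start_tag and end_tag, or '' if either tag is missing."""
    start = text.find(start_tag)
    if start == -1:
        return ""
    start += len(start_tag)
    # Look for the end tag only after the start tag.
    end = text.find(end_tag, start)
    if end == -1:
        return ""
    return text[start:end].strip()


cell = "ABCD/123456 Date: 01/02/2016 Type: X"
print(read_between(cell, "Date:", "Type:"))  # 01/02/2016
```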
Down here, let's leave Bill_Type as alphanumeric and just click OK; we won't be messing with that right now. We make sure we are on the right grid control, click on edit grid and draw our grid here, following what we have been doing since the beginning. You can see Bill_Type is here, but it is not the master column, so let's make it master: we click on the header and make sure split rows based on this column is selected. Now it is master. Let's push the header up a little bit, then select split overlapped text and detect carriage return in the text to make sure we have a correct extraction.
Let's also set value must meet a regular expression. We want the program to recognize a new cell every time it reads four letters and six digits separated by a slash. So we need the letters four times, then the slash, and then the digits six times. This time we don't want the use for line option, so just click OK.
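The bill pattern, four letters, a slash and six digits, can be written as [A-Z]{4}/[0-9]{6} if the letters are assumed to be capitals, which the recording does not confirm. The sketch below uses that assumed pattern to cut a run of text into one piece per bill; the function split_at_matches and the sample text are hypothetical.

```python
import re

# Four capital letters, a slash, six digits (capital letters are an assumption).
BILL_NUMBER = re.compile(r"[A-Z]{4}/[0-9]{6}")


def split_at_matches(text):
    """Cut the text into pieces, each starting at a bill number; text before the first match is dropped."""
    starts = [m.start() for m in BILL_NUMBER.finditer(text)]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


text = "ABCD/123456 Date: 01/02/2016 Type: X EFGH/654321 Date: 03/02/2016 Type: Y"
for piece in split_at_matches(text):
    print(piece)
```

The dates also contain slashes, but they are not preceded by four capital letters, so they do not start a new piece.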
Now we have everything set up for this part, so let's get out of here. In the grid menu, let's make sure we use it on all pages, that the upper limit is Table Start, the lower limit is Table End and the position is relative to Report Split; you can see a new anchor has been created. Now if we click read, you can see we have a problem, because the last field, Bill_Type, is not read correctly. So we go to options, go to our last field and make it custom. We don't want to remove the carriage return, for the function we pick the cut line option we need, and the column will be the field itself. Let's check everything: it all looks good, so click OK and click read. Now we can see it's read correctly.
Once again, if we go to the second document, where the data continues across pages, you can see this page is actually read correctly because it has the other data on it. But if we were to go to the next page, we would see an error. To make sure we don't see that error, we click on edit grid, making sure we are on Grid Control 3, go to the grid menu and make sure the top position is relative to the trigger. Now there are no more errors, which means we're good to go.
Now would be the time to go back to the scan/input tab. You can delete those sample documents: just come here, click select all and then delete selection. Then you would import your files, all of them: click Yes, make sure you have convert pages to images, 300, and extract text selected, and click OK. Then you would select all again and click process selection, making sure you have OCR triggers, read fields and read grid control selected, and then click OK. After that it's just a matter of sitting and waiting for the program to read all your data. That shouldn't take long here; we only have one document. And when you go back to the data panels, you will have the information here.
Now the easy part: exporting it. We go up here to the export button, and you can see we have a PDF export to the file system. Let's click on the PDF export and delete it with that button here; we don't need another PDF. Then let's create a new one: we click the green button to add an output, make it XML report to the file system, and then click No here. You can select one XML per batch or one XML per page; we just want one per batch.
Here we can make sure our fields are being exported. You have the document data, such as the truck and Driver_Name, and down here you have the grid control module. If you had that set to false, your grid data wouldn't be exported, so make sure everything is true here. If you have some data that you don't want to export, you just deselect it, make it false. Everything is true, so we just click OK.
Here we can control the directory structure and the extension; we just leave it as it is. We click on the file system output here and click on export now. Then let's click on the module here, the XML export module, click on the file and then open, and we can see that everything was exported correctly. You can see the sender type, you can see the name of the sender, and if you go through the document you can see that the data has been exported correctly.
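To give a feel for why the underscore field names matter at export time, here is a small Python sketch that writes document fields and grid rows as XML elements. The element names Document, Grid and Row, the function build_document_xml and the sample values are all hypothetical; the actual layout of the file ChronoScan produces is not shown in the recording and may differ.

```python
import xml.etree.ElementTree as ET


def build_document_xml(fields, grid_rows):
    """Serialize document-level fields and grid rows into a single XML string."""
    doc = ET.Element("Document")
    # Each document field becomes a child element named after the field.
    for name, value in fields.items():
        ET.SubElement(doc, name).text = value
    grid = ET.SubElement(doc, "Grid")
    for row in grid_rows:
        row_el = ET.SubElement(grid, "Row")
        for name, value in row.items():
            ET.SubElement(row_el, name).text = value
    return ET.tostring(doc, encoding="unicode")


xml_text = build_document_xml(
    {"Driver_Name": "Example Driver", "Origin": "A", "Destination": "B"},
    [{"Sender_Type": "SH: Shipper", "Sender_Name": "Example Company"}],
)
print(xml_text)
```

Because every field name is used directly as an element name, a name containing a space would not produce well-formed XML, which is the reason for the underscores chosen when the fields were created.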
Hope you enjoyed it. If you have any questions about this type of data extraction, you can just contact our support team; they will be ready to answer all of your questions. See you on the next one.

Video Length: 37:04
Uploaded By: ChronoScan Advanced Scan & OCR Software
View Count: 1,302


Copyright © 2025, Ivertech. All rights reserved.