Extracting from a PDF Data Source

OL is a trademark of Objectif Lune Inc.
All registered trademarks displayed are the property of their respective owners.
© 2015 Objectif Lune Incorporated. All rights reserved.

Closed Caption:

So here we have a PDF data mapping configuration with the boundaries already set, so my first source record has one single page and my second record has two pages.
As far as data mapping is concerned, it makes zero difference whatever font, whatever spacing, and whatever icons and colors you use in a PDF. It can be as dull as this one, or it can be as flashy as any invoice you have received in the mail.
So let's start by extracting the record data, which is basically the data that appears only once for each record: the address, customer information, et cetera.
There are a couple of ways by which you can extract multiple lines of data. The easiest one is to simply do a big data selection and then hit Extract. By default, this actually extracts one single field where each of the lines is separated with a line return.
If we were to use this data selection inside of a template, it would actually display using the line returns, so you would have as many lines in your template as you have lines in the data here.
However, if we want to do anything with the individual lines, such as check the address, check the postal code, or print the phone number in a different way, then we need to separate them.
If you have one data selection with multiple lines, all you need to do is go into the step properties, and for that field you can actually change it to split the selection into many fields.
What this does is create one field for each line, and the field name is determined in two different sections: the name of the extracted field at the top, and the names of the individual fields at the bottom.
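As a rough illustration outside of the Data Mapper, splitting one multi-line selection into several fields with two-part names could be sketched like this in Python. The function name, the underscore separator, and the sample address are assumptions made up for the example; they are not how the Data Mapper names fields internally.

```python
def split_selection(block, prefix, names):
    """Split a multi-line selection into one field per line.

    Each field name has two parts: a shared prefix (the extracted field's
    name) and a per-line name. The "_" separator is an assumption.
    """
    lines = block.split("\n")
    return {f"{prefix}_{name}": line.strip() for name, line in zip(names, lines)}


# Hypothetical sample data: one selection holding three lines.
address_block = "Jane Doe\n123 Main Street\nMontreal QC H2X 1Y4"
fields = split_selection(address_block, "address", ["name", "street", "city"])

for field_name, value in fields.items():
    print(field_name, "=", value)
```

Once the lines live in separate fields, each one can be checked or formatted on its own, which is the point of splitting the selection.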
Now, there's another way to extract data, and that is to do it line by line. That way you have full control over the field name, instead of having a two-part name like we have here.
Now here's a trick: when you do a data selection on a PDF, you can actually move that data selection whenever you want, and it doesn't actually affect the extraction. So if I do a selection on the first line here, I can move it afterward without affecting any of the extractions.
Let me show you how this is useful, by extracting this first line, moving the selection, and extracting the second line. Now, you may have realized that I used two different buttons between the first extraction of the whole address block and the extraction of the individual fields.
In the first one I clicked on Add Extract Step. What Add Extract Step does is create a new step into which it adds the extracted data, whereas for the two individual fields on the top right I actually used Add Extract Field.
Why do we have these two options? Well, here's the thing: the more steps you have in your process, the less optimized it is. Fewer steps are also easier to look at inside the Steps pane, because the more steps you have, the more complex the process is to look at, as well as being unoptimized.
Speaking of steps, one thing you can do is rename them. There are two ways you can do that. The easiest is to simply click on the step name inside the Steps pane, and then you can change it here. Or, if you go into the step properties at the top, you can change it here also, such as correcting a small typo. You can also add a comment to your extract step.
Another thing that's important in regards to PDF, and this applies to text files as well, is that if you click on a selection in the Data Model pane, you can actually modify this data selection directly. So if you move the actual selection, it will change the extract step to reflect the change that you made. Here I went from customer to invoice information, and that's the same data selection I moved.
Now, if you want to avoid moving existing extractions, there's an option at the top of the Data Viewer that lets you lock all the selections. By doing this, your existing selections can't be modified, but you can still add new ones. So let's say I want the full address block to be one selection, as I have here, but I also want to have the email separate. I simply do an extraction of the email, and it doesn't touch my existing selection.
And through the magic of editing, we now have all the information properly named and extracted, except for the transactional data.
So let's do that now. The first step is to tell the Data Mapper where to start extracting information. Here, what we do is select where the extraction will start and use the Goto step. Now, the Goto step is pretty versatile. In this instance it jumps to a specific location, but as we'll see in a couple of seconds, in the loop it can actually jump a small amount any number of times, or jump a variable number of lines, depending on how we set it up. In this case it just jumps to the first line; it will always jump to that location.
The next thing we want to do is create our loop. In order to do that, we need to determine when the loop is going to stop.
As far as this data source is concerned, it's pretty easy: when we see the word "Subtotal" in the middle of the page, we know that all the line items are finished. So all I need to do is select the subtotal and click on the loop. Now, the loop is also versatile. It can loop until a statement is true, so until we see the word "Subtotal". It can loop while a specific statement is true, so while you see something on the line. It can loop for every element on the page, depending on the data file you have.
When we create the loop, it automatically creates a Goto step within it, and in this case it jumps to the next line with content. So it can detect that there are many lines between the start and the end of the loop, and it automatically just goes from one to the next.
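The loop described here, which keeps jumping to the next line with content until the word "Subtotal" shows up, can be sketched in Python as follows. This is only an approximation under stated assumptions: the page is treated as a plain list of text lines, and the function name and sample invoice lines are invented for the example.

```python
def loop_until(lines, start, stop_word):
    """Collect lines with content from `start` until one contains `stop_word`.

    Returns the collected lines and the index where the loop stopped.
    """
    collected = []
    pos = start
    while pos < len(lines) and stop_word not in lines[pos]:
        if lines[pos].strip():  # "next line with content": skip blank lines
            collected.append(lines[pos])
        pos += 1
    return collected, pos


# Hypothetical page content; index 0 is the column header row.
page = [
    "Item Qty Price",
    "Widget 2 10.00",
    "",
    "Gadget 1 5.50",
    "Subtotal 25.50",
    "Tax 1.28",
    "Total 26.78",
]

# The initial jump lands on the first line item (index 1), then the loop runs.
detail_lines, stop_at = loop_until(page, start=1, stop_word="Subtotal")
print(detail_lines)
print("stopped at line index", stop_at)
```

The loop leaves its position on the line that ended it, which matters for anything extracted right after the loop.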
And now that we have a proper loop, we can start extracting data within that loop. So here, if I click on the beginning of the loop, then I know the extraction is going to be added between the beginning of the loop and the Goto.
So all the fields for the detail table are extracted. What we're missing now, the only thing that we didn't extract, is the subtotal, tax, and total at the end. Thankfully, this information always appears at the end, right after the last line of transactional data.
Because of the way the Data Mapper actually shifts its current selection to where it's extracting data, when we get to the end of the last line, all we need to do is select these three lines, make sure we're actually at the end of the repeat, or after the repeat, by selecting it, and then hit Extract. I'll also split these into three fields. The fields are actually added to the record, not the detail lines, because they're only extracted once; they're not part of the loop. Besides, we only want them once.
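To picture the difference between fields that belong to the record and fields that belong to the detail lines, here is a small Python sketch. It assumes the loop has already stopped on the subtotal line, and it assumes the amount is the last token on each line; the function name, field names, and sample values are all invented for the example.

```python
def extract_totals(lines, position, names):
    """Read one record-level field per line, starting at `position`."""
    totals = {}
    for offset, name in enumerate(names):
        # Assumption: the amount is the last whitespace-separated token.
        totals[name] = lines[position + offset].split()[-1]
    return totals


# Hypothetical tail of a page: one detail line, then the three totals.
page_tail = ["Gadget 1 5.50", "Subtotal 25.50", "Tax 1.28", "Total 26.78"]

record = {
    "details": [page_tail[0]],  # repeated data, one entry per loop pass
    **extract_totals(page_tail, 1, ["subtotal", "tax", "total"]),
}
print(record)
```

The totals appear once at the top level of the record, while the list under `details` grows with each pass of the loop.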
So now we know that the extraction process works for the first source record. Let's take a look at another. What we can see on this second page is that all the detail lines are extracted properly, and that the totals at the end are also extracted properly. However, we have a bunch of extra lines in between, so what we need to do is filter them out.
How do we do that? We're going to use a Condition step. The condition needs to be based on something that only appears on our detail lines. Because the detail lines contain currency, or prices, the dot always appears at a specific location. So basically all I need to do is select the dot and add a condition on it, making sure that the beginning of my Repeat step is selected. That will create a condition where any line that is a detail line has a small green checkmark, and any line that is not, and so will be ignored, has a red X, so we know it is not extracted.
However, our extraction is not currently within the condition. If we just drag it in, then we have the proper number of detail lines, and our extraction process is complete and functional for every single one of our records.
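The idea behind the condition, keeping only lines that have a dot at a fixed position, can be sketched in Python like this. It is a simplification: a character column stands in for the position on the PDF page, and the column widths, helper names, and sample lines are assumptions made up for the example.

```python
def detail(name, qty, price):
    """Format a line item in fixed-width columns (assumed widths: 12, 2, 9)."""
    return f"{name:<12}{qty:>2}{price:>9.2f}"


def is_detail_line(line, dot_column):
    """True when the line has a '.' at the given zero-based column."""
    return len(line) > dot_column and line[dot_column] == "."


# With the widths above, the decimal point of the price lands on column 20.
DOT_COLUMN = 20

lines = [
    detail("Widget", 2, 10.0),
    "  (continued on next page)",
    "Page 2 of 2",
    detail("Gadget", 1, 5.5),
]

# Only lines that pass the condition are extracted; the rest are ignored.
details = [line for line in lines if is_detail_line(line, DOT_COLUMN)]
print(details)
```

Lines that fail the test are simply skipped, which mirrors the red X in the viewer, while the lines that pass are the ones with the green checkmark.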

Video Length: 10:42
Uploaded By: OL Learn
View Count: 701
