Key Requirements for OpenStack Backup and Recovery
With increasingly complex and critical IT environments, companies are looking for ways to fully protect their business, while at the same time providing easier, faster and more reliable recovery. One of the biggest challenges when deploying an OpenStack cloud in an organization is the ability to provide a policy-based, automated, comprehensive backup and recovery solution. This session, co-presented by Murali Balcha, one of the creators of Raksha and co-founder and CTO of Trilio Data, and Brian
Closed Caption:
I'm Brian Davis. I work at [unclear] Solutions as vice president for cloud services and cloud delivery. I've been here about seven months or so, and I come from a large financial institution. I want to talk today about OpenStack and backup, maybe the first time you've ever had those two words in the same sentence. So the questions are backup, data assurance, data protection: do we care?
This is the cloud, after all. If you think about the things that we've been doing in OpenStack cloud delivery, backup was never a consideration. The community and corporate IT believed workloads in the cloud should be elastic; they should be ephemeral. If you've been to any previous summits, you didn't hear things about backup here. "Our workloads are ephemeral, we do not need backups." Let's hear that one more time.

That's a conversation I was actually part of when we were first being introduced to cloud: hey, we're going to redo all of our applications, they're all going to be cloud-aware. There were a couple of us sitting around the table asking, do we have any cloud-aware applications? I don't think so. I don't think they even knew what that was. But they said we're going to move to the cloud. And backing up the data? There are no backups out there. Are you guys stupid? This is the cloud, we don't need backups. So IT organizations have this question: do we need a backup solution? No, of course not. All the apps that land in our cloud are cloud-aware, they're elastic, they're resilient. Do people agree with that?
How many people here in this room have large enterprise environments? Show of hands. A few of us. Mid-size? Small? OK. So for all of you that have environments, every app that you move into the cloud is cloud-aware, resilient, elastic; they know how to grow and shrink, right? If that's the case, you're doing great; you're probably the 1%. I know from my experience, when we brought our cloud up, when we brought up OpenStack, we moved in all of our legacy stuff from our legacy environments, and we were still living by "hey, we don't need backups."
Well, one day we had a business impact. A business partner, a customer, comes to us and says, "Hey Brian, we need to restore this data." And all of us in the room just looked at each other like, oops.
Cloud and virtualization in the enterprise require data protection, period. Notice I said data protection, not backup. Most private cloud environments still host legacy applications and legacy workloads, which have no ability to take advantage of cloud elasticity. And cloud and OpenStack lack the virtualization features that some of our legacy hosting environments had.

Backup versus data assurance. Our legacy thought is that backups were modeled around how to capture and store point-in-time data captures. You install an agent and back up some data; if you have a failure, in an hour or two you can restore the data, or the VM, or whatever. Well, think about an hour or two: in some environments that may cost millions of dollars. So the current thought is that we need to be looking at data protection. Data protection is modeled around availability and recovery. So in the things we're going to talk about today, we're going to look at images, snapshots, flexibility, fault tolerance, scalability, and instant recovery of a VM, an application, or your data.
So the question is: how do we solve the enterprise need for data protection in our OpenStack clouds? That's a question I had in my previous role, and it's a question that I work on with my customers today. I want to introduce Murali here from Trilio Data, and we're going to talk about exactly how we do that.

Thanks, everyone, for joining this one. This is one of the first sessions of the OpenStack Summit, and I'm very glad it is one of the first ones. I've been coming here for a while, and it's seriously growing by leaps and bounds.
It's probably something like eight or nine thousand people now. Very good. We will discuss backup and recovery in OpenStack. Some of you raised your hands to say that you have production workloads running in OpenStack. How many people have dabbled with the challenges of backup and recovery? Raise your hands. So how was your experience? Good? Challenging? There are APIs available in Nova and Cinder, and depending on how you lay out your VMs, you may be able to put together some solution purely based on storage-based snapshots. But at the end of the day, when it comes to real enterprise backup and recovery, putting together a solution based purely on the Cinder APIs or the VM snapshots themselves is going to be challenging. So this is the problem that we are trying to solve here.
So when you look at OpenStack and think about backup and recovery for your cloud, what is the first thing that comes to your mind? What should the backup look like? Is it different from what you've been doing, or maybe not so different? I put this question to various people. When we go to a customer site and ask what they need to back up, they say, "Look, I need to back up the controller." That's obviously the first impression. But then, when we dig down deep in the discussion, it comes back to: OK, they really need to back up the tenant workloads. Most of the controller database you can obviously recreate with scripts, and any snapshot you take at the database level in the controller may not reflect what's happening in the tenant space, because the moment after you take the snapshot, some tenant may have created a new resource or deleted a resource. So the snapshot that you have may not reflect what is out there. Even though you have a record of what is there in the tenant space, you don't have all the information.
The other way is to try out some of the existing solutions that are out there. Backup and recovery is not something new; it is something that IT has implemented for the kinds of workloads it has been running. So people start with what they have right now, try to do a lot of scripting, and try to put together a solution that works. But something changed, right? Something changed, otherwise we wouldn't be embracing OpenStack. Something radically changed compared to what we've been doing for the last 20 years in IT, and that is forcing us to take a completely new look at how the data is protected.

There are two things. One is the highly distributed nature of the workloads. It's not like you want a high-performance database, so you buy the baddest and biggest server out there and then try backing it up. It's scale-out: you provision not one VM but multiple VMs and multiple volumes, and your data is spread across these VMs. So what good is it to back up individual files within each VM when you don't have the complete picture? The second one is the operating model, the cloud operating model: you have multi-tenancy, you have agility, and you have elasticity.
I was talking to someone whose day job is to make backups of SAP and Oracle. For compliance, they had to go and test whether the backups were working every six months or so. So I asked him how he tested. The first thing they do is identify a standby server that has an identical configuration, then take a six-month-old copy, load it up, and make sure that the application comes online. Well, the need to test the backup, the need to make sure that you are compliant, is still there. But there is a better way to do it, because you don't need a standby server. It is cloud, and it is agile and elastic, so you have all the resources out there and you can do a better job of meeting your compliance requirements. These are the two things that are predominantly forcing us to take a completely new look at how you do data protection in the cloud.
Let's dig down a little bit into those two aspects. One is the cloud workload. This is a typical workflow with a Heat template, which some refer to as an application template; it is essentially a blueprint. It says what VMs to provision, what flavors to provision, what networks to configure, what storage volumes to create. Once you have those things, you have some Puppet agent running in each of the VMs, and the configuration management is meant to tweak the application configuration and install files and packages. So you start with the basic provisioning, and then over a period of time you fine-tune your deployment to make sure that the application you're running meets all your operational parameters. Now it's up and running, and after some time, obviously, it's not going to stand still; it is probably scaling itself.
Now, how do you define a point in time in this kind of environment? It's not a list of files. When you look at a point in time, you want to go back to some time and quickly test whether you've got the right backup, for various reasons. The data is there: with traditional backup you can back up your files to your heart's content. But what about the other things that make up your application? Basically, the difference here is the application context: the CPU power, the network configuration. If you opened some ports, you have to remember the security groups that you applied to those VMs at the time. So going back to a point in time includes recreating the whole thing so that you can verify your backup.

But if you want to use your traditional method and do that testing to make sure that your backups are working, what do you need to do? You need to consolidate a lot of files that you backed up over time. Depending on how the backup is set up, in terms of incremental and full backups and how frequently you take the full backup, there is a bit of consolidation of all the files. Then you have to remember what image to boot from; some of that information you can get from the Heat template that you used. The next step is obviously to attach the volumes, format them, mount them to the right file systems, and then apply the necessary security groups. You do all these things to make sure that you recreate that point in time. So what is the guarantee that you get everything right? What if you miss even a simple detail? Missing one piece, or not applying the right security groups: those things matter a lot, especially when you are under pressure to recover an application to a point in time.
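To make "application context" concrete, here is a minimal sketch of what a point-in-time record could hold beyond file data. The class and field names (VMContext, PointInTime, missing_context) are hypothetical and chosen only for illustration; they are not taken from the talk or from any OpenStack API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class VMContext:
    """Everything besides file data needed to recreate one VM (hypothetical)."""
    name: str
    flavor: str                       # CPU and memory sizing
    image: str                        # image the VM booted from
    networks: tuple[str, ...]
    security_groups: tuple[str, ...]
    volumes: tuple[str, ...]


@dataclass
class PointInTime:
    """One recoverable point in time for a whole application."""
    taken_at: datetime
    vms: list[VMContext] = field(default_factory=list)

    def missing_context(self) -> list[str]:
        """Return names of VMs recorded without a network or security group."""
        return [vm.name for vm in self.vms
                if not vm.networks or not vm.security_groups]


web = VMContext("web", "m1.small", "ubuntu", ("private",), ("default",), ())
db = VMContext("db", "m1.large", "ubuntu", ("private",), (), ("vol-1",))
snapshot = PointInTime(datetime.now(timezone.utc), [web, db])
print(snapshot.missing_context())
```

A check like missing_context is the kind of detail that is easy to miss when the context is reassembled by hand under pressure.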
Any little thing will affect how much time it takes to recover your application, and how confident you are that this recovery is going to work, whether this is a proven, repeatable process. So there is the matter of doing it manually, or automating some of those things and scripting them. For this kind of workload, what would be the right solution for you? An ideal solution would be: obviously I don't want backup policies on a per-file basis or a per-VM basis; I want my backup policies on an application. An application could be one VM or multiple VMs, multiple volumes, multiple networks. No matter what the application requires, your data protection policy has to be applied to it, and your backup solution needs to keep track of what is being changed in the environment so that you always have the right point in time. Your backup operation should not involve backing up a lot of individual things; it should provide some kind of single API, no matter how complex your application is.
The restore is the most important thing. You can back up to your heart's content, but if you can't restore it, what is the use? So your restore has to be just as simple. You should be able to restore the entire multi-VM, multi-volume application to that point in time with ease, whether it's with one API call or one click in the GUI. And obviously, when you simplify how you back up and recover, you have high confidence that you can repeat the process. So this is one of the dominant things affecting how you protect data in a cloud infrastructure.
Traditional backup applications are centralized: you have a central administrator setting up the solution, installing agents in your hosts, and then backing up your files. Now, with the multi-tenancy model in the cloud, it is all about empowering your tenants. How much can the cloud administrator keep track of what applications each tenant is running? It's not possible; it's not even the cloud model. You are basically empowering tenants, right in the cloud, so that they own the responsibility for data protection, because they understand what applications they're running, what the application boundaries are, and which VMs correspond to a particular application, so they can decide what to include as part of their data protection policy.
Agent-based file backups work, but they don't really understand what the infrastructure underneath the application is. Unless your backup operates at the infrastructure level, unless it understands the VMs, the flavors, the Glance images, the Cinder volume types, and the networking, such as whether there is a private network between the VMs and a public interface for the application, all these things are very hard to maintain. Somehow your backup solution needs to understand and record that information so you can orchestrate the whole recovery in a single step.
Scale. Cloud is all about horizontal scale. You may start with 200 VMs and then grow well beyond that; you may start with ten nodes and go up to 120 nodes. So how do you make sure that your backup solution scales with the cloud? Most of the solutions out there are appliances with a fixed capacity. What if you grow beyond that? You can buy one more appliance, and now you have two appliances that you need to manage. So it doesn't really scale. Your solution needs to scale with your cloud as it grows, just like everything else that is part of the cloud.

And then DevOps, a very critical part. DevOps is a relatively new phenomenon, but most backup applications were developed long before it; they have been there for 20 years. Obviously the DevOps paradigm doesn't really fit the traditional backup solutions that are out there.
Your backup solution needs to be part of your cloud. Just as you manage your compute, your storage, and your network, you should be able to deploy, upgrade, and automate your backup solution with the same DevOps tools. You shouldn't need a different management plane for managing your backup and recovery.

With those goals in mind, back in 2013 we proposed a specification called Raksha. We did that and then realized, after talking to a lot of people, that it was very, very early in OpenStack: not many people had really deployed OpenStack, and backup and recovery was not really at the top of their minds. So we pulled back a little bit on that, and then we commercialized our product based on the specification we put together for OpenStack. The company was founded about two years back. We have a pretty good background in virtualization, backup and disaster recovery, and cloud infrastructure [unclear]. So that's a brief description of what our company is.
Now let's talk about what our solution is. Just like any other OpenStack service, such as compute, network, or storage, ours is backup and recovery as a service. It has the same look and feel as any other service: it has a Python-based architecture, and it has a RESTful API to define your backup jobs and administer your backup jobs. The form factor is a QCOW2 image that we ship. It includes a scalable backup engine; scalable in the sense that each VM obviously comes with a finite backup capacity, but if you are growing your cloud you can instantiate multiple VMs of the same image and scale your solution. It is truly multi-tenant: we register with Keystone as a backup service, and then we perform backups on behalf of each tenant. And deploying our solution is pretty much non-disruptive: whether you already have a cloud or are starting from the ground up, it's a drop-in solution. I'll talk about our architecture, and none of the components are disruptive.
This is what I've been preaching in the last few slides: the backup has to go beyond file backup or volume backup; it has to be done at the environment level. Better yet, if it understands your application, discovers the resources that your application is using, pulls those in as part of the backup policy, and then backs those things up, that's an ideal solution. We can do that for applications like MongoDB, where we can discover which VMs the application is running on. But even otherwise, you can group related VMs together as one backup job, and then we discover what resources are mapped to each VM, for example what networks are mapped and what Cinder volumes are mapped, and we take a backup of the entire thing. We call it an environment snapshot, and it includes everything about your application.

We support incremental backups. Initially we take a full backup: we back up all the VM images, the Cinder volumes, and the network configuration. In subsequent backups we calculate the difference, what changed in the environment. Say a new Cinder volume was added to the environment: we back that up as a full, and the rest of the existing resources as incremental backups. So we take care of which changes were introduced into the environment and back up those changes. Our vision is to leverage what is out there in the cloud, cloud capabilities like elasticity and agility, and we want to redefine how backup and data protection are done in the cloud.
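The full-versus-incremental rule described above (a newly added resource gets a full backup, existing resources get incrementals) can be sketched as follows. This is an illustration only; the function name plan_backup and the resource IDs are assumptions, not the product's actual logic.

```python
def plan_backup(current_resources, previously_backed_up):
    """Decide, per resource, whether this run needs a full or incremental backup.

    current_resources: iterable of resource IDs now in the backup job.
    previously_backed_up: set of resource IDs that already have a full backup.
    """
    plan = {}
    for resource_id in current_resources:
        if resource_id in previously_backed_up:
            plan[resource_id] = "incremental"
        else:
            # First time this resource is seen: there is no base image to diff against.
            plan[resource_id] = "full"
    return plan


# First run: nothing has been backed up yet, so everything is a full backup.
first_run = plan_backup(["vm-1", "vol-1"], set())
# Second run: a new volume appeared in the environment.
second_run = plan_backup(["vm-1", "vol-1", "vol-2"], {"vm-1", "vol-1"})
print(first_run)
print(second_run)
```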
Obviously we need to support the minimal operations, like backing up a VM and recovering a VM, or recovering single files or multiple files within the backup. But since we operate at the infrastructure level, we understand how your resources are laid out, so we support additional use cases, for example restoring to a new availability zone. You may have production running in one availability zone and some testing set up in a different availability zone, and now you want to test by taking a copy of your production and restoring it to the new availability zone for test purposes. We take care of the differences, such as a different network layout or a different storage setup there: we translate the backup image according to what is available in the new availability zone, and then we restore the application to the new availability zone.

What about disaster recovery? What if you have two different clouds and you have the capability to keep the backups shared between them? When needed, you may want to leverage the remote cloud and restore applications from there. So we support that use case too.
The next one is instant restore. How many people use guestfish? Anyone? It's a nice tool: no matter what the VM is or how the VMs were created, you can use guestfish to explore the composition of the VM, whether it has LVM volumes, whether it has different file systems, and you can also fix any configuration changes that happened. Guestfish is very popular. What if we provide that kind of tool for the backup? Say you have a backup and you want to run something against it, or you want to spin it up as a workload quickly. What if you could log in and explore what is there? With instant restore, which we're working on, you don't need to copy the data back to your production and you don't need to restore the entire thing. You can quickly spin up VMs out of the backup image and then explore it. For example, if you have data encrypted, with the regular tools you can just open it up and then explore your point-in-time copy. It also enables migration use cases. So it goes beyond just doing file backups; it is more about taking a live environment snapshot and then replaying it.
On the architecture: as I briefly mentioned, we have a QCOW2 image, which can be deployed on a standalone KVM box or uploaded into Glance and then instantiated under one particular tenant. The most popular deployment is to have a standalone KVM box with VMs created out of this image; you can spin up as many VMs as you want based on the size of your cluster. Then, just like the Nova or Cinder scheduler, which does some kind of round robin to choose the right host, we have a round-robin-based scheduler right now to load-balance backup jobs among these multiple VMs. And since our VMs are completely stateless, one can crash and burn, and you can instantiate a new VM and the backup service carries on.
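A round-robin assignment of backup jobs to stateless backup VMs, as described above, might look like the following minimal sketch. RoundRobinScheduler and the appliance and job names are hypothetical; the actual scheduler is not shown in the talk.

```python
import itertools


class RoundRobinScheduler:
    """Hand out backup jobs to backup appliances in rotation (illustrative)."""

    def __init__(self, appliances):
        appliances = list(appliances)
        if not appliances:
            raise ValueError("at least one backup appliance is required")
        self._cycle = itertools.cycle(appliances)

    def assign(self, job_ids):
        """Return a mapping of job ID to the appliance chosen for it."""
        return {job_id: next(self._cycle) for job_id in job_ids}


scheduler = RoundRobinScheduler(["backup-vm-a", "backup-vm-b"])
print(scheduler.assign(["job-1", "job-2", "job-3"]))
```

Because the appliances hold no state in this model, replacing a failed one only means building a new scheduler with the updated list.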
I played with some of the APIs that are available in OpenStack. Depending on the configuration of the VM, whether you are booting off the local disk, booting off Ceph, or booting off a Cinder volume, the snapshot operations of the APIs work a little differently. So putting together a complete solution with these APIs is very, very difficult. There are some gaps in the APIs as defined when it comes to backup and recovery.

To bridge those gaps we defined a Nova extension. It is a small Python module that needs to be installed on your compute nodes. The basic functionality of the Nova extension is to take a backup of the VMs running on the compute node, create a backup image, and copy it to the backup media, in this case an NFS directory. During the restore operation it does the same thing in reverse: it takes that copy from the backup media and restores the contents to the compute node. So each Nova extension is responsible for managing the backups of the instances that are running on its compute node.

Scale is important when you are deploying a solution in OpenStack. Obviously you don't want to introduce any bottlenecks; rather, the architecture should enable you to scale. In our case, since each compute node backs up the instances running on that compute node, you can scale your solution without introducing a lot of bottlenecks. The backup engine doesn't do much of the work: the metadata goes to the backup engine, but most of the data transfer happens at the extension, so this scales better than other solutions out there.

A backup job is nothing but a collection of VMs. We have a backup engine that identifies all the resources and then invokes the extensions to back them up. For example, if the backup job has VM 1 through VM 9, the orchestration invokes the right hooks in each of the compute nodes to orchestrate the backup job for that group of VMs.
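The fan-out described above, where the engine only coordinates and each compute node backs up its own instances, can be sketched by grouping a job's VMs by the host they run on. The function fan_out and the VM-to-host mapping are assumptions made for this illustration; in a real deployment that mapping would come from the cloud itself.

```python
from collections import defaultdict


def fan_out(job_vms, vm_to_host):
    """Group the VMs of one backup job by the compute host that runs them.

    Each host's extension would then be asked to back up only its own list.
    """
    per_host = defaultdict(list)
    for vm in job_vms:
        per_host[vm_to_host[vm]].append(vm)
    return dict(per_host)


placement = {"vm-1": "compute-1", "vm-2": "compute-2", "vm-3": "compute-1"}
print(fan_out(["vm-1", "vm-2", "vm-3"], placement))
```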
The other important thing is that we don't use any proprietary format for storing our backups. We store everything as QCOW2 images, which is a very popular and very standard format for disk images, and there are thousands of tools out there that help you manage QCOW2 images. Our base image is a QCOW2 image, and all the incremental images and other backup images are QCOW2 images as well. For example, the latest QCOW2 image has a back reference to the previous incremental, and so on, so you can always run qemu-img info on the latest backup and walk through the whole chain. If you want to get fancy, you can also mount a QCOW2 image to look at the contents.
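Walking a chain of back references from the latest backup to the base image can be modeled in a few lines. This sketch uses a plain dictionary of image name to backing image name as a stand-in for what a tool would read from each image header; the function name and file names are hypothetical.

```python
def backing_chain(latest, backing_of):
    """Return the images from the latest backup back to the base image.

    backing_of maps an image name to the image it references; the base
    image has no entry.
    """
    chain = []
    seen = set()
    image = latest
    while image is not None:
        if image in seen:
            raise ValueError(f"cycle in backing chain at {image}")
        seen.add(image)
        chain.append(image)
        image = backing_of.get(image)
    return chain


references = {"incr-2.qcow2": "incr-1.qcow2", "incr-1.qcow2": "full.qcow2"}
print(backing_chain("incr-2.qcow2", references))
```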
The other thing is that once you take a full backup, you never have to take a full backup again, because we can synthesize the full backup at the back end. It's as simple as using a qemu-img commit or block-commit operation to collapse two adjacent images into one. So as we move past your retention window, say 30 days is the retention window for which you want to keep backup copies, on the thirty-first day we combine the last two backup images into one. That's how we implement the retention policy.

Our restore is also pretty easy, because we don't have to aggregate the incrementals and the full backup into one big image in a staging area and then restore it. We can take any point in time within the chain and convert that into a volume or a VM. So our restore operation is also very quick.

And lastly, we also support instant mount of snapshots. Since these are QCOW2 images, we can mount them as a device, discover the file systems in it, and then expose the files at that particular point in time.
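The retention scheme described above, folding the oldest incremental into the full image once the chain exceeds the retention window, can be modeled like this. Each image is represented as a dictionary of block number to data, which is a simplification for illustration and not how QCOW2 stores data; apply_retention is a hypothetical name.

```python
def apply_retention(chain, retention):
    """Collapse the oldest images until at most `retention` images remain.

    chain: list of dicts (block number -> data), oldest first; chain[0] is
    the full image and later entries hold only the blocks that changed.
    """
    if retention < 1:
        raise ValueError("retention must keep at least one image")
    merged = [dict(image) for image in chain]  # copy so the input is untouched
    while len(merged) > retention:
        # Fold the oldest incremental into the full image; newer blocks win.
        merged[0].update(merged[1])
        del merged[1]
    return merged


chain = [{0: "a", 1: "b"}, {1: "B"}, {2: "c"}]
print(apply_retention(chain, 2))
```

The result is still a full image followed by incrementals, which is why a new full backup is never needed in this model.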
We support several kinds of backup media, and we are working on integrating with third-party storage arrays for backup and recovery. We have a Horizon plugin that helps tenants manage their own backup jobs, alongside the RESTful API, and we also have Ansible playbooks to deploy and manage your backup solution. It is a drop-in solution and non-disruptive; you can install it on a running OpenStack. The only thing is that, since we register an extension with Nova, we need to restart the Nova API service, but people usually have two or three instances of the Nova API, so restarting them one after another doesn't cause much disruption.
Here are some screenshots to give a flavor of what our solution looks like. First you supply credentials, so that we register with Keystone as a backup and recovery endpoint; you basically go through and configure the solution. Once you have the solution configured and you install the Horizon plugin, we introduce a Backups tab, and as a tenant you can create a new backup job: you create your workload, add the VMs, and set certain policies, such as how frequently you want to take a backup, and all those things.

Once you have a backup job, you can see the details of what the backup job contains, the list of backups it has taken, and how frequently the backup is being performed. Once you dig down into a particular snapshot, it gives more information about what was backed up as part of the backup job. In this case it is a four-VM backup. Just to keep it interesting, I have various flavors of VMs: some of them booting off Ceph, some booting off the local disk, some with volumes mounted, some with multiple network interfaces attached. We capture everything.
Since you may have hundreds of backup images, what if you want to retrieve only a few files from a particular point in time? You don't have to go and restore the entire backup. We support mounting a snapshot: once you choose to mount a snapshot, we provide an explorer view into that particular point in time, and you can download a file or a bunch of files for file-level recovery.
If you want to restore to a new availability zone, we have something called selective restore, where you have more control over how you want to restore the backup job: whether you want to restore to the same availability zone or a different availability zone; if you want to map to a new network so it doesn't interfere with your production, you can do that; or you can choose a different volume type than the one we backed up from; if you want to exclude a VM from your restore, you can choose that; if you want to change the flavor of a VM when restoring, you can do that. We give you more control over how you restore your backup job.

To recap what we went through: we are introducing backup as a service. We take environment snapshots, not just file or volume backups; we have one-click backup and restore support; it is completely tenant-driven; and it is scalable. Just like any other service you are running, it is one more service that you deploy and manage. That's it; on to questions.

There is one more demo session that is going to happen on Wednesday; my colleague is going to drive that presentation. We also have a booth, so stop by for more information. If you have a question, can you please go to the line there?

We compare two snapshots block by block and calculate the difference. We really want to integrate tightly with the storage vendors and calculate the difference through their APIs. As for your question about the protocol: we leverage the same mechanism that Nova uses to access the volume. Whatever the volume type is, we use the same connection string that Nova uses to read the contents, so the protocol doesn't matter. It is more about accessing those two snapshots and calculating the difference.
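Comparing two snapshots block by block, as described in the answer above, can be illustrated on two byte strings. The function changed_blocks and the default block size are assumptions for this sketch; a real implementation would read from the storage back end rather than hold whole snapshots in memory.

```python
def changed_blocks(old, new, block_size=4096):
    """Return the indexes of fixed-size blocks that differ between two snapshots."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    changed = []
    length = max(len(old), len(new))
    for offset in range(0, length, block_size):
        # Slicing past the end yields a shorter (or empty) block, so a grown
        # or shrunk snapshot shows up as changed blocks at the tail.
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            changed.append(offset // block_size)
    return changed


before = b"a" * 8 + b"b" * 8
after = b"a" * 8 + b"c" * 8
print(changed_blocks(before, after, block_size=8))
```

Only the blocks returned here would need to go into the incremental backup image.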
Can you go into a bit of detail around how you provide disaster recovery, in other words replicating the backup images to a site and then recovering at the secondary site?

Going back to taking the logical environment snapshot: we capture the context. That means we capture the flavors, what Glance images the VMs booted from, what the network types are, and other things. Every time we take a backup we capture all that information. If you replicate the entire backup store to the remote site, we get not only what makes up the application but also the individual snapshots. We have something called import functionality, which essentially imports the backup metadata. Once you import that through the UI, you can run the selective restore, and then we discover what is out there on your new cloud, whether it's the networks or a different kind of storage. Once you choose how you want to map your backup resources to the new resource types, we start the restore. We don't do the replication ourselves; we expect that the underlying storage does the replication.
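The mapping step described above, translating backed-up resources to what exists on the target cloud before a restore, can be sketched as a simple lookup with validation. The function map_resources and all resource names are hypothetical and only illustrate the idea.

```python
def map_resources(backed_up, mapping, available):
    """Translate backed-up resource names to resources on the target cloud.

    backed_up: resource names recorded in the backup.
    mapping: user-chosen overrides (backup name -> target name).
    available: set of resource names discovered on the target cloud.
    Returns (resolved, unresolved).
    """
    resolved = {}
    unresolved = []
    for name in backed_up:
        target = mapping.get(name, name)  # default: keep the same name
        if target in available:
            resolved[name] = target
        else:
            unresolved.append(name)
    return resolved, unresolved


resolved, unresolved = map_resources(
    ["net-prod", "voltype-ssd"],
    {"net-prod": "net-dr"},
    {"net-dr", "voltype-hdd"},
)
print(resolved, unresolved)
```

Anything left in the unresolved list would need a mapping chosen by the user before the restore could start.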
thank you very much
Video Length: 42:42
Uploaded By: OpenStack Foundation