Episode #123

Business Continuity & Disaster Recovery Explainer

With Jed Fearon
Solution Advisor at Integris

Anthony and Jed talk through a cyber insurance questionnaire with insight and examples from real-world situations.

Check out the transcript below and listen along with the embed, Spotify, Apple Podcasts, or your favorite podcast app.

Transcript

Intro

Jed Fearon: Anthony, how are you?

Anthony DeGraw: I’m doing well, sir. How you doing?

Jed Fearon: Doing great.

Anthony DeGraw: Awesome. You know, going into this, Jed, this is my favorite session every week. And it’s perfect that it happens on Friday. It allows us to unwind a little bit.

Maybe we can even share this with the audience. I’ve got my Madrid mug. You see Madrid there. Tyler Daniels, who handles social media on our team, got me this mug when he visited Spain, I think in 2018 or 2019. I’ve got a cappuccino, and I’m ready for the cyber insurance conversation.

Jed Fearon: Well, I’m glad we talked about doing Disaster Recovery and Business Continuity. I was going to pull out five questions on that topic from a Measured Insurance questionnaire.

Business Continuity vs. Disaster Recovery

Jed Fearon: And I’ll give you a heads-up in advance, because it looks like in the first question they are combining Business Continuity with Disaster Recovery.

And what I really think they mean is Business Continuity, which is the uninterrupted continuation of operations when something major happens, and then the Disaster Recovery part, just as an example, applies more to data.

Anthony DeGraw: So I’ll quickly read this for the audience, because I think this often gets confused, right? There are just small tweaks in the language. I loved what you shot over to me. It was titled “The Differences Between Business Continuity and Disaster Recovery,” and in our world, we all use these terms interchangeably. We’ll call it BCDR, we’ll call it DR; we have all these short terms for it. But here’s what you sent over, and I’ll read it for our audience right now: A Business Continuity strategy can ensure communication methods such as phones and network servers continue operating in the midst of a crisis.

Meanwhile, a Disaster Recovery strategy helps to ensure an organization’s ability to return to full functionality after a disaster occurs. To put it differently, Business Continuity focuses on keeping the lights on and the business open in some capacity, while Disaster Recovery focuses on getting operations back to normal.

So there’s a simple example from the IT space that I’ve seen. I think of Business Continuity as more like a blip on the radar, for instance. Your office loses local power for an hour: a little blip on the radar. How are we going to continue normal business operations?

On the other side, with Disaster Recovery, you have a big cyber incident, like a ransomware attack that encrypts the entire network. That’s a disaster. You are completely shut down in that instance. How are we going to get you back up and running as quickly as possible? So I love that you shot that over to me because, to be honest with you, it’s something even I don’t think through; I just use the terms.

And I think it’s really good for our audience to know. So thank you for shooting that over.

Jed Fearon: Absolutely. I thought it would be really important to highlight how an MSP could help their clients fill out this application. And certainly you would hope that anyone who reads the question “Do you have a Business Continuity or Disaster Recovery plan?” would answer yes. But maybe you could outline, just in sketch form, what a Business Continuity Plan might look like.

Sketch of a Business Continuity Plan

Anthony DeGraw: Yeah. So from a very high level, right? Because everybody can probably Google these things and even get templates. I know we can share Business Continuity and Disaster Recovery templates, but the way this was first described to me was by our founder and CEO, Rashaad Bajwa, and he actually brought out a whiteboard.

It was a whiteboard in his office, and we were going over basic concepts. This one was around backup, but it involved this content. We were talking about Recovery Time Objective and Recovery Point Objective, which play into this conversation.

Recovery Time Objective being: how long are you going to be down, and how quickly can we get you back up and running? Is that recovery time four hours? Six hours? 12? 24? What is it? And is that comfortable for your business?

The Recovery Point Objective being: how much data are we going to lose in that stretch of time? Say I have a backup running daily, every single night. I’ll give you a good example: the backup runs at 12:00 AM every single day, and at 11:59 PM that same day, one minute before the next backup runs, I go completely down. That means I lost 23 hours and 59 minutes’ worth of data. That RPO, that Recovery Point Objective, is therefore 23 hours and 59 minutes.
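The data-loss arithmetic in this example is simply the gap between the last completed backup and the moment of failure. Here is a minimal Python sketch, assuming the dates are hypothetical placeholders and that the failure always happens after the last completed backup; shortening the interval to hourly snapshots shrinks the worst case to 59 minutes.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much work is lost: everything done after the last backup."""
    if failure < last_backup:
        raise ValueError("failure must happen after the last backup")
    return failure - last_backup


# Daily backup at 12:00 AM; the outage hits at 11:59 PM the same day.
# (The date itself is a hypothetical placeholder.)
daily = data_loss_window(datetime(2022, 2, 1, 0, 0), datetime(2022, 2, 1, 23, 59))
print(daily)   # 23:59:00

# Hourly snapshots: the last one ran at 11:00 PM, so at most 59 minutes are lost.
hourly = data_loss_window(datetime(2022, 2, 1, 23, 0), datetime(2022, 2, 1, 23, 59))
print(hourly)  # 0:59:00
```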

In most of our examples, that window is much smaller. Most of our organizations are doing hourly snapshots, so it’s 59 minutes. That being said, Rashaad expanded out from that view. He said you actually have an RTO and RPO for everything in your business, and that folds into the Business Continuity and Disaster Recovery conversation.

So he said, let’s take the managed services business that we operate in, and that we know. You have different departments, right? You have finance and finance uses different systems. QuickBooks, Sage Intacct.

You have our engineering department that’s servicing our calls. How do they service our clients? Through our phone systems, so how important is our phone system? They get support requests via email, so how important is our email system? How important is it for them to log tickets? And the point of this is that when you’re looking at these two plans and designing them for your organization, you should really list out every single department and every single application it uses.

And then really think through how long could those teams be down. In our case, obviously we service clients, they call us and email us. So our recovery time on our phone systems and our email systems is basically zero minutes. We don’t want those things to ever be down, right? Okay.

Now that we’ve defined that, how do you go and protect against those outages? Once again, there’s Business Continuity, to keep the lights on during a simple outage, versus Disaster Recovery, for when you’re really down. And then you design those plans around those two situations.

I’ll give one more example before we continue. He mentioned our finance team, right? We’re an MSP. And he said, if you really think about the systems our finance team uses, and this was when we were a smaller business, they actually could be down a little bit longer and it wouldn’t really hurt us too much.

We’re not invoicing people in real time. We’re not a transactional e-commerce shop that needs our products listed in the web applications to be running. Our finance team would be okay if they were down for a day and they lost that day of work.

So overall, you’re looking at: What are the teams? What are the systems? And how do I service my clients? That’s how you come up with the formulas for designing each one of these plans.
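Rashaad’s whiteboard exercise amounts to an inventory: every department, every application, and how long each one can be down. The Python sketch below is a hypothetical illustration of that inventory. The zero-downtime phone and email targets and the one-day finance target come from the examples above, while the ticketing target and the "achievable" recovery times are made-up values to replace with your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    department: str
    name: str
    rto_hours: float  # target: the longest this system may be down


# Hypothetical inventory for an MSP like the one described in the episode.
inventory = [
    Application("Engineering", "Phone system", 0),
    Application("Engineering", "Email", 0),
    Application("Engineering", "Ticketing", 4),  # assumed value
    Application("Finance", "QuickBooks", 24),
    Application("Finance", "Sage Intacct", 24),
]


def recovery_order(apps):
    """Least downtime-tolerant systems first; ties broken by department, then name."""
    return sorted(apps, key=lambda a: (a.rto_hours, a.department, a.name))


def rto_gaps(apps, achievable_hours):
    """Names of apps whose current recovery time exceeds their RTO target.

    Apps missing from achievable_hours are treated as having no recovery plan.
    """
    return [a.name for a in apps
            if achievable_hours.get(a.name, float("inf")) > a.rto_hours]


for app in recovery_order(inventory):
    print(f"{app.rto_hours:>4} h  {app.department:<12} {app.name}")

# Hypothetical current capabilities, e.g. measured during the last test restore.
achievable = {"Phone system": 0, "Email": 0, "Ticketing": 6,
              "QuickBooks": 12, "Sage Intacct": 12}
print(rto_gaps(inventory, achievable))  # ['Ticketing']
```

Sorting by RTO gives a rough recovery priority list, and the gap check answers the "is that comfortable for your business?" question system by system.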

Jed Fearon: Well, I tell you, I was definitely a little challenged trying to think of personal examples where, as an MSP employee going on 20 years now, I can ever recall being down in the middle of a disaster.

And I guess we’re blessed because it really is the nature of our business. We can’t be down, but I would say one of the biggest drivers of that continuous uptime and lack of drama has to be that most, well, a hundred percent of our applications are cloud-based.

Anthony DeGraw: Yeah. That’s the –

Jed Fearon: – phone system, where the phone system just shifted over to another data center, whether it was RingCentral at some point or, I’m forgetting, what we’ve changed to. Tiva.

Business Continuity in action

Anthony DeGraw: I have a couple of different examples that actually happened at legacy Domain in the Northeast. One of them was our office power. We lost office power, and normally you think that’s just going to be, like, 10 minutes, 20 minutes, 30 minutes. All of a sudden, I remember, we were all in the office and our operations manager got a call from the township.

“You’re going to be down for six-plus hours right now. That’s our estimate.” I don’t know if somebody hit a transformer outside of the office park. I don’t remember the exact situation, but because we’re an MSP, because we’ve thought through these things, we had a full-blown generator on a trailer, massive scale, that could get us back up and running.

And that’s following that Business Continuity Plan: if you’re going to lose power for over X hours, this is the step we take next. In that case, we had some of our engineers go out there, fill that bad boy up with gas, and get that thing rolling, because it was crucial enough for us to have that happen.

The next big example is with clients, obviously, right? When Hurricane Sandy came through the Northeast in 2012, it took out a lot of clients, especially in downtown Manhattan. Their buildings flooded upwards of 10, 12, 20 floors and –

Jed Fearon: I didn’t realize it was that high.

Anthony DeGraw: Yeah. Downtown Manhattan hadn’t been flooded like that in however long. And they had some critical infrastructure in the basements of these facilities. People couldn’t get into their offices or work from them for upwards of months, in some cases. So quickly moving servers and infrastructure and trying to get out of there was very important.

“People think of the Hurricane Sandys”

Anthony DeGraw: And then the one thing that I always like to stress when having this conversation is actually the problem with the example I just brought up. People think of the Hurricane Sandys, right? They think of those massive storms and, oh, that’s not going to happen again, or it doesn’t happen that often.

The reality is that what I explained about the power outage happens way more often and catches so many people off guard. I’ve known of small, well, not small, decent-size thunderstorms that, just based on the way they rolled through New Jersey, took an entire county down for a couple of days. And that wasn’t up by the water or anything crazy; just a nice set of thunderstorms rolled through.

And it would take out a county, and you’d hear about it on the news. Those are the situations that people don’t ever think about or plan for, and that’s where Business Continuity really comes in: those blips on the radar. Because at the end of the day, most organizations’ biggest cost is the salaries and wages of their employees.

That’s the biggest cost for almost every organization. If you think about this in terms of ROI, return on investment, or really, in the insurance world, business interruption, what does that cost you? The leading driver is salary: every single hour that you keep paying out while your people are no longer productive.

I’ll pause there.
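The business-interruption point reduces to simple arithmetic: headcount times average hourly wage times hours of downtime. A minimal Python sketch, assuming a hypothetical 50-person office at $40 an hour and the six-hour outage from the power example; the `fraction_idle` parameter is an illustrative addition for partial outages, not something from the episode.

```python
def downtime_payroll_cost(employees: int, avg_hourly_wage: float,
                          hours_down: float, fraction_idle: float = 1.0) -> float:
    """Wages paid while people cannot work.

    fraction_idle models partial outages (0.5 = half the staff idle).
    This is only the payroll piece of business interruption; lost revenue,
    overtime, and recovery fees would come on top.
    """
    return employees * avg_hourly_wage * hours_down * fraction_idle


# Hypothetical office: 50 people averaging $40/hour, down for six hours.
print(downtime_payroll_cost(50, 40.0, 6))       # 12000.0
print(downtime_payroll_cost(50, 40.0, 6, 0.5))  # 6000.0
```

Treat the result as a floor: it is the cost you pay even if no customer notices the outage.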

Testing your Business Continuity

Jed Fearon: No, that’s a good lead-in, because I’m going to dig deeper into this for some statistics. The next question related to Business Continuity is: When was it last tested? Less than a year ago, more than a year ago, more than 24 months ago, or never? Do you know what the stats are on that, on how many businesses actually stage a test of whether their plan has the sea legs to withstand a major flood, a storm like Sandy, et cetera?

Anthony DeGraw: Yeah. I don’t have the statistics on it. I’m sure if we did a quick Google search in the background we might get an answer.

But you and I have talked about this in the past, even just about backups: testing them and making sure you can restore from them, which I think is even a question on this that we’ll get to as well. To recap, at the end of the day, from what we see at least on the backup systems, the answer is not often, or rarely, and people never run through the exercise of actually practicing it in the cyber space.

Jed Fearon: I’m with you on that. Seriously, take your average $5 million, $10 million a year business. I can’t see any attorneys, CPAs, or money managers going, “All right, guys, let’s spend four hours doing this.”

Anthony DeGraw: No, it comes back to the person who owns cybersecurity, the person who owns this, and sometimes that’s operations.

Most of the time, that all reports up to somebody in a financial position. And like you said, their time is limited. The other day we were seeing that most IT organizations report into finance. Most HR organizations report into finance. Sales and marketing, too; smaller businesses don’t have the ability to silo all of these.

And normally it flows up there, and the answer is they don’t have time. This is where I kind of go against the grain of a lot of the larger consulting companies, which are so focused on policies and procedures. The reality is exactly what we know it is: even if I get the policy up and running, I never look at it, or I don’t even know where it is, or I never test or update that policy or procedure to today’s standards.

So if an emergency or a blip on the radar happens, chances are I’m not going to go look at that, and if I do, it’s going to be completely outdated. I may have Jed as the next person to contact in a situation, and Jed doesn’t even work for my company anymore.

So yeah, I don’t have the stats, but I rely a lot on my gut. My gut says that many are not updating those policies and many are not testing those policies in real life.

Jed Fearon: Well, I’d certainly be inclined to believe that cloud would be a nice safeguard against things really buying the farm when there’s a major event. And that’s why, as I mentioned 10 minutes ago, the whole idea of being down...

Wow. That’s never happened to me, and that’s because I’ve worked for technology companies that are largely cloud. Now, you gave a nice little lead-in, and this is about maintaining regular backups. They ask: Do you maintain regular backups, at least monthly? I think that is way too wide a timeframe. And are they encrypted?

Anthony DeGraw: Yeah. And I’m glad you mentioned that. This is a wonderful reason why you’re on our marketing team, with your industry experience: many folks wouldn’t even pick up on that, and you picked up on it as you –

Jed Fearon: There’s two things going on there.

Anthony DeGraw: Yeah. So yes, you don’t want to back up monthly. That is way too long a period, because most likely, if you’re backing up monthly and something goes wrong, you’re probably going to lose about six months’ worth of data by the time you actually figure it out. So yeah, the recommendation should be hourly. Hourly snapshots at minimum, with daily offsites, is the rule of thumb to start at.

So a lot of times you’ll see, and I think even in the question, they talk about weekly, monthly, and stuff like that. But you should be doing at minimum hourly snapshots and daily offsite backups, and they all need to be encrypted.
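One way to see why "hourly snapshots plus daily offsites" beats a monthly schedule is to compare worst-case data loss in two scenarios: the office survives (restore from the newest local snapshot) or the whole site is lost (only the offsite copy remains). This Python sketch uses hypothetical schedules and approximates a month as 30 days.

```python
from datetime import timedelta

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)


def worst_case_loss(local_interval: timedelta, offsite_interval: timedelta,
                    site_lost: bool) -> timedelta:
    """Roughly one full interval: the worst case is a failure just before the next copy.

    If the office survives, you restore from the newest local snapshot.
    If the whole site is lost (fire, flood), only the offsite copy remains.
    """
    return offsite_interval if site_lost else local_interval


# Hypothetical schedules; a "month" is approximated as 30 days.
schedules = {
    "monthly (onsite + offsite)": (30 * DAY, 30 * DAY),
    "weekly (onsite + offsite)": (7 * DAY, 7 * DAY),
    "hourly snapshots + daily offsite": (HOUR, DAY),
}

for name, (local, offsite) in schedules.items():
    office_ok = worst_case_loss(local, offsite, site_lost=False)
    site_gone = worst_case_loss(local, offsite, site_lost=True)
    print(f"{name:<34} office OK: {office_ok!s:>17}  site lost: {site_gone!s:>17}")
```

The hourly-plus-daily scheme caps everyday losses at about an hour while still limiting a total site loss to about a day.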

Jed Fearon: Yeah, exactly. And I’m sure the viewers of this podcast know what encryption is. So just in case there are a few newbies, could you lay out just a quick description of encryption for them?

Anthony DeGraw: Yeah, very briefly, for the Jeds and Anthonys of the world: if we went and found an encrypted USB storage device, or another storage device of any size, and plugged it in, we would see a random mix of characters that we would not be able to read on the screen, nor would anybody else.

And you would need an encryption key. Basically, it’s a passcode of some sort, right? An encryption key turns that scrambled text, that crazy text, back into the normal words it actually is.

Jed Fearon: Oh, that’s a great definition. Thank you very much. The next question they ask: Are your backups segmented from, and inaccessible through, the organization’s network? Yes or no. The significance of that, I’m sure you can explain.

Anthony DeGraw: Yeah. Your legacy backup systems were done in the server closet, right? You had a server, and you were running a physical backup appliance in that server closet, on that network, right directly in that room.

And that’s where your backups were being done. You could open the closet door and see that the backup was green. You could log into the system, and everything was all good. There are a couple of problems with that. One is cyber threat actors, which get more publicity than the others.

But the first one goes back to Disaster Recovery: what if I can’t get to my office? A couple of examples of that: I have a fire in my office, or even just in my office building. Maybe my neighbor has a fire, and because of that, the police and fire department aren’t giving anyone access to my facility. I can’t get in. I can’t go restore from backup, and the whole point of having a backup is being able to duplicate that data in another environment somewhere else and get access to it.

The same goes for a storm system, a Hurricane Sandy. And then finally, what threat actors were doing, even via cloud systems, was getting into your systems. They would take full control over your backup system, and then they would go a level deeper and go after multiple backups. So say I’m backing up monthly, as in that one example.

Well, the threat actor is going to get in, keep going backwards through those backups, and get full access. So having the backup system, and I’m just going to word it the way they do, segmented from and inaccessible through the organization’s network is extremely important.

So there are a couple of different ways you’ll see that done. The most familiar is an offsite backup to a cloud system, so the backup is located somewhere else entirely. Then, from that system, you can also have a physical backup that’s completely segmented from the network, even offline. A lot of recommendations are to go a third step further, where you have a backup that is physically offline, so nobody can access it over any type of internet connection.
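The layers described here (an offsite copy, a copy the office network cannot reach, and optionally an offline copy) can be written down as a simple checklist. This Python sketch is a hypothetical model: the `BackupCopy` fields and the three example copies are illustrative assumptions, not any vendor’s configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    offsite: bool             # stored away from the office
    reachable_from_lan: bool  # could a compromised office machine reach it?
    offline: bool             # physically disconnected from any network


def segmentation_gaps(copies):
    """List the layers from the episode that this set of copies is missing."""
    gaps = []
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy: losing the office loses every backup")
    if not any(not c.reachable_from_lan for c in copies):
        gaps.append("no copy segmented from the organization's network")
    if not any(c.offline for c in copies):
        gaps.append("no offline copy (recommended third step)")
    return gaps


closet = BackupCopy("server-closet appliance", offsite=False,
                    reachable_from_lan=True, offline=False)
cloud = BackupCopy("cloud replica", offsite=True,
                   reachable_from_lan=False, offline=False)
vault = BackupCopy("offline drive", offsite=True,
                   reachable_from_lan=False, offline=True)

print(len(segmentation_gaps([closet])))           # 3
print(segmentation_gaps([closet, cloud]))         # ['no offline copy (recommended third step)']
print(segmentation_gaps([closet, cloud, vault]))  # []
```

The legacy server-closet setup fails every check, which is the scenario the insurance question is probing for.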

Jed Fearon: Hey, you’re reminding me of one of my favorite solutions, which I know y’all have used for years: Datto. The whole idea is that you’d have network-attached storage onsite that you can completely remove, and it’s backed up frequently throughout the day, in 15-minute increments if you want.

There are nightly test restores. It’s backed up in one cloud data center, and in another one if you so desire, for an extra cost. And if you had a major meltdown, you could run your business in the cloud under their contract terms for no extra charge. I think it was 90 days, the last time I checked. So that’s a pretty incredible Backup and Disaster Recovery solution.

Anthony DeGraw: Absolutely. Among the providers we work with, you have Datto, you have StorageCraft, you have Veeam. There are a bunch of different solutions out there.

Hot Sites & Warm Sites

Anthony DeGraw: You can get pretty complex with the backup solution for the main systems. There’s something called a warm site, and then there’s something called a hot site, and I’ll go all the way to hot sites just to show the difference.

On a hot site, your RPO, your Recovery Point Objective, and your Recovery Time Objective would be zero minutes each, basically instantaneous. So what does that really mean, and where is that practical? It costs a lot of money to do that, but it’s practical in, say, the financial services markets, where they’re trading stocks in real time.

There are a lot of practical applications for what they call a hot site, and a hot site means your environment is running in parallel. The same data running here is running there, and it’s live, going back and forth. So at any moment in time, we can shut this one down and that one can get up and running, or shut that one down and this one can get up and running. They’re both live at all times. It’s extremely expensive to do. We actually have clients that have that type of setup based on the nature of their business.

And then there’s the middle ground of a warm site, which is a split between the two. It’s not a hundred percent live, but it’s fairly close.
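Choosing between these options is a trade-off: pick the cheapest setup whose RTO and RPO still meet what the business needs. In the Python sketch below, only the hot site’s zero-minute RTO and RPO come from the episode; the warm-site and restore-from-backup figures are hypothetical placeholders to swap for your provider’s actual numbers.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    rto: timedelta  # how long until systems are back
    rpo: timedelta  # how much recent data can be lost


# Ordered cheapest first. Only the hot-site zeros come from the episode;
# the other figures are hypothetical placeholders.
TIERS = [
    RecoveryTier("restore from backups", timedelta(hours=24), timedelta(hours=1)),
    RecoveryTier("warm site", timedelta(hours=1), timedelta(minutes=15)),
    RecoveryTier("hot site", timedelta(0), timedelta(0)),
]


def cheapest_tier(required_rto: timedelta, required_rpo: timedelta, tiers=TIERS):
    """Return the first (cheapest) tier that meets both objectives."""
    for tier in tiers:
        if tier.rto <= required_rto and tier.rpo <= required_rpo:
            return tier
    raise ValueError("no tier meets these objectives")


# A finance team that can be down for a day and lose up to a day of work:
print(cheapest_tier(timedelta(hours=24), timedelta(hours=24)).name)  # restore from backups
# A real-time trading desk that cannot tolerate any downtime or data loss:
print(cheapest_tier(timedelta(0), timedelta(0)).name)                # hot site
```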

Frequency of backups

Jed Fearon: Well, the final question, which I suspect a lot of businesses can’t answer without an assist from their MSP or their IT department, is the frequency at which data is backed up and test-restored. I mentioned a few minutes ago how data can be test-restored nightly. But they ask here: is it done at least daily, weekly, monthly? They even have “none.” Hopefully no one says none. They’re not going to get the policy if they say none.

Anthony DeGraw: Yeah. So, “frequency of backing up data” and “do you test restoring from backup”: so interesting. They don’t even have a yes/no option. So I don’t know whether answering this question means you are or aren’t doing it.

But yeah, as we mentioned before, daily, weekly, and monthly are the checkboxes here. If you’re at monthly or weekly, you’re far behind today’s capabilities. That made sense years ago, when the cost of backing up, and of the software and hardware that made it possible, was higher. With today’s pricing, it should be daily backups at minimum.

As Jed and I have mentioned multiple times, our standard solution that we go to market with is hourly snapshots. Every single hour, we’re taking a snapshot, and then we’re doing daily offsites. Jed had mentioned 15-minute increments, and you could go as low as zero; you could go all the way to that hot site. But yeah, my gut says you should be considering daily backups at minimum. And in terms of test restores from backup, we find a lot of people are not actually doing this. And when do you find out? When you have one of those Business Continuity issues or disasters, you go to restore for the first time, and you’ve never done it before.

That’s another reason why you want to back up more frequently rather than less frequently, right? If I’m trying to restore from a month’s worth of data, or from a snapshot taken a month ago, that takes a lot more horsepower to get up and running than a snapshot taken as recently as 59 minutes ago.

And yes: one, you should make sure backups are running. Two, go into the backup file and make sure it’s actually the data you wanted backed up and not a blank file, because that happens. And three, yes, you should be testing a restore from those backups quarterly, as we try to do. That’s a very good rule-of-thumb standard.
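The three checks above can be automated as a simple health report. This Python sketch is hypothetical: the one-hour threshold assumes hourly snapshots, 90 days stands in for "quarterly", and a non-zero file size is only a crude proxy for "not a blank file", so a person still needs to open the backup and spot-check its contents.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_health(last_backup: datetime, backup_size_bytes: int,
                  last_restore_test: Optional[datetime], now: datetime,
                  max_backup_age: timedelta = timedelta(hours=1),
                  max_test_age: timedelta = timedelta(days=90)) -> list:
    """Return the checks that fail; an empty list means all three passed."""
    failures = []
    # 1. Are backups actually running? With hourly snapshots, the newest
    #    backup should be no more than about an hour old.
    if now - last_backup > max_backup_age:
        failures.append("backup is overdue")
    # 2. Is there data in it, rather than a blank file? (Crude size proxy.)
    if backup_size_bytes == 0:
        failures.append("backup file is empty")
    # 3. Has anyone done a test restore in roughly the last quarter?
    if last_restore_test is None or now - last_restore_test > max_test_age:
        failures.append("no test restore in the last quarter")
    return failures


now = datetime(2022, 2, 11, 12, 0)  # hypothetical "current" time
print(backup_health(now - timedelta(minutes=30), 0, None, now))
print(backup_health(now - timedelta(minutes=30), 10**9, now - timedelta(days=30), now))  # []
```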

How larger orgs can focus on Backup & Disaster Recovery

Jed Fearon: Well, I’ve been through our ticketing system to look at different ticket matters from our various clients.

And I’ve seen a lot of entries for these automated backups and tests. That was very reassuring, but I’m not as sure what companies are doing that don’t have an MSP helping them: mainly smaller entities that might just have something makeshift in-house, with an IT person who might fulfill other functions.

Anthony DeGraw: Absolutely. Even among small businesses that have an MSP, we find a lot of organizations, and the Southeast shows this very well, that have current providers in place. There’s nothing wrong with those providers. They just may be smaller. Maybe they’re a one- to 10-person MSP provider or partner, and they’re just wearing so many hats.

And one of those hats is what we’re talking about, and it goes to the bottom of their list. One of the differences in working with a provider that has the size and scale that we do is that we have teams of people built out around their specialties. And one of those specialties, obviously, is Backup and Disaster Recovery, given how important that is. They’re doing everything we just discussed over the last 30 minutes to make sure our customers are protected. The last thing that we want. And I tell customers all the time, you should be holding us accountable for Backup and Disaster Recovery, and we need to be executing on that.

And if you go down, we should be there to make a difference and hopefully surprise you with how fast we’re able to serve your needs, get your company back up and running, get your team productive again, and make sure you’re servicing your clients. Extremely important conversation.

As Jed brought this conversation up to me, he’s had multiple insurance applications cross his desk, and multiple clients are starting to ask about their cyber insurance renewals. That’s gearing us up for a cyber insurance webinar with our friends over at Measured Insurance, which, depending on when this gets released, will be in mid- to late February. The whole point of that is that the cyber insurance market is changing. Multiple customers are coming back to us now saying, “My cyber insurance carrier won’t let me renew my policy unless I get X, Y, Z in place.” We’ve never seen that before.

And my gut tells me that’s because the cyber insurance carriers are getting rocked in terms of the amount of claims they’re paying out or the examples of claims they’re seeing. And they’re getting pretty serious about these applications and the questions they’re actually asking.

And, as the Measured Insurance folks told us, they’re even denying coverage or denying renewals because you don’t have minimum standards in place. Once again, I keep telling people it’s 2022 right now; we’ve been dealing with this. Jed has been seeing this his whole career.

It’s been 10 years of significant breaches and ransomware, with everything getting more sophisticated. It’s 2022. It’s not 2000, when this stuff wasn’t top of mind or SMBs weren’t dealing with it.

It’s 2022. The insurance marketplace is getting smarter. Managed service providers are getting smarter, and at the end of the day, clients are getting smarter too. We’re here for you.

Jed, thanks for doing this with me, man.

Jed Fearon: Absolutely. Have a great weekend.
