2.0 Literature Review

2.1 Introduction

The utilization of ICT has become the norm for organizations. With so much emphasis on technology, businesses must find ways to ensure that their IT infrastructure of hardware, connections and software is continually available for operations.

Crocetti and Wigmore (2016) define disaster recovery (DR) as “an area of security planning that aims to protect organizations from the effects of significant negative events”. They further state that “DR allows organizations to maintain or quickly resume mission-critical functions following a disaster”. DR is a very important aspect of business operations; however, plans for disaster recovery are often “inadequate and for some companies, non-existent” (Britton, 2017). To this end, this research will investigate ways of using existing and available ICT to produce a comprehensive disaster recovery procedure for Company X.

2.2 Defining critical assets

A business impact analysis (BIA) identifies “the processes, systems and functions that are critical to the survival of your company” (Wrenn, 2008). To repair or recover IT infrastructure after a failure or disaster, businesses must have detailed information on their assets.

This can be achieved by defining critical assets, the threats to them and the scenarios that bring about these threats. Rosenberg (2010) holds that “DR should start with the identification of a business’s key assets and the impact of the loss of these assets”. Rosenberg introduces the term ‘Recovery Window’, the length of time a business can operate without access to a resource. The level of protection given to each asset is then determined by its business value and by how long the business can operate without it.
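To make Rosenberg’s idea concrete, the sketch below ranks assets for protection using a recovery window and a business-value rating. It is a minimal illustration under stated assumptions: the asset names, the 1 to 5 value scale and the hour figures are hypothetical, and ordering by shortest recovery window first (with higher business value breaking ties) is one possible reading of how the two factors combine, not Rosenberg’s own method.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    business_value: int           # hypothetical rating: 1 (low) to 5 (high)
    recovery_window_hours: float  # how long the business can operate without the asset


def protection_priority(assets: list[Asset]) -> list[Asset]:
    """Order assets so that those the business can do without for the shortest
    time come first; higher business value breaks ties (an assumed rule)."""
    return sorted(assets, key=lambda a: (a.recovery_window_hours, -a.business_value))


# Hypothetical inventory for illustration only.
assets = [
    Asset("file share", business_value=2, recovery_window_hours=48),
    Asset("order database", business_value=5, recovery_window_hours=2),
    Asset("intranet", business_value=1, recovery_window_hours=72),
    Asset("email server", business_value=3, recovery_window_hours=24),
]

for asset in protection_priority(assets):
    print(f"{asset.name}: window {asset.recovery_window_hours} h, value {asset.business_value}")
```

The order database comes first because the business can survive its loss for the shortest time; the intranet, with the longest window, is protected last.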

O’Boyle (2017), however, believes that the first step should be conducting an inventory of all IT assets. A map of “where each asset is physically located, which network it is on, and any dependencies” should then be documented for each asset, followed by an assessment of external and internal threats and the probability of their occurring. Bucki (2016), on the other hand, suggests that creating a DR plan includes making a list of the critical jobs that the business needs in order to operate and that may be relocated.

It is then important to create an inventory of the necessary office and supporting equipment, including the hardware, software and furniture needed.

2.3 Determining threats and risks

Betan (2010) states that the first step in IT risk assessment (RA) is to identify the threats applicable to the organisation, estimate the probability of each occurring, assess its impact on the organisation and define the measures to be taken when it occurs. Kirvan (2011) also advises that an RA and a business impact analysis (BIA) will identify threats and their impact on “critical business activities”. O’Boyle (2017), however, advises that organisations list internal and external threats to assets and then “imagine the worst case scenario”. The probability of each threat occurring and its impact are then assessed. This shows how the probability and impact of a threat will affect business continuity, giving DR team members a better idea of how to devise a DR procedure.
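Betan’s steps (identify threats, estimate their probability, assess their impact, define measures) can be sketched as a simple scoring exercise. The example below is a minimal sketch that assumes risk is scored as probability multiplied by impact, a common convention that the sources above do not prescribe; the threats, figures and action threshold are all hypothetical.

```python
# Hypothetical threats: (estimated annual probability, impact rating 1-5).
threats = {
    "power outage": (0.30, 3),
    "ransomware": (0.10, 5),
    "flood": (0.02, 5),
    "disk failure": (0.50, 2),
}

# Hypothetical cut-off above which a threat gets a dedicated recovery measure.
ACTION_THRESHOLD = 0.4


def risk_score(probability: float, impact: int) -> float:
    """Simple expected-impact score: probability of occurrence times impact."""
    return probability * impact


def rank_threats(threats: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Return (threat, score) pairs, highest risk first."""
    scored = [(name, risk_score(p, i)) for name, (p, i) in threats.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in rank_threats(threats):
    action = "define recovery measure" if score >= ACTION_THRESHOLD else "monitor"
    print(f"{name:<13} score={score:.2f} -> {action}")
```

A frequent but moderate threat (disk failure) can outrank a rare but severe one (flood), which is the kind of trade-off the assessment is meant to expose.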

Mishra (2012) holds that “a comprehensive risk assessment activity” is essential for organisations to identify threats, the extent of damage they can cause and the measures to be taken to prevent them. There is also a firm belief in the benefit of conducting a “good risk analysis and assessment” and in how it will assist in “prioritizing recovery plans based on the criticality of the functions.”

2.4 Defining the RPO and RTO

Two important measures that should be taken into account when formulating a DR strategy are the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). Harnedy (2016) defines RPO as “the point in time to go back to and retrieve clean versions of files” and RTO as “the minimum amount of time it will take to get an organisation back up and running following a disaster.”

EvolveIP (2015) also agrees that RPO is the “point in time” from which data is to be recovered from storage for a business to continue operating. It can range from minutes to hours or days and reflects the amount of data a business can afford to lose, which determines how frequently backups are performed and how data is stored. RTO, by contrast, is the “maximum length of time” a system can be inoperable after a failure.

The RTO will depend on how critical the asset or application is to the business. Hertvik (2015) declares that “an RPO is a design spec, not a statistic” because a DR solution should be built to match the RPO: if the RPO is 30 seconds, the selected solution must lose at most 30 seconds of data when the system fails. He sees RTO, in contrast, as a statistic that should be used to set “goals for restarting the system on a backup machine or partition”. Achieving an RTO and RPO as close to zero as possible is ideal, but this “would be extremely expensive and might not be worth the effort” (Puricica, 2017). Puricica advises that a common method is “dividing applications and services into different tiers” and then setting the RTO and RPO based on “the service-level agreements (SLAs) the organization committed to”. These tiers consist of mission-critical, business-critical and non-critical applications, each with a different RTO and RPO. Effort and resources are then allocated to applications according to the tier in which they are categorized.
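The tiering approach described by Puricica, together with Hertvik’s distinction between the RPO as a design specification and the RTO as a goal, can be illustrated with a short sketch. The tier targets below are hypothetical, and the rule that worst-case data loss equals the backup interval is a simplifying assumption that ignores how long the backup itself takes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta  # maximum tolerable data loss (design spec)
    rto: timedelta  # target time to restore service (goal)


# Hypothetical targets; real values would come from the organisation's SLAs.
TIERS = {
    "mission-critical": Tier("mission-critical", timedelta(minutes=15), timedelta(hours=1)),
    "business-critical": Tier("business-critical", timedelta(hours=4), timedelta(hours=8)),
    "non-critical": Tier("non-critical", timedelta(hours=24), timedelta(hours=72)),
}


def meets_rpo(backup_interval: timedelta, tier: Tier) -> bool:
    """A failure just before the next backup loses everything since the last one,
    so the worst-case data loss equals the backup interval (backup duration ignored)."""
    return backup_interval <= tier.rpo


def meets_rto(measured_restore_time: timedelta, tier: Tier) -> bool:
    """Compare a restore time measured during a DR test against the tier's goal."""
    return measured_restore_time <= tier.rto


business = TIERS["business-critical"]
print(meets_rpo(timedelta(hours=24), business))  # nightly backup: False
print(meets_rpo(timedelta(hours=4), business))   # four-hourly backup: True
print(meets_rto(timedelta(hours=6), business))   # restored in 6 h during a test: True
```

A nightly backup is adequate for the non-critical tier but not for the business-critical one, which is why the RPO drives backup frequency rather than the other way round.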

Kirvan (2011) advises combining the values obtained for assets, threats, RPO and RTO in a table to determine the organisation’s DR strategies.

Figure 2.1 Showing a method for determining DR strategies

2.5 Types of ICT used in DR

2.5.1 Backup

Sullivan (2015) defines backup as the “copying of files to another disk”, which can be done “through a tape backup, a secondary computer or server, or a cloud hosted backup solution”.

Oracle (2015) further states that automated backups of databases and applications allow the DR site to “keep standby versions of the main production environment running”, which can be used for “reporting and other read-only workloads such as ad-hoc queries and data extracts”. Online Tech (2016), however, offers a more comprehensive view, noting that a data backup plan involves choosing the right hardware and software, defining backup procedures, scheduling and implementing them, and testing them for accuracy. Storing backups offsite is also considered a best practice, as are replication and deduplication “(elimination of duplicate or repeating data)”. If an offsite backup provider is used, a business can take advantage of the “investment in capital, technology and expertise” it offers.
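Deduplication, mentioned by Online Tech as a best practice, can be illustrated with a minimal sketch. It assumes fixed-size chunks identified by SHA-256 digests; production backup products often use variable-size chunking and other refinements, so this shows only the principle that repeated data is stored once and referenced thereafter.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size chunks, keep one copy of each unique chunk in
    `store` (keyed by SHA-256 digest) and return the digests needed to rebuild it."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk is not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original bytes from the stored chunks."""
    return b"".join(store[digest] for digest in recipe)


store: dict[str, bytes] = {}
monday = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # four chunks, two unique
tuesday = monday + b"C" * CHUNK_SIZE                # same data plus one new chunk

monday_recipe = deduplicate(monday, store)
tuesday_recipe = deduplicate(tuesday, store)
print(len(store))                                # 3 unique chunks for 9 chunks backed up
print(restore(tuesday_recipe, store) == tuesday)  # True
```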

Many backup methods, while “generally inexpensive and convenient”, will not enable quick data recovery when a failure or disaster occurs. Backup “only ensures that the data is stored somewhere and can be accessed – eventually” (Sullivan, 2015). Any backup procedure must therefore be combined with a restore method for it to be effective in recovering system data.

2.5.2 Cloud

“More businesses are using cloud services for storage and backup”, thus reducing costs and taking “advantage of the resources available from these providers” (Dubois, 2011). Cloud backup can take two forms. The first is local backup that replicates data to the cloud using software; this creates a local copy for “fast restores while older data is sent to the cloud”.

The other form consists of “backup applications that send data directly to the cloud through software installed on the server”. This ‘direct-to-cloud’ backup, however, does not keep a local copy of data (Crump, 2015). LeBlanc (2016) suggests choosing a cloud solution that “provides public cloud infrastructure, cloud based recovery and the needed expertise.” Automation of the process is deemed the most important factor because it provides a “combination of low costs, reliability and predictability” and also eliminates the probability of human error occurring.

Testing of the process is also essential to ensure that data is recoverable. Cloud recovery solutions, however, require backed-up data to be downloaded from the cloud, and over a slow internet connection this can take a long time. If the cloud is chosen, then “it’s imperative a solution is found that enables data to be recovered from the cloud in a few minutes” (Alam, 2015).

With the adoption of Microsoft SharePoint to host data for Company X, a recovery solution can entail “putting a fully operational data farm into production using computer resources that are located in a data centre” that would not be affected by the failure of the SharePoint data centres (TechNet, 2015). There must therefore be a data centre with the capacity to hold and restore the data contained in Microsoft SharePoint. Microsoft SharePoint is valued as a “business critical platform” in Company X, and it is advised that DR agreements between organisations and the provider “should be very strict in regards to response time of an incident” (O’Connor, 2015).

2.5.3 Virtualisation

Virtualisation is defined by Tsai (2017) as the “technology that allows the simultaneous executing of multiple operating systems on a single, physical computer”. Resources are divided among different virtual machines (VMs), with programs that can communicate directly with the computer hardware or through applications. A VPN can also be used for accessing data and applications after a disaster by placing some PCs at a disaster recovery site while having the majority of users connect to the system over VPN connections from home computers, laptops or remote sites (Rosenberg, 2010). Tarzey (2014) states that “disaster recovery is not possible without backup”.

Virtualisation is seen as a great way to do this and increases the number of options. Data will be “easily backed-up as part of an image of a given virtual machine (VM).” This eliminates the need to reconfigure or rebuild a physical server, since “the VM can be recreated in any other compatible virtual environment.” That environment may be a local solution or a third-party provider, which will eliminate the cost of having redundant systems.

2.5.4 Replication

Replication is the “copying of data from a host computer to another computer at a remote location.” This establishes “redundant copies” of data, and when combined with “deduplication, the cloud or virtual servers this technology can fulfil its role in recovery” (Potts, 2013). Rosenberg (2010) further advises that bandwidth is a major consideration with replication, because “when data is being copied from site to site significant bandwidth is consumed.” It is important to plan and ensure that DR replication does not consume “Internet access, bandwidth or WAN links and compete with existing applications.” Replication solutions for DR fall into two categories. Synchronous replication means that “changes made on disk at the primary site can’t proceed until that data is changed on the disk at the DR site.” Asynchronous replication, by contrast, means that “the disk at the primary site can acknowledge changes before they are replicated to the DR site” (Oracle, 2010). Oracle further states that DR requirements can be met with asynchronous replication if it is performed at an interval shorter than the RPO.
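Rosenberg’s bandwidth concern and Oracle’s condition that asynchronous replication should run at an interval shorter than the RPO can be checked with simple arithmetic, which also applies to the slow cloud restores noted by Alam. The sketch below is a simplified model: it assumes changes are shipped in batches at a fixed interval, uses decimal units, ignores protocol overhead and competing traffic, and every figure in it is hypothetical. Synchronous replication is not modelled, since by definition it loses no acknowledged data but adds the round trip to the DR site to every write.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float) -> float:
    """Hours to move data_gb gigabytes over a bandwidth_mbps megabit-per-second link,
    assuming decimal units and the whole link is available (no protocol overhead)."""
    bits = data_gb * 8 * 1000**3
    seconds = bits / (bandwidth_mbps * 1000**2)
    return seconds / 3600


def async_worst_case_loss_minutes(interval_min: float, change_gb: float,
                                  bandwidth_mbps: float) -> float:
    """Simplified model of batched asynchronous replication: a change made just
    after a batch is sent waits one full interval, then must still cross the link."""
    return interval_min + transfer_hours(change_gb, bandwidth_mbps) * 60


def link_keeps_up(interval_min: float, change_gb: float, bandwidth_mbps: float) -> bool:
    """Each batch must finish transferring before the next one is due,
    otherwise the replication backlog grows without limit."""
    return transfer_hours(change_gb, bandwidth_mbps) * 60 < interval_min


# Hypothetical figures: 0.5 GB of changes every 15 minutes over a 100 Mbit/s link.
rpo_min = 30
loss = async_worst_case_loss_minutes(15, 0.5, 100)
print(f"worst-case loss ~{loss:.1f} min, meets RPO: {loss <= rpo_min}")
print("link keeps up:", link_keeps_up(15, 0.5, 100))

# The same arithmetic applies to restoring from the cloud: 500 GB over 50 Mbit/s.
print(f"full restore ~{transfer_hours(500, 50):.1f} h")
```

With these assumed figures, replication comfortably meets a 30-minute RPO, whereas a full 500 GB restore over a 50 Mbit/s connection would take close to a day, illustrating why slow links undermine cloud recovery.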

Oracle (2010) also notes that synchronous replication can be costlier than asynchronous replication; the two methods can, however, be combined to provide an adequate DR solution. Another option is full replication in the cloud, as suggested by Chan (2017), which allows an organisation to “have multiple environments actively running on-site and in the cloud.” Local data is replicated to the cloud, and if there is a failure, employees can be redirected to the cloud.

The cost of this method, however, depends “on the amount of cloud infrastructure deployed and how much traffic is routed to it”.

Recovery Sites

With replication, the main consideration is what type of DR site an organisation can manage. Rosenberg (2010) refers to these sites as “Emergency Operations Centres, where DR personnel will gather to execute the DR plan”. Three types of site are mentioned. A “Hot Site” contains live communication links, working systems and real data, ready to be occupied. A “Warm Site” contains live communication links and some hardware, but requires the retrieval of software or data from tape or other forms of media. A “Cold Site” is a facility that staff can go to when a disaster is declared and may not have any software, hardware or data.
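Rosenberg’s three site types trade cost against recovery speed, a point Chapple (2012) also makes. A minimal sketch of choosing between them is to pick the cheapest site type that can still resume operations within the RTO. The resumption times and relative costs below are hypothetical placeholders, not figures from the sources.

```python
from typing import NamedTuple, Optional


class SiteOption(NamedTuple):
    kind: str
    resume_hours: float   # hypothetical typical time to resume operations
    relative_cost: float  # hypothetical annual cost relative to a cold site


# Illustrative figures only; real values depend on providers and the organisation.
SITES = [
    SiteOption("cold", resume_hours=72, relative_cost=1),
    SiteOption("warm", resume_hours=24, relative_cost=3),
    SiteOption("hot", resume_hours=1, relative_cost=10),
]


def cheapest_site(rto_hours: float) -> Optional[str]:
    """Return the cheapest site type that can resume operations within the RTO,
    or None if no option is fast enough."""
    candidates = [s for s in SITES if s.resume_hours <= rto_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost).kind


for rto in (100, 48, 4, 0.5):
    print(rto, "->", cheapest_site(rto))
```

A `None` result signals that no single site type meets the RTO, which is where options such as replication or DRaaS would need to be considered.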

The type of site determines the ease and speed of recovering the original systems, with hot sites being the most ideal and cold sites the least. Online Tech (2016) also mentions the different types of recovery site but emphasizes that the strategic design and location of any site are important: “the geographical location should not be too close to avoid both sites experiencing the same disaster”, and the placement of servers, available bandwidth and site access should also be considered. Chapple (2012) holds that the “more spent, the greater the capability of a site to quickly resume operations.” This means that cost will often determine which DR site an organization can invest in. Segue Technologies (2013) also agrees that a hot site mirrors the data centre infrastructure, and that the main data centre and the production (DR) site must “run concurrently”.

A cold site, by contrast, is “essentially office or datacentre space without any server-related equipment installed”, while a warm site is seen as the “middle ground”, combining what is offered by the cold site with some of the features of a hot site.

2.5.5 Disaster Recovery as a Service (DRaaS)

Hefner and Gibilisco (2017) define DRaaS as the “replication and hosting of physical or virtual servers by a third party to provide failover” in the event of a failure or disaster. All servers are therefore replicated by a provider so that they can be used if the servers at the primary site fail. Taylor (2017), however, believes that cloud-based failover is the main benefit of DRaaS, whereby “failover may be automated or manual, and it may be based in the public cloud or the service provider’s cloud.” This means that the “hot site” is now in the cloud.

A main advantage, as expressed by Taylor (2017), is that applications can fail over immediately and users can reconnect “via VPN or Remote Desktop Protocol (RDP), and failback to rebuilt servers in the customer’s data centre.” Ohlhorst (2016) provides a checklist of the basic minimum requirements that business owners should expect of DRaaS providers. These include the ability to automatically back up critical systems and data, the ability to recover from a disaster quickly and with minimal user interaction, and flexibility in recovery options, such as restoring the whole system or single applications.

Businesses must also know what they are paying for upfront, with no hidden or additional charges appearing over time. The choice of DRaaS provider can be difficult for companies; some providers offer benefits in support, ease of configuration, and a wealth of technology and resources available to their hosts in their documentation library (Posey, 2016).

2.6 DR procedure/policy

As previously stated, the DR procedure will take into account all company assets, threats and risks, and a well-documented strategy to recover these assets. Other factors must also be considered while developing a DR plan (DRP), including the key personnel involved in its creation and maintenance, and the testing of the developed plan or procedure.

There are major benefits to organisations having a disaster recovery solution, including up-to-date asset and inventory documentation and task redundancy through having more than one person able to perform a task (Morgan, 2015). It also decreases downtime and helps an organisation recover after disasters by making it better prepared for them (Brodeur, 2014).

Personnel

One important part of the DR procedure that can be overlooked is the key personnel who will be involved in developing and executing the procedure. Along with “corporate-wide awareness”, training of key staff is an “important ingredient of an on-going, pro-active approach” to creating a DRP.

These key personnel include senior management, functional management (supervisors and HODs) and the DR team, together with awareness training for the general employee population (Diez and Iyer, 2007). Although Lady (2009) agrees that key personnel include management and the DR team, he advises including technical staff as key personnel to maintain the DRP by performing tasks ranging from basic daily work to “database administration and release management.” There is, however, a risk of knowledge loss with employee turnover. Rentsys (2016) warns that this can be combatted with proper documentation, including updating the DR plan “any time a business experiences changes in objectives, technology or strategies.” This is important to counter the effect of any changes in the personnel “responsible for executing key recovery steps.”

Documentation and testing

Having a DR plan is not the end of the process; the plan must be maintained. This involves testing it and documenting changes and improvements to ensure it is up to date and relevant to the business it was developed for. Cisco (2008) agrees that a testing process should be developed to ensure the DR plan is effective and efficient, and that the frequency of testing should be established. Testing at least annually helps to find weaknesses in the plan and defines the roles of the personnel required for a successful DR plan. Howlett (2010) proposes a range of tests from “simple to more involved.” Simple tests include a checklist to ensure “every item is in place” and a walk-through, in which the plan is gone through with all key players. More involved tests include technical tests, in which systems are tested in “real-life situations”.

Best practices in DR

To ensure the success and effectiveness of a DR plan, organizations should follow best practices to avoid any inadequacies.

· Keep primary and secondary sites at a safe geographical distance. If the firm has invested resources in a secondary site or data centre, it should be located in an area where a disaster affecting the primary location will not affect it. This includes a location on a different electrical grid and in an area that is not prone to the same natural disasters (McBeth, 2014).
· Test the plan in a real scenario. A full test of the plan must be scheduled to ensure all contingencies and recovery strategies are effective. Training employees in DR is also important to ensure staff are well trained in DR planning and procedures (Hamilton, 2013).
· Identify the mission-critical systems in the organization and the people involved in these systems in order to prioritize resources (Chirgwin, 2017).
· Any changes made to the organization must be reflected in the DR plan. The addition or deletion of any equipment must be recorded in the plan, which must then be tested for effectiveness (McBeth, 2014).
· Ensure there is redundancy for all mission-critical systems. Organisations should not rely on one recovery solution but ideally have more than one solution for each scenario (Chirgwin, 2017).
· If remote access technology is employed, it should be tested and employees trained in using it before a disaster strikes (Hamilton, 2013).

2.7 Conclusion

Developing DR strategies or procedures is a daunting task for any business; following a tested and successful implementation will make this task more manageable. Taking best practices of DR planning into consideration will ensure that fewer errors are made when the plan is formulated; however, each plan will have unique qualities based on its business. The general procedure usually holds that the first task should be to verify and quantify assets, threats and risks. The proper procedures to secure data, software and hardware must also be developed, with specific details on the personnel and tools to be used to restore assets.

A DR plan should also cater for training and educating key personnel and employees on how to execute their duties and responsibilities for efficient and speedy recovery. A major and often overlooked feature of DR plans is testing, maintaining and documenting the process, both to discover any errors and to document the steps so that personnel executing the plan face no difficulty in its execution. The various forms of ICT used for DR strategies include virtualisation, cloud storage and backup procedures.

These technologies have improved the development of DR strategies and have thereby made the development of a DR plan more attainable and realistic for businesses.