In the brutal world of ransomware attacks, having a reliable and isolated backup is the ultimate trump card that allows you to stand up from the table and tell the blackmailers: “We don’t need your keys. We have our own.” It’s absolutely the last and most important line of defense that separates a controlled incident from a disaster that could destroy your business. Without a working backup, you are completely at the mercy of cybercriminals.
But in an operational technology (OT) environment, creating and maintaining an effective backup strategy is much more complicated than in the IT world. We’re dealing with dozens of device types from different manufacturers, legacy operating systems and, most importantly, an absolute business continuity priority that makes a simple “turn off and make a copy” often impossible.
Many engineers and managers rely on undocumented, hand-made copies of PLC projects stored on local drives or unsecured file servers. Against modern, aggressive ransomware, such a strategy is unfortunately an illusion of security. True resilience requires a systematic, automated and, crucially, regularly tested process. Let’s analyze what such a process should look like in the context of three typical attack scenarios.
Contents
- Why, in the face of a ransomware attack, is backup your last and most important line of defense?
- What key indicators (RTO/RPO) should be defined for the critical production line?
- How often should SCADA projects and PLC programs be backed up to comply with IEC 62443?
- Is it safe to store backups of OT systems in the public cloud?
- What is the “air-gapped backup” strategy and how to implement it in an automated manner?
- What tools allow you to perform a “hot” backup of a Windows 7 HMI station without interrupting its operation?
- How to effectively isolate the backup server from the rest of the network to protect it from ransomware?
- Do traditional LTO tapes still make sense in modern backup strategies for OT?
- Why is a backup without a tested restoration procedure worthless?
- How to regularly and securely test the integrity and recoverability of backups?
- Are there automated tools that allow you to recreate a PLC in a matter of minutes?
- What does a disaster recovery scenario for an entire production line look like in practice?
- How does nFlo design and implement reliable backup and recovery systems for critical industrial environments?
Why, in the face of a ransomware attack, is backup your last and most important line of defense?
Preventive systems - firewalls, antivirus, training - are like walls and guardians of your digital fortress. Their job is to keep the enemy from entering. But history teaches that even the most powerful walls can be breached at some point. Backup is like secret underground shelters and warehouses where you store everything you need to rebuild your kingdom after an invasion.
Once ransomware gets inside and encrypts your running systems, all prevention ceases to matter. The attacker is already inside and in control of the situation. At this point, the only thing that gives you independence and choice is a clean copy of your critical systems and data, untouched by the attack.
Having a reliable backup allows you to ignore ransom demands with complete peace of mind. Instead of entering into risky negotiations with criminals, you can focus all of your team’s energy on a methodical recovery process. This is a fundamental shift from being a victim to being an organization in control of its own return to normal operations. Without a backup, you have no such choice.
📚 Read the complete guide: Backup: Zasada 3-2-1 i najlepsze praktyki backupu
What key indicators (RTO/RPO) should be defined for the critical production line?
Before you even start choosing a technology for backup, you must, in collaboration with the business, define two key metrics that will be the foundation of the entire strategy. The Recovery Point Objective (RPO) answers the question, “What is the maximum amount of data we can lose?” If the RPO for a SCADA project is 4 hours, this means that backups of that system must be made at least every 4 hours.
The Recovery Time Objective (RTO) answers the question, “How quickly do we need to get the system back up and running after a disaster?” If the RTO for a key production line is 2 hours, all your technology, people and procedures must be designed to enable you to restore control of that line in under two hours.
Defining RTO and RPO is not a technical decision but a business one, and it must be derived from a Business Impact Analysis (BIA). A system controlling a million-dollar-per-hour process will have very different requirements from a data archiving system. Precisely defining these metrics is the first step to designing a strategy that matches real needs, neither over- nor under-engineered.
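These two metrics translate directly into checks that a monitoring job can run. A minimal sketch, with system names and target values purely illustrative (they must come from your own BIA, not from this example):

```python
from datetime import datetime

# Hypothetical RTO/RPO targets per system, as they might come out of a
# BIA workshop. Names and numbers are examples only.
TARGETS = {
    "scada_server": {"rpo_h": 4, "rto_h": 2},
    "historian_db": {"rpo_h": 1, "rto_h": 8},
}

def max_data_loss_hours(last_backup: datetime, now: datetime) -> float:
    """Worst-case data loss (in hours) if the system failed right now."""
    return (now - last_backup).total_seconds() / 3600

def rpo_met(system: str, last_backup: datetime, now: datetime) -> bool:
    """True if the newest backup is recent enough to satisfy the RPO."""
    return max_data_loss_hours(last_backup, now) <= TARGETS[system]["rpo_h"]
```

For example, a backup taken at 9:00 and checked at 12:00 satisfies a 4-hour RPO for the SCADA server but violates a 1-hour RPO for the historian.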
How often should SCADA projects and PLC programs be backed up to comply with IEC 62443?
The frequency of backups is a direct derivative of the defined RPO. However, IEC 62443, the standard in OT security, gives us additional important guidance here. This standard requires an organization to have and regularly test procedures for backing up key data and configurations.
It is good practice, in line with the spirit of the standard, to vary the frequency depending on the type and criticality of the system. SCADA/HMI server designs and configurations that change relatively infrequently can be backed up once a day, for example. PLC programs and configurations that are modified only during maintenance work should be backed up after each authorized change.
Historical and production data that changes in real time, in contrast, should have an RPO of minutes or hours, depending on its criticality to the process and on quality-compliance requirements. The key is to automate this process to eliminate the risk of human error.
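The policy described above can be captured as a small schedule table that an automation job consults before each run. A sketch only; the asset classes and intervals are illustrative, since IEC 62443 mandates the process, not specific numbers:

```python
# Example backup cadences per asset class (illustrative values, not
# requirements of IEC 62443).
SCHEDULE = {
    "scada_project":  {"trigger": "interval",  "interval_h": 24},
    "plc_program":    {"trigger": "on_change", "interval_h": None},
    "historian_data": {"trigger": "interval",  "interval_h": 1},
}

def backup_due(asset: str, hours_since_last: float,
               change_event: bool = False) -> bool:
    """Decide whether a new backup should be triggered for this asset."""
    policy = SCHEDULE[asset]
    if policy["trigger"] == "on_change":
        # PLC programs: back up after each authorized change, not on a timer
        return change_event
    return hours_since_last >= policy["interval_h"]
```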
Is it safe to store backups of OT systems in the public cloud?
Storing backups in the cloud (e.g., Microsoft Azure, AWS) is becoming increasingly popular and offers many advantages, such as scalability and availability. However, in the context of OT, it is a solution that requires very careful consideration and the implementation of additional safeguards.
The main risk, of course, concerns the security of the connection and of the cloud platform itself. All data sent to and stored in the cloud must be strongly encrypted, both in transit and at rest. Access to the backup repository must be protected by multi-factor authentication (MFA) and restrictive access policies.
But even with the best security, relying solely on the cloud is risky. A key principle of resilience is to have multiple, independent copies. That’s why the best strategy is a hybrid approach: having one copy locally, on-site (which ensures fast recovery), and a second, additional copy in an external, secure location - which could be the cloud.
What is the “air-gapped backup” strategy and how to implement it in an automated manner?
The most important feature of a reliable backup in the context of ransomware is its isolation. Modern, aggressive strains of ransomware actively seek out backup servers on the network and attempt to encrypt them first to cut off the victim’s escape route. Therefore, it is crucial that at least one backup is stored in a way that is physically or logically cut off from the main production network.
Traditionally, this was accomplished using tape media (LTO), which was physically removed and stored in a safe after writing. This is still a very secure, albeit slow, method. A more modern approach is a logical air gap, which can be automated: the backup server initiates a connection, pushes the data to an isolated repository (e.g., on a server at another location or in the cloud), and then immediately closes the connection.
There are also so-called “immutable storage” technologies. Even if an attacker takes control of the backup server, they will not be able to overwrite or delete a stored copy for a defined period of time (e.g., 30 days). This is an extremely powerful defense mechanism.
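The retention logic behind immutable storage is simple to express. A sketch of the refusal rule, using the 30-day window from the example above:

```python
from datetime import datetime, timedelta

IMMUTABILITY_WINDOW = timedelta(days=30)  # example value from the text

def deletion_allowed(written_at: datetime, requested_at: datetime) -> bool:
    """A WORM-style repository refuses any delete or overwrite request
    that arrives inside the immutability window -- even from an admin
    account, which is exactly what defeats a ransomware operator who
    has captured backup-server credentials."""
    return requested_at - written_at >= IMMUTABILITY_WINDOW
```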
What tools allow you to perform a “hot” backup of a Windows 7 HMI station without interrupting its operation?
One of the biggest challenges in OT is the backup of running systems, especially older ones based on Windows XP or Windows 7. Stopping such an HMI station to make a cold disk copy is often unacceptable. The solution to this problem is “live” disk imaging (hot imaging or live backup) tools.
These tools use snapshot technology, such as the Volume Shadow Copy Service (VSS) in Windows. It “freezes” the state of the file system for a few seconds, makes a consistent copy available, and then “unfreezes” it, while the system and applications keep working normally the whole time. The entire process is transparent to the operator and causes no downtime.
There are many commercial and free tools that can do this. Choosing the right one depends on the specifics of your system. The key is to test the chosen tool in a lab environment before production deployment, to make sure it is 100% compatible with your SCADA/HMI software.
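As one concrete illustration of the VSS mechanism: Windows Server ships a VSS client called DiskShadow, whose scripts can create a snapshot and expose it as a drive letter for an imaging tool to copy while the HMI keeps running. The sketch below only builds such a script as text; treat the exact command set as an assumption to validate on a lab machine first:

```python
def diskshadow_script(volume: str = "C:", alias: str = "hmi",
                      expose_as: str = "X:") -> str:
    """Build a DiskShadow script that snapshots `volume` via VSS and
    exposes the frozen, consistent view as a drive letter.
    Sketch only -- verify against your Windows version before use."""
    return "\n".join([
        "set context persistent",            # snapshot survives the session
        f"add volume {volume} alias {alias}",
        "create",                            # take the VSS snapshot
        f"expose %{alias}% {expose_as}",     # mount it for the imaging tool
    ])
```

The generated text would typically be saved to a `.dsh` file and run with `diskshadow /s`; the imaging tool then reads from the exposed drive letter instead of the live volume.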
How to effectively isolate the backup server from the rest of the network to protect it from ransomware?
The backup server is your most valuable resource in a moment of crisis and must be protected like a vault. The absolute minimum is strict network isolation: the server should sit in a separate, dedicated network segment (VLAN), protected by restrictive firewall rules.
Communication with the backup server should be kept to an absolute minimum. It should be able to initiate connections to the systems from which it retrieves data, but traffic in the other direction (from the production network to the backup server) should be blocked by default. Administrative access to the server should be possible only from a few, trusted workstations and protected with MFA.
Moreover, the backup server should never be joined to the same Active Directory domain as the rest of the systems. Attackers who gain control of a domain controller often use it to spread ransomware to all connected machines. Keeping the backup server in a separate workgroup or domain significantly hinders this attack vector.
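The default-deny pattern described above can be modeled in a few lines. A toy sketch, with all addresses invented for illustration:

```python
# Toy model of the isolation policy: the backup server may initiate
# connections into production, but inbound traffic to it is denied by
# default except from a short allowlist of MFA-protected admin stations.
BACKUP_SERVER = "10.10.50.10"                    # invented address
ADMIN_STATIONS = {"10.10.60.5", "10.10.60.6"}    # invented allowlist

def connection_allowed(src: str, dst: str) -> bool:
    """Evaluate a single flow against the backup-segment policy."""
    if src == BACKUP_SERVER:
        return True                    # backup server pulls data from production
    if dst == BACKUP_SERVER:
        return src in ADMIN_STATIONS   # inbound: trusted admin stations only
    return True                        # flows not touching the backup server
```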
3 crisis scenarios and the role of backups
1. Ransomware on the HMI station. Incident: the attacker encrypts the operator’s computer disk, paralyzing control of one machine or line. How backup saves the situation: we restore the entire HMI station from a clean, full image (image-level backup) in a matter of minutes.
2. SCADA server failure. Incident: a key supervisory server suffers a hardware failure, or its database is corrupted. How backup saves the situation: we restore the entire server virtual machine on new hardware, or recreate the database itself from the last consistent copy.
3. Sabotage of the PLC. Incident: the attacker remotely uploads a malicious or corrupted program to the PLC, causing the machine to malfunction. How backup saves the situation: we stop the machine, connect the engineering station and upload the latest trusted version of the program from the repository to the controller.
Do traditional LTO tapes still make sense in modern backup strategies for OT?
In the age of high-speed drives and the cloud, magnetic tape media (LTO) may seem like an archaic relic. But in the context of ransomware protection, they are experiencing a renaissance and still make great sense as part of a multi-layered strategy.
Their greatest advantage is the ability to create a true physical “air gap”. Once the data is written, the tape is physically removed from the drive and stored in a secure location (e.g., in a safe, at another location). No ransomware, even the most advanced, can encrypt data on a tape that is not physically connected to any system.
Of course, data recovery from tapes is much slower than from disks, so they should not be the only medium. The ideal strategy, known as “3-2-1”, is to have at least 3 copies of data, on 2 different types of media, with 1 copy stored offline or off-site. Combining fast, disk-based local backups (for quick recovery) with slower but ultra-secure copies on tape or in the cloud (for disaster recovery) is today’s gold standard.
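The “3-2-1” rule is mechanical enough to audit automatically against an inventory of backup copies. A minimal compliance check; the inventory field names are this sketch’s own convention:

```python
def satisfies_3_2_1(copies) -> bool:
    """Check an inventory of backup copies against the 3-2-1 rule:
    at least 3 copies, on at least 2 media types, with at least 1
    copy stored offline or off-site.
    Each copy is a dict like {"media": ..., "offline": ..., "offsite": ...}
    (field names are an assumption of this sketch)."""
    return (
        len(copies) >= 3
        and len({c["media"] for c in copies}) >= 2
        and any(c["offline"] or c["offsite"] for c in copies)
    )
```

For instance, two local disk copies plus an offline tape at another site passes; two disk copies alone fails on both the count and the offline requirement.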
Why is a backup without a tested restoration procedure worthless?
This is the most important and most often ignored rule. Having backups that we have never tried to restore is like carrying a spare wheel without the wrench to fit it. It gives us an illusion of safety that turns into disaster the moment a real need arises.
There are a thousand reasons why a backup can fail: damaged media, software incompatibility, human error in the procedure, an incomplete copy. The only way to make sure our “insurance policy” is valid is to test it regularly.
A restore test is a process during which, in a controlled, isolated environment (e.g., on virtual machines), we attempt to restore a system from a backup and verify that it boots and works properly. Only a successful outcome of such a test gives us real confidence that we will be able to save our production on the day of a crisis.
How to regularly and securely test the integrity and recoverability of backups?
Restoration testing should be an ongoing, scheduled part of the backup system lifecycle. A good practice is to conduct a full restoration test for each critical system at least every quarter or half-year, depending on its criticality.
For these tests to be safe and non-disruptive to production, they must be conducted in an isolated test environment (sandbox). We create a virtual network, cut off from the production network, and it is there that we attempt to restore the SCADA server or HMI station. We verify that the system boots, that the application works, and that logging in works correctly.
In addition to full restoration tests, many modern backup systems offer automated integrity verification. After each backup, the system can automatically “run” it in the background on a virtual machine, take a screenshot of the login screen and send us an e-mail confirming that the backup is consistent and runnable.
Are there automated tools that allow you to recreate a PLC in a matter of minutes?
Yes. The market for software to manage OT environments has grown tremendously. Today, there are dedicated commercial platforms for central management, backup and restoration of programs for PLCs from various manufacturers.
These tools can automatically connect to controllers on the network on a scheduled basis, download the current version of the logic and configuration, and then compare it to the latest trusted version in a central repository. If an unauthorized change is detected (which can be a sign of an attack), the system immediately alerts the operator.
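The compare-to-trusted-version step at the heart of such platforms reduces to a checksum comparison. A hedged sketch; the repository layout and controller names are invented for illustration:

```python
import hashlib

# Hypothetical central repository: controller name -> SHA-256 of the
# last authorized program upload.
TRUSTED = {
    "plc-press-01": hashlib.sha256(b"authorized logic v12").hexdigest(),
}

def check_controller(name: str, program_readback: bytes) -> str:
    """Compare the logic read back from a controller against the trusted
    repository and produce an alert line on any mismatch, which may
    indicate an unauthorized (possibly malicious) change."""
    current = hashlib.sha256(program_readback).hexdigest()
    if current != TRUSTED[name]:
        return f"ALERT: unauthorized change detected on {name}"
    return f"OK: {name} matches the trusted version"
```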
In the event of a failure or the need to replace the controller, the platform allows the last good version of the program to be uploaded to the new device in no time. This drastically reduces the time required for restoration and minimizes the risk of human error. An investment in such a tool, which can be funded by a grant, is key to achieving a low RTO for the control layer.
What does a disaster recovery scenario for an entire production line look like in practice?
Restoring an entire production line is a complex operation that must be described in a detailed Disaster Recovery Plan (DRP): a step-by-step plan laying out the sequence of operations.
It usually begins with restoring the network infrastructure (if it has been compromised). Then, central supervisory systems, such as SCADA servers and historical databases, are restored from “clean” backups. In the next step, the logic on all PLCs comprising the line is verified and possibly restored.
The final, and often longest, stage is recalibration and testing. Engineers and operators must, step by step and in service mode, check the operation of every machine, every sensor and every actuator before the line is switched to automatic mode and resumes normal production.
How does nFlo design and implement reliable backup and recovery systems for critical industrial environments?
At nFlo, we understand that in an OT environment, backup reliability is absolutely fundamental. Our approach to designing a Business Continuity and Disaster Recovery (BCDR) strategy always starts with a thorough understanding of your processes and business requirements. We run workshops in which we jointly define realistic, adequate RTO and RPO metrics for your critical systems.
We select and implement technologies that are not only state-of-the-art but, above all, proven and reliable in industrial settings. We design multi-tiered architectures, following the “3-2-1” principle, that combine the speed of disk backups with the security of isolated offline copies.
A key element of our service, however, is not only implementation, but also assistance in creating and testing restoration procedures. We help you build a system in which you can have real, test-verified confidence that it will work exactly as it should on the day of a crisis.
Related Terms
Learn key terms related to this article in our cybersecurity glossary:
- Ransomware — Ransomware is a type of malicious software (malware) that blocks access to a…
- Backup — Backup, also known as a backup copy or safety copy, is the process of creating…
- Cybersecurity — Cybersecurity is a collection of techniques, processes, and practices used to…
- Cybersecurity Incident Management — Cybersecurity incident management is the process of identifying, analyzing,…
- Disaster Recovery — Disaster Recovery (DR) is a set of processes, policies, and procedures aimed at…
Learn More
Explore related articles in our knowledge base:
- Backup Microsoft Entra ID: Why Identity Protection Is Essential Today
- Business Continuity (BCP/DR) in the era of cyber attacks: How to survive a ransomware disaster?
- Cyberinsurance: How to select cyber attack insurance for a company?
- How much does downtime really cost after a cyberattack? A ready-made template for calculating your company’s losses
- Office 365 Backup
Explore Our Services
Need cybersecurity support? Check out:
- Security Audits - comprehensive security assessment
- Penetration Testing - identify vulnerabilities in your infrastructure
- SOC as a Service - 24/7 security monitoring
