With cyberattacks on industrial companies on the rise and regulatory pressure from the NIS2 directive growing, more and more companies are realizing the need for a formal Incident Response Plan (IRP) for their operational technology (OT) environments. Many already have plans in place and functioning well for their IT infrastructure, often based on reputable standards such as those published by the National Institute of Standards and Technology (NIST).
The temptation to take shortcuts is great. It seems logical to take an existing, proven plan from IT, change a few names in it and declare it valid for the factory as well. However, this “copy-paste and adapt” approach is one of the most serious strategic mistakes that can be made. It leads to the creation of a document that looks professional in theory, but in practice, at the time of a real crisis, will prove to be completely useless, or worse, its procedures will escalate the problem and cause physical damage.
The OT world, as we have repeatedly pointed out, is governed by different laws. The differences in priorities, technology and risk tolerance are so fundamental that they require a dedicated IRP written from the ground up. This plan must be built on a foundation of deep understanding of industrial processes, not just IT processes.
Shortcuts
- Why is “copy-paste” the worst strategy when creating a response plan for OT?
- What is the incident response lifecycle according to the NIST standard?
- Phase I - Preparation: Why is this phase more important in OT than all the others combined?
- Phase II - Detection and Analysis: What signals in the OT network indicate an attack that an IT analyst will not see?
- Why do you need a process engineer to analyze an incident in OT, not just a security analyst?
- Phase III - Containment: What are process-safe alternatives to immediate isolation?
- What is a safe stop procedure and why must it be defined in advance?
- Phase IV - Elimination: Why is it more difficult to remove a threat from a SCADA system than from a file server?
- Phase V - Restoration: What unique challenges does restoring an entire production line present?
- Why does computer forensics in embedded systems require completely different tools and techniques?
- Phase VI - Lessons learned: What lessons from an OT incident are not found in IT textbooks?
- Key modifications to the IRP plan for the OT environment
- What are the three key modifications that need to be made to an IRP plan with IT to make it work in OT?
- How does nFlo help you create and test IRP plans that realistically work in your production environment?
- Is your incident response plan a shield to protect the company, or just a document on a shelf?
Why is “copy-paste” the worst strategy when creating a response plan for OT?
The “copy-paste” strategy fails because it is based on the false assumption that an OT network incident is simply a “different flavor” of an IT network incident. In fact, the differences are so profound that they touch every phase of the response process. The plan from IT is optimized to protect data and office systems. Its procedures, tools and end goals are tailored for this environment.
Trying to apply the same procedures at the factory leads to absurd and dangerous situations. For example, a plan from IT may assume that a bit-for-bit copy of the infected server’s disk must be made for post-incident analysis. This is standard procedure in IT forensics. However, shutting down a SCADA server for several hours to make such a copy is operationally impossible in a running factory.
More importantly, the plan from IT fails to address the most important variable in the OT equation: physical safety and process continuity. Its procedures do not answer the key questions: “How will our actions affect the machines? Won’t isolating this segment cause uncontrolled robot behavior? What is a safe procedure for stopping this production line?”. Failure to answer these questions makes a copied plan useless at best and dangerous at worst.
What is the incident response lifecycle according to the NIST standard?
To understand exactly where the differences lie, it’s useful to examine the standard, widely accepted incident response lifecycle, defined, for example, in NIST Special Publication 800-61. This model divides the entire process into several logical, consecutive phases. In the IT world, it forms the basis of most IRP plans.
The life cycle according to NIST typically includes the following stages:
- I. Preparation: all actions taken prior to an incident.
- II. Detection and Analysis: how we learn about an incident and how we assess its scale.
- III. Containment: how we limit the spread of the threat.
- IV. Elimination (Eradication): removing the threat from our network once the situation is under control.
- V. Restoration (Recovery): restoring systems to normal operation.
- VI. Post-Incident Activity: learning lessons for the future.
NIST SP 800-61 Revision 2 groups containment, eradication and recovery into a single phase; they are treated separately here because each raises different problems in OT. Analyzing each of these phases through the lens of the specifics of OT, we will see why a dedicated plan is absolutely necessary.
Phase I - Preparation: Why is this phase more important in OT than all the others combined?
In IT incident response, the preparation phase is important - it includes creating a plan, training the team and preparing the tools. But in OT, the preparation phase is absolutely fundamental and more important than all the other phases combined. It is here, in calm conditions, that all the “magic” takes place to avoid a disaster in the future.
Preparation in OT is first and foremost, as described in the previous article, a process of negotiation and risk decision-making. It is at this stage that the IT, OT and business teams must jointly determine what the company’s “crown jewels” are (the most important processes), what the maximum tolerable downtime is, and what the pre-approved response procedures are for various scenarios. This is where an integrated team is formed and communication is practiced.
In IT, many decisions can be made dynamically during an incident. In OT, this would be too risky. The preparation phase is designed to reduce improvisation as much as possible. The goal is to create such detailed playbooks and procedures that at the moment of a crisis, the response team does not have to wonder “what do we do?” but can go straight into precise, practiced action.
Phase II - Detection and Analysis: What signals in the OT network indicate an attack that an IT analyst will not see?
In the IT world, incident detection is mainly based on analyzing logs from security systems. The SOC analyst looks for signatures of known attacks in network traffic, alerts from antivirus systems or failed login attempts. These are Indicators of Compromise (IoC) typical of the data world.
In the OT world, these indicators are also important, but often insufficient. An attack on industrial systems may not generate any typical security alerts, but may manifest itself in an entirely different way - through anomalies in the physical process. Small but unusual fluctuations in pressure, unexpected increases in motor temperature, minimal delays in sensor response - these can be the first, subtle signals that someone is tampering with the control system.
An IT analyst, looking at network traffic, may not notice anything of concern. But a process engineer who knows his installation inside out will immediately recognize that “something is wrong.” Therefore, effective detection in OT requires the integration of security monitoring tools (network traffic analysis) with process monitoring systems (data from SCADA and historian systems).
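As a rough illustration of that integration, the sketch below flags process readings that drift away from their recent baseline and then looks for network events seen shortly before each anomaly. It is a minimal example under stated assumptions: the function names, the rolling z-score method, the window size, the threshold and the 60-second correlation gap are all hypothetical choices, not values from any standard or product.

```python
from statistics import mean, stdev


def process_anomalies(readings, window=10, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip perfectly flat baselines to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def correlate(anomaly_times, network_event_times, max_gap_s=60):
    """Map each anomaly time to the network events seen at most max_gap_s before it."""
    return {
        t: [n for n in network_event_times if 0 <= t - n <= max_gap_s]
        for t in anomaly_times
    }


# Hypothetical pressure readings (bar) from a historian, one per sample interval.
pressure = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.05, 6.5]
suspicious = process_anomalies(pressure)
```

In practice the thresholds would be set together with the process engineers, since only they know which fluctuations are normal for a given installation.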
Why do you need a process engineer to analyze an incident in OT, not just a security analyst?
Once an anomaly is detected, there is an analysis phase to understand what is happening, how serious the situation is and what the potential impact is. In IT, a security analyst is able to do most of this work alone: analyzing malware, tracing network traffic and assessing which servers and data have been compromised.
In OT, a security analyst acting alone is helpless. He can identify that unusual Modbus commands are being sent from an unknown IP address to the PLC. But he can’t answer the most important question, “What are these commands doing to the physical process?” Is it an attempt to change the speed of the centrifuge? Is it an attempt to open the safety valve? What will be the effect of this?
Only a process or automation engineer knows the answer to this question. That’s why OT incident analysis must be a team effort. The security analyst provides the information “what’s going on in the network,” and the OT engineer translates that information into “what it means for machines and people.” Only by combining these two perspectives can the priority and severity of an incident be properly assessed.
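The division of labor can be sketched in code. The example below decodes a Modbus TCP "Write Single Register" request (the analyst's view) and looks the register up in a tag map (the engineer's view). The frame layout follows the Modbus TCP format; the `TAG_MAP` contents, the register number and the function name are hypothetical and would have to come from the plant's own documentation.

```python
import struct

# Hypothetical tag map maintained by process engineers:
# (unit id, register address) -> (meaning, engineering unit).
TAG_MAP = {
    (1, 40): ("centrifuge speed setpoint", "rpm"),
}


def describe_modbus_write(frame: bytes) -> str:
    """Translate a Modbus TCP Write Single Register request into process terms."""
    if len(frame) < 12:
        return "frame too short"
    # MBAP header: transaction id, protocol id, length, unit id (7 bytes).
    _tid, proto, _length, unit = struct.unpack(">HHHB", frame[:7])
    if proto != 0:
        return "not a Modbus TCP frame"
    function = frame[7]
    if function != 0x06:  # 0x06 = Write Single Register
        return f"function 0x{function:02X}: not handled in this sketch"
    address, value = struct.unpack(">HH", frame[8:12])
    name, unit_label = TAG_MAP.get((unit, address), ("unknown register", "raw"))
    return f"unit {unit}: write {value} {unit_label} to register {address} ({name})"


# Example frame: unit 1, write value 3000 to register 40.
frame = bytes.fromhex("000100000006" "01" "06" "0028" "0BB8")
summary = describe_modbus_write(frame)
```

Without the tag map, the same frame decodes only to "unknown register", which is exactly the position of a security analyst working without an OT engineer.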
Phase III - Containment: What are process-safe alternatives to immediate isolation?
It is in the containment phase that the conflict of priorities becomes most apparent. As we already know, the default strategy in IT is to isolate infected systems as quickly as possible. In OT, this strategy is often unacceptable. An IRP plan for OT must therefore include a range of alternative, less intrusive containment strategies.
The first alternative is “increased vigilance” and monitoring. Instead of immediately blocking traffic, we can redirect it for detailed analysis and keep a close eye on the attacker’s activities while preparing for more decisive steps. This gives us time to understand his goals without causing downtime.
The second, and most commonly used, strategy is to filter traffic precisely. Instead of cutting off an entire segment, we can implement very granular rules on the firewall that block only specific malicious traffic (e.g., a connection to the attacker’s command-and-control (C&C) server), while allowing the rest of the process traffic to continue undisturbed. However, this requires advanced tools and a deep knowledge of industrial protocols.
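The idea of granular filtering can be shown with a small rule evaluator. This is only a sketch of the decision logic, not a configuration for any real firewall: the `Rule` type, the `decide` function, the addresses (the C&C address is taken from a documentation-only range) and the "allow by default during the incident" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str          # "allow" or "block"
    src: str             # source network in CIDR notation
    dst: str             # destination network in CIDR notation
    port: Optional[int]  # destination port, None matches any port


def decide(rules, src, dst, port, default="allow"):
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.src)
                and ip_address(dst) in ip_network(rule.dst)
                and rule.port in (None, port)):
            return rule.action
    return default


# Block only the connection to the (hypothetical) C&C server;
# everything else, including Modbus traffic on port 502, keeps flowing.
incident_rules = [Rule("block", "10.10.0.0/16", "203.0.113.7/32", None)]
```

The point of the sketch is the narrow scope of the block rule: the process keeps running while the attacker's channel is cut.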
What is a safe stop procedure and why must it be defined in advance?
In certain extreme scenarios, the only effective way to stop an attack that threatens physical safety may be to stop the process altogether. However, as we already know, this cannot be done by simply “pulling the plug.” Every complex industrial process has a procedure for a safe, controlled stop (safe shutdown).
The procedure is a sequence of steps that must be performed in the correct order to bring the process down safely, without risk of equipment damage or accident. It can include gradually reducing the temperature, draining the reactors, placing the robot arms in the service position, etc. It is a process that can take anywhere from a few minutes to several hours.
The incident response plan must reference or even integrate these procedures. For each critical system, the plan must clearly define what the safe procedure is for stopping it and who has the authority to initiate it. This knowledge must be available to the response team during a crisis to avoid a catastrophic panic response.
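One way to make this concrete is to record each safe stop procedure as structured data that the response team can look up during a crisis. The sketch below is a hypothetical format, not an established schema: the class, the role names and the system name are invented for illustration, and the steps simply reuse the examples given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeStopProcedure:
    system: str
    authorized_roles: frozenset  # who may initiate the stop
    steps: tuple                 # ordered; the sequence must not be changed


def initiate(procedure, role):
    """Return the numbered steps if the role is authorized, otherwise refuse."""
    if role not in procedure.authorized_roles:
        raise PermissionError(f"{role} may not stop {procedure.system}")
    return [f"{n}. {step}" for n, step in enumerate(procedure.steps, start=1)]


# Hypothetical entry for one critical system.
reactor_line = SafeStopProcedure(
    system="reactor line A",
    authorized_roles=frozenset({"shift manager", "plant director"}),
    steps=(
        "Reduce reactor temperature gradually",
        "Drain the reactors",
        "Place robot arms in the service position",
    ),
)
checklist = initiate(reactor_line, "shift manager")
```

Keeping authority and step order in the same record means nobody has to search for either under pressure.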
Phase IV - Elimination: Why is it more difficult to remove a threat from a SCADA system than from a file server?
In the IT world, the elimination phase often involves removing the malware with an antivirus or, in a more radical case, wiping the disk completely and restoring the server from a clean, trusted backup (re-imaging). This is a relatively quick and standard process.
In the OT world, eliminating the threat is much more complicated. First, it is not possible to install any anti-virus software on many embedded devices, such as PLCs. Second, “re-imaging” a controller is not the same as restoring a server. It requires reloading all firmware and control logic from scratch, a complicated and risky operation.
Often, the only real way to eliminate the threat from a PLC is to physically replace it with a new, clean device. If a SCADA server is compromised, restoring it from backup is also much more difficult than in IT, as it requires not only restoring the operating system and applications, but also restoring proper communication with hundreds of devices on the network.
Phase V - Restoration: What unique challenges does restoring an entire production line present?
The restoration phase in IT involves restoring data from backups and making services available to users again. This is a process that can be largely automated and done relatively quickly. In OT, restoring after a major incident is a mammoth undertaking that can take days or weeks.
Restoring an entire production line is not just a matter of restoring digital systems. After restoring SCADA software and controller logic, it is necessary to recalibrate and test all machines and physical processes. It must be verified that robots are moving along the correct paths, that sensors are showing the correct values and that valves are opening at the correct pressure.
This is a process that requires close collaboration between IT, OT and maintenance teams. Every component must be inspected and tested before production can safely resume. Rushing through this phase can lead to the production of defective batches of product, or worse, a mechanical failure shortly after a restart.
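The "nothing restarts until everything is checked" rule can be expressed as a simple gate. The checklist below is a hypothetical example whose three items mirror the verifications mentioned above; a real list would be far longer and owned by the OT and maintenance teams.

```python
# Hypothetical restart checklist; the items mirror the examples in the text.
REQUIRED_CHECKS = (
    "robot paths verified",
    "sensor readings verified",
    "valve opening pressures verified",
)


def outstanding_checks(signed_off):
    """Return required checks that have not been signed off, in checklist order."""
    return [check for check in REQUIRED_CHECKS if check not in signed_off]


def may_resume_production(signed_off):
    """Production may resume only when no required check is outstanding."""
    return not outstanding_checks(signed_off)
```

For example, `outstanding_checks({"robot paths verified"})` would list the sensor and valve checks as still open, and `may_resume_production` would keep returning `False` until all three are signed off.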
Why does computer forensics in embedded systems require completely different tools and techniques?
After an incident, it is crucial to understand how it happened, what exactly the attacker did and what data was stolen. This process, called digital forensics, is based in IT on analyzing images of hard drives and RAM. In OT, where many devices do not have hard drives and their memory is volatile, these traditional techniques are largely useless.
Post-intrusion analysis of a PLC or other embedded device must rely on other sources of evidence. Records from network monitoring systems that captured all communications to and from the attacked device become crucial. Analysis of these records makes it possible to reconstruct what commands the attacker was sending.
Another source is logs from host systems, such as SCADA servers and historians, which can contain information about changes in process parameters. Digital forensics in OT is a highly specialized field that requires a deep understanding of industrial protocols and control logic, not just file systems or the Windows registry.
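Combining these two sources usually means building a single timeline for the attacked device. The sketch below merges network and historian records by timestamp; the `Evidence` type, the device names and the record contents are hypothetical, and the approach assumes the clocks of both sources are synchronized, which has to be verified in a real investigation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    timestamp: float  # seconds; assumes synchronized clocks across sources
    source: str       # e.g. "network" or "historian"
    device: str
    detail: str


def build_timeline(*sources, device):
    """Merge evidence from all sources for one device, ordered by time."""
    merged = [e for records in sources for e in records if e.device == device]
    return sorted(merged, key=lambda e: e.timestamp)


# Hypothetical records from a network monitor and a historian.
network = [
    Evidence(105.0, "network", "PLC-7", "write register 40 = 3000"),
    Evidence(90.0, "network", "PLC-2", "read holding registers"),
]
historian = [
    Evidence(110.0, "historian", "PLC-7", "centrifuge speed rises"),
]
timeline = build_timeline(network, historian, device="PLC-7")
```

Read in order, such a timeline links the command seen on the wire to its effect on the process a few seconds later.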
Phase VI - Lessons learned: What lessons from an OT incident are not found in IT textbooks?
Every incident is a painful but valuable lesson. The post-incident action phase, where we analyze what went right and what went wrong, is crucial for continuous improvement. The lessons from an OT incident often go far beyond the typical lessons from the IT world.
In addition to the standard technical lessons (“we need to patch our systems better,” “we need better passwords”), the OT incident teaches first and foremost about the interaction between the digital and physical worlds. It makes you realize how critical collaboration between IT and OT is, how important it is to have rehearsed procedures, and how dangerous decision paralysis can be.
The most important lesson is often humility and an understanding that in the industrial world, it is not just data that is at stake. An incident that in IT would have ended in financial losses, in OT could have been one step away from causing a disaster. These lessons must lead to real changes in the culture, procedures and architecture of the entire organization.
Key modifications to the IRP plan for the OT environment
| Element of the plan | Typical approach in IT | Essential modification for OT |
| --- | --- | --- |
| Top priority | Protecting the confidentiality, integrity and availability of data (CIA). | Protecting physical safety and process continuity (Safety). |
| Containment | Rapid isolation of infected systems (“disconnect from the network”). | Use of pre-negotiated, process-safe methods (e.g., traffic filtering). |
| Key experts | Security analysts, IT professionals. | Integrated team with key contributions from process engineers and automation engineers. |
What are the three key modifications you need to make to your IRP plan with IT to make it work in OT?
In summary, trying to adapt the IRP from IT to OT requires at least three fundamental modifications. First, the hierarchy of priorities must be completely changed. The overriding goal is no longer to protect data, but to ensure physical safety and operational continuity. Every procedure in the plan must be evaluated by its impact on the production process.
Second, containment strategies must be redefined. The “disconnect from the network” option must be relegated to the role of absolute last resort. Instead, the plan must include a whole catalog of process-safe alternatives, from precise traffic filtering to a controlled safe shutdown, and the decision to use them must be negotiated in advance.
Third, the composition and philosophy of the response team must be changed. It must be an interdisciplinary team in which OT engineers play an equal, and often more important, role than security analysts. Incident response in OT is a team sport, in which in-depth knowledge of the physical process is as valuable as the ability to analyze malware.
How does nFlo help you create and test IRP plans that realistically work in your production environment?
At nFlo, we know very well that the most difficult part of creating an IRP for OT is connecting two different worlds and cultures. That’s why our methodology is based on facilitation and bridge-building. We don’t come with a ready-made template copied from IT, but work together with your teams to create a document that is 100% tailored to your unique operational reality.
Our consultants conduct workshops where security analysts from IT and process engineers from OT learn from each other. We help you identify critical processes, analyze potential attack scenarios and, most importantly, negotiate and document secure response procedures. We create together with you practical playbooks that your employees will be able and willing to use.
Our support does not end with the creation of a document. A key part of our offering is planning and conducting realistic “tabletop” simulation exercises. We test the created plan in practice, identify its weaknesses and help in its continuous improvement. Our goal is not just to deliver compliance “on paper,” but to build real, rehearsed resilience for your organization against cyber crises.
Is your incident response plan a shield to protect the company, or just a document on a shelf?
Having an incident response plan is a business and regulatory requirement today. However, simply having a document titled “IRP Plan for OT” does not provide any security. The real question is: is it a living, tested process that everyone understands, or just a theoretical document created to please auditors?
A plan that has not been built on a foundation of deep understanding of the specifics of OT will prove useless in a moment of real crisis. It may even make the situation worse, leading to erroneous and dangerous decisions. That’s why it’s so crucial to resist the temptation to take shortcuts and invest time and resources in creating a dedicated, thoughtful strategy.
Ultimately, the value of an IRP plan is verified in the heat of battle. And in the industrial world, what’s at stake in this battle is not only a company’s reputation and finances, but also the safety of its employees and the environment. Make sure your shield is forged from the right material.
