During an audit at a financial sector company, one of the directors asked me a question that comes back to me with every subsequent project: “How are we supposed to know which data is truly critical when we have terabytes of it?” That question hits at the very heart of the problem most organizations struggle with today. Without an answer to it, every security investment — DLP tools, encryption, access management systems — operates somewhat in the dark.
Data classification is the starting point without which information protection remains reactive and piecemeal. When an organization knows what data it holds, where it resides, and how sensitive it is to disclosure, it becomes possible to build proportionate, cost-justified, and regulatory-compliant control mechanisms. In this article I walk through the entire cycle — from the concept and categories, through policy and regulations, to automation, DLP, data owners, and typical implementation mistakes.
What is data classification and why is it the foundation of information security?
Data classification is a formal process of assigning each information asset a label that specifies its sensitivity and required level of protection. It is not merely a documentation exercise — it is an operational mechanism that determines who can access the data, how it may be stored, transmitted, and shared, and how it should be disposed of at the end of its lifecycle.
Information security in its classic triad — confidentiality, integrity, availability — assumes that not all data requires the same level of protection. A document describing an acquisition strategy requires different safeguards than a price list publicly available on a website. Classification formalizes this intuition and translates it into concrete technical and procedural rules.
A client from the insurance sector I worked with a few months ago put it plainly: “We implemented an expensive DLP system, but we don’t know what it should protect because we have no classification.” That statement describes a situation where an organization has a tool but no strategy for using it. DLP without classification is like a firewall without a rule policy — it blocks something, but often not what it should.
Data classification is also the foundation of the answer to the audit question: “Do you know what data you process?” In the NIS2, GDPR, and ISO 27001 environment, the absence of a documented approach to classification is treated as a serious gap in risk management. Organizations that can show auditors a complete data map with assigned sensitivity levels are perceived as operationally mature — deservedly so.
It is also worth emphasizing that data classification is an ongoing process, not a project with a completion date. Data changes its sensitivity over time — a research and development project that was strictly confidential may become public after a product announcement. A pricing policy prepared for the next quarter loses its sensitivity after publication. A good classification system accounts for this lifecycle and provides mechanisms for reclassification.
What classification categories to use — public, internal, confidential, strictly confidential?
The most widely used data classification model comprises four sensitivity levels: public, internal, confidential, and strictly confidential (or secret). Each corresponds to a specific risk profile and set of protection requirements.
Level 1 — Public: Data intended for free distribution outside the organization. This includes marketing materials, content published on the website, press releases, and publicly available annual reports. Disclosure of this data causes no harm. Protection requirements are minimal — it is, however, necessary to verify that documents marked as public genuinely are public and that they contain no hidden metadata with sensitive information.
Level 2 — Internal: Data intended exclusively for use within the organization, without particular restrictions between departments. This includes regulations, operational procedures, internal communications and presentations, and general project data. Unauthorized external disclosure may cause reputational damage or minor operational harm, but is not catastrophic.
Level 3 — Confidential: Data whose disclosure could cause serious harm to the organization or its clients. This includes personal data of clients and employees, financial information, commercial contracts, client project documentation, and data relating to business strategy. This level requires access controls, encryption at rest and in transit, and restrictions on printing and transmission.
Level 4 — Strictly Confidential / Secret: Data whose disclosure could cause critical financial, legal, or operational losses — or expose individuals to serious risk. This includes acquisition and merger plans, cryptographic keys, biometric data, special-category medical data, and emergency plans for critical infrastructure. This level requires the most rigorous controls: multi-factor authentication, enterprise-grade encryption, a restricted circle of access, and a full audit trail of every access event.
An important practical rule: it is not worth creating more than four classification levels. Experience shows that five- or six-level systems lead to chaos — employees cannot remember the differences between “confidential” and “highly confidential” and as a result classify everything at the highest possible level, which in turn makes the controls excessively restrictive and operationally unbearable.
Organizations operating in regulated environments may need additional contextual labels such as “GDPR — personal data”, “PCI DSS — payment card data”, or “PHI — medical data”. These are, however, supplementary tags applied on top of the classification level, not separate levels in the hierarchy.
How to build a data classification policy tailored to the organization?
A data classification policy is a formal management document that defines classification levels, criteria for assigning labels, roles and responsibilities, procedures for handling data at each level, and the consequences of policy violations. Developing it requires the involvement of executive management, the legal department, IT, and business data owners — it cannot be solely the product of the security department.
The first step is an inventory of information assets. The organization must know what categories of data it processes before it can decide how to classify them. Typical asset registers include: client data (contact, transactional, behavioural), employee data (HR, performance, medical), financial data (invoices, budgets, forecasts), intellectual property (source code, designs, patents), operational data (system configurations, logs, backups), and regulatory data (compliance documentation, data processing agreements).
During an audit at a manufacturing company, we discovered that nobody knew about the existence of a network folder containing the full client database from the past seven years — unprotected and accessible to all company employees. The data inventory uncovered this in the first week of the project. Without it, the classification system would never have covered that asset.
The policy should define classification criteria for each data type — best in the form of a decision matrix. Helpful questions: Would disclosure of this data expose the organization to regulatory sanctions? Could it cause financial losses above a defined threshold? Does it include personal data within the meaning of GDPR? Could disclosure harm individuals? Based on the answers to these questions, the appropriate classification level can be mechanically assigned.
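As an illustration of how such a matrix can be applied mechanically, the following Python sketch encodes the decision flow described above. The question set and its mapping to levels are assumptions for demonstration; each organization would calibrate both against its own risk appetite and policy.

```python
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTLY_CONFIDENTIAL = 4

def classify(regulatory_sanctions: bool,
             major_financial_loss: bool,
             gdpr_personal_data: bool,
             could_harm_individuals: bool,
             approved_for_publication: bool) -> Level:
    """Map decision-matrix answers to a classification level.
    The most severe applicable answer wins, so ambiguous cases err upwards."""
    if could_harm_individuals:
        return Level.STRICTLY_CONFIDENTIAL
    if regulatory_sanctions or major_financial_loss or gdpr_personal_data:
        return Level.CONFIDENTIAL
    if approved_for_publication:
        return Level.PUBLIC
    return Level.INTERNAL

# Example: a client dataset containing GDPR personal data
# classify(False, False, True, False, False) -> Level.CONFIDENTIAL
```

The “most severe answer wins” ordering matters: a press release that happens to contain personal data must still land at Confidential, not Public.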
The next element of the policy is defining data handling rules for each level. These rules cover: storage (which systems and locations are permitted), encryption (at rest and in transit), access control (who has access and on what basis), sharing (internal and external — which channels are permitted), printing (whether permitted, with what restrictions), disposal (certified destruction or cryptographic erasure), and retention (how long data may be stored).
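In practice these handling rules stay consistent most easily when expressed as a single machine-readable matrix from which both the policy documentation and the tool configuration are generated. A minimal sketch, with placeholder systems, channels, and retention periods:

```python
# Hypothetical handling matrix; systems, channels, and retention
# periods below are illustrative placeholders, not recommendations.
HANDLING_RULES = {
    "internal": {
        "storage": ["corporate file shares"],
        "encryption_at_rest": False,
        "external_sharing": "manager approval required",
        "printing": "allowed",
        "retention_years": 5,
        "disposal": "standard deletion",
    },
    "confidential": {
        "storage": ["access-controlled repositories only"],
        "encryption_at_rest": True,
        "external_sharing": "NDA and data-owner approval",
        "printing": "watermarked and logged",
        "retention_years": 7,
        "disposal": "cryptographic erasure",
    },
}

def rule_for(level: str, aspect: str):
    """Look up a single handling rule, e.g. rule_for('confidential', 'disposal')."""
    return HANDLING_RULES[level][aspect]
```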
The policy must be approved by executive management and communicated to all employees. A document sitting in a folder on the intranet that nobody knows about does not fulfil its purpose. Key elements are induction training for new employees and regular refresher training for existing ones — especially for those who work with data at higher classification levels.
Which regulations require data classification — NIS2, GDPR, ISO 27001, DORA?
None of the main regulations explicitly requires the use of a specific data classification model, but all of them de facto assume one, because they require proportionate protective measures adequate to the risk — and proportionality is impossible without knowing what is being protected.
GDPR (General Data Protection Regulation) requires the implementation of appropriate technical and organizational measures that ensure a level of security appropriate to the risk of processing. Data classification is a natural tool for implementing this principle. Article 9 identifies special categories of personal data (data concerning health, religious or philosophical beliefs, racial or ethnic origin, biometric and genetic data) that require enhanced protection — which in practice means at least one higher classification level for those categories. The absence of a record of processing activities, with sensitivity levels assigned to individual datasets, is treated by data protection authorities as an organizational shortcoming.
NIS2 (Network and Information Security Directive) and its national implementation impose on essential and important entities an obligation to manage risk, which includes the identification and classification of assets. Article 21 of NIS2 explicitly lists measures concerning human resources security, access policies, incident handling, and business continuity — and all these areas assume that the organization knows what data it holds and how sensitive it is to disclosure or loss.
ISO/IEC 27001 in control A.5.12 (formerly A.8.2.1) explicitly requires the classification of information. The standard states that the organization should classify information in accordance with legal requirements, value, criticality, and sensitivity to unauthorized disclosure or modification. Classification must be linked to the labelling of information (A.5.13) and information transfer (A.5.14). ISO 27001 auditors always check whether a classification policy exists, whether it is applied in practice, and whether it is consistent with the risk matrix.
DORA (Digital Operational Resilience Act) applies to the financial sector in the European Union and has applied in full since January 2025. The regulation requires ICT risk management, which includes the identification and classification of key information assets, with particular emphasis on those supporting critical business functions. DORA also requires financial organizations to document dependencies between information assets and critical services — which is impossible without prior classification.
An interesting observation from audit practice: organizations that have implemented solid data classification simultaneously fulfil the requirements of all four regulations — because classification is that common denominator upon which each of them is built. Investment in one well-executed process delivers results across multiple compliance areas.
Beyond the four main regulations, it is worth mentioning PCI DSS (payment cards), which requires strict control over payment card data, and Trade Secrets law, which requires that an entrepreneur take appropriate measures to maintain the confidentiality of information constituting a trade secret — something that is difficult to prove in court without formal classification.
How to automate data classification — tools and approaches?
Automation of data classification is a necessity in organizations that process large volumes of information. Manual classification works well as a starting point and for newly created documents, but catching up on existing assets and maintaining consistency over time requires tooling support.
The basic category of tools for automatic classification is systems based on content-aware classification. These work by scanning the content of files and email messages for predefined patterns, such as national identification numbers, credit card numbers validated against the Luhn checksum, bank account numbers, IP addresses, or custom regular expressions defined by the organization. When the system recognizes a pattern, it automatically assigns or recommends an appropriate classification label.
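The sketch below shows the core of this technique for payment card numbers: a permissive regular expression finds candidate digit sequences, and the Luhn checksum filters out most false matches. Production engines layer proximity keywords and confidence scoring on top, but the principle is the same.

```python
import re

# Candidate: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the
    right, subtract 9 from results above 9, and check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

print(find_card_numbers("Invoice ref 4111 1111 1111 1111, order 12345"))
# -> ['4111111111111111']; '12345' is too short and never matches
```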
A more advanced approach is classification based on machine learning. ML models trained on labelled datasets learn to recognize sensitive content even when it does not match any predefined pattern. This approach works particularly well for narrative documents — contracts, reports, correspondence — where sensitivity derives from context rather than the presence of specific character strings.
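A toy sketch of this approach using scikit-learn. The training set here is deliberately tiny and purely illustrative; a real deployment would train on thousands of documents labelled by data owners and would be validated thoroughly before any automated decision is trusted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative examples only; real corpora are orders of magnitude larger.
docs = [
    "Press release announcing the new product line",
    "Cafeteria menu and parking rules for next week",
    "Draft settlement terms for the pending litigation",
    "Board memo: proposed acquisition of a competitor, valuation attached",
]
labels = ["public", "internal", "confidential", "strictly_confidential"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(docs, labels)

# With real training data, the model can flag documents
# that match no fixed pattern:
print(model.predict(["Term sheet for the planned merger"]))
```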
Key platforms on the market include Microsoft Purview Information Protection (formerly Azure Information Protection and Microsoft Information Protection — MIP), which integrates natively with the entire Microsoft 365 ecosystem and offers both sensitivity labels and automatic classification policies. A competing solution is the Varonis Data Security Platform, which specializes in user behaviour analysis and detection of excessive access. Forcepoint Data Discovery and Broadcom Symantec DLP also offer automatic classification features as part of broader DLP platforms.
An important principle: automation of classification does not eliminate the need for human involvement — it only changes the division of labour. Automated systems handle repetitive cases and large scale. A person — the data owner or security analyst — verifies exceptions, approves the classification of new data types, and makes decisions in ambiguous cases.
Organizations just starting out should consider a hybrid approach: first manually define policies and labels, then pilot automatic classification on a limited dataset, and only after validating the results expand across the entire organization. Attempting to automatically classify everything at once frequently ends in chaos and the need to manually reverse the system’s decisions.
Regardless of the chosen tool, it is also essential to address the classification of data “at rest” (stored data), “in motion” (data being transmitted), and “in use” (data currently being processed). Each of these categories requires different detection and protection mechanisms.
How does data classification support DLP implementation?
DLP (Data Loss Prevention / Data Leakage Prevention) is a set of technologies and processes aimed at preventing the unauthorized disclosure of data. Classification and DLP are two complementary elements — classification says “what kind of data this is and how sensitive it is”, while DLP says “what can be done with it and what cannot”.
Without classification, a DLP system must rely solely on pattern recognition — searching for national identification numbers, card numbers, email addresses. This works for typical structured data formats, but fails for contextual data: a document describing an acquisition strategy, notes from negotiations, an email with information about a new product before its launch. Classification fills this gap — a file labelled “Strictly Confidential” is treated by DLP as sensitive regardless of its content.
The integration of classification with DLP occurs through sensitivity labels. When a document is classified — manually by an employee or automatically by the system — it receives a metadata label embedded in the file itself (not in the file name or folder, but as a document property). DLP systems can then read these labels and apply appropriate policies: blocking external transmission, requiring encryption, restricting the ability to print or copy to the clipboard.
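As a simplified illustration of the principle that the label travels inside the file, the sketch below stamps a label into a Word document's built-in metadata fields using python-docx. Commercial platforms such as Microsoft Purview use their own custom document properties (and can additionally encrypt the file), so treat this as a demonstration of the mechanism rather than an interoperable implementation.

```python
from docx import Document  # pip install python-docx

def stamp_label(path: str, label: str) -> None:
    """Embed a sensitivity label in the document's metadata so it
    survives renaming the file or moving it between folders."""
    doc = Document(path)
    doc.core_properties.category = label        # visible in file properties
    doc.core_properties.content_status = label  # mirrored for redundancy
    doc.save(path)

def read_label(path: str) -> str | None:
    return Document(path).core_properties.content_status or None

# Usage (assumes offer.docx exists):
# stamp_label("offer.docx", "Confidential")
# print(read_label("offer.docx"))  # -> Confidential
```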
In a practical DLP implementation based on classification, policies are defined for each sensitivity level. An example set of DLP policies might look as follows: public data — no restrictions; internal data — block transmission to private email addresses and external cloud services; confidential data — mandatory encryption, prohibition on copying to USB drives, prohibition on transmission through unapproved channels; strictly confidential data — block all transmission outside the approved corporate network, mandatory dual electronic signature when sharing with an external recipient.
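Expressed as code, such a policy set reduces to a lookup table consulted at every egress attempt. The labels mirror the example above; the channel names and exact blocking rules are placeholders.

```python
from enum import Enum

class Egress(Enum):
    EXTERNAL_EMAIL = "external_email"
    PERSONAL_CLOUD = "personal_cloud"
    USB_DRIVE = "usb_drive"
    PRINT = "print"

# Hypothetical mapping of sensitivity label -> blocked egress channels.
BLOCKED = {
    "public": set(),
    "internal": {Egress.EXTERNAL_EMAIL, Egress.PERSONAL_CLOUD},
    "confidential": {Egress.EXTERNAL_EMAIL, Egress.PERSONAL_CLOUD,
                     Egress.USB_DRIVE},
    "strictly_confidential": set(Egress),  # every channel blocked by default
}

def is_allowed(label: str, channel: Egress) -> bool:
    # Unknown labels fail closed: treat them as strictly confidential.
    return channel not in BLOCKED.get(label, set(Egress))

print(is_allowed("internal", Egress.PRINT))          # True
print(is_allowed("confidential", Egress.USB_DRIVE))  # False
```

The fail-closed default is deliberate: an unlabelled or unrecognized document should face the strictest rules until someone classifies it.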
My experience in DLP projects shows that organizations which implement DLP without prior classification see an average false-positive rate of around forty-five percent — a level that paralyses operations and leads to alert fatigue. After implementing classification and switching DLP policies to labels rather than relying exclusively on patterns, this rate typically drops below ten percent. That is an enormous difference in the day-to-day functioning of the security team.
An important aspect is also exception handling. A DLP system based on classification must provide a mechanism for justified deviations from the rule — for example, an employee needs to send a confidential document to an external auditor. The system should enable such action after the employee provides a justification and obtains approval from their manager, and the entire transaction should be recorded in an audit log.
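A minimal sketch of such a break-glass flow. The storage backend and the approval step are placeholders; in a real deployment the approval would come from a workflow system and the log would be shipped to a SIEM.

```python
import json
import time
import uuid

AUDIT_LOG = "dlp_exceptions.jsonl"  # placeholder; ship to a SIEM in production

def grant_exception(user: str, document: str, recipient: str,
                    justification: str, approved_by: str) -> str:
    """Record a one-off, approved deviation from a DLP rule.
    Returns an exception ID that the DLP gateway can honour once."""
    if not justification.strip():
        raise ValueError("A justification is mandatory for every exception.")
    event = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user": user,
        "document": document,
        "recipient": recipient,
        "justification": justification,
        "approved_by": approved_by,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event["id"]

grant_exception("j.kowalski", "audit_report_2025.docx",
                "auditor@external-firm.example",
                "Annual statutory audit", approved_by="m.nowak")
```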
Classification supports DLP not only technically but also culturally. When employees understand why a given document is marked as confidential and what restrictions follow from that, they are more inclined to comply with policies — because they understand the purpose rather than treating them as arbitrary obstacles.
How to engage employees in classification — data owners, roles, responsibilities?
Technology can support classification, but it cannot replace human judgment in deciding on the value and sensitivity of data. An effective classification programme requires a clear model of roles and responsibilities that engages the right people at the right stages of the process.
The central role in the classification model is the Data Owner. This is a business person — a department director, project manager, or manager responsible for a given process — who has the authority and knowledge needed to assess the value of information. The data owner does not need to have technical expertise, but must understand the business context: what consequences would the disclosure of this data have, how long is it needed, who should have access to it. In the hierarchy of accountability, the data owner approves the classification and is responsible for its adequacy.
A separate role is the Data Custodian — usually an IT administrator or system manager who technically manages storage and access. The custodian implements the owner’s decisions: configures permissions, applies encryption, manages backups. They do not make classification decisions, but ensure that classification is technically enforced.
Data users are all employees who have access to data as part of their work. Their role is to responsibly comply with policies, and when creating new documents — to assign the appropriate classification label. Training for users should be practical and tailored to typical scenarios from their daily work, not abstract lectures on security levels.
A client from the healthcare sector asked me: “How do we even start with data owners when nobody wants to take responsibility?” This is a common problem. The solution that works in practice is to link the data owner role to the existing organizational structure — not creating new positions, but assigning responsibility to those who already make business decisions about a given process. The CFO becomes the owner of financial data, the HR director — of personnel data. This is natural and logically justified.
The training programme should cover at least three levels: general training for all employees (definitions of levels, examples, how to label), advanced training for data owners (decision criteria, reclassification, handling exceptions), and technical training for the IT and security department (tool configuration, handling incidents related to misclassification).
Equally important is a culture that does not punish honest mistakes. If employees know that reporting a classification error results in a reprimand rather than a correction, they will conceal mistakes. Mature organizations treat misclassification as a valuable source of knowledge about where the policy is unclear or where additional training is needed.
What mistakes do organizations most commonly make with data classification?
Over years of working with organizations on classification implementations, I have compiled a catalogue of mistakes that recur regardless of the industry, company size, or technologies used. Awareness of these mistakes makes it possible to avoid them or at least to recognize and correct them more quickly.
Mistake one: too many classification levels. Organizations often start from the assumption that the more detailed the model, the better. In practice, a six- or seven-level system is operationally unusable. Employees cannot remember the differences between level three and level four, which leads to mass over-classification — everything ends up at level five or six to “be on the safe side.” This in turn leads to over-regulation that paralyses work. Four levels are sufficient in the vast majority of cases.
Mistake two: no assignment of data owners. Classification systems where “everyone is responsible” in practice have nobody responsible. Without specific data owners, there is no one to make reclassification decisions, approve exceptions, or enforce policies in day-to-day work.
Mistake three: classification as a one-time project. Many organizations approach data classification as a project with a start date and an end date — “we implemented classification” and the matter is settled. Data, however, lives, changes, and grows. Classification without maintenance mechanisms — reviews, reclassification, monitoring of policy compliance — quickly becomes outdated and loses its value.
Mistake four: ignoring unstructured data. Most organizations begin classification with data in structured databases — where it is easy. Yet by common industry estimates roughly eighty percent of an organization's data is unstructured: emails, Word documents, Excel spreadsheets, PowerPoint presentations, PDF files, messages in Teams or Slack. That is precisely where most sensitive information ends up, and precisely where classification is most difficult and most frequently overlooked.
Mistake five: lack of integration with the data lifecycle. Classification without a retention policy and secure data disposal is incomplete. Confidential data stored longer than necessary represents unnecessary risk. The classification policy should explicitly specify how long data at each level may be retained and what the approved process is for its secure deletion.
Mistake six, made by nearly every organization during its first implementation: an overly ambitious scope in phase one. Attempting to classify all data across the entire organization at once ends in resource exhaustion, stakeholder frustration, and often abandonment of the project. An iterative approach works far better: start with one department or one critical system, draw lessons, refine the policy, and only then scale.
Mistake seven: lack of classification visibility for users. Labels that exist only as background metadata — invisible to the user — do not build habits of safe data handling. A visible marking of a document as “Confidential” or “Strictly Confidential” in its header or footer reminds the user at every interaction with the file what kind of material they are working with. This simple mechanism significantly reduces the number of accidental breaches.
What does the data classification implementation process look like?
Implementing data classification in an organization is an iterative process, typically spanning six to twelve months for a medium-sized company. The table below presents typical phases, their scope, tools, key deliverables, and indicative durations.
| Phase | Scope of activities | Tools / methods | Deliverables | Indicative duration |
|---|---|---|---|---|
| 1. Diagnosis and inventory | Identification of data repositories, interviews with owners, data flow mapping, preliminary risk assessment | Workshops, Data Discovery (Varonis / Purview), interviews | Information asset register, data flow map, gap report | 4–6 weeks |
| 2. Policy design | Definition of classification levels, label assignment criteria, data handling rules, roles and responsibilities | Workshops with executive management and data owners, regulatory review | Data classification policy, roles and responsibilities matrix, data handling rules per level | 3–4 weeks |
| 3. Pilot | Classification implementation in one department or system, tool testing, policy validation, training of pilot group | Microsoft Purview, Varonis, Forcepoint or other DLP/classification tool | Pilot report, list of policy corrections, training materials | 4–6 weeks |
| 4. Training and communication | Training for all employees, management communication, specialist training for data owners and IT | E-learning, workshops, reference materials, FAQ | Trained employees, policy acknowledgement confirmations, knowledge base | 3–4 weeks |
| 5. Full deployment | Rollout across the entire organization, automatic classification configuration, DLP integration, alert configuration | Classification tool + DLP, SIEM/SOAR | Labels deployed in systems, DLP policies active, monitoring dashboard | 6–10 weeks |
| 6. Maintenance and improvement | Classification reviews, reclassification, incident analysis, policy updates, periodic audits | Classification tool reports, internal audits, management reviews | Periodic compliance reports, updated policy, classification incident register | Ongoing (quarterly) |
The key to success is change management — data classification touches the daily work of every employee, not just the IT department. Projects that treat this as a purely technical implementation and overlook the communication and training aspect regularly end with low adoption rates and a return to old habits within weeks of deployment.
An important principle is also the gradual tightening of policies. At the beginning it is worth running classification in “audit only” mode — the system monitors and logs but does not block. This allows for the collection of data on typical user behaviour and the adjustment of DLP thresholds before active policy enforcement begins. Blocking everything from the outset leads to an avalanche of helpdesk tickets and frustration that undermines the purpose of the entire project.
How does nFlo help organizations implement data classification and DLP?
nFlo accompanies organizations through the entire data classification implementation cycle — from diagnosis and policy design, through the pilot and training, to tool deployment and long-term programme maintenance. Drawing on experience gained across more than 500 projects and in collaboration with more than 200 clients, nFlo understands that every organization has a different risk profile, different regulations to comply with, and a different culture of working with data.
At the diagnosis stage, nFlo conducts a detailed inventory of information assets using Data Discovery tools, identifies data flows between systems and with external environments, and assesses the maturity of existing classification and information protection processes. The result of the diagnosis is a concrete asset register and risk map — documents that directly support the requirements of NIS2, GDPR, ISO 27001, and DORA.
At the policy design stage, nFlo experts work with executive management and data owners to create a classification policy tailored to the realities of the organization — not a generic template, but a document crafted to the specifics of the industry, existing systems, and working culture. Particular attention is paid to linking the classification policy with existing security policies, to avoid inconsistencies and contradictions in the documentation.
Technical implementation is carried out both on the basis of Microsoft Purview (for organizations operating within the Microsoft 365 ecosystem) and on the basis of independent DLP and classification platforms for hybrid and multi-cloud environments. The nFlo team configures policies to minimize false positives and ensure smooth operation from the first day of production.
nFlo also offers post-implementation support in a managed security model: DLP alert monitoring, incident analysis, policy updates upon regulatory changes, and periodic reviews of the classification programme status. A response time of under 15 minutes for critical events and a client retention rate of 98% are the quality benchmarks that nFlo consistently maintains.
For organizations just starting out and unsure where to begin, nFlo offers diagnostic workshops — one-day sessions with key stakeholders after which the organization leaves with a concrete action plan and priorities for the coming weeks. This is a low-cost entry point that makes it possible to assess the scale of the challenge and plan a realistic budget before committing to a full project.
Frequently asked questions (FAQ)
Does a small company also need formal data classification?
Yes — though the scale and formalism differ. Even small companies process personal data of clients (GDPR), financial data, and trade secrets. A minimal classification policy — even a two- or three-level one (public / internal / confidential) — provides structure for data protection without excessive administrative burden. Many incidents at small companies stem precisely from a lack of basic awareness of which data is sensitive.
How long does data classification implementation take?
For an organization with several dozen employees: six to eight weeks for a basic implementation. For an organization with several hundred employees and a complex IT environment: six to twelve months for a full implementation with automatic classification and DLP integration. The key variable is the volume of unstructured data and the number of systems requiring integration.
Are classification labels visible to all employees?
It depends on the configuration. In Microsoft Purview, sensitivity labels can be displayed in the header or footer of documents and emails, as well as in markings within the Teams and SharePoint interface. This is the recommended approach — a visible label builds habits and serves as a reminder of handling rules. The underlying label metadata, however, is consumed primarily by DLP systems and management tools.
What happens when an employee assigns the wrong label?
In a well-designed system, an incorrect label is corrected by the data owner or by an automatic reclassification mechanism. An employee should not face disciplinary consequences for a single good-faith error — this destroys a culture of transparency. Repeated errors in a specific area signal the need for additional training or clarification of the policy, not punishment for the employee.
How does data classification relate to identity and access management (IAM)?
Classification and IAM are closely linked. The classification level of data determines what roles and what identities should have access to it. In mature environments, access policies (RBAC / ABAC) are directly parameterized by classification labels: a user with the role “Financial Analyst” automatically gains access to data labelled “Confidential — Finance”, but not to data labelled “Strictly Confidential — M&A.” This significantly simplifies access management and reduces the risk of excessive permissions.
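A compact sketch of label-parameterized access control. The clearance ordering, domain tags, and example role are illustrative assumptions, not a reference model.

```python
LEVELS = {"public": 1, "internal": 2, "confidential": 3,
          "strictly_confidential": 4}

def can_access(user: dict, resource: dict) -> bool:
    """ABAC-style check: the user's clearance must cover the data's
    level, and their business domains must include the data's tag."""
    return (LEVELS[user["clearance"]] >= LEVELS[resource["level"]]
            and resource["domain"] in user["domains"])

analyst = {"clearance": "confidential", "domains": {"finance"}}
print(can_access(analyst, {"level": "confidential",
                           "domain": "finance"}))  # True
print(can_access(analyst, {"level": "strictly_confidential",
                           "domain": "ma"}))       # False
```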
Related topics
- DLP (Data Loss Prevention) — technologies for preventing data leakage
- IAM (Identity and Access Management) — identity and access management
- SIEM / SOAR — monitoring and automated incident response
- IT risk management — regulatory and operational context
- Information security audit — verification of policy compliance
Learn more
- What is data protection in an organization
- What is IT compliance — regulatory compliance
- Internal ISO 27001 audit
- NIS2 — new obligations for Polish companies
Check our services
Do you want to implement data classification and DLP in your organization? nFlo offers a full range of services — from diagnosis and policy design, through tool implementation, to long-term support and monitoring. Contact us to discuss the scope and timeline of a project tailored to your needs.
Sources
- ENISA — Guidelines on data classification, European Union Agency for Cybersecurity
- ISO/IEC 27001:2022 — Information security, cybersecurity and privacy protection — Information security management systems
- NIST Special Publication 800-53 Rev. 5 — Security and Privacy Controls for Information Systems and Organizations
- Regulation (EU) 2016/679 of the European Parliament and of the Council (GDPR)
- NIS2 Directive (EU) 2022/2555 and the Act on the National Cybersecurity System
- DORA Regulation (EU) 2022/2554 — Digital Operational Resilience Act
- Gartner Research — Market Guide for Data Loss Prevention, 2024
- Microsoft Purview Information Protection documentation — Microsoft Learn
- Varonis Data Security Platform — technical documentation
