Learn what data classification is, the main types of data classification, and how to build a classification policy and process for your organization’s data.
What is Data Classification?
Data classification labels an organization’s data based on several factors, including:
- Type: Customer data, intellectual property, financial data, etc.
- Sensitivity: High, medium, or low
- Value: The impact to the organization if the data is stolen, modified, or deleted
The goal of data classification is to provide an organization with a basis for making decisions about data security and risk management. For example, certain types of data may fall under the purview of data protection regulations, and the sensitivity and value of data may impact how it is protected and used within an organization.
The Importance of Data Classification
Data classification is a vital part of an enterprise data security policy. For many organizations, data is their most valuable asset, since customer information, intellectual property, and other sensitive data are what enable them to differentiate themselves and compete effectively in the marketplace.
Protecting this data is of utmost importance, and it is impossible to effectively protect sensitive data that you don’t know exists. Data classification provides the visibility required for effective data security and helps organizations to protect themselves against a number of high-impact risks, including:
- Data breaches
- Data loss
- Regulatory non-compliance
As the cyber threat landscape grows more sophisticated and data protection regulations become more common, the cost of poor data security increases dramatically. In 2020, the average cost of a data breach was $3.86 million, and data protection regulations like the EU’s General Data Protection Regulation (GDPR) can levy non-compliance penalties of up to 4% of global annual turnover or €20 million, whichever is higher.
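The GDPR ceiling quoted above is simple arithmetic: the larger of a fixed amount and a percentage of turnover. The following is a minimal sketch, assuming turnover is supplied in whole euros. The function name is illustrative, and the result is only the upper bound of the penalty described here, not the fine a regulator would actually impose.

```python
def gdpr_max_penalty_eur(global_annual_turnover_eur: int) -> int:
    """Upper bound of the penalty described above, in whole euros."""
    fixed_cap = 20_000_000
    # 4% of turnover, computed with integer arithmetic to avoid float rounding
    turnover_cap = global_annual_turnover_eur * 4 // 100
    return max(fixed_cap, turnover_cap)


# 4% of 100 million is 4 million, so the 20 million figure is the higher one
print(gdpr_max_penalty_eur(100_000_000))
# 4% of 2 billion is 80 million, which exceeds 20 million
print(gdpr_max_penalty_eur(2_000_000_000))
```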
Data Sensitivity Levels
Data sensitivity is one of the most important ways in which an organization can classify data. Classifying data by sensitivity enables an organization to determine the level of protection that a particular piece of data requires.
Many organizations adopt a simple three-tier data sensitivity classification system:
High Sensitivity
High sensitivity data is data whose compromise or destruction would have a catastrophic impact on an organization. This includes data crucial to an organization’s competitive advantage, such as intellectual property, financial data, and customers’ personally identifiable information (PII).
Medium Sensitivity
Medium sensitivity data is intended for internal use only but is not confidential or highly sensitive. Examples of medium sensitivity data may include internal emails and documents that do not contain sensitive data.
Low Sensitivity
Low sensitivity data is anything that is intended or approved for public disclosure. This includes websites, marketing content, datasheets, and similar public data.
An organization may use the same three-tier system but more descriptive labels. For example, Confidential, Internal Use Only, and Public Release can replace High, Medium, and Low.
This provides users with hints regarding how the different types of data should be treated without the need to memorize the meaning of High, Medium, and Low sensitivity labels.
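In software, descriptive labels like these are often modeled as an ordered enumeration so that tools can compare sensitivity levels. The sketch below is one possible representation, not a standard: the class name, the numeric ordering, and the helper functions are all assumptions made for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Descriptive labels, ordered from least to most sensitive (assumed ordering)."""
    PUBLIC_RELEASE = 1
    INTERNAL_USE_ONLY = 2
    CONFIDENTIAL = 3


# Mapping from the generic tier names to the descriptive labels
TIER_TO_LABEL = {
    "Low": Sensitivity.PUBLIC_RELEASE,
    "Medium": Sensitivity.INTERNAL_USE_ONLY,
    "High": Sensitivity.CONFIDENTIAL,
}


def may_publish(label: Sensitivity) -> bool:
    """Only data approved for public release may be published."""
    return label == Sensitivity.PUBLIC_RELEASE


def stricter(a: Sensitivity, b: Sensitivity) -> Sensitivity:
    """When two labels apply to the same data, keep the more sensitive one."""
    return max(a, b)
```

Because the label name itself states the handling rule, a check such as `may_publish` reads naturally and needs no lookup table of what "Medium" means.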
Types of Data Classification
After defining a sensitivity labeling scheme, an organization needs to select a strategy for applying these labels.
Three common strategies include:
Content-Based Classification
A content-based classification scheme is based on a review of the contents of each piece of data. Based on the information contained in a document, database, etc., labels are applied that define its sensitivity level and the type of data that it contains.
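As a rough illustration of content-based classification, the sketch below scans text for a few patterns that suggest sensitive data. The patterns, the type names, and the choice to fall back to Medium when nothing is found are assumptions for this example. Production tools use far more robust detection, and the absence of a match does not prove that a document is safe to publish.

```python
import re

# Hypothetical detection patterns; real classifiers need many more and stricter ones.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify_content(text: str) -> tuple[set[str], str]:
    """Return the sensitive data types found and a suggested sensitivity level."""
    found = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if name == "card_number" and not luhn_valid(match.group()):
                continue
            found.add(name)
    # Assumption: any hit means High; otherwise default to Medium (internal use).
    return found, ("High" if found else "Medium")
```

For example, `classify_content("Contact jane@example.com, SSN 123-45-6789")` reports both the `email` and `us_ssn` types and suggests High.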
Context-Based Classification
Context-based classification uses metadata and other environmental information to apply classification labels to data. For example, documents produced by a certain employee or application may be automatically classified as financial data. This classification can also be used to generate labels regarding the data sensitivity and type using predefined rules.
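Predefined rules of this kind can be expressed as a list of metadata predicates, each paired with the labels to apply. The sketch below is a minimal version. The metadata keys (`source_app`, `author_department`, `path`) and the rule values are hypothetical and would come from an organization's own systems.

```python
from typing import Callable, Optional

Metadata = dict[str, str]
# Each rule: (predicate over metadata, data type label, sensitivity label)
Rule = tuple[Callable[[Metadata], bool], str, str]

# Hypothetical rules; the first matching rule wins.
RULES: list[Rule] = [
    (lambda m: m.get("source_app") == "accounting-suite", "financial data", "High"),
    (lambda m: m.get("author_department") == "Finance", "financial data", "High"),
    (lambda m: m.get("path", "").startswith("/public/"), "marketing content", "Low"),
]


def classify_by_context(metadata: Metadata) -> Optional[tuple[str, str]]:
    """Return (data type, sensitivity) from metadata alone, or None if no rule applies."""
    for predicate, data_type, sensitivity in RULES:
        if predicate(metadata):
            return data_type, sensitivity
    return None
```

Note that the file contents are never read here: the labels come entirely from who or what produced the data and where it lives.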
User-Based Classification
User-based classification relies on the judgment of a knowledgeable user to apply a classification label to a piece of data. This may be the data creator or a specialized classification authority within an organization.
The approach that an organization takes to data classification can depend on its unique situation. For example, organizations that generate massive amounts of data may not be able to rely upon user-based classification due to scalability issues.
Organizations can also adopt a hybrid model for data classification. For example, an automated tool may be used to perform preliminary classification based on metadata (context-based classification) and the presence of certain types of sensitive data (content-based). A user can then perform second-stage classification for any data flagged as needing further review.
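One way to sketch the hand-off between the automated stages and the human reviewer is shown below. It assumes each automated stage returns a sensitivity label or `None`, and that disagreement or a missing result should trigger review. Both the `Decision` type and that escalation rule are illustrative choices, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Optional

RANK = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class Decision:
    label: Optional[str]   # provisional label, if any stage produced one
    needs_review: bool     # True when a user should perform second-stage review


def combine(context_label: Optional[str], content_label: Optional[str]) -> Decision:
    """Merge the results of the context-based and content-based stages."""
    labels = [label for label in (context_label, content_label) if label is not None]
    if not labels:
        # Neither automated stage could decide, so a user must classify the data.
        return Decision(None, True)
    strictest = max(labels, key=RANK.__getitem__)
    # Keep the stricter label provisionally and flag any disagreement for review.
    return Decision(strictest, len(set(labels)) > 1)
```

With this rule, data that both stages label High passes straight through, while data labeled Medium by metadata but High by content is held at High until a user confirms it.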
Data Classification Policy
For many organizations, the primary driver behind their data classification policies is regulatory compliance. Most organizations are subject to a number of different data protection regulations. These regulations protect specific types of data and mandate that an organization put certain protections in place for the data under their jurisdiction.
Some data protection regulations are designed to protect specific types of data or data in certain industries.
Examples of these include:
Health Insurance Portability and Accountability Act (HIPAA)
HIPAA is a US law that protects protected health information (PHI). Its restrictions apply to covered entities, such as healthcare providers, and their business associates.
Payment Card Industry Data Security Standard (PCI DSS)
PCI DSS is a standard developed by credit card companies to protect payment card data. Any organization that stores, processes, or transmits payment card data (for example, by accepting credit or debit cards) falls within the scope of PCI DSS.
Sarbanes-Oxley Act (SOX)
SOX is a US regulation designed to protect investors against financial fraud. It requires organizations to disclose risks that could impact the value of investments, including cybersecurity risk.
Other regulations are designed to protect residents of a certain area. Examples include:
General Data Protection Regulation (GDPR)
The GDPR is designed to protect the personal data of individuals in the EU, regardless of their citizenship. It applies to organizations that process, transmit, or store this data, whether or not they operate within the EU itself.
California Consumer Privacy Act (CCPA)
The CCPA and the California Privacy Rights Act (CPRA) apply to the personal data of Californian residents and households. Like the GDPR, the CCPA and CPRA describe data security requirements and consumer rights for data under their jurisdiction.
An effective data classification policy is essential for data security and for complying with regulatory requirements. For example, the GDPR and CCPA give data subjects the right to request a complete copy of their data in an organization’s possession. Without data classification, complying with this requirement may require searching through all of the data in an organization’s possession, which is likely infeasible within the time period mandated by the regulation.
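To see why classification shortens such a search, consider a catalog in which every data store is already labeled by the types of data it holds. The sketch below is purely illustrative: the store names, type labels, and the `PERSONAL_TYPES` set are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    name: str
    data_types: frozenset  # classification labels applied to this store


# Hypothetical labels that mark a store as holding personal data
PERSONAL_TYPES = frozenset({"customer_pii", "health_records"})

# Hypothetical catalog produced by an earlier classification effort
CATALOG = [
    DataStore("crm", frozenset({"customer_pii"})),
    DataStore("build-artifacts", frozenset({"source_code"})),
    DataStore("support-tickets", frozenset({"customer_pii", "internal_notes"})),
    DataStore("marketing-site", frozenset({"public_content"})),
]


def stores_for_access_request(catalog: list) -> list:
    """Names of the stores that must be searched for a data subject's records."""
    return [store.name for store in catalog if store.data_types & PERSONAL_TYPES]


print(stores_for_access_request(CATALOG))
```

Only the stores labeled as holding personal data need to be searched; the rest can be skipped without inspection.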
Data Classification Process
To build a data classification process, work through these ten steps:
Define the Goals:
A data classification policy should be designed to achieve a particular goal. Whether the objective is to achieve regulatory compliance, improve corporate data security, or a mix of both, the objectives of the data classification should help to shape the policy.
Perform Data Discovery:
An organization’s data classification policy should depend on the types of data in its possession. Before building a data classification policy, it is necessary to perform data discovery to identify the types of data that an organization has in its possession.
Identify Regulatory Requirements:
Based on the results of data discovery, the next step is to identify any applicable data protection regulations. This should be based upon the types of data that an organization has (financial data, customer PII, etc.) and any relevant jurisdictional requirements, including both the data sources and locations where an organization does business.
Develop a Data Classification Policy:
Based on the types of data in an organization’s possession, applicable regulatory requirements and corporate security needs, develop a policy for classifying these types of data. This may be as simple as defining all protected data as High sensitivity and labeling it by type and applicable regulation, or a policy may have a more granular breakdown of sensitivity based on the type and value of the data in question.
Create Data Security Requirements:
For each sensitivity level and type of data, define the security requirements that apply to it. While these requirements should comply with applicable regulations, taking a checkbox approach to compliance creates complexity and does not eliminate risk. A better approach is to create a consistent policy that meets the requirements of all applicable regulations and puts security controls (like data encryption) in place to protect data against breaches and other threats. For example, Prey’s BitLocker encryption solutions help to meet PCI DSS Requirement 3, which addresses the use of encryption to protect cardholder data.
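A consistent policy of this kind can be written down as data: a baseline set of controls per sensitivity level, plus any extras a regulation adds. The sketch below assumes hypothetical control names and a deliberately tiny regulation table. It illustrates the structure only and is not a statement of what any regulation requires in full.

```python
# Hypothetical baseline controls for each sensitivity level
BASELINE = {
    "Low": {"integrity_checks"},
    "Medium": {"integrity_checks", "authenticated_access"},
    "High": {
        "integrity_checks",
        "authenticated_access",
        "encryption_at_rest",
        "encryption_in_transit",
    },
}

# Hypothetical extras contributed by specific regulations
REGULATION_EXTRAS = {
    "PCI DSS": {"encryption_at_rest"},
}


def required_controls(sensitivity: str, regulations=()) -> set:
    """Union of the baseline for the level and the extras for each regulation."""
    controls = set(BASELINE[sensitivity])
    for regulation in regulations:
        controls |= REGULATION_EXTRAS.get(regulation, set())
    return controls


print(sorted(required_controls("Medium", ["PCI DSS"])))
```

Keeping one table for the whole organization, rather than a separate checklist per regulation, is what makes the policy consistent.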
Define a Data Classification Process:
After defining the data classification and security policies, create a process for applying them to data. This process should outline how data should be initially classified and policies for periodic reviews of data classification.
Implement Required Tools:
Scalable data classification requires the use of tools and automation. With a policy and process in place, select and deploy the tools needed to implement and enforce this policy.
Perform Initial Classification:
When all of the components are in place, perform initial classification of all data currently in the organization’s possession. As new data is created or acquired, classify that data as well.
Educate Employees:
Effective data security requires employee cooperation. When the new policy, processes, and tools are in place, train employees on how the data classification system works.
Monitor and Maintain:
Data classification is not a one-time event. Data classification policies and processes should be monitored and periodically tested and reviewed to ensure that they meet the organization’s needs.
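Periodic review can be partly automated by tracking when each item was last classified. The sketch below is one simple approach, assuming review intervals that shrink as sensitivity rises. The interval values, file names, and tuple layout are illustrative assumptions, not recommendations.

```python
from datetime import date, timedelta

# Hypothetical review intervals per sensitivity level
REVIEW_INTERVAL = {
    "High": timedelta(days=90),
    "Medium": timedelta(days=180),
    "Low": timedelta(days=365),
}


def due_for_review(items, today: date) -> list:
    """Names of items whose last classification is older than their interval.

    Each item is a (name, sensitivity, last_classified) tuple.
    """
    return [
        name
        for name, sensitivity, last_classified in items
        if today - last_classified >= REVIEW_INTERVAL[sensitivity]
    ]


inventory = [
    ("payroll.xlsx", "High", date(2024, 1, 15)),
    ("handbook.pdf", "Medium", date(2024, 3, 1)),
    ("datasheet.pdf", "Low", date(2023, 6, 1)),
]
print(due_for_review(inventory, today=date(2024, 7, 1)))
```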
Performing data classification may seem like a daunting task, but it is one worth doing. Effective data classification decreases enterprise risk and helps an organization to avoid costly data breaches and regulatory non-compliance penalties.