Almost all information that is not stored in a database management system (DBMS) is classified as unstructured data. Today, unstructured data is one of the main information assets of any business. It includes the electronic documents and files kept in company storage: office documents, PDF files, scanned copies, and audio and video content.
Protecting unstructured data storage is an acute problem in many companies. Before turning to the methods of protection, it is worth establishing why this task is so important.
According to various estimates, unstructured data can account for up to 90% of all electronic data on company hard drives. However, most of this data often brings no value to the business. In many enterprise storage systems, you can find:
- Duplicate documents created by employees due to lack of copy control.
- Obsolete files that have not been accessed for years.
- Content unrelated to the direct commercial activities of the company (photos, videos, files).
It is also important to note that the average growth in the volume of unstructured data can reach 30% per year, which requires constant expansion of storage space.
Lack of control over this data not only leads to unnecessary storage costs, but also carries the risk of violating regulatory requirements and data breaches.
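To put the growth figure above into capacity-planning terms, here is a minimal Python sketch of compound growth. The starting volume of 100 TB and the function names are illustrative assumptions, and a constant annual rate is a simplification.

```python
import math


def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (annual_growth=0.30 means 30%)."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Assumed starting point: 100 TB of unstructured data.
    for year in range(6):
        print(year, round(projected_volume(100.0, 0.30, year), 1))
    print("doubling time (years):", round(doubling_time_years(0.30), 2))
```

At a constant 30% per year, the volume doubles in a little over two and a half years, which is why unmanaged duplicates and obsolete files quickly turn into a storage budget problem.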
Data governance solutions come to the rescue
A separate class of products, Data Governance (DG), which may also include Data-Centric Audit and Protection (DCAP), helps control and protect unstructured data. DG solutions help with the following tasks:
- Auditing user actions in unstructured data storage.
- Classifying documents located in protected storage.
- Analyzing and managing user access rights.
- Monitoring data integrity.
What type of unstructured data stores do companies use?
Even a small business infrastructure can include several types of file storage. Let me highlight the most common:
- Windows Server-controlled file storage, including DFS.
- MS SharePoint portals, including cloud versions.
- MS Exchange mail servers.
- Linux-based repositories.
- Atlassian Confluence knowledge bases.
- NextCloud cloud storage.
- Dell EMC and NetApp NAS systems.
Separately, I should also mention MS Active Directory domain controllers. Formally, they cannot be called unstructured data stores, but DG solutions usually take care of their protection as well.
For companies with heterogeneous file stores or an infrastructure comprising several interconnected domains, the use of DG/DCAP solutions becomes even more relevant. Such solutions allow you to use a single interface to manage all security functions related to data storage.
Inside data governance tools
Before DG, customers used data loss prevention (DLP) tools to solve similar problems. However, when it came to protecting unstructured data storage, DLP functionality was often limited to classification. In addition, because DLP systems focus on controlling user actions on workstations, it was difficult to apply them fully to many types of unstructured data storage.
Data governance systems are typically used for fairly straightforward tasks, such as controlling data and access rights, as well as providing access to data located in corporate stores.
Let me touch on an example of a critical task that data governance products help solve: locating (that is, classifying) critical information in corporate stores.
To solve this problem, DG systems ship with a large number of pre-installed information categories covered by international legislation and industry requirements. DG systems also support a long list of file formats. In addition, data governance solutions let you define custom categories to search for information that is non-standard but still relevant to a particular business.
Categories are combinations of phrases, words, and regular expressions, together with the frequency of their occurrence. You can use the pre-installed categories or create your own, tuning them to keep false positives to a minimum.
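A category of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular DG product implements it: the `Category` class, the sample patterns, and the `min_hits` threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    patterns: list  # regular expressions; literal words and phrases work too
    min_hits: int = 1  # required number of matches (the "frequency" part)


def count_hits(category: Category, text: str) -> int:
    """Total number of matches of all category patterns in the text."""
    return sum(
        len(re.findall(pattern, text, flags=re.IGNORECASE))
        for pattern in category.patterns
    )


def classify(text: str, categories: list) -> list:
    """Names of the categories whose hit threshold is reached."""
    return [c.name for c in categories if count_hits(c, text) >= c.min_hits]


# Hypothetical categories for illustration only.
CATEGORIES = [
    # 16 digits in groups of four, optionally separated by spaces or dashes.
    Category("payment_card", [r"\b(?:\d{4}[ -]?){3}\d{4}\b"], min_hits=1),
    # Requires at least two keyword matches to reduce false positives.
    Category(
        "hr_contract",
        [r"\bemployment contract\b", r"\bsalary\b", r"\bprobation period\b"],
        min_hits=2,
    ),
]
```

Requiring several matches before a category fires is one simple way to reduce false positives: a document that mentions "salary" once is not treated as an employment contract.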
DG systems can also analyze the content of image formats, using both an optical character recognition (OCR) module and a module that recognizes scanned copies of documents, built on artificial intelligence and neural networks. This capability is in high demand because IT and information security staff sometimes do not have complete information about the exact location of the most valuable data.
The continuous classification of file storage units provided by Data Governance solutions reduces the risk of critical information leaks and simplifies the audit process.
Analysis of data access rights
Knowing where critical information assets are located leads to a second important task: determining who currently has access to corporate resources. DG tools can be used to solve this problem too. I’m talking about both analyzing the access rights to a particular directory or document and viewing all the resources available to a specific employee.
I must say that even in a simple infrastructure with a file server based on Windows Server, this task is difficult to solve without data governance tools, because access rights can be granted both directly and through security groups and policies, which, in turn, can be inherited. In addition, we must not forget risks such as critical documents sitting in publicly accessible folders, access rights granted directly to individual employees, and unmanaged directories.
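The difficulty is easier to see in a small model. The Python sketch below resolves who can effectively reach a folder when rights come from nested groups and inherited permissions. The group names, folder paths, and the simplified permission model (access or no access) are hypothetical; real Windows ACLs also involve deny entries and permission levels that are not modeled here.

```python
from pathlib import PurePosixPath

# Hypothetical data: group -> direct members (users or other groups).
GROUP_MEMBERS = {
    "finance": {"alice", "finance-managers"},
    "finance-managers": {"bob"},
    "everyone": {"alice", "bob", "carol", "dave"},
}

# Hypothetical ACLs: folder -> principals granted access on that folder.
ACLS = {
    "/shares": {"everyone"},
    "/shares/finance": {"finance"},
    "/shares/finance/payroll": {"carol"},
}

# Folders that do not inherit permissions from their parent.
INHERITANCE_BROKEN = {"/shares/finance"}


def expand_group(principal, seen=None):
    """Return all users behind a principal, following nested groups."""
    seen = set() if seen is None else seen
    if principal not in GROUP_MEMBERS:
        return {principal}  # a plain user
    if principal in seen:
        return set()  # already expanded; also guards against cycles
    seen.add(principal)
    users = set()
    for member in GROUP_MEMBERS[principal]:
        users |= expand_group(member, seen)
    return users


def effective_users(folder):
    """Users who can reach a folder, walking up until inheritance stops."""
    users = set()
    path = PurePosixPath(folder)
    while True:
        for principal in ACLS.get(str(path), set()):
            users |= expand_group(principal)
        if str(path) in INHERITANCE_BROKEN or path == path.parent:
            break
        path = path.parent
    return users


def direct_user_grants(folder):
    """Principals on a folder's ACL that are users rather than groups."""
    return {p for p in ACLS.get(folder, set()) if p not in GROUP_MEMBERS}
```

In this toy data set, `effective_users("/shares/finance/payroll")` includes alice and bob through the nested `finance` group as well as carol, who was granted access directly, and `direct_user_grants` flags carol as exactly the kind of direct grant mentioned above.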
By using DG solutions, you can automatically identify the above risks and remove unnecessary access rights. To make this reduction safe, some data governance tools can simulate changes to access rights. This allows you to understand, even before the actual change, which resources an employee would lose access to, based on their previous activity with different sets of data.
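The idea behind such a simulation can be sketched as follows. This is a simplified illustration under assumed data: the `ACCESS_LOG` records and the `simulate_revocation` function are hypothetical and do not correspond to any specific product.

```python
from collections import Counter

# Hypothetical audit history: (user, folder) pairs, one per recorded access.
ACCESS_LOG = [
    ("alice", "/shares/finance"),
    ("alice", "/shares/finance"),
    ("alice", "/shares/hr"),
    ("bob", "/shares/finance"),
]


def simulate_revocation(user, folders_to_revoke, access_log):
    """Report which of the folders the user actually used, with access counts.

    Nothing is changed; the result only shows what the user would lose.
    """
    usage = Counter(folder for u, folder in access_log if u == user)
    return {folder: usage[folder] for folder in folders_to_revoke if usage[folder] > 0}
```

An empty result suggests the rights can be removed without disrupting the employee's work, while a non-empty one points to folders that deserve a second look before the change is applied.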
Identify illegal data access and prevent data breaches
Auditing employee access to data is one of the most important features of DCAP/DG products. Auditing is not only a way to investigate information security incidents retrospectively, but also a solution to day-to-day problems such as documents that employees have lost or misplaced. As a rule, these problems are solved by sending a request to the company’s IT help desk, which, in turn, restores the document from a backup copy (if one exists).
Data governance tools record every access to a document, including moves, renames, deletions, and changes to access rights. With data governance solutions, processing a search request or restoring access to a document takes only a few minutes.
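To show why an audit trail makes a "lost" document easy to find, here is a minimal Python sketch that follows a file through renames and moves. The `AuditEvent` structure, the action names, and the sample events are assumptions for illustration; real products use their own event formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    action: str  # assumed values: "read", "move", "rename", "delete", "acl_change"
    path: str
    new_path: Optional[str] = None  # set for "move" and "rename"


def trace_document(events, path):
    """Follow a document through the log.

    Returns (current_path, history); current_path is None if it was deleted.
    """
    history = []
    current = path
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.path != current:
            continue
        history.append(event)
        if event.action in ("move", "rename") and event.new_path:
            current = event.new_path
        elif event.action == "delete":
            return None, history
    return current, history


# Hypothetical events, deliberately out of chronological order.
EVENTS = [
    AuditEvent(datetime(2024, 3, 2, 9, 0), "bob", "rename",
               "/docs/report.docx", "/docs/report-final.docx"),
    AuditEvent(datetime(2024, 3, 1, 10, 0), "alice", "read", "/docs/report.docx"),
    AuditEvent(datetime(2024, 3, 3, 15, 30), "bob", "move",
               "/docs/report-final.docx", "/archive/report-final.docx"),
    AuditEvent(datetime(2024, 3, 4, 8, 0), "carol", "delete", "/docs/notes.txt"),
]
```

Calling `trace_document(EVENTS, "/docs/report.docx")` returns the document's current location together with the list of who touched it, which is the information a help desk would otherwise have to reconstruct by hand.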
Data governance at the service of the IT department
Data governance solutions are useful not only to the security department, but also to IT teams. DG systems can help IT professionals optimize the load on file storage. DG products can detect large duplicate files, identify resources that have not been accessed for a long time, and analyze which documents take up most of the disk space.
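The two storage checks mentioned above, duplicate detection and finding stale files, can be approximated with a short script. The sketch below uses only the Python standard library; the function names and thresholds are assumptions, and it is a semi-manual stand-in rather than a replacement for a DG product.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root):
    """Yield the path of every file under root."""
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            yield os.path.join(directory, name)


def find_duplicates(root, min_size=1):
    """Groups of files with identical content, at least min_size bytes each."""
    by_size = defaultdict(list)
    for path in iter_files(root):
        size = os.path.getsize(path)
        if size >= min_size:
            by_size[size].append(path)
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files that share a size
            for path in paths:
                by_digest[file_digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def find_stale(root, max_age_days, now=None):
    """Files neither accessed nor modified within the last max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in iter_files(root):
        info = os.stat(path)
        # Access times may be disabled or coarse on some systems,
        # so the later of access and modification time is used.
        if max(info.st_atime, info.st_mtime) < cutoff:
            stale.append(path)
    return sorted(stale)
```

Grouping by size before hashing avoids reading files that cannot possibly be duplicates, which matters on large shares. Raising `min_size` restricts the search to the large duplicates that waste the most space.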
IT departments often use the domain controller protection feature. This covers change control for Active Directory, as well as analysis of the configuration and settings of individual accounts. With DG products, you can quickly identify accounts whose passwords never expire, empty security groups, and inactive accounts.
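As an illustration of these account checks, the Python sketch below works on a hypothetical export of directory data. The record layout, account names, and the 90-day inactivity threshold are assumptions; the two flag values are the standard Active Directory `userAccountControl` bits for a disabled account and a non-expiring password.

```python
from datetime import datetime, timedelta

# userAccountControl bit flags.
UF_ACCOUNTDISABLE = 0x0002
UF_NORMAL_ACCOUNT = 0x0200
UF_DONT_EXPIRE_PASSWD = 0x10000

# Hypothetical export of accounts and groups, for illustration only.
ACCOUNTS = [
    {"name": "alice", "uac": UF_NORMAL_ACCOUNT,
     "last_logon": datetime(2024, 5, 1)},
    {"name": "svc-backup", "uac": UF_NORMAL_ACCOUNT | UF_DONT_EXPIRE_PASSWD,
     "last_logon": datetime(2024, 5, 20)},
    {"name": "old-intern", "uac": UF_NORMAL_ACCOUNT,
     "last_logon": datetime(2023, 1, 10)},
    {"name": "disabled-user", "uac": UF_NORMAL_ACCOUNT | UF_ACCOUNTDISABLE,
     "last_logon": datetime(2022, 1, 1)},
]
GROUPS = {"finance": ["alice"], "project-x": []}


def non_expiring_passwords(accounts):
    """Accounts whose password is flagged as never expiring."""
    return [a["name"] for a in accounts if a["uac"] & UF_DONT_EXPIRE_PASSWD]


def inactive_accounts(accounts, now, max_idle_days=90):
    """Enabled accounts with no logon during the last max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        a["name"]
        for a in accounts
        if not (a["uac"] & UF_ACCOUNTDISABLE) and a["last_logon"] < cutoff
    ]


def empty_groups(groups):
    """Names of groups that have no members."""
    return sorted(name for name, members in groups.items() if not members)
```

Disabled accounts are excluded from the inactivity report here because they can no longer log on. Whether to include them is a policy choice.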
Listing all the standards, requirements, laws, and legal acts for which data governance systems are useful would require a separate article. I can only note here that access auditing and scanning the contents of file storage, for example, for the presence of personal data, can significantly reduce the cost of preparing for and passing audits while at the same time increasing your company’s level of information security.
The practical benefits of data governance systems often become visible not just to the few departments that use them, but to the business as a whole. Data governance systems offer a high level of automation and can reduce costs. The problems described in this article can be solved with different products, from semi-automatic DG systems based on scripts to full-featured DCAP/DG class systems. Separately, it is important to highlight the usefulness of DCAP/DG systems for optimizing file storage, which, with a limited amount of computing resources, can ensure the smooth operation of any business.
Alex Vakulov is a cybersecurity researcher with over 20 years of experience in malware analysis. Alex has strong malware removal skills. He writes for many technology-related publications, sharing his security experience.