Data consolidation refers to the collection and integration of data from multiple sources into a single destination. During this process, different data sources are put together, or consolidated, into a single data store.
Because data comes from a broad range of sources, consolidation allows organizations to more easily present data, while also facilitating effective data analysis. Data consolidation techniques reduce inefficiencies such as data duplication and the costs of relying on multiple databases and multiple data management points.
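As a rough illustration of the idea, the sketch below loads records from two sources into a single SQLite table and skips duplicates. The source names (`crm`, `billing`), the `customers` table, and the choice of a lowercased email address as the matching key are all assumptions made for this example, not part of any particular consolidation tool.

```python
import sqlite3


def consolidate(sources, conn):
    """Load rows from several sources into one table, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "email TEXT PRIMARY KEY, name TEXT, source TEXT)"
    )
    for source_name, rows in sources.items():
        for row in rows:
            # Normalize the key so the same customer matches across sources.
            email = row["email"].strip().lower()
            # INSERT OR IGNORE keeps the first record seen for each email.
            conn.execute(
                "INSERT OR IGNORE INTO customers VALUES (?, ?, ?)",
                (email, row["name"], source_name),
            )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


# Hypothetical sample data: the same customer appears in both sources.
sources = {
    "crm": [{"email": "ana@example.com", "name": "Ana"}],
    "billing": [
        {"email": "Ana@Example.com ", "name": "Ana P."},
        {"email": "bo@example.com", "name": "Bo"},
    ],
}

conn = sqlite3.connect(":memory:")
total = consolidate(sources, conn)
print(total)
```

Keeping the first record seen is only one possible policy. A real project would need to decide which source wins when duplicates disagree.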
Data consolidation best practices
Organizations should plan and execute data consolidation projects carefully. These best practices promote effective data consolidation:
- Check whether the data types in your source and target are compatible: If they’re not, you’ll have to transform the data to reconcile the differences.
- Maintain data lineage: Data lineage allows an organization to understand exactly what was done to the data, and how, during the consolidation process. You may need this information to demonstrate regulatory compliance, or to retrace steps in order to understand the results of analytics and any business decisions based on them.
- Standardize character set conversions: If you work with an application that lets you store both single-byte characters (such as those used in most Western languages) and double-byte characters (such as those used in some Asian languages) in a database, the application can convert between these character types. However, when you move the data, the tools processing it may be unaware of the format in which it is stored. By standardizing character set conversions, you increase the likelihood that the consolidated data is reliable.
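The type-compatibility and character-set practices above can be sketched together in a small transform step. This is a minimal example under stated assumptions: `TARGET_SCHEMA`, the column names, and the Shift JIS source encoding are hypothetical, and the functions `to_text`, `coerce`, and `transform_row` are illustrative helpers rather than any library's API.

```python
from datetime import date

# Hypothetical target schema: column name -> Python type the target expects.
TARGET_SCHEMA = {"id": int, "name": str, "amount": float, "signup": date}


def to_text(raw, source_encoding):
    """Decode bytes with the encoding the source declares; pass other values through."""
    if isinstance(raw, bytes):
        return raw.decode(source_encoding)
    return raw


def coerce(column, value):
    """Convert a source value to the type of the matching target column."""
    target_type = TARGET_SCHEMA[column]
    if isinstance(value, target_type):
        return value
    if target_type is date:
        # Assumes the source stores dates as ISO 8601 strings (YYYY-MM-DD).
        return date.fromisoformat(value)
    return target_type(value)


def transform_row(row, source_encoding):
    """Decode text first, then coerce every column to the target type."""
    return {
        col: coerce(col, to_text(val, source_encoding))
        for col, val in row.items()
    }


# A row as it might arrive from a source that stores text as Shift JIS bytes.
row = {
    "id": b"42",
    "name": "山田".encode("shift_jis"),
    "amount": "19.90",
    "signup": "2023-04-01",
}

result = transform_row(row, "shift_jis")
# Write text to the target in one standard encoding (UTF-8 is assumed here).
name_for_target = result["name"].encode("utf-8")
```

Decoding with the source's declared encoding and re-encoding once at the target is the design choice being shown. A tool that guesses the encoding, or assumes its own default, can silently corrupt double-byte text.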
Data consolidation challenges
Data consolidation also comes with challenges. The most common ones include:
- Limited resources: Hand-coded consolidation requires data engineers to write code and manage the process, and to write more code every time a new data source comes online. The more sources and data types involved, the longer the process takes.
- Security issues (real and perceived): Security concerns include guarding data from breaches before and after consolidation, and developing backup and disaster recovery capabilities if data is compromised, corrupted, or deleted. Companies also must guard against “inside jobs” like exfiltration — the unauthorized copying, transfer, or retrieval of data from a computer or server.
- Data spread across multiple locations: Today’s decentralized data landscape can make data integration challenging. Having data in different locations, including in the cloud, on premises, and at remote sites, adds to the complexity of data consolidation. The sources themselves may also be inconsistent: for instance, data stored in legacy systems may be missing the times and dates of activities that more modern systems commonly record, and data from external systems may not contain the same level of detail as internal sources.
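One way to cope with sources that differ in detail is to map every record onto a common schema and mark missing values explicitly instead of inventing them. The sketch below assumes a hypothetical legacy system that never recorded timestamps. The `Activity` class, the field names, and both mapping functions are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Activity:
    user_id: str
    action: str
    occurred_at: Optional[datetime]  # None when the source never recorded a time
    source: str


def from_modern(rec):
    """Map a record from a (hypothetical) modern system that stores ISO timestamps."""
    return Activity(
        user_id=rec["user_id"],
        action=rec["action"],
        occurred_at=datetime.fromisoformat(rec["timestamp"]),
        source="modern",
    )


def from_legacy(rec):
    """Map a record from a (hypothetical) legacy system that has no timestamp field."""
    return Activity(
        user_id=str(rec["uid"]),
        action=rec["act"],
        occurred_at=None,  # keep the gap visible rather than guessing a value
        source="legacy",
    )


modern_rows = [{"user_id": "u1", "action": "login", "timestamp": "2024-03-05T14:30:00"}]
legacy_rows = [{"uid": 7, "act": "login"}]

consolidated = [from_modern(r) for r in modern_rows] + [from_legacy(r) for r in legacy_rows]
missing_times = sum(1 for a in consolidated if a.occurred_at is None)
```

Recording the originating source alongside each record also makes it easier to explain later why some rows lack detail.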