Deduplication
Save storage space by reducing duplicated data
Data deduplication is an important technology for controlling data proliferation. It can also dramatically reduce your overall backup and restore costs by allowing greater retention of backup data on disk.
The average UNIX® or Windows® disk volume contains thousands or even millions of duplicate data objects. As data is created, distributed, backed up, and archived, duplicate data objects are stored unabated across all storage tiers. The end result is inefficient utilisation of data storage resources.
Eliminating duplicated data gives immediate benefits through storage space efficiencies:
(1) Management Benefit
The ability to store “more” data per storage unit, or to retain online data for longer periods of time.
(2) Cost Benefit
Reduced initial storage acquisition cost, or longer intervals between storage capacity upgrades.
Highly efficient data deduplication technologies from the likes of Data Domain, Diligent, Overland and NetApp will typically allow you to store 25x more backup data economically.
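To put a figure such as 25x in context, a deduplication ratio can be converted into a percentage space saving. The following is a minimal sketch, assuming that "25x" means a 25:1 ratio of backup data protected to disk actually consumed; the function names and sample sizes are illustrative and do not come from any vendor.

```python
def dedup_ratio(logical_bytes, physical_bytes):
    """Ratio of data protected to disk actually consumed."""
    return logical_bytes / physical_bytes


def space_saving(ratio):
    """Fraction of disk saved at a given deduplication ratio."""
    return 1 - 1 / ratio


# Hypothetical example: 25,000 GB of backups held on 1,000 GB of disk.
ratio = dedup_ratio(25_000, 1_000)
saving = space_saving(ratio)
print(f"{ratio:.0f}:1 ratio saves {saving:.0%} of disk space")
```

Under this assumption a 25:1 ratio corresponds to a 96% saving, while 10:1 already gives 90%, so higher ratios bring diminishing returns in absolute capacity.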
When choosing a solution, it is important to understand the different types of deduplication and which is best suited to each application, e.g. VTL, primary storage, etc.
Data Deduplication Types
The different methodologies include:
(1) In-Line Processing Data de-duplication
Performed as data flows into the secondary storage system. Although the CPU-intensive process is performed only once, it requires more processing power and can therefore be slower than other methods. The speed of in-line processing is highly dependent on the design of the de-duplication algorithm and the hardware on which it is running, so pay special attention to these aspects. Aside from these considerations, in-line processing makes the most efficient use of available disk space when compared with the other approaches.
(2) Post Processing Data de-duplication
Performed once data is already stored, which demands less processing power during ingest because the de-duplication can be deferred to off-peak hours. Be aware, however, that this approach makes much less efficient use of disk space, since data must first be written in full, and can cause bottlenecks when performing regular backups, replication or disaster recovery procedures.
(3) Parallel Processing I/O de-duplication
Operations are handled simultaneously on one storage platform. As a result, processing power can be severely impacted, and backups slow down if processing capacity is diminished. This approach can also yield inefficient use of disk space. Utilise minimal system resources: primary data, backup data, and archival data can all be de-duplicated with nominal impact on data centre operations. Schedule de-duplication to occur during off-peak times, so that applications can sustain critical performance but still realise significantly reduced storage capacity requirements.
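The trade-off between the in-line and post-processing approaches described above can be sketched in a few lines. This is an illustrative toy, not any vendor's implementation: it assumes pre-cut blocks, uses SHA-256 as the fingerprint, and all function names are hypothetical. It shows that both approaches end with the same unique blocks, but post-processing needs disk for every block until the deferred pass runs.

```python
import hashlib


def dedupe(blocks):
    """Keep one copy of each unique block plus a recipe to rebuild the stream."""
    store = {}    # fingerprint -> block, each unique block held once
    recipe = []   # ordered fingerprints needed to restore the original data
    for block in blocks:
        # SHA-256 is an assumption; real products choose their own fingerprint.
        key = hashlib.sha256(block).hexdigest()
        if key not in store:
            store[key] = block
        recipe.append(key)
    return store, recipe


def inline_ingest(blocks):
    """Fingerprint on the ingest path, so duplicates never reach disk."""
    store, recipe = dedupe(blocks)
    peak_blocks = len(store)           # only unique blocks are ever written
    return store, recipe, peak_blocks


def post_process_ingest(blocks):
    """Land everything first, then de-duplicate in a later scheduled pass."""
    landed = list(blocks)              # fast ingest, no hashing yet
    peak_blocks = len(landed)          # duplicates occupy disk until the pass
    store, recipe = dedupe(landed)     # the deferred off-peak pass
    return store, recipe, peak_blocks


backup = [b"alpha", b"beta", b"alpha", b"alpha"]
inline_store, inline_recipe, inline_peak = inline_ingest(backup)
post_store, post_recipe, post_peak = post_process_ingest(backup)
restored = [inline_store[key] for key in inline_recipe]
print(inline_peak, post_peak)
```

In this sketch the in-line path pays the hashing cost while data is arriving, whereas the post-processing path defers that cost but temporarily holds all four blocks on disk.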