De-duplication is a data reduction practice that methodically inspects data at the sub-file level and replaces any redundant elements with reference pointers. It can decrease the disk space needed to store data by 90% or more compared with traditional disk systems.
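To put the 90% figure in context, here is a minimal Python sketch of the arithmetic that relates space saved to the reduction ratio vendors often quote. The function names are hypothetical and chosen only for illustration.

```python
def space_saved(logical_bytes: int, stored_bytes: int) -> float:
    """Fraction of disk space saved: 0.9 means 90% less disk is used."""
    return 1 - stored_bytes / logical_bytes


def reduction_ratio(logical_bytes: int, stored_bytes: int) -> float:
    """Reduction ratio, e.g. 10.0 for a 10:1 ratio."""
    return logical_bytes / stored_bytes
```

For example, 10 TB of backup data held in 1 TB of disk is a 10:1 ratio, which is the same thing as a 90% saving.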
Working at a sub-file level is more powerful than looking at entire files.
In practice, usually only part of a file changes, not the entire file. Because data de-duplication works at the sub-file level, it can store just the unique data and point to the blocks it already holds. Several approaches are available; using variable-length blocks to find redundancy is the most common.
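As a rough illustration of the variable-length approach, the Python sketch below cuts data into chunks at boundaries chosen by the content itself, stores each unique chunk once, and records every file as a list of pointers to chunks. This is a sketch under assumptions, not any vendor's implementation: the names `chunk` and `DedupStore` and the constants `WINDOW`, `MASK`, `MIN_CHUNK` and `MAX_CHUNK` are hypothetical, and for simplicity it rehashes a small window at every byte, where production systems would typically use a rolling hash.

```python
import hashlib

WINDOW = 16      # bytes of trailing context that decide a boundary (assumed)
MASK = 0x3F      # cut when the low 6 bits of the window hash are zero (assumed)
MIN_CHUNK = 16   # never cut a chunk shorter than this; must be >= WINDOW
MAX_CHUNK = 256  # always cut once a chunk reaches this length


def chunk(data: bytes) -> list[bytes]:
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks = []
    start = 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # Hash only the last WINDOW bytes, so a boundary depends on local
        # content rather than on the absolute offset within the file.
        window = data[i - WINDOW + 1 : i + 1]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if (h & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Toy store: unique chunks kept once, files kept as lists of pointers."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}     # digest -> chunk bytes
        self.files: dict[str, list[str]] = {}  # name -> chunk digests

    def put(self, name: str, data: bytes) -> None:
        refs = []
        for c in chunk(data):
            digest = hashlib.sha256(c).hexdigest()
            self.blocks.setdefault(digest, c)  # store only if not seen before
            refs.append(digest)
        self.files[name] = refs

    def get(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.blocks.values())
```

Storing a file and then a second version with a few bytes inserted in the middle adds only the chunks around the edit. With fixed-size blocks, the same insertion would shift every later block and make all of them look new, which is one reason variable-length chunking tends to find more redundancy.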
It can make a big difference whether you carry out de-duplication at the source or at the target.
Source de-duplication happens on the client or backup server as a function of the backup software. While it can reduce network bandwidth requirements, it takes up CPU cycles on host servers during the backup process, which can slow down backups and interfere with primary applications. Because of these drawbacks, source de-duplication suits smaller systems better than larger ones.
Target de-duplication normally gives faster ingest, a shorter backup window, and a faster turnaround for disaster recovery processes. It can also allow a single device to support different backup applications. Overall, target de-duplication is the preferred choice for larger systems.
You need to pay attention to when you de-duplicate.
If you choose to de-duplicate during ingest, you’ll use less disk space, but the processing overhead can lengthen the backup window. An adaptive buffering approach, if the vendor provides one, can avoid this problem. De-duplicating after ingest gives faster backups, but you’ll need to reserve disk space as a landing area.
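The disk side of this trade-off can be sketched with a simple model. The Python function below is hypothetical and rests on a loud assumption: when de-duplicating after ingest, the landing area holds the whole incoming backup until processing finishes, so peak usage is the raw backup plus the unique data written to the store. Real products may free the landing area incrementally.

```python
def peak_disk(backup_bytes: int, unique_bytes: int, mode: str) -> int:
    """Estimate peak disk use for one backup under a simplified model."""
    if mode == "inline":
        # De-duplicated during ingest: only unique data ever reaches disk.
        return unique_bytes
    if mode == "post-process":
        # Assumed worst case: full landing area plus the de-duplicated store.
        return backup_bytes + unique_bytes
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this model, a 1,000 GB backup containing 100 GB of unique data peaks at 100 GB when de-duplicated during ingest and at 1,100 GB when de-duplicated afterwards.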