What is the difference between a Duplicate and a Redundant file in Shinydocs Pro?
Your users may have questions about the terms “duplicate” and “redundant” when reviewing scanned data in Shinydocs Pro.
Imagine your phone's photo gallery. Redundant files are like having ten slightly different photos of the same sunset—each unique but essentially showing the same thing. You might want to keep the best one and remove the rest. Binary duplicates, on the other hand, are exact copies of the same photo. If you delete the duplicates, you lose nothing because you have the original.
Duplicate Files
Duplicates are different files with the same content. For example, a conference agenda saved in the Marketing Department’s Events folder and the same conference agenda saved in the Training Department’s Conference Sessions folder are duplicate files.
From a technical perspective, duplicates are identified by the TagDuplicates feature in Shinydocs Pro, which relies on a SHA-1 hash of each file's content: files that produce the same hash value are treated as duplicates. To learn more about how the hash values are calculated, please refer to https://help.shinydocs.com/cognitive-suite/2.10.0/what-is-a-hash-hashing-and-hash-value.
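To illustrate the general idea of hash-based duplicate detection, here is a minimal Python sketch. It is not the TagDuplicates implementation; the function names `sha1_of_file` and `find_duplicates` are hypothetical and exist only for this example. It assumes that two files count as duplicates when the SHA-1 hashes of their full contents match.

```python
import hashlib
from collections import defaultdict


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's content, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and keep only groups with more than one file.

    Hypothetical helper for illustration; not part of Shinydocs Pro.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[sha1_of_file(path)].append(str(path))
    return {h: sorted(files) for h, files in groups.items() if len(files) > 1}
```

Because only the content is hashed, the file name and folder play no part in the comparison. This is why the same agenda stored in two different department folders would fall into one group.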
Redundant Files
Redundant files have been tagged by Shinydocs using our standard rules for Redundant, Obsolete and Trivial (ROT) files. A file can be tagged as redundant under any of several criteria, and none of them depends on whether the file has a duplicate. Redundancy and duplication are distinct concepts and should not be confused.
The current rules in Shinydocs Pro for determining whether a file is redundant are as follows:
Copied files (any file with "Copy" in its file name)
Large disk image files, such as .iso and .img files (hundreds of extensions are checked)
Email files in folders named "Email archive"
Java files, such as .class, .jar, and .java files
Smartphone software (.ipa, .apk, and .ipsw files)
To learn more about the precise rules that Shinydocs Pro uses to determine ROT, please refer to this link:
Default ROT Rules for Shinydocs Pro
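The rules listed above can be sketched as a simple rule-based check. This is an illustration only, not the Shinydocs Pro rule engine. The function name `redundancy_reason` is hypothetical, the extension sets contain only the examples named above (the real disk image list is far longer), and the email extensions are an assumption because the article does not list them. The sketch also ignores file size, since no threshold for "large" is given.

```python
from pathlib import PurePath

# Only the extensions named in the article; the real lists are longer.
DISK_IMAGE_EXTENSIONS = {".iso", ".img"}
JAVA_EXTENSIONS = {".class", ".jar", ".java"}
SMARTPHONE_EXTENSIONS = {".ipa", ".apk", ".ipsw"}
# Assumption: the article does not say which extensions count as email files.
EMAIL_EXTENSIONS = {".msg", ".eml", ".pst"}


def redundancy_reason(path):
    """Return a short reason if the path matches a redundancy rule, else None.

    Hypothetical helper for illustration; not part of Shinydocs Pro.
    """
    p = PurePath(path)
    ext = p.suffix.lower()
    if "Copy" in p.name:
        return "copied file"
    if ext in DISK_IMAGE_EXTENSIONS:
        return "disk image"
    # Folder match is case-insensitive here, which is an assumption.
    in_email_archive = any(part.lower() == "email archive" for part in p.parts[:-1])
    if ext in EMAIL_EXTENSIONS and in_email_archive:
        return "archived email"
    if ext in JAVA_EXTENSIONS:
        return "java file"
    if ext in SMARTPHONE_EXTENSIONS:
        return "smartphone software"
    return None
```

Note that none of these checks looks at file content or hash values. A file can be tagged as redundant without any duplicate existing, and a duplicate can fail every one of these rules.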
Summary
Above all, it is important to remember that not all duplicate files are necessarily redundant. For example, two different Emergency Services departments — Fire Services and Paramedic Services — may each have a copy of a procedure within their file shares because each department’s share has restricted access. In this example, the duplicate data is necessary.