Replace Duplicate Files (Name and Size) with Hardlinks in Linux


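This script finds duplicate files by name and file size and replaces the duplicates with hard links. File contents are not compared, so run it with --dry-run first to review what would be linked.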
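As a quick illustration of what happens to each duplicate pair, the sketch below replaces one of two identical files with a hard link and prints the link count and inode before and after. It assumes GNU coreutils (`stat -c`). The `link_info` helper, the scratch directory and the file names are made up for the demo and are not part of the script.

```shell
#!/bin/bash
# Demo only: the helper name and the file names are hypothetical.

# Print "<link count> <inode>" for a file (GNU stat syntax).
link_info() {
    stat -c '%h %i' -- "$1"
}

demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
printf 'same content\n' > "$demo/a/report.txt"
printf 'same content\n' > "$demo/b/report.txt"

# Before: two separate inodes, each with a link count of 1.
echo "before: $(link_info "$demo/a/report.txt") | $(link_info "$demo/b/report.txt")"

# Replace the second copy with a hard link to the first.
ln -f -- "$demo/a/report.txt" "$demo/b/report.txt"

# After: both names refer to one inode with a link count of 2.
echo "after:  $(link_info "$demo/a/report.txt") | $(link_info "$demo/b/report.txt")"

rm -r "$demo"
```

After linking, both paths share the same data on disk, so editing the file through one name changes what the other name shows as well.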

  1. Copy this script into a new text file named convert_duplicates_to_hardlinks.sh.
  2. Make it executable with chmod +x convert_duplicates_to_hardlinks.sh
  3. Run it like this:
    ./convert_duplicates_to_hardlinks.sh /path/to/dir [--dry-run]
#!/bin/bash

# Usage: ./convert_duplicates_to_hardlinks.sh /path/to/dir [--dry-run]

# Check for parameters: the first argument must be an existing directory
if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "Usage: $0 /path/to/dir [--dry-run]" >&2
    exit 1
fi

TARGET="$1"
DRYRUN=false

if [ "$2" == "--dry-run" ]; then
    DRYRUN=true
fi

# Temporary file to store file info; removed on exit, even after an error
TMPFILE=$(mktemp) || exit 1
trap 'rm -f "$TMPFILE" "${TMPFILE}_sorted"' EXIT

# Step 1: Find all files and store "size|basename|fullpath"
# (-printf requires GNU find; file names containing "|" or newlines are not handled)
find "$TARGET" -type f -printf "%s|%f|%p\n" > "$TMPFILE"

# Step 2: Sort by size and name
sort -t"|" -k1,1n -k2,2 "$TMPFILE" > "${TMPFILE}_sorted"

# Step 3: Deduplicate
prev_size=""
prev_name=""
prev_path=""

while IFS="|" read -r size name path; do
    if [[ "$size" == "$prev_size" && "$name" == "$prev_name" ]]; then
        if [ "$DRYRUN" = true ]; then
            echo "[DRY-RUN] Would link: $path -> $prev_path"
        else
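            # Report success only if ln actually replaced the file
            if ln -f -- "$prev_path" "$path"; then
                echo "Linked: $path -> $prev_path"
            else
                echo "Failed to link: $path" >&2
            fi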
        fi
    else
        prev_size="$size"
        prev_name="$name"
        prev_path="$path"
    fi
done < "${TMPFILE}_sorted"

# Cleanup
rm "$TMPFILE" "${TMPFILE}_sorted"

echo "Done."
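
Because a matching name and size does not guarantee matching content, a more cautious variant might compare the bytes before linking. The sketch below is one possible way to do that and is not part of the original script. `link_if_identical` is a hypothetical helper that uses `cmp -s` for the comparison and the `-ef` test to skip pairs that already share an inode.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Links $2 to $1 only when both files have identical content.
# Returns 0 if a link was created, non-zero if the pair was skipped or ln failed.
link_if_identical() {
    local keep="$1" dup="$2"

    # Already the same inode: nothing to do.
    if [ "$keep" -ef "$dup" ]; then
        return 1
    fi

    # cmp -s prints nothing and exits 0 only when the files are byte-identical.
    if cmp -s -- "$keep" "$dup"; then
        ln -f -- "$keep" "$dup"
    else
        return 1
    fi
}
```

In the loop of the script, the call `ln -f "$prev_path" "$path"` could be swapped for `link_if_identical "$prev_path" "$path"`. The trade-off is that every candidate pair is read in full, which is slower on large files.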
