We are living in a big data world, which is both a blessing and a curse. Big data usually means a huge number of files, such as photos and videos, and ultimately a huge amount of storage space. Files are accidentally or deliberately copied from location to location without anyone first considering that these duplicates consume more and more storage space. I want to change that with you in this blog post: we will search for duplicate files and erase them.
The Goal
With my script in hand, you can point it at a source path, scan a second path for files whose content duplicates the source files, and erase those duplicates.
The Script
Here is the code to create the PowerShell script file delete_duplicate.ps1:
Clear-Host
$SourcePath = Read-Host 'Enter Source file path for building the files list (e.g. C:\Temp, C:\)'
$ScanPath = Read-Host 'Enter file path to scan and erase duplicate files (e.g. C:\Temp, C:\)'
Write-Host "Source Path : " $SourcePath -ForegroundColor Yellow
Write-Host "Scan Path : " $ScanPath -ForegroundColor Yellow
Write-Host
If ((Test-Path $SourcePath) -And (Test-Path $ScanPath)) {
    $StartTime = $(Get-Date)
    Write-Host "Start time : " $StartTime
    # Hash every file under the source path and group the results by hash,
    # so each group name is one unique file content.
    Write-Host "Building Source Files List ... Please wait ..." -ForegroundColor Green
    $SourceFiles = Get-ChildItem $SourcePath -File -Recurse -ErrorAction SilentlyContinue | Get-FileHash | Group-Object -Property Hash
    Write-Host $SourceFiles.Count " unique source file hash(es)" -ForegroundColor Green
    Write-Host "Task time : " ($(Get-Date) - $StartTime)
    Write-Host
    $ScanTime = $(Get-Date)
    Write-Host "Start time : " $ScanTime
    # Same hashing pass over the scan path; duplicates of the source
    # files will show up here with identical hashes.
    Write-Host "Building Scan Files List ... Please wait ..." -ForegroundColor Green
    $ScanFiles = Get-ChildItem $ScanPath -File -Recurse -ErrorAction SilentlyContinue | Get-FileHash | Group-Object -Property Hash
    Write-Host $ScanFiles.Count " unique scan file hash(es)" -ForegroundColor Green
    Write-Host "Task time : " ($(Get-Date) - $ScanTime)
    Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
    Write-Host
    $DuplicateTime = $(Get-Date)
    Write-Host "Start time : " $DuplicateTime
    Write-Host "Searching for Duplicate files ... Please wait ..." -ForegroundColor Green
    # A scan file is a duplicate when its hash appears among the source
    # hashes; collect the path and hash of every such file.
    $DuplicateFiles = foreach ($File in $ScanFiles) {
        if ($SourceFiles.Name -contains $File.Name) {
            $File.Group | Select-Object -Property Path,Hash
        }
    }
    Write-Host $DuplicateFiles.Count " duplicate file(s) to erase" -ForegroundColor Red
    Write-Host "Task time : " ($(Get-Date) - $DuplicateTime)
    Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
    Write-Host
    $DeleteTime = $(Get-Date)
    Write-Host "Start time : " $DeleteTime
    if ($DuplicateFiles) {
        $Warning = $DuplicateFiles.Count.ToString() + ' Duplicated file(s) found'
        $WarningPreference = "Continue"
        Write-Warning $Warning
        $i = 1
        foreach ($Duplicate in $DuplicateFiles) {
            $filepath = $Duplicate.Path
            # Write-Host "Erasing file (" $i "/" $DuplicateFiles.Count ") File: " $filepath
            Try {
                Remove-Item -Path $filepath -Force
            } catch {
                # Report failures instead of swallowing them silently.
                Write-Warning "Could not erase $filepath : $_"
            }
            $i = $i + 1
        }
        Write-Host "Task time : " ($(Get-Date) - $DeleteTime)
        Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
        Write-Host
    } else { Write-Output 'No duplicated file(s) found' }
} else { Write-Output 'File path does not exist' }
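If you would rather review what the script is about to erase before committing, Remove-Item supports PowerShell's built-in -WhatIf switch, which reports each deletion without performing it. Here is a minimal sketch of the delete loop run as a dry run, assuming $DuplicateFiles has been built exactly as above:

# Dry run: print what Remove-Item would delete, without deleting anything.
foreach ($Duplicate in $DuplicateFiles) {
    Remove-Item -Path $Duplicate.Path -Force -WhatIf
}

Once the reported list looks right, drop -WhatIf to erase the files for real.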
DO NOT USE THE SAME PATH for Source and Scan, as this limit case has not been addressed in this script and it will result in all your files being deleted.
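If you want the script itself to enforce this, one possible guard (my own addition, not part of the script above) is to resolve both paths right after the Test-Path check and abort when they are identical or one is nested inside the other:

# Sketch of a guard: abort when the two paths are identical or nested.
$ResolvedSource = (Resolve-Path $SourcePath).Path.TrimEnd('\')
$ResolvedScan = (Resolve-Path $ScanPath).Path.TrimEnd('\')
if ($ResolvedSource -eq $ResolvedScan -or
    $ResolvedScan.StartsWith($ResolvedSource + '\', [System.StringComparison]::OrdinalIgnoreCase) -or
    $ResolvedSource.StartsWith($ResolvedScan + '\', [System.StringComparison]::OrdinalIgnoreCase)) {
    Write-Error 'Source and Scan paths must not overlap.'
    return
}

Note that -eq on strings is already case-insensitive in PowerShell, and the OrdinalIgnoreCase comparison keeps the nested-path checks consistent with Windows path semantics.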