PowerShell script to delete duplicate files

Windows PowerShell

We live in a big data world, which is both a blessing and a curse. Big data usually means a huge number of files, such as photos and videos, and ultimately a huge amount of storage space. Files are copied or moved from location to location, accidentally or deliberately, without anyone considering that the resulting duplicates consume more and more storage space. In this blog post we will change that: we will search for duplicate files and erase them.

The Goal

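With my script in hand, you can do exactly that. It asks for two folders: a source path, whose files are kept, and a scan path. Every file under the scan path whose content hash matches a file under the source path is erased.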
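
The idea the script relies on is that files with identical content produce identical hashes, so grouping files by hash reveals duplicates. The following is a minimal sketch of that idea only, not part of the script itself. The demo folder and the file names a.txt, b.txt and c.txt are made up for illustration.

```powershell
# Hypothetical demo folder in the temp directory (an assumption, not from the original script).
$DemoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("dupdemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $DemoDir | Out-Null

# Two files with identical content and one file with different content.
Set-Content -Path (Join-Path $DemoDir 'a.txt') -Value 'same content'
Set-Content -Path (Join-Path $DemoDir 'b.txt') -Value 'same content'
Set-Content -Path (Join-Path $DemoDir 'c.txt') -Value 'other content'

# Hash every file and group by hash; a group with more than one member is a set of duplicates.
$Groups = Get-ChildItem -Path $DemoDir -File | Get-FileHash | Group-Object -Property Hash
$DuplicateGroups = @($Groups | Where-Object { $_.Count -gt 1 })

foreach ($Group in $DuplicateGroups) {
    Write-Host "Hash $($Group.Name) is shared by:"
    $Group.Group | ForEach-Object { Write-Host "  $($_.Path)" }
}

# Clean up the demo folder.
Remove-Item -LiteralPath $DemoDir -Recurse -Force
```

Here a.txt and b.txt end up in the same group, while c.txt sits alone in its own group.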

The Script

Here is the code for the PowerShell script file delete_duplicate.ps1:

Clear-Host
$SourcePath = Read-Host 'Enter Source file path for building files list (e.g. C:\Temp, C:\)'

$ScanPath = Read-Host 'Enter file path to scan and erase duplicate files (e.g. C:\Temp, C:\)'
Write-Host "Source Path : " $SourcePath -ForegroundColor Yellow
Write-Host "Scan Path   : " $ScanPath -ForegroundColor Yellow
Write-Host


If ((Test-Path $SourcePath) -And (Test-Path $ScanPath)) {

 $StartTime = $(get-date)
 Write-Host "Start time : " $StartTime
 Write-Host "Building Source Files List ... Please wait ..." -ForegroundColor Green
 $SourceFiles = Get-ChildItem $SourcePath -File -Recurse -ErrorAction SilentlyContinue | Get-FileHash | Group-Object -Property Hash
 Write-Host $SourceFiles.Count " Source file(s)" -ForegroundColor Green
 Write-Host "Task time : " ($(Get-Date) - $StartTime)
 Write-Host

 $ScanTime = $(get-date)
 Write-Host "Start time : " $ScanTime
 Write-Host "Building Scan Files List ... Please wait ..." -ForegroundColor Green
 $ScanFiles = Get-ChildItem $ScanPath -File -Recurse -ErrorAction SilentlyContinue | Get-FileHash | Group-Object -Property Hash
 Write-Host $ScanFiles.Count " Scan file(s)" -ForegroundColor Green
 Write-Host "Task time    : " ($(Get-Date) - $ScanTime)
 Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
 Write-Host

 $DuplicateTime = $(get-date)
 Write-Host "Start time : " $DuplicateTime
 Write-Host "Searching for Duplicate files ... Please wait ..." -ForegroundColor Green
 $DuplicateFiles = foreach ($File in $ScanFiles) {
    if ($SourceFiles.Name.Contains($File.Name)){
        $File.Group | Select-Object -Property Path,Hash
    }
 }
 Write-Host $DuplicateFiles.Count " duplicate file(s) to erase"  -ForegroundColor Red
 Write-Host "Task time    : " ($(Get-Date) - $DuplicateTime)
 Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
 Write-Host

 $DeleteTime = $(get-date)
 Write-Host "Start time : " $DeleteTime
 if ($DuplicateFiles) {
   $Warning = $DuplicateFiles.Count.ToString() + ' Duplicated file(s) found'
   $WarningPreference = "Continue"
   Write-Warning $Warning

   $i = 1
   foreach ($Duplicate in $DuplicateFiles) { 
     $filepath = $Duplicate.Path     
     # Write-Host "Erasing file (" $i "/"  $DuplicateFiles.Count ") File: " $filepath
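     Try { 
       # -ErrorAction Stop turns a failed delete into a terminating error so the catch block runs
       Remove-Item -LiteralPath $filepath -Force -ErrorAction Stop
     } catch {
       Write-Warning "Could not erase $filepath : $($_.Exception.Message)"
     }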
     $i = $i + 1
   }
   Write-Host "Task time    : " ($(Get-Date) - $DeleteTime)
   Write-Host "Elapsed time : " ($(Get-Date) - $StartTime)
   Write-Host
 } else { Write-Output 'No duplicated file(s) found' }

} else {Write-Output 'File path does not exist'}
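
The script erases files as soon as it finds them, so it can be worth previewing the deletions first. The following is a minimal sketch of one way to do that with PowerShell's built-in -WhatIf support. The function name Remove-ListedFile is hypothetical and not part of the original script. It assumes the input objects have a Path property, like the entries in $DuplicateFiles.

```powershell
# Hypothetical helper (not in the original script): erases the listed files,
# or only reports what it would erase when called with -WhatIf.
function Remove-ListedFile {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$DuplicateFiles
    )

    foreach ($Duplicate in $DuplicateFiles) {
        # ShouldProcess returns $false under -WhatIf, so nothing is deleted in a dry run.
        if ($PSCmdlet.ShouldProcess($Duplicate.Path, 'Remove duplicate file')) {
            Remove-Item -LiteralPath $Duplicate.Path -Force
        }
    }
}
```

Calling Remove-ListedFile -DuplicateFiles $DuplicateFiles -WhatIf lists the files that would be erased without touching them. Dropping -WhatIf performs the deletion.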

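DO NOT USE THE SAME PATH for Source and Scan: this edge case is not handled by the script, and it will result in all your files being deleted. Because both paths are scanned recursively, the same danger applies when one path is nested inside the other.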
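
One way to guard against this is to compare the two paths before anything is hashed or erased. The following is a minimal sketch, not part of the original script. The function name Test-PathOverlap is hypothetical. It assumes both paths already exist (the script checks this with Test-Path) and compares them case-insensitively, as on a typical Windows file system. It does not detect overlap created through symbolic links or junctions.

```powershell
# Hypothetical guard (not in the original script): returns $true when the two
# paths are identical or when one is nested inside the other.
function Test-PathOverlap {
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][string]$ScanPath
    )

    $Separator = [System.IO.Path]::DirectorySeparatorChar

    # Resolve to full paths and end each with exactly one separator,
    # so that C:\Temp is not mistaken for a parent of C:\Temp2.
    $Source = (Resolve-Path -LiteralPath $SourcePath).ProviderPath.TrimEnd('\', '/') + $Separator
    $Scan   = (Resolve-Path -LiteralPath $ScanPath).ProviderPath.TrimEnd('\', '/') + $Separator

    # Assumption: case-insensitive comparison, which errs on the side of caution.
    $Comparison = [System.StringComparison]::OrdinalIgnoreCase
    return ($Source.StartsWith($Scan, $Comparison) -or $Scan.StartsWith($Source, $Comparison))
}
```

The script could call Test-PathOverlap -SourcePath $SourcePath -ScanPath $ScanPath right after the Test-Path check and stop with a warning when it returns $true.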
