Everyone is guilty of it: maybe not of duplicating yourself, but for sure of duplicating files on your computer

# 🗄️ PSDupes Crescendo Module

PowerShell, Modules, Crescendo
Published 2022-10-15

Get the PSdupes Module using this link
https://www.powershellgallery.com/packages/PSDupes/0.0.1

# Another Weekend Another Module 📁

So I have had another amazing week at work, working with a super-team doing more PowerShell goodness. Honestly, I know I have been coding for a good number of years, but to have a job where you have the time in the working day to code a solution just makes every day a dream day.

In the past I had to put these ideas into practice in my own time, then still work a full working day the next day. It is just nice to be employed to do a job and to have the time and the team to support you in doing that in the working day.

I have had so many instances in previous positions where they did not want you to script a solution. That never made sense to me, as scripting eliminates human error and, most importantly, saves a massive amount of time. I personally felt that those people possibly thought I was trying to replace them with a script. That is not the case. I just want to work smarter, not harder, and to me that is what computers are all about. Let the computer do the hard work. Why should I have to point and click things manually in 2022?

Now I am working somewhere where scripting and automation are embraced, encouraged and implemented. It makes every day a joy to be getting up in the morning, and to be blessed with all these things as well as a great team makes me a very happy automation platform engineer.

Just because a file has a different name, it could still be a duplicate. Just like these clones: one has a beard and the other does not, yet they are still the same person

# Why build this as a module? 👷

So after having a lovely day with my family visiting family, I put on my laptop and thought about my disk space. I have had this laptop a very long time, and I know I have duplicate files on it.

My plan was to have a butchers at finding all the duplicate files I might have. I know PowerShell has Get-FileHash, and that there are various existing scripts, functions and even modules out there that already find duplicate files, as well as many graphical user interface programs that find and report on duplicate files.

I turned to duckduckgo.com to search for a script to do this for me, and the first link I came across looked like it ticked all the boxes. I had a skim through it and thought, yep, that will do nicely.

However, when I ran this script it just gave me one line telling me to wait. So I waited and waited. I even waited some more, then I just got bored, did a ctrl+c and thought there must be a quicker way, or at least one with a decent progress bar. I did not want a point and click program, I just wanted to be able to type one line of code and get some results back.
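
For comparison, a do-it-yourself search built on Get-FileHash can be sketched roughly as below. This is not the script from that search, just a minimal illustration under stated assumptions: it creates a throw-away folder of sample files (the folder and file names are made up), groups the files by size, and only hashes files that share a size with at least one other file.

```powershell
# Build a small throw-away folder so the sketch can run anywhere (hypothetical sample data).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("dupes-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
Set-Content -Path (Join-Path $root 'a.txt')      -Value 'same content'
Set-Content -Path (Join-Path $root 'a-copy.txt') -Value 'same content'
Set-Content -Path (Join-Path $root 'b.txt')      -Value 'different content'

# Files can only be duplicates if their sizes match, so group by Length first
# and hash only the files that share a size with at least one other file.
$duplicateGroups = Get-ChildItem -Path $root -File -Recurse |
    Group-Object -Property Length |
    Where-Object Count -gt 1 |
    ForEach-Object { $_.Group | Get-FileHash -Algorithm SHA256 } |
    Group-Object -Property Hash |
    Where-Object Count -gt 1

# Each remaining group is a set of files with identical content.
foreach ($group in $duplicateGroups) {
    "Duplicate set ($($group.Count) files):"
    $group.Group.Path
}

# Clean up the sample folder.
Remove-Item -Path $root -Recurse -Force
```

Every candidate file still has to be read in full to be hashed, which may be part of why a pure script approach can feel slow on a disk with years of files on it.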

Before doing something coding related, I always think: what is the point of me spending loads of my own time cooking something up if I know it has been done many, many times before? So I headed over to the official scoop site (https://scoop.sh/) to have a butchers at what was available. That is where I found this command-line solution.

A powerful duplicate file finder and an enhanced fork of fdupes
https://github.com/jbruchon/jdupes

This was great, as it was super fast and gave me progress on the screen, so I at least knew it was doing something and roughly how long it would take to complete the task. Not only that, but it had a load of parameter options to use with the executable. Most importantly, I saw it had JSON output, which meant I also had the ability to turn these results back into objects for PowerShell to display.

Think of a plan to make this happen

# I knew this worked well 🎶

After having a look at the help file for this program and running it a good few times, it just made sense to me to smash this into a PowerShell Crescendo module to give to the masses, and share this very efficient tool for locating and detailing duplicate files.

The only beef I have with using tools that I have never used before, like jdupes.exe, is that I normally have to read the help file at least once to grasp how to use the application. Then, if I do not use something on a frequent basis, I have to go back to the help file again, mainly because I cannot remember that -m stands for summarize the information, or that -M means summarize the information but also show the file matches. In my head it just makes more sense to have -m as -Summarize and -M as -SummarizeMatchTypes. I know these are more of a mouthful, but as they are switches it now makes total sense what each switch does, as it is practically named.
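
The renaming idea can be pictured as a simple lookup from a friendly switch name to the short jdupes flag. The sketch below is only an illustration of that idea, not the code Crescendo generates: the function name ConvertTo-JdupesArgument and the -SwitchName parameter are hypothetical, while the name and flag pairs are the ones used in this module's configuration.

```powershell
# Friendly switch name -> jdupes short flag (pairs taken from the module configuration).
$FlagMap = [ordered]@{
    Summarize            = '-m'
    SummarizeMatchTypes  = '-M'
    JSONoutput           = '-j'
    NoPrompt             = '-N'
    Delete               = '-d'
    Recurse              = '-r'
    EnableResultsOnAbort = '-Z'
    ShowSize             = '-S'
}

# Hypothetical helper: turns a path plus friendly switch names into a native argument list.
function ConvertTo-JdupesArgument {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$SwitchName = @()
    )
    $nativeArgs = [System.Collections.Generic.List[string]]::new()
    $nativeArgs.Add($Path)                      # the path goes first in this sketch
    foreach ($name in $SwitchName) {
        if (-not $FlagMap.Contains($name)) { throw "Unknown switch: $name" }
        $nativeArgs.Add($FlagMap[$name])        # swap the friendly name for the short flag
    }
    $nativeArgs.ToArray()
}

$jdupesArgs = ConvertTo-JdupesArgument -Path 'C:\TRAINING' -SwitchName Recurse, Summarize
$jdupesArgs -join ' '
```

The point is simply that the person typing the command only ever sees the descriptive names, and the short flags stay hidden in one place.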

# Slight Issue 🔥

I was so excited when I saw that this application had JSON output that I even let out a loud YES, and my daughter was like, "Dad, what are you happy about?" When I explained, it did not look like I had brought that same excitement to my 7 year old daughter.

However, this excitement was short-lived. Why? Well, because this particular application does not support JSON output with every single command. The way the Crescendo JSON file was configured, it would try to convert the output from JSON no matter which parameters were passed. This meant that the summary information was not displaying correctly, and I was also getting the dreaded red error text on my screen, even though I was using -ErrorAction SilentlyContinue.

No problem, I thought, I can just edit the PSM1 file that gets built from the JSON file. I then changed my mind and decided to just remove the JSON conversion from the configuration file and re-export the module from it, and every command then worked correctly. This still gave me the opportunity to turn the JSON output into proper PowerShell objects to inspect, or to format differently for reporting.
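
The underlying problem is that text which is not JSON cannot go through ConvertFrom-Json. A minimal sketch of a more forgiving conversion is below. It is not part of the module, and the function name ConvertFrom-JdupesOutput is hypothetical: it tries the conversion and, if the text turns out not to be JSON, hands the original lines back untouched.

```powershell
# Hypothetical helper: convert output to objects when it is JSON, otherwise return it as-is.
function ConvertFrom-JdupesOutput {
    param(
        [Parameter(Mandatory)][string[]]$Line
    )
    # Native commands return one string per line, so join them before parsing.
    $text = $Line -join "`n"
    try {
        $text | ConvertFrom-Json -ErrorAction Stop
    }
    catch {
        # Not JSON (summary text, for example): pass the original lines through unchanged.
        $Line
    }
}

# JSON text becomes an object with properties...
$asObject = ConvertFrom-JdupesOutput -Line '{', '"matchSets": []', '}'
# ...while plain text comes back exactly as it went in.
$asText = ConvertFrom-JdupesOutput -Line 'this line is not json'
```

Leaving the conversion to the caller, as the finished module does, keeps things simpler still: the output only goes to ConvertFrom-Json when -JSONoutput was actually used.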

# Build Steps 🥼

Time to get making a Crescendo Module to find those pesky duplicates

I included this in the module as a build.ps1 file, which contains the following PowerShell code:

Import-Module Microsoft.Powershell.Crescendo
$NewConfiguration = @{
    '$schema' = 'https://aka.ms/PowerShell/Crescendo/Schemas/2021-11'
    Commands  = @()
}
$parameters = @{
    Verb         = 'Invoke'
    Noun         = 'PSdupes'
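    # Single quotes keep $PSScriptRoot as literal text, matching the JSON below (folder name is jdupes)
    OriginalName = '$PSScriptRoot\jdupes\jdupes.exe'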
}
$NewConfiguration.Commands += New-CrescendoCommand @parameters
$NewConfiguration | ConvertTo-Json -Depth 3 | Out-File .\PSdupes.json

Running this gave me a JSON file, which I then edited to look like this:

{
  "$schema": "https://aka.ms/PowerShell/Crescendo/Schemas/2021-11",
  "Commands": [
    {
      "Verb": "Invoke",
      "Noun": "PSdupes",
      "OriginalName": "$PSScriptRoot\\jdupes\\jdupes.exe",
      "Platform": ["Windows"],
      "Description": "Finds duplicate files",
      "Parameters": [
        {
          "ParameterSetName": ["All"],
          "ParameterType": "string",
          "OriginalName": "",
          "OriginalPosition": 0,
          "Name": "Path"
        },
        {
          "ParameterSetName": ["All", "Summary"],
          "ParameterType": "switch",
          "OriginalName": "-m",
          "OriginalPosition": 1,
          "Name": "Summarize"
        },
        {
          "ParameterSetName": ["All", "SummaryMatch"],
          "ParameterType": "switch",
          "OriginalName": "-M",
          "OriginalPosition": 2,
          "Name": "SummarizeMatchTypes"
        },
        {
          "ParameterSetName": ["All", "JSON"],
          "ParameterType": "switch",
          "OriginalName": "-j",
          "OriginalPosition": 3,
          "Name": "JSONoutput"
        },
        {
          "ParameterSetName": ["All", "Delete"],
          "ParameterType": "switch",
          "OriginalName": "-N",
          "OriginalPosition": 4,
          "Name": "NoPrompt"
        },
        {
          "ParameterSetName": ["All", "Delete", "JSON"],
          "ParameterType": "switch",
          "OriginalName": "-d",
          "OriginalPosition": 5,
          "Name": "Delete"
        },
        {
          "ParameterSetName": ["All", "Summary", "SummaryMatch", "Delete", "JSON"],
          "ParameterType": "switch",
          "OriginalName": "-r",
          "OriginalPosition": 6,
          "Name": "Recurse"
        },
        {
          "ParameterSetName": ["All", "Summary", "SummaryMatch", "Delete", "JSON"],
          "ParameterType": "switch",
          "OriginalName": "-Z",
          "OriginalPosition": 7,
          "Name": "EnableResultsOnAbort"
        },
        {
          "ParameterSetName": ["All", "JSON"],
          "ParameterType": "switch",
          "OriginalName": "-S",
          "OriginalPosition": 8,
          "Name": "ShowSize"
        }
      ]
    }
  ]
}

All that was needed to create the actual module was for me to type this one line of code:

Export-CrescendoModule -ConfigurationFile .\PSdupes.json -ModuleName PSdupes -Force

# A few examples

Invoke-PSdupes -Path "C:\TRAINING\" -Recurse -Summarize

This will recursively search the directory given in the Path parameter, as well as all of its child sub-directories. It displays how many files have been searched, how many duplicates have been found and the amount of space these duplicate files are using.

Invoke-PSdupes -Path "C:\TRAINING\" -Recurse -SummarizeMatchTypes

This will do the same as the above command as well as list all the duplicate files found with the complete path.

Invoke-PSdupes -Path "C:\TRAINING\" -Recurse -NoPrompt -Delete

This will automatically delete all the duplicate files found but keep the first item in each list, leaving you with only one copy of the file. It will not prompt you before deleting the files, as the -NoPrompt parameter has been used.

$Duplicates = Invoke-PSdupes -Path "C:\TRAINING\" -Recurse -ShowSize -JSONoutput | ConvertFrom-Json -ErrorAction SilentlyContinue | Add-Member -TypeName Dupes -passthru

This takes the output of the JSON switch and converts it from JSON back into a PowerShell object. As I stored this in a variable, I can now produce some results from it, as it is holding PowerShell objects. Just pipe this variable to Get-Member to see the cool things you can now report on. $Duplicates.matchSets is good for starters.
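
As a rough idea of the sort of report this makes possible, here is a minimal sketch that works out, per set of duplicates, which copy would be kept and how much space the extra copies take up. It runs against made-up sample data rather than real jdupes output, and the fileSize, fileList and filePath property names are assumptions about the JSON layout, so check them with Get-Member against your own results first.

```powershell
# Made-up sample shaped like the converted JSON output (property names are assumptions).
$sampleJson = @'
{
  "matchSets": [
    { "fileSize": 2048,
      "fileList": [ { "filePath": "C:\\TRAINING\\a.txt" },
                    { "filePath": "C:\\TRAINING\\copy\\a.txt" } ] },
    { "fileSize": 100,
      "fileList": [ { "filePath": "C:\\TRAINING\\b.log" },
                    { "filePath": "C:\\TRAINING\\b2.log" },
                    { "filePath": "C:\\TRAINING\\b3.log" } ] }
  ]
}
'@
$Duplicates = $sampleJson | ConvertFrom-Json

# One report row per set: the first file is treated as the one to keep.
$report = foreach ($set in $Duplicates.matchSets) {
    $paths = @($set.fileList.filePath)
    [pscustomobject]@{
        Keep        = $paths[0]
        Extra       = @($paths | Select-Object -Skip 1)
        Copies      = $paths.Count
        WastedBytes = [long]$set.fileSize * ($paths.Count - 1)
    }
}

# Biggest space hogs first, then a grand total of reclaimable bytes.
$report | Sort-Object WastedBytes -Descending | Format-Table Keep, Copies, WastedBytes
$totalWasted = ($report | Measure-Object -Property WastedBytes -Sum).Sum
"Reclaimable bytes: $totalWasted"
```

Looking at the Extra column before running anything with -Delete is a cheap way to see which files would go.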

Although I did not include every single parameter from the original application, I believe I included enough to complete the task of finding duplicate files on any given Windows system. This allowed me to clean up my laptop of duplicate files in a timely manner, with a progress indication that let me know how far into the process it was.

# I hope this helps you 🌱

House-keeping is always a good task to perform on any system that was built a good number of years ago, as it will most likely contain a number of duplicate files. Unless it is food, I do not believe you need duplicates on the same disk drive. Next time you decide a machine needs a good house-keeping check for duplicate files, please give this module a go, as it really does work well and should be intuitive enough to use without needing to study the help behind it.

Till next time take care and read this CLONE comic from Image Comics