# ReadPDF PowerShell Module

PowerShell, Modules

Published 2022-07-16

ReadPDF Download: https://www.powershellgallery.com/packages/ReadPDF/1.0.3

Happens to me every time I use ReadPDF

# PDF used to just be Adobe Reader 🤖

Nowadays there are so many different PDF readers and writers; even Microsoft Word now lets you save as PDF. Use of the Portable Document Format has really exploded. However, have you ever tried to do something like this:

```powershell
Get-Content C:\Path2PDF\my.pdf
```

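Then what comes back is a screen full of unreadable characters, because Get-Content reads the raw contents of the PDF file rather than the text the document displays.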

Believe me, my CV does not look like what PowerShell just output to the screen. I even had a problem at a previous company where 4000 invoices needed to go out, but the system that normally sent them was broken. The issue had to be sorted that day, or no invoices were going out, which could have led to the collapse of the business. No problem, right?

I managed to get the one huge PDF and split it by bookmark using a third-party command-line tool. I then had to load the iTextSharp .DLL into PowerShell, which let me run a regular expression against the text of each PDF to find the email address and email the invoice to the right person. Thankfully this all worked, and the method was used for several months until the third-party software vendors resolved the underlying issue.
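The regular expression step in that workaround can be sketched roughly as follows. This is not the original script: the sample text, the variable names and the email pattern are all assumptions made up for illustration, and the pattern is deliberately simple rather than a full address validator.

```powershell
# Made-up sample text standing in for what was extracted from one split invoice PDF.
$invoiceText = 'Invoice 1001 Customer Jane Doe Email jane.doe@example.com Total 120.00'

# A deliberately simple email pattern (an assumption, not the pattern used at the time).
$emailPattern = '[\w.+-]+@[\w-]+(\.[\w-]+)+'

# Find the first email address in the extracted text.
$match = [regex]::Match($invoiceText, $emailPattern)

if ($match.Success) {
    $recipient = $match.Value
    "Send this invoice to $recipient"
}
```

With the address in hand, each split PDF could then be attached to a message for that recipient.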

# PDF Pig 🐷

Yes, you did read that title correctly. I went to nuget.org and found PdfPig, the library behind the UglyToad.PdfPig namespace used in the code below.

With a name like that, I thought it would make an ideal first PowerShell binary module. This is the .CS file I wrote to turn it into a PowerShell module.

```csharp
using System;
using System.Management.Automation;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

namespace ReadPDF
{
    // Exposed to PowerShell as the Import-PDFFile cmdlet.
    [Cmdlet(VerbsData.Import, "PDFFile")]
    public class Program : PSCmdlet
    {
        // Path of the PDF file to read.
        [Parameter(Mandatory = true)]
        public string Path { get; set; }

        protected override void EndProcessing()
        {
            // Open the document and dispose of it when finished.
            using (PdfDocument document = PdfDocument.Open(Path))
            {
                foreach (Page page in document.GetPages())
                {
                    // Write each word to the pipeline as a separate string.
                    foreach (Word word in page.GetWords())
                    {
                        WriteObject(word.Text);
                    }
                }
            }
        }
    }
}
```

# PowerShell ReadPDF 🛠️

Building the project only gives you the .DLL file; you need to make the .PSD1 and .PSM1 files yourself. Anyway, I wanted to test the module I had built. It does one thing: it reads a PDF file, word by word, and outputs each word to the screen. On its own that's not great, but with a bit of PowerShell I could now read my .PDF CV directly in the PowerShell console.

```powershell
(Import-PDFFile -Path C:\YourDocPath\your.pdf) -join " "
```

Boom, I was now able to get all the text from my CV file.
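As for the manifest that has to be made by hand, one way to generate a .PSD1 is with New-ModuleManifest. The sketch below is an assumption about how that could look, not the manifest shipped with the module: the temporary folder, the description text and pointing RootModule straight at the DLL (instead of going through a .PSM1) are illustrative choices, and the version number is simply taken from the gallery link.

```powershell
# Hypothetical folder; in practice this would be the module folder that holds ReadPDF.dll.
$moduleDir = Join-Path ([System.IO.Path]::GetTempPath()) 'ReadPDF'
New-Item -ItemType Directory -Path $moduleDir -Force | Out-Null

$manifestPath = Join-Path $moduleDir 'ReadPDF.psd1'

# Settings for the manifest; the description is illustrative wording only.
$manifestSettings = @{
    Path            = $manifestPath
    RootModule      = 'ReadPDF.dll'
    ModuleVersion   = '1.0.3'
    CmdletsToExport = 'Import-PDFFile'
    Description     = 'Reads the text of a PDF file word by word.'
}
New-ModuleManifest @manifestSettings

# Read the manifest back to check what was written.
$manifest = Import-PowerShellDataFile -Path $manifestPath
$manifest.RootModule
$manifest.CmdletsToExport
```

Listing the cmdlet explicitly in CmdletsToExport, rather than using a wildcard, lets PowerShell discover Import-PDFFile without loading the DLL first.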

But what if you just want to see if the document contains a particular word? No problem.
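A minimal sketch of that check, assuming the words returned by Import-PDFFile are held in an array. The sample array is made up so the snippet runs without a PDF, and it is not necessarily the command originally used here.

```powershell
# Made-up sample words standing in for the output of:
#   $words = Import-PDFFile -Path C:\YourDocPath\your.pdf
$words = 'Experienced', 'PowerShell', 'developer', 'who', 'automates', 'with', 'PowerShell', 'daily'

# -contains compares whole elements and ignores case by default.
$hasWord = $words -contains 'powershell'
$hasWord

# Count how many times the word occurs.
$occurrences = @($words -eq 'powershell').Count
$occurrences
```

If the extracted words carry punctuation, such as a trailing comma, an exact comparison will miss them; `-like 'PowerShell*'` or `-match` may be more forgiving in that case.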

Okay, but I need to know in what way this word was used. No problem.
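One way to see how a word was used is to print a few words either side of each occurrence. This is a sketch under assumptions: the sample array stands in for the Import-PDFFile output, and the window of three words is an arbitrary choice.

```powershell
# Made-up sample words standing in for the output of Import-PDFFile.
$words = 'I', 'have', 'used', 'PowerShell', 'daily', 'for', 'ten', 'years',
         'and', 'teach', 'PowerShell', 'to', 'colleagues'

$target = 'PowerShell'
$window = 3   # words of context on either side (arbitrary)

$contexts = for ($i = 0; $i -lt $words.Count; $i++) {
    if ($words[$i] -eq $target) {
        # Clamp the window so it never runs past either end of the array.
        $start = [Math]::Max(0, $i - $window)
        $end   = [Math]::Min($words.Count - 1, $i + $window)
        $words[$start..$end] -join ' '
    }
}

$contexts
```

Each string in `$contexts` is one occurrence of the word with its neighbours, which is usually enough to tell how it was used.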

I think this was a perfect introduction for me to get into writing binary modules. I hope this post has enlightened you on this module 😃