Python’s PyPDF2 library fills PDF templates with Excel data. It’s lightning fast too.
Data Entry. It is slow. It is boring. It is prone to human error. It is rule based and repetitive. It is ripe for automation.
Filling out a PDF template’s fields with the data found in an Excel spreadsheet is a common task, and one that takes a human an age to complete. In fewer than 100 lines of Python code, you can automate this process. Once set up, this thing can churn out 2971 copies a minute¹. And that’s without any coffee breaks.
You will need Adobe Reader installed to create a PDF template with labelled fields, but it is not required if you have an existing template or are just looking to run this example; the full repo on my GitHub has a template ready to play with! You just have to install Python 3 and two libraries: pandas and PyPDF2. You can install these by typing the commands below into your command prompt:
pip install pandas
pip install PyPDF2
This will allow you to import these modules into a Python interpreter with the lines below:
import pandas as pd
import PyPDF2
In this example we have the quintessentially boring task of filling out an EIS3 Enterprise Tax Relief form with data found in a CSV file. In the downloadable example, you will see the script itself and two folders, “In” and “Out”.
- Root
  - In: contains the PDF template and the .csv file with the data to fill it with
  - Out: the folder where the filled-out templates are written
  - pdf_processor.py: the main script to execute
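The layout above can be checked from Python before any PDFs are generated. The sketch below is one way to do it with the standard library's pathlib; the function name `prepare_folders` is an assumption for illustration and is not taken from pdf_processor.py.

```python
from pathlib import Path


def prepare_folders(root):
    """Locate the inputs under root/In and make sure root/Out exists.

    Hypothetical helper for illustration; not part of pdf_processor.py.
    """
    root = Path(root)
    in_dir = root / "In"
    out_dir = root / "Out"
    if not in_dir.is_dir():
        raise FileNotFoundError(f"expected an input folder at {in_dir}")
    out_dir.mkdir(exist_ok=True)  # create Out on first run, keep it otherwise
    templates = sorted(in_dir.glob("*.pdf"))   # PDF template(s) to fill
    data_files = sorted(in_dir.glob("*.csv"))  # data file(s) to fill them with
    return templates, data_files, out_dir
```

Calling `prepare_folders(".")` from the Root folder returns the PDF templates and CSV files found in In, plus the path of Out.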
Running pdf_processor.py will immediately show you how quickly this works. The gif below is me running this very example.
The pertinent parts of the script are below. The full, more verbose script is what you will need to adjust to your requirements (but it’s not any more complex than what is seen in the snippet below).
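To see what rate your own machine manages, you can time a batch and scale it to a minute. This is a minimal sketch using only the standard library; `make_copy` is a hypothetical stand-in for whatever function writes one filled PDF, and both helper names are assumptions.

```python
import time


def copies_per_minute(make_copy, copies=100):
    """Time `copies` calls to make_copy and scale the rate to one minute."""
    start = time.perf_counter()
    for index in range(copies):
        make_copy(index)  # hypothetical callable that writes one filled PDF
    elapsed = time.perf_counter() - start
    return copies / elapsed * 60


def per_minute(seconds_per_copy):
    """Convert a per-copy time into copies per minute."""
    return 60 / seconds_per_copy
```

For instance, `per_minute(0.02)` gives about 3000, so a rate of 2971 a minute corresponds to roughly 0.0202 seconds per copy.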
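As a rough sketch of the data-loading step, assuming a CSV with one row per form to fill: the helper name `load_rows` and the column names `investor_name` and `shares` are made up for illustration and are not taken from the example repo.

```python
import io

import pandas as pd


def load_rows(csv_source):
    """Read the CSV and return one dict per row, with every value kept as text."""
    # dtype=str keeps values such as 007 exactly as typed;
    # fillna("") turns missing cells into empty strings.
    frame = pd.read_csv(csv_source, dtype=str).fillna("")
    return frame.to_dict(orient="records")


# Hypothetical sample data standing in for the .csv file in the In folder.
sample = io.StringIO("investor_name,shares\nAda Lovelace,120\nAlan Turing,75\n")
rows = load_rows(sample)
print(rows[0])
```

Each dict in `rows` holds the values for one copy of the form, keyed by CSV column name. `load_rows` accepts a file path just as well as the in-memory sample used here.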
You will need a separate dictionary object for each page you wish to fill in — in our example only page 1 requires filling so we only have the one dictionary. Otherwise this is all you need to do to get it running — it really is as simple as this!
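The one-dictionary-per-page idea can be sketched as follows. This is an illustration under stated assumptions, not the script from the repo: it uses the PyPDF2 3.x names (`PdfReader`, `PdfWriter`, `add_page`, `update_page_form_field_values`; older releases spell these `PdfFileReader`, `PdfFileWriter`, `addPage` and `updatePageFormFieldValues`), and the helper names, column names and field names are all hypothetical.

```python
def build_page_fields(row, column_to_field):
    """Map one data row (column -> value) to {pdf_field_name: text} for a page."""
    return {field: str(row[column]) for column, field in column_to_field.items()}


def fill_template(template_path, out_path, fields_by_page):
    """Write a copy of the template with each page's form fields filled in.

    fields_by_page maps a zero-based page index to that page's dictionary.
    """
    # Imported inside the function so the pure-Python helper above can be
    # tried without PyPDF2 installed. These are the PyPDF2 3.x names.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(str(template_path))
    writer = PdfWriter()
    for index, page in enumerate(reader.pages):
        writer.add_page(page)
        fields = fields_by_page.get(index)
        if fields:  # pages without a dictionary are copied through unchanged
            writer.update_page_form_field_values(writer.pages[index], fields)
    with open(out_path, "wb") as handle:
        writer.write(handle)


# Hypothetical CSV columns mapped to hypothetical PDF field labels.
column_to_field = {"investor_name": "Name", "shares": "Shares"}
page_one = build_page_fields({"investor_name": "Ada Lovelace", "shares": 120},
                             column_to_field)
```

A call might then look like `fill_template("In/template.pdf", "Out/copy_1.pdf", {0: page_one})`, where key 0 is the first page and the file names are placeholders. A template with fields on its second page would add another dictionary under key 1.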
The documentation for PyPDF2 is found here, but for this job you shouldn’t need much more detail! You will need to consult it if you are trying to do anything a little more sophisticated. https://pythonhosted.org/PyPDF2/
If you are trying to get this to work for your own use case and run into any issues — get in touch! You can find my GitHub profile below. When I’m not surfing or building robots, I’m most probably at my computer tinkering with new ways to automate the really boring stuff so please don’t hesitate to reach out! https://github.com/Hridai
[1] My very average Huawei Matebook D 14 laptop churns out a PDF copy every ~0.02 seconds. Only 2971 a minute.