The Python MailMerge

It's me. Henry.
Herota Tech
Python logo with an envelope

Python is such a powerful tool with many different applications.

One of the best things it does is data analysis and manipulation via the Pandas library.

The last non-profit I worked for used QuickBooks Desktop (QBD) for their daily bookkeeping and reporting but wanted to migrate to the QuickBooks Online (QBO) version to maintain support from Intuit. In general, QBO can do most of the tasks that the desktop version can do, but one of the places it falls short is in its custom receipt creation. What QBO offers is a handful of templates that you can put your company logo on, but very little by way of customization of the message you receive or other fields that need to be included on non-profit receipts because of tax deductions and designated funds. I realized that we could download a report from QBO and put it into a Microsoft Word MailMerge to give us that customization ability.

The problem was that the report data was formatted incorrectly for a MailMerge.

The below image shows an example:

Sample data showing QBO receipt data
Sample data showing QBO receipt data

While it may not be perfectly clear from the sample data, QBO outputs a report with nested data, sometimes two or three levels deep. Additionally, if a customer (or donor in the case of a non-profit) is paying into multiple different accounts (or designated funds, again for non-profits) there is a need to merge multiple lines into a single receipt.

This is where Python shines. It can intake an Excel or .csv document, manipulate the data with Pandas, and then perform a MailMerge with a library called “docx-mailmerge” originally created by Bouke Haarsma and now maintained by a community of developers building on his initial efforts.

Version 1

The first version of this project was very utilitarian and borderline Spartan. I ran it from my IDE (Pycharm) by using a tkinter file selector gui to choose the report and MailMerge template. It created a single document with individual receipts on separate pages in the root directory of my project. I would then print and finally delete the document from my computer.

Part of the first version that I am pretty proud of, and that has survived the 2 overhauls is the use of the aggregate (.agg()) function from Pandas. It took me a good while to figure out how to get multiple rows of information into a single donor line (nested data, am I right?). Here was my solution:

aggregate function solution
df = (df.groupby(['Donor',
                  'Greeting',
                  'Date',
                  'Donor/Vendor Message',
                  'Billing address',
                  'Billing city',
                  'Billing state',
                  'Billing ZIP code'])
      .agg({'Amount': lambda x: '\n'.join(x.apply(lambda y: f"${y:,.2f}")),
            'Balance': lambda x: locale.currency(max(x), grouping=True),
            'Class': '\n'.join}))

In summary, this script worked well considering the short amount of development time I had, but it was very tedious and impractical. Firstly, if there were any mistakes on a receipt, I had to go through the rigamarole of opening the document, finding the right receipt, and then printing only the one with the mistake. If there were 2 receipts with mistakes and they were not consecutive, it became even more annoying. Additionally, deleting the documents didn’t make sense because then I would have to repeat the whole process.

Version 2

In the second version, I worked smarter rather than harder. The single document for multiple receipts was great, but I wanted to also create individual Word documents of each receipt. For example, if I had four receipts in my batch, the program would give me one document with all the receipts and four individual receipt documents for a total of five documents. Additionally, I utilized the Google Drive api to automatically save the documents into a shared /receipts folder in our company’s Google Drive account. I extended this functionality by adding a Google Sheets document that gave the name of each donor in the first column and the link to their receipt in the second column. Each time the script ran, a new sheet was added with the day’s date and the list of donors and links.

An example of a sheet in Google Sheets created by the Python MailMerge script
An example of a sheet in Google Sheets created by the Python MailMerge script

Version 3

The thing about QBO is that they are still developing it. I realize that this is the standard operating procedure with CI/CD right now, but as a user of a product it makes things very difficult to rely on. So the third version was created in response to some breaking errors as a result of Intuit rolling out a new version of QBO that output different column names and data types in the report I used to get the information from our data input.

For example, the ZIP column would be relabeled as Billing_Zip_Code, and would be output as a float data type rather than as a string. In conjunction with the emerging tool ChatGPT, I created this function to standardize column names:

standardize column names
def standardize_column_name(col_name):
  
    col_name = re.sub(r'[^a-z0-9]', '_', col_name.lower()) # Define keyword mappings
    keyword_mappings = {
        "message": "Message",
        "amount": "Amount",
        "class": "Class",
        "date": "Date",
        "street": "Street",
        "city": "City",
        "state": "State",
        "zip": "Zip",
        "greeting": "Greeting",
        "balance": "Balance",
        "email": "Email",
        "receipt": "Receipt",
        "donor": "Donor", }
    
    for keyword, standard_name in keyword_mappings.items():
        if keyword in col_name:
            return standard_name # Return the mapped standard name
    return col_name

And then finally, I handled data type changes with a single line for the needed columns when reading the Excel file:

type change
df['Date'] = df['Date'].astype(str) # Change datetime and float format to string

Because of the length of this post, I think I’m going to stop there. The functionality of this project was incredibly useful for our organization and I think that is why it’s one of my favorite projects to-date. The GitHub repository for this project is not public but if you are looking to build a similar project, please feel free to reach out using the contact page.

Cheers!