How to convert HTML to PDF with Puppeteer

Whether you're looking to generate invoices, create reports, or preserve web content for offline use, Puppeteer is a powerful tool that can automate the process seamlessly. In this guide, we will explore how to convert HTML to PDF using Puppeteer, an open-source Node.js library developed by Google.

Setting Up Puppeteer

To get started, you'll need to have Node.js installed on your machine. Open your terminal or command prompt and create a new directory for your project. Navigate into the project directory and initialize a new Node.js project by running the following command:

npm init -y

Next, install Puppeteer as a dependency by executing the following command:

npm install puppeteer

Puppeteer will now be added to your project, allowing you to programmatically control a headless Chrome or Chromium browser.

Writing the conversion script

Create a new JavaScript file, such as convert-html-to-pdf.js, in your project directory. Open the file in your preferred text editor and begin by importing the Puppeteer library:

const puppeteer = require('puppeteer');

Initializing Puppeteer and Converting HTML to PDF

Inside the convert-html-to-pdf.js file, add the following code to initialize Puppeteer and convert the HTML to PDF:

(async () => {
  // Declared outside the try block so that finally can always reach it
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Wait until the network is idle so styles, fonts, and images have loaded
    await page.goto('file:///path/to/your/html/file.html', { waitUntil: 'networkidle0' });

    // To use normal CSS instead of only print styles
    await page.emulateMediaType('screen');

    await page.pdf({ path: 'output.pdf', format: 'A4' });

    console.log('Conversion complete. PDF file generated successfully.');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Close the browser even if the conversion failed
    if (browser) {
      await browser.close();
    }
  }
})();

The example above converts an existing HTML file to PDF, but you may need to convert HTML code that is passed in as a string instead. In that case, use the page.setContent() method, as in the example below:

const html = '<html...';
await page.setContent(html, { waitUntil: 'networkidle0' });
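As a fuller illustration, the sketch below wraps page.setContent() in a helper that renders an HTML string to PDF data. The names buildGreetingHtml and htmlStringToPdf are hypothetical, and the helper assumes it is given a browser that was already started with puppeteer.launch().

```javascript
// Hypothetical helper: builds a small HTML document from a name.
// A real template would also escape user-supplied values.
function buildGreetingHtml(name) {
  return `<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"><title>Greeting</title></head>
  <body><h1>Hello, ${name}!</h1></body>
</html>`;
}

// Hypothetical helper: renders an HTML string to a PDF using an
// already-launched Puppeteer browser and resolves with the PDF data.
async function htmlStringToPdf(browser, html, pdfOptions = { format: 'A4' }) {
  const page = await browser.newPage();
  try {
    await page.setContent(html, { waitUntil: 'networkidle0' });
    // Without a `path` option, page.pdf() only returns the PDF data
    return await page.pdf(pdfOptions);
  } finally {
    // Close the page even if rendering fails, so pages do not pile up
    await page.close();
  }
}
```

With a real browser, a call could look like htmlStringToPdf(browser, buildGreetingHtml('World')), and the returned data can then be written to disk with fs.writeFileSync() or sent in an HTTP response.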

Converting URL to PDF

You can easily update the code to convert a URL instead of an HTML string or file by passing the URL to the page.goto() method:

await page.goto('https://example.com', { waitUntil: 'networkidle0' });

It is important to understand the waitUntil parameter and its values so that you can choose the right one. It controls when the navigation is considered finished:

  • load: when the load event is fired.

  • domcontentloaded: when the DOMContentLoaded event is fired.

  • networkidle0: when there are no more than 0 network connections for at least 500 ms.

  • networkidle2: when there are no more than 2 network connections for at least 500 ms.

For more details on the options that can be passed to the page.goto() method, check the Puppeteer documentation.
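Pages that keep connections open (for example, through polling or analytics requests) may never satisfy networkidle0. One possible way to handle this, sketched below, is to try the strict condition first and fall back to the load event when it times out. The helper name gotoWithFallback and the 15 second default are hypothetical, and the check on the error name assumes that Puppeteer reports timeouts as an error named TimeoutError.

```javascript
// Hypothetical helper: tries the strict 'networkidle0' condition first and
// falls back to the 'load' event if the page does not become idle in time.
// Resolves with the waitUntil value that ended up being used.
async function gotoWithFallback(page, url, idleTimeoutMs = 15000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout: idleTimeoutMs });
    return 'networkidle0';
  } catch (error) {
    // Assumption: Puppeteer timeouts surface as an error named 'TimeoutError'
    if (error.name !== 'TimeoutError') {
      throw error;
    }
    await page.goto(url, { waitUntil: 'load' });
    return 'load';
  }
}
```

The trade-off is that the fallback navigates a second time, so it suits occasional slow pages better than pages that are known in advance to stay busy; for those, choosing load or networkidle2 directly is simpler.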

Breaking down the code

We wrap the code in an async immediately invoked function expression (IIFE) so that we can use await inside the script. Inside the function, we launch a new instance of the Puppeteer-controlled browser.

We open a new page and navigate to the desired HTML file using the page.goto() method. Make sure to replace 'file:///path/to/your/html/file.html' with the actual path to your HTML file. By specifying { waitUntil: 'networkidle0' }, we ensure that the page is fully loaded before generating the PDF.
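Writing the file:// URL by hand is easy to get wrong, especially when the path contains spaces. A minimal sketch that builds it from a regular file path with Node's built-in modules is shown here; the helper name toFileUrl is hypothetical.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Hypothetical helper: turns a relative or absolute file path into a
// file:// URL, percent-encoding spaces and other special characters.
function toFileUrl(filePath) {
  return pathToFileURL(path.resolve(filePath)).href;
}
```

The result can be passed straight to page.goto(), for example page.goto(toFileUrl('file.html'), { waitUntil: 'networkidle0' }). Relative paths are resolved against the directory the script is run from.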

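By default, page.pdf() renders the page with print CSS media. Calling page.emulateMediaType('screen') before generating the PDF applies the regular screen styles instead, so the PDF looks the way the page does in the browser.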

Finally, we call page.pdf() to convert the HTML to PDF and specify the output path and format. Adjust the path and format properties as needed. After a successful conversion, we log a message and close the browser instance.
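Beyond path and format, page.pdf() accepts further options such as printBackground, landscape, and margin. The sketch below collects a set of defaults and merges per-call overrides into them. The helper name buildPdfOptions and the default values are assumptions chosen for illustration, not recommended settings.

```javascript
// Defaults for this sketch; every key is an option accepted by page.pdf()
const DEFAULT_PDF_OPTIONS = {
  format: 'A4',
  printBackground: true, // include CSS background colors and images
  margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
};

// Hypothetical helper: merges caller overrides into the defaults.
// Margins are merged key by key so one side can be changed on its own.
function buildPdfOptions(overrides = {}) {
  return {
    ...DEFAULT_PDF_OPTIONS,
    ...overrides,
    margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
  };
}
```

A call such as page.pdf(buildPdfOptions({ path: 'output.pdf', landscape: true })) would then produce a landscape A4 document with the default margins and backgrounds enabled.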

Running the conversion script

Save the convert-html-to-pdf.js file, navigate to your project directory in the terminal or command prompt, and run the following command:

node convert-html-to-pdf.js

Puppeteer will launch a headless browser, load the HTML file, convert it to PDF, and save the output as output.pdf in the project directory.

Using html2pdf.app API

Managing Puppeteer can be challenging, especially when you need to scale your application. PDF conversion is resource-intensive, so a large number of simultaneous requests can degrade server performance.
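If you do run Puppeteer yourself, one common mitigation is to cap how many conversions run at the same time. The following is a minimal sketch of such a cap in plain JavaScript. The name createLimiter is hypothetical, and a production setup would likely also need timeouts and a bound on the queue length.

```javascript
// Hypothetical helper: returns a function that runs async tasks while
// keeping at most `maxConcurrent` of them in flight at the same time.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  // Start the next queued task if a slot is free
  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) {
      return;
    }
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and let the next queued task start
        active -= 1;
        next();
      });
  };

  // Queue the task and settle with its eventual result or error
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}
```

A server could create one limiter, for example const limit = createLimiter(2), and wrap each conversion as limit(() => convertToPdf(request)), where convertToPdf stands for your own function containing the Puppeteer code. Extra requests then wait in the queue instead of all opening pages at once.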

Make your life easier by using an HTML to PDF API for the conversion.

The following code shows how simple it is to do with our API:

import axios from 'axios';
import fs from 'fs';

axios.post('https://api.html2pdf.app/v1/generate', {
  html: 'https://example.com',
  apiKey: '{your-api-key}',
}, {responseType: 'arraybuffer'}).then((response) => {
  fs.writeFileSync('./document.pdf', response.data);
}).catch((err) => {
  console.log(err.message);
});

Conclusion

By following the steps outlined in this guide, you can easily convert HTML files to PDF using Puppeteer. This versatile library opens up a world of possibilities for automating document generation, report creation, and more.

If you would rather not deal with edge cases and want fast results, try a ready-made HTML to PDF solution such as html2pdf.app.

Happy coding!