my package of the day – htmldoc – for converting html to pdf on the fly

PDF creation has actually become fairly easy. OpenOffice.org, the CUPS printing system, and KDE all provide ways to print nearly everything to a PDF file right away, a feature that even outperforms most Windows setups today. But there are still PDF-related tasks that are not that simple. One I often run into is automated PDF creation on a web server. Let’s say you write a web application and want to create PDF invoices on the fly.

There are, of course, PDF frameworks available. Let’s take PHP as an example: if you want to create a PDF from a PHP script, you can choose between FPDF, Dompdf, the sophisticated Zend Framework, and more (including commercial solutions). But to be honest, they are all either complicated to use (as you often have to learn a specific syntax) or quite limited in what they can put into a PDF file (as you can only use a few design features). As I needed a simple solution for creating a 50+ page PDF file with a huge table on the fly, I tested most frameworks and failed with most of them (often just because I did not have enough time to write dozens of lines of code).

So I hoped to find a solution that would let me simply convert a plain HTML file to a PDF file on the fly, with better compatibility than Dompdf, for instance. The solution was … uncommon. It was not a PHP class but a neat command line tool called “htmldoc”, available as a package. If you want to give it a try, just install it by calling “aptitude install htmldoc”.

You can test htmldoc by saving some HTML files to disk and calling “htmldoc --webpage filename.html”. There are a lot of interesting features, like setting the font size, font type, footer, color or greyscale mode, and so on. But let’s use htmldoc from PHP right away. The following very simple script uses the PHP output buffer to reduce the disk writes to a single temporary file (if somebody knows a way of doing this from a script without any temporary files, let me know):

<?php
// start output buffer for pdf capture
 
ob_start();
?>
your normal html output will be placed here, either by
dumping html directly or by using normal php code
<?php
// save output buffer
$html=ob_get_contents();
// delete Output-Buffer
ob_end_clean();
// write the html to a file
$filename = './tmp.html';
if (!$handle = fopen($filename, 'w')) {
	print "Could not open $filename";
	exit;
}
// fwrite() returns false on failure; writing 0 bytes is not an error
if (fwrite($handle, $html) === false) {
	print "Could not write $filename";
	exit;
}
fclose($handle);
// htmldoc call
$passthru = 'htmldoc --quiet --gray --textfont helvetica \
--bodyfont helvetica --logoimage banner.png --headfootsize 10 \
--footer D/l --fontsize 9 --size 297x210mm -t pdf14 \
--webpage ' . escapeshellarg($filename);
 
// write output of htmldoc to clean output buffer
ob_start();
passthru($passthru);
$pdf=ob_get_contents();
ob_end_clean();
 
// deliver pdf file as download
header("Content-type: application/pdf");
header("Content-Disposition: attachment; filename=test.pdf");
header('Content-length: ' . strlen($pdf));
echo $pdf;

As you can see, this is neither rocket science nor magic. It is just a wrapper for htmldoc that lets you forget about the PDF while writing the actual content of the HTML file. You’ll have to check how htmldoc handles your HTML code. You should keep it as simple as possible: forget about advanced CSS or nested tables. But it’s actually enough for a really neat PDF file, and it’s fast: creating 50+ page PDF files is fast enough in my case to make on-demand calls to htmldoc feel like serving static files.

Please note: calling external programs and command line tools from a web script is always a security risk, so you should carefully validate all input and keep up with updates for the program you are using. The code provided should be easy to port to other web languages and frameworks such as Perl or Rails.
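To make the security note a bit more concrete, here is a minimal sketch of how the command line could be assembled when options or file names do not come from constants. The helper name build_htmldoc_command() and the option map are my own assumptions for illustration and not part of htmldoc or PHP; only escapeshellarg() and the htmldoc flags already used in the script above are taken as given.

```php
<?php
// Hypothetical helper (not part of htmldoc or PHP): builds an htmldoc
// command line from an option map and quotes every value for the shell.
function build_htmldoc_command(array $options, string $inputFile): string
{
    $parts = ['htmldoc', '--quiet', '--webpage'];
    foreach ($options as $name => $value) {
        // Only accept plain option names such as "fontsize" or "bodyfont".
        if (!preg_match('/^[a-z0-9]+$/', (string) $name)) {
            throw new InvalidArgumentException("Bad option name: $name");
        }
        $parts[] = '--' . $name;
        // escapeshellarg() keeps the value from being split or
        // interpreted by the shell.
        $parts[] = escapeshellarg((string) $value);
    }
    // Quoting does not stop a file name that starts with "-" from being
    // read as an option, so still validate file names separately.
    $parts[] = escapeshellarg($inputFile);
    return implode(' ', $parts);
}
```

You would then call passthru(build_htmldoc_command(['fontsize' => 9, 'bodyfont' => 'helvetica'], $filename)) instead of concatenating the string by hand. On a Unix-like system every value ends up wrapped in single quotes, so something like "a.html; rm -rf /" is handed to htmldoc as one (nonexistent) file name rather than being run as a second command.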

12 thoughts on “my package of the day – htmldoc – for converting html to pdf on the fly”

  1. Pingback: PHP Coding School » Blog Archive » php code [2008-07-07 08:47:00]

  2. You could use proc_open() to feed the html as stdin through htmldoc without temporary files.

  3. Hi! I simply right-click on the HTML file I saved on my desktop, open it with OpenOffice Writer and export it as PDF: it works almost all the time!

  4. @DurchR_PW:

    I’ll need to check if htmldoc is able to read from stdin. Thanks for the suggestion, I’ll check that.

    @Reine:

    Thank you for the hint. You are right, when you only want to convert files on a desktop. The method described is for automated processing on a server.

  5. You mean running it as a CGI script? I am afraid not, as it does not send the necessary headers as far as I know.

  6. I want to know how to print an HTML document to PDF with a footer and header. When I specify the footer value I get page numbers, but I want text instead.

  7. Hi all,

    I’m trying to use htmldoc, and it works. But I have some problems with the font-size option: if I change it, nothing happens in the PDF.

    Could you help me maybe?

    My code :

    $passthru = 'htmldoc --quiet --textfont helvetica \
    --bodyfont helvetica --fontsize 8 \
    --size A4 -t pdf14 \
    --webpage '.$filename;

    Thanks

  8. Oops, the HTML got stripped from the comment.

    You should use a font tag inside the HTML of the document, like

    <font size="-1">

  9. Can you tell me how to feed the html without a temporary file?

    I can create a pdf without a pdf temp file.
    But to achieve this I need to create a html temp file.
    Is there any solution?

    ob_start();
    system("htmldoc --webpage --format pdf --landscape teste.html");
    header('Content-Type: application/pdf');
    ob_end_flush();

    Can someone help me? I don’t want to use teste.html; instead I want to use a string with HTML code.

    Thanks.
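
Picking up the proc_open() suggestion from comment 2 and the question in comment 9, here is a minimal sketch of piping an HTML string through a command without any temporary file. The function name pipe_through() is hypothetical. The sketch also assumes that your htmldoc build reads HTML from standard input when “-” is given as the file name, which is how I understand the htmldoc man page; please verify that against your installed version before relying on it.

```php
<?php
// Hypothetical helper: sends $input to the command's stdin and returns
// everything the command writes to stdout.
function pipe_through(string $command, string $input): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin: we write to it
        1 => ['pipe', 'w'], // child's stdout: we read from it
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Write all input first, then close stdin so the child sees EOF.
    // This simple order can block if the child emits a lot of output
    // before it has consumed its input; stream_select() would be the
    // more robust (and longer) approach.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command exited with code $exitCode");
    }
    return $output;
}

// Usage sketch (assumption: "-" makes htmldoc read HTML from stdin):
// $pdf = pipe_through('htmldoc --quiet -t pdf14 --webpage -', $html);
// header('Content-Type: application/pdf');
// echo $pdf;
```

The helper itself is generic, so you can try it with a harmless command such as “cat” before pointing it at htmldoc. Whether a non-zero exit code should really abort the request is a judgment call; you may prefer to log it and deliver the PDF anyway.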
