Ben Biddington

Whatever it is, it's not about "coding"

HTML formatting man pages

I’d like to be able to print out man pages as HTML.

Attempt 1. Print man output straight to file (failed)

We can try redirecting the man output directly:

$ man grep > man/grep.txt

but that emits a whole lot of non-printable characters. The output is formatted for a terminal, after all.
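One way to get readable plain text is to strip the control characters with sed. This is a minimal sketch, assuming the non-printable characters are the backspace overstrikes that nroff-style output traditionally uses for bold and underline; that is an assumption, not something checked against Cygwin's man. The strip_overstrike helper and the sample line are made up for illustration:

```shell
# Assumption: bold arrives as "N<backspace>N" and underline as "_<backspace>g".
bs=$(printf '\b')

# Hypothetical helper: delete every "character + backspace" pair.
strip_overstrike() {
  sed "s/.$bs//g"
}

# Hand-made sample imitating an overstruck heading, not real man output.
sample=$(printf 'N\bNA\bAM\bME\bE _\bg_\br_\be_\bp')
cleaned=$(printf '%s\n' "$sample" | strip_overstrike)
printf '%s\n' "$cleaned"
```

Against a real page that would be `man grep | strip_overstrike > man/grep.txt`. It only yields plain text though, not HTML.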

Attempt 2. Format with man2html (succeeded)

The man2html documentation states that the following should work:

$ man grep | man2html > man/grep.html

It doesn’t though (on Cygwin anyway). Instead it produces a message like:

Content-type: text/html
Invalid Manpage

The requested file (stdin) is not a valid (unformatted) man page.

Evidently it does not understand the formatted output that man produces. It seems to require unformatted input, which means the raw man file itself.

[TBD: Verify same behaviour exhibited on Linux]

Getting the right file format for man2html

1. Use man’s -t option

Try to get man to output a format understood by man2html.

$ man -t grep

As described in the manual, the -t option runs groff, which outputs PostScript by default:

Use /usr/bin/groff -Tps -mandoc to format the manual page, passing the output to stdout. The default output format of /usr/bin/groff -Tps -mandoc is Postscript, refer to the manual page of /usr/bin/groff -Tps -mandoc for ways to pick an alternate format.

resulting in output like:

%!PS-Adobe-3.0
%%Creator: groff version 1.19.2
%%CreationDate: Sat Oct 24 14:57:32 2009
%%DocumentNeededResources: font Times-Roman
...

which cannot be processed by man2html. I guess I could’ve tried to change the groff output format to emit HTML directly, but I didn’t.
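For reference, here is a sketch of what switching the groff output device might look like. It was not tried as part of this post, and it assumes a groff build that ships the html device (-Thtml), which some minimal installs leave out. page_to_html and the demo page are hypothetical names for illustration:

```shell
# Sketch only: format an uncompressed man source file as HTML with groff.
page_to_html() {
  # $1: path to an uncompressed man source file
  groff -mandoc -Thtml "$1"
}

# Tiny hand-written man source so the sketch does not depend on a real page.
src=$(mktemp)
printf '%s\n' '.TH DEMO 1' '.SH NAME' 'demo \- a placeholder page' > "$src"

# Only attempt the conversion when groff is installed; leave html empty otherwise.
html=""
if command -v groff >/dev/null 2>&1; then
  html=$(page_to_html "$src" 2>/dev/null) || html=""
fi
rm -f "$src"
```

If html ends up empty, either groff is missing or its html device is not installed.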

2. Supply raw man file to man2html

Rather than pipe the output from man, we could bypass it and send a file instead. All we need to do is locate the file on disk.

To find a man page on disk (in this case for grep), run:

$ man -w grep

Which produces:

/usr/share/man/man1/grep.1.gz

Notice the .gz extension: the man files are compressed. I am not sure if this is always the case.

So, we need to decompress grep.1.gz to get the raw man file:

$ gzip -dc "$(man -w grep)"

producing output like:

.\" GNU grep man page
.if !\n(.g \{\
.	if !\w|\*(lq| \{\
.		ds lq ``
.		if \w'\(lq' .ds lq "\(lq
.	\}
.	if !\w|\*(rq| \{\
.		ds rq ''
.		if \w'\(rq' .ds rq "\(rq
.	\}
.\}

Which is the unformatted man file required by man2html.

Solution

Pretty simple really:

  1. Unzip the required man file
  2. Pass it to man2html

Here’s how:

$ gzip -dc "$(man -w grep)" | man2html > man/grep.html

[!] The -c option on gzip writes the decompressed data to stdout and ensures the original file is preserved. If you don’t supply it, your compressed man file is replaced with its uncompressed version.
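Since I am not sure every man file is compressed, here is a minimal sketch that copes with both cases and confirms that -c leaves the compressed file in place. cat_manpage is a made-up helper name, and the demo page in a temporary directory stands in for a real man file:

```shell
# Hypothetical helper: print a man source file, gzip-compressed or not.
cat_manpage() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac
}

# Build a throwaway page in both plain and compressed form.
tmp=$(mktemp -d)
printf '%s\n' '.TH DEMO 1' > "$tmp/demo.1"
gzip -c "$tmp/demo.1" > "$tmp/demo.1.gz"

plain=$(cat_manpage "$tmp/demo.1")
unzipped=$(cat_manpage "$tmp/demo.1.gz")

# -c wrote to stdout, so the compressed file should still be on disk.
if [ -f "$tmp/demo.1.gz" ]; then still_there=yes; else still_there=no; fi

printf '%s\n%s\n%s\n' "$plain" "$unzipped" "$still_there"
rm -rf "$tmp"
```

With that in place the conversion becomes `cat_manpage "$(man -w grep)" | man2html > man/grep.html`, whatever the file extension.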

Notes

The only problem I have with that is that the documentation reads:

The man2html filter reads formatted nroff text from standard input (stdin) and writes a HTML document to standard output (stdout).

But the raw man files are roff source rather than formatted nroff output, and if I format them with nroff first:

$ gzip -dc "$(man -w grep)" | nroff | man2html > man/grep.nroff.html

The resulting file contains the error message:

The requested file (stdin) is not a valid (unformatted) man page

So this man2html wants unformatted input, whatever the documentation says. Perhaps the documentation I found describes a different version of the program.

References

  • man — an interface to the on-line reference manuals.
  • man2html — convert UNIX nroff(1) manual pages to HTML format.
  • groff — front-end for the groff document formatting system.
  • nroff — emulate nroff command with groff.
  • troff — the troff processor of the groff text formatting system.
  • troff.org

Written by benbiddington

25 October, 2009 at 13:37

Posted in development
