HTML formatting man pages
I’d like to be able to print out man pages as HTML.
Attempt 1. print man output straight to file (failed)
We can try redirecting the man output directly:
$ man grep > man/grep.txt
but the result is full of non-printable characters; man formats its output for a terminal, after all.
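Assuming the non-printable characters are the backspace overstrike sequences that nroff-style output uses for bold and underline, a small filter can strip them. The conventional tool for this is col -b; the sketch below does the same job with sed so it needs nothing beyond a POSIX shell. The function name strip_overstrike is made up for illustration. Note that this only yields plain text, not HTML.

```shell
# Remove overstrike sequences: bold is "X<BS>X", underline is "_<BS>X".
# Deleting every "any character + backspace" pair leaves the plain text.
strip_overstrike() {
    bs=$(printf '\b')          # a literal backspace character
    sed "s/.${bs}//g"
}

# Usage (similar in spirit to: man grep | col -b > man/grep.txt):
#   man grep | strip_overstrike > man/grep.txt
```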
Attempt 2. format with man2html (succeeded)
The man2html documentation states that the following should work:
$ man grep | man2html > man/grep.html
It doesn't, though (on Cygwin, anyway). Instead it produces a message like:
Content-type: text/html Invalid Manpage The requested file (stdin) is not a valid (unformatted) man page.
Evidently man2html does not understand the formatted text that man outputs. It seems to require unformatted input, which means the raw man source file itself.
[TBD: Verify same behaviour exhibited on Linux]
Getting the right file format for man2html
1. Use man's -t option
Try to get man to output a format understood by man2html.
$ man -t grep
As described in the manual, the -t option runs groff, which emits PostScript by default:
Use /usr/bin/groff -Tps -mandoc to format the manual page, passing the output to stdout. The default output format of /usr/bin/groff -Tps -mandoc is Postscript, refer to the manual page of /usr/bin/groff -Tps -mandoc for ways to pick an alternate format.
resulting in a file like:
%!PS-Adobe-3.0
%%Creator: groff version 1.19.2
%%CreationDate: Sat Oct 24 14:57:32 2009
%%DocumentNeededResources: font Times-Roman
...
which cannot be processed by man2html. I guess I could’ve tried to change the groff format to output html directly, but I didn’t.
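For completeness, that untried route might look like the sketch below. It assumes a groff installation that includes the html output device (grohtml), and it feeds groff raw, uncompressed man source rather than man's formatted output. The function name and the file name grep.1 are placeholders, and the result has not been compared with what man2html produces.

```shell
# Format raw man source from stdin as HTML using groff's html device.
# Assumption: a full groff install providing -Thtml (grohtml).
man_source_to_html() {
    groff -mandoc -Thtml
}

# Usage, given an uncompressed raw man file (hypothetical grep.1):
#   man_source_to_html < grep.1 > man/grep.groff.html
```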
2. Supply raw man file to man2html
Rather than pipe the output from man, we could bypass it and send a file instead. All we need to do is locate the file on disk.
To find a man page on disk (in this case for grep), run:
$ man -w grep
Which produces:
/usr/share/man/man1/grep.1.gz
Notice the .gz extension: the man files are compressed. I am not sure if this is always the case.
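Compression can vary between systems (pages may be gzipped, compressed with another tool, or stored uncompressed), so one cautious approach is to pick the decompressor from the file extension. This is a minimal sketch; the function name cat_man_source and the set of extensions handled are assumptions, not something man itself provides.

```shell
# Print the raw source of a man page file, whatever its compression.
# Falls back to cat when the extension is not a known compressed type.
cat_man_source() {
    case "$1" in
        *.gz)  gzip -dc "$1" ;;
        *.bz2) bzip2 -dc "$1" ;;
        *.xz)  xz -dc "$1" ;;
        *)     cat "$1" ;;
    esac
}

# Usage:
#   cat_man_source "$(man -w grep)"
```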
So, we need to decompress grep.1.gz to get the raw man file:
$ gzip -dc "$(man -w grep)"
which prints something like:
.\" GNU grep man page
.if !\n(.g \{\
. if !\w|\*(lq| \{\
. ds lq ``
. if \w'\(lq' .ds lq "\(lq
. \}
. if !\w|\*(rq| \{\
. ds rq ''
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
This is the unformatted man source that man2html requires.
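A rough way to tell the two kinds of input apart before handing them to man2html: raw man source normally contains a title macro at the start of a line, .TH for the man macros or .Dd for the mdoc macros, while formatted output does not. The check below is only a heuristic, the function name is hypothetical, and it is not how man2html itself decides.

```shell
# Succeed if stdin looks like raw (unformatted) man source.
# Heuristic only: looks for a .TH (man) or .Dd (mdoc) title macro.
is_raw_man_source() {
    grep -Eq '^\.(TH|Dd)[[:space:]]'
}

# Usage:
#   gzip -dc "$(man -w grep)" | is_raw_man_source && echo "raw source"
```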
Solution
Pretty simple really:
- Unzip the required man file
- Pass it to man2html
Here’s how:
$ gzip -dc "$(man -w grep)" | man2html > man/grep.html
[!] The -c option makes gzip write the decompressed data to standard output and leave the original file untouched. Without it, gzip -d replaces your compressed man file with its uncompressed version.
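To convert several pages in one go, the same pipeline can be wrapped in a small function. This is a sketch under a few assumptions: the name pages_to_html is invented, man -w is taken to print a single path per page, and source files are taken to be either gzipped or uncompressed.

```shell
# Convert each named man page to HTML in OUTDIR.
# Usage: pages_to_html OUTDIR PAGE...
pages_to_html() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for page in "$@"; do
        # man -w prints the path of the page's source file
        src=$(man -w "$page") || { echo "no man page for $page" >&2; continue; }
        # Decompress only if the file is gzipped (assumption: .gz or plain)
        case "$src" in
            *.gz) gzip -dc "$src" ;;
            *)    cat "$src" ;;
        esac | man2html > "$outdir/$page.html"
    done
}

# Usage:
#   pages_to_html man grep sed awk
```

Pages that man cannot find are reported on standard error and skipped, so one missing page does not stop the rest.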
Notes
The only problem I have with that is that the documentation reads:
The man2html filter reads formatted nroff text from standard input (stdin) and writes a HTML document to standard output (stdout).
But the raw man files are unformatted roff source, not formatted nroff text, and if I run them through nroff first:
$ gzip -dc "$(man -w grep)" | nroff | man2html > man/grep.nroff.html
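The resulting file contains the error message:
The requested file (stdin) is not a valid (unformatted) man page
A likely explanation is that two unrelated programs share the name man2html. The documentation quoted above matches Earl Hood's Perl script, which expects formatted nroff output, whereas the "Content-type: text/html" header and the "Invalid Manpage" error match Richard Verhoeven's C program, which expects raw man source.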
References
- man — an interface to the on-line reference manuals.
- man2html — convert UNIX nroff(1) manual pages to HTML format.
- groff — front-end for the groff document formatting system.
- nroff — emulate nroff command with groff.
- troff — the troff processor of the groff text formatting system.
- troff.org
