Ben Biddington

Whatever it is, it's not about "coding"

Posts Tagged ‘bash’

Git — remove all deleted files in one line

I got sick of manually removing deleted files; this makes it easier:

$ git rm $(git status | grep deleted: | awk '{ print $3 }')
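
The one-liner above depends on the exact layout of git status output, which is written for humans and may differ between Git versions, and it splits file names on whitespace. A minimal sketch of a variant that asks Git for the list directly is below. The function name git_rm_deleted is made up for illustration, and it assumes an xargs that supports -0 (GNU and BSD both do).

```shell
# Stage the removal of every tracked file that is missing from the working tree.
# `git ls-files --deleted` lists those files; -z separates the names with NUL
# bytes so that paths containing spaces survive the trip through xargs.
# Caveat: with nothing to remove, some xargs implementations still run
# `git rm` once with no paths, which makes it report an error.
git_rm_deleted() {
  git ls-files --deleted -z | xargs -0 git rm --quiet --
}

# Hypothetical usage, from inside a repository:
# git_rm_deleted
```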

Written by benbiddington

3 January, 2010 at 13:15

Posted in development

HTML formatting man pages

I’d like to be able to print out man pages as HTML.

Attempt 1. print man output straight to file (failed)

We can try redirecting the man output directly:

$ man grep > man/grep.txt

but that emits a whole lot of non-printable characters; the output is formatted for a terminal, after all.
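
If plain text is all that is needed, the stray characters can often be filtered out. The sketch below assumes they are nroff-style overstrikes (a character followed by a backspace, used for bold and underline); it does not handle ANSI colour escapes. The function name strip_overstrike is made up for illustration. Where it is installed, col -b does a similar job.

```shell
# Remove nroff-style overstrike sequences (any character followed by a
# backspace) from formatted man output read on stdin.
strip_overstrike() {
  bs=$(printf '\b')   # a literal backspace character
  sed "s/.${bs}//g"
}

# Hypothetical usage, mirroring the redirect above:
# man grep | strip_overstrike > man/grep.txt
```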

Attempt 2. format with man2html (succeeded)

The man2html documentation states that the following should work:

$ man grep | man2html > man/grep.html

It doesn’t though (on cygwin anyway). Instead it produces a message like:

Content-type: text/html
Invalid Manpage

The requested file (stdin) is not a valid (unformatted) man page.

Evidently it does not understand the formatted output of man. It seems to require unformatted input, which means the raw man source file itself.

[TBD: Verify same behaviour exhibited on Linux]

Getting the right file format for man2html

1. Use man’s -t option

Try and get man to output a format understood by man2html.

$ man -t grep

As described in the manual, the -t option runs groff, which produces PostScript by default:

Use /usr/bin/groff -Tps -mandoc to format the manual page, passing the output to stdout. The default output format of /usr/bin/groff -Tps -mandoc is Postscript, refer to the manual page of /usr/bin/groff -Tps -mandoc for ways to pick an alternate format.

resulting in a file like:

%!PS-Adobe-3.0
%%Creator: groff version 1.19.2
%%CreationDate: Sat Oct 24 14:57:32 2009
%%DocumentNeededResources: font Times-Roman
...

which cannot be processed by man2html. I guess I could’ve tried to change the groff format to output html directly, but I didn’t.
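
For reference, the untried route would look roughly like the sketch below. It assumes a groff installation that includes the html output device (some minimal packages leave it out) and feeds groff the raw man source rather than the output of man. The function name man_source_to_html is made up for illustration.

```shell
# Render raw man source (read from stdin) to HTML using groff's html device.
# -mandoc picks the right macro package for the page, -Thtml selects HTML output.
man_source_to_html() {
  groff -mandoc -Thtml
}

# Hypothetical usage, assuming the page on disk is gzip-compressed:
# gzip -dc "$(man -w grep)" | man_source_to_html > man/grep.html
```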

2. Supply raw man file to man2html

Rather than pipe the output from man, we could bypass it and send a file instead. All we need to do is locate the file on disk.

To find a man page on disk (in this case for grep), run:

$ man -w grep

Which produces:

/usr/share/man/man1/grep.1.gz

Notice the .gz extension: the man files are compressed. I am not sure if this is always the case.

So, we need to decompress grep.1.gz to get the raw man file:

$ gzip -dc $(man -w grep)

producing a file like:

.\" GNU grep man page
.if !\n(.g \{\
.	if !\w|\*(lq| \{\
.		ds lq ``
.		if \w'\(lq' .ds lq "\(lq
.	\}
.	if !\w|\*(rq| \{\
.		ds rq ''
.		if \w'\(rq' .ds rq "\(rq
.	\}
.\}

Which is the unformatted man file required by man2html.
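
Since the page may not be compressed on every system, a small helper can handle both cases. This is a sketch that only distinguishes gzip from plain files by extension; other compression formats (for example .bz2 or .xz) are not handled. The function name raw_man_source is made up for illustration.

```shell
# Print the raw (unformatted) source of a man page file, whether or not it is
# gzip-compressed. The argument is a path such as the one `man -w` reports.
raw_man_source() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress to stdout, leaving the file alone
    *)    cat "$1" ;;        # already plain text
  esac
}

# Hypothetical usage:
# raw_man_source "$(man -w grep)" | man2html > man/grep.html
```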

Solution

Pretty simple really:

  1. Unzip the required man file
  2. Pass it to man2html

Here’s how:

$ gzip -dc $(man -w grep) | man2html > man/grep.html

[!] The -c option makes gzip write the decompressed data to standard output and leave the original file untouched. If you don’t supply it, your compressed man file is replaced with its uncompressed version.

Notes

The only problem I have with that is that the documentation reads:

The man2html filter reads formatted nroff text from standard input (stdin) and writes a HTML document to standard output (stdout).

But the raw man files are unformatted source, not formatted nroff output, and if I run the source through nroff first:

$ gzip -dc $(man -w grep) | nroff | man2html > man/grep.nroff.html

The resulting file contains the error message:

The requested file (stdin) is not a valid (unformatted) man page

References

  • man — an interface to the on-line reference manuals.
  • man2html — convert UNIX nroff(1) manual pages to HTML format.
  • groff — front-end for the groff document formatting system.
  • nroff — emulate nroff command with groff.
  • troff — the troff processor of the groff text formatting system.
  • troff.org

Written by benbiddington

25 October, 2009 at 13:37

Posted in development

Grepping lines with multiple matches

I want to filter log files by matching lines that contain all of a set of terms (I don’t want alternation, where any single term is enough).

For example lines like:

RawURL /1.2/user/authorize?ip=10.254.142.175&method=GET

I want to match lines that contain the terms “RawURL” and “/1.2/user/authorize?” without resorting to piping multiple greps:

$ grep RawURL *.txt | grep '/1.2/user/authorize?'

This means I want a regexp like:

RawURL.+\/1.2\/user\/authorize

Translating to grep extended regex, this becomes:

$ grep -E 'RawURL.+/1.2/user/authorize' *.txt

Note: There is no need to enforce non-greedy matching, since grep only reports whether a line matches.
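
A quick way to check the pattern is to run it against a few sample lines. The sketch below wraps it in a function (match_authorize is a made-up name) and escapes the dot in 1.2 so that it only matches a literal dot. Unlike two piped greps, the single pattern only matches when RawURL comes before the path on the line.

```shell
# Print the lines from stdin that contain "RawURL" followed, later on the same
# line, by the literal path "/1.2/user/authorize".
match_authorize() {
  grep -E 'RawURL.+/1\.2/user/authorize'
}

# Hypothetical usage:
# cat *.txt | match_authorize
```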

Regular expressions, character classes and special characters

The expression:

RawURL.+\/1.2\/user\/authorize

Is not equivalent to:

RawURL[.]+\/1.2\/user\/authorize

Because in the second one, the special character ‘.’ is no longer special: inside a character class it matches only a literal dot. Most special characters lose their special meaning inside a character class.
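
The difference is easy to see by counting matches for each pattern against the sample line. This is a small sketch; count_matches and sample are names made up for illustration.

```shell
# Count the input lines matched by an extended regular expression.
count_matches() {
  grep -Ec "$1"
}

sample='RawURL /1.2/user/authorize?ip=10.254.142.175&method=GET'

# .+ matches any run of characters, so the sample line matches: prints 1
printf '%s\n' "$sample" | count_matches 'RawURL.+/1.2/user/authorize'

# [.]+ matches only literal dots, so the space after RawURL defeats it: prints 0
# (grep exits non-zero when nothing matches, hence the `|| true`)
printf '%s\n' "$sample" | count_matches 'RawURL[.]+/1.2/user/authorize' || true
```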

Written by benbiddington

23 October, 2009 at 13:37

Posted in development
