Ben Biddington

Whatever it is, it's not about "coding"

Archive for October 2009

HTML formatting man pages


I’d like to be able to print out man pages as HTML.

Attempt 1. print man output straight to file (failed)

We can try redirecting the man output directly:

$ man grep > man/grep.txt

but that emits a lot of non-printable characters; the output is formatted for a terminal, after all.
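The non-printable characters are most likely backspace overstrikes: nroff-style formatters traditionally render bold as a character, a backspace, and the same character again, and underline as an underscore, a backspace, and the character. Here is a minimal sketch of stripping them with sed, assuming that is the form the output takes (some systems emit ANSI escape sequences instead). The function name strip_overstrikes is made up for this example, and the input is simulated rather than real man output:

```shell
# Hypothetical helper: remove every "character + backspace" pair from stdin,
# which turns overstruck bold/underlined text back into plain text.
strip_overstrikes() {
    bs=$(printf '\b')    # a literal backspace character
    sed "s/.${bs}//g"
}

# Simulated formatter output: "NAME" in bold, then "grep" underlined.
printf 'N\bNA\bAM\bME\bE\n_\bg_\br_\be_\bp\n' | strip_overstrikes
```

With real output this would be used as `man grep | strip_overstrikes > man/grep.txt`. Where it is installed, `col -b` does a similar job.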

Attempt 2. format with man2html (succeeded)

The man2html documentation states that the following should work:

$ man grep | man2html > man/grep.html

It doesn’t though (on cygwin anyway). Instead it produces a message like:

Content-type: text/html
Invalid Manpage

The requested file (stdin) is not a valid (unformatted) man page.

Evidently man2html does not understand the formatted output produced by man. It seems to require unformatted input, which means the raw man file itself.

[TBD: Verify same behaviour exhibited on Linux]

Getting the right file format for man2html

1. Use man’s -t option

Try to get man to output a format that man2html understands.

$ man -t grep

As described in the manual, the -t option runs groff, which produces PostScript by default:

Use /usr/bin/groff -Tps -mandoc to format the manual page, passing the output to stdout. The default output format of /usr/bin/groff -Tps -mandoc is Postscript, refer to the manual page of /usr/bin/groff -Tps -mandoc for ways to pick an alternate format.

resulting in a file like:

%!PS-Adobe-3.0
%%Creator: groff version 1.19.2
%%CreationDate: Sat Oct 24 14:57:32 2009
%%DocumentNeededResources: font Times-Roman
...

which cannot be processed by man2html. I guess I could’ve tried to change the groff format to output html directly, but I didn’t.

2. Supply raw man file to man2html

Rather than pipe the output from man, we could bypass it and send a file instead. All we need to do is locate the file on disk.

To find a man page on disk (in this case for grep), run:

$ man -w grep

Which produces:

/usr/share/man/man1/grep.1.gz

Notice the .gz extension: the man files are compressed. I am not sure if this is always the case.

So, we need to decompress grep.1.gz to get the raw man file:

$ gzip -dc $(man -w grep)

producing a file like:

.\" GNU grep man page
.if !\n(.g \{\
.	if !\w|\*(lq| \{\
.		ds lq ``
.		if \w'\(lq' .ds lq "\(lq
.	\}
.	if !\w|\*(rq| \{\
.		ds rq ''
.		if \w'\(rq' .ds rq "\(rq
.	\}
.\}

Which is the unformatted man file required by man2html.

Solution

Pretty simple really:

  1. Unzip the required man file
  2. Pass it to man2html

Here’s how:

$ gzip -dc $(man -w grep) | man2html > man/grep.html

[!] The -c option makes gzip write the decompressed data to standard output, so the original file is preserved. If you don’t supply it, your compressed man file will be replaced with its uncompressed version.
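Since it is not clear that man files are always compressed, the decompression step can be made conditional. This is a rough sketch that assumes the file is either gzip-compressed with a .gz extension or not compressed at all. The function name man_source is hypothetical:

```shell
# Hypothetical helper: print the raw (unformatted) source of a man file,
# decompressing it only when the file name ends in .gz.
man_source() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # -c writes to stdout and leaves the file alone
        *)    cat "$1" ;;
    esac
}
```

It would slot into the pipeline as `man_source "$(man -w grep)" | man2html > man/grep.html`.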

Notes

The only problem I have with that is that the documentation reads:

The man2html filter reads formatted nroff text from standard input (stdin) and writes a HTML document to standard output (stdout).

But the raw man files are unformatted source rather than formatted nroff text, and if I try formatting them with nroff first:

$ gzip -dc $(man -w grep) | nroff | man2html > man/grep.nroff.html

The resulting file contains the error message:

The requested file (stdin) is not a valid (unformatted) man page

References

  • man — an interface to the on-line reference manuals.
  • man2html — convert UNIX nroff(1) manual pages to HTML format.
  • groff — front-end for the groff document formatting system.
  • nroff — emulate nroff command with groff.
  • troff — the troff processor of the groff text formatting system.
  • troff.org

Written by benbiddington

25 October, 2009 at 13:37

Posted in development


Grepping lines with multiple matches


I want to filter log files by matching lines that contain all of a set of terms (rather than any one of them, as alternation would).

For example lines like:

RawURL /1.2/user/authorize?ip=10.254.142.175&method=GET

I want to match lines that contain the terms “RawURL” and “/1.2/user/authorize?” without resorting to piping multiple greps:

$ grep RawURL *.txt | grep '/1.2/user/authorize?'

This means I want a regexp like:

RawURL.+\/1.2\/user\/authorize

Translating to grep extended regex, this becomes:

$ grep -E 'RawURL.+/1.2/user/authorize' *.txt

Note: There is no need to enforce non-greedy matching, because grep only has to decide whether a line matches, not how much of it.
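One thing to keep in mind is that the single-grep pattern only matches when RawURL comes before the path. The sketch below uses made-up log lines to show the pattern at work, with the dot in 1.2 and the question mark escaped so they match literally. It also shows an order-independent alternative using awk, which is a swapped-in technique and not part of the original approach:

```shell
# Made-up sample log lines for illustration.
sample_lines='RawURL /1.2/user/authorize?ip=10.254.142.175&method=GET
RawURL /1.2/track/search?q=foo
Referer /1.2/user/authorize?ip=10.254.142.175'

# Single grep: both terms must be present, in this order.
printf '%s\n' "$sample_lines" | grep -E 'RawURL.+/1\.2/user/authorize\?'

# Order-independent variant: awk tests each term separately.
printf '%s\n' "$sample_lines" | awk '/RawURL/ && /\/1\.2\/user\/authorize\?/'
```

Both commands print only the first sample line, since it is the only one containing both terms.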

Regular expressions, character classes and special characters

The expression:

RawURL.+\/1.2\/user\/authorize

Is not equivalent to:

RawURL[.]+\/1.2\/user\/authorize

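Because in the second one, the special character ‘.’ is no longer special: [.]+ matches one or more literal dots. Inside a character class most special characters lose their special meaning (the exceptions are ‘]’, ‘^’ and ‘-’, which are significant in certain positions).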
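The difference is easy to check with two made-up test strings, one with a space after RawURL and one with a run of dots. This is only an illustrative sketch, and the strings are not real log lines:

```shell
# Two made-up lines: a space after RawURL in the first, dots in the second.
lines='RawURL /1.2/user/authorize
RawURL.../1.2/user/authorize'

# '.' as a wildcard: one or more of any character. Counts both lines: prints 2.
printf '%s\n' "$lines" | grep -cE 'RawURL.+/1.2/user/authorize'

# '[.]' as a literal dot: one or more dots. Counts only the second line: prints 1.
printf '%s\n' "$lines" | grep -cE 'RawURL[.]+/1.2/user/authorize'
```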

Written by benbiddington

23 October, 2009 at 13:37

Posted in development


Creating a wave robot


Download Eclipse Java EE IDE, and follow the instructions for getting all of the AppEngine plugins. It’s important to get these because they reduce the deployment process to a single click.

  1. Create a new web application project.
  2. Download all of the Wave Robot Java Client Library jars from here.
    1. Copy them to your /war/WEB-INF/lib directory.
    2. Reference them (Alt + Enter > Java Build Path > Libraries > Add JARS…).
      If successful, they’ll appear in a Referenced Libraries node in Package Explorer.

Edit web.xml

Add an endpoint for Wave to post to:

<servlet>
    <servlet-name>SearchServlet</servlet-name>
    <servlet-class>org.coriander.wave.sevendigital.servlets.search.SearchServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>SearchServlet</servlet-name>
    <url-pattern>/_wave/robot/jsonrpc</url-pattern>
</servlet-mapping>

We now have a handler for incoming wave requests. All wave requests will be routed to SearchServlet.

Add capabilities.xml

This file registers the robot for wave events. Add one to your /war/_wave directory:

<?xml version="1.0" encoding="utf-8"?>
<w:robot xmlns:w="http://wave.google.com/extensions/robots/1.0">
  <w:capabilities>
    <w:capability name="WAVELET_PARTICIPANTS_CHANGED" content="true" />
    <w:capability name="BLIP_SUBMITTED" content="true" />
  </w:capabilities>
  <w:version>1</w:version>
</w:robot>

Implementing a robot servlet

This is very easy: derive from com.google.wave.api.AbstractRobotServlet, which has a single method to override:

import com.google.wave.api.AbstractRobotServlet;
import com.google.wave.api.RobotMessageBundle;

public class SearchServlet extends AbstractRobotServlet {
    @Override
    public void processEvents(RobotMessageBundle bundle) { }
}

There are numerous tutorials around describing what to do next.

Setting robot profile image

This amounts to adding the root /_wave/robot/profile endpoint. The easiest way to do this is to derive a servlet from com.google.wave.api.ProfileServlet, and override any of the methods you choose.

To set the avatar image, override getRobotAvatarUrl:

import com.google.wave.api.ProfileServlet;
...
public class Profile extends ProfileServlet {
    @Override
    public String getRobotAvatarUrl() {
        return "http://coriander-7digital.appspot.com/_wave/coriander-7digital.png";
    }
}

Then place your image in your /war/_wave directory. There are other useful settings, like the display name, which can also be provided by overriding. The default content type of ProfileServlet-derived types is application/json.

Don’t forget to register the new servlet by updating web.xml.

Deployment

Very simple, it’s all done by the IDE.


Written by benbiddington

18 October, 2009 at 10:53