UNIX SHELL PROGRAMMING, FOURTH EDITION

Appendix Y - Nroff and Troff - Continued

  # Output file(s)
  if [ $# -gt 0 ]          # Check parameter count
  then
  # Save the parameters entered by the user
     while [ "`echo x$1 | cut -c2`" = "-" ]   # Flags (x keeps echo from eating -e, -n)
     do
      parameters="${parameters} $1"    # Save flag
      shift                            # Delete flag
     done
  # Check the files and save their names
     while [ "$1" ]            # While more arguments
     do
      if [ -r $1 ]             # Readable file?
      then
       files="${files} $1"     # Save file name
      else
       echo "file $1 not found"
      fi
      shift                    # Delete argument
     done
  # Determine the input filters (the arguments have been shifted away,
  # so search the saved file names)
     tblcnt=`grep -c "^\.TS" $files | cut -f2 -d: | sort -nr | line`
     gathcnt=`grep -c "^~" $files | cut -f2 -d: | sort -nr | line`
     if [ $tblcnt -gt 0 ]
     then
       inputcnt=`expr ${inputcnt:=0} + 1`
     fi
     if [ $gathcnt -gt 0 ]
     then
       inputcnt=`expr ${inputcnt:=0} + 2`
     fi
     case ${inputcnt:=0} in
     0)
       inputfilter="cat $files | "
       ;;
     1)
       inputfilter="tbl $files | "
       ;;
     2)
       inputfilter="gath $files | "
       ;;
     3)
       inputfilter="gath $files | tbl | "
       ;;
     *)
       echo "Error in output: filters $inputfilter"
       ;;
     esac
  # Determine the parameters and output filter for the terminal
     case $TERM in
     vt100|5420|tv970)
       parameters="$parameters -u -rO0 -rW79 "
       bs=`printf '\010'`      # A literal backspace character
       outputfilter=" | col | sed -e 's/${bs}.//g' | uniq"
       ;;
     lp|620|630)
       parameters="-rO8 -rW79 $parameters"
       outputfilter=""
       ;;
     ti|700|745)
       parameters="-rO0 -rW79 $parameters "
       outputfilter=" | col"
       ;;
     esac
     eval "${inputfilter} nroff ${macros:=-cm} ${parameters} \
       ${outputfilter}"
  else
     echo "No files specified"
  fi

The actual makeup of the output command will vary with the type of terminals in use. But the advantages of typing the following simple command to print files should be obvious.

  output document

Once formatted, documents can be spooled for output to any number of devices or line printers.
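
The filter decision inside output can be tried on its own before any formatter is run. The following is only a sketch under stated assumptions: pick_input_filter is a hypothetical helper name, it needs at least one file argument, and it prints the front of the pipeline instead of executing it. The markers it searches for (.TS for tbl input, a leading ~ for gath input) are the ones used in the listing above.

```shell
# pick_input_filter: hypothetical helper (not part of the original listing).
# It prints the input side of the pipeline for the named files; nothing is run.
pick_input_filter() {
    inputcnt=0
    # tbl input is marked by .TS at the start of a line
    if grep "^\.TS" "$@" >/dev/null
    then
        inputcnt=`expr $inputcnt + 1`
    fi
    # The listing treats a leading ~ as the marker for gath input
    if grep "^~" "$@" >/dev/null
    then
        inputcnt=`expr $inputcnt + 2`
    fi
    case $inputcnt in
    0) echo "cat $* |" ;;
    1) echo "tbl $* |" ;;
    2) echo "gath $* |" ;;
    3) echo "gath $* | tbl |" ;;
    esac
}
```

Calling pick_input_filter on a chapter file shows which preprocessors output would place in front of nroff for that file, which is a convenient check before a long document is sent to a printer.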

SPOOLING DOCUMENTS

A line printer spooler (lp; csh: lpr) was added to UNIX Version 4 and later versions. Lp allows the system administrator to define the types of printers on the system. Each user can spool files to a printer without actually logging on, and the printout can be picked up later at the user's convenience. Lp can even send the user mail when the file finishes printing. To use lp, the printers have to be defined and labeled by the system administrator. Then, documents can easily be spooled by creating a command to make a few decisions for the user. The spooling command must decide whether to print an existing file or to execute a command and print its output. Using lp (sh) or lpr (csh), documents can be formatted with the output or pr command and printed by the line printer:

  output file | lp        output file | lpr
  pr -o8 file | lp        pr file | lpr -i8
                          troff -t troff_file | lpr -t

In the first example, the output command will format the document file and spool the resulting file. In the second example, pr will format the text file and then print it, indented by eight spaces. The last (csh) example formats the document using troff; lpr then converts the troff (-t) input into PostScript for the receiving printer. Formatting requests that are more complex than those provided by output and pr will require the use of other miscellaneous filters. To check on the status of print jobs, just enter the command lpstat (csh: lpq). This command checks the queues and lets you know if your job has finished printing.
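
The decision between printing an existing file and printing the output of a command can be sketched as a small Shell function. This is a minimal illustration, not a definitive implementation: spool is a hypothetical name, and SPOOLER is an assumed variable that names the spooling command (lp by default; set it to lpr on Berkeley systems).

```shell
# spool: hypothetical wrapper (name and SPOOLER variable are assumptions).
# One readable file argument is spooled directly; anything else is treated
# as a command whose output is spooled.
spool() {
    if [ $# -eq 1 ] && [ -r "$1" ]
    then
        ${SPOOLER:-lp} "$1"         # An existing file: spool it as is
    else
        "$@" | ${SPOOLER:-lp}       # A command: run it and spool its output
    fi
}
```

With this in place, spool report would print the file report, while spool pr -o8 report would format it with pr first. A single argument that names both a readable file and a command is treated as a file.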

MISCELLANEOUS DOCUMENTATION FILTERS

Figures Y.3 and Y.4 show the various filters available for handling documents with nroff and troff. Troff uses the same input filters as nroff with one exception: cw, the constant width preprocessor. Cw uses some additional macros to handle special output requirements. Each of these filters is handy for special circumstances, but most users will rarely need them.

Figure Y.3 Nroff Input and Output Filters

Figure Y.4 Troff Input and Output Filters

MISCELLANEOUS COMMANDS

There are a number of commands that work with nroff text: man, mm, mmt, and deroff. Man formats manual pages using a unique set of macros. It executes nroff with a variety of options to format the output. Mm invokes nroff using the memorandum macros (System V). It can do much of what the previously developed Shell commands, such as output, can do, but it lacks the robustness available with the Shell. Mmt typesets slides and viewgraphs (System V). Deroff removes all nroff and troff macros from a file, which occasionally can be useful.
On Berkeley systems, the formatting commands are ms and me. Ms handles standard text processing. Me provides multicolumn capabilities for formal papers intended for academia or for publication.

Document Analysis

Spell is one of the most useful document processing tools. It finds most of the spelling errors in a document. Spell produces its list of errors one word per line. If there are extensive errors, the list will scroll off the screen quickly. A solution is to print the output of spell horizontally:

  spell document | pr -4

The Writer's Workbench facility also provides some excellent tools for examining and improving documents. Commands such as style and prose can be beneficial to writers. Shell commands can be created to run all of these commands against a document and print or store the results. These Shells should be created as needed. Let's use Shell to build an analyzer of our own. One of the things I find most useful is to know what the keywords are in a document: those that are repeated most frequently. These words often capture the true content of a document. How could we do this using Shell? First, using deroff, we would want to remove all of the nroff or troff formatting commands:

  deroff $1

Then, we would want to transform the file in the following ways:
  • Convert uppercase to lowercase (so that Troff and troff would be counted as the same word).
  • Convert blanks and tabs to newlines (to create one word per line).
  • Convert all punctuation into newlines (so that "sentence." and "sentence" will be counted as the same word).

We can accomplish all of this using tr:


  deroff $1 | \
  tr "[A-Z \t#',.;:()]" \
    "[a-z\012\012\012\012\012\012\012\012\012\012]"
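
The effect of the translation is easy to check on a single sentence. In the sketch below, split_words is a hypothetical name for the tr step alone; deroff is left out so that the example runs on plain text, and both character sets are quoted so that the Shell passes the apostrophe and parentheses through to tr unchanged.

```shell
# split_words: hypothetical name for the tr step; reads text on standard
# input and writes one lowercase token per line.
split_words() {
    tr "[A-Z \t#',.;:()]" "[a-z\012\012\012\012\012\012\012\012\012\012]"
}

# Troff and troff come out as the same word; the period becomes a newline,
# which leaves an empty line behind.
printf 'Troff and troff.\n' | split_words
```

The empty lines produced wherever punctuation is followed by a blank or a newline are the reason a later step has to select only the lines that contain letters.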

Next, we can select just the words (using grep), sort the word list so that identical words are adjacent, count the words using uniq, and then sort the counts in descending numeric order:

  deroff $1 | \
  tr "[A-Z \t#',.;:()]" \
    "[a-z\012\012\012\012\012\012\012\012\012\012]" | \
    grep "[a-z]" | sort | uniq -c | sort -nr

If we executed this command, we'd get a listing of keywords that includes the following:
      127 the
       86 a
       32 an
       31 shell
       27 programming

Words such as the and a are the most common words in the English language. We might want to use sed to eliminate these words. This will help illuminate the true keywords. Similarly, the numbers are only useful to get the file in order. We can use cut to eliminate them (cut -c6- assumes that uniq -c prints each count in a four-character field followed by a space; adjust the column if your uniq pads its counts differently):

  deroff $1 | \
  tr "[A-Z \t#',.;:()]" \
    "[a-z\012\012\012\012\012\012\012\012\012\012]" | \
    grep "[a-z]" | sort | uniq -c | sort -nr | \
    cut -c6- | sed -f conjunctions

The file, conjunctions, would contain entries for all filler words (a, an, and, and the). It would tell sed to delete them as follows:

  /^a$/d
  /^and$/d
  /^but$/d
  /^the$/d	
  . . .
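
The counting and filtering steps can be tried together on a short word list. The following is a sketch under stated assumptions: top_words is a hypothetical name, the filler-word list is a small stand-in for the conjunctions file, and the counts are stripped with sed rather than cut so that the result does not depend on how a particular uniq pads its numbers.

```shell
# top_words: hypothetical helper; reads one lowercase word per line on
# standard input and writes the words by descending frequency, fillers removed.
top_words() {
    stop=$(mktemp)
    # Stand-in for the conjunctions file (assumed contents)
    printf '%s\n' '/^a$/d' '/^an$/d' '/^and$/d' '/^but$/d' '/^the$/d' > "$stop"
    grep '[a-z]' |                # Keep only lines that contain a word
    sort | uniq -c |              # Group identical words, then count each group
    sort -nr |                    # Most frequent first
    sed -e 's/^ *[0-9]* //' |     # Strip the count, whatever its padding
    sed -f "$stop"                # Delete the filler words
    rm -f "$stop"
}
```

Feeding it a list in which the occurs four times, shell three times, and troff once yields shell followed by troff: the most frequent word overall is dropped because it is a filler word.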

With all of the filler words out of the way, we might want to capture just the first ten keywords. We could use head to do this for us:

  deroff $1 | \
  tr "[A-Z \t#',.;:()]" \
    "[a-z\012\012\012\012\012\012\012\012\012\012]" | \
    grep "[a-z]" | sort | uniq -c | sort -nr | \
    cut -c6- | sed -f conjunctions | \
    head

Using this command on this chapter, we might get the following results:
  document
  documents
  nroff
  documentation
  troff
  shell
  printer
  terminal
From these results you'll notice that keyword cannot differentiate between singular and plural words (document and documents). To overcome this, we could write another Shell program, plural, to remove words that end in s when the singular form is also in the list:

  # plural
  cat $* > /tmp/tmp$$
  sed -e "s/s$//" /tmp/tmp$$ | sort | uniq -d > /tmp/tmp1$$
  # Find plurals
  if [ -s /tmp/tmp1$$ ]   # Plurals were found!
  then    # Create sed file to delete them
   sed -e "s/^/\/^/" -e "s/$/s$\/d/" /tmp/tmp1$$ > /tmp/tmp2$$
   sed -f /tmp/tmp2$$ /tmp/tmp$$    # Apply it to the word list
  else
   cat /tmp/tmp$$
  fi
  rm -f /tmp/tmp*$$
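
The same idea can be exercised as a filter on a few sample words. This is only a sketch: drop_plurals is a hypothetical name, it expects a list of unique words (one per line) on standard input, and it uses mktemp for its work files, which is an assumption about the system at hand.

```shell
# drop_plurals: hypothetical filter; reads unique words, one per line, and
# removes each word ending in s whose singular form is also in the list.
drop_plurals() {
    words=$(mktemp) stems=$(mktemp) script=$(mktemp)
    cat > "$words"
    # Strip one trailing s; a stem that now appears twice had both forms
    sed -e 's/s$//' "$words" | sort | uniq -d > "$stems"
    # Turn each stem into an anchored delete command such as /^documents$/d
    sed -e 's/^/\/^/' -e 's/$/s$\/d/' "$stems" > "$script"
    sed -f "$script" "$words"
    rm -f "$words" "$stems" "$script"
}
```

Given document, documents, and shell, the filter prints document and shell. Anchoring each generated pattern with ^ and $ keeps it from deleting longer words that merely contain the plural.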

These commands, keyword and plural, can dramatically aid the development of an index for a book, or a reference index for an information archive. First you figure out what is most important in each chapter, and then you combine the results into a preliminary index:

  for file in chapter*
  do
    keyword $file >>/tmp/tmp$$    # Collect the keywords of every chapter
  done
  sort /tmp/tmp$$ > index         # Sort once, after all chapters are in
  rm /tmp/tmp$$

This is just a simple example of the ways that documents can be examined and evaluated using Shell. I hope it has opened your eyes to the possibilities inherent in the flexibility of the Shell. Shell provides some excellent tools for handling documents and preparing them for output on the wide variety of devices available to UNIX. Connecting all of these tools can improve efficiency and reliability: Users type one simple command and it determines how to format each document. Commands can easily be created for each set of nroff or troff macros. Documentation is one of the strengths of UNIX. Full-screen word processors have largely displaced nroff and troff, but there are still hundreds of thousands of nroff users who need simple interfaces to its facilities. Use the Shell to fill those needs.

ISBN 0471168947

Wiley Computer Publishing