UNIX SHELL PROGRAMMING, FOURTH EDITION
# Output file(s)
if [ $# -gt 0 ] # Check parameter count
then
    # Save the parameters entered by the user
    while [ "`echo $1 | cut -c1`" = "-" ] # Flags
    do
        parameters="${parameters} $1" # Save flag
        shift # Delete flag
    done
    # Check the files and save their names
    while [ "$1" ] # While more arguments
    do
        if [ -r "$1" ] # Readable file?
        then
            files="${files} $1" # Save file name
        else
            echo "file $1 not found"
        fi
        shift # Delete argument
    done
    # Determine the input filters
    # (the arguments have been shifted away, so search $files;
    # /dev/null keeps grep from reading standard input if $files is empty)
    tblcnt=`grep -c "^\.TS" $files /dev/null | cut -f2 -d: | sort -nr | head -1`
    gathcnt=`grep -c "^~" $files /dev/null | cut -f2 -d: | sort -nr | head -1`
    inputcnt=0
    if [ $tblcnt -gt 0 ]
    then
        inputcnt=`expr $inputcnt + 1`
    fi
    if [ $gathcnt -gt 0 ]
    then
        inputcnt=`expr $inputcnt + 2`
    fi
    case $inputcnt in
    0)
        inputfilter="cat $files | "
        ;;
    1)
        inputfilter="tbl $files | "
        ;;
    2)
        inputfilter="gath $files | "
        ;;
    3)
        inputfilter="gath $files | tbl | "
        ;;
    *)
        echo "Error in output: filters $inputfilter"
        ;;
    esac
    case $TERM in
    vt100|5420|tv970)
        parameters="$parameters -u -rO0 -rW79 "
        outputfilter=" | col | sed -e 's/\b.//g' | uniq"
        ;;
    lp|620|630)
        parameters="-rO8 -rW79 $parameters"
        outputfilter=""
        ;;
    ti|700|745)
        parameters="-rO0 -rW79 $parameters "
        outputfilter=" | col"
        ;;
    esac
    eval "${inputfilter} nroff ${macros:=-cm} ${parameters} \
    ${outputfilter}"
else
    echo "No files specified"
fi
The actual makeup of the output command will
vary with the types of terminals in use, but the advantage of typing the following simple
command to print files should be obvious:
output document
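The filter selection inside output works like a two-bit flag: table requests add 1 and gath lines add 2, so the sum identifies which combination of filters is needed. The following is a minimal sketch of just that step. The function name choose_input_filter is hypothetical and is not part of the original script; it only prints the filter names rather than running them.

```shell
# choose_input_filter (hypothetical helper, for illustration only)
# $1 = number of lines that start a table, $2 = number of gath lines
choose_input_filter() {
    inputcnt=0
    if [ "$1" -gt 0 ]
    then
        inputcnt=`expr $inputcnt + 1`    # tables present
    fi
    if [ "$2" -gt 0 ]
    then
        inputcnt=`expr $inputcnt + 2`    # gath lines present
    fi
    case $inputcnt in
    0) echo "cat" ;;
    1) echo "tbl" ;;
    2) echo "gath" ;;
    3) echo "gath | tbl" ;;
    esac
}
```

For example, choose_input_filter 4 0 reports that only tbl is needed, while choose_input_filter 4 2 reports that both filters are needed.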
Once formatted, documents can be spooled for output to any number of devices or line
printers.
A line printer spooler (lp, or lpr for csh users) was added to UNIX Version 4 and
later versions. Lp allows the system administrator to define the types of
printers on the system. Each user can spool files to a printer
without actually logging on, and the printout can be picked up later at the user's
convenience. Lp can even send the user mail when the file finishes printing.
To use lp, the printers have to be defined
and labeled by the system administrator. Then, documents can easily be spooled
by creating a command to make a few decisions for the user. The spooler must
decide whether to print an existing file or to execute a command and print its
output. Using lp (sh) or lpr (csh), documents can be formatted
with the output or pr command, and printed by the line printer:
sh:                     csh:
output file | lp        output file | lpr
pr -o8 file | lp        pr file | lpr -i8
                        troff -t troff_file | lpr -t
In the first example, the output command
will format the document file and spool the resulting file. In the second example,
pr will format the text file and then print it, indented by eight spaces.
The last (csh) example formats the document using troff; lpr then
converts the troff (-t) input into PostScript for the receiving
printer. Formatting requests that are more complex than those provided by
output and pr will require the use of other miscellaneous filters.
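The decision described above (print an existing file, or run a command and print its output) can be sketched as a small function. This is only an illustration under stated assumptions: the function name spool and the SPOOLER variable are hypothetical, and the default of lp should be changed to lpr on systems that use the Berkeley spooler.

```shell
# spool (hypothetical wrapper, for illustration only)
# SPOOLER is an assumed override variable; it defaults to lp
spool() {
    spooler=${SPOOLER:-lp}
    if [ -r "$1" ]
    then
        $spooler "$1"        # A readable file: print it as is
    else
        "$@" | $spooler      # Otherwise run the command and print its output
    fi
}
```

With this in place, spool report would print the file report, and spool pr -o8 report would print the output of pr. Setting SPOOLER=cat is a convenient way to try it without a printer.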
To check on the status of print jobs, just
enter the command lpstat (csh: lpq). This command checks the
queues and lets you know if your job has finished printing.
Figures Y.3 and Y.4 show the various filters available for handling documents with
nroff and troff.
Troff uses the same input filters as
nroff with one exception: cw, the constant width preprocessor.
Cw uses some additional macros to handle special output requirements.
Each of these filters is handy for special circumstances, but most users will
rarely need them.
Figure Y.3 Nroff Input and Output Filters
Figure Y.4 Troff Input and Output Filters
There are a number of commands that work with nroff text: man, mm,
mmt, and deroff. Man formats manual pages using a unique set
of macros. It executes nroff with a variety of options to format the
output. Mm invokes nroff using the memorandum macros
(System V). It can do much of what the previously developed Shell commands, such
as output, can do, but it lacks the robustness available with the
Shell. Mmt typesets slides and viewgraphs (System V). Deroff
removes all nroff and troff macros from a file, which
occasionally can be useful.
In Berkeley systems, the formatting macro packages
are ms and me. Ms handles standard text processing. Me
provides multicolumn capabilities for formal academic papers or for
publication.
Spell is one of the most useful document processing tools. It finds most of
the spelling errors in a document. Spell produces its list of errors one
word per line. If there are extensive errors, the list will scroll off the screen
quickly. A solution is to print the output of spell horizontally:
spell document | pr -4
The Writer's Workbench facility also provides some
excellent tools for examining and improving documents. Commands such
as style and prose can be beneficial to writers. Shell commands
can be created to run all of these commands against a document and print or
store the results. These Shells should be created as needed.
Let's use Shell to build an analyzer of our
own. One of the things I find most useful is to know what the keywords are in
a document: the words that are repeated most frequently. These words often capture
the true content of a document. How could we do this using Shell? First, using
deroff, we would want to remove all of the nroff or troff
formatting commands:
deroff $1
Then, we would want to transform the file in the following ways:
- Convert uppercase to lowercase (so that Troff and
troff would be counted as the same word).
- Convert blanks and tabs to newlines (to create one word per
line).
- Convert all punctuation into newlines (so that "sentence." and
"sentence" will be counted as the same word).
We can accomplish all of this using tr:
deroff $1 | \
tr "[A-Z \t#',.;:()]" \
"[a-z\012\012\012\012\012\012\012\012\012\012]"
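To see what the tr step does on its own, it can be wrapped in a function and fed a line of text. The function name to_words is hypothetical; the two character sets are quoted so that the Shell does not try to interpret the brackets, the parentheses, or the single quote.

```shell
# to_words (hypothetical name): reads text on standard input and writes
# one lowercase word per line; punctuation becomes a newline as well
to_words() {
    tr "[A-Z \t#',.;:()]" \
       "[a-z\012\012\012\012\012\012\012\012\012\012]"
}

# Example:
#   echo "Troff and troff." | to_words
# prints troff, and, troff on separate lines, followed by an empty
# line where the period was.
```

The empty lines left behind by punctuation are harmless here, because a later grep step keeps only lines that contain a letter.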
Next, we can select just the words (using grep), sort the word list
so that identical words are adjacent, count the words using uniq, and then
sort the counts in descending numeric order:
deroff $1 | \
tr "[A-Z \t#',.;:()]" \
"[a-z\012\012\012\012\012\012\012\012\012\012]" | \
grep "[a-z]" | sort | uniq -c | sort -nr
If we executed this command, we'd get a listing of keywords that includes the
following:
127 the
86 a
32 an
31 shell
27 programming
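The counting step depends on ordering: uniq -c only merges adjacent identical lines, so the words must be sorted before they are counted, and the numeric sort has to come after the counts exist. A small sketch, with the hypothetical function name count_words, makes this easy to try on a short list:

```shell
# count_words (hypothetical name): reads one word per line on standard
# input and writes "count word" lines, most frequent word first
count_words() {
    grep "[a-z]" |     # Keep only lines that contain a letter
    sort |             # Bring identical words together
    uniq -c |          # Count each run of identical words
    sort -nr           # Largest count first
}

# Example:
#   printf 'shell\nnroff\nshell\n' | count_words
# lists shell with a count of 2 ahead of nroff with a count of 1.
```

The exact spacing in front of each count varies between versions of uniq.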
Words such as the and a are the most common words in the English
language. We might want to use sed to eliminate these words. This will
help illuminate the true keywords. Similarly, the numbers are only useful to get
the file in order. We can use cut to eliminate them (the column offset assumes
that uniq -c prints a four-character count followed by a blank; adjust it to
match your system):
deroff $1 | \
tr "[A-Z \t#',.;:()]" \
"[a-z\012\012\012\012\012\012\012\012\012\012]" | \
grep "[a-z]" | sort | uniq -c | sort -nr | \
cut -c6- | sed -f conjunctions
The file, conjunctions, would contain entries for all filler words (a, an,
and, the, and so on). It would tell sed to delete them as follows:
/^a$/d
/^and$/d
/^but$/d
/^the$/d
. . .
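Typing the delete commands by hand gets tedious as the list grows. One possible shortcut, sketched here with the hypothetical function name make_stoplist, is to keep a plain list of filler words and let sed wrap each one in the /^word$/d form shown above:

```shell
# make_stoplist (hypothetical helper): reads filler words, one per
# line, and writes a sed delete command for each of them
make_stoplist() {
    sed -e 's/^/\/^/' -e 's/$/$\/d/'
}

# Example:
#   printf 'a\nan\nand\nbut\nthe\n' | make_stoplist > conjunctions
```

The first expression puts /^ in front of the word and the second appends $/d, so the word the becomes /^the$/d.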
With all of the filler words out of the way, we might want to capture just the
first ten keywords. We could use head to do this for us:
deroff $1 | \
tr "[A-Z \t#',.;:()]" \
"[a-z\012\012\012\012\012\012\012\012\012\012]" | \
grep "[a-z]" | sort | uniq -c | sort -nr | \
cut -c6- | sed -f conjunctions | \
head
Using this command on this chapter, we might get the following results:
document
documents
nroff
documentation
troff
shell
printer
terminal
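The cut step in keyword depends on a fixed column position, and the width of the count that uniq -c prints differs from system to system. If the keywords come out with leading blanks or missing letters, one alternative is to strip the count by pattern instead of by position. The function name strip_counts is hypothetical; this is a sketch of a substitute for the cut step, not part of the original command.

```shell
# strip_counts (hypothetical name): removes the leading blanks, the
# count, and the blank after it from each line of uniq -c output
strip_counts() {
    sed -e 's/^ *[0-9][0-9]* //'
}

# Example:
#   sort words | uniq -c | sort -nr | strip_counts
```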
From these results you'll notice that keyword
cannot differentiate between singular and plural words (document and documents).
To overcome this, we could write another Shell program, plural, to find and remove
the words that end in s when the singular form is also in the list:
# plural
cat $* > /tmp/tmp$$
sed -e 's/s$//' /tmp/tmp$$ | sort | uniq -d > /tmp/tmp1$$
# Find plurals
if [ -s /tmp/tmp1$$ ] # Plurals were found!
then # Create sed file to delete them
    sed -e 's/^/\/^/' -e 's/$/s$\/d/' /tmp/tmp1$$ > /tmp/tmp2$$
    sed -f /tmp/tmp2$$ /tmp/tmp$$ # Apply it to the word list
else
    cat /tmp/tmp$$
fi
rm -f /tmp/tmp$$ /tmp/tmp1$$ /tmp/tmp2$$
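A worked example may make the plural idea easier to follow. The sketch below is a variant that reads the word list from standard input; the function name drop_plurals and the temporary file names are hypothetical. It relies on the same trick: once a trailing s is removed, any stem that appears twice must have been present in both singular and plural form.

```shell
# drop_plurals (hypothetical variant of plural): reads one word per
# line on standard input and omits plurals whose singular is present
drop_plurals() {
    words=/tmp/words$$
    script=/tmp/script$$
    cat > "$words"
    # Duplicate stems become sed commands such as /^documents$/d
    sed -e 's/s$//' "$words" | sort | uniq -d |
        sed -e 's/^/\/^/' -e 's/$/s$\/d/' > "$script"
    sed -f "$script" "$words"    # An empty script passes every word through
    rm -f "$words" "$script"
}

# Example:
#   printf 'document\ndocuments\nnroff\nclass\n' | drop_plurals
# keeps document, nroff, and class, and drops documents.
```

Note that class survives: it ends in s, but no word clas is in the list, so it is not treated as a plural.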
These commands, keyword and plural, can
dramatically aid the development of an index for a book, or a reference index for
an information archive. First you figure out what is most important in each file and
then you combine the results into a beginning table of contents:
for file in chapter*
do
    keyword $file >>/tmp/tmp$$
done
sort /tmp/tmp$$ > index
rm /tmp/tmp$$
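Because several chapters usually share keywords, the combined list tends to contain repeats. One way to handle that, sketched here with the hypothetical function name build_index, is to collect the results first and then sort them once with the -u option so that each keyword appears a single time. The command to run on each file is passed as the first argument, which is an assumption made so that the sketch can be tried without keyword installed.

```shell
# build_index (hypothetical helper): $1 is the command to run on each
# file (keyword, for example); the remaining arguments are the files
build_index() {
    cmd=$1
    shift
    tmp=/tmp/index$$
    : > "$tmp"                   # Start with an empty work file
    for file in "$@"
    do
        $cmd "$file" >> "$tmp"   # Collect the keywords of every file
    done
    sort -u "$tmp"               # Sort once, dropping repeated keywords
    rm -f "$tmp"
}

# Example:
#   build_index keyword chapter* > index
```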
This is just a simple example of the ways that documents can be examined and evaluated
using Shell. I hope it has opened your eyes to the possibilities inherent in the
flexibility of the Shell.
Shell provides some excellent tools for handling
documents and preparing them for output on the wide variety of devices available to
UNIX. Connecting all of these tools can improve efficiency and reliability: Users type
one simple command and it determines how to format each document. Commands can
easily be created for each set of nroff or troff macros.
Documentation is one of the strengths of UNIX. Full-screen
word processors have largely displaced nroff and troff, but there are
still hundreds of thousands of nroff users who need simple interfaces
to its facilities. Use the Shell to fill those needs.
ISBN 0471168947
Wiley
Computer Publishing