search - find keys using fast access methods
SYNOPSIS
search [-m[sbihr]] [-fc -Dn -h -s] [-l [2> location]]
search -m[sbh] [[options] -n -x +x -i] tableorlist keycol... < keytable
search -mi [[options] -n -x +x -i] tableorlist < keytable
search -mr [options] tableorlist < record_numbers
DESCRIPTION
search
finds a row quickly
by searching
tableorlist
for a row or rows
which match the key(s) supplied on the standard input.
search
It is a high speed join in which one table is only input keys and
the output is only the rows of the other table that match.
The table or list file must have been previously indexed using the /rdb
index
command.
If the table or list file is not large (under several hundred
rows), it is probably not worth the effort to index it:
in that case, use
row.
search
is for high speed or interactive access using indexed keys, while
row
is slower, but can retrieve rows that match complex logical conditions
and regular expressions.
search
reads its input in three ways:
(1) interactively, from the keyboard;
(2) from a
keytable,
using the
"`< file'"
form; and
(3) from a pipe.
Exactly how
search
looks for keys depends upon the method specified on the command line.
For some methods it first searches
another table called a secondary index file that
was built by the
index
program.
In one case, sequential,
search
looks at every row.
To determine which method to use,
try them all and
use the UNIX
time
command or the
timesearch
shell script
to find out how long each method takes.
METHOD OPTIONS
- -mr
- record number.
Record numbers are integers which specify
the offset in the file of the row to retrieve.
For example, if the record number is 5,
search -mr
will get the fifth row in the file.
search
uses a secondary file that is fixed length and
has the offset for each record in the big table.
The record method is a fast way to retrieve a row,
but only if its position in the file is known and maintained.
Another program,
record,
does the same thing without needing an index,
but it's slower on large files.
Because
search
is looking for rows by
position rather than content, the
-n, -x, +x,
and
-i
options do not apply.
- -ms
- sequential or linear.
A sequential search looks at each record in
the big table in first row to last row order.
It is the default method and is the same as using no
-m?
switch at all.
It is provided as a benchmark for timing other methods.
If the file is not big enough to
make the other access methods worthwhile, use sequential
search
or
row
to save the overhead of
index.
Sequential can search multiple key columns,
and any column can be searched.
Searching with the
sequential method is different from general
UNIX pattern matching programs like
grep,
or
bm
because it compares chosen columns instead of
scanning entire rows.
- -mb
- binary method. A binary search first looks
at the middle record of the file.
If the key column value is greater than the key that
search
is looking for,
search
will look at the record one quarter of the way through the file.
search
will continue to divide the part of the file it looks at
in half, until the row is found, or not.
The binary method
requires that the file be sorted on the key column.
If
tableorlist
is to be searched using the
-x, +x
or
-n
switches, then those switches must also
be used with the
index
command prior to searching.
The following table shows how many file accesses
are required for files with various numbers of rows --
from small to huge.
Accesses Records Number
-------- ------- ------
10 thousand 1,000
20 million 1,000,000
30 billion 1,000,000,000
40 trillion 1,000,000,000,000
- -mi
- inverted method. An inverted search uses a secondary
file created by the
index
program.
This index file has two columns.
One is the key column from
the table which is sorted by the index program so that
search
can do a binary search for the key.
If
the key is found,
search
looks at the adjacent offset column for the address of the row
in the big table. With this address,
search
retrieves the row in a single disk read.
The inverted method takes a little longer than binary,
but sorting the index file is much faster than sorting the big
file.
If
tableorlist
is to be searched using the
-x, +x
or
-n
switches, then those switches must also
be used with the
index
command which creates the
tableorlist.i
secondary file.
- -mh
- hash method.
A hash search computes an address from the key and
looks in the secondary file at that address, where it
finds an offset in the main file.
search
retrieves the row at the main file offset and
compares the keys.
If they match, the row is printed.
If the offset is zero,
then the keys do not match and the row is not in the file.
If another key hashes to the same address (called a
collision),
then
search
will keep looking at
subsequent entries in the secondary table until it
finds a match, or a zero offset.
A hash search usually takes two disk accesses when the
hash table is half full.
This is why hash tables are set up to be
twice as large as the number of records we are hashing.
As the data table grows, it should be reindexed to keep
the hash speed down.
Hash is a good method for tables that must remain static:
that is, they should not change significantly over time.
It is important that other options used with the
index
command which creates the hash table must also be used
in the search command.
This is because the input key is defined by the option
prior to hashing: for example, if
-x
is specified, then the key is stripped of excess blanks
and converted to lower case before hashing.
If
-x
is not used in the subsequent
search
command, then the hash value will be different and
the key will not be found, even though it exists.
This applies to the
+x
and
-n.
options as well.
The
-i
switch is the exception to this rule:
it may be specified on the
search
command line to suppress printing of blank keys
even if it was not used on the
index
command line.
OTHER OPTIONS
- -fc
- use c (instead of TAB) as the input column
separation character.
- -Dn
- turn on debugging at level n, or 1 if n
is not specified.
- -h
- suppress printing of the table head line.
- -s
- only a single row is searched for. This will speed up
the search because it quits as soon as the first match is found.
- -x
- allow partial matching and ignore case (like Berkeley
look).
By default, non-blank values must match exactly.
- +x
- exact match.
The default mode ignores blanks when comparing keys; using
+x
forces byte-for-byte matching of key and column contents.
- -i
- ignore blank key matches.
- -l
- print the location of the record and also the
location of the index record in the index table.
This location, in the form:
"start end xstart xend,"
can be used by
replace,
lock
and
unlock.
By default,
search
prints this location on the
"standard error"
to separate it from the output table.
seek
is a shell program which
uses this option to find the location and output the record
into another file at the same time.
Turning on this option also turns on the
-s
option because only one row (per user)
may be locked and/or replaced at a time.
EXAMPLE
$ cat inventory
Item Amount Cost Value Desc
---- ------ ---- ----- -----------
1 4 50.1 rubber gloves
03 23 19 437 plates
001 100 5.23 500 test tubes
2 99 24 2376 cleaning cloth
4.0 89 147 13083 bunsen burners
005 5 175 875 scales
1.000 5 80 400 clamps
$ index -mi -n inventory Item
$ cat inventory.i
Offset Item
--------- -------
63 1
107 001
219 1.000
135 2
87 03
166 4.0
198 005
$ echo 1 | search -mi -n inventory Item
Item Amount Cost Value Desc
---- ------ ---- ----- -----------
1 4 50.1 rubber gloves
001 100 5.23 500 test tubes
1.000 5 80 400 clamps
SEE ALSO