GLIMPSE
A tool to search entire file systems
Introduction
Glimpse is a very powerful indexing and query system that allows you to
search through all your files very quickly. It can be used by
individuals for their personal file systems as well as by organizations
for large data collections. Glimpse is the default search engine in
Harvest.
Glimpse is now at version 3.0, which improves on the original version
in many ways.
The Glimpse package contains several programs, the most important of which
are glimpse, glimpseindex, agrep, and glimpseserver.
To index all files in the directory tree rooted at DIR, you simply say
glimpseindex DIR
(E.g., glimpseindex ~ indexes all your files.)
Afterwards, glimpse can
search through all these files much the same way as agrep (or any other
grep), except that you don't have to specify file names and the search
is fast. For example,
glimpse -1 unbelievable
will find all occurrences (in all your files!) of "unbelievable"
allowing one spelling error;
glimpse -F mail arizona
will find all occurrences of "arizona" in all files with "mail" somewhere
in their name;
glimpse 'Arizona desert;windsurfing'
will find all lines that contain both "Arizona desert" and "windsurfing".
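The last two queries can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not how glimpse works internally: the names search and line_matches_all are hypothetical, patterns are treated as plain case-sensitive substrings, and no index is involved.

```python
def line_matches_all(line, query):
    """Return True if every ';'-separated pattern occurs in the line."""
    patterns = [p for p in query.split(";") if p]
    return all(p in line for p in patterns)


def search(files, query, name_filter=None):
    """Scan an in-memory collection of files for matching lines.

    files maps a file name to its list of lines. If name_filter is given,
    only files with that string somewhere in their name are searched,
    which mimics the effect of -F described above.
    """
    hits = []
    for name, lines in files.items():
        if name_filter is not None and name_filter not in name:
            continue
        for line in lines:
            if line_matches_all(line, query):
                hits.append((name, line))
    return hits
```

For example, search(files, "Arizona desert;windsurfing") returns only lines that contain both strings, and search(files, "arizona", name_filter="mail") looks only at files with "mail" in their name.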
Glimpse supports three types of indexes: a tiny one (2-3% of the
size of all files), a small one (7-9%), and a medium one (20-30%).
The larger the index the faster the search.
For most applications, the small index (glimpseindex -o) is the best choice.
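To get a feel for these numbers, the following sketch turns the percentages quoted above into a rough size range. The function name and the table are hypothetical, and the result is only an estimate; actual index sizes depend on the data.

```python
# Index size ranges, as percentages of the total size of the indexed files,
# taken from the figures quoted above.
INDEX_PERCENT = {"tiny": (2, 3), "small": (7, 9), "medium": (20, 30)}


def estimate_index_size(total_bytes, kind):
    """Return a (low, high) estimate of the index size in bytes."""
    low, high = INDEX_PERCENT[kind]
    return (total_bytes * low // 100, total_bytes * high // 100)
```

For 1 GB of files (10**9 bytes), the small index would be expected to take roughly 70 to 90 MB, and the tiny one roughly 20 to 30 MB.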
Glimpse supports most of agrep's options (agrep is our powerful version
of grep, and it is part of glimpse) including approximate matching
(e.g., finding misspelled words), Boolean queries, and even some
limited forms of regular expressions.
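As an illustration of what "allowing one spelling error" means, the sketch below counts errors as single-character insertions, deletions, and substitutions. It uses a textbook dynamic-programming edit distance on whole words, which is an assumption made for clarity; it is not agrep's algorithm, and agrep matches patterns inside lines rather than comparing whole words. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Number of single-character insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def matches_with_errors(word, pattern, max_errors):
    """Return True if word is within max_errors edits of pattern."""
    return edit_distance(word, pattern) <= max_errors
```

Under this definition, "unbelieveable" matches "unbelievable" with one error (one extra letter), while "unbeleivable" needs two, because swapping two adjacent letters counts as two substitutions.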
GlimpseHTTP
GlimpseHTTP is a collection of tools that lets you use Glimpse to search
your files through an HTTP interface. It is a good alternative to WAIS
search scripts.
To join the glimpse mailing list, send mail to
glimpse-request@cs.arizona.edu
This is the "official" version 3.0
2.1 ---> 3.0
- added a data structure (in .glimpse_turbo) that speeds up queries
using -w and -i considerably for large indexes. It is meant mostly for
servers using glimpse (e.g., Harvest and glimpseHTTP servers),
but it benefits everyone. With this "turbo" option, typical queries
take less than a second even for very large indexes.
This was so successful that we made it the default rather than an
option (it used to be -T in some earlier versions).
If the .glimpse_turbo file is deleted, glimpse will still work properly
(but glimpseindex -f and -a require it).
- incremental indexing is now fully supported (even for -b). Deletion
from the index is supported. glimpseindex -d filename(s) completely
deletes the files from the index; glimpseindex -D filename(s) deletes
the files only from the file list.
- the index has been improved in several ways (transparently except for
speed and space). As a result, indexes built with earlier versions of
glimpseindex will not work with 3.0 -- you must reindex.
- several options were added to glimpseindex and glimpse:
  - glimpseindex -E indexes all files without attempting to run the
    filetype filtering (but excluded files or suffixes still apply).
  - glimpse -Q extends -N in a nice way giving much more information
    about the matches in the index.
  - glimpse -L has more options: -L x | x:y | x:y:z
    If one number is given (x), it is a limit on the total number of
    matches: glimpse outputs only the first x matches.
    If two numbers are given (x:y), then y is an added limit on the
    total number of files.
    If three numbers are given (x:y:z), then z is an added limit on the
    number of matches per file.
    If any of x, y, or z is set to 0, that limit is ignored (in other
    words, 0 = infinity in this case); for example, -L 0:10 will output
    all matches in the first 10 files that contain a match.
(There are also some undocumented-as-yet options. We are running out
of letters. Only -j and -Y are not used!)
- glimpse 3.0b still has a LOT of makefiles (one per architecture / OS).
We hope to include autoconf support for glimpse in the future, but the
existing makefiles should be sufficient for most purposes.
- glimpseserver restarts by exec'ing itself after a 20-second delay when
it receives the signal SIGUSR2 (31), so there is no need to start up a
new glimpseserver after each indexing (the delay gives the OS time to
garbage collect the TCP port).
- improved support for the ISO 8859 Latin character set.
- several bugs were fixed, and the whole package is now more portable.
Binaries and makefiles for the following platforms are now available:
AIX-3.2.5, HPPA, HPMC68K, IBM-RS6000, Linux, SGI. (Binaries and
makefiles for SUNOS4.1.1, SUNOS4.1.3, SOLARIS 5.3 and DEC OSF/1 (ALPHA)
are available as usual.) Watch this space for more ports.
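The -L x:y:z limits described in the change list can be easier to follow as code. The sketch below is one reading of that description, not glimpse's implementation: parse_limit and apply_limits are hypothetical names, and matches are assumed to arrive as (file name, line) pairs in output order.

```python
def parse_limit(spec):
    """Parse 'x', 'x:y', or 'x:y:z' into (x, y, z); missing parts become 0."""
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError("expected x, x:y, or x:y:z with non-negative numbers")
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def apply_limits(matches, spec):
    """Filter (file name, line) pairs according to an -L style spec.

    x limits total matches, y limits the number of files, z limits
    matches per file; a value of 0 means no limit.
    """
    x, y, z = parse_limit(spec)
    per_file = {}  # file name -> number of matches kept so far
    kept = []
    for name, line in matches:
        if x and len(kept) >= x:
            break  # total match limit reached
        if name not in per_file:
            if y and len(per_file) >= y:
                continue  # this file is beyond the file limit
            per_file[name] = 0
        if z and per_file[name] >= z:
            continue  # per-file limit reached for this file
        per_file[name] += 1
        kept.append((name, line))
    return kept
```

With this reading, a spec of "0:10" keeps every match from the first 10 files that contain one, and "0:0:1" keeps the first match from every file.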
Glimpse was developed by Udi Manber,
Sun Wu, and Burra Gopal.
glimpse@cs.arizona.edu