
Search
Searching the cache archives with the WebAssistant requires that the
archives have been indexed first.
The search is available:
- via the URL http://www.MM3Tools.de/WebAssistant/search
- via the command ls.
First, select a cache archive. Note that only indexed archives are available. You can search for words, domains and URLs. Multiple search criteria are combined with a logical AND.
Search Terms
Search for words
- Equal
Input: Search-Word
Output: pages with words which are equal to the Search-Word.
- Word beginning
Input: Search-Word*
Output: pages with words which start with the Search-Word.
- Word ending
Input: *Search-Word
Output: pages with words which end with the Search-Word.
- Include
Input: *Search-Word*
Output: pages with words which include the Search-Word.
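The four word patterns above can be illustrated with a minimal Python sketch. It is not the WebAssistant's actual implementation; the function name word_matches is hypothetical, and case-insensitive comparison is an assumption.

```python
def word_matches(pattern: str, word: str) -> bool:
    """Return True if `word` satisfies a search term such as 'cach*'."""
    # Assumption: the comparison ignores case.
    pattern, word = pattern.lower(), word.lower()
    leading = pattern.startswith("*")   # '*' in front: any prefix allowed
    trailing = pattern.endswith("*")    # '*' at the end: any suffix allowed
    core = pattern.strip("*")           # the Search-Word itself
    if leading and trailing:
        return core in word             # Include
    if trailing:
        return word.startswith(core)    # Word beginning
    if leading:
        return word.endswith(core)      # Word ending
    return word == core                 # Equal
```

For example, word_matches("cach*", "caches") is true, while word_matches("cache", "caches") is false because the plain form requires equality.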
Search for a domain
- Equal
Input: site:Search-Domain
Output: pages from the Search-Domain.
- Domain beginning
Input: site:Search-Domain*
Output: pages from domains which start with the Search-Domain.
- Domain ending
Input: site:*Search-Domain
Output: pages from domains which end with the Search-Domain.
- Include
Input: site:*Search-Domain*
Output: pages from domains which include the Search-Domain.
Search in a part of URL
- Input: url:Search-URL
Output: pages which include the Search-URL as a part of their URL.
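As a rough illustration of how word, site: and url: criteria could be combined with AND, consider the following Python sketch. It is not the WebAssistant's code: the function names are hypothetical, and it assumes that criteria are separated by spaces, that matching ignores case, and that the "domain" is the host name of the URL.

```python
from urllib.parse import urlsplit


def wildcard_match(pattern: str, text: str) -> bool:
    """Match text against a term with an optional leading/trailing '*'."""
    core = pattern.strip("*")
    if pattern.startswith("*") and pattern.endswith("*"):
        return core in text
    if pattern.endswith("*"):
        return text.startswith(core)
    if pattern.startswith("*"):
        return text.endswith(core)
    return text == core


def page_matches(query: str, url: str, words: set[str]) -> bool:
    """Return True only if every criterion in the query matches (AND)."""
    # Assumption: the domain is the host name part of the URL.
    domain = (urlsplit(url).hostname or "").lower()
    for term in query.lower().split():
        if term.startswith("site:"):
            ok = wildcard_match(term[len("site:"):], domain)
        elif term.startswith("url:"):
            ok = term[len("url:"):] in url.lower()
        else:
            ok = any(wildcard_match(term, w) for w in words)
        if not ok:
            return False    # one failed criterion rejects the page
    return True
```

With a page at http://www.example.org/docs/index.html, the query "arch* site:*example.org" would match if the page contains the word "archive", whereas "site:example.org" alone would not, because the plain form requires the domain to be equal.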
Output of a Search
The result of a search is displayed as a hit list. The files (pages)
are listed with their URL, size and archiving date, as well as a
200-character excerpt.
Text files are additionally marked with [TXT].
For HTML files, the title and the description are shown as well.
The files are sorted alphabetically by URL.
Several files from the same domain are listed intentionally. Files with
a red archiving date were updated after their index was built.
The Marker link displays the page with the Search-Words
highlighted. Highlighting isn't
possible for all files.
Information about the Index
Word Histogram
The histogram lists the words in sorted order, together with the number of files in which each word occurs.
For an alphabetical sort, use the keyword wordAlphabetical with one of the following inputs.
- All
wordAlphabetical:*
- Equal
wordAlphabetical:Search-Word
- Word beginning
wordAlphabetical:Search-Word*
- Word ending
wordAlphabetical:*Search-Word
- Include
wordAlphabetical:*Search-Word*
To sort by frequency, use the keyword wordFrequency.
To sort by word length, use the keyword wordLength.
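The idea behind the word histogram can be sketched in Python as follows. This is only an illustration under stated assumptions: the function word_histogram and its input format are hypothetical, and the sort directions for frequency (most frequent first) and word length (shortest first) are guesses, since the text does not specify them.

```python
from collections import Counter


def word_histogram(files: dict[str, set[str]], order: str) -> list[tuple[str, int]]:
    """Count, for each word, the number of files containing it, then sort."""
    counts = Counter()
    for words in files.values():
        counts.update(words)    # a set, so each file counts a word once
    if order == "wordAlphabetical":
        key = lambda item: item[0]
    elif order == "wordFrequency":
        key = lambda item: (-item[1], item[0])       # assumed: most frequent first
    elif order == "wordLength":
        key = lambda item: (len(item[0]), item[0])   # assumed: shortest first
    else:
        raise ValueError(f"unknown keyword: {order}")
    return sorted(counts.items(), key=key)
```

Each entry of the returned list pairs a word with the number of files in which it occurs, which corresponds to the two columns of the histogram.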
Domain Histogram
The histogram lists the domains alphabetically, together with the number of files belonging to each domain. For this, use the keyword siteAlphabetical.
- All
siteAlphabetical:*
- Equal
siteAlphabetical:Search-Domain
- Domain beginning
siteAlphabetical:Search-Domain*
- Domain ending
siteAlphabetical:*Search-Domain
- Include
siteAlphabetical:*Search-Domain*
To sort by frequency, use the keyword siteFrequency.

Indexing
Searching the cache archives with the WebAssistant
requires an index. Text and HTML files (pages) are
indexed. The Indexer's algorithm is essentially language
independent. Capital letters are always converted to the corresponding
lower-case letters, and only Latin characters as well as
some special characters of European languages are supported.
Please inform Tools
if you need another language.
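The lower-casing and character restriction described above might look roughly like this in Python. The exact character set the Indexer accepts is not documented here, so the set used below (a-z plus the lower-case letters of the Latin-1 range) is an assumption, as are the names WORD_RE and tokenize.

```python
import re

# Assumed character set: a-z plus Latin-1 lower-case letters such as ä, é, ß.
WORD_RE = re.compile(r"[a-z\u00df-\u00f6\u00f8-\u00ff]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words of supported characters."""
    return WORD_RE.findall(text.lower())
```

Under these assumptions, "Cache" and "CACHE" both end up in the index as "cache", so a search does not have to distinguish between them.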
Script file
You start the indexing with one of the following script files:
- For Microsoft operating systems, the BAT file:
MM3-Indexer.bat
- For the Linux and UNIX operating systems, the script:
MM3-Indexer.sh
- For the Mac OS X operating system from Apple:
MM3-Indexer.sh
Configuration of the Indexer
You can configure the following for indexing:
- Selection of the cache archive to be indexed
- Specification of the minimal word length
  Only words that reach the minimal word length are included in the index. Put simply, the word length is the number of characters in a word.
- Display of the positive and negative word list
  - Negative word list
    These words aren't included in the index.
  - Positive word list
    These words are included even if they are shorter than the minimal word length.
Start the indexing once you have made the settings. The time required depends on the size of the archives, so indexing can take some time. Please close the MM3-WebAssistant before indexing.
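The interplay of the minimal word length and the two word lists can be sketched as follows. This is an illustration, not the Indexer's code: the function should_index is hypothetical, and giving the negative list priority when a word appears in both lists is an assumption.

```python
def should_index(word: str, min_length: int,
                 positive: set[str], negative: set[str]) -> bool:
    """Decide whether a word is added to the index."""
    if word in negative:
        return False                    # negative list: never indexed
    if word in positive:
        return True                     # positive list: indexed even if short
    return len(word) >= min_length      # otherwise the length rule applies
```

With a minimal word length of 3, a negative list containing "the" and a positive list containing "ls", the words "cache" and "ls" would be indexed, while "the" and "at" would not.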
Log output
The output of the Indexer shows:
- The indexed cache archives
- The number of files still to be indexed
- The domain currently being indexed
- The time elapsed so far
- A progress bar
- Summary statistics about the indexing
Out of Memory
The memory required depends on the size of the archives and the
chosen minimal word length. If the Indexer needs more memory, you can
increase the memory available to the program in the script file.
Alternatively, you can split the cache archive into
several archives or increase the minimal word length.