Interchange Guides: Search Tutorial

Stefan Hornburg

This documentation is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Abstract

The purpose of this document is to describe the search "subsystem" in Interchange and link together all search-related topics.


Table of Contents

Searching using Swish-e
Swish-e Integration
Setup Searches
Search Examples
Configuration File Examples
Search Reference
ac — mv_all_chars
bd — mv_base_directory
bs — mv_begin_string
ck — mv_cache_key
cs — mv_case
op — mv_column_op
co — mv_coordinate
cv — mv_verbatim_columns
de — mv_dict_end
df — mv_dict_fold
di — mv_dict_limit
dl — mv_dict_look
DL — mv_raw_dict_look
do — mv_dict_order
dr — mv_record_delim
em — mv_exact_match
er — mv_spelling_errors
fc — mv_force_coordinate
ff — mv_field_file
fi — mv_search_file
ft — mv_field_title
fm — mv_first_match
fn — mv_field_names
hs — mv_head_skip
id — mv_index_delim
lb — mv_search_label
lf — mv_like_field
lo — mv_list_only
lr — mv_search_line_return
ls — mv_like_spec
ma — mv_more_alpha
mc — mv_more_alpha_chars
md — mv_more_decade
mi — mv_more_id
ml — mv_matchlimit
mm — mv_max_matches
MM — mv_more_matches
mp — mv_profile
ms — mv_min_string
ne — mv_negate
ng — mv_negate
nh — mv_no_hide
nm — mv_no_more
np — mv_nextpage
ns — mv_next_search
nu — mv_numeric
os — mv_orsearch
pm — mv_more_permanent
ra — mv_return_all
rd — mv_return_delim
re — mv_search_reference
rf — mv_return_fields
rg — mv_range_alpha
rl — mv_range_look
rm — mv_range_min
rn — mv_return_file_name
rr — mv_return_reference
rs — mv_return_spec
rx — mv_range_max
sd — mv_small_data
se — mv_searchspec
sf — mv_search_field
sg — mv_search_group
si — mv_search_immediate
sm — mv_start_match
sp — mv_search_page
sq — mv_sql_query
sr — mv_search_relate
st — mv_searchtype
su — mv_substring_match
tf — mv_sort_field
to — mv_sort_option
un — mv_unique
va — mv_value

Searching using Swish-e

The Swish search module allows you to search index files generated by Swish-e.

Swish-e Integration

To enable any Swish searching, modify your interchange.cfg to add:

Require module Vend::Swish
AddDirective Swish hash
Variable swish Vend::Swish

To configure your catalog to use Swish, modify the appropriate catalog.cfg and add:

Swish command /usr/bin/swish-e
Swish index products/swish-e.db

Setup Searches

To route a search through Swish, set mv_searchtype=swish in the search parameters, or use the shorthand notation st=swish.
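The two-letter codes listed in the Search Reference are shorthand for the full mv_ parameter names. As a rough illustration of that relationship, here is a minimal Python sketch that expands shorthand pairs into their long form. It covers only a handful of the pairs listed in this document, and the function name expand_search_params is hypothetical; it is not part of Interchange.

```python
# A small subset of the shorthand table from the Search Reference.
SHORTHAND = {
    "st": "mv_searchtype",
    "se": "mv_searchspec",
    "sf": "mv_search_field",
    "ml": "mv_matchlimit",
    "rf": "mv_return_fields",
}


def expand_search_params(pairs):
    """Expand (shorthand, value) pairs into (long_name, value) pairs.

    Names that are not in the table are passed through unchanged,
    so already-expanded names survive a second pass.
    """
    return [(SHORTHAND.get(name, name), value) for name, value in pairs]


if __name__ == "__main__":
    for name, value in expand_search_params([("st", "swish"), ("se", "Swish")]):
        print(f"{name}={value}")
```

Both spellings select the same search type; the sketch only makes the equivalence explicit.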

The fields to be returned from Swish to Interchange are configurable, and default to:

mv_return_fields=code score title url mod_date filesize
mv_field_names=code score title url mod_date filesize

These correspond to:

  Field     Swish-e property
  code      swishreccount
  score     swishrank
  url       swishdocpath
  title     swishtitle
  filesize  swishdocsize
  mod_date  swishlastmodified

The date in the mod_date field is returned in the format %Y-%m-%d %H:%M:%S.

You can change that with the date_format option:

Swish date_format "%d %b %Y"

See the time glossary entry for supported format strings.
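To see what the two format strings produce, the following Python sketch renders one timestamp with both. It assumes that date_format accepts strftime-style directives (which the format strings shown here suggest); the sample timestamp and the function name format_mod_date are made up for illustration.

```python
from datetime import datetime

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"  # default format of the mod_date field
CUSTOM_FORMAT = "%d %b %Y"            # value used in the date_format example


def format_mod_date(moment, fmt=DEFAULT_FORMAT):
    """Render a modification time with a strftime-style format string."""
    return moment.strftime(fmt)


if __name__ == "__main__":
    sample = datetime(2004, 3, 9, 14, 5, 7)  # arbitrary sample timestamp
    print(format_mod_date(sample))
    # The month abbreviation produced by %b depends on the active locale.
    print(format_mod_date(sample, CUSTOM_FORMAT))
```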

Search Examples

Simple search for the term Swish:

swish-e -w Swish

The same search, this time specifying the index file:

swish-e -w Swish -f db/xmldocs

You can include properties in the output:

swish-e -w Swish -f db/xmldocs -p purpose

Or search within a property:

swish-e -w purpose=LWP -f db/xmldocs
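If you want to run such searches from a script, the command lines above can be assembled programmatically. The sketch below only builds the argument list, using the three flags shown in the examples (-w, -f and -p); the helper name swish_argv is hypothetical, and the sketch does not execute swish-e.

```python
import shlex


def swish_argv(query, index=None, prop=None, binary="swish-e"):
    """Build a swish-e argument list from the flags used in the examples.

    query -> -w (search words, or property=word)
    index -> -f (index file), omitted when None
    prop  -> -p (property to include in the output), omitted when None
    """
    argv = [binary, "-w", query]
    if index is not None:
        argv += ["-f", index]
    if prop is not None:
        argv += ["-p", prop]
    return argv


if __name__ == "__main__":
    # Print the command line in a form that can be pasted into a shell.
    print(shlex.join(swish_argv("Swish", index="db/xmldocs", prop="purpose")))
```

Passing the result to subprocess.run as a list avoids shell quoting problems with search terms that contain spaces.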

Configuration File Examples

Indexing web sites is straightforward. Swish-e provides a spider script, spider.pl, which is called with the parameters default starting_URL. Create a configuration file similar to the following:

IndexFile db/icdevgroup
IndexDir /usr/local/lib/swish-e/spider.pl
SwishProgParameters default http://www.icdevgroup.org/docs/

Now you can start indexing with swish-e -S prog -c icdevgroup.conf.

Search Reference

ac — mv_all_chars

(0/1, default 1)

escape non-alphanumeric characters in search specification.

bd — mv_base_directory

(directory_name, default ProductDir)

base directory in which to look up text files to search (related option fi).

Directory paths can be absolute, provided that the pathname equals the value of the MV_SEARCH_FILE variable, or that a scratch variable named after the pathname is set to 1. To enable searching in, say, /etc/dict, use either [calcn]$Variable->{MV_SEARCH_FILE} = '/etc/dict'; return[/calcn] or [tmp /etc/dict]1[/tmp].

bs — mv_begin_string

(1/0, default false)

the search string matches only at the beginning of a column.

ck — mv_cache_key

(search_reference_pointer, default none)

not intended for common use. When the more tag is used, this option automatically provides a pointer to the search reference.

cs — mv_case

(0/1, default 0)

case sensitive search.

op — mv_column_op

(rm | eq | tq | aq, default rm)

operation to perform to check field for a match.

For tq and aq matching using the Text::Query module, see Q:.

co — mv_coordinate

(0/1, default 0)

the so-called "coordinated" search allows for multiple search options to be stacked on top of each other.

If the number of search fields (sf options) equals the number of search specs (se options), the search will return items that match all or one of the field-specification blocks (controlled with mv_orsearch). When the two numbers do not match, coordinated mode will be automatically and silently turned off. To force a coordinated search, see mv_force_coordinate.

When coordinated searching is used, case sensitivity, substring matching, negation and other options can be specified multiple times and work on a field-by-field basis, according to the following rules:

  • If only one instance of the option is set, it will affect all fields (search specifications).

  • If the number of instances of the option is greater than, or equal to, the number of search specifications, each instance applies to its own specification. (Any excess trailing instances are ignored.)

  • If more than one instance of the option is set, but fewer than the total number of search specifications, the default, documented setting will be used for trailing search specifications.

  • If a search specification is blank, it will be removed and all case-sensitivity, negation, substring and other options will be adjusted accordingly. If you need to match on a blank string, use quotes ("").
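The first three rules above can be expressed compactly. The following Python sketch is one reading of them, not Interchange's actual implementation; the function name spread_option is hypothetical, and the blank-specification rule is deliberately not modelled.

```python
def spread_option(instances, spec_count, default):
    """Distribute option instances over search specifications.

    instances  -- values given for one option (for example cs), in order
    spec_count -- number of search specifications
    default    -- the option's documented default value
    """
    if len(instances) == 1:
        # A single instance affects every search specification.
        return instances * spec_count
    if len(instances) >= spec_count:
        # One instance per specification; excess trailing ones are ignored.
        return instances[:spec_count]
    # Too few instances: trailing specifications get the default.
    # (With no instances at all, every specification gets the default;
    # that case is an assumption of this sketch.)
    return instances + [default] * (spec_count - len(instances))


if __name__ == "__main__":
    print(spread_option([1], 3, 0))
    print(spread_option([1, 0, 1, 1], 3, 0))
    print(spread_option([1, 1], 4, 0))
```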

cv — mv_verbatim_columns

(/, default )

de — mv_dict_end

(/, default )

df — mv_dict_fold

(/, default )

Make dictionary matching case-insensitive. Ignored unless mv_dict_look is set.

di — mv_dict_limit

(/, default )

dl — mv_dict_look

(/, default )

DL — mv_raw_dict_look

(/, default )

do — mv_dict_order

(/, default )

Make dictionary matching follow dictionary order, where only word characters and whitespace matter. Ignored unless mv_dict_look is set.

dr — mv_record_delim

(record_delimiter, default \n)

delimiter for counting records in search index files. The default, a newline, works well for most line-based index files.

em — mv_exact_match

(0/1, default 0)

require that the search field matches the search specification exactly (as opposed to the default word-based matching, or substring matching with su). The search specification behaves as if it were enclosed in quotes.

er — mv_spelling_errors

(/, default )

fc — mv_force_coordinate

(0/1, default 0)

force coordinated search (enabled with mv_coordinate).

Normally, coordinated mode is automatically turned off when the number of search specifications does not match the number of search fields. With this option, however, instead of disabling coordinated mode, Interchange ensures the number of search specifications does match the number of fields by filling the missing specifications with the last one specified, or by discarding extras.

This option is useful when you want to search for one string in multiple fields with different options.
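A minimal Python sketch of the balancing step described above: missing specifications are filled with the last one given, and extras are discarded. The function name force_coordinate is hypothetical, and the handling of an empty specification list is an assumption of the sketch.

```python
def force_coordinate(fields, specs):
    """Return a list of search specs with exactly one entry per field."""
    if not specs:
        # Assumption: with no specification at all there is nothing to repeat.
        return []
    if len(specs) >= len(fields):
        # Discard extra specifications.
        return specs[:len(fields)]
    # Fill the missing specifications with the last one specified.
    return specs + [specs[-1]] * (len(fields) - len(specs))


if __name__ == "__main__":
    # One search string applied to three fields.
    print(force_coordinate(["title", "artist", "category"], ["Monet"]))
```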

ff — mv_field_file

(header_filename, default none)

specify the name of a file containing a single line with the list of database fields, separated by TABs. This is used when you are searching databases without the "field header" on the first line, but still want to refer to fields by their names.

fi — mv_search_file

(/, default all ProductFiles tables)

tables and/or text files to be searched.

ft — mv_field_title

(/, default )

fm — mv_first_match

(search_result_number, default 1)

return search results from the specified result number onwards. When this option is set, Interchange will return search results starting from the match number specified even if there is only one page of results. If set to a value greater than the total number of matches, it will act as if no matches were found.
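In terms of a plain list of results, the behaviour described above amounts to a slice that starts at a 1-based position. A small Python sketch (the function name from_first_match is hypothetical):

```python
def from_first_match(results, first_match=1):
    """Return results starting at the 1-based position first_match.

    A position beyond the end of the list yields an empty list,
    which mirrors the "no matches found" behaviour.
    """
    start = max(first_match, 1) - 1  # convert to a 0-based index
    return results[start:]


if __name__ == "__main__":
    matches = ["a", "b", "c", "d"]
    print(from_first_match(matches, 3))
    print(from_first_match(matches, 10))
```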

fn — mv_field_names

(/, default )

hs — mv_head_skip

(row_count, default 1 for text files, 0 otherwise)

number of lines to skip at the beginning of a search index or text file. Interchange normally skips one line for text-based searches (st=text) to exclude the header line.

id — mv_index_delim

(field_delimiter, default \t)

delimiter for counting fields in search index files. The default, a TAB character, works well for most line-based index files.
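The options hs, dr and id together describe how a line-based index file is cut into records and fields. The Python sketch below shows that interplay with the documented defaults (newline, TAB, one header line for text files). The function name read_records is hypothetical, and dropping empty records is an assumption made so that a trailing newline does not produce a blank row.

```python
def read_records(text, record_delim="\n", index_delim="\t", head_skip=1):
    """Split text into records and fields, skipping header records.

    record_delim -- corresponds to dr (mv_record_delim)
    index_delim  -- corresponds to id (mv_index_delim)
    head_skip    -- corresponds to hs (mv_head_skip)
    """
    # Assumption: empty records (for example after a trailing newline)
    # are dropped before the header is skipped.
    records = [record for record in text.split(record_delim) if record != ""]
    return [record.split(index_delim) for record in records[head_skip:]]


if __name__ == "__main__":
    sample = "code\ttitle\n1\tWater Lilies\n2\tSunflowers\n"
    for fields in read_records(sample):
        print(fields)
```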

lb — mv_search_label

(/, default )

lf — mv_like_field

(field_name, default none)

perform search similar to SQL "LIKE" functionality. When defined, mv_like_spec is required as well.

lo — mv_list_only

(/, default )

lr — mv_search_line_return

(/, default )

ls — mv_like_spec

(search_specification, default none)

string to search for in mv_like_field. The behaviour of the % character and case-sensitivity depends upon your SQL implementation.

ma — mv_more_alpha

(/, default )

mc — mv_more_alpha_chars

(/, default )

md — mv_more_decade

(/, default )

mi — mv_more_id

(/, default )

ml — mv_matchlimit

(record_count, default 50)

maximum number of records (search results) to return from a search. When all the results are displayed on a single page, this option is equivalent to mm. When the more tag is used to display results across multiple pages, this option determines the number of results per page. To specify unlimited, use none or all, not 0.

mm — mv_max_matches

(record_count, default unlimited)

final, maximum number of records (search results) to return from a search (related option ml).
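The relationship between ml and mm is easiest to see with numbers: mm caps the total, ml sets the page size. The following Python sketch models this on a plain list; the function name paginate is hypothetical, and None stands in for the "unlimited" setting.

```python
def paginate(results, matchlimit=50, max_matches=None):
    """Split results into pages.

    matchlimit  -- results per page (ml); None means unlimited
    max_matches -- overall cap on results (mm); None means unlimited
    """
    capped = results if max_matches is None else results[:max_matches]
    if matchlimit is None:
        # Unlimited page size: everything lands on a single page.
        return [capped] if capped else []
    return [capped[i:i + matchlimit] for i in range(0, len(capped), matchlimit)]


if __name__ == "__main__":
    pages = paginate(list(range(120)), matchlimit=50, max_matches=110)
    print([len(page) for page in pages])
```

With 120 raw matches, mm=110 and ml=50, the sketch yields three pages of 50, 50 and 10 results.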

MM — mv_more_matches

(/, default )

mp — mv_profile

(/, default )

ms — mv_min_string

(min_length, default 1 for text-based searches)

minimum size of a search string for a search operation.

ne — mv_negate

(/, default )

ng — mv_negate

(/, default )

nh — mv_no_hide

(/, default )

nm — mv_no_more

(/, default )

np — mv_nextpage

(/, default )

ns — mv_next_search

(/, default )

nu — mv_numeric

(1/0, default 0)

search operator will perform numeric (instead of string) comparison.
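The difference matters whenever values have different numbers of digits. A short Python sketch of the two comparison modes (the function name less_than is hypothetical and stands in for any comparison operator):

```python
def less_than(left, right, numeric=False):
    """Compare two field values as strings, or as numbers when numeric is set."""
    if numeric:
        return float(left) < float(right)
    # String comparison works character by character, so "10" sorts before "9".
    return left < right


if __name__ == "__main__":
    print(less_than("10", "9"))
    print(less_than("10", "9", numeric=True))
```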

os — mv_orsearch

(/, default )

pm — mv_more_permanent

(/, default )

ra — mv_return_all

(1/0, default 0)

return all records.

rd — mv_return_delim

(/, default )

re — mv_search_reference

(/, default )

rf — mv_return_fields

(/, default )

(specification of :* indicates "all fields").

rg — mv_range_alpha

(/, default )

rl — mv_range_look

(/, default )

rm — mv_range_min

(/, default )

rn — mv_return_file_name

(/, default )

rr — mv_return_reference

(/, default )

rs — mv_return_spec

(1/0, default 0)

the one and only match from the search will be the value of mv_searchspec itself. Useful in testing, or as a yes/no confirmation of whether the search string was found.

rx — mv_range_max

(/, default )

sd — mv_small_data

(/, default )

se — mv_searchspec

(/, default )

sf — mv_search_field

(/, default )

sg — mv_search_group

(/, default )

si — mv_search_immediate

(/, default )

sm — mv_start_match

(/, default )

sp — mv_search_page

(/, default )

sq — mv_sql_query

(SQL_Query, default none)

for text-based searches (st=text only), this option specifies the SQL query to run over the lines in the file. This is not the same as an external SQL database search.

Furthermore, the SQL_Query undergoes a little modification before it is used. Here's a practical example:

Artist: <input name="artist" />
Title:  <input name="title"  />
<input type="hidden" name="mv_sql_query" value="
  SELECT  code FROM products
  WHERE artist LIKE artist
  AND    title LIKE title
" />

In each comparison of the expression, if the right-hand side is an alphanumeric, unquoted word, it is replaced with the value of the form variable of that name. (In a one-click search, scratch variables are used instead.) Quoted right-hand side values are taken literally.

For the left-hand side, the behavior is reversed: a quoted word is replaced with the value of the form variable of that name. (Again, in a one-click search, scratch variables are used instead.) Unquoted left-hand side values are taken literally.

Here's an example that allows users to select whether they want to search in title or artist fields:

Search for: <input name="searchstring" /><br />
Search in   <input type="radio" name="column" value="title"  /> title
            <input type="radio" name="column" value="artist" /> artist

<input type="hidden" name="mv_sql_query" value="
  SELECT code    FROM products
  WHERE 'column' LIKE searchstring
" />

For reference, here is what the two examples above would look like when coded "manually":

[page search="
  co=yes
  sf=artist
  op=rm
  se=[value artist]
  sf=title
  op=rm
  se=[value title]
"]
Search for [value artist], [value title]
</a>


[page search="
  co=yes
  sf=[value column]
  op=rm
  se=[value searchstring]
"]
Search for [value searchstring] in [value column]
</a>
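The substitution rules for mv_sql_query can be tried out in isolation. The Python sketch below applies them to simple "column LIKE value" comparisons only. It is an illustration of the quoting rules as described in this section, not Interchange's parser: the function name substitute_terms is hypothetical, form variables are modelled as a plain dictionary, and no SQL escaping is performed.

```python
import re

# One comparison: optionally quoted word, LIKE, optionally quoted word.
TERM = re.compile(r"('?)(\w+)\1\s+LIKE\s+('?)(\w+)\3")


def substitute_terms(where_clause, form):
    """Replace form-variable references in 'lhs LIKE rhs' comparisons."""

    def replace(match):
        left_quote, lhs, right_quote, rhs = match.groups()
        # Left-hand side: a quoted word names a form variable.
        column = form.get(lhs, "") if left_quote else lhs
        # Right-hand side: an unquoted word names a form variable.
        value = rhs if right_quote else form.get(rhs, "")
        # Values are shown quoted for readability only.
        return f"{column} LIKE '{value}'"

    return TERM.sub(replace, where_clause)


if __name__ == "__main__":
    print(substitute_terms(
        "artist LIKE artist AND title LIKE title",
        {"artist": "Monet", "title": "Lilies"},
    ))
    print(substitute_terms(
        "'column' LIKE searchstring",
        {"column": "title", "searchstring": "Lilies"},
    ))
```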

sr — mv_search_relate

(/, default )

st — mv_searchtype

( [ glimpse | swish | db | sql | text | ref ], default none)

select search type. glimpse uses the Glimpse search engine (see Glimpse), swish searches index files generated by Swish-e (see Searching using Swish-e), db (or the equivalent sql) iterates over every row in the SQL database, text searches the corresponding database text source files, and ref iterates over the results of a previous, already-performed search (related option lb).

su — mv_substring_match

(0/1, default 0)

match on substrings as well as whole words. This is typically set in dictionary-based searches.

tf — mv_sort_field

(field_name_or_index [,field_name2_or_index2...], default none)

determine sort order of the returned data. Columns can be referred to either by name (if the search is such that column names are known) or by index, starting from 0.
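To illustrate the two ways of naming a sort column, here is a minimal Python sketch that accepts either a 0-based index or a field name. The function name sort_rows is hypothetical, and plain ascending string comparison is an assumption of the sketch.

```python
def sort_rows(rows, sort_fields, field_names=None):
    """Sort rows by one or more columns given by name or 0-based index."""

    def column_index(ref):
        if isinstance(ref, int):
            return ref
        if field_names is None:
            raise ValueError("column names are not known for this search")
        return field_names.index(ref)

    indices = [column_index(ref) for ref in sort_fields]
    # Assumption: plain ascending comparison of the raw field values.
    return sorted(rows, key=lambda row: tuple(row[i] for i in indices))


if __name__ == "__main__":
    rows = [["2", "B"], ["1", "C"], ["3", "A"]]
    print(sort_rows(rows, [0]))
    print(sort_rows(rows, ["title"], field_names=["code", "title"]))
```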

to — mv_sort_option

(/, default )

un — mv_unique

(0/1, default 0)

removes duplicate records from the result set. Duplicates are determined by comparing the value of the first search return field (set with rf).
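A short Python sketch of the de-duplication rule: two records count as duplicates when their first return field is equal. The function name unique_by_first_field is hypothetical, and keeping the first occurrence of each value is an assumption of the sketch.

```python
def unique_by_first_field(rows):
    """Drop rows whose first field has already been seen."""
    seen = set()
    kept = []
    for row in rows:
        key = row[0]  # the first return field decides uniqueness
        if key not in seen:
            seen.add(key)
            kept.append(row)  # assumption: the first occurrence wins
    return kept


if __name__ == "__main__":
    rows = [["Monet", "Lilies"], ["Monet", "Haystacks"], ["Degas", "Dancers"]]
    print(unique_by_first_field(rows))
```

Because only the first field is compared, the order of fields in rf determines which records are considered duplicates.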

va — mv_value

(value_variable_name=value, default none)

assign value to a value variable. This is exactly what happens with normal variables in search profiles when you use the variable_name=value syntax, so you should use this option only where variables cannot be set directly (i.e. in one-click searches):

[page
  href=scan
  arg="se=Renaissance
       se=Impressionists
       va=category_name=Renaissance and Impressionist Paintings
       os=yes"
]Renaissance and Impressionist Paintings</a>
