SQL database system
 
Manual page for indexes(TDH)

Indexes

Indexes may be used to speed up SELECTs, UPDATEs, and DELETEs on larger tables or ordinary files. shsql uses ISAM (indexed sequential access method) indexes, generally of two or three tiers depending on table size. Indexes are not updated automatically as data changes; instead, they must be rebuilt from time to time.
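To illustrate what a tiered, indexed sequential lookup involves, here is a minimal Python sketch of a two-tier index. The block size, function names and in-memory layout are hypothetical and do not reflect shsql's on-disk format. A third tier would index the top tier in the same way for larger tables.

```python
from bisect import bisect_left

# Hypothetical two-tier ISAM layout (not shsql's actual file format):
# bottom tier = sorted (key, row_number) entries split into fixed-size blocks,
# top tier    = the first key of each bottom-tier block.
BLOCK_SIZE = 4  # illustrative only


def build_two_tier(entries):
    """entries: iterable of (key, row_number). Returns (top, blocks)."""
    bottom = sorted(entries)
    blocks = [bottom[i:i + BLOCK_SIZE] for i in range(0, len(bottom), BLOCK_SIZE)]
    top = [block[0][0] for block in blocks]
    return top, blocks


def isam_lookup(top, blocks, key):
    """Return the row numbers whose key equals `key`."""
    # Start one block before the first block whose first key is >= key,
    # because matching entries may sit at the end of the preceding block.
    start = max(bisect_left(top, key) - 1, 0)
    rows = []
    for block in blocks[start:]:
        for k, row in block:
            if k == key:
                rows.append(row)
            elif k > key:
                return rows  # past the key: sequential scan can stop
    return rows
```

The top tier narrows the search to one block, and the bottom tier is then read sequentially. This is where the "indexed sequential" name comes from.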

Creating and removing

Indexes are created using the CREATE INDEX command and removed using the DROP INDEX command. You can also effectively remove indexes by removing the relevant files in the indexes subdirectory of your project directory.

Getting information about indexes

You can see which fields in a table have indexes (as well as the attributes of those indexes) by running the tabdef(1) command from the unix command line. Alternatively, look in the indexes subdirectory of your project directory: all files there are ASCII-readable, with header information in the first line.
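Because the index files are plain ASCII with a header on the first line, a short script can collect those headers. The following is a minimal sketch that assumes only what is stated above (an indexes subdirectory under the project directory). It does not interpret the header contents, whose format is not described here.

```python
from pathlib import Path


def index_headers(project_dir):
    """Map each file in <project_dir>/indexes to its first (header) line."""
    headers = {}
    for path in sorted(Path(project_dir, "indexes").iterdir()):
        if path.is_file():
            # Index files are described as ASCII-readable; replace any stray bytes.
            with path.open(encoding="ascii", errors="replace") as f:
                headers[path.name] = f.readline().rstrip("\n")
    return headers
```

Printing `index_headers("/path/to/project")` would list one header line per index file. The path shown is a placeholder.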

Indexes and the WHERE clause

When a SELECT, UPDATE, or DELETE command is issued, the WHERE clause is examined to determine whether indexing can be used to speed up the operation.

Single conditionals: Indexed access will occur if all of these are true for the comparison condition:

  • it has a left operand that is a fieldname for which an index exists
  • it uses a comparison operator that is one of: = LIKE > >= < <= IN INLIKE INRANGE OUTRANGE CONTAINS
  • it has a right operand that is a constant and does not begin with a wild card character. (With INLIKE, none of the list members may begin with a wild card. With CONTAINS, wild card characters are treated as punctuation and ignored, so this is not an issue.)
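The three conditions above can be expressed as a small predicate. This Python sketch is illustrative, not shsql source code. It makes two assumptions: that `*` and `?` are the wild card characters, and that the leading-wildcard rule applies to every member of a list operand, whereas the text states this explicitly only for INLIKE.

```python
# Hypothetical sketch of the single-conditional eligibility rule.
INDEXABLE_OPS = {"=", "LIKE", ">", ">=", "<", "<=", "IN", "INLIKE",
                 "INRANGE", "OUTRANGE", "CONTAINS"}
WILDCARDS = ("*", "?")  # assumption: wild card characters in use


def is_index_eligible(left, op, right, indexed_fields, right_is_constant=True):
    """left: field name; right: a string, or a list of strings for list operators."""
    op = op.upper()
    if left not in indexed_fields or op not in INDEXABLE_OPS:
        return False
    if not right_is_constant:
        return False
    if op == "CONTAINS":
        return True  # wild cards are treated as punctuation and ignored
    values = right if isinstance(right, list) else [right]
    # No operand (or list member) may begin with a wild card character.
    return all(not v.startswith(WILDCARDS) for v in values)
```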

Compound conditionals: When several individual comparisons are connected by AND, only the leftmost comparison will be used for indexing, subject to the above rules.

When several conditionals are connected by OR, the leftmost comparison in each OR term will be used for indexing, subject to the above rules. If the leftmost OR term is eligible for indexing, all OR terms are expected to be eligible as well; if one turns out not to be, an error is issued (this is not ideal and may be changed in a future release).
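The AND/OR rules can be sketched as a tiny planner. In this hypothetical Python model, a WHERE clause is a list of OR terms, and each OR term is a list of AND-ed `(field, op, value)` comparisons. The eligibility test is simplified (it omits the wild card rule), and raising `ValueError` stands in for shsql's error message.

```python
# Hypothetical planner sketch; not shsql's implementation.
INDEXABLE_OPS = {"=", "LIKE", ">", ">=", "<", "<=", "IN", "INLIKE",
                 "INRANGE", "OUTRANGE", "CONTAINS"}


def eligible(cmp, indexed_fields):
    # Simplified single-conditional check (wild card rule omitted for brevity).
    field, op, _value = cmp
    return field in indexed_fields and op.upper() in INDEXABLE_OPS


def plan(or_terms, indexed_fields):
    """Return the comparisons used for indexed access, or None for a table scan."""
    leading = [and_terms[0] for and_terms in or_terms]  # leftmost of each OR term
    if not eligible(leading[0], indexed_fields):
        return None  # first OR term not indexable: fall back to a table scan
    for cmp in leading[1:]:
        if not eligible(cmp, indexed_fields):
            raise ValueError(f"OR term not eligible for indexing: {cmp}")
    return leading
```

Only the leftmost comparison of an AND chain is used for indexing. For that reason, placing the indexed comparison first (`id = 7 AND status = 'A'` rather than the reverse) decides whether indexing is used at all.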


Table scans

Queries not eligible for indexed access will result in a "table scan", meaning that all records in the table are examined. For smaller tables this is not a problem, but for larger ones performance may be poor. You can prohibit table scans on any table for which an index exists by setting dbmustindex in your project config file.


Alpha vs. Numeric

Indexing uses alphanumeric comparison unless the index is created with ORDER = NUMERIC, in which case numeric comparison is used. ORDER = NUMERIC must be used for correct results with WHERE clause comparisons that use the numeric comparison operators >, >=, <, <=, INRANGE or OUTRANGE. Non-numeric values and NULL may be present in numeric fields, and indexing may be used to access such values.
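The following Python example shows why alphanumeric ordering gives wrong answers for magnitude comparisons. Character-by-character comparison places "100" before "25" and "9" after both, so a range such as "greater than 20" does not correspond to a contiguous run of index entries. This sketch assumes that all values are numeric.

```python
values = ["9", "10", "100", "25"]

alpha = sorted(values)               # character-by-character (alphanumeric) order
numeric = sorted(values, key=float)  # numeric order

print(alpha)    # ['10', '100', '25', '9']
print(numeric)  # ['9', '10', '25', '100']

# A "> 20" test gives different answers under the two comparisons:
print([v for v in values if v > "20"])       # ['9', '25']   (alphanumeric)
print([v for v in values if float(v) > 20])  # ['100', '25'] (numeric)
```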

Note: for integer serial number fields, alpha is usually a better choice than numeric, since magnitude comparisons are seldom needed on such fields, while operations such as IN are often useful.


Available index types

standard index

A standard index is the default type and is used in most cases. A two-tier or three-tier ISAM index will be built, depending on table size.


direct

A direct index is useful with data files that will seldom or never be updated using shsql INSERT or UPDATE. Direct indexes give higher performance and use less disk space than standard indexes. However, you must sort the data file yourself before building or rebuilding a direct index.

If the field to be direct-indexed is alphanumeric and could contain a mixture of upper and lower case values, the file must be sorted on that field without regard to case (sort -f does this; e.g. {abc, Abd, aBe, Abf}). If you're using ORDER = NUMERIC, the file must be sorted on that field in numeric order (sort -n). And don't forget that the field name header must be put back at the top of the file after sorting; this can be done in a text editor.
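As an alternative to sorting with sort and restoring the header by hand, a short script can keep the header in place while it sorts. This Python sketch rests on two hypothetical assumptions about the data file layout: line 0 is the field name header, and fields are whitespace-delimited. Case folding with `str.upper` approximates `sort -f` for ASCII keys but is not guaranteed to match it exactly when keys compare equal.

```python
def sort_for_direct_index(lines, column, numeric=False):
    """Sort data lines on a 0-based column, keeping the header line first.

    Assumes (hypothetically) a header on line 0 and whitespace-delimited fields.
    """
    header, rows = lines[0], lines[1:]
    if numeric:
        def key(line):
            return float(line.split()[column])  # numeric order, like sort -n
    else:
        def key(line):
            return line.split()[column].upper()  # case-folded, like sort -f
    return [header] + sorted(rows, key=key)
```

Typical usage would read the file with `path.read_text().splitlines()`, sort, and write the lines back before rebuilding the index.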

You can use any WHERE clause comparison with a direct-indexed field, just as with a standard-indexed field. Data files that have a direct index may be updated by shsql INSERT or UPDATE, but the data file must then be manually re-sorted into the correct order before the index is rebuilt (otherwise subsequent retrievals will not work properly).
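To see why retrievals fail on unsorted data, consider any lookup that relies on sort order. The Python sketch below uses a binary search and assumes case-insensitive matching, which the case-folded sort order suggests but which is not stated explicitly. For clarity it builds the folded key list on every call; a real index would not do this.

```python
from bisect import bisect_left


def direct_lookup(rows, value):
    """Binary-search key strings assumed sorted case-insensitively;
    return the positions whose key matches `value` ignoring case."""
    keys = [r.upper() for r in rows]  # folded keys (rebuilt per call for clarity)
    target = value.upper()
    i = bisect_left(keys, target)
    hits = []
    while i < len(keys) and keys[i] == target:
        hits.append(i)
        i += 1
    return hits


print(direct_lookup(["abc", "Abd", "aBe", "Abf"], "abe"))  # [2]
# Same keys out of order (e.g. after an INSERT): the "Abd" row is missed.
print(direct_lookup(["Abf", "abc", "aBe", "Abd"], "abd"))  # []
```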

To create a direct index use an SQL command like this: create index type=direct on dictionary ( term )


word

A word index is useful when searching fields that contain titles, descriptions, or lists of values. Each word gets its own index entry. Word indexes are often used together with multiword search queries that use CONTAINS. They are described in more detail elsewhere in the shsql documentation.
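The idea of "each word gets its own entry" can be shown with an inverted index in Python. Several details are assumptions made for illustration and are not shsql behaviour: the tokenizing rule (runs of letters and digits), lower-casing of words, and the treatment of a multiword query as requiring all words.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z0-9]+")  # assumed tokenizing rule


def build_word_index(rows, field):
    """Map each lower-cased word in rows[i][field] to the set of row numbers."""
    index = defaultdict(set)
    for rownum, row in enumerate(rows):
        for word in WORD.findall(row.get(field) or ""):
            index[word.lower()].add(rownum)
    return index


def contains_all(index, query):
    """Rows containing every word of `query` (assumed AND semantics)."""
    words = [w.lower() for w in WORD.findall(query)]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result
```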


combinedword

A combinedword index is similar to a word index, but it takes values from several database fields to build the index, instead of just one. This improves search efficiency when several fields are frequently searched together, as is often the case with search engine applications. It is described in more detail elsewhere in the shsql documentation.
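A combinedword index feeds words from several fields into a single index, so one lookup covers all of them. The Python sketch below illustrates this. The field names and the tokenizing rule are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z0-9]+")  # assumed tokenizing rule


def build_combinedword_index(rows, fields):
    """Map each lower-cased word found in any of `fields` to the row numbers."""
    index = defaultdict(set)
    for rownum, row in enumerate(rows):
        for field in fields:
            for word in WORD.findall(row.get(field) or ""):
                index[word.lower()].add(rownum)
    return index
```

Searching for "garden" across a title field and a description field is then one dictionary lookup rather than one lookup per field's index.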


combined

A combined index is the same as a combinedword index, except that database fields are not parsed into multiple words; each field value is taken as a whole. The same rules as for combinedword indexes apply. (New in version 1.27)

Example of where this is useful: an application that searches 3 fields, each of which contains a single identifier token, where some of the identifiers contain punctuation characters. A combinedword index would break these up into multiple "words", which is incorrect. A combined index allows one index to cover all 3 fields without any word parsing.
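The difference from a combinedword index can be shown in Python by indexing whole field values. The field names below are hypothetical, and values are kept exactly as stored, because case handling is not described here.

```python
import re
from collections import defaultdict


def build_combined_index(rows, fields):
    """Index several fields as whole values: no word parsing."""
    index = defaultdict(set)
    for rownum, row in enumerate(rows):
        for field in fields:
            value = row.get(field)
            if value:
                index[value].add(rownum)
    return index


# Word parsing would split an identifier containing punctuation:
print(re.findall(r"[A-Za-z0-9]+", "AB-123"))  # ['AB', '123']
```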


Notes

Retrievals that use an index will be ordered on the indexed field by default.

DISTINCT is automatically in effect on index-eligible SELECT retrievals when:

  • OR is present
  • CONTAINS is present
  • a direct index is involved and any list-based comparison operators (such as IN or INLIKE) are present

This is done to avoid unwanted duplication in the result row set as a consequence of the iterative method that shsql uses to retrieve rows in such situations.
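The duplication comes from retrieving rows once per OR term (or list member), so a row matching two terms is fetched twice. This hypothetical Python model uses indexes keyed by `(field, value)` for `WHERE color = 'red' OR fruit = 'apple'` and shows the repeat, then removes it.

```python
def retrieve_or(indexes, terms):
    """Fetch row numbers once per OR-ed (field, value) term, in term order."""
    found = []
    for term in terms:
        found.extend(indexes.get(term, []))
    return found


def distinct(rows):
    """Drop repeated rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


indexes = {("color", "red"): [1, 4], ("fruit", "apple"): [1, 2]}
print(retrieve_or(indexes, [("color", "red"), ("fruit", "apple")]))            # [1, 4, 1, 2]
print(distinct(retrieve_or(indexes, [("color", "red"), ("fruit", "apple")])))  # [1, 4, 2]
```

A list containing the same member twice behaves the same way: each copy is looked up separately.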

If duplicate list members are specified in an IN or INLIKE expression (for example when expressions are generated dynamically), SELECT DISTINCT should be used in order to eliminate duplication in the result row set.

Alphanumeric index tags are truncated to a certain length, 15 characters by default. This limit can be raised in your config file.
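Truncation means that values which agree in their first 15 characters share one index tag. The Python sketch below shows such a collision. It assumes that candidate rows found through a truncated tag are then checked against the full value, which the text above does not state. The function names are hypothetical.

```python
TAG_LENGTH = 15  # default tag length mentioned above


def index_tag(value, length=TAG_LENGTH):
    return value[:length]


def build_tag_index(values):
    """Map each truncated tag to the row numbers of values having that tag."""
    tag_index = {}
    for rownum, value in enumerate(values):
        tag_index.setdefault(index_tag(value), []).append(rownum)
    return tag_index


def lookup_value(values, tag_index, value):
    # The tag narrows the candidates; the full value is then compared (assumed step).
    return [r for r in tag_index.get(index_tag(value), []) if values[r] == value]


values = ["Annual report 2004", "Annual report 2005", "Budget"]
print(index_tag(values[0]))  # 'Annual report 2' : shared by rows 0 and 1
```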

Indexes are implemented as tabular ascii files located in the indexes directory.

Index building is actually done by buildix(1), which in turn invokes unix sort. sort is invoked such that alphanumerics are sorted in ASCII order (case sensitive).

A list of table fields that have indexes is maintained by shsql in files called tablename.fieldname.0.
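Because the naming pattern tablename.fieldname.0 encodes both the table and the field, the indexed fields can be listed from file names alone. This Python sketch makes two assumptions: the files live in the indexes directory, and table and field names contain no dots.

```python
from pathlib import Path


def indexed_fields(indexes_dir):
    """Return {table: [fields]} built from file names of the form tablename.fieldname.0."""
    result = {}
    for path in sorted(Path(indexes_dir).glob("*.*.0")):
        parts = path.name.split(".")
        if len(parts) == 3:  # skip names whose table or field contains a dot
            table, field, _ = parts
            result.setdefault(table, []).append(field)
    return result
```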


Copyright Steve Grubb  


Markup created by unroff 1.0, February 09, 2005.