CWB

special-chars.h File Reference

#include "globals.h"

Functions


Function Documentation

unsigned char* cl_string_maptable (CorpusCharset charset, int flags)

Gets a specified character mapping table for use in regular expressions.

Returns a pointer to the static mapping table for the given flags (IGNORE_CASE and/or IGNORE_DIAC) and character set.

Removed from the public API in 3.2.0 because a table indexed by single bytes cannot work if the CorpusCharset is UTF8, where one character may span several bytes. The prototype was moved to special-chars.h.

Tables exist for all character sets, but for all except Latin1 and ASCII they are currently identical to the ASCII tables (i.e. the case/accent relationships in the upper half of each character set have not yet been added).

Parameters:
charset  The character set of this corpus. Currently ignored.
flags    The flags that specify which table is required. Can be IGNORE_CASE and/or IGNORE_DIAC.
Returns:
Pointer to the appropriate mapping table. DO NOT FREE or modify it; it is a CL-internal data blob.

References ascii, charset, identity_tab, identity_tab_init, IGNORE_CASE, IGNORE_DIAC, maptable_init_both(), maptable_init_identity(), nocase_nodiac_tab, nocase_nodiac_tab_init, nocase_tab, nodiac_tab, and utf8.

Referenced by cl_string_canonical().
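
The table lookup described for cl_string_maptable() can be illustrated with a minimal, ASCII-only sketch. This is not the CWB implementation: every name prefixed with sketch_ or SKETCH_ is hypothetical, the flag values are placeholders, and folding to lowercase is an assumption of this example. It only shows the general idea of a static 256-entry table, selected by flags, that the caller reads but never frees or modifies.

```c
#include <stddef.h>

/* Hypothetical flag values for this sketch; the real CWB constants may differ. */
#define SKETCH_IGNORE_CASE 1
#define SKETCH_IGNORE_DIAC 2

/* Static 256-entry tables, filled lazily on the first request. */
static unsigned char sketch_identity_tab[256];
static unsigned char sketch_nocase_tab[256];
static int sketch_tabs_ready = 0;

static void sketch_init_tables(void)
{
  int i;
  /* Start both tables as the identity mapping. */
  for (i = 0; i < 256; i++) {
    sketch_identity_tab[i] = (unsigned char) i;
    sketch_nocase_tab[i] = (unsigned char) i;
  }
  /* Fold ASCII upper-case letters to lower case (assumed direction). */
  for (i = 'A'; i <= 'Z'; i++)
    sketch_nocase_tab[i] = (unsigned char) (i - 'A' + 'a');
  sketch_tabs_ready = 1;
}

/* Returns a pointer to a static table; the caller must not free or modify it. */
const unsigned char *sketch_maptable(int flags)
{
  if (!sketch_tabs_ready)
    sketch_init_tables();
  /* ASCII has no accented letters, so SKETCH_IGNORE_DIAC alone is the identity. */
  return (flags & SKETCH_IGNORE_CASE) ? sketch_nocase_tab : sketch_identity_tab;
}

/* Rewrites a NUL-terminated string in place, one byte at a time. */
void sketch_apply_maptable(unsigned char *s, const unsigned char *tab)
{
  for (; *s; s++)
    *s = tab[*s];
}
```

Calling sketch_apply_maptable(buf, sketch_maptable(SKETCH_IGNORE_CASE)) on a writable buffer lower-cases its ASCII letters and leaves every other byte unchanged. Because the lookup is strictly per byte, the same approach cannot fold multibyte UTF-8 characters.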