MaltParser 1.4.1 - Available options
All options are categorized into one of the following option groups:
system,
config,
singlemalt,
input,
output,
graph,
nivre,
2planar,
planar,
covington,
libsvm,
liblinear,
guide,
pproj.
Every option can have the following attributes:
Attribute | Description |
---|---|
name | The name of the option |
type | The option type (one of the types listed below) |
flag | A short version option indicator |
default | If there is a default value it is specified by this attribute |
usage | Indicates the usage of the option (one of the values listed below) |
There are the following option types:
unary | The option has no value; this type is only used by the help option to indicate that help should be displayed. |
bool | Boolean option, can take either true or false value. |
integer | Integer option, can take an integer value. |
string | String option, can take a string value. |
enum | Enum option, can only take a predefined value. |
stringenum | StringEnum option, can either take a string value or a predefined value. |
class | Class option, can take a predefined value that corresponds to a class in the MaltParser distribution. |
The usage attribute takes one of the following values:
train | The option is only relevant during learning. |
process | The option is only relevant during processing (parsing). |
both | The option is relevant both during learning and processing (parsing). |
save | The option is saved during learning and cannot be overridden during processing (parsing). |
All the option groups and options are described in detail below. An option begins with the
following format if the attribute is applicable:
name | -flag | type | default value | usage |
system
The system option group contains options that have a special status, because they control the overall system. These
options can only have one value each. For instance, you cannot specify more than one option file.
There are several ways to control MaltParser and one way is to supply all options
in an option file.
The option_file option can be used to specify the path to this option file.
Displays a short description of all available options.
There are several levels of verbosity for the system output stream, from showing all debugging messages (which can
be useful when modifying or extending the source code of MaltParser) to turning off all messages. MaltParser uses
Apache log4j logging services. To find out more about
the different levels please consult the Apache log4j documentation. The default verbosity level is info, which means that
all error, warning and informational messages are displayed.
off | Logging turned off |
fatal | Logging of very severe error events |
error | Logging of error events |
warn | Logging of harmful situations |
info | Logging of informational messages |
debug | Logging of debugging messages |
config
The config option group contains general options for a configuration.
The configuration name is the name of the configuration and also the name of the MaltParser configuration file, which ends
with the file suffix .mco.
The name is your own choice, but it is appropriate to give the configuration a name that reflects the content. This option must always be
specified, except when the url option is used instead of name.
It is possible to specify a URL to
the configuration file instead of specifying the configuration name. For example, if you have a configuration
file with the following URL: http://maltparser.org/mco/test.mco
you can write:
-u http://maltparser.org/mco/test.mco
flowchart | -m | enum | parse | both |
There are ten predefined flow charts.
learn | Learn a Single MaltParser configuration |
parse | Parse with a Single MaltParser configuration |
info | Prints the info file of a configuration |
unpack | Unpacks a configuration |
convert | Simple format converter |
proj | Projectivizes input data using a configuration |
deproj | Deprojectivizes input data using a configuration |
learnwo | Same as learn, but also outputs the graphs to file specified by the flag -o |
testdata | Generates test instances to run experiments with a learner outside MaltParser. Use, for example, the flag -lsi true to save instances. |
versioning | Converts an old parser model (mco-file) into the latest version (supports version 1.3 parser model or later) |
type | -t | class | singlemalt | both |
MaltParser 1.4.1 has one available configuration type: singlemalt. Later releases
may contain additional configuration types. For example, one type could be an ensemble parser configuration containing
many single malt configurations.
singlemalt | Single Malt Parser configuration |
workingdir | -w | string | user.dir | both |
By default the working directory is the directory where MaltParser is started from, but it is possible to
specify another directory with the workingdir option.
logging | -cl | enum | info | both |
In contrast to the system-verbosity option, the logging option controls the level of verbosity of an
individual configuration. The different verbosity or logging levels are the same as for the system-verbosity option.
off | Logging turned off |
fatal | Logging of very severe error events |
error | Logging of error events |
warn | Logging of harmful situations |
info | Logging of informational messages |
debug | Logging of debugging messages |
logfile | -lfi | string | stdout | both |
By default the logging will be output to the standard
output stream, but it is possible to direct this output stream to a logging file by specifying the logfile option.
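As a rough illustration of how the config options above combine on a command line, the following Python sketch assembles an argument list from the short flags listed in this group (-u, -m, -w, -cl, -lfi). The helper name config_args and the example values are hypothetical, and the jar name malt.jar is taken from other parts of this document. Options from other groups (such as the input file) would still have to be appended.

```python
# Hypothetical helper: builds MaltParser arguments for the config option group.
# The allowed values mirror the flowchart and logging tables in this section.
FLOWCHARTS = {"learn", "parse", "info", "unpack", "convert",
              "proj", "deproj", "learnwo", "testdata", "versioning"}
LOG_LEVELS = {"off", "fatal", "error", "warn", "info", "debug"}


def config_args(url, flowchart="parse", workingdir=None,
                logging=None, logfile=None):
    """Return an argument list using the short flags of the config group."""
    if flowchart not in FLOWCHARTS:
        raise ValueError(f"unknown flowchart: {flowchart}")
    if logging is not None and logging not in LOG_LEVELS:
        raise ValueError(f"unknown logging level: {logging}")
    # Assumption: MaltParser is started from its jar file, here called malt.jar.
    args = ["java", "-jar", "malt.jar", "-u", url, "-m", flowchart]
    if workingdir is not None:
        args += ["-w", workingdir]
    if logging is not None:
        args += ["-cl", logging]
    if logfile is not None:
        args += ["-lfi", logfile]
    return args


cmd = config_args("http://maltparser.org/mco/test.mco",
                  logging="debug", logfile="parse.log")
print(" ".join(cmd))
```

Validating the enum values before building the list is a design choice of this sketch; it only catches typos early and is not something the option description requires.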
singlemalt
The singlemalt option group is used when the singlemalt configuration type is specified.
This option is replaced by --config-flowchart and should not be used anymore. The value of this option will be mapped to --config-flowchart.
parsing_algorithm | -a | class | nivreeager | save |
The single malt configuration contains nine deterministic parsing algorithms. Four algorithms produce
projective dependency graphs: Nivre arc-eager, Nivre arc-standard, Covington projective and Stack projective. Three algorithms are able to produce
non-projective graphs: Covington non-projective, Stack eager and Stack lazy. The remaining two are Planar eager and 2-Planar eager. Nivre's parsing algorithms have an option group called nivre for controlling
the behavior of the algorithm; Covington's algorithms have a corresponding option group called covington, and the planar algorithms have the option groups planar and 2planar. For more information about the
parsing algorithms see the user guide: Parsing Algorithms.
nivreeager | Nivre arc-eager |
nivrestandard | Nivre arc-standard |
covnonproj | Covington non-projective |
covproj | Covington projective |
stackproj | Stack projective |
stackeager | Stack eager |
stacklazy | Stack lazy |
planar | Planar eager |
2planar | 2-Planar eager |
guide_model | -gm | class | single | save |
MaltParser 1.4.1 has one available guide model type: single. Later releases
may contain additional guide model types.
null_value | -nv | enum | one | save |
MaltParser 1.4.1 and later versions (implemented in Java) have the possibility of distinguishing between
different kinds of null-values when extracting the feature vector. For input columns like POSTAG
it is possible to differentiate two null-values:
NO NODE: There exists no corresponding dependency graph node (e.g., because the lookahead extends beyond the end of the string),
which means that the feature is really undefined.
ROOT NODE: The dependency graph node is a root node, which means that it is not possible to extract an input column value
(for example, the word form or the part-of-speech).
In addition to the two null value categories for input columns, there is one more for the output columns:
NO VALUE: The dependency graph node exists and is not the root, but has not yet been assigned a value for the output column
requested (e.g., has not been assigned a head and therefore does not have a dependency type).
With this option it is possible to specify the degree of differentiation of null-values.
none: Excludes all kinds of null-values when extracting the feature vector. This option value is not possible for learning
methods that have symbolic feature vector encoding.
one: Maps all kinds of null values to one symbol.
rootlabel: Maps all kinds of null values to one symbol; for output columns this symbol is the same as the root
label (used to emulate MaltParser 0.4).
rootnode: Distinguishes between NO NODE and ROOT NODE; the NO VALUE null-value case is mapped
to the ROOT NODE null-value for output columns.
novalue: Distinguishes between NO NODE and ROOT NODE for both input and output columns, and NO VALUE
for output columns.
none | Excludes all types of null values |
one | Maps all kinds of null values to one symbol |
rootlabel | Same as 'one', but null value for output column is mapped to the root label |
rootnode | Distinguish between no node and root node |
novalue | Distinguish between no node and root node, and no value for output column |
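The five null_value settings can be read as a mapping from the kind of null value to the symbol that ends up in the feature vector. The Python sketch below restates the descriptions above under that reading. The function name, the constants and the "#null#" placeholder are hypothetical and are not MaltParser's internal names.

```python
# Hypothetical constants for the three kinds of null values described above.
NO_NODE, ROOT_NODE, NO_VALUE = "NO_NODE", "ROOT_NODE", "NO_VALUE"


def null_symbol(strategy, kind, is_output_column, root_label="ROOT"):
    """Map a null-value kind to the symbol used under a null_value strategy.

    Returns None when the null value is excluded from the feature vector.
    """
    if kind == NO_VALUE and not is_output_column:
        raise ValueError("NO VALUE only applies to output columns")
    if strategy == "none":
        return None                      # null values are left out entirely
    if strategy == "one":
        return "#null#"                  # a single symbol for every kind
    if strategy == "rootlabel":
        # one symbol, which for output columns is the root label
        return root_label if is_output_column else "#null#"
    if strategy == "rootnode":
        # NO VALUE collapses into ROOT NODE for output columns
        return ROOT_NODE if kind == NO_VALUE else kind
    if strategy == "novalue":
        return kind                      # all three kinds stay distinct
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("none", "one", "rootlabel", "rootnode", "novalue"):
    print(strategy, null_symbol(strategy, NO_VALUE, is_output_column=True))
```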
diagnostics | -di | bool | false | both |
If true, diagnostics are written to standard output or to the file specified by the diafile option. By default this option is false.
diafile | -dif | string | stdout | both |
By default the diagnostics will be output to the standard
output stream, but it is possible to direct this output stream to a diagnostics file by specifying the diafile option.
use_partial_tree | -up | bool | false | save |
If true, then partial trees are allowed as input and the parser will construct these partial trees before parsing. By default
this option is false. Please see the user guide: Partial trees
propagation | -fp | string | | save |
The propagation option is used for specifying the propagation specification file, which is an XML file
(see user guide: Propagation)
input
The input option group contains options that control the input data. In MaltParser 1.4.1, the values of options in the
input option group must match the values of corresponding options in the output option group.
This restriction is likely to be removed in later releases.
The input data file is specified by the infile option. It is important that the input data file is formatted
according to the format specified by the format option. For example, if format=conllx the input file should contain at least eight columns
during learning and six columns during parsing.
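A quick way to catch malformed input before starting the parser is to count the columns of each token line. The following Python sketch assumes the tab-separated conllx layout and uses the minimum column counts stated above. The function name and the sample lines are made up for illustration.

```python
def has_enough_columns(line, learning):
    """Check one tab-separated token line against the minimum column count.

    Assumes format=conllx: at least 8 columns for learning, 6 for parsing.
    """
    minimum = 8 if learning else 6
    return len(line.rstrip("\n").split("\t")) >= minimum


# Hypothetical token lines with eight and six columns respectively.
train_line = "1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ"
parse_line = "1\tJohn\tjohn\tN\tNNP\t_"

print(has_enough_columns(train_line, learning=True))
print(has_enough_columns(parse_line, learning=True))
```

Blank lines, which separate sentences in tab-separated files, should be skipped before calling the check.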
format | -if | stringenum | conllx | save |
This option tells the parser which format is used in the input data file. The format is defined in an XML file. For
more information see the user guide: Input and output format.
The MaltParser distribution already contains several data format specification files (included in malt.jar), for example:
- conllx defines the CoNLL-X shared task format.
- malttab defines the Malt-TAB format.
- negraps defines a simple version of the Negra export format, including a lemma column.
- negrads defines internally a dependency representation of the Negra export format.
conllx | CoNLL-X data format |
malttab | MaltTAB data format |
negraps | Negra phrase structure |
negrads | Negra dependency structure |
tal05ps | Talbanken05 phrase structure |
tal05ds | Talbanken05 dependency structure |
tigerps | Tiger phrase structure |
tigerds | Tiger dependency structure |
reader | -ir | class | tab | both |
In MaltParser 1.4.1 there are three input readers:
- tab reads tab-separated files with columns defined by the input format.
- negra reads line-oriented files similar to the Negra export format.
- tiger reads TigerXML files.
tab | Tab-separated reader |
negra | Negra reader |
tiger | TigerXML reader |
charset | -ic | string | UTF-8 | save |
The charset option defines the character set of the input data file, for example, UTF-8 or ISO-8859-1.
reader_options | -iro | string | | both |
MaltParser has several data readers and with this option it is possible to control individual data readers.
iterations | -it | integer | 1 | both |
Number of iterations over the input file.
output
The output option group contains options that control the output data. In MaltParser 1.4.1, the values of options in the
output option group must match the values of corresponding options in the input option group.
This restriction is likely to be removed in later releases.
The output data file is specified by the outfile option.
format | -of | stringenum | | both |
This option tells the parser which format is used for the output data file. The format is defined in an XML file.
For more information see the user guide: Define your own input/output format.
The MaltParser distribution already contains several data format specification files (included in malt.jar), for example:
- conllx defines the CoNLL-X shared task format
- malttab defines the Malt-TAB format.
- negraps defines a simple version of the Negra export format, including lemma column.
- negrads defines internally a dependency representation of the Negra export format.
conllx | CoNLL-X data format |
malttab | MaltTAB data format |
negraps | Negra phrase structure |
negrads | Negra dependency structure |
tal05ps | Talbanken05 phrase structure |
tal05ds | Talbanken05 dependency structure |
tigerps | Tiger phrase structure |
tigerds | Tiger dependency structure |
writer | -ow | class | tab | both |
In MaltParser 1.4.1 there are three output writers:
- tab writes tab-separated files with columns defined by the output format.
- negra writes line-oriented files similar to the Negra export format.
- tiger writes TigerXML files.
tab | Tab-separated writer |
negra | Negra writer |
tiger | TigerXML writer |
charset | -oc | string | UTF-8 | save |
The charset option defines the character set of the output data file, for example, UTF-8 or ISO-8859-1.
writer_options | -owo | string | | both |
MaltParser has several data writers and with this option it is possible to control individual data writers.
graph
The graph option group controls internal data structures, such as the sentence and the dependency graph.
max_sentence_length | -gsl | integer | 256 | both |
By default, the maximum sentence length is 256 tokens. If the input data file has sentences that are longer
than 256 tokens, this option may be used to adjust the internal data structures, so that longer sentences can be loaded. Note that this option is deprecated: there is no upper limit on the sentence length.
root_label | -grl | string | ROOT | save |
Default label used for unattached tokens that are automatically attached to the special root node after parsing is completed.
head_rules | -ghr | string | | save |
It is possible to define head finding rules to control the transformation from phrase structure to dependency structure.
For more information see the user guide: Head-finding rules.
nivre
The nivre option group controls the Nivre arc-eager and Nivre arc-standard parsing algorithms.
root_handling | -r | enum | normal | save |
The root_handling option specifies how dependents of the special root node are handled.
strict | Root dependents not attached during parsing (attached with default label afterwards), reduction of unattached tokens not permissible |
relaxed | Root dependents not attached during parsing (attached with default label afterwards), reduction of unattached tokens permissible |
normal | Root dependents attached by RightArc transition during parsing (unattached tokens attached with default label afterwards) |
2planar
The 2-planar option group controls the 2-planar parsing algorithm.
reduceonswitch | -2pr | bool | false | save |
If reduceonswitch=true, the parser reduces the active stack immediately after switching stacks.
planar_root_handling | -prh | enum | normal | save |
The planar_root_handling option specifies how dependents of the special root node are handled in the 2-planar parser.
relaxed | Root dependents not attached during parsing (attached with default label afterwards). |
normal | Root dependents attached by RightArc transition during parsing (unattached tokens attached with default label afterwards). |
planar
The planar option group controls the Nivre planar parsing algorithm.
connectedness | -pcon | enum | none | save |
The connectedness option specifies to what degree the parser enforces connected dependency graphs.
none | Don't enforce connectedness at all, words whose head the parser doesn't know will be linked to the root node. With this option, the parser will work with planar dependency forests. A forest may be seen as a tree by considering all the roots linked to the dummy root node, but it needn't be planar when seen this way. |
reduceonly | The last node in a connected component cannot be reduced. No restrictions on shift transitions. This option guarantees that the dependency graph obtained counting links to the dummy root node is planar and connected. |
full | Enforce full connectedness: the last node in a component cannot be reduced, and the last word cannot be shifted if the graph is not connected. The produced graph will be connected and planar even without considering the dummy root node. |
acyclicity | -pacy | bool | true | save |
If acyclicity=true, the parser only generates acyclic dependency graphs.
no_covered_roots | -pcov | bool | false | save |
If no_covered_roots=true, the parser disallows covered roots (i.e. disallows non-projective structures). With this option set to false, it allows planar structures that are not projective.
covington
allow_root | -cr | bool | true | save |
If allow_root=true, the parser treats the special root node as a token during parsing, allowing root dependents to be
attached with a RightArc transition; otherwise root dependents are not attached during parsing. In both cases, unattached tokens are attached to the special root
node with the default label after parsing is completed.
allow_shift | -cs | bool | false | save |
If allow_shift=true, Shift is a valid transition, allowing the parser to skip remaining tokens in Left;
otherwise all tokens in Left must be inspected before the next token is shifted.
libsvm
This group contains options that are specific for the LIBSVM learner.
libsvm_options | -lso | string | -s_0_-t_1_-d_2_-g_0.2_-c_1_-r_0_-e_1.0 | save |
There are many LIBSVM options (see LIBSVM Documentation).
Note that all whitespace must be replaced by underscores if this option is specified at the command-line prompt.
For example, it could look like this: -lso -s_0_-t_1_-d_2_-g_0.2_-c_1_-r_0_-e_1.0
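The underscore convention is easy to get wrong by hand, so a small helper can do the conversion. This Python sketch (the function name is hypothetical) turns a learner option string written in the usual space-separated style into the form expected on the MaltParser command line.

```python
def encode_learner_options(options):
    """Replace every run of whitespace with a single underscore."""
    return "_".join(options.split())


encoded = encode_learner_options("-s 0 -t 1 -d 2 -g 0.2 -c 1 -r 0 -e 1.0")
print("-lso", encoded)  # prints: -lso -s_0_-t_1_-d_2_-g_0.2_-c_1_-r_0_-e_1.0
```

The same helper applies to any option string that follows this whitespace rule.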
libsvm_external | -lsx | string | | train |
If you have the LIBSVM package installed on your system, it is possible to use the C++ implementation
of the LIBSVM learner instead of the internal Java implementation (libsvm.jar) during learning. It is very likely
that the external C++ implementation is faster and uses less memory on your system. By specifying this option with the path to
the executable file svm-train (on Microsoft Windows, svm-train.exe), the parser will train LIBSVM models
with svm-train instead of using libsvm.jar.
Note: There can be slight differences in accuracy between using the internal LIBSVM learner and the external
LIBSVM learner, due to different versions of LIBSVM and the precision in assigning floating-point parameters.
save_instance_files | -lsi | bool | false | save |
If save_instance_files=true, training instance files are saved in the configuration, otherwise
these files are deleted. The training instance files are not used during parsing.
verbosity | -lsv | enum | silent | train |
silent | No output from LIBSVM is logged |
error | Only the error stream of LIBSVM is logged |
all | All output of LIBSVM is logged |
liblinear
This group contains options that are specific for the LIBLINEAR learner.
liblinear_options | -llo | string | -s_4_-c_0.1 | save |
LIBLINEAR has several options (see the LIBLINEAR documentation) that
you can specify with this option.
Note that all whitespace must be replaced by underscores if this option is specified at the command-line prompt.
For example, it could look like this: -llo -s_4_-c_0.1
liblinear_external | -llx | string | | train |
Path to the train executable file of the LIBLINEAR package (on Microsoft Windows, train.exe).
save_instance_files | -lli | bool | false | save |
If save_instance_files=true, training instance files are saved in the configuration, otherwise
these files are deleted. The training instance files are not used during parsing.
verbosity | -llv | enum | silent | train |
silent | No output from LibLinear is logged |
error | Only the error stream of LibLinear is logged |
all | All output of LibLinear is logged |
guide
Contains options that are specific for the guide, which can be seen as an interface (or glue) between the parsing algorithm
and the learner. During learning, the parsing algorithm sends training instances to the guide, which prepares the corresponding feature vectors
that are sent to the learner. During parsing, the parsing algorithm requests the prediction of parser actions from the guide, which means that
the guide prepares the feature vectors that are sent to the classifier (which makes use of the model induced in the learning phase).
features | -F | stringenum | | save |
The features option is used for specifying the feature model specification file, which is an XML file
(see user guide: Feature model) or a text file with the file suffix .par (see
user guide of MaltParser 0.x (C-impl) Feature Models). If
no feature specification file is specified, the parser will use a default feature model specification for the given parsing algorithm that is included in the
MaltParser distribution (included in the malt.jar file).
nivreeager | Nivre arc-eager default model |
nivrestandard | Nivre arc-standard default model |
covnonproj | Covington non-projective default model |
covproj | Covington projective default model |
stackproj | Stack projective default model |
stackeager | Stack projective default model |
stacklazy | Stack projective default model |
planar | Planar arc-eager default model |
2planar | 2-Planar arc-eager default model |
data_split_column | -d | string | | save |
For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With
the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide
should split up the training
instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time.
The option data_split_column indicates which input column in the data format specification file should be used for splitting up the training
instances, for example, -d POSTAG or -d CPOSTAG. It
is not a good idea to use fine-grained features, such as LEMMA or FORM, since this would result in thousands of models.
data_split_structure | -s | string | | save |
For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With
the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide
should split up the training
instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time.
The option data_split_structure specifies
the data structure that should be used for splitting up the training instances. For example, with Nivre's parsing algorithms
it is possible to use the top token on the stack (-s Stack[0]) or the next input token (-s Input[0]);
for Covington's algorithms it should be either -s Left[0] or -s Right[0].
data_split_threshold | -T | integer | 50 | save |
For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With
the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide
should split up the training
instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time.
The option data_split_threshold specifies the frequency threshold for training a separate model. For example, -T 100 means that all
training sets that contain fewer than 100 instances will be merged into a default training set.
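The interaction of the three data split options can be pictured as a group-then-merge step. The Python sketch below is not MaltParser's implementation. It assumes each training instance has already been paired with its split value (for example the POSTAG of Input[0]), and the function name and the default-set key are hypothetical.

```python
from collections import defaultdict


def split_training_instances(instances, threshold=50, default_key="<default>"):
    """Group (split_value, instance) pairs and merge the small groups.

    Groups with fewer than `threshold` instances are merged into one
    default training set, as described for data_split_threshold.
    """
    groups = defaultdict(list)
    for split_value, instance in instances:
        groups[split_value].append(instance)

    merged = {default_key: []}
    for split_value, members in groups.items():
        if len(members) < threshold:
            merged[default_key].extend(members)
        else:
            merged[split_value] = members
    return merged


# Made-up data: 120 NN instances, 100 VB instances and 3 UH instances.
data = ([("NN", i) for i in range(120)]
        + [("VB", i) for i in range(100)]
        + [("UH", i) for i in range(3)])
sets = split_training_instances(data, threshold=100)
print({key: len(members) for key, members in sets.items()})
```

With a threshold of 100, the NN and VB groups each get their own model, while the three UH instances fall into the default training set.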
The classifier can produce a k-best list of predicted parser actions. The kbest option indicates how many
items the k-best list should contain. If -k -1, all possible parser actions are ranked in the k-best list.
If -k 1, there is only one prediction in the k-best list. MaltParser 1.4.1 (behavior ≠ malt0.4) only makes
use of the k-best list when the parser action is not permissible. Later releases of MaltParser will make use of the k-best list
in a more intelligent way. If --malt0.4-behavior=true, this option will be overridden with k=1.
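One way to read the fallback behavior described above is: walk down the ranked list and take the first action that is permissible in the current parser state. The Python sketch below illustrates that reading only. The function name, the action names and the choice to return None when nothing in the list is permissible are assumptions, not documented MaltParser behavior.

```python
def choose_action(ranked_actions, is_permissible, k=-1):
    """Return the first permissible action among the k best predictions.

    ranked_actions is ordered from best to worst; k=-1 keeps the whole list.
    Returns None if no action in the k-best list is permissible (assumption).
    """
    kbest = ranked_actions if k == -1 else ranked_actions[:k]
    for action in kbest:
        if is_permissible(action):
            return action
    return None


# Hypothetical ranking in which the best prediction is not permissible.
ranked = ["LeftArc", "Reduce", "Shift"]
allowed = lambda action: action != "LeftArc"

print(choose_action(ranked, allowed))        # falls back to the second item
print(choose_action(ranked, allowed, k=1))   # no fallback available
```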
kbest_type | -kt | class | rank | process |
The classifier can produce a k-best list of predicted parser actions.
rank | Only ranked list |
score | Scored list, if the learner produces scores |
learner | -l | class | libsvm | save |
This option specifies the learning method (learner package). MaltParser 1.4.1 includes
the LIBSVM learner and the LIBLINEAR learner.
libsvm | LIBSVM learner |
liblinear | LIBLINEAR learner |
decision_settings | -gds | string | T.TRANS+A.DEPREL | save |
This option specifies how a parser action is combined or divided. By default, arc label(s) and transition are combined into
one individual decision. For more information see the user guide: Prediction strategy.
classitem_separator | -gcs | string | ~ | save |
By default, the transition and the dependency type are combined into one class with ~ as the separator (see the default value above). If
some dependency label contains the separator character, this could break the separation of the class.
Therefore another classitem_separator should be used in this case.
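The problem with a separator that also occurs inside a label can be shown in a few lines of Python. The helper names, the transition name RA and the label nsubj_pass are hypothetical; the ~ default is the one given in the option header.

```python
def combine(transition, deprel, separator="~"):
    """Join a transition and a dependency type into one class item."""
    return f"{transition}{separator}{deprel}"


def split_class(class_item, separator="~"):
    """Split a class item back into its parts."""
    return class_item.split(separator)


# An underscore separator clashes with the underscore inside the label.
print(split_class(combine("RA", "nsubj_pass", "_"), "_"))
# The ~ separator does not occur in the label, so the split is unambiguous.
print(split_class(combine("RA", "nsubj_pass")))
```

The first call yields three parts instead of two, which is exactly the ambiguity that choosing a different classitem_separator avoids.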
pproj
marking_strategy | -pp | enum | none | save |
Marking strategy for pseudo-projective transformation.
none | No pseudo-projective transformation |
baseline | Projectivizes input data |
head | Projectivizes input data with head encoding for labels |
path | Projectivizes input data with path encoding for labels |
head+path | Projectivizes input data with head and path encoding for labels |
covered_root | -pcr | enum | none | save |
Attachment strategy for covered roots.
none | No covered root transformation |
left | Attach covered roots to the left end of the covering arc |
right | Attach covered roots to the right end of the covering arc |
head | Attach covered roots to the head of the covering arc |
lifting_order | -plo | enum | shortest | save |
Lifting order, in case a dependency graph contains multiple non-projective arcs.
shortest | Lift the shortest arcs first (break ties from left to right) |
deepest | Lift the most deeply nested arcs first (break ties from left to right) |