Eliza.pm
Chatbot::Eliza - A clone of the classic Eliza program
use Chatbot::Eliza;
This module implements the classic Eliza algorithm.
The original Eliza program was written by Joseph
Weizenbaum and described in the Communications
of the ACM in 1966. Eliza is a mock Rogerian
psychotherapist. It prompts for user input,
and uses a simple transformation algorithm
to change user input into a follow-up question.
The program is designed to give the appearance
of understanding.
This program is a faithful implementation of the program
described by Weizenbaum. It uses a simplified script
language (devised by Charles Hayden). The content
of the script is the same as Weizenbaum's.
This module encapsulates the Eliza algorithm
in the form of an object. This should make
the functionality easy to use in larger programs.
This is all you need to do to launch a simple
Eliza session:
use Chatbot::Eliza;
$mybot = new Chatbot::Eliza;
$mybot->command_interface;
You can also customize certain features of the
session:
$myotherbot = new Chatbot::Eliza;
$myotherbot->name( "Hortense" );
$myotherbot->debug( 1 );
$myotherbot->command_interface;
These lines set the name of the bot to be
``Hortense'' and turn on the debugging output.
When creating an Eliza object, you can specify
a name and an alternative scriptfile:
$bot = new Chatbot::Eliza "Brian", "myscript.txt";
If you don't specify a script file, then the
Eliza module will initialize the new Eliza
object with a default script that the module
contains within itself.
You can use any of the internal functions in
a calling program. The code below takes an
arbitrary string and retrieves the reply from
the Eliza object:
my $string = "I have too many problems.";
my $reply = $mybot->transform( $string );
You can easily create two bots, each with a different
script, and see how they interact:
use Chatbot::Eliza;
my ($harry, $sally, $he_says, $she_says);
$sally = new Chatbot::Eliza "Sally", "histext.txt";
$harry = new Chatbot::Eliza "Harry", "hertext.txt";
$he_says = "I am sad.";
# Seed the random number generator.
srand( time ^ ($$ + ($$ << 15)) );
while (1) {
    $she_says = $sally->transform( $he_says );
    print $sally->name, ": $she_says \n";
    $he_says = $harry->transform( $she_says );
    print $harry->name, ": $he_says \n";
}
Of course, as with the original Eliza program,
the magic of the algorithm is really in the script.
Each Eliza object uses the following data structures
to hold the script data in memory:
%decomplist
keys: the set of keywords; values: strings containing
the decomposition rules.
%reasmblist
keys: a set of strings which are each the join
of a keyword and a corresponding decomposition rule;
values: the set of possible reassembly statements
for that keyword and decomposition rule.
%keyranks
keys: the set of keywords; values: the ranks for each keyword
@quit
``quit'' words -- that is, words the user might use
to try to exit the program.
@initial
Possible greetings for the beginning of the program.
@final
Possible farewells for the end of the program.
%pre
keys: words which are replaced before any transformations;
values: the respective replacement words.
%post
keys: words which are replaced after the transformations
and after the reply is constructed; values: the respective
replacement words.
%synon
keys: words which are found in decomposition rules;
values: words which are treated just like their
corresponding synonyms during matching of decomposition
rules.
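As a rough illustration of the structures listed above, the sketch below fills them with a few entries by hand. The contents are made up for the example, the use of array references for the values is an assumption, and so is the choice of $; as the separator that joins a keyword to its decomposition rule. The helper reassemblies_for is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical in-memory contents for a tiny script with one keyword.
my %keyranks   = ( remember => 5 );
my %decomplist = ( remember => [ '* i remember *', '* do you remember *' ] );

# Assumption: keyword and decomposition rule are joined with $; to form the key.
my %reasmblist = (
    join($;, 'remember', '* i remember *') => [
        'Do you often think of (2) ?',
        'Does thinking of (2) bring anything else to mind ?',
    ],
);

my @quit  = qw(bye quit);
my %pre   = ( equivalent => 'alike' );
my %synon = ( belief => [qw(feel think believe wish)] );

# Hypothetical helper: list the reassembly candidates for a keyword/rule pair.
sub reassemblies_for {
    my ($keyword, $rule) = @_;
    return @{ $reasmblist{ join($;, $keyword, $rule) } || [] };
}

print "$_\n" for reassemblies_for('remember', '* i remember *');
```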
my $chatterbot = new Chatbot::Eliza;
new creates a new Eliza object. This method
also calls the internal _initialize method, which in turn
calls the parse_script_data method, which initializes
the script data.
my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';
The Eliza object defaults to the name ``Eliza'', and it
contains default script data within itself. However,
using the syntax above, you can specify an alternative
name and an alternative script file.
See the method parse_script_data for a description
of the format of the script file.
$chatterbot->command_interface;
command_interface opens an interactive session with
the Eliza object, just like the original Eliza program.
If you want to design your own session format, then
you can write your own while loop and your own functions
for prompting for and reading user input, and use the
transform method to generate Eliza's responses.
But if you're lazy and you want to skip all that,
then just use command_interface. It's all done for you.
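For readers who do want their own session format, a minimal loop might look like the sketch below. The function run_session and its arguments are hypothetical; the reply generator is passed in as a code reference so that the loop itself does not depend on the module, and the quit test is likewise an assumption supplied by the caller.

```perl
use strict;
use warnings;

# Hypothetical session loop: read lines from $in, write replies to $out.
# $respond stands in for a call to the transform method; $is_quit decides
# when the user wants to leave.
sub run_session {
    my ($in, $out, $respond, $is_quit) = @_;
    print {$out} "> ";
    while (defined(my $line = <$in>)) {
        chomp $line;
        last if $is_quit->($line);
        print {$out} $respond->($line), "\n> ";
    }
    print {$out} "Goodbye.\n";
}

# With the module installed, the loop could be wired up along these lines:
#   my $bot = new Chatbot::Eliza;
#   run_session(\*STDIN, \*STDOUT,
#               sub { $bot->transform($_[0]) },
#               sub { $_[0] =~ /^\s*(?:bye|quit)\b/i });
```

Passing the reply generator as a code reference also makes it easy to try the loop with a stub before a real bot is attached.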
$string = preprocess($string);
preprocess applies simple substitution rules to the input string.
Mostly this is to catch varieties in spelling, misspellings,
contractions and the like.
preprocess is called from within the transform method.
It is applied to user-input text, BEFORE any processing,
and before a reassembly statement has been selected.
It uses the hash %pre, which is created
during the parse of the script.
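The substitution step can be pictured with the sketch below. It is not the module's code: the function name preprocess_sketch and the contents of %pre are invented for the example, and the sketch only handles whitespace-separated words without attached punctuation.

```perl
use strict;
use warnings;

# Hypothetical %pre contents; a real script supplies these via "pre:" lines.
my %pre = ( dont => "don't", equivalent => 'alike' );

# Replace each whitespace-separated word that has an entry in %pre.
sub preprocess_sketch {
    my ($string) = @_;
    my @words = map { exists $pre{ lc $_ } ? $pre{ lc $_ } : $_ }
                split ' ', $string;
    return join ' ', @words;
}

print preprocess_sketch("I dont think they are equivalent"), "\n";
# prints: I don't think they are alike
```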
$string = postprocess($string);
postprocess applies simple substitution rules to the
reassembly rule. This is where all the ``I'''s and ``you'''s
are exchanged. postprocess is called from within the
transform function.
It uses the hash %post, created during the parse of the script.
if ($self->_testquit($user_input) ) { ... }
_testquit detects words like ``bye'' and ``quit'' and returns
true if it finds one of them as the first word in the sentence.
These words are listed in the script, under the keyword ``quit''.
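A first-word check of this kind could be sketched as follows. The name testquit_sketch and the contents of @quit are assumptions for illustration; the module's own _testquit may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical quit words; a real script lists them under "quit".
my @quit = qw(bye goodbye quit);

# Return 1 when the first word of the input is a quit word, 0 otherwise.
sub testquit_sketch {
    my ($input) = @_;
    return 0 unless $input =~ /^\s*(\w+)/;
    my $first = lc $1;
    return (grep { $_ eq $first } @quit) ? 1 : 0;
}
```

Under these assumptions, "Bye for now" counts as a quit request, while "I said bye" does not, because only the first word is examined.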
$reply = $chatterbot->transform( $string );
transform applies transformation rules to the user input
string. It invokes preprocess, does transformations,
then invokes postprocess. It returns the transformed
output string, called $reasmb.
$self->parse_script_data;
parse_script_data is invoked from the _initialize method.
It opens the scriptfile, if any, and reads in the script data.
This module includes a default script file within itself,
so it is not necessary to explicitly specify a script file
when instantiating an Eliza object.
Each line in the script file can specify a key,
a decomposition rule, or a reassembly rule.
key: remember 5
decomp: * i remember *
reasmb: Do you often think of (2) ?
reasmb: Does thinking of (2) bring anything else to mind ?
decomp: * do you remember *
reasmb: Did you think I would forget (2) ?
reasmb: What about (2) ?
reasmb: goto what
pre: equivalent alike
synon: belief feel think believe wish
The number after the key specifies the rank.
If a user's input contains the keyword, then
the ``transform'' function will try to match
one of the decomposition rules for that keyword.
If one matches, then it will select one of
the reassembly rules at random. The number
(2) here means "use whatever set of words
matched the second asterisk in the decomposition
rule."
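One way to picture the matching step is to turn a decomposition rule into a regular expression and then fill the numbered slots of a reassembly rule from the captured text. The sketch below does that under simplifying assumptions: the helper names rule_to_regex and apply_rule are hypothetical, random selection among reassembly rules is left out, and neither "goto" entries nor postprocessing are handled.

```perl
use strict;
use warnings;

# Turn a decomposition rule into a regex: each "*" captures any run of
# text, and every other token must appear as a whole word.
sub rule_to_regex {
    my ($rule) = @_;
    my @parts = map { $_ eq '*' ? '(.*?)' : '\b' . quotemeta($_) . '\b' }
                split ' ', $rule;
    my $pattern = join '\s*', @parts;
    return qr/^\s*${pattern}\s*$/i;
}

# Apply one decomposition rule and one reassembly rule to the input.
# Returns nothing when the rule does not match.
sub apply_rule {
    my ($rule, $reasmb, $input) = @_;
    my $re = rule_to_regex($rule);
    my @captured = $input =~ $re or return;
    # Replace "(N)" with the text matched by the Nth captured group.
    (my $reply = $reasmb) =~ s{\((\d+)\)}{ $captured[$1 - 1] // '' }ge;
    return $reply;
}

print apply_rule('* i remember *',
                 'Do you often think of (2) ?',
                 'I remember the old house'), "\n";
# prints: Do you often think of the old house ?
```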
If you specify a list of synonyms for a word,
then you should use a @ when you use that
word in a decomposition rule:
decomp: * i @belief i *
reasmb: Do you really think so ?
reasmb: But you are not sure you (3).
Otherwise, the script will never check to see
if there are any synonyms for that keyword.
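The effect of the @ marker can be sketched by expanding the marked word into an alternation of the word and its synonyms when the rule is turned into a regular expression. Treating the synonym slot as a captured group is an assumption, chosen because the example above refers to the final asterisk as (3). The function name and the contents of %synon are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical synonym table, as a "synon:" line might populate it.
my %synon = ( belief => [qw(feel think believe wish)] );

# Build a regex from a decomposition rule, expanding "@word" into a
# capturing alternation of the word and its synonyms.
sub rule_to_regex_with_synonyms {
    my ($rule) = @_;
    my @parts;
    for my $token (split ' ', $rule) {
        if ($token eq '*') {
            push @parts, '(.*?)';
        }
        elsif ($token =~ /^\@(\w+)$/) {
            my @alts = map { quotemeta($_) } $1, @{ $synon{$1} || [] };
            push @parts, '\b(' . join('|', @alts) . ')\b';
        }
        else {
            push @parts, '\b' . quotemeta($token) . '\b';
        }
    }
    my $pattern = join '\s*', @parts;
    return qr/^\s*${pattern}\s*$/i;
}

my $re     = rule_to_regex_with_synonyms('* i @belief i *');
my @groups = 'Well I think I am lost' =~ $re;
print join('|', @groups), "\n";
# prints: Well|think|am lost
```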
Each line in the script file contains an ``entrytype''
(for example key, decomp, or synon) and an ``entry'', separated by
a colon. In turn, each ``entry'' can itself be
composed of a ``key'' and a ``value'', separated by
a space. The parse_script_data function
parses each line out, and splits the ``entry'' and
``entrytype'' portion of each line into two variables,
``$entry'' and ``$entrytype''.
Next, it uses the string ``$entrytype'' to determine
what sort of stuff to expect in the ``$entry'' variable,
if anything, and parses it accordingly. In some cases,
there is no second level of key-value pair, so the function
does not even bother to isolate or create ``$key'' and ``$value''.
``$key'' is always a single word. ``$value'' can be null,
or one single word, or a string composed of several words,
or an array of words.
Based on all these entries and keys and values,
the function creates two giant hashes:
%decomplist, which holds the decomposition rules for
each keyword, and %reasmblist, which holds the
reassembly phrases for each decomposition rule.
It also creates %keyranks, which holds the ranks for
each key.
Five other structures are created: the hashes %pre, %post,
and %synon, and the arrays @initial and @final.
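The line-by-line parse described above can be sketched as follows. This is a simplified illustration rather than the module's parser: the function name parse_script_sketch, the returned hash reference, the default rank of 0, and the use of $; to join a keyword to its decomposition rule are all assumptions, and only the key, decomp, reasmb, pre, and synon entry types are handled.

```perl
use strict;
use warnings;

# Parse script lines of the form "entrytype: entry" into hashes.
sub parse_script_sketch {
    my (@lines) = @_;
    my (%decomplist, %reasmblist, %keyranks, %pre, %synon);
    my ($thiskey, $thisdecomp);

    for my $line (@lines) {
        # Split each line into its entrytype and entry portions.
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
        my ($entrytype, $entry) = (lc $1, $2);

        if ($entrytype eq 'key') {
            my ($key, $rank) = split ' ', $entry;
            $thiskey = $key;
            $keyranks{$key} = defined $rank ? $rank : 0;   # assumed default
        }
        elsif ($entrytype eq 'decomp' && defined $thiskey) {
            $thisdecomp = $entry;
            push @{ $decomplist{$thiskey} }, $entry;
        }
        elsif ($entrytype eq 'reasmb' && defined $thisdecomp) {
            push @{ $reasmblist{ join($;, $thiskey, $thisdecomp) } }, $entry;
        }
        elsif ($entrytype eq 'pre') {
            my ($key, $value) = split ' ', $entry, 2;
            $pre{$key} = $value;
        }
        elsif ($entrytype eq 'synon') {
            my ($key, @values) = split ' ', $entry;
            $synon{$key} = \@values;
        }
    }

    return {
        decomplist => \%decomplist,
        reasmblist => \%reasmblist,
        keyranks   => \%keyranks,
        pre        => \%pre,
        synon      => \%synon,
    };
}
```

Each reasmb line is attached to the most recent decomp line, and each decomp line to the most recent key line, which mirrors the nesting of the example script shown earlier.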