WWW::Search and AutoSearch
==========================
WHAT IS NEW WITH WWW::Search 1.019? (25-Jun-98)
-----------------------------------------------
overview: back-end updates
- bug fix: test suite bugs were causing false negatives on
Yahoo, Excite, Magellan, WebCrawler (reported by Martin Thurn,
fixed by John Heidemann)
- new feature: the test suite is now run daily (automatically).
Output can be found at
.
- new feature: verbose mode of WebSearch is more verbose
- bug fix: AltaVista was recording the RealName URL on some queries
(bug reported by Vassilis Papadimos)
- bug fix: AltaVista wasn't correctly reporting change_time/size
(bug and fix from Martin Valldeby)
- known bug: WWW::Search doesn't work on MacPerl because of
end-of-line differences. A fix for this problem is in
progress. (Problem identified and fix suggested by
Chris Nandor.)
- known bug: I'm told there are installation problems on Windows.
Suggestions about how to improve Windows support are welcome.
(I don't run Windows and so can't write or test this code.)
Note: WWW::Search may have problems with older libwww's (5.08). If
"make test" dies with an error in RobotUA, upgrade libwww. (Tested
with libwww-5.30.)
WHAT IS WWW::Search?
--------------------
WWW::Search is a collection of Perl modules which provide an API to
WWW search engines. Currently WWW::Search includes back-ends for
variations of AltaVista, Dejanews, Excite, HotBot, Infoseek, Lycos,
Magellan, PLweb, SFgate, Verity, WebCrawler, and Yahoo. We include
two applications built from this library: AutoSearch (a program to
automate tracking of search results over time), and WebSearch, a small
demonstration program to drive the library. Back-ends for other
search engines and more sophisticated clients are currently under
development.
Because WWW::Search depends on parsing the HTML output of web search
engines, it will fail if the search engine operators change their
format (an unfortunately frequent occurrence). WWW::Search includes a
test suite for most back-ends which verifies that it's functioning
correctly. As of the day of the release the current back-end
status is:
    AltaVista     working        (in test suite)
    Dejanews      not working?   not in test suite
    Excite        working        (in test suite)
    Gopher        not working?   not in test suite
    HotBot        working        (in test suite)
    Infoseek      working        (in test suite)
    Lycos         working        (in test suite)
    Magellan      working        (in test suite)
    PLweb         not working?   not in test suite
    SFgate        not working?   not in test suite
    Simple        not working?   not in test suite
    Verity        not working?   not in test suite
    WebCrawler    working        (in test suite)
    Yahoo         working        (in test suite)
(others are currently under development, see contributors below for details)
WHAT IS AutoSearch?
-------------------
WWW::Search's primary client is AutoSearch. AutoSearch performs a
web-based search and puts the result set in a web page. It
periodically updates this web page, indicating how the search changes
over time. Sample output from WWW::Search can be found at
. Output format is configurable.
See the AutoSearch man page for details, or the DEMONSTRATION section
below for quick-start instructions.
REQUIREMENTS
------------
WWW::Search requires Perl5 and libwww-perl.
For information on Perl5, see .
For libwww-perl, see .
Both are also available from the Comprehensive Perl Archive
Network (CPAN). Visit to find a CPAN
site near you.
At this time WWW::Search is tested under Perl version 5.004_04.
AVAILABILITY
------------
The latest version of WWW::Search should always be available from
. Alpha releases are only
available here (not at CPAN).
WWW::Search is also available as part of CPAN. Visit
to find a CPAN site near you.
Feedback about WWW::Search is encouraged. If you're using it for a
neat application, please let us know. If you'd like to implement (or
have implemented) a new back-end for WWW::Search, let us know so we
don't duplicate work.
INSTALLATION
------------
In order to use this package you will need Perl version 5.002 or
better. You install WWW::Search, as you would install any perl module
library, by running these commands:
    perl Makefile.PL
    make
    make test
    make install
See below for a description of what "make test" does.
If you want to install a private copy of WWW::Search in your home
directory, then you should try to produce the initial Makefile with
something like this command:
    perl Makefile.PL PREFIX=~/perl
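A private install only helps if Perl can find the modules afterwards. The sketch below shows one way to arrange that from a Bourne-style shell, by putting the private library directory on PERL5LIB. The helper name add_perl5lib and the directory ~/perl/lib/site_perl are assumptions made for illustration; the real directory under your PREFIX depends on your Perl version, so check where "make install" reports copying the files.

```shell
# Hypothetical helper (not part of WWW::Search): prepend a directory to
# PERL5LIB unless it is already listed there.
add_perl5lib() {
    case ":${PERL5LIB:-}:" in
        *":$1:"*) ;;                                # already present, do nothing
        *) PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}" ;;  # prepend the new directory
    esac
    export PERL5LIB
}

# Assumed location; adjust to match where "make install" put the modules.
add_perl5lib "$HOME/perl/lib/site_perl"
```

Calling the helper from your shell start-up file should let scripts that "use WWW::Search" pick up the private copy. The installed WebSearch and AutoSearch scripts may also need their directory added to PATH.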
TESTING
-------
The "make test" command compares expected output from WWW::Search with
actual output. It detects two kinds of errors:
- internal parsing:
First it checks that your system computes the same
results as my system from some saved Web queries.
This test should always pass; if it doesn't,
send me mail.
- external queries:
Second, it makes real queries against the search engines
and compares them with some saved results.
External queries can fail for several reasons:
- new pages have been added which match my test queries
(not bad)
- changes in the web search engine output which break WWW::Search's
parsers (very bad)
If the external tests fail, please either investigate the error or
send a description of the problem and the output of "make test" to the
maintainer of the back-end for the search engine that fails. You can
find out who maintains the back-end by looking at the man page or code
for the back-end in the lib/WWW/Search directory.
DISCUSSION, BUG REPORTS, AND IMPROVEMENTS
-----------------------------------------
A mailing list for WWW::Search discussion exists. To subscribe, send
"subscribe info-www-search" as the body of a message to
.
Back-end-related bug reports (a particular search engine doesn't work)
should be sent to the author of the back-end (back-end authors are
identified in the corresponding man page and the output of "make test").
General bugs should be reported to .
When submitting a bug report, please remember to include:
- your version of perl
- your version of WWW::Search
- sample output showing the error
- the output of "make test"
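The items in this list can be gathered into a single file before mailing. The following is a minimal sketch, not something shipped with WWW::Search: collect_report is a hypothetical helper, and it assumes the installed module exposes its version as $WWW::Search::VERSION. Sample output showing the error still has to be added by hand.

```shell
# Hypothetical helper: write the Perl version, the WWW::Search version,
# and the output of a test command into one report file.
# Usage: collect_report OUTPUT-FILE COMMAND [ARGS...]
collect_report() {
    out=$1
    shift
    {
        echo "== perl version =="
        perl -v 2>&1 || echo "(perl not found)"
        echo "== WWW::Search version =="
        # Assumes the module defines $WWW::Search::VERSION.
        perl -MWWW::Search -e 'print "$WWW::Search::VERSION\n"' 2>/dev/null \
            || echo "(could not load WWW::Search)"
        echo "== test output =="
        # Run the given command, keeping both stdout and stderr.
        "$@" 2>&1 || echo "(command exited with a failure status)"
    } > "$out"
}
```

Run from the unpacked source directory, "collect_report report.txt make test" would leave everything in report.txt, ready to paste into a message.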
DEMONSTRATION
-------------
After installing the client programs, try
    WebSearch '"Your Name Here"'
to see who's talking about you on the web.
Then (in your web page directory), try
    AutoSearch -n 'me on the web' -s '"Your Name Here"' me
and the web page me/index.html will be created summarizing
this information.
Then add
    0 3 * * 1 AutoSearch /path/to/your/web/pages/me
to your crontab(1) to update this search once a week.
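For readers unfamiliar with crontab syntax, the five schedule fields are minute, hour, day of month, month, and day of week, so "0 3 * * 1" means 03:00 every Monday. The sketch below builds the same entry with a full path to the program, since cron often runs jobs with a minimal PATH. The helper name cron_entry and the location /usr/local/bin/AutoSearch are assumptions for illustration; substitute wherever AutoSearch was installed on your system.

```shell
# Hypothetical helper: print a weekly crontab line for AutoSearch.
# Fields: minute hour day-of-month month day-of-week command
#   0 3 * * 1  ->  03:00 every Monday
# $1 = full path to the AutoSearch program (assumed location below)
# $2 = directory holding the search's web page
cron_entry() {
    printf '0 3 * * 1 %s %s\n' "$1" "$2"
}

cron_entry /usr/local/bin/AutoSearch /path/to/your/web/pages/me
# prints: 0 3 * * 1 /usr/local/bin/AutoSearch /path/to/your/web/pages/me
```

The printed line can then be pasted into the editor opened by "crontab -e".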
DOCUMENTATION
-------------
See WWW/Search.pm for an overview of the library.
POD-style documentation is included in all modules
and scripts. These are normally converted to manual pages and
installed as part of the "make install" process. You should also be
able to use the 'perldoc' utility to extract documentation from the
module files directly.
FUTURE PLANS
------------
Some ideas:
- application-level proxy support (I'm looking for a contribution
here from someone who uses/needs proxy support)
- more widespread use of new results tags across all back-ends
- a freeze/restore interface to suspend and resume in-progress queries
- more back-ends
Now that the test suite is done I don't plan to add major new
features, but contributions from others are always welcome. Send me
e-mail if you plan a new back-end or want to discuss architectural
changes (to avoid duplicating work).
RELEASE HISTORY
---------------
1.002: (11 October 1996)
- First public release.
1.004: (31 October 1996)
- new: AutoSearch, a client application (see WHAT IS AutoSearch? above for details)
- new: WWW::Search is now in CPAN (see AVAILABILITY above for details)
- bug fix: installation problem (no rule to make CLIENTS/search) fixed
1.005: (12 November 1996)
- new: back-ends for HotBot, Lycos, and several AltaVista variants
- new: application support for search-engine selection
- new: application and library support for search-engine options
1.006: (25 November 1996)
- private beta release, see 1.007 for list of new features
1.007: (17 December 1996)
- new: back-ends for Dejanews (from Cesare Feroldi de Rosa),
Infoseek (also from Cesare Feroldi de Rosa),
and Excite (from GLen Pringle)
- new: more fields in SearchResult (score, dates, etc., see the man page)
(problem found by Cesare Feroldi de Rosa)
- new: better error handling on network failures
(AutoSearch should report errors on its pages,
$search->response() provides an API for error reporting)
- new (internal): user_agent handling has changed
- new: proxy support added to WWW::Search (still needed in applications)
(problem and fix suggested by T. V. Raman)
- bug-fix: numerous documentation updates
(problems found by Larry Virden)
- bug-fix: AltaVista web search was occasionally dropping hits
(problem found by Larry Virden, fixed by Bill Scheding)
- bug-fix: all non-alphanumeric characters are now escaped
(problem found by Larry Virden)
1.008: (8 January 1997)
- private alpha release, see 1.009 for list of new features
1.009: (14 January 1997)
overview: 1.009 is primarily a maintenance release to accommodate
changes to LWP and some search engines.
- change: search application renamed WebSearch (a more specific name)
- bug-fix: the WWW::Search error in formatting is fixed
(problem found by Larry Virden, fix by him and johnh)
- bug-fix: RobotUA handling updated for new LWP in Search.pm
- bug-fix: update for Infoseek (page format changed about 1 Jan 97)
(problem found by Joseph McDonald, fix by Cesare Feroldi de Rosa)
- bug-fix: update for Excite (page format changed about 9 Jan 97)
(problem found by Juan Jose Amor, fix by GLen Pringle)
1.010: (20 August 1997)
overview: an interim release to fix AltaVista
- new: normalized_score, a back-end independent score (from Paul Lindner)
- new: generic options are supported by several back-ends
(specify search engine URL, debugging, etc.)
- new: AltaVista back-end now sets SearchResult::raw
- bug-fix: update for AltaVista (page format changed Jul 97)
(some information wrt fix provided by Guy Decoux)
1.011: (8 October 1997)
- internal alpha release, see 1.012 for list of new features
1.012: (3 November 1997)
overview: an alpha release for test-suite testing
- new: for testing, HTTP results can be saved to disk and played back
- new: test scripts (try "make test")
- bug-fix: Lycos works again and is now maintained by John Heidemann
- bug-fix: AltaVista advanced and news searches have been repaired
- bug-fix: some uninitialized value warnings suppressed
(fix suggested by R. Chandrasekar (Mickey))
- new: new back-end: PLweb
- new: documentation for PLweb (contributed by Paul Lindner)
- new: new back-ends: Gopher, Simple (contributed by Paul Lindner)
- new: WWW::Search mailing list:
to subscribe, send "subscribe info-www-search" as
the body of a message to
1.013: (19 February 1998)
overview: this is an alpha release to include Martin's new back-ends
- bug fix: HotBot back-end updated by Martin Thurn
- new: Yahoo back-end now works, by Martin Thurn
- problem: several back-ends don't work (Lycos)
- problem: several back-ends don't have test suites and
so may or may not work (DejaNews, Excite, HotBot, Infoseek, PLweb,
SFgate, Verity, Yahoo)
- reminder: WWW::Search mailing list:
to subscribe, send "subscribe info-www-search" as
the body of a message to
1.014: (24 March 1998)
overview: this is an alpha release to fix the AltaVista/Lycos back-ends
- bug fix: AltaVista/Lycos back-ends
(problem reported by Bilal Siddiqui)
- known problem: some back-end test suites give intermittent results
(AltaVista::News)
- problem: several back-ends don't have test suites and
so may or may not work (DejaNews, Excite, HotBot, Infoseek, PLweb,
SFgate, Verity, Yahoo)
1.015: (2-Apr-98)
overview: this is an alpha release with several new back-ends
- new: back-ends: Magellan, WebCrawler (thanks to Martin Thurn)
- bug fix: Yahoo/HotBot/Excite back-ends,
with test suites. Many thanks to Martin Thurn.
- bug fix: AltaVista news test suites have been relaxed,
even though the code worked before, the test suites
used to report false negatives.
- bug fix: AltaVista is now more careful to detect the end of
a hit's raw HTML
- new: the test suite has been enhanced to be less sensitive
to changes in what's indexed
- problem: several back-ends don't have test suites and
so may or may not work (DejaNews, Infoseek, PLweb,
SFgate, Verity)
- reminder: WWW::Search mailing list:
to subscribe, send "subscribe info-www-search" as
the body of a message to
1.016: (21-May-98)
overview: this is an alpha to fix HotBot/Infoseek
- bug fix: Infoseek/HotBot back ends now work again.
(HotBot problem reported by Alan McCoy,
both back-ends fixed by Martin Thurn)
- addition: Infoseek test suite
- addition: test output now includes the version number
1.017: (27-May-98)
overview: this is the first public release since 1.012
- bug fix: Lycos bug fix
1.018: (31-May-98)
overview: back-end updates
- bug fix: Excite and WebCrawler (by Martin Thurn),
AltaVista (by John Heidemann)
updated 30-May-98
- known bugs: WWW::Search doesn't work on MacPerl because of
end-of-line differences. A fix for this problem is in
progress. (Problem identified and fix suggested by
Chris Nandor.)
SUPPORT AND CREDITS
-------------------
The WWW::Search architecture is by John Heidemann with feedback
from the other contributors. Components of WWW::Search have been
written by several people:
APPLICATIONS:
    WebSearch             John Heidemann
    AutoSearch            William Scheding
BACK-ENDS:
    AltaVista             John Heidemann
    Dejanews              Cesare Feroldi de Rosa
    Excite                GLen Pringle and Martin Thurn
    ExciteForWebServers   Paul Lindner (under development)
    Gopher                Paul Lindner
    HotBot                William Scheding and Martin Thurn
    Infoseek              Cesare Feroldi de Rosa and Martin Thurn
    MSIndexServer         Paul Lindner (under development)
    Lycos                 William Scheding and John Heidemann
    Magellan              Martin Thurn
    PLWeb                 Paul Lindner
    SFgate                Paul Lindner
    Simple                Paul Lindner
    Verity                Paul Lindner
    WebCrawler            Martin Thurn
    Yahoo                 William Scheding and Martin Thurn
AutoSearch is based on an earlier implementation by Kedar Jog
with advice from Joe Touch.
Bugs and extensions (to the software and documentation) have been
identified by William Scheding, T. V. Raman (proxy support),
C. Feroldi, Larry Virden, Paul Lindner, Guy Decoux,
R. Chandrasekar (Mickey), Martin Thurn, Chris Nandor, and
Martin Valldeby.
Bugs have been reported by Joseph McDonald, Juan Jose Amor,
Bowen Dwelle, and Vassilis Papadimos.
Feedback, bug reports and fixes, and new back-ends should be sent to
John Heidemann. Before submitting a bug report, please check for any
announcements about known bugs. When sending e-mail, please
put [WWW::Search] at the beginning of the subject line (or risk me
losing the message in the pile).
COPYRIGHT
---------
Copyright (c) 1996 University of Southern California.
All rights reserved.
Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation, advertising
materials, and other materials related to such distribution and use
acknowledge that the software was developed by the University of
Southern California, Information Sciences Institute. The name of the
University may not be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Portions of this README are derived from the README for libwww-perl.