SWI-Prolog Natural Language Processing Primitives
Jan Wielemaker
HCS,
University of Amsterdam
The Netherlands
E-mail: wielemak@science.uva.nl
This package contains some well-known basic routines for natural language processing and information retrieval. The current version is very limited, which makes the package name somewhat misleading. Suggestions and contributions are welcome.
The library(double_metaphone) library implements the Double Metaphone algorithm developed by Lawrence Philips and described in ``The Double-Metaphone Search Algorithm'' by L. Philips, C/C++ Users Journal, 2000. Double Metaphone creates a key from a word that represents its phonetic properties. Two words with the same Double Metaphone key are expected to sound similar. The algorithm improves on the older Soundex algorithm.
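As a minimal sketch of how such keys might be used, the following compares two words by their primary key. It assumes the library exports double_metaphone/2 (word in, primary key out), which this text does not state. similar_sounding/2 is a hypothetical helper introduced only for illustration.

```prolog
:- use_module(library(double_metaphone)).

% similar_sounding(+Word1, +Word2)
% Hypothetical helper: succeeds if both words yield the same primary
% Double Metaphone key. Assumes double_metaphone(+In, -Key).
similar_sounding(Word1, Word2) :-
    double_metaphone(Word1, Key),
    double_metaphone(Word2, Key).
```

A query such as `?- similar_sounding(smith, smyth).` then tests whether the two spellings share a key.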
The Double Metaphone implementation is copied from the Perl library, which carries the following copyright notice. To the best of our knowledge, the Perl license is compatible with the SWI-Prolog license scheme, so including this module imposes no additional license conditions.
Copyright 2000, Maurice Aubrey <maurice@hevanet.com>. All rights reserved. This code is based heavily on the C++ implementation by Lawrence Philips and incorporates several bug fixes courtesy of Kevin Atkinson <kevina@users.sourceforge.net>.
This module is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
The library(porter_stem) library implements the stemming algorithm described in Porter, 1980, ``An algorithm for suffix stripping'', Program, Vol. 14, No. 3, pp. 130-137. The library also provides some additional predicates that are commonly used in the context of stemming.
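As a minimal sketch of how stemming might be used to match word forms, the following compares two words by their stem. It assumes the library exports porter_stem/2 (word in, stem out), which this text does not state. same_stem/2 is a hypothetical helper introduced only for illustration.

```prolog
:- use_module(library(porter_stem)).

% same_stem(+Word1, +Word2)
% Hypothetical helper: succeeds if both words reduce to the same stem.
% Assumes porter_stem(+In, -Stem).
same_stem(Word1, Word2) :-
    porter_stem(Word1, Stem),
    porter_stem(Word2, Stem).
```

Under Porter's rules, running and runs both reduce to run, so `?- same_stem(running, runs).` is expected to succeed.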
Tokenization classifies character sequences as follows:
Pattern | Token
[-+][0-9]+(\.[0-9]+)?([eE][0-9]+) | number
[:alpha:][:alnum:]+ | word
[:space:]+ | skipped
anything else | single-character
It is likely that future versions of this library will provide tokenize_atom/3 with additional options to modify space handling as well as the definition of words.
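As a minimal sketch of working with the resulting tokens, the following keeps only the word tokens of a text. It assumes the library exports tokenize_atom/2 (text in, token list out), with words returned as atoms and numbers as Prolog numbers; neither detail is stated in this text. content_words/2 and is_word/1 are hypothetical helpers introduced only for illustration.

```prolog
:- use_module(library(porter_stem)).
:- use_module(library(apply)).

% content_words(+Text, -Words)
% Hypothetical helper: tokenize Text and keep only the word tokens,
% dropping numbers and single-character tokens such as punctuation.
% Assumes tokenize_atom(+Text, -Tokens).
content_words(Text, Words) :-
    tokenize_atom(Text, Tokens),
    include(is_word, Tokens, Words).

% is_word(+Token)
% A word is taken to be an atom of more than one character. This
% also drops one-letter words such as `a'.
is_word(Token) :-
    atom(Token),
    atom_length(Token, Length),
    Length > 1.
```

A query such as `?- content_words('hello, world', Words).` then returns the word tokens without the comma.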
The origin of this implementation is unknown. The phrasing of the header suggests the code is meant to be shared, so we assume it is in the public domain. The code has been modified by Jan Wielemaker, who removed all global variables to make the code thread-safe, added the unaccent and tokenize code, and created the SWI-Prolog binding.
Installation on Unix systems uses the common configure, make and make install sequence. SWI-Prolog should be installed before building this package. If SWI-Prolog is not installed as pl, the environment variable PL must be set to the name of the SWI-Prolog executable. Installation is then accomplished using:
% ./configure
% make
% make install
This installs the foreign libraries in $PLBASE/lib/$PLARCH and the Prolog library files in $PLBASE/library, where $PLBASE refers to the SWI-Prolog `home-directory'.