libswish3 is at the core of multiple Swish3 implementations, and has reached a
stable enough API that a 1.0.0 release seems appropriate.
From the README:
libswish3 is a document parser compatible with the Swish-e 2.4 -S prog API.
libswish3 is a C library for parsing documents into a data structure that can
then be stored and searched with a variety of IR backends.
Four implementations of Swish3 are currently available:
swish_xapian (C++ using libxapian, included in libswish3 distribution)
SWISH::Prog::Xapian (Perl using Search::Xapian)
SWISH::Prog::Lucy (Perl using Apache Lucy)
SWISH::Prog::KSx (Perl using KinoSearch)
All the Perl implementations are available from CPAN.
They each rely on SWISH::3 (the Perl bindings to libswish3) and the core SWISH::Prog project, a Perl
rewrite of the swish-e 2.x C binary and accompanying helper scripts. The
SWISH::Prog distribution includes a 'swish3' command line interface with options
very similar to the swish-e 2.x command line tool.
Xapian, KinoSearch and Apache Lucy all offer robust UTF-8 and incremental
indexing support, as well as the ability to scale to many millions of documents
across multiple servers.
One of the three virtues of programming is Laziness. Beware
of false laziness. Andy Lester describes
the problem aptly when he recounts an interaction with another programmer:
This person was one of those programmers who tried for the premature optimization of saving some typing. He forgot that typing is the least of our concerns when programming. He forgot that programmer thinking time costs many orders of magnitude more than programmer typing time, and that the time spent debugging can dwarf the amount of time spent creating code.
I can vouch for the writer's experience, though for me it has been less about back pain
(though I have that too) than eye strain (going on 7 years now). Biggest of all though
has been having children and working from home: that is the interruption formula in a nutshell.
Update: finally found a fix for this. The problem is that Perl has its own
my_setenv() function that interferes with the native setenv() called by
libswish3.c. The fix was to set the magic Perl var PL_use_safe_putenv as
shown here.
This took many hours of googling to track down. Glad to be done with it (I
hope!).
There's been a ton of work on Swish3 in the last year. I've actually started planning a 1.0 release,
after 5 years of work.
Lately I've been focusing on three things: (1) making the Perl bindings easier to install; (2) indexing
of compressed documents; and (3) supporting XInclude of document fragments. The first is accomplished: you
can install the entire library via CPAN. The last two are aimed at large
doc sets where I want to keep the XML compressed on disk for space reasons, and where I want to re-use
subsets of the document collections in building multiple indexes.
Running the test suite in a project and watching as thousands of successful tests scroll
by, culminating in the
All tests successful.
message, gives me the same thrill
of satisfaction as when I used to paint houses, and having finished a long day of sweaty labor
at sanding and chipping old paint off, I could stand back and survey the structure,
primed and ready for a fresh coat of paint. It's the anticipation that thrills, in the
same way that a trip to the grocery store and a full fridge, or several loads of clean
laundry folded and stowed safely away in drawers, thrills me. The knowing that I am prepared,
belt cinched tight, all tests successful.
It's been a long week, culminating today in Frozen Perl 2010, a Perl conference for and by Perl hackers, here in the Twin Cities. I gave two talks at today's conference,
one on Swish3 and
the other on Devel::NYTProf and
Search::Tools. Both talks seemed well-received.
In the process of preparing the talks I also released a few new, related
modules to CPAN this week:
Search::Query now has support for SQL and SWISH Dialects. I hope to add
KinoSearch and Xapian dialects soon. The Search::Query::Parser now has
(undocumented and experimental) support for range queries, so that you can say:
foo=( 1..4 )
and that'll be expanded to
foo=( 1 OR 2 OR 3 OR 4 )
when the Dialect query object is stringified. Handy for things like ranges of
dates, which is how I am using it at $work.
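To make the expansion concrete, here is a minimal sketch of the same idea in Python. It is not how Search::Query::Parser implements range queries; the function name expand_ranges and the regular expression are made up for illustration, and it assumes integer endpoints only.

```python
import re

# Hypothetical helper, NOT part of Search::Query: it only illustrates
# how a "( a..b )" clause can be rewritten as an OR list.
RANGE_RE = re.compile(r"\(\s*(-?\d+)\s*\.\.\s*(-?\d+)\s*\)")


def expand_ranges(query):
    """Rewrite every '( a..b )' clause in query as '( a OR ... OR b )'."""

    def _expand(match):
        start, end = int(match.group(1)), int(match.group(2))
        if start > end:
            raise ValueError(f"empty range: {start}..{end}")
        values = " OR ".join(str(n) for n in range(start, end + 1))
        return f"( {values} )"

    return RANGE_RE.sub(_expand, query)


print(expand_ranges("foo=( 1..4 )"))  # prints: foo=( 1 OR 2 OR 3 OR 4 )
```

The expansion grows linearly with the width of the range, so this approach suits short ranges (a handful of dates) better than wide ones.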
Search::Tools, SWISH::API::*
New releases of these older modules as well, with some bug fixes and
refactoring to support Search::Query.
So, yes. A busy week.
I enjoyed hearing other folks' talks today at Frozen Perl. There was a good
variety: pack/unpack, Unicode, i18n and best practice-related presentations. I
met some new people, renewed friendships with folks I already knew, and drank
lots of free coffee. The cookies were good too.
So I don't surf youtube very much. Or rather, only when my kids are wanting to watch
Wallace and Gromit trailers. So I'm always waaaay behind the times. That said, this video
is a riot.
For the last ten years I have used the color #fddc8e (hex; previously listed here as #E3BF70) as my terminal background color. It's a darkish amber color
that is very easy on the eyes. I'm recording it here because every year or so I have to set up a new system
and always have to eyeball the settings till I get something close to what I am used to.
Update: 26 Jan 2009
Here's my .Xdefaults file for my xterm under X11 on OS X.
Contextual Query Language is defined
by the Library of Congress. I discovered it via CQL::Parser.
Brian Cassidy is involved, so it must be good.
I immediately thought "oh shit. Now my new Search::Query module feels late-to-the-party." But on further reading,
I think a CQL dialect in Search::Query makes some sense.
Search::Query is a SQL::Translator-like module for free-text search. I coded it up this week after the idea had been brewing for many months. I'm imagining it now as a next-generation Search::QueryParser::SQL, for contexts beyond SQL. Example: I have a query string that works with Xapian and want to convert it to one that works with Swish-e 2.x or KinoSearch. Just parse it with Search::Query::Parser, assign it a target dialect, and then call $query->stringify to get the translated version out.
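The parse-then-stringify idea can be sketched in a few lines of Python. This is a toy illustration of the concept, not the Search::Query API: the Clause class, the parse and stringify functions, and the dialect table are all hypothetical, and the only dialect difference modeled is the field separator (field:term in Xapian's query parser, field=term in Swish-e), which is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    field: str  # empty string when the term has no field
    term: str


# Assumption for this sketch: dialects differ only in the field separator.
SEPARATORS = {"xapian": ":", "swish": "="}
OPERATORS = {"AND", "OR", "NOT"}


def parse(query, dialect):
    """Split a flat query into Clause objects and bare operator strings."""
    sep = SEPARATORS[dialect]
    tree = []
    for token in query.split():
        if token in OPERATORS:
            tree.append(token)
        elif sep in token:
            field, term = token.split(sep, 1)
            tree.append(Clause(field, term))
        else:
            tree.append(Clause("", token))
    return tree


def stringify(tree, dialect):
    """Serialize the parsed query using the target dialect's separator."""
    sep = SEPARATORS[dialect]
    parts = []
    for node in tree:
        if isinstance(node, Clause):
            parts.append(f"{node.field}{sep}{node.term}" if node.field else node.term)
        else:
            parts.append(node)
    return " ".join(parts)


tree = parse("title:perl AND author:larry", "xapian")
print(stringify(tree, "swish"))  # prints: title=perl AND author=larry
```

The point of the intermediate tree is that each dialect only needs a serializer, rather than a translator for every pair of dialects.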
I know the people who read this blog generally do not care about Perl at all (hi Mom!)
but I spend a great deal of time writing code in the language and talking with other
members of the Perl community about our common projects, and so like anyone who has lived
in the Perl world for any length of time, I have an opinion about Perl6. For those not
in the know, Perl5 is the current version of Perl and has been around for over 10 years.
Perl6 is the next major version evolution, but it has been in development for nearly the same
length of time. The problem is that 10 years is a long time for a computer language release
to gestate and many folks whose opinions count (i.e. managers) see that lack of a release
as a sign that Perl Is Dead and not a good choice for their next programming project. So (the
argument goes) Perl6's vaporware status makes it hard for Perl5 programmers to find jobs, because
the "if it ain't new it ain't sexy" ethos of technology counts for more than it should with those
making the money decisions.
The real problem isn't that Perl6 hasn't been released. The real problem is the name Perl6. Perl6
is not a single executable "thing" like Perl5 is; it's an umbrella for several different projects. Right
now I can sit down at just about any modern Unix-like computer and type 'perl' and write some code
that runs. Perl6 doesn't work quite that way. It's a whole new language, not just a major revision to
an existing language. So the version number 5 vs 6 is misleading. That's the problem. Perl is alive and well.
Perl5 continues to be maintained and developed. I get lots of work done every day using it.
Reading through Matt Trout's blog
just now I found this wonderful quote:
Because in free software a question in the form of a well thought out patch is one that almost always gets a constructive answer.
Yes. That's just it. A patch -- real, applicable code -- indicates genuine forethought and effort and I will reward
that kind of conversation every time with equal effort.