peknet :: an eddy in the bit stream         
about peknet
peknet is Peter E Karman musing on technology, politics, religion, books, beer and parenthood.


Powered by Swish-e


© 2005 peknet dot com


projects

In this area you'll find other things I'm working on and/or interested in.

But have I mentioned my beautiful boys lately?

File under projects/ Mon Jun 29 19:54:17 CT 2009

Dezi search platform

This week I announced the initial release of Dezi, a new search platform based on Swish3, Apache Lucy, OpenSearch and Plack.

As of about 15 minutes ago, there are now PHP and Perl clients available.

File under projects/swish Sat Oct 1 21:47:46 CT 2011

libswish3 1.0.0 released

I am happy to announce the 1.0.0 release of libswish3:

http://swish-e.org/swish3/libswish3-1.0.0.tar.gz

libswish3 is at the core of multiple Swish3 implementations, and has reached a stable enough API that a 1.0.0 release seems appropriate.

From the README:

libswish3 is a document parser compatible with the Swish-e 2.4 -S prog API. libswish3 is a C library for parsing documents into a data structure that can then be stored and searched with a variety of IR backends.
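The -S prog API mentioned in the README is, as I understand it, a simple framing: an external program writes each document to stdout as a few header lines (at minimum Path-Name and Content-Length), then a blank line, then exactly Content-Length bytes of content. Below is a minimal Python sketch of writing and reading that framing. It is not libswish3's C parser, it ignores any optional headers, and the helper names prog_record and read_records are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def prog_record(path_name: str, content: str) -> bytes:
    """Frame one document the way a -S prog program would print it (hypothetical helper)."""
    body = content.encode("utf-8")
    # Content-Length counts bytes, not characters, so measure the encoded body.
    header = f"Path-Name: {path_name}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body


def read_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) pairs from a stream of framed documents."""
    while True:
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            if raw.strip() == b"":
                if headers:
                    break  # a blank line ends the header block
                continue  # tolerate stray blank lines between records
            key, _, value = raw.decode("utf-8").partition(":")
            headers[key.strip()] = value.strip()
        if not headers:
            return  # end of stream
        # Read exactly Content-Length bytes; the next record starts right after.
        yield headers, stream.read(int(headers["Content-Length"]))


if __name__ == "__main__":
    stream = io.BytesIO(prog_record("a.xml", "<doc>é</doc>") + prog_record("b.txt", "hello\n"))
    for headers, body in read_records(stream):
        print(headers["Path-Name"], len(body))
```

Because the body length is explicit, the content itself can contain blank lines or anything else without confusing the reader.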


There are currently four different implementations of Swish3 available:
  • swish_xapian (C++ using libxapian, included in libswish3 distribution)
  • SWISH::Prog::Xapian (Perl using Search::Xapian)
  • SWISH::Prog::Lucy (Perl using Apache Lucy)
  • SWISH::Prog::KSx (Perl using KinoSearch)


All the Perl implementations are available from CPAN. They each rely on SWISH::3 (the Perl bindings to libswish3) and the core SWISH::Prog project, a Perl rewrite of the swish-e 2.x C binary and accompanying helper scripts. The SWISH::Prog distribution includes a 'swish3' command line interface with options very similar to the swish-e 2.x command line tool.

Xapian, KinoSearch and Apache Lucy all offer robust UTF-8 and incremental indexing support, as well as the ability to scale to many millions of documents across multiple servers.

You can read more about Swish3 at the devel site.

UPDATE: Mailing list announcement here.

File under projects/swish Wed Sep 21 22:03:59 CT 2011

Search::OpenSearch::Server with REST API

Just uploaded several modules to CPAN that together implement a full REST API for KinoSearch indexes, using Search::OpenSearch::Server::Plack.

% curl -XPOST http://localhost:5000/foo \
    -d '<doc><title>bar</title>foo</doc>' \
    -H 'Content-Type: application/xml'

[response:]
{
  "success":1,
  "doc":{
    "orgs":[],
    "places":[],
    "people":[],
    "topics":[],
    "summary":"",
    "title":"bar",
    "author":[]
  },
  "total":"21581",
  "code":"200"
}


The modules are:
  • Search::OpenSearch 0.11
  • Search::OpenSearch::Server 0.05
  • Search::OpenSearch::Engine::KSx 0.08
  • SWISH::Prog::KSx 0.17
  • SWISH::Prog 0.49
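For scripting against the server, the curl call in this post can be reproduced with Python's standard library. This is a minimal sketch that assumes a server like the one in the example is listening on http://localhost:5000 and that the path segment (foo) names the document being added. The helper names are hypothetical, and the response check relies only on the success and doc fields visible in the sample response, not on any documented schema.

```python
import json
import urllib.request


def build_add_request(base_url: str, doc_uri: str, xml: str) -> urllib.request.Request:
    """Equivalent of: curl -XPOST <base_url>/<doc_uri> -d <xml> -H 'Content-Type: application/xml'."""
    url = f"{base_url.rstrip('/')}/{doc_uri.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def check_response(payload: dict) -> dict:
    """Return the indexed doc, raising if the server did not report success (assumed field names)."""
    if payload.get("success") != 1:
        raise RuntimeError(f"add failed: {payload!r}")
    return payload["doc"]


def add_document(base_url: str, doc_uri: str, xml: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    req = build_add_request(base_url, doc_uri, xml)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return check_response(json.loads(resp.read().decode("utf-8")))
```

With the server running, add_document("http://localhost:5000", "foo", "<doc><title>bar</title>foo</doc>") should return the doc object, whose title would be "bar" if the reply matches the sample above.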


File under projects/swish Thu May 26 13:56:43 CT 2011

    False Laziness

    One of the three great virtues of a programmer is Laziness. Beware, though, of false laziness. Andy Lester describes the problem aptly in recounting an interaction with another programmer:

    This person was one of those programmers who tried for the premature optimization of saving some typing. He forgot that typing is the least of our concerns when programming. He forgot that programmer thinking time costs many orders of magnitude more than programmer typing time, and that the time spent debugging can dwarf the amount of time spent creating code.

    File under projects/ Thu Dec 23 03:40:32 CT 2010

    Funny depending on whoami

    Only funny if you're a programmer.

    File under projects/ Mon Dec 20 21:25:40 CT 2010

    The Interruptible Programmer

    A brilliant and humane essay on changing work habits.

    Excuse me while I get up and stretch.

    I can vouch for the writer's experience, although for me it has been less about back pain (I have that too) than eye strain (going on 7 years now). The biggest factor of all, though, has been having children while working from home: that is the interruption formula in a nutshell.

    File under projects/ Fri Oct 15 10:43:06 CT 2010

    CPAN test failures

    SWISH::3 0.08_04 is passing all tests all over the CPAN testers universe, so that is encouraging.

    However, some reports (notably on FreeBSD) show false failures caused by a Wstat issue.

    I've posted about it at PerlMonks and hope someone out there has an easy fix.

    Update: finally found a fix for this. The problem is that Perl has its own my_setenv() function that interferes with the native setenv() called by libswish3.c. The fix was to set the magic Perl variable PL_use_safe_putenv as shown here. This took many hours of googling to track down. Glad to be done with it (I hope!).

    File under projects/swish Mon Oct 11 00:37:31 CT 2010

    Swish3 progress report

    There's been a ton of work on Swish3 in the last year. I've actually started planning a 1.0 release, after 5 years of work.

    Lately I've been focusing on three things: (1) making the Perl bindings easier to install; (2) indexing compressed documents; and (3) supporting XInclude of document fragments. The first is accomplished: you can install the entire library via CPAN. The last two are aimed at large document sets where I want to keep the XML compressed on disk to save space, and where I want to re-use subsets of the document collections in building multiple indexes.
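Python's standard library happens to support both of those ideas, so here is a rough analogy (not libswish3's C implementation): open a document that may be gzip-compressed, then expand its XInclude elements with xml.etree.ElementInclude, using a custom loader so that included fragments may be compressed too. Resolving hrefs relative to the main document's directory is an assumption of this sketch, and the function names open_doc, make_loader and load_document are hypothetical.

```python
import gzip
import os
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

GZIP_MAGIC = b"\x1f\x8b"


def open_doc(path):
    """Open path for binary reading, transparently gunzipping if it starts with the gzip magic bytes."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    return gzip.open(path, "rb") if magic == GZIP_MAGIC else open(path, "rb")


def make_loader(base_dir):
    """XInclude loader that resolves hrefs under base_dir and accepts compressed fragments."""
    def loader(href, parse, encoding=None):
        with open_doc(os.path.join(base_dir, href)) as fh:
            if parse == "xml":
                return ET.parse(fh).getroot()
            # parse="text" includes are returned as a decoded string
            return fh.read().decode(encoding or "utf-8")
    return loader


def load_document(path):
    """Parse a (possibly gzipped) XML file and expand its xi:include elements in place."""
    base_dir = os.path.dirname(os.path.abspath(path))
    with open_doc(path) as fh:
        root = ET.parse(fh).getroot()
    ElementInclude.include(root, loader=make_loader(base_dir))
    return root
```

Sniffing the magic bytes rather than trusting a .gz extension lets compressed and uncompressed files sit side by side in the same collection.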

    File under projects/swish Tue Jun 8 23:33:18 CT 2010

    Open Source Handbook

    Open Source Handbook reviewed.

    File under projects/ Tue Mar 16 23:03:03 CT 2010

    make test

    Invoking
    make test
    in a project and watching as 1000s of successful tests scroll by, culminating in the
    All tests successful.
    message, gives me the same thrill of satisfaction I used to feel painting houses when, having finished a long day of sweaty labor sanding and chipping off old paint, I could stand back and survey the structure, primed and ready for a fresh coat of paint. It's the anticipation that thrills, in the same way that a trip to the grocery store and a full fridge, or several loads of clean laundry folded and stowed safely away in drawers, thrill me. The knowing that I am prepared, belt cinched tight, all tests successful.

    File under projects/ Wed Mar 3 21:56:14 CT 2010

    Frozen Perl 2010

    It's been a long week, culminating today in Frozen Perl 2010, a Perl conference for and by Perl hackers, here in the Twin Cities. I gave two talks at today's conference, one on Swish3 and the other on Devel::NYTProf and Search::Tools. Both talks seemed well-received.

    In the process of preparing the talks I also released a few new, related modules to CPAN this week:
    Search::OpenSearch
    OpenSearch server glue for KinoSearch and Swish-e 2.x via SWISH::Prog. The slides for my Swish3 talk include a demo Plack app with an ExtJS interface that uses both search engines.

    I think OpenSearch is very cool and look forward to doing more with that spec, including adding more features (e.g. facets) to Search::OpenSearch.
    Search::Query
    Search::Query now has support for SQL and SWISH Dialects. I hope to add KinoSearch and Xapian dialects soon. The Search::Query::Parser now has (undocumented and experimental) support for range queries, so that you can say:
    foo=( 1..4 )
    and that'll be expanded to
    foo=( 1 OR 2 OR 3 OR 4 )
    when the Dialect query object is stringified. Handy for things like ranges of dates, which is how I am using it at $work.
    Search::Tools, SWISH::API::*
    New releases of these older modules as well, with some bug fixes and refactoring to support Search::Query.
    So, yes. A busy week.
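The range expansion described for Search::Query::Parser above can be sketched as a plain text rewrite. This Python version is only an illustration, not the module's Perl code: it assumes integer bounds and the exact field=( lo..hi ) spelling from the example, and rewrites each range into an OR list, as a parser might before dialect-specific stringification. The names RANGE_RE and expand_ranges are hypothetical.

```python
import re

# field=( lo..hi ) with optional whitespace inside the parentheses; integer bounds only.
RANGE_RE = re.compile(r"(\w+)=\(\s*(-?\d+)\.\.(-?\d+)\s*\)")


def expand_ranges(query: str) -> str:
    """Rewrite every field=( lo..hi ) into field=( lo OR lo+1 OR ... OR hi )."""
    def _expand(m):
        field, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
        if lo > hi:
            raise ValueError(f"empty range {lo}..{hi} for field {field!r}")
        values = " OR ".join(str(n) for n in range(lo, hi + 1))
        return f"{field}=( {values} )"

    return RANGE_RE.sub(_expand, query)


if __name__ == "__main__":
    print(expand_ranges("foo=( 1..4 )"))  # foo=( 1 OR 2 OR 3 OR 4 )
```

One caveat for the date use case: expanding YYYYMMDD values as plain integers also generates impossible dates (20100131..20100201 includes 20100132 through 20100200), so a date-aware version would need to step through calendar days instead.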

    I enjoyed hearing other folks' talks today at Frozen Perl. There was a good variety: pack/unpack, Unicode, i18n and best practice-related presentations. I met some new people, renewed friendships with folks I already knew, and drank lots of free coffee. The cookies were good too.

    File under projects/swish Sat Feb 6 23:27:19 CT 2010

    The Vendor-Client Relationship

    So I don't surf YouTube very much, or rather, only when my kids want to watch Wallace and Gromit trailers. So I'm always waaaay behind the times. That said, this video is a riot.

    File under projects/ Sat Jan 30 21:42:00 CT 2010

    Terminal Color

    For the last ten years I have used the color #E3BF70 (hex), more recently #fddc8e, as my terminal background color. It's a darkish amber color that is very easy on the eyes. I'm recording it here because every year or so I have to set up a new system and always have to eyeball the settings till I get something close to what I am used to.

    Update, 26 Jan 2009: here's my .Xdefaults file for my xterm under X11 on OS X.
    XTerm*background: #fddc8e
    XTerm*foreground: black
    XTerm*faceName: monaco
    XTerm*faceSize: 10
    XTerm*saveLines: 10000
    XTerm*scrollBar: true
    XTerm*rightScrollBar: true
    XTerm*jumpScroll: true
    XTerm*geometry:100x40+0+0
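When a new system's terminal settings want decimal RGB components instead of a hex string, a tiny conversion saves the eyeballing. A minimal Python sketch (the function names are my own) that turns the #fddc8e background from the .Xdefaults above into a decimal triple and into X11's rgb:rr/gg/bb notation:

```python
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert '#rrggbb' (leading '#' optional) to decimal (r, g, b), each 0..255."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def x11_rgb_spec(color: str) -> str:
    """The same color in X11's rgb:rr/gg/bb color-name syntax."""
    r, g, b = hex_to_rgb(color)
    return f"rgb:{r:02x}/{g:02x}/{b:02x}"


if __name__ == "__main__":
    print(hex_to_rgb("#fddc8e"))    # (253, 220, 142)
    print(x11_rgb_spec("#fddc8e"))  # rgb:fd/dc/8e
```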
    

    File under projects/ Tue Jan 26 20:13:41 CT 2010

    I like Plack

    Plack is a Perl toolkit for PSGI (the Perl Web Server Gateway Interface), written by Tatsuhiko Miyagawa (miyagawa).

    File under projects/ Tue Jan 19 10:56:20 CT 2010

    CQL

    Contextual Query Language is defined by the Library of Congress. I discovered it via CQL::Parser. Brian Cassidy is involved, so it must be good.

    I immediately thought "oh shit. Now my new Search::Query module feels late-to-the-party." But on further reading, I think a CQL dialect in Search::Query makes some sense.

    Search::Query is a SQL::Translator-like module for free-text search. I coded it up this week after brewing the idea for many months. I'm imagining it now as a next-generation Search::QueryParser::SQL, for contexts beyond SQL. Example: I have a query string that works with Xapian and want to convert it to one that works with Swish-e 2.x or KinoSearch. Just parse it with Search::Query::Parser, assign it a target dialect, and then call $query->stringify to get the translated version.
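The parse-once, stringify-per-dialect idea described above can be shown with a toy translator. Everything here is hypothetical: the field:value input syntax, the two output dialects and their formatting are invented for illustration and are not Search::Query's actual grammar, classes or output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Clause:
    field: str
    value: str


# A parsed query: clauses interleaved with the boolean operators "AND" / "OR".
Query = List[Union[Clause, str]]


def parse(text: str) -> Query:
    """Parse a toy 'field:value AND field:value' syntax (hypothetical; no grouping or quoting)."""
    query: Query = []
    for token in text.split():
        if token.upper() in ("AND", "OR"):
            query.append(token.upper())
            continue
        field, sep, value = token.partition(":")
        if not sep or not field or not value:
            raise ValueError(f"expected field:value, got {token!r}")
        query.append(Clause(field, value))
    return query


# Each dialect only decides how a single clause is rendered (invented formats).
DIALECTS: Dict[str, Callable[[Clause], str]] = {
    "swish": lambda c: f"{c.field}={c.value}",
    # SQL-style: double any single quotes inside the literal.
    "sql": lambda c: "{} = '{}'".format(c.field, c.value.replace("'", "''")),
}


def stringify(query: Query, dialect: str) -> str:
    """Render a parsed query in the requested target dialect."""
    render = DIALECTS[dialect]
    return " ".join(part if isinstance(part, str) else render(part) for part in query)
```

Keeping the parsed form separate from rendering is what lets one query be re-targeted: adding a dialect means adding one rendering function, not another parser.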

    File under projects/ Thu Jan 14 22:46:01 CT 2010

    Perl6 and Perl5

    I know the people who read this blog generally do not care about Perl at all (hi Mom!), but I spend a great deal of time writing code in the language and talking with other members of the Perl community about our common projects, so, like anyone who has lived in the Perl world for any length of time, I have an opinion about Perl6. For those not in the know, Perl5 is the current version of Perl and has been around for over 10 years. Perl6 is the next major version of the language, but it has been in development for nearly the same length of time. The problem is that 10 years is a long time for a computer language release to gestate, and many folks whose opinions count (i.e. managers) see the lack of a release as a sign that Perl Is Dead and not a good choice for their next programming project. So (the argument goes) Perl6's vaporware status makes it hard for Perl5 programmers to find jobs, because the "if it ain't new it ain't sexy" ethos of technology counts for more than it should with those making the money decisions.

    The real problem isn't that Perl6 hasn't been released. The real problem is the name Perl6. Perl6 is not a single executable "thing" like Perl5 is; it's an umbrella for several different projects. Right now I can sit down at just about any modern Unix-like computer and type 'perl' and write some code that runs. Perl6 doesn't work quite that way. It's a whole new language, not just a major revision to an existing language. So the version number 5 vs 6 is misleading. That's the problem. Perl is alive and well. Perl5 continues to be maintained and developed. I get lots of work done every day using it.

    Matt Trout writes a nice piece about this topic, aimed at the Perl community. I applaud it.

    File under projects/ Mon Dec 7 10:03:13 CT 2009

    Question as Patch

    Reading through Matt Trout's blog just now I found this wonderful quote:
    Because in free software a question in the form of a well thought out patch is one that almost always gets a constructive answer.


    Yes. That's just it. A patch -- real, applicable code -- indicates genuine forethought and effort and I will reward that kind of conversation every time with equal effort.

    File under projects/ Mon Dec 7 10:01:06 CT 2009

    Great American Hackathon

    Just found out about this.

    File under projects/ Mon Dec 7 10:00:24 CT 2009

    SWISH::Prog::KSx and SWISH::Prog::Xapian on CPAN

    Uploaded a first pass at both implementations this past week. The announcement to the Swish-e list just went out.

    File under projects/swish Mon Nov 30 22:19:26 CT 2009


    Past entries: 2004 . 2005 . 2006 . 2007 . 2008 . 2009 . 2010 . 2011 . 2012 .