Search upgraded (yet again)

My absence this weekend from moderation and posts has partially been because of going to the excellent Fabian seminar today. However it has largely been due to trying to get a workable search system running.

One that didn’t bring the whole server crashing around my ears every few months, and that got rid of everyone having to stop receiving data for up to a minute whenever someone used search. One that allowed comments to be accessed as easily as posts, which Search Unleashed achieved (though it eventually failed because the re-index took too long). It looks like it is finally working correctly, but I’d like feedback on any errors that show up.

For the technically minded, this is running on the Sphinx search engine with the WordPress Sphinx Search Plugin, and some custom coding in the search sidebar and search page template for our new theme. It runs a delta update for changed items every 5 minutes, and a complete rebuild at 0300.
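For anyone curious what that schedule amounts to, here is a rough Python sketch of the decision the cron jobs make. It is only an illustration, not the actual setup: the index name wp_delta is a made-up placeholder, and the real jobs are plain cron entries calling the Sphinx indexer tool (its --rotate and --all options are the standard ones).

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical index name; the real one depends on the plugin's Sphinx config.
DELTA_INDEX = "wp_delta"


def indexer_command(now: datetime) -> Optional[List[str]]:
    """Return the indexer command due at `now`, or None if nothing is due."""
    if now.hour == 3 and now.minute == 0:
        # Complete rebuild of every index at 0300.
        return ["indexer", "--rotate", "--all"]
    if now.minute % 5 == 0:
        # Delta update for changed items every 5 minutes.
        return ["indexer", "--rotate", DELTA_INDEX]
    return None
```

The --rotate option lets the running search daemon swap in the new index files without being stopped, which is what keeps searches working while an update runs.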

This is the third time I’ve attempted to get this system running. Each of the previous attempts was stymied by Sphinx disliking our constrained memory during peak hours. But we’ve bumped the RAM up, and it also turns out that there were some conflicting permission issues (grrr) when running the cron jobs with an active website.

Normally I’d have consigned something as annoying to configure as this to the dustbin, but the engine is blindingly fast and perfect for our scaling needs with nearly 5000 posts and 150,000 comments currently onsite. These numbers are expected to keep rising at rates similar to or higher than those we have seen so far.

Sphinx can reindex our entire database in about 30 seconds (whereas Search Unleashed required hours), and will do the delta every 5 minutes in well under a second. Almost all of the delay in presenting the search result is from assembling the display rather than the search. It is a seriously good engine and has been (so far) worth the effort.

The user interface is a little primitive at present, and I also need to store your user-selected advanced settings in your cookies and read them back. But it is good enough to look for loading problems over this week. It has a number of options that will be useful for advanced searching, especially for authors like r0b…

Sphinx extended queries syntax allows the following special operators to be used:

Here’s an example query which uses all these operators:

"hello world" @title "example program"~5 @body python -(php|perl)
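If you want to see how the pieces of that example fit together, here is a small Python sketch that assembles the same query from its parts. The helper names are my own invention for illustration and are not part of Sphinx or the plugin; all they do is build the string you would type into the search box.

```python
def phrase(text: str) -> str:
    """Phrase search: match the words as an exact phrase."""
    return '"' + text + '"'


def proximity(text: str, distance: int) -> str:
    """Proximity search: the words must occur within `distance` words."""
    return '"' + text + '"~' + str(distance)


def field(name: str, expression: str) -> str:
    """Field search: limit the expression to one field, e.g. title or body."""
    return "@" + name + " " + expression


def exclude_any(*words: str) -> str:
    """NOT combined with OR: drop results containing any of the words."""
    return "-(" + "|".join(words) + ")"


query = " ".join([
    phrase("hello world"),
    field("title", proximity("example program", 5)),
    field("body", "python"),
    exclude_any("php", "perl"),
])
print(query)
```

Note that the quotes have to be plain straight double quotes. Curly quotes pasted from a word processor are unlikely to be treated as phrase delimiters.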

You can find more information about the extended syntax at Sphinx.

The following field operators are available:

For instance, I tested such things as lprent&fabian lprent fabian, pigs-flying, and (supershitty|super-city|super city|supercity)& franklin.

Using the @author field operator you can search for f*ck @author robinsod (that allows you to exclude the people swearing at him).

If you use the advanced options and turn pages and comments off, you can resolve a long-standing argument.

If I see that argument again…..

The time to run a search seems to be roughly the same regardless of how complex I make the query.

Anyway, let me know if you find any issues using the form in contact. I’ll be keeping an eye on it for the next 3 or 4 days for performance issues.

Hopefully I’ll now have more time to write posts and tackle the minor teething issues like the misbehaving Contribute Past captcha, and spend less time trying to maintain a fragile search mechanism that was slowly fading under increased loads.

BTW: For those of you who were at the Fabian seminar today, I was paying attention to the panel and discussion. But I was also coding during the seminar. This had been bugging me most of the weekend to the point where I had to go to the seminar, tether the iPhone to the net, and finish the damn search page interface.

Update: I’ve been through the site settings for the first time since the last server move. You’ll see a significant improvement in performance in the search, and probably in just reading the site.

Update: Found a couple of errors in the post, which you will see corrected with the original in strikeout. There are a number of enhancements to the system (like @author). I added some additional examples while testing.

But I was paying attention to the seminar as well
