Sphinx and Rachinger

Let me see if I can make this clear enough for the non-tech public and jonolists, and for the techheads amongst you.

The search engine on this site is run by Sphinx search. This is an open source general purpose package that every 7 minutes ( a prime number) runs a scan on the databases for this site for any new or amended posts and or comments. It indexes these using an extremely efficient hashing lookup system to provide a very fast, efficient and powerful search system for the site.

Since the site has 16,171 posts and 922,412 comments at present and a database of about 5.2 Gigabytes (a gigabyte is about 1,000,000,000 letters) of text, searching through it by hand is close to impossible. So I put in the most efficient open source package search package that I could find back in 2009. But it had a little flaw.

Having this tool in the site is incredibly valuable for transparency and robust debate between commenters and lurkers. In a site that doesn’t insist that people login, it means that we can tag every handle used so that others may find out what they have previously said.

In listing comments, it means that we can provide a list of every comment that a particular person has done. It allows us to provide a function to find out the visible history of someone who responds to your own comments. And it allows investigation of people, topics, and what people have said.

But it is a powerful hand rolled solution. One that at the time it was written didn’t account for a particular  feature of WordPress. That it could have private posts. This was for the simple reason that we’d never used them. When authors wanted to talk to each other, they had emails, cell numbers, and the usual multitude of channels that activists usually talk to each other with.

As anyone who follows the site knows, we have had quite a number of authors who have shuffled in and out of the activity on the site. As our About says:-

We’re a collective who saw a gap in the New Zealand political blogosphere and decided that we should have a go at filling it here at The Standard blog site. We write here in our personal capacities and the opinions that are expressed on the blog are individual unless expressly stated otherwise (see the policy). We do not write on behalf of any organization.

It is a completely voluntary organisation. As people’s lives change; changing jobs, having kids, getting sick or well, going off to start their own blogs, or just dropping out of politics or blogging – our regular authors and regular commenters change. There are 71 people with author or higher rights on the site. This reflects the turn over of authors and moderators in the last 7 years. At anyone point there are usually less than 10 of us active.

After the election in 2014, when the usual shuffling of authors and moderators happened, we looked at how we’d run the site into the future. One part of that was that I was that I’d shifted jobs just prior to the election and was going to have a lot less time to admin the site. Another part was that we had picked up and were planning to get more new authors and moderators involved, and most of them would probably be from the ranks of people who comment.

So we decided to take the risk of having a private forum area for authors and above to discuss operational crap in. The italics are there because this is the loosest of all possible organisations. What you get on the site is what you decide to do yourself and what others are willing to let you get away with. And that is a bit moot sometimes, the about again

We come from a variety of backgrounds and our political views don’t always match up but it’d be fair to say that all of us share a commitment to the values and principles that underpin the broad labour movement and we hope that perspective will come through strongly as you read the blog.

With the exception of me (since I set the site mechanics up), we were given author logins because we were known to disagree. This private forum area was going to be the first time that more than a handful of authors had ever had a place where they could sit and talk to each other in a semi-private situation about how the site operated and where it was heading. This proposal is a major improvement in organisation over our previous systems which largely consisted of someone coming up with a good idea  and whoever else felt it was (like me) doing some work to see how it went.

The risk was that we wouldn’t particularly like each other once we started cooperating a bit closer and that the site would fracture and fall apart through lack of agreement. My personal guess was that we’d lose about 3-5 authors from having authors and moderators realising that. Arguably we’ve lost a couple.

But it is often hard to see where the interface between blogger fatigue and disagreement starts and ends. Headstrong opinionated egotists who are the type of people who volunteer to write as authors and commenters eventually run out of things to say or just get worn down like anyone else. I’ve had to take several keyboard vacations over the years myself.

So the private forums were setup just after the 2014 election, and proved to have the flaw that the previous code I’d adapted and rewritten for Sphinx to extract all of the posts and comments from The Standard’s database. It didn’t restrict the comments in the second part of the SQL (structured query language) statement below to the posts selected as being published in the first part. The reason was that previously the only comments were in published posts. That was a bug when we started to put comments into private posts.

The content of these author comments is mundane to the n’th degree. It consist of people pointing out issues, trading post ideas and arguing about moderation policies. I’d say that about half of the comments are in some shape manner of form are about how to write posts or calming authors down enough after some commenter drives them nuts.

It is unlikely that there is anything in them that is anything more embarrassing than some of us creaky antiques explaining to others how the inside of political parties really worked, and why we didn’t let the new tom thumbs have direct banning rights.

But these author private comments leaked into searches. At least until Stephanie Rodgers pointed that out to me, and I spent a hour or so doing the fix in bold below and checked (and fixed) a couple of other potential ways to access the same blind spot flaw in my code, wordpress code, and some plugins. I think that the items were visible for a couple of days on search before they got fixed.

However it appeared to have caused Cameron Slater to cum all over his brain and shut down even the minimal amounts of animal cunning that he usually possesses.

But it has an upside. I am rather glad that a student got the equivalent of four months of our operating costs off him. It also has shown that the Dirty Politics engine that was setup by the immoral arseholes of the right is still operating and is probably still deep within illegal activities.

 


But now avert your eyes if you are technically literate because now I get technical. The code fix is highlighted at the end of the SQL


 

[code language=”sql” highlight=”22,26,27″]

SELECT

p.ID*2+1 AS ID, 0 AS comment_ID, p.ID AS post_ID, p.post_title AS title,

u.display_name AS author, p.post_content AS body, t.name AS category,

IF(p.post_type = ‘post’, 1, 0) AS isPost, 0 AS isComment,

IF(p.post_type = ‘page’, 1, 0) AS isPage,

IF(p.post_type = ‘post’, 0, IF(p.post_type = ‘page’, 1, 2)) AS post_type,

UNIX_TIMESTAMP(post_date) AS date_added

FROM

wp_posts AS p

INNER JOIN wp_users u ON (p.post_author = u.ID)

INNER JOIN wp_term_relationships tr ON (p.ID = tr.object_id)

INNER JOIN wp_term_taxonomy tt ON (tt.term_taxonomy_id = tr.term_taxonomy_id AND tt.taxonomy = ‘category’)

INNER JOIN wp_terms t ON (tt.term_id = t.term_id)

WHERE

p.id = ( SELECT max_doc_id FROM wp_sph_counter WHERE counter_id=1 ) AND p.post_status = ‘publish’

UNION ALL

SELECT

c.comment_ID*2 AS ID, c.comment_ID AS comment_ID,

c.comment_post_ID AS post_ID,” AS title, c.comment_author AS author, c.comment_content AS body, ” AS category, 0 AS isPost,

1 AS isComment, 0 AS isPage, 2 AS post_type, UNIX_TIMESTAMP(comment_date) AS date_added

FROM wp_comments AS c,

wp_posts AS pp

WHERE

c.comment_id = ( SELECT max_doc_id FROM wp_sph_counter WHERE counter_id=2 ) AND

c.comment_approved = ‘1’ AND

c.comment_post_ID = pp.ID AND

pp.post_status = ‘publish’

[/code]

Youall understand that fix? Well I am sure that most won’t. And I’d quite sure that most of the pathetic dipsticks who donated me the title “worlds greatest sysop”  (and who keep wanking on it at every available opportunity) are definitely too illiterate and thick to do so.

But surely there are the odd right winger who has enough brains (Barnsley Bill?) could explain it to them. I’d advise them to use simple words and punctuate with grunting at frequent intervals.

Powered by WPtouch Mobile Suite for WordPress