ONLamp.com



OpenGuides: City Wikis in Perl

by Kake Pugh
07/05/2007

Three and a half years ago, Perl.com published an article of mine describing the very beginnings of the OpenGuides project—an open source web application written in Perl, initially aimed at overcoming the limitations of the UseMod software.

From this modest beginning, we've developed a complete wiki toolkit called, unsurprisingly, Wiki::Toolkit and a custom-built web application, OpenGuides, which provides structure and a UI layer on top of that toolkit. Development is ongoing, and the small core team of programmers has grown to include people of all levels of expertise. As new technologies such as RDF, proper CSS support, Ajax, and the Google Maps API have become available, we've taken the useful bits and incorporated them into OpenGuides.

The improvements have not been only technical; the OpenGuides mailing lists and IRC channel have developed into a close-knit community including programmers, testers, guide admins, and even the more prolific contributors to the various guides. We've held meetups, hackfests, and even pub crawls. It wouldn't be an exaggeration to say that the existence of OpenGuides has made my life better, and I doubt I'm the only one who can say that.

The Growth of the Project

All we wanted, back then in 2002, was to have something we could use to let us write up everything we knew about London—our favorite pubs and restaurants, our insider knowledge of the quirks of its public transport system, our top tips for places to buy knitting yarn. It was only after we started working on our custom software that we realized other people might be able to use it as well and that it might give people living in other cities a custom-built and well-tailored way to write about their own neighborhoods.

As other people learned about our project, new OpenGuides sites sprang up here and there. One of the first non-London cities covered was Oxford, which still boasts two OpenGuides-based sites, one catering specifically to vegans and the other a more general guide to the city. Later additions included Boston and Saint Paul/Minneapolis in the U.S.; Vienna, Oslo, and Bologna in Europe; and Milton Keynes, the Cotswolds, Birmingham, Norwich, and many more in the U.K. While some of these have fallen by the wayside, others are still going strong.

Since OpenGuides isn't really in the same situation as most open source projects—we don't have thousands or even hundreds of direct users, due to the nature of the project—much of its development has been driven by the needs of the individual guides. Essentially, although there may be a large number of people who use the sites built on OpenGuides, any feature requests or niggles that end users have regarding an OpenGuide site go first of all to their local guide's admin team, who can often fix the problem themselves, perhaps by adding to the local documentation, by tweaking the config file or stylesheet, or simply by upgrading the software. Hence, feature requests and bug reports that make it through to the core team tend to be well thought out and carefully described. This means we're very likely to take them seriously!

Design Issues

One major issue has always been that of design. While there's a convincing argument that keeping the design consistent across a family of web sites is good because it means people who're used to one of these sites find it easy to contribute to all the others (the approach taken by most sites running on the MediaWiki software), this is perhaps less of an issue for very local sites like most of those running on OpenGuides. Also, guide admins kept asking for more flexibility in the design.

We're using the Template Toolkit to generate all our HTML, so in theory people could just edit the templates themselves; unfortunately, we ran into various problems with this, not least of them the fact that presentation logic is still logic, and hence can have bugs. The dual solution we came up with (and are still working toward) was first to split our monolithic templates into smaller snippets and make it explicit which ones are safely editable; and second, to base our HTML on the philosophy of the CSS Zen Garden—to make the HTML plain, clear, and semantic, and then tweak the colors, widths, and placement by means of CSS.

The other advantage of cleaning up our template files like this is that it makes it easier to distribute templates in multiple languages. Again, this is something we're still working on, mainly because the demand for it hasn't previously been as great as the demand for other features.

Dealing with Spam

Another issue that has emerged since we began working on OpenGuides is wikispam. Wikis are hugely popular with spammers trying to boost their own sites' PageRank: wikis in general tend to have high PageRank, so links planted on them carry weight with search engines. The freely editable nature of a wiki means that, unless you have some defense, you can find your lovely web site covered in porn spam in a matter of minutes.

While OpenGuides already has a few anti-spam defenses—retroactive moderation by means of page and page version deletion, and proactive moderation for specific pages—we're currently working on an additional and completely customizable feature whereby a guide admin can choose to plug in her own spam-detection module, which is called before any page is written to the database. If this module says "yes, that's spam," the edit is refused and the user is notified. We'll be writing and distributing various modules for plugging in here, but if an admin wants to write her own, she can do whatever she likes, from a simple regex match on the content (or the categories, locales, username, IP address, etc.), to using Net::Akismet or similar and logging every refused edit for later perusal by admins.
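As a rough illustration of such a plug-in, here is a minimal sketch of a detector that applies regexes to several parts of an edit. The package name `My::SpamDetector` is hypothetical. The `username` and `host` metadata keys are assumptions made for this example and are not documented OpenGuides field names. The only part taken from this article is the calling convention: a `looks_like_spam` class method that receives `node`, `content`, and `metadata` arguments, as in the OpenGuides excerpt below.

```perl
package My::SpamDetector;    # hypothetical module name
use strict;
use warnings;

# Each rule names the part of the edit it inspects and a pattern that
# marks the edit as spam. The "username" and "host" metadata keys are
# assumptions for this sketch, not documented OpenGuides field names.
my @RULES = (
    [ content  => qr/\b(?:viagra|cialis|tramadol)\b/i ],
    [ username => qr/casino|payday/i ],
    [ host     => qr/^192\.0\.2\./ ],    # documentation-only IP range
);

sub looks_like_spam {
    my ( $class, %args ) = @_;
    my $metadata = $args{metadata} || {};
    my %fields = (
        content  => $args{content},
        username => $metadata->{username},
        host     => $metadata->{host},
    );
    for my $rule (@RULES) {
        my ( $field, $pattern ) = @$rule;
        my $value = $fields{$field};
        return 1 if defined $value && $value =~ $pattern;
    }
    return 0;    # no rule matched: let the edit through
}

package main;

# Usage example: this edit trips the content rule.
my $verdict = My::SpamDetector->looks_like_spam(
    node     => "Example Pub",
    content  => "Cheap VIAGRA here!",
    metadata => { username => "visitor" },
);
print "Verdict: ", ( $verdict ? "spam" : "ok" ), "\n";
```

The rules are kept in a table, so tweaking the detector after spam slips through means adding one line rather than another `if` block.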

The guide I contribute to, the Randomness Guide to London, is already running on the development code that includes the new pluggable anti-spam measures. In the month since we've been using these measures, we've caught around 1,500 spam edits (with no false positives). One reason for this success is that we've been keeping an eye on the spam that does get through, and tweaking our spam detection module as appropriate.

The code on the OpenGuides side is pretty simple; prior to accepting any edit, OpenGuides checks its config file to see if a spam detection module has been specified. If so, and if the module is loadable, then its looks_like_spam method is called to return a true or false value indicating whether this edit should be considered spam:

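# If we can, check to see if this edit looks like spam.
my $spam_detector = $config->spam_detector_module;
my $is_spam;
if ( $spam_detector ) {
    # If the module can't be loaded, the string eval swallows the
    # error and the method call below dies; the outer eval traps
    # that, leaving $is_spam undefined so the edit is accepted.
    eval {
        eval "require $spam_detector";
        $is_spam = $spam_detector->looks_like_spam(
            node     => $node,
            content  => $content,
            metadata => \%new_metadata,
        );
    };
}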
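The following standalone sketch reproduces the same fail-safe plug-in behaviour outside OpenGuides. The helper `edit_is_spam` and the module `My::CommentDetector` are hypothetical. The sketch departs from the excerpt above in one deliberate way. It does not use a string `eval "require ..."`. Instead, it checks that the configured name looks like a Perl package name and then requires the corresponding `.pm` path, so text from the config file is never evaluated as code. As in the excerpt, any failure to load or run the detector counts as "not spam".

```perl
use strict;
use warnings;

# A stand-in detector defined inline for this demo. Recording it in %INC
# lets require() treat it as already loaded, as if it lived in a .pm file.
package My::CommentDetector;    # hypothetical module name
sub looks_like_spam {
    my ( $class, %args ) = @_;
    my $comment = $args{metadata}{comment} // "";
    return $comment =~ /some grammatical corrections/i ? 1 : 0;
}

package main;
$INC{"My/CommentDetector.pm"} = __FILE__;

# Hypothetical helper: true only if the named detector loads and reports
# spam. A bad name, a missing module, or a runtime error in the detector
# all count as "not spam", so a broken plug-in never blocks edits.
sub edit_is_spam {
    my ( $detector, %edit ) = @_;
    return 0 unless $detector && $detector =~ /\A\w+(?:::\w+)*\z/;
    ( my $file = "$detector.pm" ) =~ s{::}{/}g;
    my $is_spam = eval {
        require $file;
        $detector->looks_like_spam(%edit);
    };
    return $is_spam ? 1 : 0;
}

my %edit = (
    node     => "Example Pub",
    content  => "Great beer garden.",
    metadata => { comment => "Some grammatical corrections" },
);
print "Loaded detector: ",  edit_is_spam( "My::CommentDetector", %edit ), "\n";
print "Missing detector: ", edit_is_spam( "No::Such::Detector",  %edit ), "\n";
```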

If an edit does look like spam, the editor is informed of this fact, and the edit is not saved:

if ( $is_spam ) {
    my $output = OpenGuides::Template->output(
        wiki     => $self->wiki,
        config   => $config,
        template => "spam_detected.tt",
        vars     => {
                      not_editable => 1,
                    },
    );
    return $output if $return_output;
    print $output;
    return;
}

The name of the page, the main (freeform) content, and the structured data associated with the page (the metadata) are all passed to the looks_like_spam method, allowing fine-grained spam detection. One of the most prevalent types of wikispam is an edit with the changelog comment of "Some grammatical corrections." This is easy to match:

sub looks_like_spam {
    my ( $class, %args ) = @_;
    my $comment = $args{metadata}{comment};
    # Guard against edits that have no changelog comment at all.
    if ( defined $comment
         && $comment =~ /some grammatical corrections/i ) {
        return 1;
    }
    return 0;
}

OpenGuides itself simply discards the attempted edit, leaving it up to the author of the spam detection module to decide on the most appropriate method of logging the attempt and notifying the guide administrators. On the Randomness Guide to London, we use Email::Send to email us all the details:
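Email is only one option. As a minimal sketch of another, the detector below appends one tab-separated line per refused edit to a log file, using only core Perl modules. The package name `My::LoggingDetector` and the default log path are assumptions made for illustration. A real guide would choose a location that the web server user can write to.

```perl
package My::LoggingDetector;    # hypothetical module name
use strict;
use warnings;
use POSIX qw(strftime);

# Assumed location; override it to suit your server.
our $LOG_FILE = "/tmp/openguides-spam.log";

sub looks_like_spam {
    my ( $class, %args ) = @_;
    my $content = $args{content} // "";
    if ( $content =~ /\b(viagra|cialis|tramadol|vicodin)\b/i ) {
        $class->log_attempt( %args, reason => "Matches $1" );
        return 1;
    }
    return 0;
}

# Append one line per refused edit: timestamp, page name, reason.
# If the log can't be opened, warn but still refuse the edit.
sub log_attempt {
    my ( $class, %args ) = @_;
    my $stamp = strftime( "%Y-%m-%d %H:%M:%S", localtime );
    open my $fh, ">>", $LOG_FILE or do {
        warn "Can't open $LOG_FILE: $!";
        return;
    };
    printf {$fh} "%s\t%s\t%s\n",
        $stamp, $args{node} // "(unknown)", $args{reason};
    close $fh;
}

1;
```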

use Data::Dumper;
use Email::Send;
use POSIX qw(strftime);

sub looks_like_spam {
    my ( $class, %args ) = @_;

    my $content = $args{content};
    if ( $content =~
                 /\b(viagra|cialis|tramadol|vicodin)\b/is ) {
        $class->notify_admins( %args,
                               reason => "Matches $1" );
        return 1;
    }
    return 0;    # not spam: let the edit through
}

sub notify_admins {
    my ( $class, %args ) = @_;
    # RFC 2822-style date, e.g. "Thu, 05 Jul 2007 14:30:00 +0100"
    my $datestamp = strftime( "%a, %d %b %Y %H:%M:%S %z", localtime );
    # Header lines must start in column 1; a blank line ends the headers.
    my $message = <<EOM;
From: kake\@earth.li
To: kake\@earth.li, bob\@randomness.org.uk
Date: $datestamp
Subject: Attempted spam edit on RGL

Someone just tried to edit RGL, and I said no because it
looked like spam.  Here follows a dump of the details:

EOM
    $message .= Dumper( \%args );

    # Hand the message to the SMTP server on this machine.
    my $sender = Email::Send->new( { mailer => "SMTP" } );
    $sender->mailer_args( [ Host => "localhost" ] );
    $sender->send( $message );
}
