ONLamp.com

Creating Google Custom Search Engines

I'll use the bulk site form to quickly input several additional recipe sites I want added to my engine.



Figure 5. Bulk site form

Note how I've used the wildcard option. On the BBC site, all the recipes I want start with the same substring, with URLs like http://www.bbc.co.uk/food/recipes/swiftsuppers_vegpasta.shtml. I can put the * right after that first part of the URL; it doesn't need to be separated from the rest by a slash.
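To make the wildcard behavior concrete, here is a minimal Python sketch. It assumes that * simply stands for any run of characters and that patterns are written without the http:// prefix, as they appear in the annotations later in this article. The matches function is a hypothetical helper for illustration, not part of any Google tool, and Google's real matching rules may differ.

```python
import re


def matches(pattern, url):
    """Return True if url fits a site pattern in which * means any run of characters.

    This is an assumed, simplified model of the wildcard, not Google's actual matcher.
    """
    # Escape regex metacharacters, then turn the escaped * back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, url) is not None


pattern = "www.bbc.co.uk/food/recipes/swiftsuppers*"

# A recipe whose URL starts with the shared substring.
print(matches(pattern, "www.bbc.co.uk/food/recipes/swiftsuppers_vegpasta.shtml"))

# A page elsewhere under /food/recipes/ that lacks the prefix.
print(matches(pattern, "www.bbc.co.uk/food/recipes/database/"))
```

The first call should report a match and the second should not, which is the point of placing the * directly after the common prefix rather than after a slash.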

If you'd like to add sites to your engine as you come across them, then you can install the Google Marker bookmarklet in your browser. This is added as a button to your browser (IE or Firefox). Once in place, you simply click on it when you're at a site you'd like to add to your engine. You'll see the following simple dialog box. Fill it out and press Save to add the site to your engine.

Figure 6. Google Marker

When you're starting out, these form-based approaches to maintaining your search engine work well enough. But as you add sites, you'll find it hard to work with many sites quickly and to attach various properties to them as you enter them. Next we'll look at using XML to specify the sites used in the CSE.

Specifying Your CSE Using XML

Before getting into the details here, I need to issue a warning. When you make changes to your search engine using XML files, you may accidentally break your working engine. So, please make sure to keep original versions of the XML files in a safe place in case you need to restore them.
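One way to follow this advice is to copy each downloaded XML file into a backup folder before editing it. The sketch below is only an illustration: the backup_xml helper, the cse_backups folder name, and the example file name are all assumptions, not anything the CSE Control Panel provides.

```python
import shutil
import time
from pathlib import Path


def backup_xml(path, backup_dir="cse_backups"):
    """Copy an XML file into backup_dir, adding a timestamp to its name.

    Returns the path of the copy. The folder name is an arbitrary choice.
    """
    src = Path(path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Example result: annotations-20070115-093000.xml
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"

    # copy2 keeps the file's modification time along with its contents.
    shutil.copy2(src, dest)
    return dest


# Usage (hypothetical file name):
# backup_xml("annotations.xml")
```

Because each copy carries a timestamp, you can keep several generations and restore whichever one last worked.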

You can specify all the settings for your custom search engine in XML format. The easiest way to get going with this is to look at the current settings for your engine in XML form.

Go to the Control Panel for your custom search engine and click on the Advanced tab. Here you can download two different types of information. The Context information defines the global settings for the engine. These are the settings that you entered when first creating the engine, plus things like whether volunteers are allowed and what colors to use when displaying results. The Annotations information is the heart of your search engine. It has all the information about which pages and sites to include in the results and how these are treated.

First, let's get the current context information for the engine by clicking on the last Download in XML Format button on this tab. This doesn't really download anything; it just displays the current context information in another browser window or tab. You can then use your browser to save the XML to a file on your system.

<?xml version="1.0" encoding="UTF-8"?>
<GoogleCustomizations>
  <CustomSearchEngine version="1.0" volunteers="true" 
  keywords="homecooking &quot;easy to prepare&quot; &quot;simple cooking&quot;" 
  Title="Simple Recipes Search Engine" 
  Description="Easy-to-prepare recipes for home cooks. If you want to have a meal ready in 60 minutes, look here for good recipes that you can use immediately." 
  language="en" visible="true">
    <Context>
      <BackgroundLabels>
        <Label name="_cse_rlplbd3nkfw" mode="FILTER"/>
        <Label name="_cse_exclude_rlplbd3nkfw" mode="ELIMINATE"/>
      </BackgroundLabels>
    </Context>
    <LookAndFeel nonprofit="false"/>
  </CustomSearchEngine>
</GoogleCustomizations>

Initially, the most important values are those in the <BackgroundLabels> section of the XML. When using XML later to change the sites included in your engine, you'll need the name values that are in the <Label> nodes in this section.
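If you'd rather pull these name values out programmatically than copy them by hand, the following Python sketch reads them with the standard library's ElementTree module. It assumes the context file has the structure shown above; the embedded sample is a trimmed copy of that listing (the XML declaration and attributes not needed here are omitted), and background_labels is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the context XML shown above.
CONTEXT_XML = """<GoogleCustomizations>
  <CustomSearchEngine version="1.0" language="en" visible="true">
    <Context>
      <BackgroundLabels>
        <Label name="_cse_rlplbd3nkfw" mode="FILTER"/>
        <Label name="_cse_exclude_rlplbd3nkfw" mode="ELIMINATE"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>
</GoogleCustomizations>"""


def background_labels(context_xml):
    """Map each background label's mode (FILTER, ELIMINATE) to its name value."""
    root = ET.fromstring(context_xml)
    return {
        label.get("mode"): label.get("name")
        for label in root.findall(".//BackgroundLabels/Label")
    }


labels = background_labels(CONTEXT_XML)
print(labels["FILTER"])
print(labels["ELIMINATE"])
```

For the sample engine this prints the include label first and the exclude label second; for your own engine, read the saved context file and pass its contents to the function instead.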

Now we download the data (annotations) in this engine by clicking on the Download in XML button that's in the Annotations section. You can also just visit this Download URL to get the same information. Again, you can save the contents of the resulting browser window to a file.

<?xml version="1.0" encoding="UTF-8"?>
<GoogleCustomizations>
  <Annotations>
    <Annotation about="www.bbc.co.uk/food/recipes/swiftsuppers*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
    <Annotation about="www.karenscountrykitchen.com/*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
...
    <Annotation about="www.recipeswizard.com/*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
    <Annotation about="www.dmoz.org/Home/Cooking/*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>

If you have more than one custom search engine in Google, the annotations download will contain the annotations for all of your search engines. They'll be mixed together, ordered by how recently each site was added, regardless of which engine it belongs to.

This is why the label names from the context information are so important. Every <Annotation> node that belongs to a specific search engine has a <Label> node whose name value is either that engine's FILTER label name or its ELIMINATE label name.

You could select the appropriate nodes for a given engine in a text editor, but if there are more than about 50 nodes it gets tiring really quickly. The easiest way to extract the nodes you want is to use XSLT and XPath to transform the downloaded annotations into just those for the search engine you're working on.

I've created an XSLT file to do this transform based on the name values for this search engine. You can modify it to use the name values for your own search engine.
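For readers who prefer a scripting language to XSLT, the same selection can be sketched in Python with the standard library's ElementTree module. This is an alternative illustration, not the XSLT file mentioned above. The sample data reuses two annotations from the earlier listing and adds one hypothetical entry (www.example.com with the made-up label _cse_hypothetical_other) to stand in for a second engine; annotations_for_engine is likewise a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Two annotations from the listing above plus one invented entry for another engine.
ANNOTATIONS_XML = """<GoogleCustomizations>
  <Annotations>
    <Annotation about="www.bbc.co.uk/food/recipes/swiftsuppers*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
    <Annotation about="www.example.com/*">
      <Label name="_cse_hypothetical_other"/>
    </Annotation>
    <Annotation about="www.dmoz.org/Home/Cooking/*">
      <Label name="_cse_rlplbd3nkfw"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>"""

# The FILTER and ELIMINATE label names from this engine's context XML.
ENGINE_LABELS = {"_cse_rlplbd3nkfw", "_cse_exclude_rlplbd3nkfw"}


def annotations_for_engine(annotations_xml, label_names):
    """Build a new annotations tree holding only nodes labeled for one engine."""
    wanted = set(label_names)
    root = ET.fromstring(annotations_xml)

    out_root = ET.Element("GoogleCustomizations")
    out_annotations = ET.SubElement(out_root, "Annotations")

    for annotation in root.iter("Annotation"):
        names = {label.get("name") for label in annotation.findall("Label")}
        # Keep the node if any of its labels belongs to the engine.
        if names & wanted:
            out_annotations.append(annotation)
    return out_root


filtered = annotations_for_engine(ANNOTATIONS_XML, ENGINE_LABELS)
print(ET.tostring(filtered, encoding="unicode"))
```

The output keeps the BBC and dmoz.org annotations and drops the invented one. To use it on real data, substitute the label names from your own context file and the contents of your downloaded annotations file.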
