AJAX content for spiders

I recently had a conversation with a friend, also from a computer science background, who was adamant that it is incredibly difficult to get web spiders to read and cache AJAX data. I disagreed and thought of various solutions; the one below seems the simplest to me.

Why would you want to do this? There are a variety of use cases, but the obvious one is when you want interactive content for users, or want to speed up loading times for them, yet still want robots to cache the data, e.g. if you want the data indexed by Google.

So after the discussion with my friend, I began developing some proof-of-concept code, found below. I hope it's fairly self-explanatory. If you need more robots to add to the list, there are various websites for this (robotstxt, botsvsbrowsers & spidernames).
 

<?php

$interestingCrawlers = array( 'google', 'yahoo', 'msn', 'w3c_validator' ); // Bot user agents in lower case
$pattern = '/(' . implode('|', $interestingCrawlers) .')/';
$matches = array();
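$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : ''; // header may be absent
$numMatches = preg_match($pattern, $userAgent, $matches); // agent is already lower-cased, so no case flag is needed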

if($numMatches > 0){ // ROBOT DETECTED!!!!!!!!!
   $NoOfResultsToDisplay = 50; // How many results are you going to display to the bot at a time
   $requestID = 0; $offset = 0; $NoResults = 0;//Defaults - no need to change these
   
   if(isset($_GET['id']) && is_numeric($_GET['id'])){ 
      if($_GET['id'] >= 1){
         $requestID = (int) $_GET['id']; // cast so the value is safe to use in the query and the link
         $offset = $requestID * $NoOfResultsToDisplay;
      }
   }
  
   // connect to database 
   // get database results LIMIT $offset, $NoOfResultsToDisplay;
   // for each result echo the data you want to be cached by the bot
   // update $NoResults with the number of returned rows
   // disconnect from database
   
   if($NoResults >= 1){
      $requestID++;
      $url = "http://".$_SERVER['HTTP_HOST'].$_SERVER['SCRIPT_NAME']."?id=".$requestID;
      echo "<a href=\"".htmlspecialchars($url)."\">next ".$NoOfResultsToDisplay." results</a>"; // escape the URL, HTTP_HOST is client-supplied
   }
  
}else{ // YOU'RE A HUMAN!!! YAY! :)
// Nowdoc (quoted END) so PHP leaves the JavaScript's $ and \n untouched
echo <<<'END'

   <!-- the AJAX for normal users goes here (just an example) -->
   <script>
   $("#load_callback").click(function(){
    $("#result")
        .html(ajax_load)
        .load(loadUrl, null, function(responseText){
            alert("Response:\n" + responseText);
        });
   });
   </script>
   <!-- change the code above to reflect the AJAX your site requires -->

END;
}

?>
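
The database step is only described in comments above, so here is a rough sketch of how it might look. It assumes PDO and a hypothetical "articles" table with "title" and "body" columns; pageOffset and fetchPage are names made up for illustration and are not part of the script above, so adjust them to your own schema.

```php
<?php

// Hypothetical helper: turn the page number from ?id= into a row offset.
// Negative page numbers are treated as the first page.
function pageOffset(int $page, int $perPage): int
{
    return max(0, $page) * $perPage;
}

// Hypothetical helper: fetch one page of rows for the crawler.
// The table and column names (articles, title, body) are assumptions.
function fetchPage(PDO $pdo, int $page, int $perPage): array
{
    $stmt = $pdo->prepare(
        'SELECT title, body FROM articles ORDER BY id LIMIT :limit OFFSET :offset'
    );
    // Bind as integers so the driver does not quote the LIMIT/OFFSET values.
    $stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':offset', pageOffset($page, $perPage), PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Possible usage inside the robot branch (connection details are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'password');
// $rows = fetchPage($pdo, $requestID, $NoOfResultsToDisplay);
// foreach ($rows as $row) {
//     echo '<h2>' . htmlspecialchars($row['title']) . '</h2>';
//     echo '<p>' . htmlspecialchars($row['body']) . '</p>';
// }
// $NoResults = count($rows);
```

Setting $NoResults from count($rows) means the "next" link stops appearing once a page comes back empty, which gives the bot a natural end to the chain of pages.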

I hope this has helped you, as always: likes, comments & shares are always welcome and appreciated.
