<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology: Learn and Share &#187; Google mini</title>
	<atom:link href="http://crazytoon.com/category/google-mini/feed/" rel="self" type="application/rss+xml" />
	<link>http://crazytoon.com</link>
	<description>Enterprise level solutions, LAMP, Linux, Apache, MySQL, PHP, Perl, Windows, Cache, Optimization</description>
	<lastBuildDate>Fri, 16 Jul 2010 20:24:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google mini blues &#8211; 406 ownage!</title>
		<link>http://crazytoon.com/2007/01/15/google-mini-blues-406-ownage/</link>
		<comments>http://crazytoon.com/2007/01/15/google-mini-blues-406-ownage/#comments</comments>
		<pubDate>Tue, 16 Jan 2007 05:16:32 +0000</pubDate>
		<dc:creator>Sunny Walia</dc:creator>
				<category><![CDATA[Google mini]]></category>
		<category><![CDATA[System admin]]></category>

		<guid isPermaLink="false">http://crazytoon.com/2007/01/15/google-mini-blues-406-ownage/</guid>
		<description><![CDATA[We used to use a Google Mini to index our site and serve search results to our users.  It worked great for a while, until at some point it just stopped indexing.  It would hit a fatal error and stop indexing after a couple of URLs.  It was also not so kind about sending me an [...]]]></description>
			<content:encoded><![CDATA[<p>We used to use a Google Mini to index our site and serve search results to our users.  It worked great for a while, until at some point it just stopped indexing.  It would hit a fatal error and stop indexing after a couple of URLs.  It was also not kind enough to send me an email to let me know that it had stopped indexing.  It was still serving pages that had been indexed before this, so we didn’t realize it wasn’t indexing new content.</p>
<p>And then one day we logged in to the admin console to see what was going on with it and found out it wasn’t indexing.  I checked the logs and found that it was stopping due to too many 4xx errors.  After looking at the logs on our web server, I found that it was stopping because Apache was returning a 406 response code, which is described at w3.org as:</p>
<blockquote><p><small>The resource identified by the request is only capable of generating    response entities which have content characteristics not acceptable    according to the accept headers sent in the request.</small></p>
<p><small>Unless it was a HEAD request, the response SHOULD include an entity    containing a list of available entity characteristics and location(s)    from which the user or user agent can choose the one most    appropriate. The entity format is specified by the media type given    in the Content-Type header field. Depending upon the format and the    capabilities of the user agent, selection of the most appropriate    choice MAY be performed automatically. However, this specification    does not define any standard for such automatic selection.</small></p></blockquote>
<p>I also read on some site that a 406 can happen if one uses MultiViews (in the Apache conf).  But nowhere did it say how to fix it.  Since we had recently switched to using MultiViews, it was reasonable to assume that it might be the cause.  Obviously we didn’t want to go back to not using MultiViews.  I contacted Google support to ask how to fix it and got a response the next day (which in my opinion is slow when you rely on your Mini for crucial search functionality).  Following is the response I got from them:</p>
<blockquote><p><small>The 406 Error occurs when the web server wants to send back a<br />
content-type that&#8217;s not included in the Mini&#8217;s Accept header.</small></p>
<p><small>The easiest way to correct this is to add the following line to the &#8220;Additional HTTP Headers for Crawler&#8221; under<br />
Google Mini &gt; Crawl and Index &gt; HTTP Headers for Crawler field on the Crawler Parameters page:</small></p>
<p><small>Accept:text/html,text/plain,application/pdf,text/pdf,application/vnd.ms-excel,<br />
text/vnd.ms-excel,application/rtf,text/rtf,application/msword,text/msword,<br />
application/vnd.ms-powerpoint,text/vnd.ms-powerpoint,<br />
application/x-shockwave-flash,text/x-shockwave-flash,<br />
application/postscript,text/postscript,application/x-gzip,<br />
application/octet-stream,application/*,text/</small></p></blockquote>
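<p>To make the explanation above a bit more concrete, here is a rough Python sketch of the check a web server effectively performs during content negotiation.  This is not Apache's actual code: the function names are made up for illustration, q-values are ignored, and the content types in the usage note are hypothetical rather than taken from our server.</p>

```python
def parse_accept(header):
    """Split an Accept header value into media ranges, dropping parameters such as q-values."""
    ranges = []
    for part in header.split(","):
        media_range = part.split(";")[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def is_acceptable(content_type, accept_header):
    """Return True if content_type matches any media range in accept_header."""
    ranges = parse_accept(accept_header)
    if not ranges:
        # A missing or empty Accept header means any type is acceptable.
        return True
    ctype = content_type.split(";")[0].strip().lower()
    main_type, _, sub_type = ctype.partition("/")
    for media_range in ranges:
        range_main, _, range_sub = media_range.partition("/")
        if range_main == "*" and range_sub == "*":
            return True
        if range_main == main_type and range_sub in ("*", sub_type):
            return True
    return False


def status_for(available_types, accept_header):
    """Return 200 if the server has at least one acceptable variant, otherwise 406."""
    if any(is_acceptable(t, accept_header) for t in available_types):
        return 200
    return 406
```

<p>With a hypothetical server that can only produce <code>application/xhtml+xml</code>, <code>status_for(["application/xhtml+xml"], "text/html,text/plain")</code> gives 406, while the same call with <code>"*/*"</code> as the Accept header gives 200.</p>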
<p>And as any good sysadmin would do, I copied and pasted the code where they told me to, hoped for the best, and started crawling again.  Sure enough, it <em>didn’t</em> work.  406 errors continued to spam the logs every time I tried crawling.  So back to Google to search for answers.  This time I knew I needed to find answers that had to do with the Accept header.  So I tried to figure out what Googlebot sends as its Accept header, since I knew Googlebot gets a 200 response while crawling.  After looking around, I found a site that lists a bunch of search engines and the headers they send, and lets you test your site against those.  Sure enough, Googlebot is updated (Google probably figured out this would be a common problem) and handles this differently.  So I added the header specified on that site:  <strong>Accept:*/*</strong></p>
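<p>If you would rather check your own server directly than rely on a third-party test site, a small script can send requests with different Accept headers and report the status code.  The following is only a sketch using Python's standard <code>urllib</code>; the helper names and the example URL in the note below are placeholders, not anything from the Mini or from that site.</p>

```python
import urllib.error
import urllib.request


def build_probe(url, accept):
    """Build a GET request that carries the given Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


def probe_status(url, accept, timeout=10):
    """Send the request and return the HTTP status code, including 4xx and 5xx."""
    request = build_probe(url, accept)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
```

<p>For example, calling <code>probe_status("http://example.com/some-page", "text/html,text/plain")</code> and then again with <code>"*/*"</code> against one of your own URLs should show whether the Accept header alone is what flips the response between 406 and 200.</p>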
<p>And ever since then, my Mini has been happy crawling our site.  I hope this helps someone out there who is having similar Google Mini problems.  As always, if you know any better ways or have a suggestion/comment, feel free to leave them here.</p>
]]></content:encoded>
			<wfw:commentRss>http://crazytoon.com/2007/01/15/google-mini-blues-406-ownage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
