Partial crawl and limited output
« on: February 17, 2012, 08:11:20 PM »
We’ve been using the XML sitemap generator software successfully for a while, but recently had to move to a different physical server.

Generally, the move went smoothly, but since then the sitemap software doesn’t crawl the site properly. The settings are identical, except for specifying the new server IP address. There are a few thousand pages on our site, many of which are essentially WordPress posts and Zen Cart product pages. This did work until the move.

What happens now is that when I set it to crawl the site, it queues up about 85 pages (far too few). With the debug output on, I can see that it is scanning into a few of the site folders and pages, but the sitemap output itself only lists the first page. It says it crawled 123 pages, less than 10% of the site. Additionally, attempting to view the HTML sitemap version produces a page that’s blank except for ‘500’ in the upper left corner.

I’ve reinstalled the Generator software on our server, closely following the installation directions, including setting the permissions and creating the empty output files. The results are not improved.

The PHP version on the server is 5.2.17. The Apache version is 2.2.21.

I wondered if you could make any suggestions as to what has stopped the generation.

Regards,
Patrick Keenan
Re: Partial crawl and limited output
« Reply #2 on: February 19, 2012, 09:54:07 PM »
Hi, I am getting the exact same problem. Three days ago I created a sitemap on a new server and it indexed over 650 pages. I have just re-created the sitemap and it only produces 104. Could you share your answer here so that other people could benefit?

regards,
Shai
Re: Partial crawl and limited output
« Reply #3 on: February 19, 2012, 11:44:25 PM »
Oleg, thanks for your fast response and the adjustment seems to have worked perfectly.  The sitemap now indexes 3,366 pages and takes about 2 hrs 20 minutes, much closer to what I expected.

Shai, the adjustment involved changing, under Advanced Settings, 'Detect canonical URL meta tags', from enabled to disabled (clear the checkbox).

I don't know why this works, and while it'd be interesting to know why the server shift triggered the effect, this isn't a science project for me so I'm just glad it functions again!
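
For anyone who does want to dig into it, one guess (not confirmed anywhere in this thread) is that pages on the new server carry a rel="canonical" link pointing at a single URL, such as the home page or the old host, so the crawler folds them all into one entry. Below is a minimal Python sketch for checking a saved page for that. The function name canonical_mismatch and the example.com URLs are made up for illustration; nothing here is part of the Generator software.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_mismatch(page_url, html):
    """Return the canonical URL if it differs from page_url, else None.

    Hypothetical helper: page_url is the address the page was fetched
    from, html is the page source as a string.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative canonical links against the page's own URL.
    target = urljoin(page_url, finder.canonical)
    if target.rstrip("/") == page_url.rstrip("/"):
        return None
    return target


# Made-up product page whose canonical tag points at the home page.
sample = '<html><head><link rel="canonical" href="http://example.com/" /></head></html>'
print(canonical_mismatch("http://example.com/shop/widget", sample))
```

If this prints a URL other than the page's own address for most pages, the canonical tags are a likely suspect; if it prints None, the cause is probably something else.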

Regards,
Patrick Keenan

Re: Partial crawl and limited output
« Reply #4 on: February 20, 2012, 07:44:21 AM »
Brilliant. That worked for me as well. I wonder what the cause is and whether it has anything to do with a new navigation menu we have implemented. Thanks, Keenan.