bbunlock

taking ages?
« on: November 27, 2006, 01:16:19 AM »
OK, I have bought the sitemap generator and love the layout, the ease of use and all the nice little features, BUT it's going to take forever to finish my sitemap.

I estimate that there will be approx. 1 to 1.5 million pages to index on my site, and at the current rate I will be lucky if it can do it in a week. Not only that, but letting it run is dragging the server down as well, so I am left to interrupt the generator during the day and only let it run in the background at night. Take into account that it will only be running for approx. 1/3 of the day, and that means it's going to take 3 weeks to generate my sitemap.

The thing I don't understand is the strain it is putting on the server. I'm seeing that the http process for this generator can take up to 90% of the CPU (figures from top in ssh). What I don't get is that it shouldn't be anything near that, because at present it only seems to be doing 20 pages every 10 seconds. I can go to my site and click refresh 20 times in 10 seconds and it wouldn't put anywhere near that amount of strain on the server.
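For what it's worth, the numbers above roughly hang together. Here is a quick back-of-the-envelope check, assuming the rate stays flat at 20 pages per 10 seconds (which may be optimistic if the crawl keeps slowing down). The function name is made up for illustration.

```python
# Rough crawl-time estimate from the figures quoted in this post.
SECONDS_PER_DAY = 24 * 60 * 60

def crawl_days(pages, pages_per_second, fraction_of_day_running=1.0):
    """Calendar days needed to fetch `pages` at a steady rate."""
    seconds_of_crawling = pages / pages_per_second
    return seconds_of_crawling / (SECONDS_PER_DAY * fraction_of_day_running)

rate = 20 / 10  # 20 pages every 10 seconds = 2 pages per second

for pages in (1_000_000, 1_500_000):
    nonstop = crawl_days(pages, rate)
    nights_only = crawl_days(pages, rate, fraction_of_day_running=1 / 3)
    print(f"{pages:>9,} pages: {nonstop:.1f} days nonstop, "
          f"{nights_only:.1f} days at 1/3 of each day")
```

That works out to roughly 6 to 9 days running nonstop, or about 17 to 26 days when it only runs a third of the time.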

It seems the further it goes through the site, the longer it is taking. I have autosave (or whatever the feature is called) enabled to save the data every 3 minutes (180 seconds), and it is saving a nice big file in the data folder, so I'm guessing the problem is that it is taking far too long to add data to this file because of its size (90+ MB and growing every 3 minutes)?

Am I right in this? Also, of course, it is set to create more than one sitemap (each map = 50,000 pages), so I HOPE that when it hits 50,000 pages added to the sitemap it will actually create the first map, then empty the dump file and begin the next file. If it doesn't, then I doubt this sitemap will ever get finished because, like I said, it seems to slow down the more pages it scans.

PS: I would have posted this in the top forum, but for some reason I am unable to create a new topic there, so this seemed the next best choice. I would also have sent a private message to the admin directly, but once again there seems to be no option for me to do that. I did click the admin profile, but there is no option to send a private message.

I am leaving the sitemap generator running in the background until tomorrow morning. I will then interrupt the process and let you know what's happening.

regards
Re: taking ages?
« Reply #1 on: November 27, 2006, 10:21:50 AM »
Hello,

Please check this topic on how long it takes to generate a sitemap: https://www.xml-sitemaps.com/forum/index.php/topic,95.html

Basically, it depends on the speed of your site (since every page is fetched from your site).

You can increase the speed dramatically using the "Do not parse" and "Exclude URLs" options.

Quote
The thing I don't understand is the strain it is putting on the server. I'm seeing that the http process for this generator can take up to 90% of the CPU
You should use the "Make a delay between requests" option to slow down the crawling process (and reduce server load correspondingly).
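To illustrate the general idea of a delay between requests (this is only a sketch in Python, not the generator's actual code; `crawl_with_delay` and its parameters are hypothetical names), a crawler can pause after every batch of requests so the server gets some idle time:

```python
import time

def crawl_with_delay(urls, fetch, batch_size=15, delay_seconds=1.0,
                     sleep=time.sleep):
    """Call fetch(url) for each URL, pausing after every `batch_size` requests."""
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        if count % batch_size == 0:
            sleep(delay_seconds)  # let the server catch its breath
    return results

# Usage with a stand-in fetch function, so no network access is needed.
pages = crawl_with_delay([f"/page{i}" for i in range(45)],
                         fetch=str.upper, delay_seconds=0.01)
print(len(pages))
```

A longer delay or a smaller batch size lowers the load further, at the cost of a longer total crawl.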

Quote
Am I right in this? Also, of course, it is set to create more than one sitemap (each map = 50,000 pages), so I HOPE that when it hits 50,000 pages added to the sitemap it will actually create the first map, then empty the dump file and begin the next file.
All sitemap files are created only when the whole process is finished. You can limit the maximum number of pages to crawl in the configuration to get one sitemap file created without waiting for the full site crawl.

Quote
PS: I would have posted this in the top forum, but for some reason I am unable to create a new topic there, so this seemed the next best choice. I would also have sent a private message to the admin directly, but once again there seems to be no option for me to do that. I did click the admin profile, but there is no option to send a private message.
Both options are available to registered customers only; you should use the login/password provided to you in the automated email after purchase for this.

bbunlock

Re: taking ages?
« Reply #2 on: November 27, 2006, 10:26:58 AM »
Well, I don't know why, but this is the second night in a row that it has stopped just after 4am (UK time), so the sitemap only ran for a few more hours.

Also, I now have over 57,000 pages added to the sitemap, but as yet there is no sitemap file, so it looks like it is going to crawl the whole site before making the individual sitemaps.

Well, at this rate it is going to take forever to create the sitemap. Somehow I imagine sites with so many pages are not really ideal for sitemaps  :'(

A suggestion for the next version?

1. When it gets to 50,000 pages, create the first sitemap and then clear the crawl_dump.log to speed things up a little?
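Roughly what I have in mind, as a Python sketch only. It is not how the generator works today (per the admin's reply, all files are written at the end), and the function names and file naming are made up for illustration. The 50,000 figure is the per-file URL limit from the sitemap protocol.

```python
import os
import tempfile
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemap protocol

def write_sitemap(path, urls):
    """Write one sitemap file containing the given URLs."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

def write_sitemaps_incrementally(url_stream, prefix="sitemap",
                                 limit=MAX_URLS_PER_SITEMAP):
    """Flush a sitemap file every `limit` URLs; return the file names written."""
    written, buffer = [], []
    for url in url_stream:
        buffer.append(url)
        if len(buffer) == limit:
            name = f"{prefix}{len(written) + 1}.xml"
            write_sitemap(name, buffer)
            written.append(name)
            buffer = []  # only the current batch is kept in memory
    if buffer:  # final, partly filled file
        name = f"{prefix}{len(written) + 1}.xml"
        write_sitemap(name, buffer)
        written.append(name)
    return written

# Small demo with a limit of 2 so it runs instantly.
with tempfile.TemporaryDirectory() as tmp:
    demo_urls = (f"https://example.com/p{i}" for i in range(5))
    files = write_sitemaps_incrementally(
        demo_urls, prefix=os.path.join(tmp, "sitemap"), limit=2)
    print([os.path.basename(f) for f in files])
```

One caveat I can see: a crawler presumably still has to remember which URLs it has already visited so it doesn't fetch them twice, so writing files early might not let it throw away the whole dump.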

Still a great product, full of features and easy to use, so I will use it on my other sites that aren't as large, but it doesn't look like it is going to work on this particular site because of the number of pages. That's not a dig at this product, because I imagine most sitemap generators would have the same problem.

regards

bbunlock

Re: taking ages?
« Reply #3 on: November 27, 2006, 10:32:04 AM »
Thanks for the reply, admin. I actually deleted the email you sent after the PayPal payment and didn't realise an account had already been created for me (oops).

I have excluded some URLs and folders from the sitemap, and I do have a delay of 1 second after every 15 requests.

All in all a great tool, and I will use it on the other sites, but it's just going to take too long on this particular site. That's no surprise given the number of pages on it.

regards