I just purchased and installed the unlimited sitemap generator and am currently running my first execution. The previous sitemap, which was created by gsitecrawler on a personal computer, took about a week to finish and had over 430,000 entries.
I set the program to keep running, with no timeouts or log-off stoppage. Then I clicked a link in the browser tab, and now I cannot reload the page that shows how many pages have been spidered.
I am on a Linux server and the top command shows a load average of about 1.40, which is OK. I am using ls -al to check the logs in the data directory: crawl_dump.log has grown to about 78 MB after roughly half an hour, and both crawl_dump.log and crawl_state.log keep updating their last-modified times.
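For reference, this is roughly what I am running to keep an eye on things (the data directory path below is just a placeholder for wherever the generator lives on my server):

    # overall CPU load
    top
    # log sizes and last-modified times in the generator's data directory
    ls -al /path/to/generator/data/
    # re-check crawl_dump.log every 60 seconds to watch it grow
    watch -n 60 'ls -al /path/to/generator/data/crawl_dump.log'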
I have 2 questions:
1. If there is a problem, such as CPU usage climbing, how do I stop the process with a Linux command? I cannot identify the process. Will an Apache restart, which reloads PHP, stop the program? (My rough guess is sketched below, after question 2.)
2. Is there any way for me to get an idea of how many pages have been indexed, so that I can estimate how much time remains before the sitemap is completed, without being able to see the PHP page in a browser? Do the numbers in crawl_state.log contain this information?
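To clarify what I mean, here is roughly what I would guess at trying for both questions; I do not know the actual process name the generator runs under or the format of crawl_state.log, so these are only assumptions on my part:

    # question 1: find the crawler process (assuming it runs under PHP) and stop it
    ps aux | grep -i php
    kill <pid>                       # replace <pid> with the process ID found above
    sudo apachectl graceful          # or restart Apache, in case the crawl runs inside a web request
    # question 2: peek at the end of the state file and roughly count URLs logged so far
    tail -c 2000 /path/to/generator/data/crawl_state.log
    grep -c 'http' /path/to/generator/data/crawl_dump.log    # number of lines containing a URL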
Thanks
Mike