Stopped crawling, and restarts at the same point
« on: March 06, 2017, 11:29:12 AM »
Hi,

I'm trying to set up/create the sitemap. It is a big forum.
But I don't know if I'm doing it right. It takes too much time.

Is it possible for you to create/set up my website's sitemap? Do you have any support service for that (not for installing your software, just for creating the sitemap and getting it running)?
Thanks.
Re: Stopped crawling, and restarts at the same point
« Reply #1 on: March 06, 2017, 11:55:46 AM »
For example, I get this situation:

Links depth: 4
Current page: paisajes/44/anochecer-en-la-torre-de-abraham/1974/
Pages added to sitemap: 27962
Pages scanned: 55681 (3,764,044.9 KB)
Pages left: 14676 (+ 69731 queued for the next depth level)
Time passed: 8:59:38
Time left: 2:22:14
Memory usage: 119,168.3 Kb
Resuming the last session (last updated: 2017-03-03 21:33:44)


If I let it crawl in the background and refresh the page a few minutes later, I get this:


Links depth: 4
Current page: paisajes/44/sin-titulo/2153/
Pages added to sitemap: 27961
Pages scanned: 55680 (3,763,926.5 KB)
Pages left: 14677 (+ 69726 queued for the next depth level)
Time passed: 8:59:38
Time left: 2:22:14
Memory usage: 119,256.2 Kb
Resuming the last session (last updated: 2017-03-03 21:33:44)


The values change, but the pages-left value doesn't go down.
At this point the sitemap.xml file is also still empty.

Re: Stopped crawling, and restarts at the same point
« Reply #2 on: March 06, 2017, 01:00:45 PM »
Half an hour later:

Links depth: 4
Current page: urbana-y-arquitectura/40/movido-en-venecia/49381/
Pages added to sitemap: 27951
Pages scanned: 55669 (3,762,864.9 KB)
Pages left: 14688 (+ 69678 queued for the next depth level)
Time passed: 8:59:37
Time left: 2:22:22
Memory usage: 118,959.9 Kb
Resuming the last session (last updated: 2017-03-03 21:33:44)
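One way to tell a stalled crawl from a slow one is to compare successive snapshots: if "Pages scanned" never rises and the "last updated" timestamp of the resumed session never changes, each refresh appears to be replaying the same saved session rather than advancing it. Below is a minimal Python sketch using the three snapshots posted above. The function names are hypothetical, and reading the numbers this way is an inference, not a diagnosis reported by the generator itself.

```python
import re

# The relevant lines of the three crawl-page snapshots from Reply #1 and Reply #2.
SNAPSHOTS = [
    "Pages scanned: 55681 (3,764,044.9 KB)\n"
    "Time passed: 8:59:38\n"
    "Resuming the last session (last updated: 2017-03-03 21:33:44)",
    "Pages scanned: 55680 (3,763,926.5 KB)\n"
    "Time passed: 8:59:38\n"
    "Resuming the last session (last updated: 2017-03-03 21:33:44)",
    "Pages scanned: 55669 (3,762,864.9 KB)\n"
    "Time passed: 8:59:37\n"
    "Resuming the last session (last updated: 2017-03-03 21:33:44)",
]


def parse_snapshot(text):
    """Pull pages scanned, elapsed seconds and the session timestamp from one snapshot."""
    scanned = int(re.search(r"Pages scanned: (\d+)", text).group(1))
    h, m, s = (int(x) for x in re.search(r"Time passed: (\d+):(\d+):(\d+)", text).groups())
    updated = re.search(r"last updated: ([^)]+)\)", text).group(1)
    return {"scanned": scanned, "elapsed": h * 3600 + m * 60 + s, "updated": updated}


def is_progressing(snapshots):
    """True only if every snapshot has scanned more pages than the one before it."""
    parsed = [parse_snapshot(t) for t in snapshots]
    return all(b["scanned"] > a["scanned"] for a, b in zip(parsed, parsed[1:]))


def session_timestamps(snapshots):
    """Distinct 'last updated' values; a single value means the session never advanced."""
    return {parse_snapshot(t)["updated"] for t in snapshots}


print(is_progressing(SNAPSHOTS))      # False
print(session_timestamps(SNAPSHOTS))  # {'2017-03-03 21:33:44'}
```

Here "Pages scanned" actually drops (55681, 55680, 55669) while the session timestamp stays at 2017-03-03 21:33:44, which fits a crawl that keeps restarting from the same saved point.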
Re: Stopped crawling, and restarts at the same point
« Reply #3 on: March 06, 2017, 04:42:09 PM »
Hello,

The crawling time itself depends mainly on the website's page generation time, since it crawls the site similarly to search engine bots.
For instance, if it takes 1 second to retrieve each page, then 1000 pages will be crawled in about 16 minutes.
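The arithmetic above, and the figures posted in Reply #1, can be checked with a few lines of Python. This is a minimal sketch that assumes a constant per-page rate; the function names are made up for illustration. Notably, 14676 pages at the observed rate of about 0.58 s per page comes to exactly the "Time left: 2:22:14" shown in Reply #1, which suggests (an inference, not documented behavior) that the displayed estimate leaves out the 69731 URLs queued for the next depth level.

```python
def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS, the way the crawl page shows it."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def estimate_crawl_seconds(pages, seconds_per_page):
    """Naive estimate: pages are fetched one after another at a constant rate."""
    return pages * seconds_per_page


# Example from the reply: 1000 pages at 1 second per page.
print(format_hms(estimate_crawl_seconds(1000, 1.0)))  # 0:16:40

# Observed rate in Reply #1: 55681 pages scanned in 8:59:38 (32,378 seconds).
rate = 32_378 / 55_681  # roughly 0.58 s per page

# Only the 14676 pages left at the current depth level:
print(format_hms(estimate_crawl_seconds(14_676, rate)))  # 2:22:14
# Every URL known at that moment (scanned + left + queued for the next level):
print(format_hms(estimate_crawl_seconds(55_681 + 14_676 + 69_731, rate)))  # 22:37:40
```

Because the crawler keeps discovering new URLs as it goes, even the 22-hour figure is likely an underestimate for a site of this size.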

Sitemap files are created only after crawling is completed.
We only have installation service available.
Re: Stopped crawling, and restarts at the same point
« Reply #4 on: March 06, 2017, 04:44:47 PM »
Thanks.
And why does the pages-left value keep getting bigger while the crawl is going on?
Re: Stopped crawling, and restarts at the same point
« Reply #5 on: March 07, 2017, 11:45:42 AM »
Could you help me?

Is it not crawling? Could I give you access so you can check whether the configuration is OK?
Re: Stopped crawling, and restarts at the same point
« Reply #6 on: March 07, 2017, 01:49:27 PM »
Hello,

>and why the page left value is bigger when the crawling is going on?

It finds more and more pages on the website while crawling it.
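Why "pages left" can grow during a crawl is easiest to see with a generic breadth-first crawler: every fetched page may add several not-yet-seen links to the queue, so the queue can get longer before it gets shorter. This is a minimal Python sketch over a hypothetical miniature forum, not the generator's actual code.

```python
from collections import deque

# Hypothetical miniature forum: each page maps to the links found on it.
SITE = {
    "/": ["/forum/1", "/forum/2"],
    "/forum/1": ["/topic/1", "/topic/2", "/topic/3"],
    "/forum/2": ["/topic/4", "/topic/5", "/forum/1"],
    "/topic/1": ["/forum/1"],
    "/topic/2": ["/forum/1"],
    "/topic/3": ["/forum/1"],
    "/topic/4": ["/forum/2"],
    "/topic/5": ["/forum/2"],
}


def crawl(start, links):
    """Breadth-first crawl; records the queue length ("pages left") after each page."""
    seen = {start}
    queue = deque([start])
    pages_left = []
    while queue:
        page = queue.popleft()
        for link in links.get(page, []):
            if link not in seen:  # each URL is queued only once
                seen.add(link)
                queue.append(link)
        pages_left.append(len(queue))
    return seen, pages_left


seen, pages_left = crawl("/", SITE)
print(pages_left)  # [2, 4, 5, 4, 3, 2, 1, 0]
```

The queue rises from 2 to 5 while the forum index pages are being fetched and only starts shrinking once the crawler reaches pages that link to nothing new. On a large forum the rising phase lasts a long time.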

In this case I would recommend running the generator from the command line if you have SSH access to your server.
Re: Stopped crawling, and restarts at the same point
« Reply #7 on: March 07, 2017, 03:37:56 PM »
We don't have SSH access to the server.

If the crawler finds more and more URLs, why does the sitemap file always stay the same size?

Memory usage: 119,256.2 Kb

We'll keep crawling, but every time we go to the Crawl page, it shows the same time for the last crawling session.
Re: Stopped crawling, and restarts at the same point
« Reply #9 on: March 13, 2017, 05:49:02 PM »
Sorry to say this, but this Sitemap Generator does not work with websites with more than 50,000 URLs.
Wasted money.
« Last Edit: March 13, 2017, 05:52:15 PM by motu_828 »
Re: Stopped crawling, and restarts at the same point
« Reply #10 on: March 13, 2017, 08:54:59 PM »
Hello,

Many of our customers have sitemaps larger than 50,000 URLs. It does require server resources in proportion to the website size, though.
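The 50,000 figure matches a limit in the sitemaps.org protocol: a single sitemap file may list at most 50,000 URLs (and be at most 50 MB uncompressed), so larger sites are normally covered by several sitemap files plus a sitemap index that points to them. The sketch below shows that splitting in Python; it is a generic illustration of the protocol, not the generator's code, and the file names sitemap1.xml, sitemap2.xml, ... and the example.com URLs are hypothetical.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive slices of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(base_url, file_count):
    """Build a sitemap index pointing at sitemap1.xml ... sitemapN.xml (hypothetical names)."""
    entries = "\n".join(
        f"  <sitemap><loc>{base_url}/sitemap{i}.xml</loc></sitemap>"
        for i in range(1, file_count + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )


urls = [f"https://example.com/page/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
print(sitemap_index("https://example.com", len(chunks)))
```

So a 120,000-URL site needs three sitemap files and one index; the per-file cap limits file size, not the size of the site.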
Re: Stopped crawling, and restarts at the same point
« Reply #11 on: March 13, 2017, 10:46:16 PM »
Then why does the crawler stop at exactly the same value?
I don't think my server has any problem.
Re: Stopped crawling, and restarts at the same point
« Reply #12 on: March 14, 2017, 05:35:21 AM »
This happens due to server limits. You need to check your server's error log for details.
Re: Stopped crawling, and restarts at the same point
« Reply #13 on: March 14, 2017, 11:32:45 PM »
Hello motu_828,

I had over 70,000 pages on my site and it runs great after I configured it properly. I now have over 44,000 and it is still running great. I run mine through cron, crawling about 750 pages with a multi-second gap. My total crawl takes 12+ hours, and I run it every other week. Before I tweaked my settings, I did have issues getting the crawl to finish. But that was years ago, and it has run properly (on the server) ever since. It just ran today with the following results:

Created on: 14 March 2017, 16:39
Processing time: 12:18:50s
Pages indexed: 44266
Images sitemap: 32544 images
Video sitemap: 11 videos
News sitemap: 212 pages
RSS feed: 271 pages
Mobile sitemap: 44266 pages
Pages processed: 44781
Pages fetched: 44778
Sitemap files: 3
Crawled pages size: 3,178.731Mb
Network transfer time: 11:13:20s
Top memory usage: 0.00Mb
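The pacing described above (about 750 pages with a multi-second gap) can be pictured as a throttled fetch loop. This is a minimal Python sketch, not the generator's own setting: treating "750 pages" as a per-run cap and the gap as a pause between individual requests is an assumption, and `throttled_crawl` is a hypothetical name. `fetch` and `sleep` are passed in so the pacing can be exercised without a network.

```python
import time


def throttled_crawl(urls, fetch, delay_seconds, max_pages, sleep=time.sleep):
    """Fetch at most `max_pages` URLs, pausing `delay_seconds` between requests."""
    results = []
    for i, url in enumerate(urls[:max_pages]):
        if i > 0:
            sleep(delay_seconds)  # give the server a break between page requests
        results.append(fetch(url))
    return results


# Dry run: `len` stands in for a real page fetch, and pauses are recorded, not slept.
pauses = []
pages = [f"/topic/{n}" for n in range(1000)]
fetched = throttled_crawl(pages, fetch=len, delay_seconds=2, max_pages=750, sleep=pauses.append)
print(len(fetched), len(pauses))  # 750 749
```

With a 2-second gap, the pauses alone add 749 × 2 = 1498 seconds (about 25 minutes) per 750 pages, which is the price paid for keeping the server load low.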

Depending on your control panel (like cPanel), you might be able to set up a crawl to run once a week without shell/SSH access. That is how I set mine up initially, since crawling through the web interface seemed a little problematic.

You need to tweak the settings to work best on your server. Depending on your settings, this script may place a high load on your server. Consider how much memory you can allocate, how much load the server can take at any given time, how you want to save sessions, what you want to log, and so on. You mentioned it stopping at the same spot; it is most likely hitting a memory limit at the same number of pages each time. Try generating less, like just the main XML file. Also try not to save things like referring URLs, since that adds time and memory. If your server is running out of resources, it is because of the size of your site, your settings, and your server... NOT the software.
« Last Edit: March 14, 2017, 11:41:52 PM by jeffconk »
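The memory-limit explanation can be sanity-checked with back-of-the-envelope arithmetic. This minimal Python sketch assumes memory grows roughly linearly with the number of URLs the crawler is tracking (a simplification, since settings such as saving referring URLs also matter) and that "Kb" on the crawl page means kilobytes. The 128 MB limit is a hypothetical example, not a value from this thread, and the function names are made up. Because the same site crawled with the same settings grows memory the same way, the ceiling is reached at about the same point every run.

```python
def kb_per_tracked_url(memory_kb, scanned, left, queued):
    """Rough memory cost per URL held by the crawler (scanned + left + queued)."""
    return memory_kb / (scanned + left + queued)


def urls_before_limit(memory_limit_kb, kb_per_url):
    """How many tracked URLs fit under the limit if memory grows linearly."""
    return int(memory_limit_kb // kb_per_url)


# Figures from Reply #1: 119,168.3 KB with 55681 scanned, 14676 left, 69731 queued.
per_url = kb_per_tracked_url(119_168.3, 55_681, 14_676, 69_731)
print(round(per_url, 2))  # 0.85

# Hypothetical 128 MB (131,072 KB) limit: roughly 154,000 tracked URLs before it is hit.
print(urls_before_limit(131_072, per_url))
```

Raising the limit, or turning off options that store extra data per URL, moves that stopping point further out.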
Re: Stopped crawling, and restarts at the same point
« Reply #14 on: May 01, 2017, 04:27:09 PM »
I have the same problem, but I can see it is working in the background. Just leave it until the next day and it will be crawled.

I can see it on the Crawl page.

Updated on 2017-05-01 15:17:35 (377 seconds ago) , Time elapsed: 0:28:57,
Pages crawled: 4400 (3693 added in sitemap), Queued: 15158, Depth level: 3
Current page: newsongs/312?device=pc (0.6)