The way you currently check whether a URL already exists is very inefficient.
Scanning the whole array on every check is a really bad approach; you need something faster than that.
Change the structure of your collection so that the md5 hash of the URL becomes the key of the array. Then you never have to iterate over the array at all: you simply assign the entry as if it did not exist yet. If it does already exist, the assignment is just a reassignment of the same key, which takes a tiny fraction of the time a full array scan does.
Here is some sample code to help you.
$collection['links'] = array();

foreach ($pageUrls as $currentUrl) {
    // The md5 hash of the URL becomes the array key, so a duplicate URL simply
    // overwrites the existing entry instead of creating a second one.
    $currentKey = md5($currentUrl);
    $collection['links'][$currentKey] = array($currentUrl, 1, 2, 3, $referrer, 5, 'or any other variable you want to set');
}
If you then do a print_r($collection['links']); you will see something like this:
Array
(
    [96e5494f6c488eec4eddca6df12af745] => Array
        (
            [0] => licitatie-publica-ro/furnizare-srot-soia--92417-2.html
            [1] => 1
            [2] => 2
            [3] => 3
            [4] => licitatie-publica-ro/agricultura-2.html
            [5] => 5
            [6] => or any other variable you want to set
        )
)
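If you ever need to check explicitly whether a URL was already collected (for example to update an existing entry instead of overwriting it), you can test the key directly with isset(), which is a constant-time lookup instead of a scan. This is only a minimal sketch assuming the same $collection structure as above; $newUrl is a hypothetical URL and the "counter at index 1" is just an assumption for illustration.

$newUrl = 'licitatie-publica-ro/some-new-page.html'; // hypothetical URL, for illustration only
$newKey = md5($newUrl);

if (isset($collection['links'][$newKey])) {
    // The URL was already collected; update the existing entry if you need to.
    $collection['links'][$newKey][1] += 1; // assumption: index 1 holds a counter
} else {
    // First time we see this URL; add it, same shape as in the sample above.
    $collection['links'][$newKey] = array($newUrl, 1, 2, 3, $referrer, 5, 'or any other variable you want to set');
}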
Benefits:
- There can never be two entries with the same hash in the collection, so each unique URL is stored exactly once.
- You don't need to scan the array each time you want to add a new unique link; you just assign it.
- You take advantage of PHP's built-in unique-key handling for arrays.
- Key lookups are effectively constant time, so this is about as fast as it gets in plain PHP (see the small comparison sketch after this list).
- The script may also use a bit less memory, because you no longer keep and iterate over a separate copy of the list just to check for duplicates.
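To make the difference concrete, here is a small comparison sketch. The in_array-style linear scan is only an assumption about how the current generator works (I have not seen that code), and the sample URLs are made up; the point is just to contrast the two approaches on the same input.

// Hypothetical input with duplicates; replace with your real $pageUrls.
$pageUrls = array('a.html', 'b.html', 'a.html', 'c.html', 'b.html');

// Approach 1: linear scan on every insert (assumed current behaviour).
$scanned = array();
foreach ($pageUrls as $url) {
    $found = false;
    foreach ($scanned as $entry) {        // O(n) scan for every URL
        if ($entry[0] === $url) { $found = true; break; }
    }
    if (!$found) {
        $scanned[] = array($url, 1);
    }
}

// Approach 2: md5 hash as the array key (the approach described above).
$keyed = array();
foreach ($pageUrls as $url) {
    $keyed[md5($url)] = array($url, 1);   // O(1) assignment; duplicates overwrite
}

echo count($scanned) . ' vs ' . count($keyed) . PHP_EOL; // both end up with 3 unique URLs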
Please let me know whether the approach described here is clear, and how soon you can update the sitemap generator to use it.