I am trying to use the robots.txt file to exclude certain pages from being indexed by Googlebot.
At the moment I am getting a number of links like this:
/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html
and like this:
/index.php?option=com_content&task=view&id=17&Itemid=1
As there are so many of these in Google's index of my site, I am using the Disallow directive in this format:
Disallow: /*.html
Disallow: /*.php
Disallow: /*itemid
Disallow: /*Itemid
I then use the Allow directive to let through the 15 or so links that are important.
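To be concrete, the relevant part of my robots.txt is laid out roughly like this (the Allow lines are just placeholders standing in for my real 15 URLs):

User-agent: *
Disallow: /*.html
Disallow: /*.php
Disallow: /*itemid
Disallow: /*Itemid
Allow: /important-page-1.html
Allow: /important-page-2.html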
It works in that the links I want are in my sitemap, but the ones I don't want are still there. How come Disallow: /*.html didn't stop this: /option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html? And why didn't Disallow: /*Itemid and Disallow: /*.php stop /index.php?option=com_content&task=view&id=17&Itemid=1?
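To sanity-check my own understanding of how the * wildcard is supposed to work, I put together this rough Python approximation of the matching (purely my guess at the rules; it ignores Allow lines and longest-match precedence, and it is not Google's actual logic):

import re

def robots_pattern_to_regex(pattern):
    # Escape the pattern, then turn '*' back into '.*' and a trailing '$' into an end anchor.
    regex = re.escape(pattern).replace(r'\*', '.*')
    if regex.endswith(r'\$'):
        regex = regex[:-2] + '$'
    return re.compile(regex)

def is_disallowed(path, disallow_patterns):
    # Treat a path as blocked if any Disallow pattern matches it from the start of the path.
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

disallows = ['/*.html', '/*.php', '/*itemid', '/*Itemid']
urls = [
    '/option,com_netinvoice/action,orders/task,order/cid,1/Itemid,170.html',
    '/index.php?option=com_content&task=view&id=17&Itemid=1',
]
for url in urls:
    print(url, '->', 'blocked' if is_disallowed(url, disallows) else 'allowed')

By my reading both URLs come out as blocked, which is why I am confused that they still show up.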
Even though these links that I don't want are in my sitemap, will they be disallowed by Googlebot? And will this idea of disallowing everything with Disallow: /*.html and letting my own links through with Allow cause me problems in some way?
Any thoughts would be really great.