Multiple "useless" pages being created by the blog module? Please help.

2/28/2013 7:07:49 AM
Total Posts 7

Hello,

I've come across an odd issue that I just can't figure out and need some help! We're using the mojoPortal blog module on www.patioenclosures.com and we've noticed the blog is creating thousands of "useless" pages over time. Here are some examples:

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=12737

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=9734

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=12748

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=15591

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=6598

Google shows over 12,000 pages indexed for www.patioenclosures.com, which isn't even close to correct. The site should have only a couple hundred pages indexed in Google. This is "littering" Google's results and could result in a penalty. Is there any chance you could point me in the right direction to make sure the above examples do NOT get created on their own? 

Also, FYI, we're using the same blog module on www.greatdayimprovements.com and haven't noticed this issue occurring over there yet.

Thanks again for your help!

Dave

2/28/2013 7:44:57 AM
Total Posts 18439

Hi Dave,

That page is the list view of the blog. It is not included in the blog sitemap for Google. Since there are really only 4 pages' worth of items, and no links on your site point to pages beyond number 4, Google must be following links from an external site. Maybe a competitor is trying to do some SEO damage with bad links to the site. I would look in Google Webmaster Tools to find out where it is getting those links from.

I can say that the site is running a rather old version of mojoPortal (2.3.6.7). In newer versions the list page has NOINDEX,FOLLOW in the robots meta tag, so Google won't index that page but will still follow its links to the post detail pages, which should be indexed.

Hope that helps,

Joe

2/28/2013 7:59:18 AM
Total Posts 7

Hey Joe,

I apologize, I'm not sure I'm following you. You don't need to link to the page externally or internally for Google to crawl the page. For example, if you paste http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=9734 into Google you'll see it is in fact indexed. There are no bad backlinks to the site, or at least no more than the usual crap that most sites get, and none at all going to these "useless" pages.

Google doesn't always respect the robots.txt and noindex,follow. We're looking to take these pages down completely and make sure they are not created on their own. Is this possible without upgrading to the newer version?

Thanks again for all of your help!

Dave

2/28/2013 8:58:46 AM
Total Posts 18439

"You don't need to link to the page externally or internally for Google to crawl the page."

Wrong. If Google has indexed it, Google crawled it, either from a submitted sitemap or from an internal or external link. They have no other way of finding pages.

"Google doesn't always respect the robots.txt and noindex,follow"

Also wrong.

Upgrading would be advisable to get the noindex/follow meta tag. Google will respect it, as will all legitimate search indexes.

You cannot take the page down without breaking the site, since it is a dynamic page used for paging the post list, and pages 1 - 4 really are legit. The others are just manipulation of the pagenumber URL parameter beyond the actual number of pages in the list. That parameter manipulation has to come from a link somewhere; Google does not make up parameter values and crawl them.

Hope that helps,

Joe
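
To make the point about out-of-range pagenumber values concrete, here is a minimal Python sketch of how a paged list view could answer such requests with a 404 instead of rendering an empty list. This is not mojoPortal code: the function name, the page size of 10, and the choice of 404 are all assumptions made for illustration.

```python
import math


def page_status(page_number: int, total_items: int, page_size: int = 10) -> int:
    """Return the HTTP status a paged list view could send (hypothetical helper)."""
    # Treat an empty list as having one (empty) page so page 1 always resolves.
    total_pages = max(1, math.ceil(total_items / page_size))
    if 1 <= page_number <= total_pages:
        return 200
    # Anything outside the real page range is reported as "not found".
    return 404


if __name__ == "__main__":
    print(page_status(4, 40))      # page 4 of 4
    print(page_status(12737, 40))  # far beyond the last page
```

A page that answers 404 gives a crawler nothing to index, which is a different mechanism from the noindex meta tag discussed in this thread.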

2/28/2013 9:30:11 AM
Total Posts 18439

Actually, without upgrading you could edit the file /Blog/ViewList.aspx and add the meta tag in the head section: <meta name='robots' content='NOINDEX,FOLLOW' />

However, there have been lots of improvements since that version that make it very worthwhile to upgrade.
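
After adding the tag, it may be worth confirming that the rendered page really carries it. The following is a rough Python sketch using only the standard library's html.parser; the class and function names are made up for this example, and the HTML string stands in for the page source you would view in a browser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = "<html><head><meta name='robots' content='NOINDEX,FOLLOW' /></head></html>"
    print(robots_directives(sample))
```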

2/28/2013 9:39:40 AM
Total Posts 7

Ok, so I popped this URL:

http://www.patioenclosures.com/Blog/ViewList.aspx?pageid=7&mid=89&pagenumber=12737

into Google's internal link tool in Webmaster Tools and came up with this:

http://www.mediafire.com/view/?yyde3bzhc1rkqqo

Which shows no internal links are pointing to that page. I did the same thing for the other example pages I mentioned in my original post and got the same results. There are NO internal links pointing to those pages. I also looked at Open Site Explorer and the Webmaster Tools' "Links to Your Site" tool and found NO external links pointing to these pages either. So, sorry to say it sir, but you are wrong. Under your logic, how would one-page sites get indexed?

In terms of search engines not respecting the robots.txt and noindex,follow - you are also wrong. I don't really feel like getting examples (and I have several) so I'll just point to Google's help page on the topic where they admit this fact:

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

Bing is particularly bad at respecting Meta information and search protocols.

I do appreciate all of your help! I will try adding the noindex,follow to the head section of /Blog/ViewList.aspx. I'd love to upgrade, but that decision is just out of my hands. Thanks again!

2/28/2013 10:10:03 AM
Total Posts 18439

Your interpretation of the article http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449 seems quite different from mine.

"While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results."

So they "might" index it in spite of robots.txt if there are external links to the page, but on the same page:

"To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag or x-robots-tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index. The x-robots-tag HTTP header is particularly useful if you wish to limit indexing of non-HTML files like graphics or other kinds of documents."

Pasting full URLs into Google search instead of actual search terms may also cause Google to find and index a URL, just as external links would. So it's not a good idea to do that for URLs such as the ones you've posted, and it still could have been done by SEO people working for the competition, trying to damage your SEO as a tactic to favor their own.

But again I say Google will respect the noindex meta tag, as it says in their documentation. I also think it is unlikely that they will index a page that has no meaningful content when it is listed in robots.txt. In this case the URL is not in robots.txt, so you have not proved me wrong on that. If the page has some meaningful content other than links and was linked from an external page, then according to the article Google might index it if it does not have the noindex meta tag.
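
The quoted Google article names two places a noindex can live: the robots meta tag and the X-Robots-Tag HTTP header. As a simplified Python sketch of that rule, the hypothetical helper below treats a page as excluded when either source contains noindex. It assumes the header holds a plain comma-separated list and ignores user-agent-scoped values, so it is an illustration rather than a description of how any search engine actually parses these.

```python
def is_noindexed(headers: dict, meta_directives=frozenset()) -> bool:
    """Return True if the header or the meta directives ask for noindex (sketch)."""
    header_value = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {
        part.strip().lower() for part in header_value.split(",") if part.strip()
    }
    lowered_meta = {directive.lower() for directive in meta_directives}
    return "noindex" in header_directives or "noindex" in lowered_meta


if __name__ == "__main__":
    print(is_noindexed({"X-Robots-Tag": "noindex, nofollow"}))
    print(is_noindexed({"Content-Type": "text/html"}, {"noindex", "follow"}))
```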

2/28/2013 10:12:07 AM
Total Posts 18439

It also sounds like that mediafire URL might have had links to the URLs in question, even if it does not have them now.

2/28/2013 10:47:13 AM
Total Posts 7

Hey Joe,

How do we get to this page?

www.patioenclosures.com/Blog/ViewList.aspx

I get access denied. Where is the page to edit?

Thanks,
Dave

2/28/2013 10:52:03 AM
Total Posts 18439

You would have to edit the file on disk using a text editor; it cannot be edited from the web browser.

I.e., you could download the file by FTP, edit it with a text editor such as Notepad, then upload it again.
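
For anyone who would rather script the edit step than do it by hand, here is a small Python sketch that inserts the tag into a downloaded copy of the file's text. It assumes the file contains a literal closing head tag; if the page gets its head section from somewhere else (a master page, for example), the function returns the text unchanged. The function name is hypothetical, and the download and upload steps are left out.

```python
import re

# The tag suggested earlier in this thread.
META = "<meta name='robots' content='NOINDEX,FOLLOW' />"


def add_robots_meta(page_source: str) -> str:
    """Insert the robots meta tag just before </head>, if it is not already there."""
    # Skip files that already declare a robots meta tag.
    if re.search(r"<meta[^>]*name=['\"]robots['\"]", page_source, re.IGNORECASE):
        return page_source
    match = re.search(r"</head\s*>", page_source, re.IGNORECASE)
    if match is None:
        # No closing head tag in this file: leave it untouched.
        return page_source
    return page_source[: match.start()] + META + "\n" + page_source[match.start():]


if __name__ == "__main__":
    original = "<html><head><title>Blog</title></head><body></body></html>"
    print(add_robots_meta(original))
```

Keeping a backup copy of the original file before uploading the edited one makes it easy to roll back.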