Site map not showing all visible pages - how find them?

This thread is closed to new posts. You must sign in to post in the forums.
8/27/2014 4:15:37 AM
Total Posts 537
feet planted firmly on the ground

Site map not showing all visible pages - how find them?

In a site with a large number of pages, we have some with incorrect security - visible to all users when they should not be. I was hoping to be able to track these down by looking at the site map as an anonymous user. However, the site map doesn't show pages that are located beneath a page that is not visible, even when the page itself is visible.

I can see why, as it's a hierarchical display, though it would be nice if they could be displayed somehow (perhaps below un-clickable nodes?). Is there any other way of finding all these visible pages within the web UI, other than checking the properties of every page, or do I need to get into the database?

I thought I might find them using the sitemap handler /sitemap.ashx, but that includes pages that are not visible to unauthenticated users (aside: is it supposed to? Google/Bing cannot read those pages, by definition).

thanks

8/27/2014 6:32:04 AM
Total Posts 18439

Re: Site map not showing all visible pages - how find them?

sitemap.ashx is filtered by role like all the menus. Since Googlebot is not in any roles, it sees only public pages.

Our add-on product Page Manager Pro makes it a little easier: when you click on a page node in the tree, it shows at a glance whether the page is public or protected, without going into the page settings. You can try that on our demo site to see if it would be of much help.

If you use the database, you would be looking for rows in mp_Pages where AuthorizedRoles contains "All Users;".
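A rough sketch of that database lookup, for illustration only. It uses Python with an in-memory SQLite table as a stand-in for the real database. Only the table name mp_Pages and the AuthorizedRoles column come from the post above; the PageID and PageName columns, the sample rows, and both function names are assumptions made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for mp_Pages (columns other than AuthorizedRoles are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mp_Pages ("
        "PageID INTEGER PRIMARY KEY, PageName TEXT, AuthorizedRoles TEXT)"
    )
    # Made-up sample rows; roles are stored as a semicolon-terminated list.
    conn.executemany(
        "INSERT INTO mp_Pages (PageID, PageName, AuthorizedRoles) VALUES (?, ?, ?)",
        [
            (1, "Home", "All Users;"),
            (2, "Staff area", "Admins;Content Authors;"),
            (3, "How-to notes", "All Users;Admins;"),
        ],
    )
    return conn


def find_public_pages(conn):
    """Return (PageID, PageName) for rows whose AuthorizedRoles contains 'All Users;'."""
    cur = conn.execute(
        "SELECT PageID, PageName FROM mp_Pages "
        "WHERE AuthorizedRoles LIKE ? ORDER BY PageID",
        ("%All Users;%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for page_id, name in find_public_pages(build_demo_db()):
        print(page_id, name)
```

Against a real installation the same WHERE clause would be run in whatever query tool the database platform provides; the Python wrapper is only there to make the sketch self-contained.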

8/27/2014 2:27:26 PM
Total Posts 128

Re: Site map not showing all visible pages - how find them?

Thanks, I'll try the page manager and look in the database if needed.

But I think there may be a bug with the sitemap handler, as private pages are definitely showing. If you look near the bottom of the response from https://www.esdm.co.uk/sitemap.ashx you will see pages like https://www.esdm.co.uk/how-to-annual-mats-notifications and you will find they cannot be viewed by anonymous users. This is not client-side caching as I'm issuing the request in Fiddler. I don't think it can be server-side caching, as this page has been private for months. Bug?
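For anyone wanting to repeat this check across a whole sitemap rather than page by page in Fiddler, here is a minimal Python sketch. It pulls every <loc> out of the sitemap XML and reports the ones that an anonymous check rejects. The function names are hypothetical, and the check itself (`is_public`) is deliberately left to the caller, because how a protected page responds to an anonymous request (redirect, error status, login form) is an assumption that should be verified per site.

```python
import xml.etree.ElementTree as ET


def sitemap_locs(xml_text):
    """Return every <loc> URL in a sitemap document, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    locs = []
    for el in root.iter():
        # With a declared namespace the tag looks like '{namespace}loc'.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            locs.append(el.text.strip())
    return locs


def private_pages_in_sitemap(xml_text, is_public):
    """List sitemap URLs that the caller-supplied anonymous check reports as not viewable."""
    return [url for url in sitemap_locs(xml_text) if not is_public(url)]


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/open-page</loc></url>
      <url><loc>https://example.com/members-only</loc></url>
    </urlset>"""
    # Stand-in check; a real one would request each URL without any cookies.
    print(private_pages_in_sitemap(sample, lambda url: "members" not in url))
```

In practice `is_public` would wrap an HTTP request made with no authentication cookie and decide from the response whether the page content was actually served.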

Secondly, I'm puzzled about the lastmod values returned by the handler, or rather the Last Modified dates in page properties. How are these calculated? I've just added an HTML feature to a page, and edited its content, and this value has not changed. Reproduced on the demo site. Bug?

8/27/2014 4:07:42 PM
Total Posts 18439

Re: Site map not showing all visible pages - how find them?

Can you re-create the sitemap problem on the demo site?

I just created a page there and set the view roles to only Content Authors. Then I requested sitemap.ashx from a different browser where I was not logged in, and the protected page was not listed.

I acknowledge the page last mod is not updated consistently; it is only updated when page settings are saved. I will log a bug report about that.

8/27/2014 4:22:54 PM
Total Posts 18439

Re: Site map not showing all visible pages - how find them?

Actually, looking at the code, when you edit an HTML content feature instance it does call

CurrentPage.UpdateLastModifiedTime();

However, that will not clear the server-side sitemap cache; the new timestamp only shows up in the sitemap once the cache is cleared or has expired. After that it could still be cached in the web browser for a little while as well.

Possibly we should clear the sitemap cache when we update the timestamp, but I will have to ponder that because it impacts other things like the menu and the user site map, and clearing the cache frequently might hurt performance. Google also crawls on its own schedule, so even if the sitemap.ashx timestamp changed in real time, it could still be hours before Google sees it.
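To illustrate the caching behaviour described above, here is a small Python sketch of a time-based cache. It is not mojoPortal's cache code; the class name, the 60-second lifetime, and the injected clock are all assumptions chosen to make the effect easy to see. The point is that a value updated behind the cache stays invisible until the entry expires or is explicitly cleared.

```python
import time


class CachedSitemap:
    """Caches a built value and only rebuilds it after ttl_seconds or clear()."""

    def __init__(self, build, ttl_seconds, clock=time.monotonic):
        self._build = build          # function that produces the fresh value
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the example needs no sleeping
        self._value = None
        self._expires_at = None      # None means "nothing cached yet"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._build()
            self._expires_at = now + self._ttl
        return self._value

    def clear(self):
        """Force the next get() to rebuild."""
        self._expires_at = None


if __name__ == "__main__":
    page = {"lastmod": "2014-08-27"}
    fake_now = [0.0]
    cache = CachedSitemap(lambda: page["lastmod"], ttl_seconds=60,
                          clock=lambda: fake_now[0])
    print(cache.get())   # first call builds and caches the value
    page["lastmod"] = "2014-08-28"
    print(cache.get())   # cache not expired, so the old value is returned
    fake_now[0] = 61.0
    print(cache.get())   # cache expired, so the value is rebuilt
```

Calling `clear()` on every timestamp update would make changes visible immediately, at the cost of rebuilding more often, which is the trade-off weighed in the post.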

8/28/2014 3:27:00 AM
Total Posts 537
feet planted firmly on the ground

Re: Site map not showing all visible pages - how find them?

Hi Joe

I have managed to reproduce the sitemap.ashx problem on the demo site. If you look at the https://demo.mojoportal.com/sitemap.ashx output as an anonymous user now you will see:
<url>
  <loc>http://demo.mojoportal.com/new-page-2</loc>
  <lastmod>2014-08-28T08:06:31Z</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://demo.mojoportal.com/new-page-3</loc>
  <lastmod>2014-08-28T08:07:16Z</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>
However, the first page cannot be viewed by anonymous users. The second page can.

(assuming someone else doesn't muck around with these pages before you get to them).

I don't think this is a fully cached output, as new-page-3 appeared in the output as soon as I added the page and refreshed the sitemap output.

The way I achieved this was to change the view permissions of new-page-2 from being visible to all to being visible only to admins. However that may be a red herring - I now find that on creating a new child page of /new-page-1, the new page (/new-page-4) appears in the sitemap immediately even though it is not visible to anonymous users (as it inherited the permissions of /new-page-1). Yet /new-page-1 is not in the sitemap.  This may be the important point - the pages where this is going wrong are child pages of a page that is not visible to anonymous users, and not in the sitemap.
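The parent/child situation described above can be sketched in Python as a tree walk where every page is tested against its own view roles, whether or not its parent passed. This is an illustrative model only, not the handler's code: the `Page` class, the role names used as plain strings, and the `/public-child` page are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Page:
    url: str
    view_roles: Set[str]
    children: List["Page"] = field(default_factory=list)


def pages_visible_to(root, user_roles):
    """Walk the whole tree and return URLs of pages whose own view roles match.

    A hidden parent is left out of the result, but its children are still
    visited and tested individually.
    """
    visible = []
    stack = [root]
    while stack:
        page = stack.pop()
        if page.view_roles & user_roles:
            visible.append(page.url)
        # Push in reverse so children are visited in their original order.
        stack.extend(reversed(page.children))
    return visible


if __name__ == "__main__":
    site = Page("/", {"All Users"}, [
        Page("/new-page-1", {"Admins"}, [
            Page("/new-page-4", {"Admins"}),          # inherited the parent's roles
            Page("/public-child", {"All Users"}),     # hypothetical public child
        ]),
        Page("/new-page-3", {"All Users"}),
    ])
    print(pages_visible_to(site, {"All Users"}))
```

With this approach /new-page-4 is excluded because its own roles do not match, while a genuinely public child of a hidden parent is still listed, which is also the kind of flat listing asked for at the start of the thread.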

I hope this gives enough for you to track it down.

On the other issue, I cannot persuade the last mod date visible in page settings to change on the demo site when I make edits to the page (e.g. add an HTML feature) or to HTML content on the page. It only seems to change when I edit the page properties themselves.

8/28/2014 11:22:11 AM
Total Posts 18439

Re: Site map not showing all visible pages - how find them?

Thanks! I was able to replicate the sitemap.ashx problem on my local machine and fix it, so it will be fixed in the next release. In the meantime, a workaround is to uncheck "Include in Search Engine Site Map" under the SEO tab in page settings.

I got to the bottom of the page modified issue as well.

Basically, page settings has two properties:

LastModifiedUtc // last time a content feature on the page updated the timestamp
LastModUtc // last time page settings were saved

What is shown in the page settings page is the latter; what is shown in sitemap.ashx is the former.
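A small Python model of the two timestamps may make the mismatch easier to see. Only the two property names and where each is displayed come from the post above; the class, its method names, and the sample dates are assumptions for illustration, and datetimes are assumed to already be in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PageTimestamps:
    last_modified_utc: datetime  # models LastModifiedUtc: set when a content feature updates the page
    last_mod_utc: datetime       # models LastModUtc: set when page settings are saved

    def touch_content(self, when):
        """A content edit only moves the content timestamp."""
        self.last_modified_utc = when

    def save_settings(self, when):
        """Saving page settings only moves the settings timestamp."""
        self.last_mod_utc = when

    def sitemap_lastmod(self):
        """Value that would appear as <lastmod> in the sitemap output."""
        return self.last_modified_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

    def settings_page_display(self):
        """Value that would be shown on the page settings screen."""
        return self.last_mod_utc.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    created = datetime(2014, 8, 27, 4, 15, 37, tzinfo=timezone.utc)
    page = PageTimestamps(created, created)
    page.touch_content(datetime(2014, 8, 28, 8, 6, 31, tzinfo=timezone.utc))
    print(page.sitemap_lastmod())        # reflects the content edit
    print(page.settings_page_display())  # unchanged until settings are saved
```

This is why editing content appeared to have no effect when checking the date in page settings: the edit moved one timestamp while the screen displayed the other.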

8/30/2014 1:29:54 PM
Total Posts 128

Re: Site map not showing all visible pages - how find them?

Glad we bottomed that one out, and I've pulled the fix.

Regarding the two dates for page last edited, if it's a simple thing to do I'd certainly find it useful to see both in Page Settings, perhaps labelled as...

Settings last modified:

Content last modified:

(and the latter would be particularly useful if it reflected both editing within a content feature and also adding/removing content features from the page).
