Sunday, 31 May 2009

More on optimization, HTTP 304s etc. – a solution?

In my last post Optimization, BLOB caching and HTTP 304s, I did a fairly lengthy walk-through on an issue I’d experienced with SharePoint publishing sites. A few people commented, mainly saying they’d noticed the same thing, but there have been further developments and findings I wanted to share!

Quick recap

Under certain circumstances some files in SharePoint are always re-requested by the browser despite being present in the browser cache (“Temporary internet files”). Specifically this is observed for files stored in the Style Library and Master Page Gallery, for anonymous users. Although SharePoint responds with a HTTP 304 to say the cached file can indeed be used (as opposed to sending the file itself again), we effectively have an unnecessary round-trip to the server for each file – and there could be many such files when all the page’s images/CSS/JS files are considered. This extra network traffic can have a tangible impact on site performance, and this is magnified if the user is geographically far away from the server.

A solution?

Waldek and I have been tossing a few development matters around recently over e-mail, and he was curious enough to investigate this issue for himself. After reproducing it and playing around for some time, Waldek discovered that flushing the disk-based cache seems to cause a change in behaviour – or in layman’s terms, fixes everything. To be more specific, we’re assuming it’s a flush of the BLOB cache which is having the affect – in both Waldek’s test and my subsequent validation, the object cache was also flushed as well:

FlushDiskCache

After the OK button is hit on this page, the problem seems to go away completely, so now when the page is accessed now for the first time as an anonymous user, the correct ‘max-age’ header is added to the files (as per the BLOB cache declaration in web.config) – contrast the ‘max-age=86400’ header on the Style Library files with what I documented in my last post:

AnonymousCorrectHeadersAfterFlushCache

This means that on subsequent requests, the Style Library files are served directly from the browser cache with no 304 round-trip:

SecondRequestNo304s

This is great news, as it means the issue I described is essentially a non-issue, and there is therefore no performance penalty for storing files in the publishing Style Library.

So what gives?

I’m now wondering if this is just a ‘gotcha’ with BLOB caching and publishing sites. I know other people have run into the original issue due to the comments on my previous post, and interestingly enough one poster said they use reverse proxy techniques specifically to deal with this issue. Could it really be that everybody who sees this behaviour just didn’t flush the BLOB cache somewhere along the way, when it’s actually a required step? Or is the testing that Waldek and I did flawed in some way? Or indeed, was my initial investigation flawed despite the fact others reported the same issue?

I am interested to hear from you on this – if you can reproduce the problem I’ve described with a publishing site you’ve developed, does flushing the BLOB cache solve it for you as described here? Leave a comment and let us know!

Good work Waldek :-)

Sunday, 17 May 2009

Optimization, BLOB caching and HTTP 304s

There's been an interesting mini-debate going on recently in terms of where to store static assets used by your site - images, CSS, JS files and so on. Broadly the two approaches can be characterized as:

  • Developer-centric - store assets on the filesystem, perhaps in the 12 hive
  • Author-centric - store assets in the content database, perhaps in the Style Library which comes with publishing sites

Needless to say these options offer different pros and cons depending on your requirements - Servé Hermans offers a good analysis in To package or not to package: that is the question. However, I want to throw another point into the debate - performance, specifically for anonymous users. Frequently, this is an audience I care deeply about since some of the WCM sites I work on often have forecast ratios of 80% anonymous vs. 20% authenticated users. Recently I was asked to help optimize an under-performing airline site built on MOSS - as usual the problem was a combination of several things, but one of the high-impact items was this decision to store assets in one location over the other. In this post I'll explain what the effect on performance is and why you should consider this when building your site.

The problem

Once they've been loaded the first time, most of the static files a website uses should be served from the user's local browser cache ("Temporary internet files") - without this, the internet would be seriously slow. Consider how much slower a web page loads when you do a hard refresh (ctrl+F5) compared to normal - this is because all the images are forced to be re-downloaded rather than served from the browser cache. Unfortunately, for files stored in some common SharePoint libraries/galleries (i.e. the author-centric approach) SharePoint doesn't deal with this quite right in some scenarios - most of the gain is there, but despite having the image locally, the browser still makes a request for the image - the conversation goes like this (for EACH image on the page!):

Browser: I need this image please - I cached it last time I came at [date/time], but for all I know it's changed since then.
Server: No need dude, it's not changed so just use your local copy (in the form of a HTTP 304 - "Not modified")
Browser: Fair enough, cheers.

This essentially happens because the file was not served with a "cacheability" HTTP header to begin with. Needless to say, this adds significant time to the page load when you have 30+ images/CSS/JS files referenced on your page - potentially several seconds in my experience (under some circumstances), which of course is a huge deal. If say, the user is in Europe but the servers are in the U.S., then suddenly this kind of network chatter is something we need to address. Needless to say, in the majority of cases we're happy to cache these files for a period since they don't all change too often, and we get better performance as a result.

The Solution (for some SharePoint libraries *)

Mike Hodnick points us to part of the solution in his highly-recommended article Eliminating "304" status codes with SharePoint web folder resources. Essentially, SharePoint's BLOB caching feature saves the day since it serves the image with a "max-age" value on the HTTP header, meaning the browser knows it can use it's local copy of the file until this date. This only happens when BLOB caching is enabled and has the max-age attribute like this (here set to 84600 seconds = 24 hours):

<BlobCache location="C:\blobCache" path="\.(gif|jpg|png|css|js|aspx)$" maxSize="10" enabled="true" max-age="86400" />

When we configure the BLOB cache like this we are, in effect, specifying that it's OK to cache static files for a certain period, so the "cacheable" header gets added. HOWEVER, what Mike doesn't cover is that this only happens for authenticated users - files served out of common content DB locations such as the Style Library and Master Page Gallery still do not get served correctly to anonymous users. Note this isn't all SharePoint libraries though - so we need to be clear on exactly when this problem occurs.

* Scope of this problem/solution

Before drilling down any deeper, let's stop for a moment and consider the scope of what we're discussing - a site with:

  • Anonymous users
  • Files stored in some libraries - I'm not 100% sure of the pattern but discuss it later - the Style Library and Master Page Gallery are known culprits however. Other OOTB libraries such as SiteCollectionImages do not have the problem.

If you don't have this combination of circumstances, you likely don't have the problem. For those who do, we're now going to look closer at what's going on, before concluding with how we can work around the issue at the end.

Drilling deeper

For a site which does have the above combination of circumstances, we can see the issue with Fiddler - as an anonymous user browsing to page I've already visited, I see a stack of 304s meaning the browser is re-requesting all these files:

BlobCachingDisabled_304s

However, if I'm authenticated and I navigate to the same page, I only see the HTTP 200 for the actual page, no 304s:

BlobCachingEnabled_No304s

Hence we can conclude it works fine for authenticated users but not for anonymous users.

So what can we do for our poor anonymous users (who might be in the majority) if we're storing files in the problematic libraries? Well, here's where I draw a blank unfortunately. Optimizing Office SharePoint Server for WAN environments on TechNet has this to say on the matter:

Some lists don't work by default for anonymous users. If there are anonymous users accessing the site, permissions need to be manually configured for the following lists in order to have items within them cached:

  • Master Page Gallery
  • Style Library

Aha! So we need to change some permissions - fine. This seems to indicate that it is, in fact, possible to get the correct cache headers added to files served from these locations. Unfortunately, I simply cannot find what permissions need to be changed, and nobody on the internet (including the TechNet article) seems to detail what. The only logical setting is the Anonymous Access options for the list - these are all clear by default, but adding the 'View Items' permission (as shown below) does not change anything:

AnonPermissions

As a sidenote, the setting above is (I believe) effectively granting read permissions to the identity which is used for anonymous access to the associated IIS site. So in IIS 7.0, I'm fairly sure you'd achieve the same thing by doing this:

AddPermsIUsr

So the problem does not go away when anonymous users are granted the 'View Items permission, and what I find interesting about this is that a closer look with Fiddler reveals some inconsistencies. The image below shows me browsing to a page anonymously for the first time, and to save you the hassle we can derive the following findings:

  • Files served from the 'SiteCollectionImages' library are given the correct max-age header (perhaps expected, since not one of the known 'problem libraries' e.g. Style Library)
  • Files served from the '_layouts' folder are given a different max-age header (expected, settings from the IIS site are used here)
  • Some files in the Style Library are in fact given a the correct max-age header! (not expected) 

MixedHeaders_Anonymous

So the 2 questions which strike me here are:

  • Why are some files being served from 'Style Library' with the correct header when most aren't?
  • Why can SharePoint add the 'max-age' header to files in the 'SiteCollectionImages' library but not the 'Style Library'?

The first one is a mystery to me - it's perhaps not too important, but I can't work it out. The second one might be down to how the libraries are provisioned - the 'Style Library' is provisioned by declarative XML in the 'PublishingResources' Feature, whereas the 'SiteCollectionImages' library is provisioned in code using the same Feature's activation receiver. Could this be the key factor? I don't know, but I'd certainly be interested if anyone can put me straight - either on this or the mystery "permissions change" required to make BLOB caching deal with libraries such as the 'Style Library'.

Conclusion

The key takeaway here is that for sites which want to take advantage of the browser caching for static files (for performance reasons) and have anonymous users, we need to be careful where we put our images/CSS/JS files as per Mike Hodnick's general message. If we want to use the author-centric approach and store things in SharePoint libraries, we need to consider which libraries (and test) if we will have the 304 problem. Alternatively, we can choose to store these files on the filesystem (the developer-centric approach) and use a virtual directory with the appropriate cacheability settings to suit our needs. My suggestion would be to use a custom virtual directory for full control of this, since the default settings on the '_layouts' directory ("cache for 1 year") are unlikely to be appropriate.

Monday, 4 May 2009

Fix to my Language Store framework for multi-lingual sites

In my last post, I talked about a fix to my Config Store framework for an issue which manifested itself on certain SharePoint builds, with Windows 2008 and a recent cumulative update seeming to be the trigger. Some of you may know that I produced a sister project to this one called the 'Language Store', which is designed to help build multi-lingual SharePoint sites - since this framework is built off the same underlying XML and plumbing, this solution was also affected.  So this post is just a short one to say that the fix has now been applied to the Language Store framework, and the new version is now available on Codeplex at http://splanguagestore.codeplex.com.

Problem/solution

The problem was effectively that items in the SharePoint list could no longer be edited - well, in fact they could be updated using code, but the list form .aspx pages were not showing the fields correctly so items couldn't be edited in the UI. Since it kind of defeats the point of SharePoint to have to write code to update list items (!), this was a big issue on affected builds. Interestingly some users reported working around the issue by removing/re-adding the content type from the list in the browser, but happily this is no longer necessary since the root issue has now been resolved. The problem was traced to some incorrect XML in my FieldRef elements - see the last post Fix to my Config Store framework and list provisioning tips for the full info.

General recap - the Language Store

If you're still reading, I figure some folks would welcome a reminder/intro on what the Language Store actually does - it's not about replacing SharePoint's variations functionality which is commonly used on multi-lingual sites. I noticed Spence gave it a better name in an e-mail recently where he described it as a 'term store' for multi-lingual sites - this actually captures what it does far better than my name for it. Effectively the idea is to provide a framework for the many small strings of text which are not part of authored page content which need to be translated and displayed in the appropriate language. As an example, here is a page from the BBC site where I've highlighted all the strings which may need to be translated but which don't belong to a particular page:

BBCExample

There are many of these in a typical multi-lingual site, and to help deal with this requirement the Language Store framework provides the following:

  • SharePoint list/content type/site columns etc.
  • API to retrieve items with a single line of code
  • Granular caching for high-performance
  • Packaged as a .wsp for simple deployment
  • All source code/XML freely available

If you want to find out more, see Building multi-lingual SharePoint sites - introducing the Language Store. The solution can be downloaded from the Codeplex site at http://splanguagestore.codeplex.com.

Apologies to existing users who were affected by the issue.