Drupal Does Not Respect https:// When Caching

One of the problems I recently discovered with the Drupal cache is that it doesn’t properly handle https when cached content is generated on one protocol and reused on the other. Let’s look at the problem, an interim solution for the database cache, and the search for a long term solution.

The Problem

[Image: page-cache-https-issue.png]

The page cache properly respects https because it uses the full URL, including the protocol, when generating a cache id. But Drupal uses multiple layers of caching for the html in a page. For example, the content of blocks can be cached. Suppose the html for a block contains absolute URLs pointing back to the site, and that html is generated and cached on the http version of the site. When the cached html is later used to build the https version of the page, those absolute URLs still use http.
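To make the scenario above concrete, here is a minimal, language-neutral sketch in Python. It is not Drupal code: `block_cache`, `render_block`, `get_block`, `page_cache_id`, and the `example.com` URLs are all hypothetical names chosen for illustration. It only models the idea that a page-level key includes the protocol while a nested key does not.

```python
# Illustrative model only; none of these names are Drupal APIs.
block_cache = {}


def render_block(block_id, base_url):
    """Pretend to build block html that contains an absolute URL."""
    return f'<img src="{base_url}/files/{block_id}.png">'


def get_block(block_id, base_url):
    """Nested cache keyed by block id alone, so the protocol is ignored."""
    if block_id not in block_cache:
        block_cache[block_id] = render_block(block_id, base_url)
    return block_cache[block_id]


def page_cache_id(base_url, path):
    """Page-level key built from the full URL, protocol included."""
    return f"{base_url}/{path}"


# The first request arrives over http and warms the block cache.
http_html = get_block("logo", "http://example.com")

# A later https request reuses that entry and inherits the http URL.
https_html = get_block("logo", "https://example.com")

# The page-level keys differ by protocol, but the nested html does not.
keys_differ = (
    page_cache_id("http://example.com", "node/1")
    != page_cache_id("https://example.com", "node/1")
)
print(keys_differ)
print(https_html)
```

In this model the https page gets its own page cache entry, yet the block html inside it still carries the `http://` image URL, because the nested cache was filled first by the http request.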

Media Module, Where I Found The Problem

The example that led me to this problem was the media module. With the media module you can embed images and video into a text area (like the body of an article). Media tags are converted to html by the filter system and the results are cached. Images, for example, use absolute URLs (this is part of core and is a good thing for some use cases). If the cache for this text was generated on the http version of the site, the path to the image on the https version of the page will use http as the protocol.

This opens up all kinds of possible issues. Just imagine someone in a coffee shop thinking they are on https, only to have cookies sent back to the server in plain text with the request for that image. Or maybe you have a better imagination than me and can think of something more sinister.

Nested Caching

The issue is the nested caching of html: many of the nested caches don't respect the protocol the way the page cache does. This can apply to any html that's cached by any module.

Fixing Database Caching In The Short Term

To work around this problem I've started a database caching layer that respects https for most caches. I don't really like this solution and there are some definite hacks in how it works. The real solution should be for the code that saves to and retrieves from the html caches to be smart enough to include the protocol in the cache id. Unfortunately, a cache backend can't fix the places the cache is used.
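The general idea of a protocol-aware caching layer can be sketched as a wrapper that adds the current scheme to every cache id before handing it to the real storage. This is a rough Python illustration under stated assumptions, not the actual layer described above: `DictBackend`, `SchemeAwareCache`, and the `is_https` callable are hypothetical names, and a real Drupal backend would be written in PHP against Drupal's cache interface.

```python
class DictBackend:
    """Stand-in for real cache storage (hypothetical, in-memory only)."""

    def __init__(self):
        self._data = {}

    def get(self, cid):
        return self._data.get(cid)

    def set(self, cid, data):
        self._data[cid] = data


class SchemeAwareCache:
    """Wrapper that prefixes every cache id with the request's scheme."""

    def __init__(self, backend, is_https):
        self.backend = backend
        # is_https is a callable so the scheme is checked on every call.
        self.is_https = is_https

    def _cid(self, cid):
        scheme = "https" if self.is_https() else "http"
        return f"{scheme}:{cid}"

    def get(self, cid):
        return self.backend.get(self._cid(cid))

    def set(self, cid, data):
        self.backend.set(self._cid(cid), data)


# Simulated request state; a real site would read this from the server.
request = {"https": False}
cache = SchemeAwareCache(DictBackend(), lambda: request["https"])

# Html cached during an http request...
cache.set("block:logo", '<img src="http://example.com/logo.png">')

# ...is not visible to an https request, which must build its own copy.
request["https"] = True
https_hit = cache.get("block:logo")

request["https"] = False
http_hit = cache.get("block:logo")
```

The trade-off in this sketch is that each entry may be stored twice, once per protocol, and the wrapper cannot tell which entries actually contain absolute URLs.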

A Long Term Solution

An issue for a long term solution is already open. If you have experience in this area, if it impacts your sites, or if you want to help out, please head over there so we can fix this moving forward.