Cache-busting CSS and JavaScript with Python

If you’ve ever deployed a new version of your website only to have users report “nothing changed” or “the layout is broken,” you’ve probably run into the classic browser caching problem. Even with all the right HTTP headers, browsers can be stubborn about holding onto old CSS and JS files. This can lead to frustrating support issues, especially when you’re troubleshooting what customers are actually seeing.

There are plenty of server-side tricks for cache-busting, but sometimes you don’t have access to the server configuration, or you want a solution that’s portable and code-driven. Here’s a Python-based approach I’ve used in production to ensure users always get the latest assets.

The Problem: Stale Assets After Deployment

Browsers cache static assets aggressively. That’s usually a good thing, but it becomes a headache when you deploy updates to your CSS or JS and users’ browsers stay latched on to the previous versions. Even with cache-control headers, you can’t always guarantee a fresh load, especially if proxies or CDNs are involved.

The Solution: Hash-based Cache Busting

The idea is simple: at deploy time, give each asset a filename derived from its content. For example, main.css becomes main-<md5hash>.css. When the file’s content changes, so does its hash, and the new filename forces the browser to fetch the new file. All you need is a way to generate these filenames and reference them in your templates.
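
As a minimal sketch of the naming scheme, the following shows that the filename changes only when the content does. The helper name hashed_name is hypothetical and is not part of the deploy script in the next section.

```python
import hashlib
import os


def hashed_name(filename, content):
    """Return filename with the MD5 hex digest of content inserted before the extension."""
    stem, ext = os.path.splitext(filename)
    digest = hashlib.md5(content).hexdigest()
    return f"{stem}-{digest}{ext}"


old = hashed_name("main.css", b"body { color: black; }")
same = hashed_name("main.css", b"body { color: black; }")
new = hashed_name("main.css", b"body { color: navy; }")

print(old == same)  # identical content gives an identical name
print(old == new)   # changed content gives a different name
```

MD5 serves here only to detect changes, not to provide security, so its cryptographic weaknesses are not a concern for this purpose.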

1. Building the Asset Manifest

The first step is to process your asset directories at deploy time, generate hashed filenames, and build a manifest mapping original filenames to their cache-busted versions. Here’s a script that does exactly that:

import os
import hashlib
import shutil
import json
import glob

def md5_file(filepath):
    with open(filepath, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def process_asset_dir(base_dir, asset_type, mapping, root_dir=None):
    """Recursively process a directory, copying files to <root>/cache with MD5 names.

    root_dir is the top-level asset directory. It defaults to base_dir on the
    first call, and all manifest paths are recorded relative to it.
    """
    root_dir = root_dir or base_dir
    # Use a single cache dir at the top level so stale-file cleanup can find every copy
    cache_dir = os.path.join(root_dir, "cache")
    os.makedirs(cache_dir, exist_ok=True)

    for entry in os.scandir(base_dir):
        # Skip dotfiles, .scss, and the cache dir itself
        if entry.name.startswith(".") or entry.name.endswith(".scss") or entry.name == "cache":
            continue

        if entry.is_file() and entry.name.endswith(f".{asset_type}"):
            stem = entry.name[: -(len(asset_type) + 1)]  # filename without extension
            hash_str = md5_file(entry.path)
            cache_filename = f"{stem}-{hash_str}.{asset_type}"
            cache_path = os.path.join(cache_dir, cache_filename)

            shutil.copy2(entry.path, cache_path)

            # Map source path → cache path, both relative to the top-level asset dir,
            # so files in subdirectories keep distinct manifest keys
            rel_source = os.path.relpath(entry.path, root_dir)
            rel_cache = os.path.join("cache", cache_filename)
            mapping[rel_source] = rel_cache

        elif entry.is_dir():
            # Pass root_dir down so nested files resolve against the same root
            process_asset_dir(entry.path, asset_type, mapping, root_dir)

def remove_stale_cache_files(base_dir, asset_type, current_mapping):
    """Delete any cache files that are no longer referenced in the mapping."""
    active = set(os.path.basename(v) for v in current_mapping.values())
    cache_dir = os.path.join(base_dir, "cache")
    if not os.path.isdir(cache_dir):
        return
    for filepath in glob.glob(os.path.join(cache_dir, f"*.{asset_type}")):
        if os.path.basename(filepath) not in active:
            os.remove(filepath)

def build_asset_manifest(www_dir, output_path):
    mapping = {}

    for asset_type in ("js", "css"):
        asset_dir = os.path.join(www_dir, asset_type)
        if os.path.isdir(asset_dir):
            process_asset_dir(asset_dir, asset_type, mapping)

    # Write the manifest
    with open(output_path, "w") as f:
        json.dump(mapping, f, indent=2)
    os.chmod(output_path, 0o664)  # read/write for owner and group; a JSON file needs no execute bit

    # Remove stale files only after the new manifest is written
    for asset_type in ("js", "css"):
        asset_dir = os.path.join(www_dir, asset_type)
        if os.path.isdir(asset_dir):
            remove_stale_cache_files(asset_dir, asset_type, mapping)

    return mapping

The code above runs during deployment and can be integrated into your CI/CD pipeline or triggered manually. To use it, call the build_asset_manifest function. It walks your js/ and css/ directories, copies each file to a cache/ subdirectory with a hash in the filename, and writes a JSON manifest mapping original filenames to their cache-busted versions.

# --- Usage Example ---
build_asset_manifest(
    www_dir="/var/www/html",
    output_path="/var/www/templates/inc/asset_manifest.json"
)
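
One possible refinement, which the script above does not include: because the manifest is written in place, a request that reads it mid-write could see a partial file. Here is a minimal sketch of an atomic write, assuming the temporary file and the final path live on the same filesystem. The helper name write_manifest_atomically and the default mode of 0o644 are assumptions for illustration.

```python
import json
import os
import tempfile


def write_manifest_atomically(mapping, output_path, mode=0o644):
    """Write mapping as JSON to a temp file, then rename it over output_path."""
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(mapping, f, indent=2)
        # mkstemp creates the file readable by its owner only; widen it
        # (0o644 is an assumed default, adjust for your server)
        os.chmod(tmp_path, mode)
        # os.replace swaps the new file into place in a single step
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers of the manifest then see either the complete old file or the complete new one, never a half-written mix.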

A note on stale files:

If you only generate new hashed filenames on every deploy, old, unused files quickly pile up in your cache directory, cluttering your server and wasting disk space. That’s why the script includes logic to remove stale cache files. After writing the new manifest, it calls the remove_stale_cache_files function, which scans the cache directory and deletes any files that the manifest no longer references. This keeps your asset cache clean and ensures only the current versions remain on disk.

2. Updating the Template Layer

The following function can live in a utility module that’s available to your template layer (Jinja2, Django templates, etc.). Call asset_url('main.css') or asset_url('app.js') in your templates, and it resolves to the current hashed filename. Note that manifest paths are relative to the js/ or css/ directory, so prepend that directory’s URL prefix if your site serves assets from /css/ and /js/.

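import json
import os

_manifest = None

def asset_url(relative_path, manifest_path="/var/www/templates/inc/asset_manifest.json"):
    """Return the cache-busted URL path for an asset, or the original path if unmapped."""
    global _manifest
    if _manifest is None:
        try:
            with open(manifest_path) as f:
                _manifest = json.load(f)
        except FileNotFoundError:
            # No manifest yet (e.g., during development): fall back to unhashed paths
            _manifest = {}

    hashed_path = _manifest.get(relative_path, relative_path)
    return "/" + hashed_path.replace(os.sep, "/")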

On first call, this code loads the manifest JSON into memory. For each asset, it looks up the cache-busted filename. If the asset isn’t found in the manifest (e.g., during development), it falls back to the original filename.
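
Because the manifest is loaded once and kept in a module-level variable, a long-running process keeps serving the old mapping until it restarts. If restarting on every deploy is inconvenient, one option is to reload the manifest whenever its modification time changes. The following is a sketch under that assumption; the names ManifestCache and lookup are hypothetical and not part of the function above.

```python
import json
import os


class ManifestCache:
    """Cache a JSON manifest and reload it when the file's mtime changes."""

    def __init__(self, manifest_path):
        self.manifest_path = manifest_path
        self._mtime = None
        self._manifest = {}

    def _refresh(self):
        try:
            mtime = os.stat(self.manifest_path).st_mtime_ns
        except FileNotFoundError:
            # No manifest (e.g., in development): every lookup falls back
            self._mtime, self._manifest = None, {}
            return
        if mtime != self._mtime:
            with open(self.manifest_path) as f:
                self._manifest = json.load(f)
            self._mtime = mtime

    def lookup(self, relative_path):
        """Return the URL path for an asset, falling back to the original name."""
        self._refresh()
        hashed_path = self._manifest.get(relative_path, relative_path)
        return "/" + hashed_path.replace(os.sep, "/")
```

This costs one os.stat call per lookup, which is usually cheap but not free, so it trades a little per-request work for not having to restart the process.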

Why Not Use Query Strings?

Some readers may point out that you can achieve cache busting by simply appending a query string to your asset URLs, like main.css?v=1.2.3. While this works in many cases, it’s not foolproof, especially if you’re using a CDN or certain proxy configurations. Some CDNs and intermediate caches are configured to ignore query strings for static assets, so they may keep serving the old file regardless of the version parameter. By changing the actual filename, you ensure that every layer (browser, CDN, proxy) sees a new resource and fetches it fresh. That makes the filename approach the more robust choice, especially for larger or distributed deployments.

What About Race Conditions?

If you are deploying to a load-balanced environment, requests may arrive for a hashed filename after it has been deleted. This happens when servers update at different times: a page rendered by a server still using the old manifest references the old hashed file, but the asset request is routed to a server where that file has already been removed. If you find yourself in this situation, you have three options:

Build the hashed files and manifest before you deploy your codebase to production. This can be done during your build process or inside your CI/CD pipeline.
Modify the build script in this article to maintain the previous version of the manifest and hashed files.
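Do nothing and accept a brief window of failed asset requests during deploys. Keep in mind that the asset_url function falls back to the unhashed filename only when an asset is missing from the manifest, not when the hashed file itself has been deleted.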
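
The second option can be sketched as follows. This is one possible approach under stated assumptions, not a drop-in part of the script above: the helper names load_manifest and remove_stale_keeping_previous are hypothetical. The idea is to read the old manifest before overwriting it, then keep any cache file that either manifest references.

```python
import glob
import json
import os


def load_manifest(path):
    """Return the manifest at path, or an empty dict if it does not exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def remove_stale_keeping_previous(base_dir, asset_type, current_mapping, previous_mapping):
    """Delete cache files referenced by neither the current nor the previous manifest."""
    keep = {os.path.basename(v) for v in current_mapping.values()}
    keep |= {os.path.basename(v) for v in previous_mapping.values()}
    cache_dir = os.path.join(base_dir, "cache")
    removed = []
    for filepath in glob.glob(os.path.join(cache_dir, f"*.{asset_type}")):
        if os.path.basename(filepath) not in keep:
            os.remove(filepath)
            removed.append(os.path.basename(filepath))
    return sorted(removed)
```

In build_asset_manifest, you would call load_manifest(output_path) before writing the new manifest and pass the result as previous_mapping, in place of the existing remove_stale_cache_files call. Files from two deploys ago are then removed, while the immediately preceding version stays available to in-flight requests.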

Conclusion

This approach is robust, portable, and doesn’t require any server configuration changes. It’s saved me countless hours of support headaches and ensures that users always get the latest assets after a deployment.