Lecture 1: Introduction & Course Overview

This lesson orients students to the role of caching systems in modern IT architectures, introduces hardware requirements and software configurations, and surveys the open-source caching landscape, framing the scope of the course.

Learning Objectives

Prerequisites


Section 1: Caching Systems Basics

The Fundamental "Why": A World Without Caching

Imagine a massive, global library where every book is stored in a single, central vault. Every time you want to read a book, no matter how popular, you must send a request to this central vault. The librarian retrieves it, sends it to you, and you send it back when you're done. Now, imagine millions of people are all trying to get the most popular books at the same time. The librarian would be overwhelmed, requests would queue up, and the time it takes to get even a single page would grow from seconds to minutes, or even hours. This is the internet without caching.

In this analogy, the central vault is your primary database or "origin server"—the ultimate source of truth for your data. The user is the client application, and the trip to the vault represents the network latency and processing time required to fulfill a request. Caching is the simple, yet profound, idea of placing smaller, local libraries (or even just a "most popular books" shelf at the front desk) closer to the readers. This local copy, or "cache," can serve requests for popular items almost instantly, drastically reducing the load on the central vault and providing a much faster experience for the reader.

At its core, caching is a performance optimization technique that stores a copy of data in a temporary, high-speed storage location (the cache) to serve future requests for that same data more quickly. The primary goal is to reduce latency—the time delay between a request and a response. In modern computing, latency is the enemy of user experience and system scalability. It arises from three main sources:

  • Network latency: the time a request spends travelling to the origin server and the response spends travelling back.
  • Disk I/O latency: the time the origin system spends reading data from persistent storage.
  • Computation latency: the time spent re-computing a result, such as running a database query or rendering a page.

Caching tackles all three. By storing a pre-computed result in a fast, nearby location (often in RAM), it can eliminate the need for repeated, expensive operations. This not only speeds up response times for the end-user but also significantly reduces the load on backend systems like databases and application servers, allowing them to serve more users with the same hardware resources.

Core Concepts: Strategies and Policies

Implementing a cache is more complex than just creating a copy of data. The real challenge lies in managing what data to store, how to keep it reasonably up-to-date, and what to do when the cache runs out of space. These challenges are addressed through various caching strategies and eviction policies.

Caching Strategies

A caching strategy defines the interaction between your application, the cache, and the primary data store (database). The choice of strategy has significant implications for performance, data consistency, and code complexity.

  1. Cache-Aside (Lazy Loading): This is the most common caching strategy. The application logic is responsible for managing the cache.
    • Flow: The application first attempts to read data from the cache.
      • Cache Hit: If the data is found, it is returned directly to the application. This is the fast path.
      • Cache Miss: If the data is not in the cache, the application reads the data from the database (the origin), stores a copy in the cache for next time, and then returns it.
    • Pros: Resilient to cache failures (the application can still function, albeit more slowly, by going to the database). The cache only stores data that is actually requested, preventing it from being filled with unused data.
    • Cons: The first request for any piece of data will always be a "cache miss," resulting in higher latency for that initial request (a "cold start"). The application code is more complex as it contains logic for both cache and database interactions. Data in the cache can become stale if it's updated in the database directly without invalidating the cache.
  2. Read-Through: In this strategy, the cache itself is responsible for fetching data from the database. The application treats the cache as its primary data source.
    • Flow: The application requests data from the cache.
      • Cache Hit: The cache returns the data.
      • Cache Miss: The cache automatically fetches the data from the underlying database, stores it, and then returns it to the application.
    • Pros: Simplifies application code, as the logic for handling cache misses is abstracted away into the cache provider.
    • Cons: Requires a cache provider that supports this pattern. The initial latency penalty for a "cache miss" still exists.
  3. Write-Through: This strategy focuses on keeping the cache and database consistent during write operations (see the sketch after this list).
    • Flow: The application writes data to the cache, and the cache synchronously writes that data to the database before returning success to the application.
    • Pros: High data consistency. Data in the cache and database is never out of sync. Reads are fast (as they come from the cache), and data is durable (as it's in the database).
    • Cons: Higher write latency, as every write operation has to complete in two places (cache and database). This introduces a potential single point of failure if the cache goes down.
  4. Write-Back (Write-Behind): This strategy prioritizes write performance.
    • Flow: The application writes data only to the cache, which immediately confirms the write. The cache then asynchronously writes the data to the database in the background, often in batches.
    • Pros: Extremely low write latency, as the application doesn't have to wait for the database write. Can absorb high-velocity write bursts by batching updates to the database.
    • Cons: Risk of data loss. If the cache fails before the data is written to the database, those writes are lost forever. This makes it unsuitable for critical data like financial transactions but excellent for things like updating a user's "last seen" timestamp.
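
To make the write-path strategies concrete, below is a minimal Write-Through sketch in Python. The cache and database objects are hypothetical stand-ins for real clients (for example, a Redis connection and a SQL handle); only the ordering of the two synchronous writes matters. A Write-Back variant would instead enqueue the database write and acknowledge immediately.

# Minimal Write-Through sketch. `cache` and `database` are hypothetical
# stand-ins for real clients; only the ordering of operations matters.
class WriteThroughStore:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def put(self, key, value):
        # 1. Write to the cache so subsequent reads are fast.
        self.cache.set(key, value)
        # 2. Synchronously write to the database before acknowledging.
        #    Write latency therefore includes both systems.
        self.database.save(key, value)
        # 3. Only now is the write reported as successful.
        return True

    def get(self, key):
        # Reads come from the cache, which is never behind the database.
        return self.cache.get(key)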

Cache Eviction Policies

A cache has a finite size. When it becomes full, a decision must be made about which existing item to discard (evict) to make room for a new one. This decision is governed by an eviction policy.
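
One widely used policy is LRU (Least Recently Used, defined in the glossary). The sketch below is a teaching aid, not a production cache: it implements the core LRU idea with Python's OrderedDict, while real systems add locking, TTLs, and memory accounting on top.

from collections import OrderedDict

class LRUCache:
    """A tiny LRU cache: when full, evict the item that was used longest ago."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None                      # cache miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]               # cache hit

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item

# Example: with capacity 2, touching "a" protects it, so "b" is evicted.
cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.set("c", 3)     # evicts "b", not "a"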

Common Caching Topologies

Where you place your cache matters. Caching can be implemented at various layers of a typical web application stack.

In this course, we will focus primarily on dedicated, open-source server-side caching systems that you configure and manage, which provide the most flexibility and power for optimizing application performance.

Example: Cache-Aside Pattern in Pseudo-code

This illustrates the logic an application would use to fetch user data with a cache-aside strategy.


function getUser(userId):
    // 1. Try to get the user from the cache
    user = cache.get("user:" + userId)

    // 2. Check if it was a cache miss
    if user is null:
        // 3. If so, get the user from the database
        user = database.query("SELECT * FROM users WHERE id = ?", userId)
        
        // 4. Store the result in the cache for next time
        // Set a TTL of 1 hour (3600 seconds)
        cache.set("user:" + userId, user, ttl=3600)
    
    // 5. Return the user (from cache or database)
    return user
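
For comparison, here is a minimal runnable sketch of the same cache-aside logic in Python. It assumes a local Redis instance reachable through the redis-py client and a SQLite users table; both are illustrative choices, not requirements of the pattern.

import json
import sqlite3

import redis

cache = redis.Redis(host="localhost", port=6379)   # assumed local Redis
db = sqlite3.connect("app.db")                      # assumed SQLite database
db.row_factory = sqlite3.Row

def get_user(user_id):
    key = f"user:{user_id}"

    # 1. Try the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                   # cache hit

    # 2. Cache miss: fall back to the database.
    row = db.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None                                 # user does not exist

    user = dict(row)

    # 3. Populate the cache for next time, with a 1-hour TTL.
    cache.set(key, json.dumps(user), ex=3600)
    return user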

Did You Know?

The term "cache" comes from the French word cacher, meaning "to hide." The underlying idea was described by Maurice Wilkes in 1965 as a "slave memory": a small, fast buffer intended to "hide" the latency of the much slower main memory. The word "cache" itself entered common computing use with the IBM System/360 Model 85 in the late 1960s, one of the first commercial machines to ship with such a buffer between the CPU and main memory. The concept has since been applied at every level of computer architecture.

Section 1 Summary

Reflective Questions

  1. When would a Write-Back caching strategy be a risky choice for an e-commerce application's shopping cart? What about for tracking user clicks on a webpage?
  2. Imagine a news website. What kind of TTL would you set for the homepage content versus an article from 5 years ago? Why?
  3. How might a Content Delivery Network (CDN) and a server-side application cache work together to serve a user's request for a dynamic, personalized webpage?

Section 2: Hardware Overview for Caching Servers

Introduction: The Physical Foundation of Speed

While caching logic and software are critical, they are only as effective as the hardware they run on. A caching server is fundamentally a specialized piece of infrastructure designed for one primary purpose: extremely fast data access. Unlike a general-purpose application server or a storage-heavy database server, the hardware configuration of a caching server is uniquely skewed towards memory performance. Every component, from the CPU to the network card, must be selected and configured to support this goal. A misconfiguration or bottleneck in any one area can undermine the entire system's performance, turning a potential speed-of-light solution into a frustrating chokepoint.

This section delves into the four pillars of caching server hardware: CPU, Memory (RAM), Storage, and Networking. We will explore the specific role each component plays, the key characteristics to look for, and how the requirements change based on the chosen caching software and workload. As Jainandunsing (2025) notes, different caching solutions have vastly different hardware appetites; a lightweight system like Redis can run on minimal hardware, while a heavyweight data grid like Apache Ignite demands significantly more resources. Understanding these nuances is key to building a cost-effective and high-performing caching tier.

The Central Processing Unit (CPU): The Cache's Brain

The CPU in a caching server is responsible for executing the caching software's logic. This includes handling incoming client connections, processing requests (gets, sets, deletes), managing data structures in memory, enforcing eviction policies, and handling background tasks like data persistence. The demands on the CPU can vary significantly.

Cores vs. Clock Speed

A long-standing debate in server hardware is the trade-off between many cores (multi-core) and fewer cores running at a higher clock speed. For caching servers, the "correct" choice depends heavily on the software's architecture.

As a general rule, a modern server-grade CPU (Intel Xeon, AMD EPYC) with a balance of a decent core count (e.g., 8-16) and a good base clock speed (e.g., 2.5+ GHz) is a safe starting point. For very high-throughput systems, more cores will be necessary. For example, Jainandunsing (2025) recommends at least 2 cores for a minimal Apache Ignite setup, reflecting its more complex, JVM-based, multi-threaded architecture, versus just 1 core for a basic Redis instance.

Memory (RAM): The Heart of the Cache

RAM is, without question, the most critical component of a caching server. The entire purpose of most caching systems is to keep the "hot" dataset in memory, avoiding slow disk access. The amount, speed, and type of RAM directly dictate the cache's capacity and performance.

Sizing Your RAM

The first question is always: "How much RAM do I need?" This requires estimating:

  1. Object Size: The average size of a single item you plan to cache (e.g., a user session JSON object might be 2 KB).
  2. Object Count: The total number of items you need to keep in the cache at any given time.
  3. Overhead: The caching software itself uses memory for its own data structures (like hash tables for key lookups), connection buffers, and operational overhead. This can range from 10% to 50% or more of the data size.

For example, if you need to cache 1 million user sessions, each averaging 2 KB, your data size is 2 GB. Factoring in overhead, you would likely need a server with at least 4 GB of RAM dedicated to the cache, plus additional RAM for the operating system. Jainandunsing (2025) provides practical minimums, suggesting 256-512 MB for a small Redis instance caching 20 user sessions, while a Couchbase Server for the same task requires a 4 GB minimum due to its more complex database-like architecture.
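
The arithmetic above can be captured in a small back-of-the-envelope helper. The numbers below mirror the session example in the text; the 50% overhead factor is an assumption you would tune for your chosen caching software.

def estimate_cache_ram_gb(object_count, avg_object_kb, overhead_factor=1.5):
    """Rough RAM estimate in GB: raw data size plus caching-software overhead."""
    data_gb = object_count * avg_object_kb / 1_000_000   # KB -> GB (decimal)
    return data_gb * overhead_factor

# 1 million sessions of ~2 KB each: 2 GB of data, 3 GB with a 50% overhead factor.
# Round up and leave headroom for the OS -- hence the 4 GB figure in the text.
print(estimate_cache_ram_gb(1_000_000, 2))   # -> 3.0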

Type and Speed

Modern servers use DDR4 or DDR5 RAM. While faster RAM (higher MHz) can provide a marginal performance benefit, the most critical feature for a production caching server is ECC (Error-Correcting Code) RAM. ECC RAM can detect and correct single-bit memory errors in-flight. A standard non-ECC module might flip a bit due to cosmic rays or electrical interference, leading to data corruption. In a caching server, this could mean serving incorrect data, corrupting session information, or causing the entire server process to crash. The reliability offered by ECC RAM is non-negotiable for any serious production environment.

Storage (Disk): The Persistence Layer

It may seem counterintuitive to discuss disk storage for an "in-memory" cache, but storage plays a vital supporting role.

Role of Storage

Type of Storage

The speed of your storage directly impacts startup time and persistence performance.

For a caching server, using at least a SATA SSD is highly recommended. An NVMe SSD provides the best performance and is the preferred choice for systems where recovery time is critical.

Network Interface Controller (NIC): The Gateway

The NIC is the physical port that connects the caching server to the rest of the network. All requests and responses flow through it. An under-provisioned NIC can easily become the primary performance bottleneck, even if the CPU and RAM are powerful.

Speed and Throughput

NICs are rated by their speed:

Redundancy

For high availability, it's common practice to use two or more NICs in a "bonded" or "teamed" configuration. This can be set up in an active-passive mode for failover (if one NIC or switch port fails, traffic automatically moves to the other) or in an active-active mode to aggregate bandwidth, providing higher throughput and redundancy simultaneously.

Example: Hardware Build Comparison

This table contrasts a minimal, low-cost build with a robust enterprise-grade configuration for a caching server.

Component | Minimal Build (e.g., Redis for a small project)          | Enterprise Build (e.g., Ignite Cluster for high traffic)
----------|----------------------------------------------------------|----------------------------------------------------------
CPU       | 1-2 Cores @ 2.0+ GHz (e.g., Raspberry Pi 4, low-end VM)  | 16-32+ Cores @ 3.0+ GHz (e.g., AMD EPYC, Intel Xeon Gold)
RAM       | 1-4 GB (non-ECC is acceptable for dev)                   | 128-512+ GB ECC DDR5
Storage   | 32 GB MicroSD card or small SATA SSD                     | 2 x 1 TB NVMe SSD (RAID 1 for OS/persistence)
Network   | 1 Gbps Ethernet                                          | 2 x 25 Gbps SFP28 (Bonded for redundancy/throughput)

Did You Know?

The concept of a Content Delivery Network (CDN) was born out of a challenge at MIT in the mid-1990s to solve the "World Wide Wait." A startup called Akamai was formed, which commercialized the idea of placing caching servers at the "edge" of the internet, physically closer to users. This was a pioneering application of hardware-based caching on a global scale, and it fundamentally changed how the internet delivers content.

Section 2 Summary

Reflective Questions

  1. Why is ECC RAM particularly important for a caching server that stores user session data, compared to a developer's local workstation?
  2. You have a fixed budget to build a caching server for a highly read-intensive workload using Redis. Would you prioritize spending more on a CPU with a higher clock speed or on doubling the amount of RAM? Justify your choice.
  3. If your caching server crashes and has no disk persistence configured, what is the immediate and cascading impact on your application's database servers?

Section 3: The Open-Source Software Landscape

Introduction: Choosing the Right Tool for the Job

Selecting the right hardware provides the potential for performance, but it is the caching software that realizes it. The open-source world offers a rich and diverse ecosystem of caching solutions, each with its own philosophy, feature set, and performance characteristics. Choosing the right software is a critical architectural decision that depends on your specific use case, scalability requirements, data consistency needs, and operational expertise. A solution that is perfect for simple session caching might be entirely inadequate for a distributed, transactional financial system.

This section provides a high-level overview of the major open-source caching solutions that we will explore in-depth throughout this course. We will categorize them based on their architecture and primary function, and compare their strengths and weaknesses. This survey, informed by comparative analyses like that of Jainandunsing (2025), will equip you to make informed decisions about which caching technology best fits a given problem.

Categories of Caching Software

We can broadly group the popular caching solutions into four categories:

  1. In-Memory Key-Value Stores: These are the simplest and often the fastest types of caches. They store data in a simple key-value format and are prized for their low latency and ease of use.
  2. In-Memory Data Grids (IMDGs): These are more advanced, distributed systems that pool the memory of multiple servers into a single logical data fabric. They offer features far beyond simple caching, such as distributed computations, SQL querying, and transactional consistency.
  3. Hybrid NoSQL Databases with Caching Tiers: These are full-fledged databases designed with a memory-first architecture. They provide both the speed of an in-memory cache and the durability and query capabilities of a persistent database.
  4. HTTP Accelerators / Reverse Proxies: These are specialized servers that sit in front of web applications. Their primary role is to intercept HTTP requests and serve cached responses directly, without ever hitting the application server for static or semi-static content.

A Tour of the Leading Solutions

Redis: The Swiss Army Knife

Redis (REmote DIctionary Server) is arguably the most popular key-value store in the world. Its defining feature is its support for a rich set of data structures beyond simple strings, including Lists, Sets, Sorted Sets, Hashes, and HyperLogLogs. This makes it more of a "data structure server" than a simple cache.
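
A few lines of redis-py show what "data structure server" means in practice. This sketch assumes a Redis instance on localhost and the redis-py package; the key names and values are arbitrary examples.

import redis

r = redis.Redis(host="localhost", port=6379)        # assumed local instance

r.set("greeting", "hello")                           # plain string key-value
r.lpush("recent:logins", "alice", "bob")             # List: push onto the head
r.sadd("tags:article:42", "caching", "redis")        # Set: unordered, unique members
r.zadd("leaderboard", {"alice": 1200, "bob": 950})   # Sorted Set: member -> score
r.hset("user:42", mapping={"name": "Alice", "plan": "pro"})   # Hash: field/value map
r.pfadd("visitors:today", "10.0.0.1", "10.0.0.2")    # HyperLogLog: approximate count

print(r.zrange("leaderboard", 0, -1, withscores=True))
print(r.pfcount("visitors:today"))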

Memcached: The Pure Speed Specialist

Memcached is one of the original high-performance, distributed memory object caching systems. It has a singular focus: to be a simple, blazing-fast, in-memory bucket for keys and values.
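
The contrast with Redis shows up in the client API: Memcached offers little beyond get, set, and delete on opaque values. A minimal sketch using the pymemcache client (an illustrative choice; any Memcached client looks similar) against an assumed local server:

from pymemcache.client.base import Client

client = Client(("localhost", 11211))     # assumed local Memcached server

# The entire API surface is essentially get/set/delete on opaque byte values.
client.set("session:abc123", "user_id=42", expire=1800)
print(client.get("session:abc123"))        # -> b'user_id=42'
client.delete("session:abc123")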

Apache Ignite: The Heavyweight Data Grid

Apache Ignite moves beyond simple caching into the realm of In-Memory Data Grids (IMDGs) and distributed computing platforms.
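
From an application's point of view, Ignite caches are still addressed with put/get operations, even though they are partitioned across a cluster. The sketch below assumes the pyignite thin client and a local node listening on the default thin-client port (10800); the cache and key names are made up for illustration.

from pyignite import Client

client = Client()                        # thin client; assumes pyignite is installed
client.connect("127.0.0.1", 10800)       # assumed local node, default thin-client port

cache = client.get_or_create_cache("account_balances")
cache.put("account:42", 1050.75)         # keys/values are serialized by the client
print(cache.get("account:42"))           # -> 1050.75

client.close()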

Couchbase Server: The Database as a Cache

Couchbase is a distributed NoSQL document database that is built with a "memory-first" architecture. Every operation happens in a managed memory layer first, with persistence to disk happening in the background.

Varnish Cache: The HTTP Accelerator

Varnish is not a general-purpose cache; it is a specialized reverse proxy designed exclusively to accelerate HTTP traffic.

NGINX: The Versatile Web Server

NGINX is a world-famous web server, reverse proxy, and load balancer. One of its many features is a capable, if simple, caching module.

Example: Varnish VCL for Bypassing Cache

This snippet from a Varnish Configuration Language (.vcl) file shows how to instruct Varnish to not cache a request if it sees a cookie named "sessionid", which typically indicates a logged-in user who needs dynamic content.


# VCL subroutine called after a request has been received.
sub vcl_recv {
    # If the request has a cookie header and it contains "sessionid"
    if (req.http.Cookie ~ "sessionid=") {
        # Do not attempt to look this request up in the cache.
        # Pass it directly to the backend server.
        return (pass);
    }

    # Otherwise, proceed with the default cache lookup.
    return (hash);
}

Did You Know?

Memcached was originally developed by Brad Fitzpatrick at Danga Interactive for LiveJournal in 2003. LiveJournal was one of the largest social networking sites of its time, and its database was struggling under the immense read load. Memcached was created as a simple, distributed hash table in memory to offload these reads, and its success was instrumental in enabling web applications to scale to millions of users. It was later open-sourced and became a foundational technology for giants like Facebook, YouTube, and Twitter.

Section 3 Summary

Reflective Questions

  1. Your team is building a new mobile banking app where data consistency and transactional integrity are paramount. Why might you choose Apache Ignite over Redis for caching user account balances?
  2. If your primary goal is to reduce server load by caching entire, publicly accessible article pages on a high-traffic news website, would you choose Varnish or Memcached? Explain your reasoning.
  3. When would it make sense to use NGINX for caching instead of deploying a dedicated Redis server? What are the trade-offs?

Glossary of Key Terms

Cache
A temporary, high-speed storage layer that stores a subset of data so that future requests for that data are served up faster than is possible by accessing the data's primary storage location.
Latency
The delay between a user's action and a web application's response to that action. Caching is a primary tool for reducing latency.
Cache Hit / Cache Miss
A "cache hit" occurs when requested data is found in the cache. A "cache miss" occurs when it is not, requiring a fetch from the primary data store.
TTL (Time To Live)
A value or setting for a piece of data in a cache that specifies how long it should be considered valid before it is automatically deleted or marked as stale.
LRU (Least Recently Used)
A cache eviction policy that discards the least recently used items first to make room for new data when the cache is full.
CDN (Content Delivery Network)
A geographically distributed network of proxy servers that cache content close to end-users, reducing network latency.
Key-Value Store
A simple data storage paradigm that uses a unique key to retrieve an associated value, like a dictionary. Redis and Memcached are examples.
IMDG (In-Memory Data Grid)
A distributed system that pools RAM from multiple computers to create a single, large in-memory data store, offering advanced features like distributed processing and high availability.
Reverse Proxy
A server that sits in front of web servers and forwards client requests to those servers. Reverse proxies are often used for load balancing, SSL termination, and caching (e.g., Varnish, NGINX).

References

Jainandunsing, K. (2025). Caching servers hardware requirements & software configurations. Tech Publishing.
