Articles, Reports, Books

Make Projects: Small Form Factor PCs
Duane Wessels and Matthew Weaver
O'Reilly Media
November 2006

Make Projects: Small Form Factor PCs is the only book available that shows you how to build small-form-factor PCs -- from kits and from scratch -- that are more interesting and more personalized than what a full-sized PC can give you. Included in the book are projects for building personal video recorders, versatile wireless access points, digital audio jukeboxes, portable firewalls, and much more. This book shows you how to build eight different systems, from the shoebox-sized Shuttle system down to the stick-of-gum-sized gumstix.

Squid: The Definitive Guide
Duane Wessels
O'Reilly and Associates
January 2004

Squid is the most popular Web caching software in use today, and it works on a variety of platforms including Linux, FreeBSD, and Windows. Written by Duane Wessels, the creator of Squid, Squid: The Definitive Guide will help you configure and tune Squid for your particular situation. Newcomers to Squid will learn how to download, compile, and install code. Seasoned users of Squid will be interested in the later chapters, which tackle advanced topics such as high-performance storage options, rewriting requests, HTTP server acceleration, monitoring, debugging, and troubleshooting Squid.

High Performance Benchmarking with Web Polygraph
Alex Rousskov, Duane Wessels
Software Practice and Experience, Volume 34, Issue 2, Pages 187-211
January 2004

This paper presents the design and implementation of Web Polygraph, a tool for benchmarking HTTP intermediaries. We discuss various challenges involved in simulating Web traffic and in developing a portable, high performance tool for generating such traffic. Polygraph's simulation models, as well as our experiences with developing and running the benchmark, may be useful for Web proxy developers, performance analysts, and researchers interested in Web traffic simulation.

Wow, That's a Lot of Packets
Duane Wessels, Marina Fomenkov
Proceedings of PAM 2003
April 2003

Organizations operating Root DNS servers report loads exceeding 100 million queries per day. Given the design goals of the DNS, and what we know about today's Internet, this number is about two orders of magnitude more than we would expect.

With the assistance of one root server operator, we took a 24-hour trace of queries arriving at one of the thirteen root servers. In this paper we analyze these data and use a simple model of the DNS to classify each query into one of nine categories. We find that, by far, most of the queries are repeats and that only a small percentage are legitimate.

We also characterize a few of the "root server abusers," that is, clients sending a particularly large number of queries to the root server. We believe that much of the root server abuse occurs because the querying agents never receive the replies, due either to packet filters or to routing issues.

SDSC issued a press release about the research in the above paper, which led to a Slashdot posting. I collected my favorite comments. I was also interviewed by CircleID Network.

Running An Authoritative-Only BIND Nameserver
Duane Wessels
ISC Technical Note Series
December 2002

Nameservers (BIND) fulfill two functions: serving authoritative data for delegated zones, and relaying queries and responses for non-authoritative zones. In the interest of security, operators generally should not use a single nameserver for both functions. This note explains why, and how, you should configure BIND to implement these functions separately.

Web Caching
Duane Wessels
O'Reilly and Associates
June 2001

In this book, I talk about applying caching techniques to the World Wide Web, and try to convince you that web caching is a worthwhile endeavor. You'll see how web caches work, how they interact with clients and servers, and the role that HTTP plays. You'll learn about a number of protocols that are used to build cache clusters and hierarchies. In addition to talking about the technical aspects, I also spend a lot of time on the issues and politics. The web presents some interesting problems due to its highly distributed nature.

RFC 2756: Hyper Text Caching Protocol (HTCP/0.0)
Paul Vixie and Duane Wessels
January 2000

This document describes HTCP, a protocol for discovering HTTP caches and cached data, managing sets of HTTP caches, and monitoring cache activity. This is an experimental protocol, one among several proposals to perform these functions.

RFC 2655: CIP Index Object Format for SOIF Objects
T. Hardie, M. Bowman, D. Hardy, M. Schwartz, D. Wessels
August 1999

The Common Indexing Protocol (CIP) allows servers to form a referral mesh for query handling by defining a mechanism by which cooperating servers exchange hints about the searchable indices they maintain. The structure and transport of CIP are described in (Ref. 1), as are general rules for the definition of index object types. This document describes SOIF, the Summary Object Interchange Format, as an index object type in the context of the CIP framework. SOIF is a machine-readable syntax for transmitting structured summary objects, currently used primarily in the context of the World Wide Web.

Cache Digests
Alex Rousskov and Duane Wessels
Proceedings of the Third International Web Caching Workshop
June 1998

This paper presents Cache Digest, a novel protocol and optimization technique for cooperative Web caching. Cache Digest allows proxies to make information about their cache contents available to peers in a compact form. A peer uses digests to identify neighbors that are likely to have a given document. Cache Digest is a promising alternative to traditional per-request query/reply schemes such as ICP.

We discuss the design ideas behind Cache Digest and its implementation in the Squid proxy cache. The performance of Cache Digest is compared to ICP using real-world Web caches operated by NLANR. Our analysis shows that Cache Digest outperforms ICP in several categories. Finally, we outline improvements to the techniques we are currently working on.
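
The idea of publishing cache contents "in a compact form" can be illustrated with a small sketch. The abstract does not say which data structure is used, so the Bloom filter below is an assumption made for illustration, as are the class name CacheDigest, the SHA-256 hash, and the default sizes. A lookup may report a false hit, but it never misses a URL that was added.

```python
import hashlib


class CacheDigest:
    """Hypothetical Bloom-filter summary of cached URLs (illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        # One SHA-256 digest yields at most eight 32-bit chunks.
        if not 1 <= num_hashes <= 8:
            raise ValueError("num_hashes must be between 1 and 8")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Derive several bit positions from a single hash of the URL.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, url):
        # Set every bit position derived from the URL.
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # True means "probably cached"; False means "definitely not added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(url)
        )
```

A peer holding a neighbor's digest would call might_contain(url) before deciding whether to forward a request to that neighbor, instead of sending a per-request query.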

Visualization of the Growth and Topology of the NLANR Caching Hierarchy
Bradley Huffaker, Jaeyeon Jung, Duane Wessels, and K Claffy
Proceedings of the Third International Web Caching Workshop
June 1998

ICP and the Squid Web Cache
Duane Wessels and K Claffy
IEEE Journal on Selected Areas in Communications
April 1998, Vol 16, #3, pages 345-357

We describe the structure and functionality of the Internet Cache Protocol (ICP) and its implementation in the Squid Web Caching software. ICP is a lightweight message format used for communication among Web caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object.

We present background on the history of ICP, and discuss issues in ICP deployment, efficiency, security, and interaction with other aspects of Web traffic behavior. We catalog successes, failures, and lessons learned from using ICP to deploy a global Web cache hierarchy.

NOTE: This paper has a bug in section 2 (Related Work). The two numbers in this sentence should be reversed:

The average retrieval time decreased by a factor of 1.6 compared to no caching, and by a factor of 2.5 for demand-driven caching.

The correct version is:

The average retrieval time decreased by a factor of 2.5 compared to no caching, and by a factor of 1.6 for demand-driven caching.

Tutorial: Configuring Hierarchical Squid Caches
Australian Unix Users Group
September 1997, Brisbane, Australia

Squid and ICP: Past, Present, and Future
Proceedings of the Australian Unix Users Group
September 1997, Brisbane, Australia

RFC 2186: Internet Cache Protocol (ICP), version 2
Duane Wessels and k claffy
September 1997

This document describes version 2 of the Internet Cache Protocol (ICPv2) as currently implemented in two World-Wide Web proxy cache packages. ICP is a lightweight message format used for communicating among Web caches. ICP is used to exchange hints about the existence of URLs in neighbor caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object.
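
To make the "lightweight message format" concrete, here is a minimal sketch of building and parsing an ICP query. The field layout (a 20-byte header followed by a requester address and a NUL-terminated URL) follows my reading of RFC 2186 and should be checked against the RFC itself. The function names and the all-zero address defaults are hypothetical choices for this example.

```python
import socket
import struct

ICP_OP_QUERY = 1   # opcode for a query message
ICP_VERSION = 2    # ICPv2

# Header: opcode, version, message length, request number,
# options, option data, sender host address (20 bytes total).
HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, request_number, requester_ip="0.0.0.0"):
    """Return the bytes of an ICP query for the given URL."""
    # Query payload: requester address, then the NUL-terminated URL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    header = HEADER.pack(
        ICP_OP_QUERY,
        ICP_VERSION,
        HEADER.size + len(payload),   # total message length in bytes
        request_number,
        0,                            # options (none set)
        0,                            # option data
        socket.inet_aton("0.0.0.0"),  # sender address (assumed unset here)
    )
    return header + payload


def parse_icp_header(message):
    """Decode the fixed header of an ICP message into a dict."""
    opcode, version, length, reqnum, options, option_data, sender = (
        HEADER.unpack_from(message)
    )
    return {
        "opcode": opcode,
        "version": version,
        "length": length,
        "request_number": reqnum,
        "options": options,
        "option_data": option_data,
        "sender": socket.inet_ntoa(sender),
    }
```

A cache would typically send such a message to each neighbor over UDP and match replies to queries by request number.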

RFC 2187: Application of Internet Cache Protocol (ICP), version 2
Duane Wessels and k claffy
September 1997

This document describes the application of ICPv2 (Internet Cache Protocol version 2, RFC2186) to Web caching. ICPv2 is a lightweight message format used for communication among Web caches. Several independent caching implementations now use ICP, making it important to codify the existing practical uses of ICP for those trying to implement, deploy, and extend its use.

Intelligent Caching for World-Wide Web Objects
Proceedings of INET'95
June 1995, Hawaii

The continued increase in demand for information services on the Internet is showing signs of strain. While the Internet is a highly distributed system, individual data objects most often have only a single source. Host computers and network links can easily become overloaded when a large number of users access very popular data.

Proxy-caching is currently a popular way to reduce network bandwidth and server load and to improve response time to the user. The original caching proxy, from CERN, is probably still the most widely used. This paper describes software developed by the author that investigates some alternative techniques for caching World-Wide Web objects. This software complements traditional proxy-caching by allowing servers to explicitly grant or deny permission to cache an object, and by supporting server-initiated callback invalidation of changed objects.

Intelligent Caching for World-Wide Web Objects
Master's Thesis, University of Colorado, Boulder
February 1995

This thesis describes some software designed to improve access to World-Wide Web (WWW) data on the global Internet. The tools used for retrieving WWW objects allow users to be unaware of where the data actually resides. Huge inefficiencies result when objects are repeatedly transmitted across relatively slow wide area network (WAN) connections. A solution to this problem is to install object caches at strategic places in the network. Caches are implemented on proxy servers which act as intermediaries between local clients and remote servers. Frequently accessed objects will already be in the cache thereby speeding delivery time to clients and reducing WAN traffic.

Caching proxy servers for the World-Wide Web already exist, most notably the original CERN server. The CERN implementation leaves a lot of room for improvement. The software developed and described here is designed to give a better response to clients and impose less of a load on the server host. Internet servers maintain no state information about client accesses. This leads to an NFS-like model of caching where clients are responsible for maintaining cache consistency. This thesis investigates an AFS-like model whereby server sites issue callbacks to the sites which keep cached copies of the server's data. A cache negotiation protocol is described which gives information providers control of how, and for how long, their data may be distributed throughout remote network caches.

The following problems are addressed: the development of a single-process, non-blocking proxy server for lower system load; the maintenance of up-to-date and accurate cache data with minimal network overhead; and the design of efficient and robust algorithms for cache management.


Ongoing Works

Squid Frequently Asked Questions

Web Caching Resources