Squid Proxy
5th Dec 2010, 15:38:59
This is everything I know about Squid all in one place.
The last time I wrote about Squid -- over five years ago -- it was at version 2.5. Much has changed since then and my setup looks very different these days. Now that bandwidth is not nearly so scarce as it was in 2005, I don't use Squid to cache anything to disk.
Here's how I set up the perfect Squid install for my purposes:
My platform of choice is Debian Linux. For my proxy setup I am using 'Squeeze', since it includes Squid 3.1. Squid version 3.1 has many enhancements, but most important for me is the inclusion of IPv6 support. You could just as easily use 'Lenny'.
Install the squid3 package rather than squid, unless you know you need the older Squid version 2.7.
# apt-get install squid3
The default squid.conf
is very well commented, but it is overkill for a simple and
efficient setup. It can serve as a useful resource for looking up what certain configuration
directives do though, so we'll move it sideways:
# cd /etc/squid3
# mv squid.conf dist-squid.conf
In my view, this is the absolute minimal working Squid configuration one can have:
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8

# These are our local networks which will have permission to access the cache
acl localnets src 172.16.0.0/24
acl localnets src 2001:470:903f::/64

acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localnets
http_access allow localhost
http_access deny all

http_port 3128

# Defaults to off because it is incompatible with bandwidth management and access logging.
# If access logging or traffic shaping like delay pools are needed, turn this off!
pipeline_prefetch on

coredump_dir /var/spool/squid3

# Prevent stale data being served from cgi scripts
# (probably does nothing in my setup because I don't cache, but can't hurt)
hierarchy_stoplist cgi-bin ?
refresh_pattern ^ftp:             1440  20%  10080
refresh_pattern ^gopher:          1440  0%   1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%   0
refresh_pattern .                 0     20%  4320

# I don't do any access logging for privacy and security reasons
cache_access_log none
# Only needed for troubleshooting disk cache problems
cache_store_log none

cache_mgr trouble@toastputer.net

# default is 256 MB. Controls amount of RAM to use as cache, not overall limit!
cache_mem 96 MB

visible_hostname proxy1.spruce.toastputer.net

# direct-site contains sites which don't seem to play nicely and I can't be bothered to fix
acl direct-site dstdomain .facebook.com
always_direct allow direct-site

# The following headers are useful for troubleshooting faults, but are really more of a risk to
# privacy in my environment, so they are disabled
request_header_access Via deny All
request_header_access X-Forwarded-For deny All
request_header_access Proxy-Connection deny All
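Before going any further, it's worth asking Squid to check the new file; squid3 -k parse reports any syntax errors without touching the running service:

# squid3 -k parse
# /etc/init.d/squid3 restart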
Using the above config, I have squid running comfortably in 256MB of RAM in a Xen paravirtualised virtual machine. If you just want a minimal Squid proxy, you can stop here.
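You can confirm the proxy is answering by making a request through it from a machine on one of the localnets ranges. A quick check, assuming curl is available:

$ curl -x http://proxy1.spruce.toastputer.net:3128 -I http://www.debian.org/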
Blocking Advertisements or Other Content
This is pretty easy and it doesn't even require a redirector script like adzapper
any more. I just use the list at pgl.yoyo.org, since this
blocks the most obnoxious adverts effectively enough for me.
I use the following script to fetch the list:
#!/bin/sh

# Fetch the list
/usr/bin/wget -O /etc/squid3/yoyo \
    'http://pgl.yoyo.org/adservers/serverlist.php?hostformat=squid-dstdom-regex&showintro=0&mimetype=plaintext' \
    || { echo "wget failed"; exit 1; }

# Reload squid
squid3 -k reconfigure
exit
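The script needs to live where the crontab below expects it, and be executable. Assuming you saved it as getyoyolist, install it and run it once by hand to fetch the initial list:

# install -m 755 getyoyolist /usr/local/bin/getyoyolist
# /usr/local/bin/getyoyolist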
We don't want to abuse the free service that this nice gentleman offers, so I have set a crontab entry to check for a new version once every eight days.
# m   h   dom  mon  dow  command
00    04  */8  *    *    /usr/local/bin/getyoyolist >> /dev/null 2>&1
Once the script has run, these lines can be added to squid.conf
so that squid will use
the yoyo blacklist.
http_port 8080

acl ads dstdom_regex "/etc/squid3/yoyo"
acl ad-filtered myport 3128

# block ads for requests to dstdomains in 'ads' AND where user is on port 3128
# 'ads' acl must be last so that it is the acl picked up by deny_info later
http_access deny ad-filtered ads

# Where a request is blocked due to 'ads' acl, return an empty file not an error
deny_info http://adzapper.toastputer.net/zaps/empty ads
Now your squid offers a filtered service on port 3128 and an unfiltered service on port 8080. I have set Squid to serve up an empty file in place of the adverts; whilst you're welcome to use mine, you should really point deny_info at a web server you control. If deny_info is not set, Squid will return an error page instead of the empty file, which may be desirable for troubleshooting when you need to confirm that an object is indeed being blocked.
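It's easy to check the filtering from a client: request a domain you'd expect to be on the yoyo list (doubleclick.net is a safe bet) through each port. The filtered port should answer with a redirect to the deny_info URL, whilst the unfiltered port should pass the request straight through:

$ curl -x http://proxy1.spruce.toastputer.net:3128 -I http://ad.doubleclick.net/
$ curl -x http://proxy1.spruce.toastputer.net:8080 -I http://ad.doubleclick.net/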
Whilst this approach can be extended to block any content you wish simply by adding more ACLs, I recommend that you look at the following two products if your needs are more complex:
- SquidGuard - a powerful filtering plugin. Useful if you need to block long lists of whole sites and present your users with pretty pages explaining why.
- DansGuardian - true content filtering, more like WebSense(TM) and co. Will filter based on the content of a page, not just URL. Highly configurable.
Both of these approaches will be slower and require more system resources than plain old Squid.
Logging
Don't do any logging unless you really need to and are prepared to accept the performance penalty. You must also turn off pipeline_prefetch, since it is incompatible with access logging.
## pipeline_prefetch is incompatible with access logging:
#pipeline_prefetch on
#cache_access_log none

cache_log /var/log/squid3/cache.log
cache_access_log /var/log/squid3/access.log
cache_store_log none

# This can help troubleshooting, but leave commented out for production use - it degrades performance
#cache_store_log /var/log/squid3/store.log
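With access logging on, tail -f /var/log/squid3/access.log shows requests as they happen. For a rough picture of what is being requested most, the URL is field seven of Squid's native log format:

# awk '{print $7}' /var/log/squid3/access.log | sort | uniq -c | sort -rn | head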
Caching
Consider carefully whether you really want to have a disk cache. The hit rate is very low (only about 3% of requests are ever served from the cache). Each object held in the cache requires a certain amount of RAM so that Squid can keep track of it, so a large disk cache either ties up a lot of RAM, or incurs a massive performance penalty once the server begins to hit swap space.
My Squid setup is configured to cache only in RAM. This means that the 'hottest' objects will be served quickly, but Squid doesn't eat through huge amounts of RAM trying to keep track of a large disk cache.
That said, if you have a large number of users who frequently request the same content, or you are so bandwidth limited that 3% is a big deal to you, of course you can cache. We must start with some tedious but important planning.
Firstly, we need to establish how much RAM Squid will require. On 64-bit architectures, Squid will use 14MB of RAM per 1GB of disk cache. In this example, I'm using a 120GB partition, so I know that Squid will need about 1.6GB of RAM purely to keep track of its own cache. My server will have 4GB of RAM, so I know that I can spare this amount. Otherwise, I would need to reduce the size of my cache to match the available memory.
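The sums are easy enough to check in a shell, should you want to plug in your own disk size:

$ echo "$((120 * 14))MB"
1680MB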
A Squid cache is divided up into first level and second level directories. This is necessary because it would take Squid far too long to locate a file if they were all kept in a single directory. So, the second consideration is to calculate how many level 1 and level 2 directories are needed for our 120GB partition using this formula:
(((x / y) / 256) / 256) * 2 = z
Let x
be the size of the cache in kB. Let y
be the average size of
objects in the cache in kB
(if you don't know this value, 13kB is considered to be a reasonable choice). z
will equal the
number of level 1 directories required.
Squid gets extremely upset if it runs out of space in its cache_dir
, so I am going to
leave plenty of headroom here! For starters, my '120GB' disk is actually more like 111GB
when measured in base-2 rather than the base-10 manufacturers use. Squid will need some space to
write swap and other temporary files, so I am going to allocate only 100GB, leaving 11GB free for
these purposes. (100 * 1024) * 1024 = 104857600kB, so:
(((104857600 / 13) / 256) / 256) * 2 = 246.153846
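If you'd rather let the computer do the arithmetic, the whole formula fits in one line of awk; a quick sketch with the cache size and average object size as the inputs:

$ awk 'BEGIN { x = 104857600; y = 13; print (((x / y) / 256) / 256) * 2 }'
246.154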
At long last we have the proper values to plug in to our cache_dir directive, rounding the 246.153846 up to 247 level 1 directories:
#         location              size in MB  L1   L2
cache_dir ufs /var/spool/squid3 102835      247  256
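Squid needs to build this directory structure before it will start with the new cache_dir; squid3 -z creates the level 1 and level 2 directories:

# /etc/init.d/squid3 stop
# squid3 -z
# /etc/init.d/squid3 start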
By default, Squid will only cache files 4MB or smaller. This is a good optimisation for performance, but bad if you are looking to save bandwidth. Squid can be instructed to cache more aggressively, for example:
# default is 4096kB
maximum_object_size 1 GB

# tarballs tend not to change without their filename changing to a different version number:
refresh_pattern -i \.gz$  4320 100% 43200 reload-into-ims
refresh_pattern -i \.bz2$ 4320 100% 43200 reload-into-ims
refresh_pattern -i \.dmg$ 4320 100% 43200 reload-into-ims
refresh_pattern -i \.bin$ 4320 100% 43200 reload-into-ims

# cache Windows updates for your Windows users:
refresh_pattern -i windowsupdate.com/.*\.(cab|exe) 4320 100% 43200 reload-into-ims
refresh_pattern -i download.microsoft.com/.*\.(cab|exe) 4320 100% 43200 reload-into-ims
refresh_pattern -i uk.download.windowsupdate.com/.*\.(cab|exe) 4320 100% 43200 reload-into-ims

# AVG updates:
refresh_pattern guru.avg.com/.*\.(bin) 4320 100% 43200 reload-into-ims
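To confirm an object really is being cached, fetch it twice through the proxy and look at the X-Cache header Squid attaches to responses; the second fetch should report a HIT from your proxy's hostname. Any cacheable URL will do, this one is just an example:

$ curl -x http://proxy1.spruce.toastputer.net:3128 -s -o /dev/null -D - \
    http://ftp.debian.org/debian/README | grep -i X-Cache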
Bandwidth Restriction
Squid has a method of preventing a single user or small group of users from hogging all the bandwidth, or indeed to prevent your web users as a whole from swamping your Internet link. The feature is called 'delay pools'.
Important: Note carefully the difference between 'b' (one bit) and 'B' (one byte/eight bits). Squid uses only B (bytes) per second, whereas Internet links are normally talked about in terms of bits (b) per second. Things will get confusing very quickly if you mix them up!
If you want to impose an overall limit on Squid's bandwidth of, say, 6Mbps then this can be done very simply:
## pipeline_prefetch is incompatible with delay pools:
#pipeline_prefetch on

delay_pools 1
delay_class 1 1
delay_access 1 allow all

# 6Mbps = 768,000Bps (768 kilobytes per second)
delay_parameters 1 768000/768000
This is fine, but it will only limit bandwidth in a simplistic way. It's still possible for one user to hog all of that bandwidth to the detriment of other users. It's possible to prevent this, but it's necessary to have a more detailed knowledge of how Squid deals with bandwidth.
The overall bandwidth available to Squid comes from a delay pool which holds 200MB. This pool refills at a rate of 20Mbps. This means that our users as a whole may download 200MB at a rate in excess of 20Mbps before any bandwidth controls activate. This helps Squid respond to short spikes in demand of the sort that can occur after a network outage or similar event.
Each of our users has a bandwidth bucket with which they may dip into the pool. Each bandwidth bucket holds 20MB. An individual user can download a 20MB file at unrestricted speed, provided that there is sufficient bandwidth left in the delay pool. Once this 20MB bucket is exhausted, or the delay pool becomes empty, the user will be limited to 2Mbps.
The result for the end user is that small file downloads will be very fast, so normal web browsing will be very responsive. Those who download large files all day will find their connection rate limited so that they won't be able to impinge on other users' bandwidth.
Here's how it looks in the Squid config after all bits have been converted to bytes:
## pipeline_prefetch is incompatible with delay pools:
#pipeline_prefetch on

# I have one delay pool
delay_pools 1

# It is a class two pool, designed for a class C (/24) network
delay_class 1 2

# Limits are expressed as: pool number, overall limits (fill rate/capacity),
# per host limits (fill/cap):
# Pool 1 fills at 20Mbps and holds 200MB. Each host bucket fills at 2Mbps and holds 20MB.
#                  pool              bucket
delay_parameters 1 2621440/209715200 262144/20971520

# Pool 1 applies to all requests
delay_access 1 allow All
I've chosen these numbers mainly because the maths is easy. An element of trial and error will be needed to make this work for you.
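If you'd rather not keep doing the bits-to-bytes sums by hand, a tiny shell function does the conversion, using the same 1Mb = 1024 x 1024 bits convention as the figures above:

mbps_to_Bps() { echo $(($1 * 1024 * 1024 / 8)); }
mbps_to_Bps 20    # pool fill rate: 2621440
mbps_to_Bps 2     # per-host fill rate: 262144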
More Than One Squid
If your Squid proxy stops for any reason, you're likely to have lots of users complaining. You can guard against this by having multiple servers running Squid and using DNS to round robin between them. But what about the cache? If we don't tell each squid about the other, each will end up maintaining independent but similar caches. Squid has a mechanism to deal with this. Here's an example of how it would be configured on two Squids, proxy1 and proxy2.
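The DNS half is nothing more than two A records sharing one name. A sketch in BIND zone-file syntax, assuming proxy1 is 172.16.0.7 and proxy2 is 172.16.0.8 (as in the ACLs below) and that clients are configured to use the shared name:

; clients use proxy.spruce.toastputer.net and get the two addresses round-robin
proxy    IN    A    172.16.0.7
proxy    IN    A    172.16.0.8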
On proxy1:
# Make squid listen for HTCP requests:
htcp_port 4827

# Tell it about the other Squid:
# proxy-only tells squid not to cache stuff it requests from this peer - that would be pointless
cache_peer proxy2.spruce.toastputer.net sibling 3128 4827 proxy-only htcp

# The other squid should only access stuff we have cached to avoid 'tromboning'.
acl othersquid src 172.16.0.8/32
miss_access deny othersquid
On proxy2:
# Make squid listen for HTCP requests:
htcp_port 4827

# Tell it about the other Squid:
# proxy-only tells squid not to cache stuff it requests from this peer - that would be pointless
cache_peer proxy1.spruce.toastputer.net sibling 3128 4827 proxy-only htcp

# The other squid should only access stuff we have cached to avoid 'tromboning'.
acl othersquid src 172.16.0.7/32
miss_access deny othersquid
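Once both are running, you can ask either Squid for its view of its sibling through the cache manager; mgr:server_list shows the configured peers and whether they are alive. The squidclient tool (a separate Debian package) makes this easy from the proxy itself:

# squidclient -p 3128 mgr:server_list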