Caching openSUSE repos with Squid

... and OpenWrt, in a way that requires no special configuration of the clients.

Squid: squid-cache.org

How to

  • On clients, change all https to http in /etc/zypp/repos.d/*.repo. (This is safe to do, because all packages are signed with the repository key.)
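
    A quick way to do it, as root (a sketch; review the files afterwards):

    sed -i 's/https:/http:/g' /etc/zypp/repos.d/*.repo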

  • On OpenWrt, install bash, curl, squid and luci-app-squid, and attach an external disk for extra space. Alternatively, you can install Squid on any other machine in your network.
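
    On OpenWrt that is roughly (package names as found in the standard feeds):

    opkg update
    opkg install bash curl squid luci-app-squid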

  • Add to squid.conf:

    # Allow caching large packages (the default limit is 4 MB).
    maximum_object_size 1024 MB
    # On-disk cache; <CACHE_SIZE> is in megabytes.
    cache_dir aufs /mnt/data/squid <CACHE_SIZE> 16 256
    # Consider .rpm files fresh for at least 7 days (10080 min)
    # and at most 30 days (43200 min).
    refresh_pattern \.rpm$ 10080 90% 43200
    # Extra port for transparently intercepted connections.
    http_port 3129 intercept
    url_rewrite_program /path/to/redirect.sh
    

    Replace <CACHE_SIZE> with the desired cache size in megabytes; see the cache_dir documentation for details.
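
    For example, a 20 GB cache (adjust to your disk):

    cache_dir aufs /mnt/data/squid 20000 16 256

    Before the first start, let Squid create the cache directory structure with squid -z.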

  • redirect.sh:

    #!/bin/bash
    
    # Escape the dots so they only match literal dots.
    doo_regex='^http://download\.opensuse\.org/'
    
    # Squid feeds the helper one request per line: the URL, then extra details.
    while read -r url extras; do
        if [[ "$url" =~ $doo_regex ]]; then
            # Ask the upstream server where it would redirect us
            # and downgrade the mirror URL to plain http.
            location="$(curl -s --head "$url" \
                | grep -E "^Location: " \
                | sed -e 's/^Location: \(.*\)\r$/\1/' \
                      -e 's/^https:/http:/')"
            if [[ -n "$location" ]]; then
                # Tell Squid to answer with its own 302 to this URL.
                echo "OK url=\"${location}\""
            else
                # No Location header: leave the URL unchanged.
                echo "ERR"
            fi
        else
            # Not a download.opensuse.org URL: leave it unchanged.
            echo "ERR"
        fi
    done
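
    Make the script executable (chmod +x redirect.sh). You can then test it by hand before wiring it into Squid; it reads URLs from stdin and prints its verdict (paths abbreviated, output illustrative):

    $ echo "http://download.opensuse.org/[...]/some.rpm" | ./redirect.sh
    OK url="http://ftp.gwdg.de/pub/opensuse/[...]/some.rpm"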
    
  • Configure OpenWrt port forwarding:

    This redirects all outgoing HTTP connections from the LAN to Squid's intercepting socket.

    /etc/config/firewall:

    config redirect
            list proto 'tcp'
            option name 'squid'
            option target 'DNAT'
            option src 'lan'
            option src_dport '80'
            option dest 'lan'
            option dest_ip '192.168.1.1'
            option dest_port '3129'
    

    Change dest_ip if you have installed Squid on a different host.
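
    After editing, reload the firewall for the rule to take effect:

    /etc/init.d/firewall restart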

Backstory

My internet connection runs over LTE and has a monthly transfer limit, which forced me to optimize my downloads. I have two computers running openSUSE Tumbleweed, which, as a rolling release distro, gets a lot of updates. Downloading them twice is an obvious waste of resources.

But one of those computers is my laptop, which is not always in my home network. So I wanted something that requires no special configuration on the client side. Bonus points for running 100% on my router with OpenWrt.

First I found this guide:

https://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid

It's quite complicated, because it parses the list of mirrors and generates the config for the URL rewriter. But it was a good starting point.

openSUSE's primary download server, download.opensuse.org, is an instance of MirrorBrain. It doesn't host the data itself; instead, it redirects the client to the nearest mirror. In practice it looks like this:

$ curl -s --head http://download.opensuse.org/[...]/some.rpm
HTTP/1.1 302 Found
Date: Wed, 17 Mar 2021 21:24:05 GMT
Server: Apache/2.4.43 (Linux/SUSE)
X-MirrorBrain-Mirror: ftp.gwdg.de
X-MirrorBrain-Realm: other_country
Link: <http://download.opensuse.org/[...]/some.rpm.meta4>; rel=describedby; type="application/metalink4+xml"
Link: <https://ftp.gwdg.de/pub/opensuse/[...]/some.rpm>; rel=duplicate; pri=1; geo=de
Link: <http://widehat.opensuse.org/[...]/some.rpm>; rel=duplicate; pri=2; geo=de
Link: <http://mirror.karneval.cz/pub/linux/opensuse/[...]/some.rpm>; rel=duplicate; pri=3; geo=cz
Link: <http://ftp.lysator.liu.se/pub/opensuse/[...]/some.rpm>; rel=duplicate; pri=4; geo=se
Link: <http://mirror.tspu.ru/opensuse/[...]/some.rpm>; rel=duplicate; pri=5; geo=ru
Location: https://ftp.gwdg.de/pub/opensuse/[...]/some.rpm
Content-Type: text/html; charset=iso-8859-1

The HTTP redirect points to the URL in the Location line, but there are also additional Link URLs. zypper uses them to connect to multiple mirrors and download parts of a package from them at the same time.

Unfortunately, Squid cannot cache such partial downloads. To remove those additional URLs, the redirect.sh script gets the Location from the upstream server and tells Squid to send its own redirect response.
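
With the rewriter in place, a client behind the router sees only Squid's own redirect, stripped of the Link headers (abbreviated, illustrative output):

$ curl -s --head http://download.opensuse.org/[...]/some.rpm
HTTP/1.1 302 Found
Location: http://ftp.gwdg.de/pub/opensuse/[...]/some.rpm

The client then fetches the mirror URL over plain http, that request is intercepted too, and the package ends up in the cache.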

And that's it! The only thing that could "break" caching would be the download server returning a different Location for every request, but so far I haven't seen it do that.