Learn with Go: How does an HTTP Proxy work?
You try to download something from the internet at work, but the access is blocked. Your proxy does not trust the “evil” binary data.
Many of us have experienced something like this before, and as such have had to deal with an HTTP proxy.
But what exactly is an HTTP proxy and how does it work?
In the following blog post, I explain how you can implement a simple HTTP proxy that supports TLS interception using Go.
Why use HTTP Proxies?
Many servers and computer users are in company networks from which direct access to the internet is not possible, or only possible to a very limited extent.
Accessing the internet is only allowed for specific, explicitly named IP addresses or via an HTTP proxy, so that no data can leave or enter the internal network without being monitored.
If a client wants to access a host on the internet, it first has to explicitly configure an HTTP proxy. All HTTP traffic then passes through the proxy so that it sees which pages are called and what data are exchanged, and can use this information to filter or manipulate HTTP requests based on policies. For example it can check data for viruses, block certain hosts from a block list, or not accept certain types of data. In many configurations, the HTTP proxy also requires that the clients authenticate themselves, which allows for user-specific policies.
Implementing our own Proxy
A good way to understand how something works is to make it yourself, so we will be building our own HTTP proxy. It will not be suitable for production use and its primary intent is to illustrate how things work.
A proxy is actually a normal HTTP server that receives HTTP requests and forwards them to the destination server. Unlike a normal HTTP server, however, a proxy gets not only the path of the resource but the whole URL of the destination (see example request below).
At the end of this tutorial we will be able to call pages such as http://example.com via our HTTP proxy. We will start the proxy under localhost:8080 and test access via the proxy using curl:
curl --proxy http://localhost:8080 http://example.com
The request sent to the proxy on localhost:8080 would look something like this (whole URL):
GET http://example.com/ HTTP/1.1
Accept: */*
Proxy-Connection: Keep-Alive
User-Agent: curl/7.87.0
A request directly sent to example.com would look like this (only path /):
GET / HTTP/1.1
Host: example.com
User-Agent: curl/7.87.0
Accept: */*
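To see how Go exposes this difference, the following minimal sketch parses both kinds of request with http.ReadRequest and prints what ends up in the URL and Host fields. The helper name describe and the shortened raw requests are assumptions made for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// describe parses a raw HTTP/1.1 request and reports how net/http exposes
// the request target (hypothetical helper for this illustration).
func describe(raw string) (string, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("absolute=%t url=%s host=%s", req.URL.IsAbs(), req.URL, req.Host), nil
}

func main() {
	viaProxy := "GET http://example.com/ HTTP/1.1\r\nHost: example.com\r\n\r\n"
	direct := "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
	for _, raw := range []string{viaProxy, direct} {
		out, err := describe(raw)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(out)
	}
	// prints:
	// absolute=true url=http://example.com/ host=example.com
	// absolute=false url=/ host=example.com
}
```

For the proxy request the URL is absolute, which is what allows a handler to pass the request on to a transport without rebuilding the destination.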
In the following sections the implementation is described step by step. Having a little bit of Go know-how would definitely help, but it should also be understandable for anyone with basic programming knowledge.
A snapshot of the source and a diff are linked in the title of each step so that you can look at the exact code changes.
Step 1: Simple Forwarding (code, diff)
First, we initialize a new Go project.
mkdir go-proxy
cd go-proxy/
go mod init go-proxy
Next, we create main.go with the following code. In the main function we start an HTTP server with http.ListenAndServe. Requests to this server are processed by the function forward.
This function takes the requests and sends them to the actual target server using the RoundTrip method of DefaultTransport.
We want to pass the response from the target server on to the client, so we copy its headers and status code into the response (w) that is sent back to our client.
Finally we also copy (stream) the body from the target server to the client.
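package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func forward(w http.ResponseWriter, r *http.Request) {
	// send request to destination
	resp, err := http.DefaultTransport.RoundTrip(r)
	if err != nil {
		log.Print(err)
		// tell the client that the destination could not be reached
		http.Error(w, http.StatusText(http.StatusBadGateway), http.StatusBadGateway)
		return
	}
	// release the upstream connection once the body has been copied
	defer resp.Body.Close()

	// copy headers to response
	for header, values := range resp.Header {
		for _, value := range values {
			w.Header().Add(header, value)
		}
	}
	w.WriteHeader(resp.StatusCode)

	// copy (stream) http body
	if _, err := io.Copy(w, resp.Body); err != nil {
		log.Print(err)
	}
}

func main() {
	err := http.ListenAndServe(":8080", http.HandlerFunc(forward))
	if err != nil {
		log.Print(err)
		os.Exit(1)
	}
}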
To test the proxy we can build and run it as follows:
go build
./go-proxy
With curl we can now send a request via the proxy:
curl --proxy http://localhost:8080 http://example.com
Yay. That was easy!
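If you prefer a check that does not depend on the internet, the same round trip can be scripted with the net/http/httptest package. The sketch below is an assumption about how such a check might look: it starts a throwaway destination server and a proxy that runs a copy of the forward handler, then sends one request through the proxy. The function name fetchViaProxy and the /hello path are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// forward is a copy of the handler from this step.
func forward(w http.ResponseWriter, r *http.Request) {
	resp, err := http.DefaultTransport.RoundTrip(r)
	if err != nil {
		http.Error(w, http.StatusText(http.StatusBadGateway), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	for header, values := range resp.Header {
		for _, value := range values {
			w.Header().Add(header, value)
		}
	}
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

// fetchViaProxy sends one GET request to a local destination server through
// a local proxy and returns the status code and body seen by the client.
func fetchViaProxy() (int, string, error) {
	dest := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "destination saw %s", r.URL.Path)
	}))
	defer dest.Close()
	proxy := httptest.NewServer(http.HandlerFunc(forward))
	defer proxy.Close()

	proxyURL, err := url.Parse(proxy.URL)
	if err != nil {
		return 0, "", err
	}
	// http.ProxyURL always routes through the given proxy, even for 127.0.0.1.
	tr := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	defer tr.CloseIdleConnections()

	resp, err := (&http.Client{Transport: tr}).Get(dest.URL + "/hello")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	status, body, err := fetchViaProxy()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(status, body)
}
```

This plays the role of the curl call above, only with both ends under our control.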
But what about HTTPS connections? For example a request to https://example.com?
$ curl --proxy http://localhost:8080 https://example.com
curl: (35) OpenSSL/3.0.7: error:0A00010B:SSL routines::wrong version number
This does not seem to work. To understand more precisely what is going on here, we need to log the requests.
Step 2: Logging (code, diff)
To log the URL and headers for each request, we create a logging handler:
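// logRequest needs the additional import "net/http/httputil"
func logRequest(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// dump request line and headers only (false = without body)
		requestDump, err := httputil.DumpRequest(r, false)
		if err != nil {
			log.Printf("failed to dump request: %v", err)
		}
		log.Printf("url=%s\n%s", r.URL, requestDump)
		next.ServeHTTP(w, r)
	}
}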
Our previous forward handler can now be wrapped in our new logRequest handler. In Go, this pattern of wrapping one HTTP handler in another is called the middleware pattern.
handler := logRequest(forward)
After building the code and restarting the proxy, we can repeat the HTTPS request from before:
curl --proxy http://localhost:8080 https://example.com
The log now shows that for an HTTPS request the proxy does not receive a GET request, but a CONNECT request:
CONNECT example.com:443 HTTP/1.1
Host: example.com:443
Proxy-Connection: Keep-Alive
User-Agent: curl/7.87.0
With a CONNECT request, the client asks the HTTP proxy to make a TCP connection to the target server and forward subsequent data through the TCP connection to the target server without expecting it to be an HTTP request.
That means that in the example above, the HTTP proxy should open a TCP connection to the destination server example.com:443 and signal to the client with a 200 OK response that the connection has been established. Both TCP connections, the one to the client and the one to the destination server, are kept open and subsequent data is forwarded one-to-one. The HTTP client then starts the actual TLS handshake directly with the destination server.
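The handshake itself is small enough to play through in memory. The following sketch is an illustration under assumptions, not part of the proxy: requestTunnel is a hypothetical client that sends CONNECT and waits for the 200 response, and fakeProxy is a stand-in that only answers the handshake without dialing any destination. Both ends talk over a net.Pipe instead of a real TCP connection.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"net/http"
)

// requestTunnel performs the client side of a CONNECT handshake on an
// already established connection to the proxy.
func requestTunnel(proxyConn net.Conn, target string) error {
	_, err := fmt.Fprintf(proxyConn, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", target, target)
	if err != nil {
		return err
	}
	// Only the status matters here; the response body is never read because
	// everything after the headers already belongs to the tunnel.
	resp, err := http.ReadResponse(bufio.NewReader(proxyConn), &http.Request{Method: http.MethodConnect})
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("proxy refused tunnel: %s", resp.Status)
	}
	return nil
}

// fakeProxy answers a single CONNECT request with 200 OK and reports the
// requested destination. A real proxy would dial the destination first.
func fakeProxy(conn net.Conn, seen chan<- string) {
	defer conn.Close()
	req, err := http.ReadRequest(bufio.NewReader(conn))
	if err != nil || req.Method != http.MethodConnect {
		seen <- ""
		return
	}
	fmt.Fprint(conn, "HTTP/1.1 200 OK\r\n\r\n")
	seen <- req.Host
}

// handshakeDemo wires client and fake proxy together and returns the
// destination the proxy was asked to connect to.
func handshakeDemo(target string) (string, error) {
	clientSide, proxySide := net.Pipe()
	defer clientSide.Close()
	seen := make(chan string, 1)
	go fakeProxy(proxySide, seen)
	if err := requestTunnel(clientSide, target); err != nil {
		return "", err
	}
	return <-seen, nil
}

func main() {
	target, err := handshakeDemo("example.com:443")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("proxy was asked to connect to", target)
}
```

After the 200 response, neither side interprets the bytes on the connection as HTTP anymore, which is why a TLS handshake (or any other protocol) can follow.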
Step 3: CONNECT Method (code, diff)
To support the CONNECT method, we implement the tunnel handler. In the tunnel handler we establish the TCP connection to the destination (serverConn). Then we return OK to the client (w.WriteHeader(http.StatusOK)) and after that we take over the TCP connection to the client (clientConn) with w.Hijack(). This way we can detach the underlying TCP connection from the HTTP handler abstraction (http.ResponseWriter).
At the end we copy the data from the client to the target server and vice versa on the TCP connections one to one (io.Copy).
func tunnel(w http.ResponseWriter, r *http.Request) {
	dialer := net.Dialer{}

	// connect to destination (e.g. example.com:443)
	serverConn, err := dialer.DialContext(r.Context(), "tcp", r.Host)
	if err != nil {
		log.Printf("failed to connect to upstream %s: %v", r.Host, err)
		http.Error(w, http.StatusText(http.StatusServiceUnavailable), http.StatusServiceUnavailable)
		return
	}
	defer serverConn.Close()

	// obtain underlying client TCP connection
	hj, ok := w.(http.Hijacker)
	if !ok {
		log.Print("hijack of connection failed")
		http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
	clientConn, bufClientConn, err := hj.Hijack()
	if err != nil {
		// the 200 status has already been written, so we can only log here
		log.Print(err)
		return
	}
	defer clientConn.Close()

	// tunnel the actual data: read from the buffered reader (it may already
	// hold bytes sent by the client), but write to the raw connection so that
	// nothing is held back in a write buffer
	go io.Copy(serverConn, bufClientConn)
	io.Copy(clientConn, serverConn)
}
In addition, we include a switch in our initial handler so that CONNECT requests are handled by the tunnel handler:
handler := logRequest(func(w http.ResponseWriter, r *http.Request) {
	if r.Method == http.MethodConnect {
		tunnel(w, r)
	} else {
		forward(w, r)
	}
})
After building and restarting our proxy, HTTPS requests work:
curl --proxy http://localhost:8080 https://example.com
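The tunnel can also be exercised without leaving the machine. The sketch below is a possible local check under assumptions: it uses httptest.NewTLSServer as the HTTPS destination, a condensed copy of the tunnel handler as the proxy, and a client that trusts the test server's certificate. The name fetchThroughTunnel is made up for this example.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// tunnel is a condensed copy of the CONNECT handler from this step.
func tunnel(w http.ResponseWriter, r *http.Request) {
	serverConn, err := net.Dial("tcp", r.Host)
	if err != nil {
		http.Error(w, http.StatusText(http.StatusServiceUnavailable), http.StatusServiceUnavailable)
		return
	}
	defer serverConn.Close()
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
	clientConn, bufClientConn, err := hj.Hijack()
	if err != nil {
		return
	}
	defer clientConn.Close()
	go io.Copy(serverConn, bufClientConn)
	io.Copy(clientConn, serverConn)
}

// fetchThroughTunnel requests a page from a local HTTPS server via the proxy.
func fetchThroughTunnel() (string, error) {
	dest := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello over TLS")
	}))
	defer dest.Close()
	proxy := httptest.NewServer(http.HandlerFunc(tunnel))
	defer proxy.Close()

	proxyURL, err := url.Parse(proxy.URL)
	if err != nil {
		return "", err
	}
	// The client trusts the destination's certificate, not the proxy:
	// the proxy only sees encrypted bytes.
	roots := x509.NewCertPool()
	roots.AddCert(dest.Certificate())
	tr := &http.Transport{
		Proxy:           http.ProxyURL(proxyURL),
		TLSClientConfig: &tls.Config{RootCAs: roots},
	}
	defer tr.CloseIdleConnections()

	resp, err := (&http.Client{Transport: tr}).Get(dest.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchThroughTunnel()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(body)
}
```

Note that the TLS handshake succeeds although the proxy holds no certificate at all, which is exactly the limitation the next section deals with.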
So, we are able to implement an HTTP(S) proxy in a very short time with only a few lines of code.
TLS Termination
Unfortunately, for HTTPS requests after the CONNECT request, we can now no longer see what data is being transferred because the data is encrypted by the TLS connection between the client and the target server. Our proxy could therefore no longer detect and block “bad” data.
Furthermore, clients can now use the CONNECT method to make arbitrary connections to other servers and then use protocols other than HTTP. With the following command, for example, you could establish an SSH connection via the proxy.
ssh my-ssh-host -o "ProxyCommand=nc -X connect -x 127.0.0.1:8080 %h %p"
That's not what we want. Therefore, we want to make sure that we are also able to inspect HTTPS traffic. To do this, we need to terminate the TLS connection that is established after the CONNECT request and establish a second TLS connection to the target server.
To terminate the TLS connection on the proxy, we need a valid certificate for each hostname to which the client wants to connect. Normally, we can only obtain certificates for domains which we control.
Step 4a: Proxy CA (code, diff)
To obtain certificates for any domain, we need to create our own Certificate Authority (CA) for the proxy. However, it will then be a prerequisite that all clients trust our newly created proxy CA. This means that the proxy CA must be added to the client’s trust store.
We add a new function createCA that creates a private key and the corresponding CA certificate and saves them in PEM format. We also add a flag -create-ca so that we can either run the proxy or create a new CA.
After rebuilding the code, we can create our proxy CA with the following command:
./go-proxy -create-ca
This will create the files proxy-ca.crt (certificate) and proxy-ca.key (private key).
Step 4b: Certificate generator (code, diff)
Once we have our own proxy CA, we can implement a certificate generator (certGenerator). The method Get of the generator can be used to create a certificate and key for any hostname. The generated certificates are signed with the proxy CA.
func (cg *certGenerator) Get(hostname string) (*tls.Config, error)
The certificate and key are returned as part of a tls.Config struct, which we can later use in a TLS server.
Step 4c: Intercept Handler (code, diff)
In this step, we replace the tunnel handler that handled the CONNECT requests with a new implementation that terminates the TLS connection with the client (interceptHandler.ServeHTTP). In this handler, we use the getCert function of the certificate generator to obtain a certificate and key for the hostname the client requested. As with the tunnel handler, we return OK to the client and take over the client TCP connection with w.Hijack(). Now, instead of simply forwarding the data at the TCP level, we wrap the TCP connection in a TLS connection using the TLS config we just obtained using getCert. The decrypted connection is then passed to an internal http.Server using handleConnection. This way, parsing the HTTP protocol does not have to be reimplemented. The internal http.Server marks the request as an HTTPS request and then uses the same implementation of forward that is used for plain HTTP requests.
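The part that hands a single connection to an internal http.Server is perhaps the least obvious one. The sketch below shows one way this can be done and is an assumption, not the linked implementation: oneShotListener and markHTTPS are hypothetical helpers, and the demo uses a plain net.Pipe where the proxy would pass the connection wrapped with tls.Server.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
)

// oneShotListener yields one already accepted connection and then reports
// that no more connections will come.
type oneShotListener struct{ conns chan net.Conn }

func newOneShotListener(c net.Conn) *oneShotListener {
	l := &oneShotListener{conns: make(chan net.Conn, 1)}
	l.conns <- c
	close(l.conns)
	return l
}

func (l *oneShotListener) Accept() (net.Conn, error) {
	c, ok := <-l.conns
	if !ok {
		return nil, io.EOF
	}
	return c, nil
}
func (l *oneShotListener) Close() error   { return nil }
func (l *oneShotListener) Addr() net.Addr { return &net.TCPAddr{} }

// markHTTPS fills in scheme and host: inside the tunnel the client sends
// only a path, but the forwarding handler needs the whole URL.
func markHTTPS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.URL.Scheme = "https"
		r.URL.Host = r.Host
		next.ServeHTTP(w, r)
	})
}

// handleConnection lets a throwaway http.Server parse HTTP on one connection.
func handleConnection(conn net.Conn, handler http.Handler) {
	srv := &http.Server{Handler: handler}
	// Serve returns once Accept fails; the connection itself keeps being
	// served on its own goroutine.
	srv.Serve(newOneShotListener(conn))
}

// interceptDemo sends one request over an in-memory connection and returns
// the URL that the inner handler saw.
func interceptDemo() (string, error) {
	clientSide, proxySide := net.Pipe()
	defer clientSide.Close()
	echoURL := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.URL.String())
	})
	go handleConnection(proxySide, markHTTPS(echoURL))

	fmt.Fprint(clientSide, "GET /index.html HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
	resp, err := http.ReadResponse(bufio.NewReader(clientSide), nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seenURL, err := interceptDemo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(seenURL)
}
```

In the proxy, the echo handler would be replaced by forward, so that the rebuilt URL is what gets sent to the destination.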
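We can build and run the updated version and test it with curl. It is important that we trust our proxy CA certificate by passing it with the --cacert option:
curl --cacert proxy-ca.crt --proxy http://localhost:8080 https://example.com
The proxy can also be used from a browser, for example by starting Chromium with the --proxy-server option:
chromium --proxy-server=http://localhost:8080
After this command, we have to add the proxy CA in the certificate settings. This can be done under Settings -> Privacy and security -> Security -> Manage certificates or directly by typing chrome://settings/certificates in the address bar. Then, under the Authorities tab, click on Import and select the file proxy-ca.crt.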
Step 5: Hostname Filter (code, diff)
Now we have the basic framework and can start filtering or modifying HTTP traffic as we wish.
The only thing left to do is to implement a new HTTP handler in which we place our filter logic.
As an example, let’s look at a very rudimentary handler that we can use to block a list of hostnames. For initialization we pass a list of hostnames. If the hostname is in the list, the request is blocked. Otherwise, the next handler is called which in our case is the forwarding to the target server.
func blockHostFilter(blockedHostnames []string, next http.HandlerFunc) http.HandlerFunc {
	hosts := map[string]struct{}{}
	for _, host := range blockedHostnames {
		hosts[host] = struct{}{}
	}
	return func(w http.ResponseWriter, r *http.Request) {
		if _, ok := hosts[r.URL.Host]; ok {
			msg := fmt.Sprintf("access to %s blocked by proxy", r.URL.Host)
			http.Error(w, msg, http.StatusForbidden)
			return
		}
		next(w, r)
	}
}
In the main function we can also add another flag -block-list which allows us to set the blocked hosts on startup.
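How the flag value is turned into the hostname list is not shown here. A minimal sketch, assuming a comma-separated value, could look as follows. The helper names parseBlockList and blockedHostsFromArgs are made up, and a dedicated flag.FlagSet is used only so that the example can be run with fixed arguments.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"strings"
)

// parseBlockList splits a comma-separated value into hostnames and drops
// empty entries and surrounding whitespace.
func parseBlockList(value string) []string {
	var hosts []string
	for _, host := range strings.Split(value, ",") {
		if host = strings.TrimSpace(host); host != "" {
			hosts = append(hosts, host)
		}
	}
	return hosts
}

// blockedHostsFromArgs reads the -block-list flag from the given arguments.
func blockedHostsFromArgs(args []string) ([]string, error) {
	fs := flag.NewFlagSet("go-proxy", flag.ContinueOnError)
	blockList := fs.String("block-list", "", "comma-separated hostnames to block")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return parseBlockList(*blockList), nil
}

func main() {
	hosts, err := blockedHostsFromArgs([]string{"-block-list=youtube.com,twitter.com"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(hosts) // prints [youtube.com twitter.com]
}
```

The resulting slice is what would be passed as blockedHostnames to blockHostFilter.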
We can then test the filter as follows:
./go-proxy -block-list=youtube.com,twitter.com
A request to one of the pages now returns the following:
$ curl --cacert proxy-ca.crt --proxy http://localhost:8080 https://youtube.com
access to youtube.com blocked by proxy
Conclusion
As already mentioned, the proxy is of course not really production ready. The generated certificates should be cached and there are many other points that could be improved.
Nevertheless, with Go it was surprisingly easy to develop an HTTP proxy that does TLS interception. For testing, I browsed via the proxy for an extended period, which worked without problems. Even YouTube and other sites where large amounts of data are involved were no problem for the proxy.
Links
Get in touch with us if you need support with a Go project. And if you are interested in learning Go, check out the Go Basics training from our training partner acend.