curl is one of those quintessential *nix tools that adheres beautifully to the “one tool, one task” philosophy: it exists to let us issue requests against web servers. As sysadmins, we’re usually more concerned with how the web server responds to a request than with how the page renders, so a quick CLI tool like curl is a good fit. It also lets us spoof things like user agents and referers in case we want to see how the web site responds to different browsers or different referers.
Let’s look at this site:
$ curl http://slumpedoverkeyboarddead.com | head
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Slumped Over Keyboard Dead</title>
<meta name="description" content="That's how they'll find me one day..." />
<meta name="HandheldFriendly" content="True" />
I get the entire page back, but I’ve spared you most of that output. I’m almost never interested in the whole page content; I usually care only about the HTTP headers and what the server does with my request. So, let’s look at just the headers:
$ curl -I http://slumpedoverkeyboarddead.com
HTTP/1.1 200 OK
Server: Sucuri/Cloudproxy
Date: Sat, 06 Feb 2016 20:19:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 15637
Connection: keep-alive
Cache-Control: public, max-age=0
ETag: W/"W+WwcG6qvlsxR2qpWylR9Q=="
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Now we’re getting somewhere. The server returns a 200 OK response, so I know the server is healthy and will give me content if I ask for it.
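If all I want is the status code, for example in a monitoring script, a small filter over the header dump will do. This is a minimal sketch, not anything built into curl: `status_of` is a helper name I made up, and it assumes the first line of input is the HTTP status line.

```shell
#!/bin/sh
# Sketch: pull the numeric status code out of a header dump.
# status_of is a hypothetical helper name, not part of curl.
status_of() {
    # The status line looks like "HTTP/1.1 200 OK"; field 2 is the code.
    awk 'NR == 1 { print $2 }'
}

# Hypothetical usage against a live site (requires network access):
#   curl -sI http://slumpedoverkeyboarddead.com | status_of

# Offline demonstration using the status line shown above.
printf 'HTTP/1.1 200 OK\r\nServer: Sucuri/Cloudproxy\r\n' | status_of
```

curl can also print the code itself with `curl -s -o /dev/null -w '%{http_code}\n' URL`, which skips the parsing step entirely.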
A lot of sites will return a 301 redirect to a new location rather than a 200. You could take the value of the Location: header in the 301 and issue another curl command against it, or you could just tell curl to follow the redirects with -L:
$ curl -I -L www.phoenixhollow.ca
HTTP/1.1 301 Moved Permanently
Server: Sucuri/Cloudproxy
Date: Sat, 06 Feb 2016 20:22:18 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: http://phoenixhollow.com/

HTTP/1.1 200 OK
Server: Sucuri/Cloudproxy
Date: Sat, 06 Feb 2016 20:22:18 GMT
Content-Type: text/html
Connection: keep-alive
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
From this, we can see that the .ca domain redirects to the .com version and the final destination is happy with a 200 response.
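With longer redirect chains, the full header dump gets noisy. Here is a rough sketch that boils it down to one line per status and one per Location header. `redirect_chain` is a hypothetical helper name of my own, and it assumes its input is the output of `curl -sIL`.

```shell
#!/bin/sh
# Sketch: summarize each hop in the output of `curl -sIL`.
# redirect_chain is a hypothetical helper name, not part of curl.
redirect_chain() {
    awk '
        { sub(/\r$/, "") }                          # drop the CR that ends each header line
        /^HTTP\//                  { print "status " $2 }
        tolower($1) == "location:" { print "location " $2 }
    '
}

# Hypothetical usage (requires network access):
#   curl -sIL www.phoenixhollow.ca | redirect_chain

# Offline demonstration using the headers shown above.
printf '%s\r\n' \
    'HTTP/1.1 301 Moved Permanently' \
    'Location: http://phoenixhollow.com/' \
    '' \
    'HTTP/1.1 200 OK' \
    'Content-Type: text/html' | redirect_chain
```

Header names are case-insensitive, which is why the sketch lowercases the name before comparing it.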
What about HTTPS sites with invalid certificates? No problem: use the -k switch to tell curl to accept them anyhow so you can get on with your life.
Note: props to Sean Walberg for correcting my curl-fu; I was habitually using -k for every SSL site, which is not needed with valid certs.
$ curl -I https://expired.badssl.com/
curl: (60) SSL certificate problem: certificate has expired
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
OK:
$ curl -I -k https://expired.badssl.com/
HTTP/1.1 200 OK
Server: nginx/1.6.2 (Ubuntu)
Date: Mon, 15 Feb 2016 00:57:39 GMT
Content-Type: text/html
Content-Length: 469
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: keep-alive
ETag: "5674c418-1d5"
Cache-Control: no-store
Accept-Ranges: bytes
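Rather than reaching for -k by habit, a script can look at curl's exit status first and only skip verification when the failure really is a certificate problem. The sketch below is one way to do that. `explain_exit` is a hypothetical helper, and it covers only a handful of the exit codes listed in the curl manual.

```shell
#!/bin/sh
# Sketch: translate a curl exit status into a hint.
# explain_exit is a hypothetical helper; it handles only a few of curl's exit codes.
explain_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect" ;;
        28) echo "timed out" ;;
        60) echo "certificate verification failed" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}

# Hypothetical usage (requires network access):
#   curl -sI https://expired.badssl.com/ >/dev/null; explain_exit "$?"

# Offline demonstration with the code from the error message above.
explain_exit 60
```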
Maybe I need something more exotic. A friend is complaining that when he clicks a Facebook link to my site, he is being redirected to some other site. But when I load the offending page, it works fine for me. We can tell curl to go get that page with a Facebook referer set and see if that changes the behaviour. Hint: if it does, there’s a high chance my site is infected with something bad.
$ curl -I -L --referer "https://www.facebook.com/my-awesome-referer" http://slumpedoverkeyboarddead.com
HTTP/1.1 200 OK
Server: Sucuri/Cloudproxy
Date: Sun, 07 Feb 2016 14:00:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 15637
Connection: keep-alive
Cache-Control: public, max-age=0
ETag: W/"W+WwcG6qvlsxR2qpWylR9Q=="
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
I don’t have an example page set up to redirect based on referer, but a quick look in the access log shows that the server received the referer:
Feb 07 10:00:07 boxybox access.log: 185.93.229.6 - - [07/Feb/2016:09:00:07 -0500] "GET / HTTP/1.1" 304 0 "https://www.facebook.com/my-awesome-referer" "curl/7.42.1"
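To check whether the referer actually changes the behaviour, it helps to compare the same request with and without it. A minimal sketch, assuming only the status lines and Location headers matter: `hops` is a hypothetical filter of my own, the file names are placeholders, and the responses in the demonstration are made up.

```shell
#!/bin/sh
# Sketch: keep only the lines that show where a request ended up.
# hops is a hypothetical helper name, not part of curl.
hops() {
    # Strip CRs, then keep status lines and Location headers (any case).
    tr -d '\r' | grep -i -E '^(HTTP/|location:)'
}

# Hypothetical usage (requires network access; file names are placeholders):
#   url=http://slumpedoverkeyboarddead.com
#   curl -sIL "$url" | hops > plain.txt
#   curl -sIL --referer "https://www.facebook.com/my-awesome-referer" "$url" | hops > referred.txt
#   diff plain.txt referred.txt && echo "referer made no difference"

# Offline demonstration with made-up responses: the second one redirects.
printf 'HTTP/1.1 200 OK\r\nDate: Sun, 07 Feb 2016 14:00:13 GMT\r\n' | hops
printf 'HTTP/1.1 302 Found\r\nLocation: http://example.invalid/\r\n' | hops
```

Filtering first matters because headers such as Date differ on every request, so a diff of the raw output would always report a change.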
By the way, if you’re not using Papertrail to centralize your system logs, you’re missing out. But I digress.
Now look at the end of that log line: see the curl/7.42.1 bit? That is the User Agent string, and it tells the web server what application sent the request. Most web traffic comes from browsers, so this field usually identifies which browser was used. I didn’t use a browser here, so we see the User Agent that curl sends by default.
Maybe I want to hide that. Or maybe, as in the previous example, some page is behaving differently based on what User Agent it sees. The latter is reasonably common for web-delivered malware targeted at a specific operating system. For example, malware can look at the User Agent string of a request and serve bad software only to Windows PCs, not to Linux or Mac computers. So let’s spoof the User Agent with -A so I look like Firefox on Windows (I am actually using Chrome on ChromeOS):
$ curl -I -L --referer "https://www.facebook.com/my-awesome-referer" -A "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1" http://slumpedoverkeyboarddead.com
HTTP/1.1 200 OK
Server: Sucuri/Cloudproxy
Date: Sun, 07 Feb 2016 14:07:41 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 15637
Connection: keep-alive
Cache-Control: public, max-age=0
ETag: W/"W+WwcG6qvlsxR2qpWylR9Q=="
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Taking a look at the logs, we can see that I no longer appear to be curl’ing; I seem to be using Firefox on Windows.
Feb 07 10:08:50 boxybox access.log: 185.93.229.6 - - [07/Feb/2016:09:08:50 -0500] "GET / HTTP/1.1" 200 15637 "https://www.facebook.com/my-awesome-referer" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"
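When checking many requests, picking the referer and User Agent out of each log line by eye gets tedious. Here is a rough sketch that splits a line on double quotes. `log_fields` is a hypothetical helper, and it assumes the layout of the log lines above: the request, referer and User Agent are the first, second and third quoted fields.

```shell
#!/bin/sh
# Sketch: print the referer and User Agent from an access-log line.
# log_fields is a hypothetical helper. Assumed layout of each line:
#   ... "request" status bytes "referer" "user agent"
log_fields() {
    # Splitting on double quotes puts the referer in field 4 and the agent in field 6.
    awk -F'"' '{ print "referer " $4; print "agent " $6 }'
}

# Hypothetical usage on a log file (the file name is a placeholder):
#   tail -n 5 access.log | log_fields

# Offline demonstration using the first log line from this post.
printf '%s\n' 'Feb 07 10:00:07 boxybox access.log: 185.93.229.6 - - [07/Feb/2016:09:00:07 -0500] "GET / HTTP/1.1" 304 0 "https://www.facebook.com/my-awesome-referer" "curl/7.42.1"' | log_fields
```

A User Agent that itself contains a double quote would throw the field numbers off, so treat this as a quick check rather than a real log parser.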
As with almost all *nix tools, curl has a million possible uses and many more options than I’ve covered here. These are just the most common ones for me right now. Having the power to grab web content under arbitrary conditions can be invaluable in the troubleshooting process.