Thursday, May 30, 2013

HTTP HEAD Method in Ruby

The HTTP HEAD method retrieves metadata about a resource without transferring the resource itself. RFC 2616 (section 9.4) defines it as follows:

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

An easy way to issue a HEAD request is with telnet:

$ telnet google.com 80
Trying 74.125.137.138...
Connected to google.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.0 200 OK
Date: Thu, 30 May 2013 15:34:17 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=cf14264bbb89b22a:FF=0:TM=1369928057:LM=1369928057:S=oGwwGGwnb0msliO-; expires=Sat, 30-May-2015 15:34:17 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=gSJdDozeK31k5KcBf4CHxScq8j8BRe1qFd-HLKbbGEYUzXwdds12xnhbASMgn5Mqczh7XuzyVHIvi1412tZRfilVK_XppMumZEarcK_DCDsNMbd4S88yGcYBPeIyVHuY; expires=Fri, 29-Nov-2013 15:34:17 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

Connection closed by foreign host.
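The same exchange can be sketched in Ruby over a plain TCP socket, which makes it clear that a HEAD request is just a few lines of text followed by a blank line. This is a minimal sketch, not a full HTTP client: the helper names build_head_request, parse_head_response, and raw_head are made up for illustration, and the Host header is an addition that was not typed in the telnet session above.

```ruby
require 'socket'

# Build the request text. The Host line is an addition to what was typed
# into telnet; many servers use it to pick the right site.
def build_head_request(host, path = '/')
  "HEAD #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n"
end

# Split a raw response into its status line and a hash of header fields.
# Repeated fields (such as Set-Cookie) are collected into an array.
def parse_head_response(raw)
  head = raw.split(/\r?\n\r?\n/, 2).first.to_s
  status_line, *header_lines = head.split(/\r?\n/)
  headers = Hash.new { |hash, key| hash[key] = [] }
  header_lines.each do |line|
    name, value = line.split(':', 2)
    next if value.nil?
    headers[name.strip.downcase] << value.strip
  end
  [status_line, headers]
end

# Send the request and read until the server closes the connection.
def raw_head(host, port = 80, path = '/')
  TCPSocket.open(host, port) do |socket|
    socket.write(build_head_request(host, path))
    parse_head_response(socket.read)
  end
end

# Each command-line argument is treated as a host name.
ARGV.each do |host|
  status_line, headers = raw_head(host)
  puts status_line
  headers.each { |name, values| values.each { |v| puts "#{name}: #{v}" } }
end
```

Running it with a host name as the only argument, for example google.com, should print a status line and headers much like the telnet output.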

Ruby's open-uri library makes it easy to get this same metadata, although it sends a GET request rather than a HEAD request, so the response body is downloaded as well:

head.rb

#!/usr/bin/env ruby

require 'open-uri'

# Fetch uri with open-uri (a GET request) and print the response metadata.
def do_head(uri)
  # URI.open replaces the bare open(uri) call, which Ruby 3.0 no longer supports.
  URI.open(uri) do |http|
    puts "uri: #{http.base_uri}"
    puts "charset: #{http.charset}"
    puts "content-encoding: #{http.content_encoding.join(', ')}"
    puts "content-type: #{http.content_type}"
    puts "meta: "
    http.meta.keys.sort.each { |k| puts "\t#{k}: #{http.meta[k]}" }
    puts "last-modified: #{http.last_modified}"
    puts "Status: #{http.status.join(' - ')}"
  end
rescue StandardError => e
  puts e.message
end

ARGV.each do |uri|
  do_head(uri)
  puts '-' * 50
end

The head.rb script accepts multiple URLs and prints the metadata for each one, or an error message if a URL cannot be accessed. As an example, let's get the metadata for http://wikipedia.org and http://www.w3.org.

head.rb example

$ ./head.rb http://wikipedia.org http://www.w3.org
uri: http://www.wikipedia.org/
charset: utf-8
content-encoding: 
content-type: text/html
meta: 
 age: 11
 cache-control: s-maxage=3600, must-revalidate, max-age=0
 connection: close
 content-length: 47746
 content-type: text/html; charset=utf-8
 date: Thu, 30 May 2013 15:52:31 GMT
 last-modified: Thu, 23 May 2013 21:12:56 GMT
 server: Apache
 vary: Accept-Encoding
 x-cache: HIT from cp1019.eqiad.wmnet, HIT from cp1009.eqiad.wmnet
 x-cache-lookup: HIT from cp1019.eqiad.wmnet:3128, HIT from cp1009.eqiad.wmnet:80
 x-content-type-options: nosniff
last-modified: 2013-05-23 17:12:56 -0400
Status: 200 - OK

uri: http://www.w3.org
charset: utf-8
content-encoding: 
content-type: text/html
meta: 
 accept-ranges: bytes
 cache-control: max-age=600
 content-length: 32182
 content-location: Home.html
 content-type: text/html; charset=utf-8
 date: Thu, 30 May 2013 15:52:43 GMT
 etag: "7db6-4ddf0827d9380;89-3f26bd17a2f00"
 expires: Thu, 30 May 2013 16:02:43 GMT
 last-modified: Thu, 30 May 2013 14:42:38 GMT
 p3p: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
 server: Apache/2
 tcn: choice
 vary: negotiate,accept
last-modified: 2013-05-30 10:42:38 -0400
Status: 200 - OK
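Because open-uri fetches each page with GET, the output above comes at the cost of downloading every body. If only the headers are wanted, Net::HTTP from the standard library can send a real HEAD request. The following is a minimal sketch under a few assumptions: the head_metadata name is made up for illustration, redirects are not followed (open-uri follows them, Net::HTTP#head does not), and no timeouts are configured.

```ruby
#!/usr/bin/env ruby

require 'net/http'
require 'uri'

# Issue a real HEAD request and return [status, headers] without
# fetching the body. A 301 or 302 is reported as-is, not followed.
def head_metadata(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.head(uri.request_uri)
    headers = {}
    # each_header yields lower-cased field names.
    response.each_header { |name, value| headers[name] = value }
    ["#{response.code} - #{response.message}", headers]
  end
end

ARGV.each do |url|
  begin
    status, headers = head_metadata(url)
    puts "uri: #{url}"
    headers.keys.sort.each { |k| puts "\t#{k}: #{headers[k]}" }
    puts "Status: #{status}"
  rescue StandardError => e
    puts e.message
  end
  puts '-' * 50
end
```

It is invoked the same way as head.rb. Since redirects are not followed, a URL such as http://wikipedia.org may report a redirect status and a location header instead of the final page's metadata.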
