The HTTP HEAD method retrieves metadata about a resource without transferring the resource itself. The precise definition in the HTTP/1.1 specification (RFC 2616, section 9.4) is:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
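An easy way to issue a HEAD request by hand is with telnet. Connect to port 80, type the request line, and press Enter twice; the empty line marks the end of the request: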
```
$ telnet google.com 80
Trying 74.125.137.138...
Connected to google.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.0 200 OK
Date: Thu, 30 May 2013 15:34:17 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=cf14264bbb89b22a:FF=0:TM=1369928057:LM=1369928057:S=oGwwGGwnb0msliO-; expires=Sat, 30-May-2015 15:34:17 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=gSJdDozeK31k5KcBf4CHxScq8j8BRe1qFd-HLKbbGEYUzXwdds12xnhbASMgn5Mqczh7XuzyVHIvi1412tZRfilVK_XppMumZEarcK_DCDsNMbd4S88yGcYBPeIyVHuY; expires=Fri, 29-Nov-2013 15:34:17 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

Connection closed by foreign host.
```
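The same exchange can be scripted in Ruby with a plain TCP socket, which shows that a HEAD request is nothing more than a few lines of text. This is a minimal sketch, assuming a plain-HTTP server (no TLS) that closes the connection after an HTTP/1.0 response; `raw_head` is a hypothetical helper name, and repeated headers such as Set-Cookie keep only their last value.

```ruby
require 'socket'

# Hypothetical helper: send the same HEAD request typed into telnet above
# and return [status_line, headers], with header names lowercased.
def raw_head(host, port = 80, path = '/')
  TCPSocket.open(host, port) do |sock|
    # HTTP/1.0 does not require Host, but name-based virtual hosts need it.
    sock.write("HEAD #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
    # Assumes the server closes the connection after an HTTP/1.0 response,
    # so reading until EOF collects the status line and the headers.
    status_line, *header_lines = sock.read.split("\r\n")
    headers = {}
    header_lines.each do |line|
      break if line.empty? # a blank line ends the header block
      name, value = line.split(':', 2)
      # Repeated names (e.g. Set-Cookie) overwrite earlier values here.
      headers[name.downcase] = value.to_s.strip
    end
    [status_line, headers]
  end
end
```

Calling `status, headers = raw_head('google.com')` returns the status line as a string and the headers as a hash, ready for inspection.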
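Ruby's open-uri library makes it just as easy to get the same metadata. One caveat: open-uri always sends a GET request, never HEAD, so the whole body is downloaded as well; per the RFC, the headers it exposes should still match what a HEAD request would return.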
head.rb
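```ruby
#!/usr/bin/env ruby
require 'open-uri'

# Print the response metadata for one URL.
# Note: open-uri issues a GET, so the body is fetched too.
def do_head(uri)
  # URI.open (Ruby 2.5+) replaces Kernel#open, which open-uri
  # no longer overrides as of Ruby 3.0.
  URI.open(uri) do |http|
    puts "uri: #{http.base_uri}"
    puts "charset: #{http.charset}"
    puts "content-encoding: #{http.content_encoding.join(', ')}"
    puts "content-type: #{http.content_type}"
    puts "meta: "
    # meta holds the raw response headers, keyed by lowercased name.
    http.meta.keys.sort.each { |k| puts "\t#{k}: #{http.meta[k]}" }
    puts "last-modified: #{http.last_modified}"
    puts "Status: #{http.status.join(' - ')}"
  end
rescue => e
  # Report the error and carry on with the next URL.
  puts e.message
end

ARGV.each do |uri|
  do_head(uri)
  puts '-' * 50
end
```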
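Because open-uri always sends GET, head.rb downloads every page in full just to print its headers. A true HEAD request can be made with Net::HTTP from the standard library. This is a minimal sketch rather than a drop-in replacement: `http_head` is a hypothetical helper name, redirects are not followed (so `http://wikipedia.org` would come back as a redirect status rather than the final page), and TLS is enabled only when the URL scheme is `https`.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of head.rb): send a real HEAD request
# and return the Net::HTTPResponse. Redirects are NOT followed.
def http_head(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # Net::HTTP#head sends HEAD; the response body is nil.
    http.head(uri.request_uri)
  end
end

# Usage: print the status and the sorted response headers for each URL.
ARGV.each do |url|
  res = http_head(url)
  puts "Status: #{res.code} - #{res.message}"
  res.each_header.sort.each { |k, v| puts "\t#{k}: #{v}" }
end
```

The status code alone is enough for the link-validity checks the RFC mentions: a 2xx means the resource is reachable, while a 4xx or 5xx flags a broken link.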
The head.rb script accepts one or more URLs on the command line and prints the metadata for each, or an error message if a URL cannot be fetched. As an example, let's get the metadata for http://wikipedia.org and http://www.w3.org.
head.rb example
```
$ ./head.rb http://wikipedia.org http://www.w3.org
uri: http://www.wikipedia.org/
charset: utf-8
content-encoding: 
content-type: text/html
meta: 
	age: 11
	cache-control: s-maxage=3600, must-revalidate, max-age=0
	connection: close
	content-length: 47746
	content-type: text/html; charset=utf-8
	date: Thu, 30 May 2013 15:52:31 GMT
	last-modified: Thu, 23 May 2013 21:12:56 GMT
	server: Apache
	vary: Accept-Encoding
	x-cache: HIT from cp1019.eqiad.wmnet, HIT from cp1009.eqiad.wmnet
	x-cache-lookup: HIT from cp1019.eqiad.wmnet:3128, HIT from cp1009.eqiad.wmnet:80
	x-content-type-options: nosniff
last-modified: 2013-05-23 17:12:56 -0400
Status: 200 - OK
uri: http://www.w3.org
charset: utf-8
content-encoding: 
content-type: text/html
meta: 
	accept-ranges: bytes
	cache-control: max-age=600
	content-length: 32182
	content-location: Home.html
	content-type: text/html; charset=utf-8
	date: Thu, 30 May 2013 15:52:43 GMT
	etag: "7db6-4ddf0827d9380;89-3f26bd17a2f00"
	expires: Thu, 30 May 2013 16:02:43 GMT
	last-modified: Thu, 30 May 2013 14:42:38 GMT
	p3p: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
	server: Apache/2
	tcn: choice
	vary: negotiate,accept
last-modified: 2013-05-30 10:42:38 -0400
Status: 200 - OK
```
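Each result lists Last-Modified twice: once as the raw header string under `meta`, and once as the Time object open-uri parses from it, printed in the machine's local zone (UTC-4 in this run). Both denote the same instant. The conversion can be reproduced with the standard `time` library; the fixed `-04:00` offset below is only an assumption chosen to match the example output.

```ruby
require 'time'

# Parse the HTTP-date format used by the Last-Modified header.
raw = 'Thu, 23 May 2013 21:12:56 GMT'
modified = Time.httpdate(raw)

# Render it at a fixed UTC-4 offset, matching the local time head.rb printed.
local = modified.getlocal('-04:00')
puts local          # => 2013-05-23 17:12:56 -0400

# Converting back produces the original header value.
puts local.httpdate # => Thu, 23 May 2013 21:12:56 GMT
```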