Getting the XPath of every node in an HTML file

Trying Ruby again after a long break.
This script reads an HTML file, walks its nodes, and collects the XPath of each element. That's all it does.

require 'kconv'     # provides String#toutf8
require 'rubygems'  # needed on Ruby 1.8 to load installed gems
require 'hpricot'

# Read the saved page and convert it to UTF-8 (kconv guesses the source encoding)
text = File.read('wassr_user.html').toutf8

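# Recursively walk the children of +node+ and collect the XPath of every
# element. Each child's subtree is walked before the child's own XPath is
# added, so descendants appear before their ancestors in +xpaths+.
def walk_node_children(node, xpaths)
  node.each_child do |child_node|
    case child_node
    when Hpricot::Elem
      walk_node_children(child_node, xpaths)
      xpaths << child_node.xpath
    # Text nodes, comments, etc. are skipped.
    end
  end
end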

xpaths = []

# Hpricot(text) parses the HTML into an Hpricot::Doc, the root of the walk
walk_node_children(Hpricot(text), xpaths)

xpaths.each do |xpath|
  puts "[#{xpath}]"
end
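The idea behind an element's XPath can also be sketched without Hpricot. Below is a minimal pure-Ruby sketch, assuming a hypothetical `Node` class in place of Hpricot's element objects and hypothetical helpers `xpath_of` and `collect_xpaths`. Each path step is the tag name, with a 1-based position such as `p[2]` added only when the parent has several children with the same name. Hpricot's own `xpath` output may differ in details. Unlike `walk_node_children` above, this walk records each element before its children, so the paths come out in document order.

```ruby
# Hypothetical minimal element node standing in for a parsed HTML element
# (not Hpricot's class). Plain objects compare by identity, which xpath_of relies on.
class Node
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    @parent = nil
    children.each { |c| c.parent = self }
  end
end

# Build an absolute XPath such as "/html/body/p[2]" for +node+.
# A positional index is added only when the parent has more than one
# child with the same tag name.
def xpath_of(node)
  return "/#{node.name}" if node.parent.nil?
  siblings = node.parent.children.select { |c| c.name == node.name }
  step = node.name
  step += "[#{siblings.index { |c| c.equal?(node) } + 1}]" if siblings.size > 1
  xpath_of(node.parent) + "/" + step
end

# Pre-order walk: an element's path is recorded before its children's paths.
def collect_xpaths(node, xpaths = [])
  xpaths << xpath_of(node)
  node.children.each { |child| collect_xpaths(child, xpaths) }
  xpaths
end

tree = Node.new('html', [
  Node.new('head', [Node.new('title')]),
  Node.new('body', [Node.new('p'), Node.new('div', [Node.new('p')]), Node.new('p')])
])

collect_xpaths(tree).each { |xpath| puts "[#{xpath}]" }
```

Running it prints one bracketed path per element, for example `[/html/body/p[2]]` for the second paragraph directly under `body`.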