Rails 1.2.0 RC2 duplicate lines Simian Report

I have been evaluating the excellent and super-fast Similarity Analyser (Simian) by Red Hill Consulting to generate a duplicate-lines report from the Rails 1.2.0 RC2 source code (excluding the tests), with a threshold of 6 lines.
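To make the "threshold of 6 lines" concrete, here is a minimal sketch of threshold-based duplicate detection: slide a window of N consecutive non-blank, whitespace-stripped lines over each file and report windows that occur more than once. This is only an illustration of the idea, not Simian's actual algorithm; the function name `find_duplicate_windows` and the normalization rules are assumptions.

```python
from collections import defaultdict


def find_duplicate_windows(files, threshold=6):
    """Map each run of `threshold` consecutive normalized lines to the
    (filename, start_line) positions where it occurs, keeping only runs
    seen more than once. Blank lines are skipped; whitespace is stripped.
    (Illustrative sketch, not Simian's real implementation.)"""
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (original 1-based line number, stripped text) for non-blank lines
        lines = [(i, l.strip())
                 for i, l in enumerate(text.splitlines(), 1) if l.strip()]
        for k in range(len(lines) - threshold + 1):
            window = tuple(t for _, t in lines[k:k + threshold])
            seen[window].append((name, lines[k][0]))
    return {w: locs for w, locs in seen.items() if len(locs) > 1}
```

Run on two files that each end with six `end` lines, this reports them as a duplicate, which is exactly the kind of false positive described next.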

It works great, except that Simian treats runs of successive "end" lines as duplicate lines :-((( For example, a block like this one is reported:

    end
    end
    end
    end
    end
    end

I am working on some code to remove these blocks from the report, and I thought I would share a first version with the community to get some feedback and improve the tool.
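One possible way to filter these false positives, sketched under the assumption that each duplicate block has already been read back from its source file as a list of lines: drop any block whose non-blank lines are all trivial, such as a bare `end`. The names `TRIVIAL_LINES`, `is_trivial_block` and `drop_trivial_blocks` are hypothetical and not taken from the actual tool.

```python
# Hypothetical set of lines that carry no real duplication on their own
TRIVIAL_LINES = {"end"}


def is_trivial_block(lines, trivial=TRIVIAL_LINES):
    """True if every non-blank line of the block is a trivial line like 'end'."""
    stripped = [l.strip() for l in lines if l.strip()]
    return all(l in trivial for l in stripped)


def drop_trivial_blocks(blocks):
    """Keep only duplicate blocks that contain at least one non-trivial line."""
    return [b for b in blocks if not is_trivial_block(b)]
```

Keeping the trivial lines in a set makes it easy to extend the filter later (for example with a closing `}`) without touching the logic.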

I have also written some code to generate an HTML report with, for each duplicate block, links to the Trac source code browser (http://dev.rubyonrails.org/browser/branches/1-2-pre-release/).
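A minimal sketch of how such links could be built: append the file path to the browser URL quoted above and add a line anchor. The `#L<line>` anchor form is an assumption about how Trac marks individual lines in its browser, and the helpers `trac_link` and `report_row` are hypothetical, not the code behind the actual report.

```python
from html import escape
from urllib.parse import quote

# Base URL from the post; the "#L<n>" line anchor is an assumption
TRAC_BROWSER = "http://dev.rubyonrails.org/browser/branches/1-2-pre-release/"


def trac_link(path, start_line):
    """Return an HTML <a> element pointing at path:start_line in the Trac browser."""
    url = TRAC_BROWSER + quote(path) + "#L" + str(start_line)
    label = "%s:%d" % (path, start_line)
    return '<a href="%s">%s</a>' % (escape(url), escape(label))


def report_row(line_count, locations):
    """One HTML list item for a duplicate block: its size plus a link per copy."""
    links = ", ".join(trac_link(p, n) for p, n in locations)
    return "<li>%d duplicate lines: %s</li>" % (line_count, links)
```

For instance, `report_row(6, [("a.rb", 1), ("b.rb", 10)])` yields one list item with a link to each copy of the block.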

Full report, generated from a checkout dated 17/01/2007.

Jean-Michel