I have been evaluating the excellent and super fast Similarity Analyser
(Simian) by Red Hill Consulting to generate a duplicated-lines report from
the Rails 1.2.0 RC2 source code (excluding the tests).
As of the 10th of January 2007, Rails 1.2.0 RC2 has 4125 duplicate lines
in 793 blocks in 231 files:
* actionmailer has 584 duplicate lines in 107 blocks in 20 files
* actionpack has 718 duplicate lines in 154 blocks in 58 files
* actionwebservice has 241 duplicate lines in 51 blocks in 21 files
* activerecord has 1529 duplicate lines in 301 blocks in 45 files
* activesupport has 418 duplicate lines in 78 blocks in 44 files
* railties has 635 duplicate lines in 102 blocks in 43 files
Detailed reports on
Now the debate is open about what to do with these reports!
Jean-Michel
Whatever happens, I wouldn't expect it to happen before the release of
1.2. Submit a bug report with the full report and target it to 2.0.
Also, a post to rails-core about this would probably be a good idea,
though you might want to wait until 1.2 is out.
Looking at the bottom lines of the reports, it appears that Simian regards Rails as being about 90% DRY. Not at all bad.
Now if I were in the core team, I would not appreciate having a load of statistics dumped into Trac and called a defect. Output from tools like Simian requires intelligent interpretation.
I suggest that Jean-Michel should prioritise the reported duplications by the expected saving if the duplicated code were to be refactored out - roughly (N-1)*(M-1) where N is the number of duplicated lines and M is the number of occurrences. The N-1 is because a call to the extracted method will still be required at each place where the code is duplicated, and the M-1 is because M occurrences will be reduced to 1.
Then, in priority order, he should look at each candidate for refactoring and try to come up with an intention-revealing name for the extracted method.
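To make the ranking concrete, here is a minimal Ruby sketch of the (N-1)*(M-1) estimate. The `Duplication` struct, the method names and the sample figures are all made up for illustration; none of them come from Simian's output or from the actual reports.

```ruby
# Hypothetical record for one reported duplication: the number of
# duplicated lines (N) and the number of places it occurs (M).
Duplication = Struct.new(:name, :lines, :occurrences)

# Expected saving, roughly (N-1)*(M-1): one line per site is still
# needed for the call, and one copy of the code must remain.
def expected_saving(dup)
  (dup.lines - 1) * (dup.occurrences - 1)
end

# Sort candidates so the largest expected saving comes first.
def prioritise(dups)
  dups.sort_by { |d| -expected_saving(d) }
end

# Made-up sample data, not taken from the real reports.
candidates = [
  Duplication.new("a", 4, 2),
  Duplication.new("b", 12, 3),
  Duplication.new("c", 6, 5)
]

prioritise(candidates).each do |d|
  puts "#{d.name}: #{expected_saving(d)}"
end
# prints:
# b: 22
# c: 20
# a: 3
```

Note that a 4-line block duplicated once scores only 3, so the smallest reported blocks naturally sink to the bottom of the list.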
But if you actually go to the "duplicated" source lines, you find many of the 4-line duplicates to be
    end

    def something
      @something
in different classes or subclasses. My response to "what to do with these reports" is to ignore them. Unless the threshold is raised to >= 6 (instead of 4), or the tool is sophisticated enough to discount lines that are blank or contain only "end" (or "begin" or "rescue" or "else", etc.), the report is too bloated to be very helpful. It reminds me of debates about whether counting LOC (lines of code) in C should include lines having only "{" or "}".
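As a rough illustration of that kind of discounting, the Ruby sketch below counts only the "significant" lines of a reported block before comparing against the threshold. This is not something Simian does as far as this thread shows; the keyword list, the method names and the default threshold of 4 are assumptions made for the example.

```ruby
# Lines that are blank, or that contain only one of these keywords,
# are treated as noise. The keyword list is an assumption.
NOISE = /\A\s*(end|begin|rescue|else|ensure)?\s*\z/

# Return the lines of a block that are not noise.
def significant_lines(block)
  block.lines.reject { |line| line =~ NOISE }
end

# A block is worth reporting only if enough significant lines remain.
def worth_reporting?(block, threshold = 4)
  significant_lines(block).size >= threshold
end

# The 4-line "duplicate" quoted above has only 2 significant lines.
sample = "end\n\ndef something\n  @something\n"
puts significant_lines(sample).size   # prints 2
puts worth_reporting?(sample)         # prints false
```

With a filter like this, the boilerplate reader-method pattern drops out of the report while genuinely repeated logic of four or more meaningful lines still shows up.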