Wednesday, December 03, 2008

Benchmarking Makes it Better

This blog post is pulled from the Profiling and Optimizing Ruby tutorial I wrote for IBM DeveloperWorks a couple of years ago. If you'd like to reuse it, please note that the initial publication at DeveloperWorks must be attributed. I've reposted it here so that people can access it without signing up for a DeveloperWorks account.

The original tutorial used the following "shell_grabber.rb" script as example code:


file = "/etc/passwd"
File.open(file).each do |line|
  if line.match('/bin/bash') then
    print line.split(':')[0]
    puts " uses bash"
  end
end

A Benchmarking Primer

Before you can start replacing code, you should have an idea about how your intended replacement performs. The benchmark library is the tool you'll want to use for this. It's simple to use, you'll just write a benchmarking script that includes the methods you want to compare. The Benchmark class provides three instance methods to make this easier; bm, bmbm, and measure.

The bm method provides basic interface to the benchmark method. It takes an optional label_width argument. Each method tested can have a label specified. You define individual methods to be tested as shown here:


require 'benchmark'
n = 5_000_000
Benchmark.bm(15) do |x|
  x.report("for loop:")   { for i in 1..n; a = "1"; end }
  x.report("times:")      { n.times do   ; a = "1"; end }
  x.report("upto:")       { 1.upto(n) do ; a = "1"; end }
end

The bmbm method runs your benchmarking code twice, to help avoid complications from garbage collection. The first run is called the rehearsal, and is reported as part of the results. bmbm methods are defined in the same way that bm methods are. The only thing you'd need to change from listing 6 to use bmbm instead of bm methods is the call itself. To use the bmbm method, you'd write Benchmark.bmbm(15) do |x|.

Simple timing information can also be gathered using the measure method. (measure is used behind the scenes by the bm and bmbm methods to create their reports.) measure returns a Benchmark::Tms object that can be converted to a string for printing. A Ruby script using Benchmark#measure to time the construction of a string is shown below:


require 'benchmark'
puts Benchmark.measure { "foo" << "bar" << "baz" }

Comparing Code With benchmark.rb

Now that you've seen the basics of benchmarking, let's put it to use. We've already used time and profile.rb to look at the performance of our shell_grabber.rb script. Just pull out the part that does the work and put it into a new script like this.


require 'benchmark'
Benchmark.bm(25) do |x|
  x.report("original code") { 
    File.open("/etc/passwd").each do |line|
      if line.match('/bin/bash') then
        $stderr.print line.split(':')[0]
        $stderr.puts " uses /bin/bash"
      end
    end
  }
end

There's a change to the original code that are worth discussing. Instead of using bare print and puts statements, they're replaced with $stderr.print and $stderr.puts. You don't want to clutter the screen with hundreds or thousands of lines of output. This has the potential to dramaticly affect the timing of your code, so if you make a change like this to one method in your benchmark you'll want to make the same change to all of them. Take a look at just how much of a difference doing a $stdout.puts instead of a puts can make:


$stdout.puts  0.320000   0.040000   0.360000 (  0.396455)
naked puts    0.200000   0.020000   0.220000 (  0.262174)

Using Benchmarking Results

Using benchmarking results will help you choose which algorithm or which idiom will give you the best results. It's important that you don't throw off your results with improper tests though. For example, the benchmarking script shown above with the for, upto, and times methods only measured the performance of different forms of iteration. In this case it really doesn't matter what we do inside the iteration, as long as it's the same for each version. The output from that code modified to use Benchmark#bmbm is shown here:


Rehearsal --------------------------------------------------
for:             2.210000   0.030000   2.240000 (  2.288663)
times:           2.570000   0.060000   2.630000 (  2.623878)
upto:            2.530000   0.050000   2.580000 (  2.650115)
----------------------------------------- total: 7.450000sec

                     user     system      total        real
for:             2.080000   0.050000   2.130000 (  2.148350)
times:           2.590000   0.010000   2.600000 (  2.743801)
upto:            2.550000   0.030000   2.580000 (  2.588629)

Reading the output, you can see that there's a small but noticeable difference between the upto and times constructions, but that the real win comes with the for version — in this case, nearly a 20% improvement.

Notes

The importance of large datasets and/or multiple benchmarking runs was underscored while I was writing this section. In my initial version of the script, I only iterated through the loop 50,000 times. Then, when I ran the results the first time, the upto version of the iterator ran the fastest. When I moved up to 5,000,000 times through the loop and ran it a couple more times, I ended up hitting a consistent result of the for loop being better.

Dropping the number of iterations down to 5,000 continued to yield the best results for the for loop version in the non-rehearsal run, but would occasionally show the upto version as faster. Dropping down to 500 iterations showed near random results. In these cases, it appears that the time required for the run was so small (0.000606 seconds in one run) that external processes had an overwhelming influence on the benchmark.

Another dirty secret of benchmarking is that you'll need to do some extra work to ensure that your results are statistically valid. Please see my blog post Benchmarking, Lies, and Statistics for more details about this.

Finally, tune in tomorrow for a follow-up post on Benchmarking and Refactoring.

2 comments:

funny_falcon said...

Hey, in fact more fair bench would be:

require 'benchmark'
n = 5_000_000
Benchmark.bm(15) do |x|
x.report("for loop:") { a=0; for i in 1..n; a = "1"; end }
x.report("times:") { a=0; n.times do ; a = "1"; end }
x.report("upto:") { a=0; 1.upto(n) do ; a = "1"; end }
end

unless it is Ruby1.9

web design company said...

Very Nice post