Ruby enumerators are slow... and wonderful
Ruby enumerators are a really cool abstraction that can be used in many ways. I particularly love that, in the standard library, most methods that take a block to iterate over something return an enumerator when invoked without one. For instance, Enumerable#map and CSV.foreach follow this convention.
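As a minimal sketch of that convention (the variable names are only for illustration), calling a method such as map or each without a block hands back an Enumerator, which can then be chained or advanced by hand:

```ruby
numbers = [10, 20, 30]

# Without a block, map returns an Enumerator instead of iterating.
enum = numbers.map
enum_class = enum.class

# The enumerator can be chained, here to get the index alongside each value.
scaled = enum.with_index { |n, i| n * i }

# It can also be driven externally, one element at a time.
cursor   = numbers.each
first    = cursor.next
second   = cursor.next
upcoming = cursor.peek # looks at the next value without consuming it
```

Once the cursor runs out of elements, a further call to next raises StopIteration, which is why the benchmark later in this post rescues that exception.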
I hadn’t found a practical use case for them until I needed to process several CSV files, advancing through them at the same time. I could grab an enumerator for each CSV file and advance each one as I wanted. It worked like a charm, but I noticed it was very slow. I decided to benchmark enumerators against their internal iterator counterpart:
require 'benchmark/ips'

COUNT = 500_000

Benchmark.ips do |x|
  # Internal iteration: the block is called once per element.
  x.report('Enumerable') do
    total = 0
    COUNT.times do |i|
      total += i
    end
  end

  # External iteration: the caller pulls each element with next.
  x.report('Enumerator') do
    total = 0
    enumerator = COUNT.times
    while true
      begin
        total += enumerator.next
      rescue StopIteration
        break
      end
    end
  end
end
Results (Ruby 2.4.0):
Enumerable 37.073 (± 5.4%) i/s - 186.000 in 5.030412s
Enumerator 1.588 (± 0.0%) i/s - 8.000 in 5.040230s
As you can see, enumerators are much slower than the corresponding internal iterator: roughly 23 times slower in this benchmark (37.073 vs. 1.588 iterations per second). I couldn’t use them in my case, since CSV processing speed was key to the overall performance of the system, but I still love them.
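For reference, the kind of lockstep processing described above can be sketched roughly as follows. This is not the original code: merge_sorted and exhausted? are hypothetical helpers, and plain sorted arrays stand in for the CSV files (in the real case each source would be an enumerator such as the one returned by CSV.foreach without a block).

```ruby
# Hypothetical helper: true when the enumerator has no more values.
def exhausted?(enum)
  enum.peek
  false
rescue StopIteration
  true
end

# Hypothetical helper: merges already-sorted sources by advancing
# whichever enumerator currently holds the smallest value.
def merge_sorted(*sources)
  enums = sources.map(&:each) # each without a block returns an Enumerator
  merged = []
  loop do
    live = enums.reject { |e| exhausted?(e) }
    break if live.empty?
    merged << live.min_by(&:peek).next
  end
  merged
end

result = merge_sorted([1, 4, 7], [2, 3, 9], [5])
```

The point of the sketch is that each source advances independently, which is awkward to express with internal iterators alone. The price is the per-element overhead of next and peek measured above.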