Saturday Sep 22, 2007

Kinda hard to argue with our good deeds?

Very cool.
From RailsConf Europe, by none other than Mr. Hansson: http://www.loudthinking.com/posts/11-sun-surprises-at-railsconf-europe-2007

Friday Aug 24, 2007

Control-C Control-V? Don't! - aka "Thinking in Ruby . . ."

Code reuse doesn't mean copy-paste, of course. But sometimes we copy and paste simply because we have always done things that way, and so we don't reach for better tools when they are available. Habit is a dangerous thing.

I had to write a program to concurrently load data into a file store (MogileFS, if you were curious). The program had to load user data and event calendar data.

The constraint was the number of processes my loader could spawn to upload this data into the network file system.

The algorithm for the loader is straightforward:

  1. Start with M users and N events. C is the number of processes you can optimally create to load data (a hardware constraint).

  2. Iterate C times, selecting M/C users on each iteration, and fork/exec a process to load those M/C users into the store.

  3. Wait for these processes to exit (since your concurrency should not exceed C).

  4. Iterate C times, selecting N/C events on each iteration, and fork/exec a process to load those N/C events into the store.
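Steps 2 and 4 hand M/C (or N/C) items to each process. With integer division, the M mod C leftover items would be skipped whenever M is not a multiple of C; the listings in this post sidestep that because 40 and 80 divide evenly by 4. A minimal sketch of one way to split the work so nothing is dropped, assuming items are numbered from 0; `partition_ranges` is a hypothetical helper, not part of the original loader:

```ruby
# Hypothetical helper (not from the original loader): split item indices
# 0..total-1 into `concurrency` contiguous, inclusive ranges. The first
# (total % concurrency) ranges each take one extra item, so nothing is
# dropped when total is not a multiple of concurrency.
def partition_ranges(total, concurrency)
  base, extra = total.divmod(concurrency)
  start = 0
  Array.new(concurrency) do |i|
    size = base + (i < extra ? 1 : 0)
    range = start..(start + size - 1)
    start += size
    range
  end
end

# Print the ranges in the same style as the loader's "range is" messages.
partition_ranges(10, 4).each { |r| puts "range is #{r.first} to #{r.last}" }
```

Giving the remainder to the first few ranges keeps every process within one item of the others, which matters little for 40 users but keeps the loader correct for arbitrary counts.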

That sounds trivial enough. I want to use Ruby, and here is how the program might look (I'm leaving out the irrelevant details of loading into my data store for brevity, so an echo command stands in for the actual loader):

bash-3.00# more loader.rb
concurrency = 4
num_users = 40
num_events = 80

puts "Adding Users. . ."
count = 0
concurrency.times do
  count_new = count + num_users/concurrency - 1
  # fork returns nil in the child, which then replaces itself with the loader command.
  exec "/usr/bin/echo Users #{count} #{count_new} > /dev/null" if fork.nil?
  puts "range is #{count} to #{count_new}"
  count = count_new + 1
end

# Wait for all user loaders to finish before starting on events.
(1..concurrency).each do |i|
  puts "Adding users - Process id: #{Process.wait} finished. #{i}/#{concurrency} complete."
end

puts "Adding Events. . ."
count = 0
concurrency.times do
  count_new = count + num_events/concurrency - 1
  # fork returns nil in the child, which then replaces itself with the loader command.
  exec "/usr/bin/echo Events #{count} #{count_new} > /dev/null" if fork.nil?
  puts "range is #{count} to #{count_new}"
  count = count_new + 1
end

# Wait for all event loaders to finish.
(1..concurrency).each do |i|
  puts "Adding events - Process id: #{Process.wait} finished. #{i}/#{concurrency} complete."
end

bash-3.00#

That's pretty cool, and we're happy with the way we use the iterators Ruby provides: they save us some boilerplate code, and we like not having to use the $ prefix on variables.
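One thing the loader above does not check is whether each child actually succeeded: Process.wait returns only the child's pid and leaves its exit status in $?. A minimal sketch, assuming the same fork/exec approach, that uses Process.wait2 (which returns a [pid, Process::Status] pair) to report each child's outcome; `run_and_reap` is a hypothetical helper name, and the `true`/`false` commands merely stand in for a real loader:

```ruby
# Hypothetical helper: fork one child per command, exec the command in the
# child, then reap every child and report whether each one exited successfully.
def run_and_reap(commands)
  # fork with a block runs the block only in the child; exec replaces the child.
  pids = commands.map { |cmd| fork { exec(*cmd) } }

  succeeded = {}
  pids.size.times do
    pid, status = Process.wait2   # blocks until any child exits
    succeeded[pid] = status.success?
  end

  # Report results in the same order the commands were given.
  pids.map { |pid| succeeded[pid] }
end

p run_and_reap([["true"], ["false"]])
```

Children can finish in any order, so the results are collected by pid and then reordered to match the input commands.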

But wait, there is another Rubyism that could have worked its way into this code. The users block and the events block are nearly identical: the "Adding Users. . ." message becomes "Adding Events. . .", the "Users" and "users" labels become "Events" and "events", and the reference to num_users in the first block becomes num_events in the second.

Indeed, this comes from the rather instinctive Control-C/Control-V that went into creating the second block. Luckily, we spotted that instinct to stick to old ways, and Ruby helps minimise copy-paste. The way to remove it is to recognise that the copied block performs exactly the same operation, only on events rather than users. Ruby hashes and code blocks come to the rescue; here is what a more Ruby-friendly implementation looks like:


bash-3.00# more loader.rb
concurrency = 4
num_users = 40
num_events = 80

# Each key names a set to load; each value is the number of items in that set.
sets_to_load_into_mogile = {
  "Users" => num_users,
  "Events" => num_events
}

sets_to_load_into_mogile.keys.each do |set|
  puts "Adding #{set}. . ."
  count = 0
  concurrency.times do
    count_new = count + sets_to_load_into_mogile[set]/concurrency - 1
    # fork returns nil in the child, which then replaces itself with the loader command.
    exec "/usr/bin/echo #{set} #{count} #{count_new} > /dev/null" if fork.nil?
    puts "range is #{count} to #{count_new}"
    count = count_new + 1
  end

  # Wait for this set's loaders before moving on, so at most `concurrency` run at once.
  (1..concurrency).each do |i|
    puts "Adding #{set} - Process id: #{Process.wait} finished. #{i}/#{concurrency} complete."
  end
end
bash-3.00#

The hash sets_to_load_into_mogile defines which sets need to be loaded (i.e., Users and Events); its values give the cardinality of each set.

sets_to_load_into_mogile.keys returns an array of the hash's keys. We iterate over it and run the code block once per set, with the set's name and size filled in.
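Because Hash#each yields each key together with its value, the keys-then-lookup step can also be written as a single iteration. A minimal sketch that only computes the ranges the loader prints, without forking; `plan_chunks` is a hypothetical name. Note that hashes iterate in insertion order from Ruby 1.9 onward; in Ruby 1.8, hash order was not guaranteed, so Events could be processed before Users.

```ruby
# Hypothetical helper: compute the [set, first, last] ranges that the loader
# prints, receiving each set's name and size together from Hash#each.
def plan_chunks(sets, concurrency)
  plan = []
  sets.each do |set, size|
    count = 0
    concurrency.times do
      count_new = count + size / concurrency - 1   # same arithmetic as loader.rb
      plan << [set, count, count_new]
      count = count_new + 1
    end
  end
  plan
end

plan_chunks({ "Users" => 40, "Events" => 80 }, 4).each do |set, first, last|
  puts "#{set}: range is #{first} to #{last}"
end
```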

Quite simple, once we think in Ruby. In this case, that means watching out for the tendency to reach for Control-C/Control-V where Ruby offers such a strong alternative.
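The same thinking can go one step further. The hash removed the duplicated counts, but the per-chunk action (the exec line) is still written inside the loop; a method that yields to a block would let the caller supply that action. A minimal sketch, assuming the loader's fork-then-wait-per-set behaviour; `load_in_parallel` is a hypothetical name, not something from the original program:

```ruby
# Hypothetical helper: fork `concurrency` children for one set, run the caller's
# block in each child with (set, first, last), then wait for all of them.
# Returns true only if every child exited with status 0.
def load_in_parallel(set, size, concurrency)
  count = 0
  concurrency.times do
    count_new = count + size / concurrency - 1
    first, last = count, count_new
    fork do
      begin
        yield(set, first, last)  # e.g. exec the real loader for this range
        exit!(0)                 # exit! skips at_exit handlers inherited from the parent
      rescue StandardError
        exit!(1)
      end
    end
    count = count_new + 1
  end
  concurrency.times.map { Process.wait2.last.success? }.all?
end

# Usage mirroring loader.rb (not run here):
# { "Users" => 40, "Events" => 80 }.each do |set, size|
#   load_in_parallel(set, size, 4) do |s, first, last|
#     exec "/usr/bin/echo #{s} #{first} #{last} > /dev/null"
#   end
# end
```

The yield inside the fork block still refers to the block passed to load_in_parallel, so each child runs the caller's action for its own range.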

About

prashant
