Jul 21 2009

Some random libraries

Random programming libraries that I have discovered in trying to solve some problems lately…

C++
Acedia: acedia is designed to be an easy-to-use C++ library. It provides an Erlang-like actor implementation with message passing and pattern matching. Its main goal is to support development of distributed software, also across a network.
POCO: The POCO C++ Libraries (POCO stands for POrtable COmponents) are open source C++ class libraries that simplify and accelerate the development of network-centric, portable applications in C++.

Ruby
Sinatra: Embeddable website DSL
Nanite: Nanite is a new way of thinking about building cloud ready web applications. Having a scalable message queueing back-end with all the discovery and dynamic load based dispatch that Nanite has is a very scalable way to construct web application back-ends.
Journeta: Journeta is a dirt simple library for peer discovery and message passing between Ruby applications on a LAN.

That’s all for now. I’ve been working on some interesting stuff, developing a client registration and authentication system. I started out in C++ using Boost.Asio to handle server connections, but then I just got annoyed, switched to Ruby, and have been running along ever since. I am using Revactor (with a few personal tweaks in the code) to handle my components, SQLite3 for a simple database (though I might have to switch to MySQL for production), and Sinatra for a web interface (running on Thin with Shotgun). All in all, I must say that simply plugging in working components instead of having to develop the resources makes production so much easier… As always, hardware is cheaper than programmer hours.

Speaking of which, why did I not notice Amazon’s Auto Scaling and Load Balancing features before? If you don’t have the hardware resources to build your own cloud (Eucalyptus with Xen), Amazon really does seem like a very cheap alternative. I mean, think of all the money you save not having to hire I.T. guys to figure out why your hard drive failed in the middle of the night. Amazon does it all for you!

Hopefully, some sort of Cloud Computing standard will come out soon. The only current problem with Amazon’s model is that it doesn’t give the clients any leverage. Start using Amazon and you can’t make the threat to leave and go to Google (well, easily at least). Worse yet, if Amazon goes out of business, you are in a world of hurt. Eucalyptus adopted the Amazon scheme, so at least there is a bit of leverage saying that you will just buy your own hardware — but it would be much better if we could get some pricing competition going on.

Jun 15 2009

Ruby and CUDA and Thrust, Oh My!

Thought I would have a bit of fun to see if I could get CUDA and Thrust working with Ruby. Since no CUDA bindings exist, and I didn’t want to give up the power of Thrust, I decided to employ RubyInline to create wrappers. Got a nice little demo working in under 30 minutes.

vectors.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
 
#include <iostream>
 
#include "vector_test.h"
 
extern "C" int vector_test() {
   // Host-side vector of four ints (the element type is required)
   thrust::host_vector<int> H(4);
   H[0] = 14;
   H[1] = 20;
   H[2] = 38;
   H[3] = 46;

   std::cout << "H has size "<< H.size() << std::endl;

   for(size_t i = 0; i < H.size(); i++)
      std::cout << "H[" << i << "] = " << H[i] << std::endl;

   // Shrink the host vector to its first two elements
   H.resize(2);

   std::cout << "H now has size " << H.size() << std::endl;

   // Copy the host vector to the device
   thrust::device_vector<int> D = H;

   // Device elements can be assigned from host code
   D[0] = 99;
   D[1] = 88;

   for(size_t i = 0; i < D.size(); i++)
      std::cout << "D[" << i << "] = " << D[i] << std::endl;

   return 0;
}

vector_test.h

extern "C" int vector_test();

Compile a shared library: nvcc --shared -o libvectors.so vectors.cu -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart

cuda_inline.rb

require 'rubygems'
require 'inline'
 
class MyWrapper
   inline(:C) do |builder|
      builder.include '"vector_test.h"'
      builder.add_compile_flags '-x c++', '-lstdc++', '-L/usr/local/cuda/lib', '-L.', '-lvectors', '-I.'
      builder.c '
         void wrapper() {
            vector_test();
         }'
   end
end
t = MyWrapper.new()
t.wrapper

Note here that I have to specify for RubyInline to look in the current directory for headers and libraries!

So, as long as libvectors.so and vector_test.h are in the directory you are running cuda_inline.rb from, everything is gravy! Maybe not the most efficient tool-chain, but it works!

$ ruby cuda_inline.rb
ld warning: in ./libvectors.so, file is not of required architecture
H has size 4
H[0] = 14
H[1] = 20
H[2] = 38
H[3] = 46
H now has size 2
D[0] = 99
D[1] = 88

Feb 24 2009

Ruby’s Eigenclass and Metaclass

If you are like me, you have probably dabbled in Ruby meta-programming, but never really found a place to use it, so never got very far. While we all love _why’s unique writing, his article on meta-programming was a bit opaque, and less than enlightening (let’s be honest — sometimes we just need a dry, straightforward technical reference).

Enter these two gems: ‘The Singleton Class’ and ‘The Metaclass’, written by Patrick Farley. Genius stuff.

Feb 12 2009

Rails Voodoo: How does before_filter work?

Sometimes I am working with Rails and just take things for granted. Like before_filter and after_filter. For those who don’t use Rails, basically, before_filter and after_filter are called before and after actions are processed. So instead of having to look up my user from the database in every single action, I can create a before_filter that handles it for me, making the user available in all my actions.

Now what is cool about this is that actions have no special creation system. You just create a new public instance method on your controller, and *bam*, you have a new action. In that case, what sort of voodoo is going on to prepend and append the filters to our methods? Some sort of ‘alias’ trickery? But that won’t work unless the filter declarations are evaluated after the methods are defined, and that certainly isn’t true for our filters…

Let’s see how deep the rabbit hole goes. Looking at ActionController::Base, we find the perform_action method, which, as you can imagine, performs an action. After extracting the action name and determining whether it is a public action, it tries to ‘send’ the action. Herm, no reference to filters anywhere here.

Next, let’s look at ActionController::Filters. Here, we find a base Filter type, which is an ActiveSupport::Callbacks::Callback. Might as well take a look at that… and all we find is a typical callback object. Well, that got us nowhere. Looking back at the filters source, we can see that when the module is included, it stuffs a whole mess of methods in the base. And that is where the gold is. Look down at the InstanceMethods module. See right there, squeezed in the self.included(base), we find it. The meat and potatoes. alias_method_chain. It seems to be aliasing process and perform_action, the heart of ActionController. But what the hell is alias_method_chain? A bit of poking and prodding and googling, and we find this nice little article which explains to us that alias_method_chain :perform_action, :filters really ends up as alias_method :perform_action_without_filters, :perform_action and alias_method :perform_action, :perform_action_with_filters. Likewise for process. Right below, what do we find?

protected
   def process_with_filters(request, response, method = :perform_action, *arguments) #:nodoc:
      @before_filter_chain_aborted = false
      process_without_filters(request, response, method, *arguments)
   end
 
   def perform_action_with_filters
      call_filters(self.class.filter_chain, 0, 0)
   end
 
private
   def call_filters(chain, index, nesting)
      index = run_before_filters(chain, index, nesting)
      aborted = @before_filter_chain_aborted
      perform_action_without_filters unless performed? || aborted
      return index if nesting != 0 || aborted
      run_after_filters(chain, index)
   end

And there we go. A little inheritance and alias magic, and the case is closed.
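
To see the aliasing trick in isolation, here is a minimal plain-Ruby sketch with no Rails involved. DemoController and its log are made-up names for illustration; the two alias_method lines are just the expansion described above, written out by hand.

```ruby
# Hypothetical stand-in for a controller; not actual Rails code.
class DemoController
  attr_reader :log

  def initialize
    @log = []
  end

  # The "real" action dispatch.
  def perform_action
    @log << :action
  end

  # The wrapper that runs filters around the original method.
  def perform_action_with_filters
    @log << :before_filter
    perform_action_without_filters
    @log << :after_filter
  end

  # Hand-written equivalent of: alias_method_chain :perform_action, :filters
  alias_method :perform_action_without_filters, :perform_action
  alias_method :perform_action, :perform_action_with_filters
end

controller = DemoController.new
controller.perform_action
p controller.log  # => [:before_filter, :action, :after_filter]
```

Callers still invoke perform_action, but they now get the wrapped version, while the original stays reachable under its _without_filters name.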

Feb 3 2009

Object Paradigm versus Procedural Paradigm

Why can’t we just be friends?

Clojure seems to be the buzz language of the moment, fighting for the esoteric language spotlight with Scala, Arc, Groovy, et al. Poor Erlang, it was sooooo 2008.

Clojure is a bit of a conundrum. Sitting upon the JVM, the very organs that the ‘everything is an object’ Java rests upon, Clojure is not object oriented. Except when it is, because it talks to Java libraries and what-not. While Java left a horrible taste in my mouth, I fully enjoy Ruby’s ‘everything is an object’ approach. Want to create a method that allows me to change the number 2 into a date offset from 4004 B.C.? Simply crack open that Fixnum class and add a new method. Now we can call 2.to_date and it works. Truly object oriented.
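
As a rough sketch of what cracking open the class looks like: the epoch constant and the choice to count in days are my own assumptions for illustration, and on current Rubies the class to reopen is Integer (Fixnum was folded into Integer in Ruby 2.4).

```ruby
require 'date'

# Hypothetical epoch for illustration: 23 October 4004 B.C.
# Ruby's Date uses astronomical year numbering, so 4004 B.C. is year -4003.
CREATION = Date.new(-4003, 10, 23)

# Reopen the built-in class and bolt on a new method.
class Integer
  # Treat the integer as a number of days after CREATION.
  def to_date
    CREATION + self
  end
end

puts 2.to_date
```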

But does this always make sense? Not quite. This is the odd balance you can sometimes find in C++ libraries. This struck me while I was playing in Matlab. You see, the original purpose of an object was to encapsulate state and behavior. A dog barks. A matrix, on the other hand, does not add. Matrices are added. How are we supposed to capture this? It isn’t a behavior of the matrix itself, but it does use the state of the matrix. Matrix addition is a more … platonic concept. Nevertheless, how many times have we seen m.dot_product(m2)? So then I began to question: “Does 2 convert itself to date? Does that truly represent a behavior of 2? Or any Fixnum in general?” Not quite. So, in our Java ‘everything is an object’ world, should we have a ‘converter’ class? Does that make any sense? When was the last time you picked up your trusty ‘converter’ and used it? I’ll tell you when: never. Rather, you just did it in your head. Or on paper. Either way, it was just a calculation — there was no behavior that belonged to any object. It just … was.

Yet without objects, we can run into a brick wall. 1+1=2. [1 2]+[2 3]=[3 5]. Humans recognize which definition of + to use, but our compiler won’t. Well, unless we decide to give up dynamic typing and go fully static. Then we can. Otherwise, we get all sorts of namespace collisions. Do we use the fixnum + or the matrix +? If we go static, what sort of losses do we take? Cool things like Ruby’s ActiveRecord library all of a sudden don’t seem to work. Poor method_missing just won’t work when we don’t know what form the arguments will arrive in.

So what can we do? C++ does it through some ‘exceptional’ cases, allowing you to define operators in the general namespace (or something like that; I don’t really know the inner specifics … I have better things to do than read the standard). So it allows us to do Matrix + Matrix instead of Matrix.add(Matrix). Nice. But what if we have a dynamic language? Well, then we might be in some trouble: we can’t tell which method to use.

Or, better, what if we have a hybrid language? Using meta-data, we can determine the type of arguments, and then map them to the appropriate function. Now if we simply tell the interpreter/compiler which method corresponds to which types of arguments, we are golden. Except for that whole ‘more work at run-time’ thing. Unfortunately, the only way to give the compiler any sort of predictive ability is to make the language static.

So now I can define a class for a dog, and implement a method ‘bark’ that truly defines the behavior of a dog. On the other hand, we can now define an (infix) + operator in the ‘platonic’ namespace that allows us to define the behavior of adding two matrices. That makes a lot more sense logically.
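
To make the idea a bit more concrete, here is a small sketch of run-time dispatch on argument types in plain Ruby. The Platonic module and its define/call methods are invented for this example and are not part of any library; the rule table plays the role of the meta-data that maps argument types to the right function.

```ruby
# Hypothetical "platonic" namespace: free-standing operations that belong to
# no object and are chosen by the classes of their arguments at run time.
module Platonic
  RULES = Hash.new { |hash, name| hash[name] = [] }

  # Register an implementation of +name+ for the given argument classes.
  def self.define(name, *types, &body)
    RULES[name] << [types, body]
  end

  # Find the first rule whose classes match the arguments and run it.
  def self.call(name, *args)
    _, body = RULES[name].find do |types, _|
      types.size == args.size &&
        types.zip(args).all? { |type, arg| type === arg }
    end
    unless body
      raise NoMethodError, "no #{name} for #{args.map(&:class).join(', ')}"
    end
    body.call(*args)
  end
end

Platonic.define(:add, Integer, Integer) { |a, b| a + b }
Platonic.define(:add, Array, Array) { |a, b| a.zip(b).map { |x, y| x + y } }

p Platonic.call(:add, 1, 1)            # => 2
p Platonic.call(:add, [1, 2], [2, 3])  # => [3, 5]
```

The lookup happens on every call, which is exactly the ‘more work at run-time’ cost mentioned above.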

Jan 26 2009

Rails, EC2, and backgroundrb

This gem of an article has been out for quite a while now, and really hits the nail on the head of showing how easy using backgroundrb is.  Or, at least, was.  You see, backgroundrb has gone through some updates and the article is a bit out of date.

So I have spent the last day or two trying to get my locally hosted rails application to talk to a locally hosted backgroundrb client.  After hours and hours of banging my head against the keyboard, gnufied in #backgroundrb (freenet) told me that I was doing absolutely nothing wrong, but rather it was a known bug with the Mac OS X installation.  Hurrah.

So I decided to test out the whole shebang on EC2 to see if it would work.  And it did.  Now some of you might be wondering how I got such a wonderful little process up and running, so I thought I would share.  Some of you might be able to follow along at home without using EC2.

PLEASE NOTE THAT THESE STEPS ARE FOR TESTING PURPOSES ONLY!  PLEASE ENSURE THAT FOR LIVE DISTRIBUTIONS, YOU EMPLOY CORRECT SECURITY PRACTICES (i.e. don’t be in development mode and put a password on your database, et cetera).

For testing, I chose the basic ruby AMI on EC2.  I then installed rails, chronic, packet, and the mysql gem (--with-mysql-config=/usr/bin/mysql_config).  After, I installed svn (yum install svn).  This gave me all the tools I needed to get up and running.

Next, I created a new rails project (‘rails ec2_client -d mysql’).  You do this because backgroundrb exists within the context of a rails project, even if run in stand-alone.  I navigated my way into vendor/plugins and downloaded backgroundrb (svn co http://svn.devjavu.com/backgroundrb/trunk).  After renaming trunk to backgroundrb (mv trunk backgroundrb), I navigated back to the root of the project directory (cd ../..).

Next, I loaded up mysql and created a development database (‘create database ec2_client_development;’).  After exiting mysql, I then installed the backgroundrb plugin (rake backgroundrb:setup) and migrated the database (rake db:migrate).  Next, I created my worker (script/generate worker some_model).  I opened up the generated worker file (vi lib/workers/some_model_worker.rb) and gave him a name (set_worker_name :some_model.  Please see here for more info).

I then decided to give my worker a little functionality (def ping; cache[job_key] = "pong"; end;).  After, I had to find out my host-name, so I followed the instructions in the tutorial listed above.

wget -q -O /tmp/public-ip http://169.254.169.254/latest/meta-data/public-ipv4
wget -q -O /tmp/public-hostname http://169.254.169.254/latest/meta-data/public-hostname
hostname -F /tmp/public-hostname
echo $(hostname) > /etc/hostname
echo $(hostname)

Simple enough.  Next, I loaded up my backgroundrb server (script/backgroundrb start -e development -h $(hostname)).  Done and done.  Remember that $(hostname) value!

Before I forget, make sure you have enabled the permissions for your ec2 instances to be allowed to access the port backgroundrb is running in (ec2-authorize default -p 11006).

Now, getting your local rails project to chatter with the newly created server is pretty easy.  Here is a sample application I used.

class TestController < ApplicationController
   def index
      host_ip = "ec2-67-202-7-84.compute-1.amazonaws.com"
      port = 11006
      worker = MiddleMan.worker(:some_model)
      result = worker.ping(:host => "#{host_ip}:#{port}", :job_key=> "test")
      render :text => result
   end
end

Nice and simple.  Everything works out gravy.  NB: The ‘host_ip’ here should be the same as the $(hostname) value printed in the EC2 terminal above.

Oh, except it doesn’t.  What is this?  Some sort of error connecting to the server?  What is it doing trying to connect to 0.0.0.0…

You see, when you also installed backgroundrb in your RAILS application (so you could use MiddleMan, remember), it tried to create its own backgroundrb server.  So when the rails application loads, it opens its own backgroundrb config file and takes the server there.  So, a simple alteration to vendor/plugins/backgroundrb/lib/backgroundrb/bdrb_cluster_connection.rb had me comment out ‘establish_connections’ in initialize.

I also wanted my backgroundrb servers to scale dynamically with PoolParty! (as I have mentioned in previous posts).  This was a problem, since backgroundrb only lets me define my servers statically in the configuration file.

Have no fear!  A simple alteration to find_connection in the same file had me off and running:

def find_connection host_info
   # Reuse an existing connection to this "host:port" if we already have one
   conn = @backend_connections.detect { |x| x.server_info == host_info }
   if !conn
      # Otherwise register the server on the fly and connect to it
      klass = Struct.new(:ip,:port)
      ip, port = host_info.split(':')
      port = port.to_i
      @bdrb_servers << klass.new(ip,port)
      conn = Connection.new(ip,port,self)
      raise NoServerAvailable.new("BackgrounDRb server is not found running on #{host_info}") unless conn
      @backend_connections << conn
   end
   return conn
end

Now, if it doesn’t find the connection, it creates it (unless it can’t connect, of course).

Now I simply created a Scale controller, which allows backgroundrb servers to register themselves through a simple HTTP GET (all done at AMI initialization before backgroundrb is started) that logs the host-name.
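
The bookkeeping behind such a controller can be sketched in a few lines. This is an in-memory stand-in, not my actual Scale controller: the ServerRegistry class, its method names, and the example host-name are made up for illustration, and a real version would sit behind the controller actions and be backed by a database table.

```ruby
require 'set'

# Hypothetical in-memory registry of live backgroundrb servers.
class ServerRegistry
  DEFAULT_PORT = 11006

  def initialize
    @hosts = Set.new
  end

  # Called when an instance announces itself at boot.
  def register(hostname, port = DEFAULT_PORT)
    @hosts << "#{hostname}:#{port}"
    self
  end

  # Called when an instance is about to shut down.
  def deregister(hostname, port = DEFAULT_PORT)
    @hosts.delete("#{hostname}:#{port}")
    self
  end

  # "host:port" strings, in the same form passed as :host to the worker.
  def hosts
    @hosts.to_a.sort
  end
end

registry = ServerRegistry.new
registry.register('worker-1.example.com')
registry.register('worker-1.example.com')  # registering twice is harmless
p registry.hosts  # => ["worker-1.example.com:11006"]
```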

Now, I haven’t fully figured out how to keep state between MiddleMan’s backend_connections and the database table (whether I actually need the database table yet, I am not 100% sure of) — but getting the ping/pong connection working was a nice little pick-up after several hours of dejection.

Hope someone finds this useful.

Jan 21 2009

Easy does it…

I am working on a project that requires some dynamic scaling.  The goal was to have Rails farm out complex processes to Amazon EC2 shards via backgroundrb, and have PoolParty! manage the dynamic scaling.   All good, and what-not, but how was I going to get the new EC2 shards, dynamically loaded by PoolParty!, to get recognized by my web-server?  My first thought was Rinda — specifically, RingyDingy (saving me time on managing connections), but I couldn’t think of a good way of getting the EC2 clients to find the Ring server on my webserver.  Furthermore, how would Rails talk with the Ring server?  Even if I switched out RingyDingy and used Rinda::RingFinger to specify my webserver for the EC2 clients, how would Rails connect?  My first thought was to create a third server that was a backgroundrb server / ring server, and allow Rails to request clients from it.

Then someone on #ruby-lang said I should just have the EC2 clients register and de-register themselves via HTTP GET, and just manage them via a table in my database.

As always, K.I.S.S.

Jan 15 2009

Aren’t exceptions supposed to make things safer?

Weren’t things easier in the days of error codes?  Then along came exceptions and took errors to a whole new level.  Unfortunately, they also changed the rules of flow control, and somehow made writing ‘safe’ code more difficult.

Cleaning up after an error, especially a fatal error, is very important.  Exceptions, ironically, can make this an even more difficult task, as this article astutely points out.  For those of you too lazy to make the jump, basically imagine a situation where you call several functions in a row that may throw exceptions, each of which requires a unique clean-up.  To handle such a situation, you would require nested try/catch statements, which is just ugly.

Or, better, we can use call-backs to restore state to pre-exception status.  In the article, this was achieved by using a ScopeGuard object, which held a function pointer that would be called on destruction — but only in the case where the object wasn’t ‘dismissed’ first.  If the object is dismissed, it need not be called.

Being a rubyist, I immediately tried to port this concept over to Ruby.  At first glance, it doesn’t seem quite as useful: the ‘Ruby’ way typically has dangerous objects monitor and clean up after themselves using yield statements.  For example, when we open a file, we typically pass a block to be executed with the open file, but only in the case that the file is successfully opened.
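
For anyone who hasn't seen the block form of File.open, a tiny illustration follows. The temporary directory and file name are only there to keep the example self-contained.

```ruby
require 'tmpdir'

handle = nil
contents = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')

  # The block only runs if the open succeeds, and the file is closed when
  # the block exits, whether normally or through an exception.
  File.open(path, 'w') do |file|
    handle = file
    file.puts 'hello'
  end

  contents = File.read(path)
end

puts handle.closed?  # => true
```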

On the other hand, imagine the case where we are trying to synchronize writes between two resources.  Perhaps we are using memory as a cache, but also storing to disk.  Let us also assume that failure to write to either resource ends up in an exception, forcing the other resource to roll-back to its previous state.  Handling the nesting of exceptions could get ugly.  Enter scope_guard.

Instead of creating a class like the above article does, I made my scope guard a function that takes an object, function name, and parameters.  If a block is given, the function yields, wrapping the statement in a begin/rescue block.  In the case of an exception, our ‘undo’ function is performed on the object.  Ruby makes it all simple and easy to do.
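
A minimal sketch of such a function, following the description above. It assumes the undo method accepts whatever extra parameters are passed along, and it re-raises after rolling back so the caller still sees the failure.

```ruby
# Run the block; if it raises, send the named undo method (with any extra
# arguments) to the object, then re-raise the original exception.
def scope_guard(object, undo, *args)
  return unless block_given?

  begin
    yield
  rescue StandardError
    object.send(undo, *args)
    raise
  end
end
```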

Imagine usage:

def write(cache, disk, data)
   cache.write(data)
   scope_guard(cache, :roll_back) {
      disk.write(data)
   }
end

Now you can see that if the disk fails to write the data, the scope-guard ensures that the cache is rolled-back.  In the other case, where the cache fails to write, the exception raises before the data is written to disk, so state is maintained.

The system isn’t without flaws, however.  It falls short in one of the simplest of cases: an integer counter.  If I am trying to maintain a count of objects I write to disk, I have to use a wrapper around the basic integer object, because operations on Fixnum return a new object instead of altering the object in question.  Not the end of the world, but not very clean either.  I am currently trying to wrap my head around possible elegant solutions for this.
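
For reference, the wrapper workaround looks something like the sketch below. Counter and its method names are hypothetical; the point is only that the guard needs a mutable object to call an undo method on.

```ruby
# Hypothetical mutable box around an integer, so a guard can hold a
# reference to it and undo a change in place.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  def increment(by = 1)
    @value += by
    self
  end

  def decrement(by = 1)
    @value -= by
    self
  end
end

written = Counter.new
written.increment  # count the object optimistically
written.decrement  # what the roll-back would call if the write failed
p written.value    # => 0
```

With the scope_guard function described earlier, the call would read scope_guard(written, :decrement) { disk.write(data) }.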
