Jun 16 2010

Weighted Random Array

I had an array that I was trying to do weighted random pulls from. There are a few basic algorithms for this online, but my issue was that they were not efficient for very, very large tables — most were O(n) for each time you wanted to pull an element. So I wrote my own wrapper class that would handle the task for me using Alias Tables.

class WeightedRandomArray < Array
      def initialize(other_array, weights)
        total_weights = weights.inject(0.0) { |t,e| t+e }
        proportions = weights.map { |e| e / total_weights }
 
        elements = other_array.zip(proportions)
 
        # construct an alias table for faster access
        @table = []
 
        n = elements.size
        elements.map! { |a,w| [a, w*(n-1)] }
        elements.sort! { |e1,e2| e1[1] <=> e2[1] }
 
        while elements.size > 2
          p = elements[0][1]
          elements[-1][1] -= (1.0 - p)
          @table << [p, elements[0][0], elements[-1][0]]
          elements.delete_at(0)
          elements.sort! { |a,b| a[1] <=> b[1] }
        end
 
        p = elements[0][1]
        elements[-1][1] -= p
        @table << [p, elements[0][0], elements[-1][0]]
      end
 
      def random_element
        entry = @table[(Kernel.rand * @table.size).floor]
        if Kernel.rand < entry[0]
          return entry[1]
        else
          return entry[2]
        end
      end
    end

So now the work is all up-front — and while it is considerable, so long as you will be doing enough random draws, the O(1) time to get a random entry should trump…

  • Share/Bookmark

Jun 8 2010

Passing around Ruby blocks

I have some code that I am bundling together that required me to tackle a rather strange problem. Basically, I had one function that took a block, and another function that wrapped around that function. It looks something like this:

def inner(*args)
   yield args[0], args[1]*args[2]
end
 
def wrapper
   inner(3,4,5)
end

The question is, how do I pass a block to wrapper and have it get passed to inner? Google wasn’t much help here. My solution looks like:

def call_block
   yield 4, 5, 6
end
 
def wrap_block(&blk)
   call_block { |*args| blk.call(*args) }
end
 
wrap_block { |x,y,z| puts x+y+z }

Would love to see a smarter solution…

  • Share/Bookmark

May 19 2010

Immutable Ruby?

Got bored and hacked around.  Found this sort of interesting…
class Object
  def self.new( *args, &blk )
    o = allocate
    o.instance_eval{initialize( *args, &blk )}
    o.freeze
    o
  end
end
Forcing objects to be frozen after initialization?  Almost sounds … functional.
  • Share/Bookmark

Jan 29 2010

Random thoughts of the day

I always bounce through a fairly regular pattern when it comes to programming.  It goes a little something like: prototype an idea with Ruby code.  Find out it is slow.  Get frustrated.  Start writing code in C++.  Take a lot longer, but be impressed by how the strict type system tends to force me to write better programs.  Get frustrated with code verbosity.  Start looking for something else to program in that mixes the expressiveness of Ruby with the speed of C++ (one day I will realize that these two are probably mutually exclusive).  Decide that a more functional language will suit my needs.  OCaml?  Erlang?  Maybe something on the JVM!  Scala?  Clojure?  Get frustrated with build system (I don’t care what you say, Maven is just stupid) or lack of libraries.  Go back with my tail between my legs to Ruby.  Start looking at all the Ruby options: MRI, MacRuby, JRuby …

But wait!  How is it that I have somehow always overlooked Rubinius?  JIT compilation on LLVM?  Future multi-VM implementation (hello sandbox!)?  Holy cow!  That is AWESOME.  Rubinius is definitely my VM of choice from now on.

P.S. Totally unrelated, but check out Ruby Processing — it is pretty damn cool.

  • Share/Bookmark

Dec 3 2009

Nerd Link of the Day: Create your own language with Ruby, TreeTop, and LLVM-Ruby

Orange: A great little blog entry about creating your own language using the grammar description gem TreeTop and llvmruby, a gem that gives you access to all that llvm goodness (JIT or compile to bytecode).

  • Share/Bookmark

Dec 2 2009

Nerd Link(s) of the Day

ArrayFields: Tired of indexing with numbers? Index your arrays in a better way.

Main: Quit re-writing your main program structure for all those flags and options. Use this gem to do the heavy lifting.

rq: rq is a great little gem for clustering Linux boxes via a job system running on an sqlite database on an NFS. Check out a nice little article on it here.

  • Share/Bookmark

Aug 17 2009

Does the Language make the Programmer?

I have always subscribed to the Sapir-Whorf hypothesis (a.k.a. Linguistic Relativity), especially in its applications to programming. It is my firm belief that programming languages directly influence how programmers solve problems. The verbosity and robustness of any solution are often directly dependent of the paradigms that the language attempts to empower.

As a simple example, one can examine the solutions that Ruby programmers come up with when given the power of meta-programming. Some are clean and extremely beneficial to the end user. If we look at ActiveRecord, we find that all sorts of instance and class level methods and values are inherited dynamically, by looking up values of columns in a database table! All transparent to the user.

While some solutions provided by Ruby’s dynamic and meta-programming ways are extremely elegant, other times it just leads to a heap of spaghetti code. Speaking from experience, with the lack of type constraints built in, open classes, and meta-programming abilities, I often find that I get objects in some places that I don’t expect. Twitter had the same problem. They blamed Ruby. Others blamed bad programmers.

I blame both. Personally, I know that my ‘programming hygiene’ has greatly suffered in my ‘Time of Ruby.’ While it has allowed me to code with blazing speed and agility, I find that my code smells quite a bit worse than it used to. It was great for writing one-off programs, but my code readability and reusability has faltered greatly. Quite simply, in the absence of static languages, or even functional languages, I failed to shoulder the responsibility of ensuring that my programs were well constructed and well tested.

Having picked up Scala lately, I have found the switch back to a static, semi-functional language refreshing. In fact, I feel almost the same way I did when I switched to Ruby. Now, writing simple one-off programs has taken me more time — but it also tends to lead me to writing reusable libraries. I tend to break up my code more naturally, instead of hacking together solutions that work at the time. Instead of just restarting a project, I find myself refactoring.

So that brings us back to the Twitter case. I know it is old news, but I think it is a good case study. Twitter chose Ruby because it allowed them to rapidly develop their product. But when it came to scalability, they found that it just didn’t suit their needs. That isn’t to say that Ruby couldn’t suit their needs — they just didn’t find it to be the appropriate solution. Certainly, I know I have trouble maintaining a large Ruby code base (except for maybe a well constructed Rails project). On the other hand, most of the larger Scala projects I have written have naturally fallen into proper software engineering constructs on their own. I put that on the language.

You see, I know I am a bad programmer. Not in the sense that I can’t write efficient algorithms, or design systems well. I can. But under the time constraints of business projects, I don’t. And Ruby let me do that better than any other language I have used. So I guess I will use Scala for now, not just for my own sake, but for the sake of everyone who has to read me code one day.

  • Share/Bookmark

Aug 4 2009

Curtain Call for Ruby

Unless you have been living under a rock, or don’t program, you know that the software world has been abuzz about the end of “free lunch” — i.e. we can no longer just toss “more horsepower” at our problems, in a traditional manner, and expect a speed-up to occur. Instead, we have to go parallel. One of the most successful ways in doing this is by using the Actor pattern, exemplified in Erlang, which has been known to have 99.99999% up-times in real-world applications.

Being a fan of Ruby, I wanted to see if anyone had implemented the Actor pattern yet. It wasn’t a hard concept — you just provide the class with a mail-box and a way to check and parse messages. The issue falls in Ruby’s threads — namely, they are big and bulky. To have lots of actors, we need a light-weight implementation. What is worse is that Ruby suffers the same problem as Python when it comes to concurrency — the global interpreter lock (GIL) prevents the virtual machine from executing more than one instruction at a time. So even though you might be running separate OS threads on multiple cores, the speed is limited by a single core. So basically, using 1.8, the Actor pattern is a waste of time.

The stable release of Ruby 1.9 seems to tackle the issue head on, by providing Fibers. With 1.9, I was able to find the (now defunct) revactor gem, which seemed like a fairly good (though poorly documented) implementation of the Actor pattern. I also found journeta (which works on 1.8), which was a system that performed network auto-discovery. With the power of these two combined, and some horrible spaghetti code, I whipped together a little gem called ‘curtaincall’, which allows end-users to spawn actors on other machines.

Unfortunately, not everything turned out as well as I would have liked. Revactor seems to break down … or at least, its dependency Rev does, in signaling the mailbox that it has a new message. I can’t quite seem to figure out why this occurs, so as it stands, curtaincall actors never end up receiving the messages passed to them. Furthermore, the method I used of identifying actors is horrible — though it gets the job done. Finally, Erlang niceties like ‘spawn_link’ are poorly implemented and untested.

Ultimately, once I hit the road-block with Revactor, I just gave up. Revactor is undocumented and discontinued. I just wanted to see if I could do this for fun.

The pros? Journeta is a delight to work with. Easy to figure out from the examples provided, and it works just as promised out of the box.

Cons? Just about everything else.

curtaincall-0.0.1

  • Share/Bookmark

Jul 21 2009

Some random libraries

Random programming libraries that I have discovered in trying to solve some problems lately…

C++
Acedia: acedia is designed to be an easy to use C++ library. It provides an ERLANG like actor implementation with message passing and pattern matching. It’s main goal is to support development of distributed software – also across a network.
POCO: The POCO C++ Libraries (POCO stands for POrtable COmponents) are open source C++ class libraries that simplify and accelerate the development of network-centric, portable applications in C++.

Ruby
Sinatra: Embeddable website DSL
Nanite: Nanite is a new way of thinking about building cloud ready web applications. Having a scalable message queueing back-end with all the discovery and dynamic load based dispatch that Nanite has is a very scalable way to construct web application back-ends.
Journeta: Journeta is a dirt simple library for peer discovery and message passing between Ruby applications on a LAN.

That’s all for now. I’ve been working on some interesting stuff, developing a client registration and authentication system. I started out in C++ using boost::ASIO to handle server connections, but then I just got annoyed, switched to Ruby, and have been running along ever since. I am using Revactor (with a few personal tweaks in the code) to handle my components, SQLite3 for a simple database (though, I might have to switch to MySQL for production), and Sinatra for a web-interface (running on Thin with Shotgun). All in all, I must say that simply plugging in working components instead of having to develop the resources makes production so much easier… As always, hardware is cheaper than programmer hours.

Speaking of which, why did I not notice Amazon’s Auto Scaling and Load Balancing features before? If you don’t have the hardware resources to build your own cloud (Eucalyptus with Xen), Amazon really does seem like a very cheap alternative. I mean, think of all the money you save not having to hire I.T. guys to figure out why your harddrive failed in the middle of the night. Amazon does it all for you!

Hopefully, some sort of Cloud Computing standard will come out soon. The only current problem with Amazon’s model is that it doesn’t give the clients any leverage. Start using Amazon and you can’t make the threat to leave and go to Google (well, easily at least). Worse yet, if Amazon goes out of business, you are in a world of hurt. Eucalyptus adopted the Amazon scheme, so at least there is a bit of leverage saying that you will just buy your own hardware — but it would be much better if we could get some pricing competition going on.

  • Share/Bookmark

Jun 15 2009

Ruby and CUDA and Thrust, Oh My!

Thought I would have a bit of fun to see if I could get CUDA and Thrust working with Ruby. Since no CUDA bindings exist, and I didn’t want to give up the power of Thrust, I decided to emply RubyInline to create wrappers. Got it a nice little demo working in under 30 minutes.

vectors.cu

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
 
#include <iostream>
 
#include "vector_test.h"
 
extern "C" int vector_test() {
   thrust::host_vector H(4);
   H[0] = 14;
   H[1] = 20;
   H[2] = 38;
   H[3] = 46;
 
   std::cout << "H has size "<< H.size() << std::endl;
 
   for(int i = 0; i < H.size(); i++)
      std::cout << "H[" << i << "] = " << H[i] << std::endl;
 
   H.resize(2);
 
   std::cout << "H now has size " << H.size() << std::endl;
 
   thrust::device_vector D = H;
 
   D[0] = 99;
   D[1] = 88;
 
   for(int i = 0; i < D.size(); i++)
      std::cout << "D[" << i << "] = " << D[i] << std::endl;
 
   return 0;
}

vector_test.h

extern "C" int vector_test();

Compile a shared library: nvcc –shared -o libvectors.so vectors.cu -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart

cuda_inline.rb

require 'rubygems'
require 'inline'
 
class MyWrapper
   inline(:C) do |builder|
      builder.include '"vector_test.h"'
      builder.add_compile_flags '-x c++', '-lstdc++', '-L/usr/local/cuda/lib', '-L.', '-lvectors', '-I.'
      builder.c '
         void wrapper() {
            vector_test();
         }'
   end
end
t = MyWrapper.new()
t.wrapper

Note here that I have to specify for RubyInline to look in the current directory for headers and libraries!

So, as long as libvectors.so and vector_test.h is in the directory you are running cuda_inline.rb from, everything is gravy! Maybe not the most efficient tool-chain, but it works!

$ ruby cuda_inline.rb
ld warning: in ./libvectors.so, file is not of required architecture
H has size 4
H[0] = 14
H[1] = 20
H[2] = 38
H[3] = 46
H now has size 2
D[0] = 99
D[1] = 88

  • Share/Bookmark