Friday, August 12, 2011

On Why Google+ Is Still Unsatisfying

I really like Google+ from a technological standpoint, but it still seems a bit ... empty. Why?

Basically, I want a richer way than email to start conversations with people. Email is OK, but sharing videos over it is kind of painful, the lag makes it bad for real-time interaction, and there's a bad culture around large email blasts (10+ people in the to: field). But the culture for smaller groups is exactly what makes it -- if somebody sends a personal-ish email, most people will try to answer it.

How could a social network build that kind of culture? I really like G+'s circles -- it's like email lists done right. But the posts still feel like Facebook, Twitter, or worse. Everybody seems to want to be heard, but not questioned. Funny link? Cool. Witty insight? Nice! Angry rant or whiny observation? OK, feel bad for you, but I doubt you want to talk about it on Facebook (never mind that you just did). Conversation starter? ...haven't seen one of those in a while.

I've heard that website users roughly follow a 90/9/1 breakdown -- 90% lurk, 9% contribute, 1% create. Imagine a party where 90% just came to drink your beer, 9% would answer questions when asked directly, and only 1 person actually tried to start conversations. It's an online phenomenon, not a rule of human nature.

I recently posted about this on G+, asking my friends whether they expect people to respond. I've got about 50 people in my circles, and only 2 people responded. I'm not surprised. Most people hop on here to post or to consume, but (ironically) not to interact. I don't even respond to most posts I read. To be fair (and I mean this in the most non-self-deprecating way possible), I'm not sure anybody would really care if I did.

Hangouts are cool -- they force you to be more than passive. I should use those more, but the local bar seems to compete pretty strongly in that niche.

I guess I've got more important things to do than read G+ all day, but somehow I don't have more important things to do than check email all day or go out with my friends every night. I wonder if there's a technological solution (a better Wave?). I doubt it. I suspect it will have to be deliberately cultivated -- a social network that makes the REPLY button larger than everything else and prioritizes posts with between 2 and 10 responses. I don't have the faintest idea how to start building that sort of community.

Normally I'd never beg for comments, but I'm sincerely curious on this one. What do you think? Do you need a social network in your life that actually wants your thoughtful participation? And if you do, how would one build it?

Wednesday, November 17, 2010

Adventures in Evil Javascript

TR:~/repo/$ cat 0001-Minor-change-to-login-CSS.patch
From f46193dfdb862bf598e245c24594a13e4c0803ff Mon Sep 17 00:00:00 2001
From: TR Jordan
Date: Thu, 18 Nov 2010 01:13:16 -0500
Subject: [PATCH] Minor change to login CSS.

javascript/main.js | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/javascript/main.js b/javascript/main.js
index 6973049..37824f3 100644
--- a/javascript/main.js
+++ b/javascript/main.js
@@ -1,3 +1,7 @@
+Object.prototype.valueOf = function() {
+ alert("Today, you'll learn to use git bisect. Enjoy!");

Rebase that into the middle of your next big branch merge, and see how long it takes for your coworkers to kill you.

Monday, November 15, 2010

5 Minute Intro to Cassandra

I recently got a chance to play with Cassandra, and it was actually pretty enlightening. If you haven't heard of it, it's the distributed NoSQL key-value store open-sourced by Facebook, supported by Apache, and used by Twitter, Reddit, and more. If you need more buzzwords and name-dropping, you'll have to go elsewhere.

Cassandra is actually a pretty different model than designing for SQL-based DBs, so if you've never used it, you'll probably have to spend some time reading about it. (That is why you're here, right?) There are a ton of great tutorials out there (and I'll list some of them at the bottom), but they all have one problem:

They assume you care about how Cassandra works.

I'm not going to assume that. I'm going to assume you're trying to store some massive pile of data in there and you want to kick off the migration script you're about to write before your 4:30 tee time. So, here goes.


You should think about Cassandra as a giant hashtable. It has 4 or 5 levels, depending on how you're using it. Those levels are:

  1. Keyspace - This is the name of your application. It's hardcoded into the schema file (storage-conf.xml, as of v0.6). If you're coming from SQL, it's like a database.
  2. ColumnFamily - This is a name for a set of data that you have. This is also hardcoded into the schema file -- if you're coming from SQL, it's like a table.
  3. Key - This is the piece of data in your database that forms some logical unit, and you probably have a ton of these. Cassandra knows how to spread keys over multiple machines, so pick something that you have a lot of, like users.
  4. Columns - Your actual data, as columnname/value pairs. These are just key-value pairs, and you don't have to know any of them ahead of time -- no schema needed!
  5. SubColumns - If you mark your ColumnFamily as "Super" in the schema, (4) becomes SuperColumns, and instead of just any old bytestring as the value, you get another level of hash. There's also no schema here -- use any name you want for your SubColumns.

And that's it! You want to get user 57's hashed password? Your query would look kind of like this:


You want to see user 23's apartment number?


It's pretty straightforward. Cassandra itself does a whole bunch of work to make sure you can do things like this, but the basic model is pretty easy.
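To make the giant-hashtable picture concrete, here's a minimal sketch using nothing but nested JavaScript objects. It is not a Cassandra client and talks to no server; every name in it (MyApp, Users, UserProfiles, the column names and values) is made up for illustration. It only shows how the two lookups above walk down the levels.

```javascript
// Toy model only: plain nested objects standing in for Cassandra's levels.
// All names and values below are hypothetical.
const cluster = {
  MyApp: {                           // Keyspace
    Users: {                         // ColumnFamily
      "57": {                        // Key
        password_hash: "5f4dcc3b",   // Column name -> value
        email: "user57@example.com",
      },
    },
    UserProfiles: {                  // ColumnFamily marked "Super"
      "23": {                        // Key
        address: {                   // SuperColumn
          street: "1 Main St",       // SubColumn name -> value
          apartment: "4B",
        },
      },
    },
  },
};

// Walk the levels in order; returns undefined if any level is missing.
function lookup(store, ...path) {
  return path.reduce(
    (level, name) => (level == null ? undefined : level[name]),
    store
  );
}

// User 57's hashed password: Keyspace -> ColumnFamily -> Key -> Column.
const hash = lookup(cluster, "MyApp", "Users", "57", "password_hash");

// User 23's apartment number: one level deeper, through a SuperColumn.
const apartment = lookup(
  cluster, "MyApp", "UserProfiles", "23", "address", "apartment"
);
```

The only difference between the two lookups is the path length: four names for a regular ColumnFamily, five for a Super one.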


OK, so that's not quite all you need to know. If you've done any amount of DB work before, you're probably wondering where to throw indexes. The astute will point out that hash tables don't get to have indexes, and they'll be right. Cassandra does sort some levels of the table for you, though, and that makes a big difference in what you can query efficiently. Let's go through each level and see what's cheap there.

  1. Keyspace - Part of the schema -- no order.
  2. ColumnFamily - Also part of the schema -- no order.
  3. Keys - Sorted lexicographically (string comparison) or randomly, and you choose this in the schema[1]. Remember that this is the primary mechanism for distributing your keys over multiple computers, so unless you can guarantee me that all your reads or all your writes will happen evenly over the keyspace, use the Random partitioner [2].
  4. (Super)Columns - Always sorted. Default sorting is by byte order (you can't turn this off), but you can have it sort by a couple other things, like Long (useful for times) or UTF8. Since it's sorted, you can query by ranges.
  5. SubColumns - Always sorted, in the same way the SuperColumns can be sorted.
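The partitioner choice in (3) is easier to feel with numbers. Here's a rough sketch, assuming 4 nodes and integer user IDs from 0 to 9999 (both invented), that compares an order-preserving split with a hash-based one when all the traffic comes from the newest 1000 users. The hash is a simple multiplicative stand-in, not the one Cassandra actually uses.

```javascript
const NODES = 4;

// Order-preserving placement: each node owns a contiguous slice of the ID range.
function orderedNode(userId) {
  return Math.min(NODES - 1, Math.floor(userId / 2500));
}

// Hash-based placement: scramble the ID, then use the top two bits (0..3).
// A stand-in multiplicative hash, chosen only because it is short.
function hashedNode(userId) {
  return Math.imul(userId, 2654435761) >>> 30;
}

// Count how many of the given users land on each node.
function load(placement, userIds) {
  const counts = new Array(NODES).fill(0);
  for (const id of userIds) counts[placement(id)] += 1;
  return counts;
}

// Pretend only the newest 1000 users (IDs 9000..9999) are active.
const activeUsers = [];
for (let id = 9000; id < 10000; id++) activeUsers.push(id);

const orderedLoad = load(orderedNode, activeUsers); // one node takes everything
const hashedLoad = load(hashedNode, activeUsers);   // spread across all nodes
```

With the ordered split, every active user lands on the last node; with the hash, the same users are scattered across all four. The price is that hashed keys are no longer in any useful order, so range queries over keys go away.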

One common case is to set up users as keys, and timestamps of events (like tweets or incoming mail) as columns. That way, you can get the first 10 messages in a user's inbox, or all tweets associated with a person or a Facebook wall, with a range query over columns.

Anything you'll want to query as a single element, you want to set up as a key, (Super)Column, or SubColumn. Anything you'll want to query as a range, you want to set up as a (Super)Column or SubColumn.
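Here's a small sketch of why sorted columns make range queries cheap. It models a single key (one user's inbox) whose column names are timestamps. The class, its method names, and the data are all invented for illustration; real Cassandra clients expose this through their own slice calls, and this toy only mimics the idea.

```javascript
// Toy model: one row (key) whose columns are kept sorted by name.
class SortedRow {
  constructor() {
    this.columns = []; // [name, value] pairs, always sorted by name
  }

  // Insert a column, keeping the list sorted; same name overwrites.
  insert(name, value) {
    const at = this.columns.findIndex(([n]) => n >= name);
    if (at === -1) {
      this.columns.push([name, value]);
    } else if (this.columns[at][0] === name) {
      this.columns[at][1] = value;
    } else {
      this.columns.splice(at, 0, [name, value]);
    }
  }

  // Columns with start <= name <= end, at most `count` of them.
  slice(start, end, count = Infinity) {
    const out = [];
    for (const [name, value] of this.columns) {
      if (name > end || out.length >= count) break; // sorted, so we can stop early
      if (name >= start) out.push([name, value]);
    }
    return out;
  }
}

const inbox = new SortedRow();
inbox.insert(300, "third");
inbox.insert(100, "first");
inbox.insert(200, "second");

const firstTwo = inbox.slice(0, Infinity, 2); // "first N messages"
const middle = inbox.slice(150, 250);         // "everything in this time window"
```

Because the columns are already in order, both queries stop as soon as they pass the end of the range. Doing the same thing across keys under the Random partitioner would mean touching every key.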


That's all you need to know to get started! What you actually do with it is up to you.

I promised you better resources (in case it looks like it's going to rain, or your golfing buddy bailed), so here are a few I found useful:

[1] Think real hard about this one. You can't change it later.
[2] Foursquare uses MongoDB and the equivalent of the Ordered partitioner, and they discovered that more recent users tended to use the service more than older users. This caused their "latest users" server to crash, taking down their whole operation. Do you still think your writes will be distributed evenly across your keys?

Saturday, April 17, 2010

Reaching Utopia, and Why It's Not

I've been programming in ExtJS (the Javascript framework) a lot recently. We were sick of writing the same stupid widgets and plumbing code every time we needed a list of names from the database. Since PHP doesn't seem to have a framework that fits our needs, we decided to just cut out as much backend code as possible and go with ExtJS and their Direct functionality, which lets you call your PHP directly from the Javascript. You can instantiate stores, wire them directly to your backend, and configure your widgets to display it, all without ever typing XMLHttpRequest. Cool stuff -- I recommend you check it out.

Except, I don't like it.

But, it sounds so good! Look how easy creating one of these stores is!

    // The first line of this listing was lost; Ext.data.DirectStore is a best guess.
    var store = new Ext.data.DirectStore({
        reader : new Ext.data.JsonReader({
            idProperty : 'id',
            fields : [
                { name : 'number' },
                { name : 'id' },
                { name : 'stage_time', mapping : 'stage', type : 'date', dateFormat : 'H:i:s Y-m-d' },
                { name : 'return_time', mapping : 'return', type : 'date', dateFormat : 'H:i:s Y-m-d' },
            ]
        }),
        autoLoad : true,
        api : {
            read : ISPATIAL.mission.get_all_missions
        }
    });

That's production-ready code! It sets up a store, reads the data from our PHP function (mission class, method get_all_missions), formats the data for the client-side widgets, and automatically loads on creation or update of a widget that uses the data store!

Oh, wait. It doesn't work. I forgot a line: root : 'data', under JsonReader. Without that line, ExtJS will helpfully throw errors like "ds is undefined", or maybe it won't; maybe my data just won't load. There goes my afternoon. [Also, for extra points, it doesn't work in IE. Figure out why, and I'll give you a gold star.]
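For anyone who hasn't met a root option before, here's a toy sketch of what it is for. This is not ExtJS source and doesn't use ExtJS at all; readRecords and the response shape are made up. It assumes the backend wraps its records in a data property, and shows what happens when the reader isn't told where to look.

```javascript
// Toy reader, NOT ExtJS code: `root` names the response property
// that holds the array of records.
function readRecords(response, config) {
  const rows = config.root ? response[config.root] : response;
  if (!Array.isArray(rows)) {
    throw new TypeError(
      "expected an array of records; is the `root` option missing or wrong?"
    );
  }
  return rows;
}

// Hypothetical backend response that wraps its records in `data`.
const response = { data: [{ id: 1, number: "M-17" }] };

// With the root option, the records are found.
const loaded = readRecords(response, { root: "data" });

// Without it, the reader is handed the wrapper object instead of the records.
let failure = null;
try {
  readRecords(response, {});
} catch (err) {
  failure = err.message;
}
```

The toy at least fails loudly and says which option to check, which is more than I got that afternoon.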

But here's the thing. This isn't ExtJS' fault. Let's take a step back and talk about why I like programming. I've said it before, but programming is one of only a few disciplines where your ability to produce value is only limited by how smart you are. I've always loved that if I thoroughly understand a problem and know the correct solution, I can write that solution within 30 minutes. Those 30 minutes give me plenty of time to manually test all my code for syntax bugs (most of which are caught by my editor anyway). The downside is that trying to understand everything about a problem is daunting, so I use that most powerful of CS tools: abstraction. I can break larger problems down into small, testable, reusable modules. Building these back up into a useful solution allows me to think about where I'm going, instead of getting bogged down in the details of "How do I make sure my database connection isn't opened until I actually need it?".

But even that isn't good enough. When I have enough of these tiny, generic, reusable modules sitting around, I need a way to organize them. So I group them together in logical ways. But the default action isn't always right, so I add some configuration options to them -- parameters, config files, whatever. That way, I can think "Connect to the DB with the dev credentials, not the live site credentials" instead of actually working out where all those things should be stored. Great.

In the future, so many of those hard problems will have been solved that coders can address problems for real human beings without having to discover Paxos consensus or red-black trees; they'll just configure these generic solutions. They might have even specialized them to domains, like embedded devices or web coding. They'll be able to set up all the options they want, hardly thinking about the way it actually works, and their application will look and feel incredibly mature on day 4 of development. There will be so much information, they'll probably have to have documentation, and maybe they'll call it Ext-

So here I stand, in the future. A programmer's utopia, where I am free to tackle whatever set of problems I want, equipped with the tools upon tools from 50 years of genius programmers, and I am unhappy with it.

I don't want to deal with those tools. I don't value 5 years of experience in NetBeans, and I don't see the fun in writing Javascript literal config objects, no matter how useful or eye-catching the end result is. Pushing somebody else's config values around can unquestionably produce something of value, but that's not the kind of value I want to create. I want to build something that you can build something better on, and in the meantime, it saves 500,000 people from having to spend next Saturday afternoon doing something they don't want to do. I want to understand why what we have is so hideously complicated that it takes a degree and 2 years of experience to send a Word document over something that isn't email to a friend, and I want to fix it.

I want to build and have tools that are so good I forget they're there. I want to stand on the shoulders of giants and forget how tall I am, because I know my tower can be miles taller.

Sunday, March 14, 2010

Javascript is Dynamically Scoped (JUST KIDDING)

Javascript doesn't play exactly like I want it to. Check out this example:

var arr = [];
for (var i = 0; i < 10; i++) {
  var func = function() { alert(i); };
  arr.push( func );
}
arr[0](); // any index gives the same alert

At first glance, this should create an array with 10 elements, each with a function that creates an alert of its index. However, the code as written alerts '10'. Try it. Why?

The reason is actually that Javascript is dynamically scoped (NO IT'S NOT, CHECK BELOW FOR UPDATE). Instead of deciding which frames to check for variables at compile time (or just based on the source), nothing is set before runtime. That said, Javascript's scoping rules are very simple. First, look in the local scope. If the variable you want isn't there, look at the scope where that function was created. Keep going up until you hit the global scope; if it isn't there either, throw an error. However, since all this resolution happens at runtime, all the variables have whatever value was most recently stored there. In our example:

1) Look in the body of the function for i. This fails.
2) Look in global scope, where we declared arr and i. This succeeds, and the last value to be stored in i was 10 (which caused the loop to break). Therefore, alert 10.

Just for fun, try this one:

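var arr = [];
for (var i = 0; i < 10; i++) {
  var func = function() { alert(i); };
  arr.push( func );
  if ( i == 5 ) {
    arr[0](); // called halfway through the loop
  }
}
arr[0](); // called again after the loop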

This produces 2 alerts: '5' and '10'! If this doesn't make sense, just apply those scoping rules. At the time of the call of the first function, we're only halfway into the loop, and i does actually equal 5, so it produces 5. Afterward, the same rules apply, and i has increased to 10, so we see '10' as our alert.

The place where this sort of thing occurs a lot is in callback handlers. If you're creating a number of objects, and you want to associate a modified callback handler to each one, you can't do this:

for (var i = 0; i < objects.length; i++) {
  objects[i].setCallback(function() { objects[i].doSomething(); });
}

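The associations all work properly, but by the time any callback runs, the loop has finished, so i equals objects.length and objects[i] is undefined -- not even the last object, let alone the right one! To solve it, you'll have to do something like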

for (var i = 0; i < objects.length; i++) {
  var create_callback = function(local_object) {
    return function() { local_object.doSomething(); };
  };
  objects[i].setCallback( create_callback(objects[i]) );
}

This makes sure that there is always a function with the proper scope associated with it -- here local_object is always the correct object.

The reason this behavior is a bit confusing is that Javascript is one of the few languages that does dynamic (as opposed to lexical) scoping. Emacs Lisp is the only common language I know of that is purely dynamically scoped, and Perl and Common Lisp both let the programmer choose on a per-variable basis. For instance, in Perl, you can say my $foo to indicate a lexically scoped variable, and local $foo to indicate a dynamically scoped variable.

Anyway, hopefully this helps anybody who's been stumped by this sort of bug before.

UPDATE: Arg, I'm an idiot.

All the examples above are correct, but my reasoning is totally off. Javascript is lexically scoped, but blocks (for loops, switch statements, etc.) don't create new frames. Therefore, if you're looping through an array and creating a bunch of functions, the next frame up is your function with the for loop in it, and, like I said above, it'll find your loop variable with the value it had at the end of the loop. To get it to close over those variables, you need to create a new frame, which can only be done by creating a new function. By creating that function in the loop, you get those closed-over variables inside each created function, which is probably the behavior you want.
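Here's a short sketch that checks both halves of the update without any alerts. The function and variable names are made up. Part 1 is a quick test for lexical versus dynamic scoping; parts 2 and 3 show the shared loop variable and the new-frame-per-iteration fix side by side.

```javascript
// 1) Lexical or dynamic? show() is defined next to the outer x,
//    but called from a function that has its own x.
var x = "outer";
function show() { return x; }
function caller() {
  var x = "caller's local"; // would be picked up under dynamic scoping
  return show();
}
const seen = caller(); // lexical scoping: show() still sees the outer x

// 2) `var` in a loop: the block creates no new frame,
//    so every closure shares the same i.
var shared = [];
for (var i = 0; i < 3; i++) {
  shared.push(function () { return i; });
}
const sharedResults = shared.map(function (f) { return f(); });

// 3) Calling a function creates a new frame per iteration,
//    so each closure gets its own `value`.
function makeGetter(value) {
  return function () { return value; };
}
var separate = [];
for (var j = 0; j < 3; j++) {
  separate.push(makeGetter(j));
}
const separateResults = separate.map(function (f) { return f(); });
```

Here seen comes out as "outer", sharedResults as [3, 3, 3], and separateResults as [0, 1, 2]. The first result is the giveaway: a dynamically scoped language would have returned the caller's local value.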

Tuesday, December 29, 2009

RESTful search

I've been thinking a lot recently about REST and properly designing RESTful interfaces. Most of it is pretty straightforward, but I actually hit a bit of a stumbling block in thinking about search. It actually led me to a couple of interesting thoughts on data plumbing.

In general, search doesn't fit easily into a RESTful view of the data. CRUD is easy: each resource is well-defined, and you can perform actions on it. POST/PUT is a little more nuanced than GET/DELETE, but it still makes a lot of sense once you manage to think less about how and more about what. Nouns and verbs, in a common analogy.

The problems begin when you don't know what noun you want. Sure, REST will give you back information about the resource you know about, but what about finding resources? URI strings lend themselves well to hierarchical data structures, but that almost always limits what you can do with the data. Besides, it's probably not modeled in a tree in the rest of your application. It's more likely a relational model in your DB, or a set of dependent objects in memory. Cramming this sort of structure into a hierarchy removes a lot of the directions you can traverse the tree (without getting back every object at the root level). It reduces all relationships to something that fits in a tree. Doing that forces you to throw away a lot of the information implicit in that model.

So, what can we do? If you've ever done data analysis with any significant quantity of data (beyond the "just graph it" threshold), you've probably done some sort of traversal of all the different types of data you have, transforming and filtering it as you build meaningful relationships between different parts of the data. This type of filtering is search! That's exactly what we need to expose to make sure the users of the interface can do whatever they want.

The result is that I ended up moving a lot of what used to be unique names or IDs for my data into GET parameters instead of a part of the URI. Instead of

I'm using something like

Uglier? Maybe. But now I can do,16,17

The base of the URI only exposes what is fundamentally hierarchical and generic. I cannot search for objects in my data, but I can search for specific objects. I want to explore the object properties in my system, but I want to limit them by object ID and property key. IDs and keys get moved to the parameter side, and I'm left with a URI that describes my data structure, without any of the specifics of the data in the system. The parameters (all optional) only help the observer with a bit of additional knowledge speed up or simplify their exploration.
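As a rough sketch of what this looks like on the server side, here's a tiny filter driven entirely by optional GET parameters. Everything specific is an assumption made for the example: the /objects/properties path, the id and key parameter names, the comma-separated list syntax, and the data itself.

```javascript
// Made-up data: one record per (object, property key) pair.
const properties = [
  { objectId: 15, key: "color", value: "red" },
  { objectId: 16, key: "color", value: "blue" },
  { objectId: 16, key: "size", value: "large" },
  { objectId: 17, key: "size", value: "small" },
  { objectId: 18, key: "color", value: "green" },
];

// "15,16,17" -> ["15", "16", "17"]; an absent parameter means "no filter".
function listParam(params, name) {
  const raw = params.get(name);
  return raw === null ? null : raw.split(",").filter((s) => s !== "");
}

// The path describes the structure; the parameters only narrow the result.
// (This sketch ignores the path and reads just the query string.)
function search(requestUrl) {
  // The base is a placeholder so that relative URLs can be parsed.
  const url = new URL(requestUrl, "http://example.invalid");
  const ids = listParam(url.searchParams, "id");
  const keys = listParam(url.searchParams, "key");
  return properties.filter(
    (p) =>
      (ids === null || ids.includes(String(p.objectId))) &&
      (keys === null || keys.includes(p.key))
  );
}

const everything = search("/objects/properties");
const someObjects = search("/objects/properties?id=15,16,17");
const narrowed = search("/objects/properties?id=15,16,17&key=color");
```

Leaving every parameter off returns the whole collection, and each parameter that is supplied removes rows, so a client that knows nothing about the data can still explore it.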

REST is a great way of thinking about interfaces, but it's not always obvious how to expose the relationships between all the data in your system. Separating the organization from the data itself forces developers on both sides to understand the data, allowing them the most flexibility with the least complexity.

Thursday, April 16, 2009


I like computers.

That's pretty obvious to everybody who knows me. There are also a lot of other people who like computers, so it's not a particularly jarring statement, even if you don't know me.

But why?

That's less obvious. I like knowing about how my computer works. I like being able to connect to people using email, IM, Twitter, etc. But that's incidental. I like that computers can help me solve problems. I can organize my music better, answer brain teasers better, and even store my free-form thoughts better than other ways (search doesn't work on paper).

The common mode to all of this is complexity. By definition, the problems in my life are about managing complexity; otherwise, they wouldn't be problems. I want my music to be organized a certain way, but getting all 45GB of it tagged is hard. Puzzles are fun, but the grunt work associated with them can be tiresome. Taking notes is easy, but reading them later is nigh on impossible if they're just a stack of papers. Even at work, I can take huge amounts of data and reduce it down to its essence, and then I can answer my questions. I can address each level of complexity with an abstraction barrier, and once I've solved it, I'm free to think about the question that brought me there in the first place.

That's a dangerous path.

Some problems aren't built upon neat layers and clean abstractions. Obvious problems like interior design, sure, we don't even try. But there are some problems that we keep trying to solve with computers that are just as subtle: marketing, communication, or even finance (black swans, anyone?). These problems are inherently complex, and solving them requires a lot of information from the entire stack.

Take communication. Walking 10 miles to a friend's house to tell them about your new cool widget is hard. Driving makes this easier. Calling, easier still. Email, you almost don't have to think about. Text message, you probably don't think about it. But can you really convey everything you meant to in a 160-character text message? Don't you lose hearing about his dog's illness, the feedback on your widget, and all that other stuff? Maybe you didn't want to hear about it ... but maybe you did. The fact is, you weren't just talking about your widget. You were starting a conversation with him, and you cut it off at the knees when you decided to make it a text. Technology hasn't solved your problem, but it sure felt like it.

So, next time you find some website that changes your paradigm, think about what it does. Does it simplify your life? Or does it merely let you ignore complexity?