Arguments you can make as a startup CTO against filing for software patents

Most software engineers believe that software patents are bad for innovation, and shouldn’t exist. Unfortunately, this isn’t a good argument against filing for software patents, since it’s essentially a complaint about the rules of the game, but as a startup CTO you still have to play the game, regardless of what you think of the rules.

I thought my response to him might be more generally useful, as many CTOs of seed or A-round companies find themselves under pressure from non-technical co-founders and investors to spend time and money on software patents. I don’t always win the debate, and have been involved in filing a number of patents, however I can say with confidence that the return on investment on all of this time and effort was precisely $0 in every case.

Hopefully others can also share their advice or experiences.

Most people who advocate for patents within a company haven’t really thought it through in depth, they’re just advocating it because they think it’s what you need to do. Often they’ll ask a patent lawyer for advice, and they’re no more likely to argue against filing for patents than a turkey is going to argue in favor of thanksgiving dinner.

A few arguments against startups filing for patents:

  • They cost $10-$15k a pop, and can soak up a significant amount of your CTO’s time, often at a critical early phase of the company.
  • You don’t even get them for several years, by which time your startup has probably either succeeded or failed, and these patents will have played no part in it either way (except perhaps for the previous point where they probably hurt you by sapping money and time).
  • US jurisprudence has grown more restrictive about what can be patented over the past few years. What was clearly patentable 3 years ago is clearly unpatentable today. There is a pretty good chance that your patents will be rejected. In particular, business method patents (which patent lawyers used to love) are no-longer being granted.
  • “Defensive” patents don’t protect you from patent trolls (aka non-practicing entities), since they are immune to a counter-suit because they don’t actually do anything other than litigate or threaten to litigate.
  • Even if you don’t have an ethical problem with patents covering a real innovation, these are not the kind of patents most patent lawyers will want you to file for. Firstly because they will require a lot more work to create (and you’re only paying them $10-15k). And secondly, because they tend to be specific enough that they’re less useful as an offensive weapon.
  • Even if the patents do cover a real innovation, if it’s granted you’ve now disclosed something valuable to your competitors. They may not be able to copy it directly, but as you know there are typically many ways to solve any given problem in computer science, and now they know how you solved it.
  • Even if they’re granted, their only real value will be as part of a firesale in the event of your company’s failure, where they’ll probably be snapped up by a patent troll for less than you paid to file for the patent in the first place.

When people ask about my “IP defensibility” strategy, I generally argue for trade secrets. They’re free, require no effort, you’re not disclosing potentially important information to competitors, and they basically lack any of the other shortcomings I describe above.

Add a command to list git branches in order of last commit

This is based on this SO answer.

Wouldn’t it be useful if you could order git branches in order of the most recently used, so the ones you are likely to be most interested in are at the top? Here is how, just type:

$ git config --global alias.branches 'for-each-ref --sort=-committerdate refs/heads/ --format=\'%(committerdate:short) %09%(authorname) %09%(refname:short)\''
Now, just type:
$ git branches

And you’ll get something like:

2013-09-21 Ian Clarke gh-pages
2013-09-15 Ian Clarke master
2013-07-14 Ravi Tejasvi contactBook
2013-06-15 Ravi Tejasvi android
2013-06-08 Ian Clarke web-look-and-feel
2013-03-23 Ian Clarke cleanup-topology-maint
2012-06-12 Kieran Donegan topologyMaintenance
2012-05-28 Ian Clarke vaadin
2012-04-27 Ian Clarke refactor-of-peer-info
2011-07-07 Ian Clarke tr-remote-address-everything-needed

Note the date of the last commit, the committer, and the branch name.

Tungle: A wasted opportunity

Apparently Tungle has shut down development, although they still allow people to sign up. Turns out their acquisition by RIM last year must have been an acquahire (technically an acquisition, but really an admission of defeat).

Tungle had an incredibly viral business model, perhaps the most viral I’ve seen since Plaxo, solving a problem I and many others encounter on a near-daily basis:  Help people schedule meetings and calls with each-other.

So what went wrong? Their usability SUCKED. I desperately wanted Tungle to work, but almost every time I tried using it to schedule a meeting with someone, something would screw up and we’d have to resort to manually scheduling via email.  This was embarrassing when it happened, but even so I tried over and over again.  Every time I did could have been an opportunity for Tungle to sign up a new user, if their usability wasn’t so bad.

So if there is anyone out there looking for an idea that could be the next LinkedIn-scale viral phenomenon, all you have to do is reimplement Tungle, but this time get the usability right.  If I weren’t already rather busy I’d be doing this myself.

Microsoft probably just killed “Do Not Track”

Update (27th Oct 2012): I told you so!  Yahoo will ignore DNT from IE10 for exactly the reason I cite below.

Microsoft just announced that the “do not track” opt-out would be on by default in Internet Explorer 10.  This is a boneheaded move.

“Do not track” is a standard through which a web browser can inform a web page that the user does not wish to be tracked by third-party websites for the purpose of advertising.  So far as I can tell, respecting this is entirely voluntary on the part of the advertisers.

Advertisers often use browser cookies to track users, this allows them to target advertising specifically to people who’ve visited their website, for example.  Google and Microsoft both do it, it’s fairly standard practice these days.  Typically the advertiser isn’t tracking you as an individual, all they know is that you may have previously visited a particular website.

To explain why Microsoft’s move is boneheaded, I’ll relate a story from the early days of Revver, the online video sharing website that I co-founded back in 2004.

We had decided to let video uploaders tell us whether the video contained any content that is not appropriate for children as part of the upload process.  The vast majority of our users did exactly this and all was well, until at some point we realized that people were uploading some pretty serious pornography that we weren’t comfortable with even if it was marked as “adult” by the uploader.

Our panicked solution was to simply remove all videos marked as “adult” from the site, and prevent any further uploads where the videos were so-marked.

Of course you can predict the result: people immediately stopped marking videos as “adult”, making our task vastly more difficult.

The moral?  Don’t expect people to do something voluntarily if you are then going to use it against them.

I think Microsoft has just made exactly the same mistake.  Previously I think there was a reasonable chance that advertisers would choose to respect this, since only a minority of users are likely to enable it, and those are the people that really care about not being tracked.

But if it is enabled by default in Internet Explorer 10, advertisers now have no idea whether the user really cares about being tracked, and as a result they are far less likely to respect it.

Looking at it a different way, Microsoft just gave advertisers the perfect excuse to ignore DNT, because they can correctly claim that in most instances the user will have made no conscious decision to enable it.

Object Relational Mappers (ORMs) are a terrible idea

ORMs (like Hibernate in Java) are a flawed solution to an imagined problem that should be consigned to the wastebasket of history.

They are predicated on the idea that relational databases are old fashioned and should be avoided at all costs, but if you can’t avoid them, use some kind of convoluted wrapper that tries (and generally fails) to pretend that the relational database is something it isn’t – an object oriented datastore.

Often what people are really looking for when they use an ORM is some kind of database abstraction layer.  It is reasonable to abstract the database somehow, but I recommend not using an abstraction layer that seeks to pretend the database is something that it isn’t.

In Java, the best such database abstraction layer I’ve found is Jooq, I highly recommend it.

LastCalc: A powerful calculator meets Quora meets Siri

For the past month or so my main spare-time project has been a crazy idea called LastCalc (link at bottom).

I’ve been having trouble figuring out how to describe it, but here goes: Imagine a powerful web-based calculator that can answer your questions, a little like Google Calculator, Siri, or Wolfram Alpha, but where anyone can teach it how to calculate new things.

Additionally, rather than asking one question, you can ask a series of questions, each potentially referring to previous answers (programmers know this is a Read-Eval-Print-Loop or REPL).

Just like the others it supports basic math and unit conversions, like this (note: the highlighting is automatic and happens as you type – you type the bit before the big silver = and hit return, the answer appears after it):

But it goes a lot further. You can assign the result of a calculation to a variable, and then use it in subsequent calculations:

Internally LastCalc treats all numbers as rationals (x/y where x and y are integers) if possible, even if they are displayed as floating point numbers.  This means that it will not lose precision regardless of how many calculations you do (this can be a problem if using normal floating point numbers which are imprecise).

It’s not just simple numbers, LastCalc understands lists and associative arrays too, using a syntax very similar to JSON:

LastCalc is extensible, so if you find yourself repeating the same calculation over and over again, you can teach LastCalc how to do it (note: parameters are denoted by capitalization, like Prolog):

And it goes further, supporting pattern matching and recursion using these datastructures, just like languages like ML and Haskell:

Then use it with:

You can also pattern-match on maps.  Here I define a function that takes a map and returns a list of its keys:

Currently I’m working on a tutorial and help system so I don’t need to explain all of this before sending people to the site :-)

Right now you can only use functions that you define yourself, but in due course people will be able to share functions, much like they can share answers to questions with Quora.

So far it has only been tested in Chrome and Safari, and it definitely doesn’t work yet in Internet Explorer.  I’m waiting for the Javascript to stabilize before climbing that particular mountain.

Check it out at

It’s obviously a work in progress, if you’d like to follow discussion and provide me with feedback please join the LastCalc Google Group, or follow @LastCalc on Twitter.

The perils of treating a model’s predictions as actual probabilities

With our work on SenseArray we are in the fortunate position to apply machine learning to a wide variety of fascinating real-world problems, as diverse as predicting the likelihood that two people will have a successful date, the probability that someone will click on an ad, or the likelihood that a credit card transaction is fraudulent.

Sometimes all you care about is that predictions are correct relative to each-other. You’re looking for the best ad, or the best match on a dating website.

But in other situations you need an actual probability, not just a “score”. We quickly learned that it is dangerous, and frequently just plain wrong to assume that the output of your predictive model is a reliable probability.

As a specific example of this, imagine that you are using your predictive model to bid for the right to show someone an ad through an ad exchange. To do this you need to determine how much you expect to make if you show them the ad, and this depends on the probability of them click on it, and perhaps the probability of them making a purchase later.

In this situation it is critical that your probabilities are really probabilities or you’ll end up bidding too much, or not enough.

The first time we saw this we identified it as a form of “selection bias”, which is described in this paper (#1): Tweedie’s Formula and Selection Bias.

Essentially the paper states that if you are using predictions to pick things, perhaps the likelihood that someone will click on an ad, then the mean of the predictions that you pick will be higher than the actual rate at which people click on the ads. The reason for this, briefly, is that you are more likely to pick predictions with a positive error rather than a negative error, which creates an overall positive bias in the errors of what you’ve picked. Paper #1 does a better job of explaining.

Yesterday during a discussion on Reddit the aptly named Mr_Smartypants directed my attention to paper #2: Transforming Classifier Scores into Accurate Multiclass Probability Estimates, which describes another reason that you can’t treat the predictions of many learning algorithms as probabilities (see Figure 1 from the paper).

This particular problem doesn’t affect SenseArray because it operates in a fundamentally different way to most machine learning algorithms, but it does affect most other supervised learning systems that produce a probability as output.

This is fine if all you care about is that the ordering of the predictions is correct (ie. the relationship between the predictions and reality is isotonic), but if you want to treat them as accurate probabilities you’re in trouble.

Paper #2’s solution, or the one that most appealed to me, was to use a simple algorithm called “pair-adjacent violators” or PAV to determine a mapping from the predictions to the actual observed probabilities. PAV is a form of isotonic regression.  This seemed like it would be an improvement on our current approach. PAV is described in section 3 of paper #2, but the basic approach is this:

Create an list of pairs of predictions and the actual outcome, 1.0 or 0.0, sorted by prediction lowest to highest, like this:

(0.3, 0), (0.5, 1), (0.6, 0), (0.8, 1) …

Now, working through the list starting at the beginning, find a prediction pair where the outcome of the second is lower than the outcome of the first.  This is a “violator”, because we know that a higher prediction should mean a higher outcome.  In the brief example above the 2nd and 3rd prediction-outcome pairs are violators because while 0.6 is greater than 0.5, 0 is not greater than 1.

To resolve this, we remove this pair of violators, and replace them with a new pair consisting of the average (this isn’t quite correct, keep reading) of both the prediction and the outcome.  So now, our new list looks like this:

(0.3, 0), (0.55, 0.5), (0.8, 1) …

We carry on through the list doing this every time we find a pair of violators.  At the end of this process, we are guaranteed to have an isotonic mapping from predictions to outcomes.  Yay!

Actually, not so “yay”.  I decided to test the algorithm with this code:


The basic idea was for the “real life” probabilities to be the square of the prediction. So, if the prediction was 0.5, the real probability would be 0.25, and so-on.

Here I compare the curve determined by the PAV algorithm to what it should be:

As you can see, it is a terrible fit, completely useless for our purposes.  What went wrong? (edit: Turns out that paper #2’s description of the PAV algorithm was incorrect, my modification to it was actually the correct implementation).

After some thought I had an idea. Recall that the PAV algorithm works by “merging” pairs of samples repeatedly, using the average prediction and outcome of each to produce the replacement sample.

Using the average in this way assumes that both of the “parent” samples are of equal value, but are they? Well, no. One parent sample might have hundreds of “ancestor” samples, while another might not have any parents at all. PAV ignores this.

The result is that samples in parts of the curve that have a lot of these violations effectively get “watered down”, while samples in parts of the curve with few violations do not. The effect is to distort the shape of the curve, something that is very obvious from my experiment above.

The solution? We need to keep track of a “weight” for every sample, which basically counts the number of “original” samples that they are composed of.

Now, instead of doing a normal averaging of two parent samples, we do a weighted average, and the resulting child weight is the sum of the two parent’s weights.

The effect? Take a look:

Wow, looks like we have solved the problem! Frankly, I’m surprised that people haven’t noticed this flaw in the PAV algorithm in the past (edit: it wasn’t a flaw in the algorithm, it was a flaw in paper #2’s description of the algorithm). I assume that they were using it in situations where the violations were evenly distributed throughout the prediction curve, in which case the distortion may not be so pronounced.

I’ve emailed the authors of paper #2 and will update this post once I hear back from them.

Update: Ronny commented below directing me to another paper that describes the PAV algorithm, but it describes it with my “fix”. It appears that the description of the algorithm’s implementation in paper #2 was incorrect, and the correct implementation is the weighted approach I describe above. I suspected this might be the case, nice that it’s been confirmed.