Author Archives: ian

A (possibly) novel approach to image rescaling

We’ve all seen fictional examples of increasing the resolution of images to reveal previously unseen detail, here is a reminder:

Unfortunately, all of these examples are basically impossible, because they imply revealing information in the image that simply isn’t there.

Turns out that there is a way to do exactly this, described by this paper: Scene Completion Using Millions of Photographs.

So, anyone want to try applying this approach to “filling in” the missing pixels in an image that has been scaled up? The results would be really interesting.

Why doesn’t this exist yet: Syntax-aware merge

Hey programmers – what if I told you that there was a fairly interesting problem out there, that would probably be just a few weekends of work, and which if you implemented it you could find your code part of the toolset of a significant proportion of software developers worldwide (especially if you open sourced it)?

Well, just such a problem has been floating around for several years, and is as-yet unanswered.  I proposed it two years ago, and others have suggested it several years before that.

Anyone familiar with source control systems like svn and git will be familiar with the occasional need to “merge” changes.  This is to take changes to one version of a file, and apply the same changes to a different (but not too different) version of that same file.

Today’s merge tools, at least to the best of my knowledge, basically treat all text files the same.  Whether its Java, HTML, C++, or CSS, as far as the merge tool is concerned, they are all just lines of text.

This works ok in many situations, but it can get tripped up by situations like this:

   Book book = new Book(
        String name,
        int length);

Let’s say we change this to:

   Book book = new Book(String name, int length);

A programmer knows that all we’ve changed here is formatting, but a merge algorithm will see this has two lines deleted, and one line changed – quite a major alteration.

It is true that passing both files through a formatter would help in this case, but there are many other situations where it wouldn’t.  Consider renaming a variable throughout a file, inlining a variable, and other common refactoring operations.

My proposal is fairly simple.  Rather than treat Java files as simple text, we parse the files into an “Abstract Syntax Tree” using a Java parser, maybe even matching identifier names like variables, and treat the merge operation as a merge of these trees, rather than simple lines of text.

After the merge we convert the resultant tree back to text, perhaps even formatting it as we go.

The result of this, I suspect, would be dramatically more robust merge operations.

So what would be involved here?  Well, it shouldn’t be necessary to write a bunch of parsers, because these already exist for most languages.  The main thing would be the tree merge algorithm, which would be a neat self-contained computer science project.  Perhaps there is even an off-the-shelf open source tree merge algorithm out there.

Regrettably I’ve already got a bunch of projects on my plate already, but perhaps there is someone out there interested in taking on this challenge?  If they did I could almost guarantee that the resultant software (which I assume they’d open source) would be very widely adopted.

If you are interested I’ll be happy to help and advise. You can email me at syntaxmerge@ian.33barfmail.com.

A better way to pick passwords

Hands up how many of you use the same password for more than one website? How many of you use the same password for most or all websites?

This is extremely dangerous. Let’s say you sign up for a website, and you give them your email address (perhaps a gmail account), and then give them a password that happens to be the same as your gmail password. It is now trivial for them to hack your Gmail account and spam your friends.

Even if you only sign up for reputable websites, they can be hacked, as happened recently with Gawker (update: and even more recently with LinkedIn). Anyone who used the same password both for their email and for Gawker was immediately exposed.

Additionally, let’s say you use several passwords (my previous approach). You then run into the problem that you often forget which password you used where, so you have to try several of them (potentially revealing all your passwords to an unscrupulous website).

Another annoyance is that some websites have weird requirements for passwords, often they must be at least 8 characters in length, and contain a mixture of letters and numbers. If your default passwords don’t meet these criteria then often you have to modify them somehow, or pick new passwords entirely, and then of course you can never remember which variations you used for particular websites.

So what to do? A simple approach I use, which isn’t foolproof, but which is a big improvement over what most people do, is to base my password in some way on the domain of the website I’m visiting.

For example, let’s say you are coming up with a password for plentyoffish.com. One approach you might take is to start with the last 4 letter of the main part of the domain in reverse order, capitalizing the final one. And then add an additional 4 characters that you’ll always remember – ideally a combination of letters and numbers. Here are some example passwords following this scheme (using “5yty” as the final 4 characters in each case):

Website Password
plentyoffish.com hsiF5yty
www.google.com elgO5yty
facebook.com kooB5yty

While initially it might take you a few seconds to figure out the appropriate password for any given website, with a little practice it quickly becomes second-nature.

The good thing about a password scheme like this is that these passwords will meet the criteria of even the most fussy websites, because they are 8 characters in length, I’ve never seen a website that required more than 8 character passwords. Additionally, the passwords contain a mixture of upper and lower case characters, and numbers.

Now please don’t copy the exact approach I describe here. Perhaps instead of taking the last 4 characters of the domain, take the 2nd, 4th, last, and 2nd last – or something like that. It doesn’t matter, so long as you remember it.

Of course a weakness of this approach is that someone looking at your password for their site might be able to reverse engineer your system, but this involves a lot more work on their part than if you use the same password everywhere.

If you are concerned about this you could make your system more difficult to reverse engineer by, say, incrementing the letters you take from the domain name, so “abcD” becomes “bcdE”. Of course, this is at the cost of making it more difficult to figure out the appropriate password for an appropriate domain.

Related tip: When you are signing up for new websites, don’t give them your real email address. Use a service like 33Mail.com to create a new email address for each website. That way, if they start to spam you, you can just shut them down with a single click. I’ve been using 33Mail for a few months now (since OtherInbox stopped offering this functionality), and it has worked flawlessly. I may do a separate blog entry on this soon.

Another year, another way to host my blog

Two years ago, almost to the day, I switched to using Squarespace for my blog hosting. To quote my reasons at the time:

Well, I’ve switched blog engines once again. Several months ago I switched from a self-hosted WordPress to the WordPress.com service, because my blog kept getting hacked due to security holes in the open source version of WordPress.

But WordPress.com was far from perfect, they are very restrictive about what you can put on your blog (so no tools like Woopra or Google Analytics), only a small selection of approved plugins.

So I’ve decided to switch again, this time to SquareSpace. They let you use your own embedded code, which means I can now use the analytics system of my choice, and other neat tools like Google’s Prettify script.

Unfortunately it turned out that Squarespace has shortcomings of its own, and recently they became unbearable. Basically the problem is comment spam. Almost every one of my blog entries, especially the popular ones, had tens of spam comments. On my blog as a whole there may have been as many as a thousand. Whatever mechanisms Squarespace has for preventing comment spam are evidently ineffective.

Worse, Squarespace provides no convenient mechanism to delete comment spam en-masse, which would have meant spending hours deleting them manually, only to have them reappear at some later date.

My original reasons for moving away from self-hosted WordPress was that it was insecure and kept getting hacked. At the time this was especially damaging as the server hosting my blog was also used for other things, and they could have been compromised.

Well, it seems like the security situation may have improved with WordPress, and they have made it much easier to keep up-to-date with the latest version, so I’ve decided to try it again.

Hopefully I won’t regret it :-/

Meet Orville

This is Orville, our friendly neighborhood garden orb weaving spider (we live in Austin, Texas).  He is huge, probably an inch from head to butt, and his back legs are about 2 inches long.  Normally he is sitting in his gigantic web, but he has taken it down for the moment because we’ve been getting some serious rain the last few days, and I assume rain and webs don’t mix well.

And if you’d like to see a video:

The game theory of sealed-bid auctions

According to Wikipedia, a “frenemy” is someone who is simultaneously a partner and a competitor.

The term is typically used in social relationships, but frenemies can also exist in economic relationships, and I was fortunate enough to encounter such a situation in a project I’m working on, and I decided to explore it. The conclusions are directly applicable to at least one specific situation where real money is at stake (I describe this at the end). More interestingly, this may suggest a selfish rationale for unselfish behavior in a wider class of situations.

First, a little background:

Sealed Bid Auction

Consider a simplified version of eBay where everyone bids once on an item, nobody sees each-other’s bid, and the highest bid wins. This is called a “First-price sealed-bid auction”.

One day you find a trustworthy guy called Bill that promises that they can pay you $200 for a Nexus One phone. You discover that Nexus One phones frequently sell for less than $200 on eBay, and several of these phones are auctioned off every day. Bill hates to use eBay and won’t ever use it no-matter what, and he doesn’t really care what you bid for the phones, so long as he gets them. You realize that there is an opportunity to make some money here.

So what do you bid when you see one of these phones? Its a compromise, since if you bid $200 you’ll definitely win the phone, but you’ll make no money. If you bid less, then you’ll make more money if you win, but you’ll win less frequently. If you bid too little then you’ll never win and you’ll make no money.

This is not a hard question to answer if you are reasonably smart and have a decent amount of information about past winning bids, since you’ll be able to discover, given a bid b1, a function f1(b1) that will tell you the probability of winning the auction, given whatever you bid is. Let’s assume that you are smart enough and you do have enough information. Let’s say your cut of the profit is c1, then, applying some high school math, your expected profit is $200*c1*f1($200*(1-c1)).

From this equation you are able to decide the optimal bid (and therefore how big your cut is) to maximize your own profit. You can either do this through some fancy math, or just experimentally plug in different values for c1 until you find what works best, the latter being my preferred approach (because I’m lazy and computers aren’t).

Bill gets clever

Then Bill comes to you and tells you that he may not always be able to pay you $200 for the phone, sometimes he’ll pay more, sometimes he’ll pay less, but he will tell you what he will pay before you must place your bid. Fair enough you think, you’ve got your function f1(b1), you can determine the optimal bid depending on whatever Bill is willing to pay.

Weeks go by and you are making good money off this relationship. One night you go out for a beer with Bill, and he drops a bombshell. He tells you that actually, he is playing the same game you are. He knows a guy, Jim, who is buying the phones from him. He won’t tell you who Jim is (Bill isn’t an idiot), but it turns out that Jim is paying even more for these Nexus Ones than Bill is! Worse still, it turns out that not only is Bill no idiot, he is at least as smart as you are. He has determined the probability of you winning your bid, and is choosing his cut to optimize his total profit, in the exact same way that you are (although his function, f2(b2), won’t be the same as your’s because you are reducing his bid by your cut).

You go home drunk and don’t think about it much, but the next day you have a real headache, and its not just all the beer you had with Bill last night. What exactly is the relationship between you and Bill? In one sense, you are on the same side. If your cut combined with Bill’s cut is too big, then both of you will make less money. But in another sense, you are on opposite sides, the smaller his cut, the bigger your cut can be, and vice versa.

Given this relationship, and assuming that you can’t deal directly with Jim, Bill can’t deal directly with eBay, and Bill is at least as smart as you are, how do you maximize your profit?

My Simulation

A mathematician or game theorist at this point would probably go into a dark room for a week, month, or year, and come out with the mathematically optimal answer. Since I don’t have the time or patience required for a rigorous mathematical treatment, I decided to do a few experiments instead. You can find the code here.

The Auction

In the results described here, Bill’s buyer pays $1 (ok, we’re somewhat abandoning the Nexus One analogy here) to Bill for a phone. I wanted to create a reasonably realistic auction, which I do in the bidWinProb() method on line 30. Basically I simulate 100,000 auctions, each consisting of 10 bidders whose bids are spread according to a Gaussian distribution with mean of $0.50, and standard distribution of 0.1. I then record all of the winning bids, which allows me to quickly estimate a probability of winning for any given bid.

Yes, I realize that I could do this much more efficiently mathematically, but speed doesn’t really matter here, and this solution is simple enough that bugs are unlikely.

It has the added benefit that you can just provide it with actual data from a real auction, rather than simulated data.

Optimize Cut

The first step is that we want to be able to determine the optimal cut for ourselves, given whatever cut Bill has, or the optimal cut for Bill given whatever our cut is. This is done in the optimizeCut() method on line 78. The approach is basic trial and error. While a more sophisticated, accurate, and efficient approach is certainly possible, the current implementation is more than adequate for our needs (not to mention being easier to be confident that it does what its supposed to).

Experiment #1: Best combined cut

So what happens if Bill and I were entirely transparent with each-other, and agreed that we’d find the best cut for both of us, and then split it down the middle?

In this situation it turns out that Bill and I should collectively take $0.28 of the dollar Bill is paid if the auction is won (which when we consider the possibility of losing the auction, is an expected revenue of $0.24). Split down the middle this is an expected revenue of $0.12 each.

Experiment #2: Iterated selfish cut

But what if Bill and I can’t agree on an even split, and we are basically each left to our own devices to make as much as we each can?

One approach to do this would be for me to optimize my cut based on whatever Bill is paying me (which will depend on his cut), and for Bill to optimize his cut based on his perspective of his probability given various payments to me (which will depend on my cut).

When I tried this the results were interesting. I started from a point where Bill isn’t taking any cut at all, then each of us take turns to optimize our cuts based on the other one:

% Cut          $ Cut
Bill    Me      Bill    Me      Ttl $   1/2
0.000   0.280   0.000   0.244   0.244   0.122
0.100   0.220   0.080   0.159   0.239   0.120
0.140   0.190   0.109   0.128   0.237   0.118
0.160   0.180   0.119   0.112   0.231   0.115
0.160   0.180   0.119   0.112   0.231   0.115

This is interesting because they converge to where Bill takes a 16% cut, and I take an 18% cut. Further, when you look at the expected revenue, it is close to evenly split between us, but not exactly. Note that after this iterative process Bill and I collectively are making only $0.231, whereas we would be making $0.244 if we had worked together and split the difference.

Bill makes 3% more money if we work together and split the profit, and I make 9% more!

Experiment #3: Machiavellianism

My last two experiments I call “Evil Me” and “Evil Bill”. In “Evil Me” I exploit the fact that Bill will optimize his cut based on my cut, so rather than selecting my cut based on Bill’s cut, I select my cut to maximize my profit knowing that Bill will optimize for whatever is in his best interests given my cut. Evil Bill is the same but the other way around.

% Cut          $ Cut
Bill    Me      Bill    Me      Ttl $   1/2
0.100   0.270   0.055   0.134   0.189   0.094  <-- I'm evil
0.240   0.120   0.151   0.057   0.208   0.104  <-- Bill is evil

Wow, so in this situation being Machiavellian allows me to make $0.134 while poor Bill only makes $0.055! Bill reverses this if he is Machiavellian and I am not.

What is fascinating is that, while I increase my profit by 20%, Bill's drops to less than half of what it was before! One of us being Machiavellian helps that person a bit, but it hurts the other a lot.

Conclusions

So what are our conclusions based on this one experiment?

  1. If our sole motivation is to maximize our own profit, we should adopt the Machiavellian approach, even though this will totally screw Bill.
  2. If we at least have some semblance of compassion for Bill, the next best approach is to pre-agree a split with Bill, and then optimize together, as in Experiment #1.
  3. The pre-agreed split should probably be 50:50, because if Bill and I can't agree on a split and were to optimize against each other as in Experiment #2, then we wind up making approximately the same amount anyway, but we both make less than if the split is pre-agreed.

An important caveat is that these conclusions are based on a single experiment that makes a variety of assumptions. These assumptions seem realistic enough that hopefully these results do generalize, but we can't say that for sure.

Unanswered questions

Some may feel dissatisfied by the fact that I've answered these questions experimentally, rather than explore the math behind why we get these results. I would strongly encourage such people to feel free to explore themselves, especially if they enjoy algebra and statistics. I'll gladly link to any serious attempts here.

Further, I've explored a situation where one participant is manipulative, while the other tries to make the best of whatever situation they are in. But what if both participants are manipulative? My suspicion is that in that case it degenerates into a Game of Chicken with an ultimate outcome that is far worse for both participants, but further investigation is definitely warranted.

Why is this problem relevant?

Realtime bidding is a recent innovation in advertising where advertisers bid against each-other for the right to show you an advertisement. The analogy with the situation described above is that the sealed-bid eBay is an ad exchange, I'm an ad network, and Bill is the advertiser. Bill is selling something where he has a (hopefully) known profit margin, and he must decide how much of this profit margin he will spend to sell his product. Similarly, I as the ad network must decide on my cut.

Note 26th July 2010 - This is a republish of an article from 24th July with significant additions and modifications.

A concise explanation of what is wrong with software patents

I was recently pointed to this comment by John Carmack, co-founder of Id Software, regarding software patents.  I thought he gives a beautifully concise explanation as to why the vast majority of software engineers would prefer it if software were not patentable:

Before issuing a condemnation, I try hard to think about it from [a Lawyer’s] point of view — the laws of the land set the rules of the game, and lawyers are deeply confused at why some of us aren’t using all the tools that the game gives us.

Patents are usually discussed in the context of someone “stealing” an idea from the long suffering lone inventor that devoted his life to creating this one brilliant idea, blah blah blah.

But in the majority of cases in software, patents affect independent invention. Get a dozen sharp programmers together, give them all a hard problem to work on, and a bunch of them will come up with solutions that would probably be patentable, and be similar enough that the first programmer to file the patent could sue the others for patent infringement.

Why should society reward that? What benefit does it bring? It doesn’t help bring more, better, or cheaper products to market. Those all come from competition, not arbitrary monopolies. The programmer that filed the patent didn’t work any harder because a patent might be available, solving the problem was his job and he had to do it anyway. Getting a patent is uncorrelated to any positive attributes, and just serves to allow either money or wasted effort to be extorted from generally unsuspecting and innocent people or companies.

Yes, it is a legal tool that may help you against your competitors, but I’ll have no part of it. Its basically mugging someone.

 

 

How to update your Nexus One to Froyo (2.2)

Here is how to manually update your Nexus One (the original model, non-AT&T, non-rooted) to Froyo, Android 2.2.

  1. Download the update from here and save the zip file.
  2. Put the update onto your SD card and rename it update.zip (Note that if you’re using Windows and don’t have “show file extensions” turned on in the file explorer you won’t see a .zip. Just rename it to “update” (no quotes, of course) because it’s already a zipped file).
  3. With your Nexus One off, hold down the trackball and press the power button.
  4. You’ll be booted into a white screen with three Android robots on skateboards. Select “Bootloader.”
  5. On the next screen, select “Recovery.”
  6. Your phone will reboot, giving you a picture of the Android robot and an exclamation point inside a triangle.
  7. Now press the power button and volume up button at the same time. It could take a couple of tries.
  8. Now (using the trackball this time) choose “Apply sdcard:update.zip” and let things run their course, it will take several minutes.

 

Google Wave goes public but misses obvious viral opportunity

So Google Wave has opened its doors to the public – yay!

Now you can just create a Wave, and enter anyone’s email address and they will automatically be invited, and signed up for Wave if they don’t already have an account – right?  

Wrong.  If you try you get an unhelpful message telling you that they aren’t a Google Wave user:

 

How Google could miss this completely obvious opportunity to ensure viral adoption of Google Wave?

What should happen is as I described it – from my perspective it should be no different than sending them an email.  From their perspective they should get an email saying that someone wishes to have a Wave conversation with them, and give them the opportunity to easily and transparently sign up for Wave.

If, for whatever reason they don’t want to use the Wave UI, then I guess they should be able to reply and participate in the conversation through email.

Wave is designed to break down the barriers between different means of communication through its plugin architecture (eg. its Twitter support).  How then could Google not see the importance of breaking down the barriers between Wave and the primary communication medium its supposed to replace: Email?