Saturday, Nov 14, 2009

Polynomial regression on a large dataset in Java

For an internal project at my day job I needed to find the degree-n polynomial that best fits a (potentially very large) dataset, something like this:

[Illustration: a curve fitted to a set of data points]

Note: Yes, I know the function illustrated above isn't a polynomial; this was the best illustration of curve fitting I could find on short notice. I hope you'll forgive me.

I asked a question on Reddit about this, and Paul Lutus was kind enough to respond with a link to some code that did something close to what I needed. However, Paul's code was not well suited to very large amounts of data.

So with his help, I modified his code to decouple it from the original application it was a part of, make it follow good Java coding practices a bit more closely, and in doing so make it more flexible and well suited to handling very large datasets.
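To give a flavour of how a fit can cope with a very large dataset, here is a minimal sketch of one common approach: keep only running sums of powers of x (and of y times powers of x), then solve the least-squares normal equations once at the end. This is not PolynomialFitter.java and not Paul's code. The class name StreamingPolyFit and its methods addPoint and solve are hypothetical names chosen for this illustration, and the singularity threshold is an arbitrary assumption.

```java
/**
 * Illustrative sketch only (hypothetical class, not the library from this post).
 * Least-squares polynomial fit that stores running sums instead of the points,
 * so memory use depends on the degree, not on the size of the dataset.
 */
class StreamingPolyFit {
    private final int degree;
    private final double[] powerSums;   // powerSums[k]  = sum of x^k,     k = 0..2*degree
    private final double[] momentSums;  // momentSums[k] = sum of y * x^k, k = 0..degree

    StreamingPolyFit(int degree) {
        if (degree < 0) throw new IllegalArgumentException("degree must be >= 0");
        this.degree = degree;
        this.powerSums = new double[2 * degree + 1];
        this.momentSums = new double[degree + 1];
    }

    /** Folds one observation into the running sums in O(degree) time. */
    void addPoint(double x, double y) {
        double xk = 1.0; // x^k, starting at k = 0
        for (int k = 0; k <= 2 * degree; k++) {
            powerSums[k] += xk;
            if (k <= degree) momentSums[k] += y * xk;
            xk *= x;
        }
    }

    /** Solves the normal equations; result[i] is the coefficient of x^i. */
    double[] solve() {
        int n = degree + 1;
        // Augmented matrix [A | b] with A[r][c] = sum x^(r+c), b[r] = sum y*x^r.
        double[][] a = new double[n][n + 1];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) a[r][c] = powerSums[r + c];
            a[r][n] = momentSums[r];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            // Assumed threshold: treat a tiny pivot as a singular system.
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("singular system: too few distinct x values?");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        // Back substitution.
        double[] coeffs = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = a[r][n];
            for (int c = r + 1; c < n; c++) sum -= a[r][c] * coeffs[c];
            coeffs[r] = sum / a[r][r];
        }
        return coeffs;
    }
}
```

You would call addPoint once per record as you stream through the data, then call solve to get the coefficients. One caveat: the normal equations become numerically ill-conditioned as the degree grows or when the x values are large, so a sketch like this is probably only reasonable for low degrees or rescaled inputs.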

The code is under the GNU General Public License (GPL), which is unfortunate since it's a library and the GPL is viral. Paul is unwilling to relicense under the LGPL, meaning that I or someone else will need to re-implement it under a different license if the library is to be used within non-GPL'd code :-(

Here is the source (collaborative free software at work): PolynomialFitter.java. Please comment if you find it useful!

Reader Comments (7)

If that curve is representative of your data, rather than just fudged for the
blog, it seems that perhaps your next step is to fit to an arbitrary function,
rather than a polynomial.

I.e., your fit function should reflect the physical process responsible for
the data. Though I suspect you just added some noise to something
like (1 - cos(x^3/50))/2...

November 14, 2009 | Unregistered Commenter fvm

fvm, no - that curve is just fudged for the blog, that is not the actual data I'm using (and the actual data is unlikely to have any particular underlying structure).

November 14, 2009 | Unregistered Commenter Ian Clarke

The GPL is not "viral." It's the copyright system that is, with the concept of "derivative work." There is nothing particular to the GPL (which stands for _General_ Public License by the way) that makes it more viral than a Disney movie. Go ahead, put bits of Disney movies in your work and see if it's not viral when you're getting served.

November 15, 2009 | Unregistered Commenter niczar

niczar, the GPL is indeed viral within the context of open source licenses.

The Apache license, the MIT license, the BSD license, the LGPL, and others are all not viral.

November 15, 2009 | Unregistered Commenter Ian Clarke

Hey Ian, what do you think of Ireland's new blasphemy law:
http://killtheafterlife.blogspot.com/2009/07/when-god-and-government-mix.html

?

November 24, 2009 | Unregistered Commenter nobody

@nobody, I think it's idiotic.

November 26, 2009 | Unregistered Commenter Ian Clarke
