Polynomial regression on a large dataset in Java
Saturday, November 14, 2009 at 3:13PM
For an internal project at my day job I needed to find the nth-degree polynomial that best fits a (potentially very large) dataset, something like this:

Note: Yes, I know the function illustrated above isn't a polynomial; this was the best illustration of curve fitting I could find on short notice. I hope you'll forgive me.
I asked a question on Reddit about this, and Paul Lutus was kind enough to respond with a link to some code that did something close to what I needed. However, Paul's code was not well suited to very large amounts of data.
So with his help, I modified his code to decouple it from the original application it was a part of, make it follow good Java coding practices a bit more closely, and in doing so make it more flexible and well suited to handling very large datasets.
The code is under the GNU Public License, which is unfortunate since it's a library and the GPL is viral. Paul is unwilling to relicense under the LGPL, meaning that I or someone else will need to re-implement it under a different license if the library is to be used within non-GPL'd code :-(
Here is the source (collaborative free software at work!): PolynomialFitter.java. Please comment if you find it useful.
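To give a flavour of how a polynomial fit can cope with a very large dataset: a least-squares fit only needs running sums of powers of x and of y times powers of x, so each point can be folded in and then thrown away. The sketch below illustrates that one idea. It is not the linked PolynomialFitter.java and not Paul's code; the class name StreamingPolyFit and its methods are made up for this example.

```java
/**
 * Hypothetical sketch (not the linked PolynomialFitter.java):
 * least-squares polynomial fit that never stores the data points.
 */
final class StreamingPolyFit {
    private final int degree;
    private final double[] powerSums;  // sum of x^k for k = 0..2*degree
    private final double[] momentSums; // sum of y * x^k for k = 0..degree

    StreamingPolyFit(int degree) {
        if (degree < 0) throw new IllegalArgumentException("degree must be >= 0");
        this.degree = degree;
        this.powerSums = new double[2 * degree + 1];
        this.momentSums = new double[degree + 1];
    }

    /** Fold one observation into the running sums; memory use stays constant. */
    void addPoint(double x, double y) {
        double xk = 1.0; // x^k, starting at k = 0
        for (int k = 0; k <= 2 * degree; k++) {
            powerSums[k] += xk;
            if (k <= degree) momentSums[k] += y * xk;
            xk *= x;
        }
    }

    /** Solve the normal equations; returns c where y ~ c[0] + c[1]*x + c[2]*x^2 + ... */
    double[] coefficients() {
        int n = degree + 1;
        // Augmented matrix [A | b] with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
        double[][] a = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) a[i][j] = powerSums[i + j];
            a[i][n] = momentSums[i];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            // Crude singularity check; the threshold is an arbitrary choice.
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("not enough distinct x values for this degree");
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] coef = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = a[i][n];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * coef[j];
            coef[i] = s / a[i][i];
        }
        return coef;
    }
}
```

You would call addPoint(x, y) once per record as you read the data, then coefficients() at the end. One caveat with this approach: the normal equations get numerically ill-conditioned as the degree or the range of x grows, so for anything beyond low degrees it is worth rescaling x or using a more robust solver.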
Ian Clarke | 7 Comments | Java, Programming, SenseArray

Reader Comments (7)
If that curve is representative of your data, rather than just fudged for the
blog, it seems that perhaps your next step is to fit to an arbitrary function,
rather than a polynomial.
I.e., your fit function should reflect the physical process responsible for
the data. Though I suspect you just added some noise to something
like (1 - cos(x^3/50))/2...
fvm, no - that curve is just fudged for the blog, that is not the actual data I'm using (and the actual data is unlikely to have any particular underlying structure).
The GPL is not "viral." It's the copyright system that is, with the concept of "derivative work." There is nothing particular to the GPL (which stands for _General_ Public License by the way) that makes it more viral than a Disney movie. Go ahead, put bits of Disney movies in your work and see if it's not viral when you're getting served.
niczar, the GPL is indeed viral within the context of open source licenses.
The Apache license, the MIT license, the BSD license, the LGPL, and others are all not viral.
Hey Ian, what do you think of Ireland's new blasphemy law?
http://killtheafterlife.blogspot.com/2009/07/when-god-and-government-mix.html
@nobody, I think it's idiotic.