FinQ Stats
I took too many statistics courses in college, or perhaps they took me. Hopefully I escaped with enough stats to implement the basics for the FinQ library: mean, variance, and standard deviation. LINQ stops at Average, but standard deviation is etched into my mind as being genuinely useful, so I’ll go the extra few lines of code. Actually, the extra functions provide a good demonstration of how to combine map and reduce with lambda functions to achieve a slick result. Let’s get the simplest ones out of the way…
public static int Count<T>(this IEnumerable<T> seq)
{
    return Reduce(seq, 0, (x, y) => y + 1);
}
public static double Sum(this IEnumerable<double> seq)
{
    return Reduce(seq, 0D, (x, y) => x + y);
}
public static double Min(this IEnumerable<double> seq)
{
    return Reduce(seq, double.MaxValue, (x, y) => (x < y) ? x : y);
}
public static double Max(this IEnumerable<double> seq)
{
    return Reduce(seq, double.MinValue, (x, y) => (x > y) ? x : y);
}
public static double Avg(this IEnumerable<double> seq)
{
    return seq.Sum() / seq.Count();
}
These functions use the lambda functions previously discussed for Count, Sum, Min and Max. With Min and Max, I’m making a somewhat sloppy choice for the Reduce init value, but I believe it only causes trouble with empty sequences. (Some test fodder for later!)
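To make the empty-sequence worry concrete, here is a minimal sketch. The Reduce below is a stand-in, not necessarily the one from the earlier post: it assumes the lambda receives the current item first and the running result second, which is the order that lets Count's `y + 1` type-check. MinOrNull is a hypothetical name for one possible fix, and the methods are called statically (no `this`) only to avoid clashing with LINQ's own Min.

```csharp
using System;
using System.Collections.Generic;

public static class FinQEmptySketch
{
    // Assumed shape of Reduce: fn receives (item, running result).
    public static U Reduce<T, U>(IEnumerable<T> seq, U init, Func<T, U, U> fn)
    {
        U acc = init;
        foreach (T item in seq)
        {
            acc = fn(item, acc);
        }
        return acc;
    }

    // Same body as the Min above: an empty sequence just returns the init value.
    public static double Min(IEnumerable<double> seq)
    {
        return Reduce(seq, double.MaxValue, (x, y) => (x < y) ? x : y);
    }

    // Hypothetical variant: start from null so "no items" stays distinguishable.
    public static double? MinOrNull(IEnumerable<double> seq)
    {
        return Reduce(seq, (double?)null,
            (x, y) => (y == null || x < y.Value) ? x : y);
    }
}
```

With this sketch, `FinQEmptySketch.Min(new double[0])` hands back `double.MaxValue`, which looks like a real answer but isn't one, while `FinQEmptySketch.MinOrNull(new double[0])` returns null. For a non-empty sequence such as `{ 3.0, 1.5, 2.0 }` both return 1.5.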
The Avg function is written as a single expression that builds on the Count and Sum we defined first, both of which use Reduce underneath! Underwhelming? Hmm. Let’s build something more fulfilling.
// variance (of population) - VarP
public static double VarP(this IEnumerable<double> seq)
{
    double avg = seq.Avg();
    return Map(seq, x => Square(x - avg)).Avg();
}
// standard deviation (of population) - StdDevP
public static double StdDevP(this IEnumerable<double> seq)
{
    return Math.Sqrt(VarP(seq));
}
A number of things are going on in the Variance (of Population*) function, VarP. First, the average of the sequence is calculated. Then Map is passed a lambda function that subtracts that average from each item and squares the difference. The resulting sequence of squared differences is then passed to Avg to produce the variance. That’s some real functional-style programming going on right there. I count five underlying reduction calls and five lambda functions. Fun, no? The Standard Deviation function, StdDevP, writes itself: it’s simply the square root of the variance. You’d likely write this function the same way whether you were being functional or not.
*Consult ANY other source for a better explanation of Variance of Population and Sample.
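A worked example may help here. The sketch below reassembles the pieces so it runs on its own; Reduce, Map, and Square are my guesses at the helpers used in the post (Reduce is assumed to pass the item first and the running result second, Map is assumed to be lazy), and everything is called statically to stay clear of LINQ's same-named methods.

```csharp
using System;
using System.Collections.Generic;

public static class FinQVarianceSketch
{
    // Assumed shape of Reduce: fn receives (item, running result).
    public static U Reduce<T, U>(IEnumerable<T> seq, U init, Func<T, U, U> fn)
    {
        U acc = init;
        foreach (T item in seq)
        {
            acc = fn(item, acc);
        }
        return acc;
    }

    // Assumed shape of Map: lazily applies fn to each item.
    public static IEnumerable<U> Map<T, U>(IEnumerable<T> seq, Func<T, U> fn)
    {
        foreach (T item in seq)
        {
            yield return fn(item);
        }
    }

    // Assumed helper: the post calls Square without showing it.
    public static double Square(double x) { return x * x; }

    public static int Count<T>(IEnumerable<T> seq)
    {
        return Reduce(seq, 0, (x, y) => y + 1);
    }

    public static double Sum(IEnumerable<double> seq)
    {
        return Reduce(seq, 0D, (x, y) => x + y);
    }

    public static double Avg(IEnumerable<double> seq)
    {
        return Sum(seq) / Count(seq);
    }

    // Population variance: the average of the squared differences from the mean.
    public static double VarP(IEnumerable<double> seq)
    {
        double avg = Avg(seq);
        return Avg(Map(seq, x => Square(x - avg)));
    }

    public static double StdDevP(IEnumerable<double> seq)
    {
        return Math.Sqrt(VarP(seq));
    }
}
```

Take the sequence 2, 4, 4, 4, 5, 5, 7, 9. The sum is 40 over 8 items, so the average is 5. The squared differences are 9, 1, 1, 1, 0, 0, 4, 16, which add up to 32, so VarP returns 32 / 8 = 4 and StdDevP returns 2.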
Some C# Limitations
You may have noticed that I quietly slipped from IEnumerable<T> to IEnumerable<double> in the code samples. Because of a limitation of C# generics, I’ve chosen to implement type-specific versions of these statistical functions for all of the numeric types. The root of the problem is C#’s treatment of the built-in numeric types. It’s reasonable to expect numerics to support operations like addition and multiplication. But generics cover more than numerics: they also cover strings and user-defined classes like Employee, which may well not define multiplication. Furthermore, the numerics don’t share a distinct common base class or interface, something like an INumeric, so you can’t easily constrain a generic method to them as a special class. The consequence is that I need to build an int, long, float, double, and decimal version of each stat function in the FinQ library for completeness.
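Here is roughly what that duplication looks like, as a sketch rather than the actual FinQ source. The Reduce is again my assumed version (item first, running result second), and only Sum is shown for three of the numeric types. The commented-out generic attempt is the one the compiler rejects.

```csharp
using System;
using System.Collections.Generic;

public static class FinQOverloadSketch
{
    // Assumed shape of Reduce: fn receives (item, running result).
    public static U Reduce<T, U>(IEnumerable<T> seq, U init, Func<T, U, U> fn)
    {
        U acc = init;
        foreach (T item in seq)
        {
            acc = fn(item, acc);
        }
        return acc;
    }

    // Does not compile: operator '+' cannot be applied to operands
    // of type 'T' and 'T', because nothing says T supports addition.
    // public static T Sum<T>(IEnumerable<T> seq)
    // {
    //     return Reduce(seq, default(T), (x, y) => x + y);
    // }

    // So each numeric type gets its own near-identical copy.
    public static int Sum(IEnumerable<int> seq)
    {
        return Reduce(seq, 0, (x, y) => x + y);
    }

    public static double Sum(IEnumerable<double> seq)
    {
        return Reduce(seq, 0D, (x, y) => x + y);
    }

    public static decimal Sum(IEnumerable<decimal> seq)
    {
        return Reduce(seq, 0M, (x, y) => x + y);
    }
}
```

The compiler picks the overload from the element type, so `FinQOverloadSketch.Sum(new[] { 1, 2, 3 })` uses the int version and `FinQOverloadSketch.Sum(new[] { 1.5M, 2.5M })` uses the decimal one. The bodies differ only in the type names and the literal used for the init value.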
Some languages that are more friendly to functional programming have type systems that either allow you to treat numerics as a class of objects together with a common base-type, or they allow you to go ahead and define your functions as if all types support the operations you need. The programs written in such languages can postpone the determination of whether operations exist for the types they’re actually called with.