Hand to Hand Malware Combat


Published on November 2nd, 2009

Does anybody else like to battle viruses and malware by hand? I use the free AVG Anti-Virus and highly recommend it, but my kids play a lot of Flash games, and those invariably come from sites designed to infect visitors with the latest malware.

The free AutoRuns utility from Sysinternals (now part of Microsoft) is my chief weapon against malware. It shows you all the places (about 20) where viruses can hook into Windows or Internet Explorer via the registry. With AutoRuns, I can quickly tell whether anything has changed since the last legitimate set of hooks. The utility can either remove a hook for you or take you to the registry setting in regedit so you can remove it yourself.

Over the weekend, my home PC was hit by a new twin malware attack…and a clever one. I quickly found its hooks into Windows with AutoRuns, but the malware was savvy enough to rewrite its hook whenever the registry changed. It wasn’t clear which process was doing the rewriting, and I know malware can hide from the process list, so I was forced to try something other than killing the process…

With a little trial and error, I determined that the malware had some faulty logic. It only checked that its hook existed somewhere within the registry value…not that the value was still a valid reference to its file. All it took to throw the malware off was to prefix the hook with a single character, making its file reference invalid. After a quick reboot, the malware failed to load, and the registry and files could be cleaned up.
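The post doesn’t show the malware’s code, so this is only a hypothetical C# illustration of the kind of flaw described: a check that looks for the hook text anywhere in the registry value is still satisfied after a one-character prefix, while a check that the value begins with the hook path is not. The class name, method names, and path are made up for the example.

```csharp
using System;

// Hypothetical illustration only; this is not the malware's actual code.
public static class HookCheck
{
    // Flawed check, in the spirit of the post: is the hook text
    // present anywhere inside the registry value?
    public static bool NaiveHookPresent(string registryValue, string hookPath) =>
        registryValue.IndexOf(hookPath, StringComparison.OrdinalIgnoreCase) >= 0;

    // Stricter check: the value must begin with the hook path, which is
    // closer to what actually gets resolved as a file when it loads.
    public static bool HookIntact(string registryValue, string hookPath) =>
        registryValue.StartsWith(hookPath, StringComparison.OrdinalIgnoreCase);
}
```

With a tampered value such as `"x" + @"C:\Temp\hook.dll"`, the naive check still reports the hook as present, so the malware sees no reason to rewrite it, while the stricter check would have noticed.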

I know there are free tools for handling malware like this, but I like taking it out by hand (and seeing how it works!). And my wife and kids think I’m a hero; naturally, the version of the battle they hear is much more lurid!

 

Published in categories: Software


Software History Books


Published on November 2nd, 2009

It’s valuable for programmers to know the history of computer programming. One reason is to observe how ordinary people accomplished extraordinary things…creating software that’s part of our everyday existence. Another is to gain insight into how software development got to its current state and where the future lies.

I find the stories of Internet start-ups, successful or not, fascinating. Jessica Livingston’s “Founders at Work” includes great interviews with people from the birth of the PC through the heroes (and perhaps villains) of the Internet boom. I have many favorite interviews: the Steve Wozniak one is particularly good, as are Caterina Fake (Flickr) and Evan Williams (Blogger).

Peter Seibel’s “Coders at Work” brings together interviews with the creators of notable software and programming languages. The interviews aim to get to the core of how master programmers develop software. I, for one, am relieved that none of the interviewees cite modern project management or development methodology as key to their success. All of the coders spend a lot of time thinking and a lot of time using what they write. If you want to skim, my favorite interview is probably the Joshua Bloch one.

Both Founders and Coders can be picked up and put down at any time and the interviews enjoyed in any order. Whatever you do, don’t skip the unfamiliar names.

The history of our profession is extremely brief. And we’re living in exciting times. It’s certainly not too late for any one of us software developers to make a significant mark. Reading about the history and the people involved can be inspiring. “Hey! I can do that.”

 

Published in categories: Programming


Startup Interest Group


Published on August 21st, 2009

I’ve been attending a technology startup interest group, the Philly Startup Hackathons, for about two years now. I’ve met lots of interesting people, notably Gabriel Weinberg, the organizer. Gabriel’s always got too many interesting projects underway! One of his latest is an internet search engine called DuckDuckGo. He’s also well-connected in the area, and is happy to make introductions between people with similar (and often esoteric) interests.

The group gets together once a month and has a rather open format of talking about technology, startups, and the various startup efforts of the group members. Anywhere from 3 to 30 people show up, depending on the venue and time of the meeting. Our meetings in the evening in Philadelphia have been better attended than our lunch-time meetings in King of Prussia. 

One important function of the group is to provide a social outlet for people who might be working in isolation on their startup. It’s great to have a sounding board if you’re working alone. A number of people have teamed up or found work through the group.

The next meeting will be hosted at LiquidHub’s office in King of Prussia on Monday, August 31st at 6:00pm. If you’re interested in attending, sign up at the Hackathon site.

Published in categories: Uncategorized


Blog Comments


Published on August 18th, 2009

Comment Spam, Serial Posts, Long Gaps

Wow, 95% of the comments posted to the blog articles here are spam. Through the spam, I’ve learned a lot about Pfizer’s flagship product…or is that flagstaff product? The comment spammers go to amusingly varied lengths to make their posts look like legitimate comments, but they’re effortless to spot, so why bother? Well, I know why: PageRank. It’s safe to say, however, that nearly all of the spam-bots come from a DZone link to a single post on this blog…[thanks Ted ;)].

Another thing I’ve learned is that it’s very hard to stay focused on a long series of posts, like the FinQ series I attempted about C# functional programming. Serial posts take a lot of effort to tell a consistent story in a logical order. That focus isn’t a problem if you’re writing an article or a white paper to be delivered all at once, but for a blog it’s just not worth the effort, and it’s not a good match for the medium. Perhaps it’s just hard to wade through the dull-but-necessary parts to get to the interesting stuff. Not all writing breaks neatly into complete, bite-size, stand-alone chunks.

I won’t swear off series posts, but I can’t let them keep me from posting freely on varied topics. Long gaps in a series are OK; eventually the whole story will get out there. But long gaps between posts are not OK…no one wants to follow an inactive blog.

Published in categories: Blogging


Ideal Transform Rule


Published on May 11th, 2009

Always work from ideal input.

Developers familiar with the power of pipeline operations central to the UNIX operating system know how simple, modular tools can be chained together to accomplish a wide variety of complex tasks. The small-scope, single-purpose blocks of code that make up a pipeline maximize maintainability. Even if you don’t understand the workings of a single stage, you can likely understand that stage’s place in the pipeline as a whole.

One of the biggest challenges of code maintenance is handling changing requirements without breaking the existing code base. When you have pipelines composed of stages with clear input and output definitions, the introduction of additional stages becomes much safer. As long as you respect the input and output expectations of each stage, the rest of the pipeline can remain unchanged (in most situations).

I use a principle I call the “Ideal Transform Rule” when designing a complex process. I start by asking the question, “What’s the simplest input I could use to produce the desired output?”

If I were working on a report (gack!) I would want a data source with a “shape” that corresponded to the output format, required no normalization or exception processing, and that spoke in terms the consumer of the report understood. With that “ideal” input in hand, the report formatting would be as simple as possible. The report definition would not need to concern itself with any serious data transformation.

My next step, having established an ideal input for the final stage, is to close the gap between the original input and that ideal input in a reasonable number of discrete stages, working forward from the original input and backward from the ideal, recursively, until the two meet.

For example, the initial data may be coming from a mainframe (double-gack!) and adhere to a dated column-naming convention. You know, one of those indecipherable short coded names like BNF_TY_CD (not to be confused with BEN_TY_CD or BNF_TYP). The input data may also include extraneous elements or have codes that need to be expanded to readable descriptions. I might choose to “normalize” the data to include data elements named with business terms (e.g., BenefitTypeCode), discard unneeded data elements, and expand codes with descriptions (e.g., A – Active). This step may seem frivolous, but I believe in leaving legacy conventions behind because of the terrible impact they can have on communication in the present. By using current business terminology in a modern naming convention, developers greatly reduce the learning curve of the data they are working with.
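A minimal C# sketch of such a normalization stage, assuming C# 9 records. Only BNF_TY_CD and the A – Active code come from the post; the other raw field names, the normalized record, and the extra code table entry are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw row as it might arrive from the mainframe extract.
public record MainframeRow(string BNF_TY_CD, string MBR_ID, string FILLER_01);

// Hypothetical normalized row: business names, no extraneous fields,
// codes expanded to readable descriptions.
public record BenefitRow(string MemberId, string BenefitTypeCode, string BenefitTypeDescription);

public static class NormalizeStage
{
    // Hypothetical code table; only "A" -> "Active" comes from the post.
    private static readonly Dictionary<string, string> BenefitTypes = new()
    {
        ["A"] = "Active",
        ["R"] = "Retired",
    };

    public static IEnumerable<BenefitRow> Normalize(IEnumerable<MainframeRow> rows) =>
        rows.Select(r =>
        {
            string code = r.BNF_TY_CD.Trim();
            string description = BenefitTypes.TryGetValue(code, out var d)
                ? d
                : "Unknown (" + code + ")";
            // FILLER_01 is deliberately not carried forward.
            return new BenefitRow(r.MBR_ID.Trim(), code, description);
        });
}
```

Unknown codes are labeled rather than dropped here, so a later stage (or a person reading the output) can still see them; whether to reject them instead is a design choice for the real pipeline.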

The interim stages could include sorting, filtering, aggregation, cross-tabulation, etc. It all depends on what needs to be accomplished. Eventually, you may face a piece of complexity that can’t easily be broken down further. With this pipeline approach you at least isolate the complexity to as few stages as possible.
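A sketch of how interim stages might chain in C#, each a small function with a clear input and output, so the final formatting stage receives its ideal input and does no transformation of its own. The record shapes, stage names, and output layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical normalized input (the output of an earlier normalize stage).
public record NormalizedRow(string MemberId, string BenefitType, decimal Amount);

// Hypothetical ideal report input: one record per report line.
public record ReportLine(string BenefitType, int Members, decimal Total);

public static class ReportPipeline
{
    // Stage 1 (filter): drop rows the report does not cover, here zero amounts.
    public static IEnumerable<NormalizedRow> DropZeroAmounts(IEnumerable<NormalizedRow> rows) =>
        rows.Where(r => r.Amount != 0m);

    // Stage 2 (aggregate): one line per benefit type.
    public static IEnumerable<ReportLine> Summarize(IEnumerable<NormalizedRow> rows) =>
        rows.GroupBy(r => r.BenefitType)
            .Select(g => new ReportLine(
                g.Key,
                g.Select(r => r.MemberId).Distinct().Count(),
                g.Sum(r => r.Amount)));

    // Stage 3 (sort): presentation order.
    public static IEnumerable<ReportLine> SortForReport(IEnumerable<ReportLine> lines) =>
        lines.OrderBy(l => l.BenefitType, StringComparer.Ordinal);

    // Final stage: with ideal input, formatting is all that is left.
    public static string Format(IEnumerable<ReportLine> lines) =>
        string.Join("\n", lines.Select(l =>
            l.BenefitType + " | " + l.Members + " | " +
            l.Total.ToString("0.00", CultureInfo.InvariantCulture)));

    // Each stage's output is the next stage's input.
    public static string Run(IEnumerable<NormalizedRow> rows) =>
        Format(SortForReport(Summarize(DropZeroAmounts(rows))));
}
```

Adding another stage later (say, a currency conversion) means inserting one more call in `Run`, as long as it keeps the same input and output types.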

With the report example above, it’s quite possible that the report could be produced from the original mainframe input in a single (complex!) step in the report definition. But if all of the complexity lives in the report definition, you’ve created code that has many concerns, does not communicate clearly to future maintainers, and may hide errors. Additionally, report definition tools like Crystal Reports are often opaque…not the best way to manage code. Working with pipeline stages “shows your work,” a best practice you may have learned in school.

You don’t need to limit your use of the Ideal Transform Rule to multi-stage pipelines. Even a two-step process has benefits. A quick Perl script can often clean up data for use with a database bulk import tool…the Perl script saves you from limitations in both the input data and the import tool. Similarly, database view definitions simplify reports against a database…the view created for a report should be the ideal report input, removing much of the join logic from the concerns of the report developer.
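The post’s example is a Perl script; to keep to one language here, this is the same kind of clean-up step sketched in C#. The input layout (pipe-delimited id, name, MM/dd/yyyy date) and the CSV output are hypothetical, chosen to show the shape of a pre-import stage.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ImportCleanup
{
    // Hypothetical input: "id|name|date" lines with stray whitespace,
    // blank lines, and dates in MM/dd/yyyy form.
    // Output: comma-separated lines with ISO dates, ready for a bulk loader.
    // Lines without exactly three fields are skipped.
    public static IEnumerable<string> Clean(IEnumerable<string> lines) =>
        lines.Where(l => !string.IsNullOrWhiteSpace(l))
             .Select(l => l.Split('|').Select(f => f.Trim()).ToArray())
             .Where(f => f.Length == 3)
             .Select(f => string.Join(",",
                 f[0],
                 Quote(f[1]),
                 DateTime.ParseExact(f[2], "MM/dd/yyyy", CultureInfo.InvariantCulture)
                         .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)));

    // Quote a text field, doubling embedded quotes (the usual CSV convention).
    private static string Quote(string s) => "\"" + s.Replace("\"", "\"\"") + "\"";
}
```

A real clean-up step would probably log skipped lines rather than drop them silently; the point is that the import tool then only ever sees its ideal input.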

There’s a joy in writing simple code. Working from ideal input keeps your code simple. Simple code is more accessible and therefore more maintainable and less likely to break. When faced with a complex task, start by defining the ideal input, then close the gap from actual to ideal.

Published in categories: Programming


Unit Tests for FinQ


Published on March 25th, 2009

I’ve set up the FinQ library solution to use the NUnit unit testing framework. Here are a few of the guidelines I’ve followed in setting up the project.

NUnit comes with a good test running application, but I prefer to use ReSharper’s IDE-integrated unit test runner. ReSharper makes it effortless to run and debug individual tests, subsets of tests, or the whole monkey.

[Screenshot: UnitTestSnapshot, the FinQ unit tests in ReSharper’s unit test runner]

Some of the test setup guidelines are illustrated in the snapshot above. The namespace and subject area breakdowns serve to organize the unit tests. The test naming convention is one I adopted from a more experienced unit tester. Each name is descriptive enough that I can quickly determine the test aspect that is failing.

Here’s a single test from the FinQ statistics unit tests.

using System.Linq;
using NUnit.Framework;

namespace FinQ.UnitTests
{
    [TestFixture]
    public class StatTest
    {
        [Test]
        public void sum()
        {
            const int N = 5;
            const int sumToN = (N * N + N) / 2;
            Assert.AreEqual(sumToN, Enumerable.Range(1, N).Sum());
        }
    }
}

The test above shows how minimal the NUnit interface is…there’s no noise, just your test code. This test uses constants that can be used to easily vary the test and an independent formula to verify the result. Whether you’re developing in a test-driven style or adding key checks on your implementations, when testing has such a low barrier to entry there’s no excuse not to.
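Because the test is driven by the constant N and an independent closed-form formula, it’s easy to sweep many values. NUnit’s [TestCase] attribute is the framework-native way to parameterize a test; as a framework-free sketch, a plain loop can check the same oracle. The class and method names are hypothetical.

```csharp
using System.Linq;

public static class SumCheck
{
    // Closed-form sum 1 + 2 + ... + n, used as an independent oracle.
    public static long SumToN(int n) => ((long)n * n + n) / 2;

    // Compare the oracle with LINQ's Sum for every n from 0 to maxN.
    // Returns the first n that disagrees, or -1 if all agree.
    public static int FirstMismatch(int maxN)
    {
        for (int n = 0; n <= maxN; n++)
        {
            long viaLinq = Enumerable.Range(1, n).Select(i => (long)i).Sum();
            if (viaLinq != SumToN(n)) return n;
        }
        return -1;
    }
}
```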

Two NUnit Tips

The NUnit framework is very rich. My NUnit usage for the FinQ library is rather shallow, mostly AreEqual assertions without any shared setup or teardown code. I recommend exploring other aspects of NUnit by browsing the documentation. Here are two “nice to know” techniques.


The AreEqual assertion has an additional delta parameter for specifying precision (tolerance). When you’re comparing floating point results, this is essential: the delta lets you compare to, say, three or five decimal places rather than demanding an exact match. If you’re comparing floats with plain equality, you’re doing it wrong. The sample test below compares results to within one one-thousandth.

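const int N = 5;                        // values 1..5: mean 3, population variance 2
const double devExpected = 1.41421356D; // sqrt(2), the population standard deviation
Assert.AreEqual(devExpected, Enumerable.Range(1, N).StdDev(), 0.001);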
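To see why a tolerance matters, here is a framework-free sketch of the same idea as the delta argument. `ApproxEqual` is a hypothetical helper using an absolute tolerance, not NUnit’s implementation.

```csharp
using System;

public static class FloatCompare
{
    // Absolute-tolerance comparison: equal if the values differ by at most delta.
    public static bool ApproxEqual(double expected, double actual, double delta) =>
        Math.Abs(expected - actual) <= delta;
}
```

In binary floating point, `0.1 + 0.2 == 0.3` is false, yet `FloatCompare.ApproxEqual(0.3, 0.1 + 0.2, 1e-9)` is true. Absolute tolerances suit values of known magnitude like these; for values that vary widely in scale, a relative tolerance is usually the better choice.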

Another useful technique is the ExpectedException test attribute. In my case, I know my code SHOULD throw a DivideByZeroException. NUnit already fails an ExpectedException test when no exception is thrown, but following the call with an explicit “safety” Assert.Fail, as shown below, makes the intent unmistakable: if execution ever reaches that line, the expected exception was not thrown and the test fails with a clear message.

[Test]
[ExpectedException(typeof(DivideByZeroException))]
public void var_one_dec()
{
    List<decimal> lstOneDec = new List<decimal> { 50 };
    decimal noresult = lstOneDec.VarA();
    Assert.Fail("Should have thrown DivideByZero");
}
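The test relies on decimal arithmetic: dividing a decimal by zero throws DivideByZeroException, while the same formula in double arithmetic quietly produces NaN (0.0 / 0.0). Here is a framework-free sketch of that difference using a hypothetical `VarSample` (sample variance, dividing by n - 1), similar in spirit to FinQ’s VarA but not its actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SampleVariance
{
    // Hypothetical sample variance over decimals (divides by n - 1).
    // A one-element list divides by zero, which throws for decimal.
    public static decimal VarSample(IList<decimal> xs)
    {
        decimal mean = xs.Sum() / xs.Count;
        decimal ss = xs.Sum(x => (x - mean) * (x - mean));
        return ss / (xs.Count - 1);
    }

    // Same formula over doubles: no exception, 0.0 / 0.0 yields NaN.
    public static double VarSample(IList<double> xs)
    {
        double mean = xs.Sum() / xs.Count;
        double ss = xs.Sum(x => (x - mean) * (x - mean));
        return ss / (xs.Count - 1);
    }
}
```

So an ExpectedException test like the one above only makes sense for the decimal (or integer) versions; a double version would need an assertion like `double.IsNaN(...)` instead.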

Onward!

This brief departure on unit testing serves a key illustrative purpose in this FinQ series. The chief advantage of unit tests is the freedom they afford you to refactor your code. Running your tests provides instant feedback on the impact of your changes.

In the next post, I’ll refactor the base FinQ functions in two interesting ways. First, I’ll illustrate lazy evaluation using the C# yield construct for Map and Filter. Second, I’ll implement Reduce using recursion…just for sport. Remember that Reduce underlies the Filter and Map implementations, and together they underlie the rest of the FinQ library so far. I’m going to fundamentally change the entire library with these two implementations, and do so with confidence because of the unit tests.
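For a preview of what yield buys you, here is one possible shape of lazy Map and Filter. It is a sketch, not the implementation from the follow-up post; it uses the standard `Func` delegates in place of FinQ’s Conv and Pred, and the method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LazyFinQ
{
    // Lazy Map: nothing runs until the caller enumerates,
    // and each element is converted only when it is requested.
    public static IEnumerable<R> MapLazy<T, R>(this IEnumerable<T> seq, Func<T, R> conv)
    {
        foreach (T it in seq)
            yield return conv(it);
    }

    // Lazy Filter: yields matching elements one at a time.
    public static IEnumerable<T> FilterLazy<T>(this IEnumerable<T> seq, Func<T, bool> pred)
    {
        foreach (T it in seq)
            if (pred(it))
                yield return it;
    }
}
```

One consequence of yield worth knowing: any argument checks inside an iterator method also run late, on the first call to MoveNext rather than when MapLazy is called.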

That’s Unit-toasty, and Functional-icious!

Published in categories: FinQ, Programming


FinQ Stats


Published on March 12th, 2009

I took too many statistics courses in college, or perhaps they took me. Hopefully I escaped with enough stats to implement the basics: mean, variance, and standard deviation for the FinQ library. LINQ stops with average, but standard deviation is etched into my mind as being genuinely useful…so I’ll go the extra few lines of code. Actually, the extra functions provide a good demonstration of how to combine map and reduce with lambda functions to achieve a slick result. Let’s get the simplest ones out of the way…

public static int Count<T>(this IEnumerable<T> seq)
{
    return Reduce(seq, 0, (x, y) => y + 1);
}

public static double Sum(this IEnumerable<double> seq)
{
    return Reduce(seq, 0D, (x, y) => x + y);
}

public static double Min(this IEnumerable<double> seq)
{
    return Reduce(seq, double.MaxValue, (x, y) => (x < y) ? x : y);
}

public static double Max(this IEnumerable<double> seq)
{
    return Reduce(seq, double.MinValue, (x, y) => (x > y) ? x : y);
}

public static double Avg(this IEnumerable<double> seq)
{
    return seq.Sum() / seq.Count();
}

These functions use the Reduce lambdas previously discussed for Count, Sum, Min, and Max. For Min and Max, I’m making a somewhat sloppy choice of Reduce init value, but I believe it only causes trouble with empty sequences, where Min returns double.MaxValue and Max returns double.MinValue instead of signaling an error. (Some test fodder for later!)
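One way around the sentinel init value is to seed the fold with the first element and treat an empty sequence as an error, which is how LINQ’s Min and Max behave (they throw InvalidOperationException on an empty sequence). This swaps in a different technique from FinQ’s Reduce (a fold without an init value, what Haskell calls foldl1); the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SafeMinMax
{
    // Seed the fold with the first element instead of a sentinel,
    // so an empty sequence is reported instead of returning
    // double.MaxValue (Min) or double.MinValue (Max).
    private static double Fold(IEnumerable<double> seq, Func<double, double, double> pick, string name)
    {
        using IEnumerator<double> e = seq.GetEnumerator();
        if (!e.MoveNext())
            throw new InvalidOperationException(name + " of an empty sequence");
        double acc = e.Current;
        while (e.MoveNext())
            acc = pick(e.Current, acc);
        return acc;
    }

    public static double MinOrThrow(this IEnumerable<double> seq) =>
        Fold(seq, (x, y) => (x < y) ? x : y, "Min");

    public static double MaxOrThrow(this IEnumerable<double> seq) =>
        Fold(seq, (x, y) => (x > y) ? x : y, "Max");
}
```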

The Avg function is a single expression that builds on the Count and Sum we defined first, both of which use Reduce underneath! Underwhelming? Hmm. Let’s build something more fulfilling.

// variance (of population) - VarP
public static double VarP(this IEnumerable<double> seq)
{
    double avg = seq.Avg();
    return Map(seq, x => Square(x - avg)).Avg();
}

// standard deviation (of population) - StdDevP
public static double StdDevP(this IEnumerable<double> seq)
{
    return Math.Sqrt(VarP(seq));
}

A number of things are going on in the Variance (of Population*) function. First, the average of the sequence is calculated. Then Map is passed a lambda function that subtracts that average from each item and squares the difference. The resulting sequence of squared differences is then passed to Avg to produce the variance. That’s some real functional-style programming going on right there. I count five underlying reduction calls and five lambda functions. Fun, no? The Standard Deviation function writes itself: it’s simply the square root of the variance. You’d likely write this function the same way whether you were being functional or not.
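To run the composition end to end, here is a self-contained sketch with minimal stand-ins for Reduce and Map. It assumes FinQ’s aggregation signature (item, accumulated result) and uses `Func` instead of the FinQ delegate types; the class name is hypothetical, and `Square` is written out inline.

```csharp
using System;
using System.Collections.Generic;

public static class FinQStatsSketch
{
    // Stand-in for FinQ's Reduce: agg receives (item, accumulated result).
    public static R Reduce<T, R>(IEnumerable<T> seq, R init, Func<T, R, R> agg)
    {
        R result = init;
        foreach (T it in seq)
            result = agg(it, result);
        return result;
    }

    // Map built on Reduce, as in the Reduce Redux post.
    public static List<R> Map<T, R>(IEnumerable<T> seq, Func<T, R> conv) =>
        Reduce(seq, new List<R>(), (it, acc) => { acc.Add(conv(it)); return acc; });

    public static int Count(IEnumerable<double> seq) => Reduce(seq, 0, (x, n) => n + 1);
    public static double Sum(IEnumerable<double> seq) => Reduce(seq, 0D, (x, s) => x + s);
    public static double Avg(IEnumerable<double> seq) => Sum(seq) / Count(seq);

    // Population variance: the average of squared differences from the average.
    public static double VarP(IEnumerable<double> seq)
    {
        double avg = Avg(seq);
        return Avg(Map(seq, x => (x - avg) * (x - avg)));
    }

    public static double StdDevP(IEnumerable<double> seq) => Math.Sqrt(VarP(seq));
}
```

The five reductions are visible here: Sum and Count inside the first Avg, the Reduce inside Map, and Sum and Count inside the final Avg. For the values 1 through 5, VarP is 2 and StdDevP is the square root of 2.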

*Consult ANY other source for a better explanation of Variance of Population and Sample.
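For reference, the standard definitions: population variance divides by the number of values N, sample variance by N - 1, with μ the population mean and x̄ the sample mean.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2,
\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2

% Worked for x = 1, 2, 3, 4, 5:
\mu = \bar{x} = 3, \qquad
\sum_{i=1}^{5}(x_i - 3)^2 = 4 + 1 + 0 + 1 + 4 = 10

\sigma^2 = \frac{10}{5} = 2, \quad \sigma = \sqrt{2} \approx 1.41421356,
\qquad
s^2 = \frac{10}{4} = 2.5
```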

Some C# Limitations

You may have noticed that I quietly slipped from IEnumerable<T> to IEnumerable<double> in the code samples. Because of a limitation of C# generics, I’ve chosen to implement type-specific versions of these statistical functions for all of the numeric types. The root of the problem is C#’s treatment of the built-in numeric types. It’s reasonable to expect numerics to support operations like addition and multiplication. But generics cover more than numerics; they cover strings and user-defined classes like Employee that certainly might not have multiplication defined. Furthermore, the numerics don’t share a distinct common base class or interface, like an INumeric, so there’s no way to constrain a generic type parameter to “numeric types only.” The consequence is that I need to build an int, long, float, double, and decimal version of each stat function in the FinQ library for completeness.

Some languages that are more friendly to functional programming have type systems that either allow you to treat numerics as a class of objects together with a common base-type, or they allow you to go ahead and define your functions as if all types support the operations you need. The programs written in such languages can postpone the determination of whether operations exist for the types they’re actually called with.
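Worth knowing on a modern toolchain: C# 11 and .NET 7 later added generic math (static abstract interface members plus interfaces such as System.Numerics.INumber<T>), which addresses exactly this limitation. A minimal sketch, assuming .NET 7 or later; `GenericSum` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class GenericStats
{
    // One implementation for int, long, float, double, decimal, ...
    // INumber<T> supplies T.Zero and the + operator.
    public static T GenericSum<T>(this IEnumerable<T> seq) where T : INumber<T>
    {
        T total = T.Zero;
        foreach (T x in seq)
            total += x;
        return total;
    }
}
```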

 

Published in categories: FinQ, Programming


Reduce Redux


Published on February 25th, 2009

Reduce is a powerful function pattern. In fact, it can be used as the base implementation for both Filter and Map. That’s what we’ll do to take our functional library to the next level of functional goodness.

We initially had Reduce take a list of many values and produce a single value, an aggregate, but there’s nothing stopping us from returning a list of values. To accomplish this, we can have the Reduce initial value parameter be an empty “result” list, and then have the aggregation function push result items onto the list as it processes the input sequence.

In the implementations of Filter and Map below, a lambda function is created that wraps the predicate passed to Filter or the conversion passed to Map. In addition, the call to Reduce itself creates the initial empty result sequence required and returns it as the reduction result.

/// <summary>
/// Reduce uses an initial value and an accumulation function to
/// build a result value from a sequence. In LINQ, Aggregate().
/// </summary>
public static R Reduce<T, R>(this IEnumerable<T> list, R init, Agg<T, R> agg)
{
    R result = init;

    if( null == list ) return result;

    foreach (T it in list)
        result = agg(it, result);

    return result;
}

/// <summary>
/// Filter uses a predicate function to conditionally copy elements
/// from an input sequence. In LINQ, Where().
/// </summary>
public static IEnumerable<T> Filter<T>(this IEnumerable<T> list, Pred<T> pred)
{
    return list.Reduce(new List<T>(), (it, result) =>
        {
            if (pred(it))
                result.Add(it);
            return result;
        });
}

/// <summary>
/// Map uses a conversion function to build a result sequence from an
/// input sequence. In LINQ, Select()
/// </summary>
public static IEnumerable<R> Map<T, R>(this IEnumerable<T> list, Conv<T, R> conv)
{
    return list.Reduce(new List<R>(), (it, result) =>
        {
            result.Add(conv(it));
            return result;
        });
}
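The posts don’t show the declarations of Agg, Pred, and Conv, so this self-contained version assumes their shapes from how they’re used (Agg takes an item and the running result and returns the new result). With those declarations in place, the three functions compile and run as written.

```csharp
using System.Collections.Generic;

// Delegate shapes assumed from their usage in the FinQ posts.
public delegate R Agg<T, R>(T it, R result);
public delegate bool Pred<T>(T it);
public delegate R Conv<T, R>(T it);

public static class FinQCore
{
    public static R Reduce<T, R>(this IEnumerable<T> list, R init, Agg<T, R> agg)
    {
        R result = init;
        if (null == list) return result;
        foreach (T it in list)
            result = agg(it, result);
        return result;
    }

    // The result list is the reduction's accumulator.
    public static IEnumerable<T> Filter<T>(this IEnumerable<T> list, Pred<T> pred) =>
        list.Reduce(new List<T>(), (it, result) =>
        {
            if (pred(it)) result.Add(it);
            return result;
        });

    public static IEnumerable<R> Map<T, R>(this IEnumerable<T> list, Conv<T, R> conv) =>
        list.Reduce(new List<R>(), (it, result) =>
        {
            result.Add(conv(it));
            return result;
        });
}
```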

Having Reduce as the basis for Filter and Map (and just about everything else in the FinQ library) is elegant. It also lets us run some deeper design experiments, which is a good way to explore C# features. I’m curious to take a good look at the performance of C# yield compared to straight iteration. A recursive Reduce will also be interesting. I’ll look at those in a future post.
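As a taste of the recursive idea, here is one possible recursive Reduce over an enumerator. It is a sketch, not the future post’s implementation, and it uses `Func` in place of Agg with the same (item, result) argument order. One caveat: C# does not guarantee tail-call elimination, so a very long sequence can overflow the stack, which the iterative Reduce never does.

```csharp
using System;
using System.Collections.Generic;

public static class RecursiveReduce
{
    public static R ReduceRec<T, R>(this IEnumerable<T> seq, R init, Func<T, R, R> agg)
    {
        if (seq == null) return init;
        using IEnumerator<T> e = seq.GetEnumerator();
        return Step(e, init, agg);
    }

    // Fold in the next element, then recurse on the rest;
    // when the enumerator is exhausted, the accumulator is the answer.
    private static R Step<T, R>(IEnumerator<T> e, R acc, Func<T, R, R> agg) =>
        e.MoveNext() ? Step(e, agg(e.Current, acc), agg) : acc;
}
```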

Published in categories: FinQ, Programming


Lambda Functions


Published on February 25th, 2009

What about those predicate, conversion, and aggregation functions? Does this style of programming with higher order functions mean you have to write a bunch of tiny functions to pass around? Nope! We can use C# lambda functions.

A lambda function is a function defined on-the-fly at the call site without a lot of declaration and syntax baggage. Here’s a sample lambda function that replaces the isOver30 function from the Filter post…

IEnumerable<Employee> untrustworthy = employees.Filter( emp => emp.Age > 30 );

That’s the expression form of a lambda function. To the left of the => symbol is the parameter list; multiple parameters are wrapped in parentheses and separated by commas (e.g., (x, y) => …, as in the Reduce examples below). To the right of the => symbol is the expression, which can refer to the parameters by name.
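A small self-contained sketch of the forms described above. The `Employee` class and the `Keep` helper (a stand-in for FinQ’s Filter) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Employee type for the examples.
public class Employee
{
    public string LastName { get; set; } = "";
    public int Age { get; set; }
}

public static class LambdaForms
{
    // The named method that a lambda can replace.
    public static bool IsOver30(Employee emp) => emp.Age > 30;

    // One parameter: no parentheses needed.
    public static readonly Func<Employee, bool> Over30 = emp => emp.Age > 30;

    // Two parameters: parentheses and a comma.
    public static readonly Func<int, int, int> Larger = (x, y) => (x > y) ? x : y;

    // Hypothetical stand-in for FinQ's Filter.
    public static List<Employee> Keep(IEnumerable<Employee> emps, Func<Employee, bool> pred)
    {
        var result = new List<Employee>();
        foreach (var e in emps)
            if (pred(e)) result.Add(e);
        return result;
    }
}
```

`Keep(staff, LambdaForms.IsOver30)`, `Keep(staff, LambdaForms.Over30)`, and `Keep(staff, emp => emp.Age > 30)` all select the same employees; the inline lambda just skips the separate declaration.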

Lambdas can also have statement bodies if needed…

IEnumerable<Employee> untrustworthy = employees.Filter( emp =>
        {
            // untrustworthy indeed
            if( emp.LastName == "Page" )
                emp.Age /= 2;
            return emp.Age > 30;
        });

Lambda functions with statement bodies have their place…but from a code clarity point of view, this doesn’t seem to be one of them. Whether to use a lambda depends on whether the function is generally useful or will be called from multiple locations. Why bother cluttering up your class’s interface with yet another method declaration when a lambda will do the trick? Lambda functions also let you place details in-line rather than hiding them.

Here are a few quick examples of Reduce used with lambda functions that take multiple parameters.

Reduce(list, 0, (x, y) => y + 1); //A
Reduce(list, 0, (x, y) => x + y); //B
Reduce(list, 0, (x, y) => (x > y) ? x : y); //C

Can you guess what they do?

The first call, A, ignores the x parameter altogether; it simply increments the result y (initially 0) by 1 for each element. It gives a COUNT of the items in the list. Yes, that’s an O(n) count function, I’m proud. The second call, B, adds each item x to the accumulated sum in y…it’s a SUM function. The last call, C, returns the greatest value in the sequence, or MAX, although with an init value of 0 it returns 0 for a list of all negative numbers. (MIN is left as an exercise for the reader, ha!)
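Here are calls A, B, and C made runnable, with a minimal stand-in for Reduce that assumes the same (item x, accumulated y) argument order; the class and method names are hypothetical. The last check shows the zero-seed caveat for MAX.

```csharp
using System;
using System.Collections.Generic;

public static class ReduceExamples
{
    // Minimal stand-in for FinQ's Reduce; agg receives (item x, accumulated y).
    public static R Reduce<T, R>(IEnumerable<T> list, R init, Func<T, R, R> agg)
    {
        R result = init;
        foreach (T it in list)
            result = agg(it, result);
        return result;
    }

    public static int CountA(IEnumerable<int> list) => Reduce(list, 0, (x, y) => y + 1);
    public static int SumB(IEnumerable<int> list) => Reduce(list, 0, (x, y) => x + y);
    public static int MaxC(IEnumerable<int> list) => Reduce(list, 0, (x, y) => (x > y) ? x : y);
}
```

For the list 4, 8, 15: CountA gives 3, SumB gives 27, and MaxC gives 15. For -3, -7, MaxC gives 0, because the seed beats every element.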

These are among the simplest reduce functions; we’ll include them in FinQ as part of the basic statistics functions (upcoming post). The implementations also reveal limitations in C#’s functional programming features, more to come on that topic for certain. But first, let’s take another look at Reduce, one that I hope you’ll find interesting.

Published in categories: FinQ, Programming


FinQ Reduce


Published on February 25th, 2009

The Reduce function uses an initial value and an aggregation function to build a result value from a sequence. Taking the sum of a set of numbers is a reduction, a reduction by addition.

public static R Reduce<T, R>(this IEnumerable<T> list, R init, Agg<T, R> agg)
{
    R result = init; 

    if( null == list ) return result; 

    foreach (T it in list)
        result = agg(it, result); 

    return result;
}

The Reduce implementation above is very similar to Map in that it takes two generic type parameters, T and R, for the input type and result type. Reduce adds an init parameter of the result type, which is used to seed the reduction. For sum or count functions, zero would likely be a good init value. If you had a long-running calculation that needed to be suspended and resumed, a non-zero init value might be useful. I’ll revisit the Reduce implementation soon and use some non-scalar types for the result and init types.
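The suspend-and-resume idea can be sketched by running the reduction in two batches: the partial result from the first batch becomes the init value for the second. The class and method names are hypothetical, and `Func` stands in for Agg with the same (item, result) order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResumableSum
{
    public static R Reduce<T, R>(IEnumerable<T> list, R init, Func<T, R, R> agg)
    {
        R result = init;
        if (list == null) return result;
        foreach (T it in list)
            result = agg(it, result);
        return result;
    }

    // Sum in two sittings; the saved partial result seeds the second run.
    public static long SumInTwoBatches(IReadOnlyList<long> data, int splitAt)
    {
        long partial = Reduce(data.Take(splitAt), 0L, (x, acc) => acc + x);
        // ...the calculation could be suspended here and `partial` saved...
        return Reduce(data.Skip(splitAt), partial, (x, acc) => acc + x);
    }
}
```

Wherever the split falls, the resumed total matches a single pass; for the numbers 1 through 100 it is 5050.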

The Agg<T,R> type is a delegate (a function signature) that accepts an object of type T and an object of type R and returns a single object of type R. As Reduce runs, each result is folded into the next aggregation call. (For the first call, init is folded in.)
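To make the folding visible, here is a sketch with the Agg delegate declared as described (item of type T, running result of type R, returns R) and a hypothetical `Trace` helper that runs the same loop as Reduce while recording every intermediate result, starting with init.

```csharp
using System.Collections.Generic;

// Agg as described: takes an item (T) and the running result (R), returns the new R.
public delegate R Agg<T, R>(T it, R result);

public static class FoldTrace
{
    // Same loop as Reduce, but records init and every intermediate result.
    public static List<R> Trace<T, R>(IEnumerable<T> list, R init, Agg<T, R> agg)
    {
        var steps = new List<R> { init };
        R result = init;
        foreach (T it in list)
        {
            result = agg(it, result);
            steps.Add(result);
        }
        return steps;
    }
}
```

Summing 1, 2, 3 with an init of 0 produces the trace 0, 1, 3, 6: each result feeds the next call, and the last one is what Reduce would return.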

The SQL aggregation functions MIN, MAX, AVG, COUNT, and SUM have simple corresponding Reduce functions. Before I demonstrate how those work, let’s take a look at the C# lambda function.

Published in categories: FinQ, Programming