Saturday, September 8, 2012

Building Console Apps in .NET

Different programming environments have different semantics for producing console applications. In VBScript, for example, the script engine starts executing code at the top of the file and continues until there is no more code to execute. In .NET, the common language runtime (CLR) looks for a specific entry point in the compiled executable. Namely, it looks for a static method marked with the .entrypoint IL directive (often the Main method). The CLR calls the entry method and the program proceeds from there. When you create a new C# console application in Visual Studio® .NET, you are given a simple class with just a static Main method:
using System;
namespace ConsoleApplication1
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
        }
    }
}
When the C# compiler compiles this code, it marks Main with the .entrypoint IL directive, making it the application's starting method. This is the simplest possible console application—it compiles but doesn't actually do anything useful when you run it. Notice how the Main method takes an array of strings as its only argument. This represents the command-line parameters passed to the console application (unlike with standard C applications, the name of the executable is not passed to the application as the first argument). Also notice how the Main method doesn't return anything. Although console apps can return an int (which can then be queried via ERRORLEVEL), return values aren't needed for pipe applications.
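To make the points about arguments and return values concrete, here is a minimal sketch of a Main method that returns an int. The class name ExitCodeDemo and its usage message are hypothetical and exist only for illustration; the choice of 0 for success and 1 for failure is a common convention rather than a requirement.

```csharp
using System;

namespace ConsoleApps
{
    // Hypothetical example class; not part of the original article.
    public class ExitCodeDemo
    {
        // Returning int instead of void lets the caller inspect the result.
        public static int Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.Error.WriteLine("Usage: ExitCodeDemo <name>");
                return 1; // nonzero conventionally signals failure
            }

            // args[0] is the first real argument, not the executable name.
            Console.Out.WriteLine("Hello, " + args[0]);
            return 0; // zero conventionally signals success
        }
    }
}
```

Assuming the file compiles to ExitCodeDemo.exe, the return value can be inspected from cmd.exe right after the program exits:
C:\> ExitCodeDemo.exe World
C:\> echo %ERRORLEVEL%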
All communication with the outside world is done through the standard data streams. Using the System.Console.In, System.Console.Out, and System.Console.Error static properties, the .NET Framework makes the standard data streams available to the console application. System.Console.In is an instance of System.IO.TextReader; Out and Error are instances of System.IO.TextWriter. This proves to be a big advantage since you can work with the standard data streams just as you would a file or network stream.
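One way to see this advantage is to write the line-processing logic against TextReader and TextWriter rather than against Console directly. The sketch below assumes a hypothetical class named StreamDemo with a hypothetical UpperCase helper; neither appears in the original examples. Because the helper accepts any reader and writer, the same code could be handed Console.In and Console.Out, a StreamReader over a file, or a StringReader in a test.

```csharp
using System;
using System.IO;

namespace ConsoleApps
{
    // Hypothetical example class; not part of the original article.
    public class StreamDemo
    {
        // Copies lines from any TextReader to any TextWriter,
        // converting each line to uppercase along the way.
        public static void UpperCase(TextReader input, TextWriter output)
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                output.WriteLine(line.ToUpperInvariant());
            }
        }

        public static void Main(string[] args)
        {
            // Here the helper is wired to the standard data streams.
            UpperCase(Console.In, Console.Out);
        }
    }
}
```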
Figure 1 shows a simple example of working with standard input and output. This is the code for a program called Passthrough, a console application that simply passes standard input through to standard output. You can compile it through Visual Studio .NET or using the command-line tools. For your reference, this and all other examples in this article will compile with a command-line call to the C# compiler, such as:
C:\> csc.exe /t:exe /out:Passthrough.exe Passthrough.cs
If you were to run this program as follows, it would appear as though you ran the dir command by itself:
C:\> dir | Passthrough.exe
Figure 1 Passthrough
using System;
namespace ConsoleApps
{
    class Passthrough
    {
        [STAThread]
        static void Main(string[] args)
        {
            string currentLine = Console.In.ReadLine();
            while(currentLine != null)
            {
                Console.Out.WriteLine(currentLine);
                currentLine = Console.In.ReadLine();
            }
        }
    }
}
An interesting property of console applications that read from standard input is that they can be used (and tested) interactively. For example, if you run Passthrough by itself (Passthrough.exe), it appears as though the program is hung—cmd.exe doesn't give you back the command prompt. If you type something and press Enter, the program echoes it back to you. Passthrough is running just as before, except that it is getting standard input from the keyboard instead of from the dir command. If you were to check your Windows Task Manager, you would see a task for Passthrough.exe. This is to be expected—Passthrough is a normal executing program that Windows identifies with a process ID. Try running the following command:
C:\> Passthrough.exe | Passthrough.exe
Here, the output of your interactive session with Passthrough routes to another instance of Passthrough, whose output routes to the console. If you check Task Manager again, you will see two instances of Passthrough.exe, because both instances are running simultaneously. In fact, you would see an instance of every program that you included in a pipeline. What does this mean? Remember, the pipeline syntax is the domain of cmd.exe. The operating system supplies the pipes that carry the data, but it has no notion of a pipeline as a single unit. When you construct a pipeline, as in the example I just showed, cmd.exe does the work of launching the required applications and connecting their standard data streams according to the specifications of the pipeline. To the operating system, they are separate processes; you can see this through Task Manager. To the user, however, cmd.exe makes it look as though there were a single executable at work.
Most console applications process one line of standard input at a time and send one line to standard output at a time. As soon as a program that is upstream in a pipeline is finished with a line of input, it is available to be processed by a downstream program. The upstream program might have moved well beyond the line that the downstream program is currently working on. While this design will not necessarily improve the overall performance of the entire pipeline, it will tend to result in a lower overall memory requirement as well as a shorter time until the first line of output is finished being processed by the whole pipeline. There are circumstances where line-at-a-time processing is impossible (such as a sorting program or a program that needs to work with an in-memory XML tree). In these cases, standard input needs to be read in its entirety before anything can be sent to standard output. Meanwhile, downstream programs in the pipeline sit idle, waiting for something to appear on their standard input.
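As a contrast to Passthrough, the following sketch shows a program that cannot work a line at a time. The class name SortLines and its Sort helper are hypothetical, and the code assumes .NET Framework 2.0 or later for the generic List type. The ordinal comparison is an arbitrary choice made to keep the result independent of culture settings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApps
{
    // Hypothetical example class; not part of the original article.
    public class SortLines
    {
        public static void Sort(TextReader input, TextWriter output)
        {
            // Every line must be buffered first: the last line read
            // might be the one that sorts to the top.
            List<string> lines = new List<string>();
            string line;
            while ((line = input.ReadLine()) != null)
            {
                lines.Add(line);
            }

            lines.Sort(StringComparer.Ordinal);

            // Only now can anything be written downstream.
            foreach (string sorted in lines)
            {
                output.WriteLine(sorted);
            }
        }

        public static void Main(string[] args)
        {
            Sort(Console.In, Console.Out);
        }
    }
}
```

Memory use here grows with the size of the input, and a downstream program receives nothing until the upstream program has closed its standard output.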
Just about anything can be accomplished in the Main routine of a console application. For example, the Main routine in Figure 2 builds on the idea of the Passthrough example but performs a regular expression-based replacement on each line of standard input. The first argument is the search pattern and the second is the replacement string. If you were to run the compiled application as follows, you would see the normal output of dir, but with all instances of the string "<DIR>" replaced with "****":
C:\> dir | Replace.exe "<DIR>" "****"
Figure 2 Replace
using System;
using System.Text.RegularExpressions;

namespace ConsoleApps
{
    class Replace
    {
        static void Main(string[] args)
        {
            // Both a search pattern and a replacement string are required.
            if(args.Length < 2)
            {
                Console.Error.WriteLine("Usage:");
                Console.Error.WriteLine("Replace \"search\" \"replace\"");
                return;
            }

            // args[0] is treated as a regular expression, not a literal string.
            Regex re = new Regex(args[0]);
            string currentLine = Console.In.ReadLine();
            while(currentLine != null)
            {
                Console.Out.WriteLine(re.Replace(currentLine, args[1]));
                currentLine = Console.In.ReadLine();
            }
        }
    }
}
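Because the search argument is interpreted as a regular expression, characters such as "." or "(" carry special meaning, and "$" is special in the replacement string. The sketch below shows one possible way to treat both arguments literally while still matching without regard to case. The class name LiteralReplace and the ReplaceLiteral helper are hypothetical, and the case-insensitive option is an assumption added for illustration, not behavior of the Replace program above.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApps
{
    // Hypothetical example class; not part of the original article.
    public class LiteralReplace
    {
        public static string ReplaceLiteral(string line, string search, string replacement)
        {
            // Escape regex metacharacters so the search text matches literally.
            string pattern = Regex.Escape(search);

            // In a replacement pattern, "$$" stands for a single literal "$".
            string safeReplacement = replacement.Replace("$", "$$");

            return Regex.Replace(line, pattern, safeReplacement, RegexOptions.IgnoreCase);
        }

        public static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                Console.Error.WriteLine("Usage: LiteralReplace \"search\" \"replace\"");
                return;
            }

            string currentLine;
            while ((currentLine = Console.In.ReadLine()) != null)
            {
                Console.Out.WriteLine(ReplaceLiteral(currentLine, args[0], args[1]));
            }
        }
    }
}
```

If case sensitivity is wanted and no pattern matching is needed, the String.Replace method is a simpler alternative.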
