Design notes

Pipelines v1.6

 

    

This version of Pipelines installs on any platform that hosts, as a minimum requirement, the Microsoft Windows .NET Framework 2.0.

 


    

Multiple Pipelines instances may execute concurrently.

 

    

Pipelines dispatches the stages in the order in which they appear in the pipeline; however, any stage may be the first to begin processing records. The relative order of the records flowing through a pipeline can be predicted as long as the stage path comprises only stages that do not delay the records.

 

    

Unless the pipeline comprises a stage or stages that accumulate records (for example, the SORT stage), and provided that the input records are not excessively long, Pipelines requires only a small amount of memory to process input files of any size, as only a handful of records will be in the pipeline at any one time.
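
The difference between a stage that passes records straight through and a stage that accumulates them can be sketched in Python. This is an illustration only, not Pipelines code: the function names and the generated input source are hypothetical, and Python generators merely stand in for stages.

```python
from typing import Iterable, Iterator


def numbered(n: int) -> Iterator[str]:
    """Hypothetical input source that produces records lazily, highest first."""
    for i in range(n, 0, -1):
        yield f"rec{i:03d}"


def upper_stage(records: Iterable[str]) -> Iterator[str]:
    """Non-delaying stage: holds a single record at any one time."""
    for record in records:
        yield record.upper()


def sort_stage(records: Iterable[str]) -> Iterator[str]:
    """Accumulating stage: must read every record before writing the first."""
    yield from sorted(records)


streamed = list(upper_stage(numbered(3)))   # input order is preserved
sorted_out = list(sort_stage(numbered(3)))  # all records buffered, then released
```

The first stage needs memory for one record regardless of input size; the second needs memory for the whole input, which is the case the paragraph above excludes.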

 

    

Pipelines is not pre-emptive. When a stage reports an initialisation or runtime error, Pipelines begins terminating the pipeline by instructing all active stages to quiesce. When all active stages in the pipeline chain have responded to the quiesce command and have terminated, Pipelines (the StageManager) terminates.

 

    

Pipelines is designed to execute on a single processor, where each stage (process) vies for service by the StageManager. The specific design of a stage controls how it interoperates within a multi-stream pipeline configuration.
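
A minimal Python sketch of non-pre-emptive dispatching with a quiesce on error follows. It is an analogy under stated assumptions, not the StageManager's actual design: the round-robin order, the `stage` and `run` functions, and the use of generator `close()` as the quiesce request are all hypothetical.

```python
def stage(name, steps, log, fail_at=None):
    """Hypothetical stage: does one unit of work, then hands control back."""
    try:
        for step in range(steps):
            if step == fail_at:
                raise RuntimeError(f"{name} failed")
            log.append(f"{name} step {step}")
            yield  # cooperative: the stage gives up control voluntarily
    finally:
        log.append(f"{name} terminated")


def run(stages, log):
    """Round-robin dispatcher; quiesces the other stages when one reports an error."""
    active = list(stages)
    while active:
        for s in list(active):
            try:
                next(s)
            except StopIteration:
                active.remove(s)
            except RuntimeError as err:
                log.append(f"error: {err}")
                active.remove(s)
                for other in active:
                    other.close()  # ask each remaining stage to quiesce
                return False
    return True


log = []
ok = run([stage("a", 3, log), stage("b", 3, log, fail_at=1)], log)
```

Nothing interrupts a stage while it works; the dispatcher regains control only when a stage yields, finishes or fails, and it returns only after every remaining stage has been closed.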

 

    

Pipelines does not verify that a pipeline is semantically correct, only that it is syntactically correct. This means that you may construct a pipeline that does not execute in the way that you expect. It may produce output records in a format or an order that you did not intend, or it may not produce any output records at all. In view of this, when developing a pipeline that replaces the contents of a disk file, it is particularly prudent to test the pipeline against a copy of that file. Pipelines does not issue "are you sure?" messages!

 

    

Pipelines does not work with records containing MBCS or UNICODE data; only the single-byte ASCII character set is supported (this will be addressed in future versions of Pipelines). As a consequence, you should ensure that only ASCII input files are selected for modification. Pipelines cannot determine the format of an input file; it simply executes the pipeline that you specify.

 

    

Pipelines includes a stall detection mechanism that determines when a pipeline is stalled. A stall occurs when every stage is waiting either to read a record or to write a record. That is, no stage is currently processing a record; all stages are either read-pending or write-pending. When a stall occurs, Pipelines writes the current status of each stage in the pipeline to a dump file, which can be inspected to determine the combination of stream connections that caused the stall.
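
The stall condition itself is simple to state in code. The Python sketch below is illustrative only: the `State` names, the `is_stalled` and `dump_lines` functions and the dump layout are hypothetical and do not describe the format of the real dump file.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    RUNNING = "running"
    READ_PENDING = "read-pending"
    WRITE_PENDING = "write-pending"


def is_stalled(stage_states: Dict[str, State]) -> bool:
    """Stalled when there is at least one stage and none is processing a record."""
    return bool(stage_states) and all(
        state is not State.RUNNING for state in stage_states.values()
    )


def dump_lines(stage_states: Dict[str, State]) -> List[str]:
    """One status line per stage, in the order the stages were supplied."""
    return [f"{name}: {state.value}" for name, state in stage_states.items()]


healthy = {"fanin": State.READ_PENDING, "sort": State.RUNNING}
stalled = {"fanin": State.READ_PENDING, "sort": State.WRITE_PENDING}
```

Here `is_stalled(healthy)` is false because one stage is still processing, while `is_stalled(stalled)` is true because every stage is waiting on another.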

 

    

When a stage does not specifically limit the number of input and/or output streams, the stage may process up to 4096 input streams and up to _MAX_INT_ (an unsigned integer value) output streams. However, a pipeline configuration that connects more than a handful of input or output streams to any one stage should be considered badly designed.

 

Consider the following pipeline which concatenates three input files:

 

pipe (endchar ?)

     < myfile1.txt

     | a: fanin

     | > myjoinedfiles.txt

     ?

     < myfile2.txt

     | a:

     ?

     < myfile3.txt

     | a:

 

The pipeline above is limited and not easily extensible; a better approach might be:

 

pipe filelist file=myfile* ext=txt

     | > myjoinedfiles.txt

 

This pipeline is extensible by design. The FILELIST stage selects all the files that match the pattern mask myfile*.txt.
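
For comparison, the same pattern-driven approach can be sketched in Python. This is an analogy, not a description of how FILELIST works: the `concatenate` function is hypothetical, and processing the matches in sorted name order is an assumption.

```python
import glob
import os


def concatenate(pattern: str, output_path: str) -> int:
    """Append every file matching pattern to output_path; return the file count."""
    out_abs = os.path.abspath(output_path)
    # Collect the matches before the output file is created, and never read
    # the output file itself even if it happens to match the pattern.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    with open(output_path, "w", encoding="ascii") as out:
        for path in paths:
            with open(path, "r", encoding="ascii") as src:
                for line in src:  # one record in memory at a time
                    out.write(line)
    return len(paths)
```

Calling `concatenate("myfile*.txt", "myjoinedfiles.txt")` picks up a fourth or fifth input file without any change to the code, which is the sense in which a pattern-based pipeline is extensible.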

 

    

Pipelines itself is extensible. It includes an MS VC++ stage command API library that contains all the stage initialisation parsing functions and runtime extraction routines that support the current set of built-in stage filters. The API allows you to create new stage DLLs that augment the current built-in set. The API addresses most of the needs that a stage might reasonably have: console locking and synchronisation, multi-stream connectivity, multiple column, word and field isolation, pre-process functionality, character range expansions, input and output record availability, and more. Pipelines ships with both a DEBUG and a RELEASE version of the API library.

 

The Pipelines stage command API uses the Microsoft Foundation Class (MFC) CString class extensively, and other MFC-specific classes under the covers as and when required.

 

    

Pipelines supports third-party, non-API WIN32 console applications and modules through the SHELLEXECUTE stage command. SHELLEXECUTE will load and service any WIN32 application, reading input records from that process's STDOUT and STDERR I/O streams and writing them to the SHELLEXECUTE stage's primary and secondary output streams, respectively.
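
The routing of a child process's two output streams can be illustrated with Python's standard subprocess module. This sketch is not the SHELLEXECUTE implementation: the `run_and_split` function is hypothetical, and the child here is simply another Python interpreter used for demonstration.

```python
import subprocess
import sys


def run_and_split(argv):
    """Run a console program; return (stdout_records, stderr_records)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.stdout.splitlines(), done.stderr.splitlines()


# Demonstration child: writes one record to each of its output streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('to primary'); print('to secondary', file=sys.stderr)",
]
primary, secondary = run_and_split(child)
```

Unlike a pipeline stage, this sketch waits for the child to finish and buffers all of its output before returning; it shows only the stream-to-stream mapping.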

 

    

This version of Pipelines separates the package documentation from the install package, and allows Pipelines to be installed on a disk drive and in a directory of your choice. As the location of the input files for the example pipelines cannot be determined prior to installation, I have replaced the input-file path in each example pipeline with a ‘place-holder’ or ‘macro’, rather than statically setting the example input-file source locations during the install process. These new definitions allow you to save or relocate an example pipeline to another directory. I may introduce new versions of example pipelines that illustrate new or extended functionality, and you may want to retain older example versions for future reference; in either case, an example pipeline provided by this and future versions of Pipelines will always reference the currently installed input-file directory.

 

    

The pipeline is not interpreted. Pipelines performs a single-pass parse of the pipeline, allocating the resources required by each stage, and then begins dispatching the stages.
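
A single left-to-right scan of a pipeline specification can be sketched as follows. This Python illustration is a simplification under stated assumptions, not the Pipelines parser: `parse_pipeline` is a hypothetical function, the `|` stage separator and the `?` end character are taken from the example pipelines in these notes, and global options, labels and resource allocation are ignored.

```python
from typing import List


def parse_pipeline(spec: str, stage_sep: str = "|", end_char: str = "?") -> List[List[str]]:
    """Scan spec once, left to right, grouping stage texts by sub-pipeline."""
    pipelines: List[List[str]] = [[]]
    current: List[str] = []
    for ch in spec + end_char:  # the appended end character closes the last stage
        if ch in (stage_sep, end_char):
            stage = "".join(current).strip()
            if stage:
                pipelines[-1].append(stage)
            current = []
            if ch == end_char:
                pipelines.append([])  # start collecting the next sub-pipeline
        else:
            current.append(ch)
    return [p for p in pipelines if p]


spec = "< myfile1.txt | a: fanin | > myjoinedfiles.txt ? < myfile2.txt | a: ? < myfile3.txt | a:"
parsed = parse_pipeline(spec)
```

Each character is examined exactly once, and the complete list of stages is known before any stage would be dispatched.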

 

    

A pipeline can be specified as an element in a system command line, batch file or PowerScript. For example, the following pipeline sorts its input data on three key fields (1-20, 30-40 and 80-100); 30-40 is the sub-sort field of 1-20, and 80-100 is the sub-sort field of 30-40.

 
C:\>type myfile.txt | pipe "in | sort 1.20 30.10 80-100 | out" > mysortedfile.txt
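
The meaning of a sort with sub-sort fields can be shown with a short Python sketch. It is illustrative only and says nothing about how the SORT stage is implemented: `column_key` and `sort_records` are hypothetical names, column ranges are taken as 1-based and inclusive, and records shorter than a range simply yield a shorter key.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # (first column, last column), 1-based and inclusive


def column_key(record: str, ranges: Sequence[Range]) -> Tuple[str, ...]:
    """Extract one key part per column range, in priority order."""
    return tuple(record[start - 1:end] for start, end in ranges)


def sort_records(records: Sequence[str], ranges: Sequence[Range]) -> List[str]:
    """Sort on the first range; later ranges break ties among equal earlier keys."""
    return sorted(records, key=lambda record: column_key(record, ranges))


# Small demonstration with two narrow key fields: columns 1-2, then columns 4-5.
demo = sort_records(["bb x1", "aa z9", "aa a2"], [(1, 2), (4, 5)])
```

With the three fields described above, the call would be `sort_records(records, [(1, 20), (30, 40), (80, 100)])`.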
 

    

Pipelines is offered freely and without evaluation caveats; you may use it as you please. If you have any comments, suggestions or requests, please contact me via the link below.

 

    

Lastly, if you use Pipelines, you use it at your own risk! TenFiftyTwo does not take any responsibility, implied or otherwise, for any damage caused through its use.

 

For more information contact: TenFiftyTwo