Pipelines by TenFiftyTwo
Pipelines is inspired by and based
upon CMS Pipelines, an enterprise systems utility originally designed and
developed by John Hartmann of IBM.
Pipelines allows you to modify the contents of one or more
text or data files quickly and easily. You can specify that only certain
sections of a file are to be changed; you can confine those changes to a
column, word or field range, translate words and phrases, and discard lines or
insert new lines of data. You can perform a whole range of operations on a file
or files using only a simple set of commands.
Pipelines builds on the concept of
directing the output of one process to the input of another, commonly known as
pipelining. It is an old idea, and almost all operating systems support an
implementation of it, with varying degrees of usefulness. In general they support
the linear, single-stream model: if you lay each process out in a straight
line, data starts in the first process and passes into the next, where it is
changed in some way, and so on down the pipeline chain in a sequential fashion
until it reaches a sink. For example:
stage1 | stage2 | stage3 | ... | stagen
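As a rough illustration of the linear, single-stream model, the sketch below chains three stages in Python. It is not Pipelines syntax; the stage names source, upper and number are hypothetical and exist only for this example. Each stage is a generator that consumes the output of the stage before it, and the final list() call acts as the sink.

```python
def source(lines):
    """First stage (hypothetical): emit the input records one at a time."""
    for line in lines:
        yield line


def upper(records):
    """Second stage (hypothetical): change each record in some way."""
    for record in records:
        yield record.upper()


def number(records):
    """Third stage (hypothetical): prefix each record with its position."""
    for n, record in enumerate(records, start=1):
        yield f"{n}: {record}"


def run_pipeline(lines):
    # Equivalent in spirit to: source | upper | number | sink
    return list(number(upper(source(lines))))
```

Calling run_pipeline(["a", "b"]) passes each record through the three stages in order before it reaches the sink.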
Pipelines extends this mechanism by
allowing you to create multi-stream pipelines, where the topology is no longer
horizontal and linear but two-dimensional: records travel up and
down the pipeline chain through intersecting joints which control the flow of
data. Multi-stream pipelines allow you to select and operate on specific sets
of records, routing unselected records through a joint into and out of other
sections of the pipeline.
Pipelines treats its input data as lines or records, reading them one at a time
from its input and writing them one at a time to its output. As a result, unless
the entire input needs to be loaded into memory, Pipelines consumes only a
fraction of the memory that might otherwise be required, because only a handful
of records are in the pipeline at any one time.
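The memory behaviour described above can be sketched in Python with lazy generators. This is an analogy under stated assumptions, not a description of how Pipelines is implemented: numbered_source, tag and first_records are hypothetical names, and the log list exists only to show how many records were actually read.

```python
from itertools import islice


def numbered_source(log):
    """Hypothetical endless input; notes every record it reads in `log`."""
    n = 0
    while True:
        n += 1
        log.append(n)
        yield f"record {n}"


def tag(records):
    """Hypothetical stage: handle one record, pass it on, then ask for the next."""
    for record in records:
        yield record + " (seen)"


def first_records(count):
    """Pull `count` records through the chain and report what was read."""
    log = []
    out = list(islice(tag(numbered_source(log)), count))
    return out, log
```

Although the source never ends, first_records(3) reads only as many records as the sink asks for, because each record travels the whole chain before the next one is read.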
Pipelines allows you to operate on
files in a single pass, isolating sections of the file without having to
buffer or sort the data needlessly just to maintain the relative
record order. Consider the following simple pipeline which, using only six
stages, reads the file myfile.txt and, in a single pass, changes
the word hello to goodbye only in records that contain the
word friend:
pipe (endchar ?)
< myfile.txt | a: locate 'friend' | change 'hello' 'goodbye' | b: faninany | > myfile.txt
? a: | take * | b:
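For readers more familiar with general-purpose languages, the effect of this pipeline (not its mechanics) can be approximated in Python. The function name selective_change and its parameters are hypothetical. Note also that str.replace works on substrings rather than whole words, so this is only a sketch of the behaviour described above: records containing the search text are changed, all other records pass through untouched, and the original record order is preserved.

```python
def selective_change(records, needle="friend", old="hello", new="goodbye"):
    """Change `old` to `new` only in records that contain `needle`."""
    for record in records:
        if needle in record:
            # Selected records take the "change" branch.
            yield record.replace(old, new)
        else:
            # Unselected records bypass the change and rejoin in order.
            yield record
```

A call such as list(selective_change(lines)) plays the role of the final output stage, collecting the merged records.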
Pipelines comprises a range of
input, output, selection and transformation stages
which cover a spectrum of manipulation functions: splitting records, stripping
characters, joining records, collating, sorting and more. On the whole,
similar operations are performed by a single stage, which means that you do not
have to remember the names of an unnecessarily long list of stages.
For example, to strip characters from a record, Pipelines provides a single
stage called STRIP, which removes
characters from the beginning and/or the end of a record.
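The idea of one stage covering several related operations can be sketched in Python as follows. This does not reproduce the operands of the real STRIP stage; the function strip_stage and its where and chars parameters are assumptions made purely for illustration.

```python
def strip_stage(records, where="both", chars=" "):
    """Remove `chars` from the start, the end, or both ends of each record.

    `where` is a hypothetical operand: 'leading', 'trailing' or 'both'.
    """
    for record in records:
        if where == "leading":
            yield record.lstrip(chars)
        elif where == "trailing":
            yield record.rstrip(chars)
        elif where == "both":
            yield record.strip(chars)
        else:
            raise ValueError(f"unknown operand: {where!r}")
```

A single function with an operand replaces what would otherwise be three separately named operations, which is the design point the text makes about stages.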
With Pipelines, the pipeline can be
specified on the system command line, in a batch file or in a Pipelines file
(extension .PPL). You design the pipeline
in your favourite editor and save it; to execute the pipeline you simply
double-click the file icon and Pipelines will launch it. You can specify pipelines
that accept arguments, which substitute stage
operands and even stage names. Coupled with the capability to
connect pipelines together, this allows you to build a range of utility pipelines
that can be called upon whenever you need them.
Pipelines is general purpose; it has not been
developed with any particular field in mind. It is simply a line/record
orientated text processing utility that is useful for manipulating data. The
design of Pipelines is essentially a compromise between speed and flexibility.
A bespoke, dedicated program may outperform Pipelines. However, with a
dedicated program, each time your requirements change you have to alter the
source code (and, if it is compiled, rebuild it as well). This is not
a problem when the program is small or simple. But when we start to talk
about pattern, field, word and column selection, recursive sorting, collating,
and splitting and joining records from multiple, possibly large, input files,
then we have a different scenario. Pipelines is
designed with this type of processing in mind; it is intended to offer a quick
and efficient processing utility that can help you manipulate data into a
format that suits your needs.
You may find Pipelines of use in
cases where you might otherwise have to write a program to solve the problem,
and it may well save you time and effort that could be better spent on
other tasks. Pipelines is free; there are no
evaluation caveats, and you may download it and use it as you please.
Pipelines is designed and
maintained by James Laing. If you have any questions
or comments, please contact TenFiftyTwo.