Wednesday, January 27, 2016

Collection Pipeline

Collection pipelines are a programming pattern where you organize some computation as a sequence of operations which compose by taking a collection as output of one operation and feeding it into the next. (Common operations are filter, map, and reduce.) This pattern is common in functional programming, and also in object-oriented languages which have lambdas.
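As a minimal sketch of the pattern in Python (the `orders` data and its field names are invented for illustration), a filter, a map, and a reduce can be chained so that each step consumes the collection produced by the previous one:

```python
from functools import reduce

# Hypothetical sample data, invented for this illustration.
orders = [
    {"customer": "ann", "total": 40, "shipped": True},
    {"customer": "bob", "total": 15, "shipped": False},
    {"customer": "cid", "total": 25, "shipped": True},
]

# Step 1 (filter): keep only the shipped orders.
shipped = filter(lambda o: o["shipped"], orders)

# Step 2 (map): turn each remaining order into its total.
totals = map(lambda o: o["total"], shipped)

# Step 3 (reduce): fold the totals into a single sum.
shipped_revenue = reduce(lambda acc, t: acc + t, totals, 0)
```

In everyday Python the same pipeline would more likely be written as a generator expression passed to `sum`, but the explicit three-step form makes the stages of the pipeline easier to see.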

Many programming environments provide two main composite data types:
  1. Hashmaps are a key-value data structure, which may be called associative arrays, hashtables, maps, or dictionaries.
  2. Lists are simple sequences. They're not quite the same as traditional arrays as they dynamically resize as you add or remove elements (some languages do call them arrays, however). They can be indexed by integer keys.
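In Python, for instance, these two types are `dict` and `list`. A small sketch, using made-up sample values:

```python
# A hash (called a dict in Python): keys map to values.
album = {"title": "Kind of Blue", "artist": "Miles Davis"}
album["year"] = 1959  # keys can be added at any time

# A list: an ordered sequence that resizes as elements come and go.
tracks = ["So What", "Freddie Freeloader"]
tracks.append("Blue in Green")       # grows dynamically
first = tracks[0]                    # indexed by integer position
tracks.remove("Freddie Freeloader")  # shrinks dynamically
```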

A list 'n' hash structure is schemaless by default: the lists can contain disparate elements, and the hashes can contain any combination of keys. This makes the data structure very flexible, but we must remember that we nearly always have an implicit schema when we manipulate a schemaless data structure, in that we expect certain data to be represented with certain keys.
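To make the idea of an implicit schema concrete, here is a small Python sketch. The `people` data and the `emails` helper are hypothetical. Nothing stops the list from holding mismatched elements, yet the function still quietly assumes that useful records are hashes with an `"email"` key:

```python
# Hypothetical data: nothing enforces a common shape on the elements.
people = [
    {"name": "ann", "email": "ann@example.com"},
    {"name": "bob", "phone": "555-0100"},  # no "email" key
    42,                                    # not even a hash
]

def emails(records):
    # The implicit schema lives here: we expect hashes with an "email" key
    # and skip anything that does not match that expectation.
    return [r["email"] for r in records if isinstance(r, dict) and "email" in r]

found_emails = emails(people)
```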

A strength of the list and hash structure is that you can manipulate it with generic operations which know nothing of the actual keys present.
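One way to picture such a generic operation is a function that walks any nested list and hash structure without being told which keys to expect. The `collect_keys` helper below is a hypothetical sketch, not taken from any library:

```python
def collect_keys(node):
    """Return the set of every hash key found anywhere in a nested structure."""
    if isinstance(node, dict):
        keys = set(node)  # the keys of this hash
        for value in node.values():
            keys |= collect_keys(value)
        return keys
    if isinstance(node, list):
        keys = set()
        for item in node:
            keys |= collect_keys(item)
        return keys
    return set()  # scalars contribute no keys

# Made-up sample data; the function has no knowledge of these key names.
data = {
    "name": "ann",
    "orders": [{"id": 1, "items": ["pen"]}, {"id": 2, "note": "gift"}],
}
found = collect_keys(data)
```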

List and hash structures can easily be serialized, commonly into a textual form. JSON is a particularly effective form of serialization for such a data structure, and is my default choice for this. XML is also often used to serialize list 'n' hash structures.
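As a brief illustration, Python's standard `json` module can round-trip a list and hash structure through text. The `record` value is made up for the example:

```python
import json

# Hypothetical nested structure of lists and hashes.
record = {"name": "ann", "tags": ["a", "b"], "address": {"city": "Oslo"}}

# Serialize to a JSON string; sort_keys makes the output order predictable.
text = json.dumps(record, sort_keys=True)

# Parse the text back into an equivalent structure.
restored = json.loads(text)
```

The round trip is only lossless for values JSON can represent (strings, numbers, booleans, null, lists, and hashes with string keys).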

The presence of rich lists and hashes as standard equipment in modern languages has been one of the definite improvements in programming. Most major languages now provide standard versions of these data structures, together with a rich range of operations, in particular Collection Pipelines.
