On the Bleeding Edge of Puppet

Wednesday, January 29, 2014

Puppet Internals - Separation of Concerns

Separation of Concerns

As we are getting closer to Puppet 4, where the future parser (available as an experimental feature since 3.2) and future evaluator (about to be released as an experimental feature in 3.5) are expected to become the standard, the implementation of these two (now experimental) features will be the concern of many contributors.

Because the implementation of these new features is somewhat different from the rest of the Puppet code base, I want to explain the rationale behind the design, and describe the various techniques and how they are used in the future parser and evaluator.

In this series of posts I am going to be talking about concepts such as polymorphic dispatch, adapters, and modeling, but also about more concrete things such as association of line/position with expressions and error handling.

Jumping ahead in the story - polymorphic dispatch and adapters are techniques that help us implement code in a way that keeps different concerns separate. Before explaining how these techniques work, it is important to understand what "keeping concerns separate" (or "separation of concerns" as it is normally referred to) is all about, and what happens when there is no such separation.

Separation of Concerns

Even the ancient Macedonians knew the importance of 'Separation of Concerns'. They phrased it differently though: King Philip II of Macedon, father of Alexander the Great, is credited with coining the phrase 'Divide and conquer'. At that time (382-336 BC) they were naturally not dealing with maintenance of a large code base; 'divide and conquer' was the strategy the Macedonians used to deal with the Greek city-states they ruled over, and they had two goals in mind:

break up top power alliances into smaller chunks to make them easier to subdue/manage
prevent small power groups from linking up and becoming more powerful

This sounds like a perfect strategy for software! While we, from a functional standpoint, want our logic to "link up" and "become more powerful", we certainly do not want to be subdued by it from a maintenance and future development standpoint.

If you have worked with a long lived software project that has gone without a good trim for a long time you know what this looks like. Everything you want to change is interlinked with everything else to the point that it is almost impossible to begin renovations without causing the entire structure to collapse. This reminds me of the song Dem-Bones, only modified to go "Neck-bone connected to ... every other bone". This is also known by the term "high coupling".

The term "cohesion" is closely related to "coupling" but measures how focused the various features of a component are (how much they have in common); the higher the cohesion, the better. We can observe low cohesion when we find an odd responsibility (e.g. we cannot parse a file without also keeping track of the file being changed). The Wikipedia article on cohesion describes its various forms; typically, low cohesion comes from grouping things (usually methods) based on a non-functional principle.

Certainly no one wants to create a system guided by the opposite principle, "let's mix it all up", so what causes software to almost secretly grow in complexity while no one was watching? Are there evil chiropractors at work here, rearranging Dem-Bones?

Of course not. It typically starts with one small step taken by one developer, added to by the next, and so on. This goes on for a while, and then someone decides that there is too much duplication of code, and the code is locally refactored. While there is now less code, there is also more coupling. After a period of increased coupling there is usually a phase of feature expansion. This is now more difficult because of the coupling, and there is usually time pressure preventing a full scale refactoring. Instead, new functionality is shoehorned in. The system again undergoes refactoring, and common pieces are broken out into utilities for the sake of reusability (again less code and more coupling). When such refactoring is performed with only a single use case in mind, the principle used for grouping is often flawed. When this has been going on for a while you will find several "Swiss army knives" backed by an army of Utility classes.

What was once bad in terms of duplicated code (but easily changeable, because a variation that was no longer needed could easily be deleted or changed) has been replaced by logic that almost everything depends on and no one dares to touch since consequences are very hard to predict. While the practice of creating common reusable functionality in itself is a good thing, we typically rush into it, under-design and too quickly let the use of our new shiny utility permeate the system.

It does not really matter which underlying technology something is implemented in; the problem only manifests itself slightly differently depending on whether it is object oriented or functional, strongly typed or not. In general, the less stringent the implementation language is, the more trouble you can get into, and faster.

The qualities to strive for are to have small, simple things with clear focus combined into larger things with clear focus.

What is it we are separating?

What is it we are trying to separate anyway? Maybe you have heard that you should separate "data" from "code". This does not help us much, however, as we always deal with some sort of data; e.g. 1, 42, and "hello world" are all pieces of data. I like to think of these as content and algorithms, and what we want to achieve, just like King Philip II did, is that every part, be it content or algorithm, is broken up into manageable chunks. And by the way, content is not always just data in the form of numbers, text, or structures thereof; our algorithms could be of a higher order and juggle other algorithms (such as selecting which one out of many algorithms to use, or composing a complex algorithm out of smaller independent ones).

There once was a to_s

Even something as simple as converting an object to string form is subject to the kinds of problems "separation of concerns" warns us about. The first to implement a to_s on a class clearly did so for a specific purpose - maybe it was for debugging, maybe to print out information about the object in a report, maybe a label in a user interface, or something that is included in an error message to help identify the location of a problem. This list can become very long; there simply is no convention. Instead, there is typically a tendency to implement multiple to_s variants - e.g. to_label, to_debug_string, to_json, to_pson, to_yaml, etc.

While other functions that we want to apply to a particular piece of data may not be as generic as "represent in textual form", there are often several variations on how we want something done. This may manifest itself as several similar methods, or as a rich set of parameters, both of which add complexity to the implementation.

Using a Strategy

What we want to do, instead of pushing every possible piece of wanted functionality into a class, is to move functionality into a separate piece of logic. This is often referred to as the "Strategy" or "Policy" pattern. Depending on the language used, such separation can be achieved with inheritance, multiple inheritance, a "mixin", or aggregation. Of these, only aggregation (or indeed complete separation) allows us to compose the behavior dynamically; most languages only have features for static binding (even if the binding may take place late, at runtime).
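To make the aggregation variant concrete, here is a minimal Ruby sketch. The names Point, LabelFormatter, DebugFormatter and Reporter are made up for illustration; the only point is that the text-producing strategy is a separate object that can be swapped while the program runs.

```ruby
# Hypothetical names throughout: Point, LabelFormatter, DebugFormatter, Reporter.

# Anemic content: just data, no opinion about how it is rendered as text.
Point = Struct.new(:x, :y)

# Two interchangeable strategies for producing a textual representation.
class LabelFormatter
  def string(point)
    "(#{point.x}, #{point.y})"
  end
end

class DebugFormatter
  def string(point)
    "#<Point x=#{point.x} y=#{point.y}>"
  end
end

# The consumer aggregates a strategy and can be given another one at runtime.
class Reporter
  attr_accessor :formatter

  def initialize(formatter)
    @formatter = formatter
  end

  def report(point)
    formatter.string(point)
  end
end

origin = Point.new(0, 0)
reporter = Reporter.new(LabelFormatter.new)
puts reporter.report(origin)            # prints (0, 0)
reporter.formatter = DebugFormatter.new
puts reporter.report(origin)            # prints #<Point x=0 y=0>
```

Neither Point nor Reporter changes when a third formatter is added; the choice is made by whoever wires the objects together.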

The Ruby way of doing this is to write a module, and at runtime decide which module to include into a class. This provides static late binding, and we have to be careful that modules do not step on each other, since their methods are overlaid on top of what is already declared. Once a module is included it is hard to get rid of its logic if we, for some reason, need to use different strategies at different times (without restarting the runtime).
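A small Ruby sketch of the overlay problem, using two hypothetical modules that both define to_s. Ruby searches the most recently included module first, so the second module silently wins, and including the first one again does not switch back, because Ruby skips a module that is already among the ancestors.

```ruby
# Two hypothetical "strategy" modules that both define to_s.
module LabelStrings
  def to_s
    "label"
  end
end

module DebugStrings
  def to_s
    "debug"
  end
end

class Thing
  include LabelStrings
end

puts Thing.new.to_s            # prints label

# Reopening the class and including a second module overlays the first:
# the most recently included module is searched first.
class Thing
  include DebugStrings
end

puts Thing.new.to_s            # prints debug
p Thing.ancestors.take(3)      # prints [Thing, DebugStrings, LabelStrings]

# Including LabelStrings again is a no-op, since it is already an ancestor,
# so there is no simple way to go back to the first strategy.
class Thing
  include LabelStrings
end

puts Thing.new.to_s            # prints debug
```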

Anemic Models

We want our content ("model", or "data" if you like) to be as simple as possible. I often use the term anemic to describe the desired quality. A class that holds content should only contain the intrinsic data and the access logic that protects its integrity. Everything else (the strategies and algorithms) should be implemented separately.

Typically the behavior of data boils down to:

Attribute accessors (a.k.a "getters" and "setters")
Type safe setters (catch bad input early)
Generic operations such as "equality" and "identity"
Intrinsics such as "a car has four wheels", "a specific wheel can only be mounted on one car at a time" (that is if we are implementing a Car object).
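As a sketch of what such an anemic class can look like in Ruby (the Car and Wheel classes here are hypothetical, written only to illustrate the list above), the classes below hold data, validate it in their setters, define equality, and enforce the two intrinsic rules, and nothing else:

```ruby
# Hypothetical Car and Wheel classes: data, type-safe setters,
# equality/identity and intrinsic rules, with no other behavior.
class Wheel
  attr_reader :size, :car

  def initialize(size)
    self.size = size
  end

  # Type-safe setter: catch bad input early.
  def size=(value)
    unless value.is_a?(Integer) && value > 0
      raise TypeError, "size must be a positive Integer, got #{value.inspect}"
    end
    @size = value
  end

  # Intended to be called only by Car#mount, which enforces the rules.
  def mounted_on=(car)
    @car = car
  end

  # Equality is based on intrinsic data; identity remains Object#equal?.
  def ==(other)
    other.is_a?(Wheel) && other.size == size
  end
  alias eql? ==

  def hash
    [Wheel, size].hash
  end
end

class Car
  WHEEL_COUNT = 4 # intrinsic: a car has four wheels

  attr_reader :wheels

  def initialize
    @wheels = []
  end

  # Intrinsic: a specific wheel can only be mounted on one car at a time.
  def mount(wheel)
    raise TypeError, "expected a Wheel, got #{wheel.class}" unless wheel.is_a?(Wheel)
    raise ArgumentError, "wheel is already mounted on a car" if wheel.car
    raise ArgumentError, "a car has only #{WHEEL_COUNT} wheels" if wheels.size >= WHEEL_COUNT
    wheel.mounted_on = self
    wheels << wheel
    self
  end
end
```

Note what is missing: no to_s for reports, no rendering, no persistence. Those belong in separate strategies.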

Degrees of separation

Maybe it is enough to just not have all the code in one place and compose either statically at "compile time", or selected dynamically at "run time". But what if we want to use different strategies all at the same time? (Just think about all the various ways we may want to turn an object into "textual representation").

Sometimes there are good reasons to create a design with high coupling and specialization, usually to get performance out of the system. But as humans we are often dead wrong when we guess where the bottlenecks of our system are, and it is best to only optimize after measuring. A problem with a highly coupled design is that it is more difficult to change into a loosely coupled design than vice versa (when a loosely coupled design must be specialized for performance reasons), and it is also more difficult to test.

As a rule of thumb, design with an anemic content model and use loosely coupled strategies.

An Example

I am picking ArithmeticExpression as an example. Later in this series I will get to the details of the real implementation, but the principle is the same.

An ArithmeticExpression is used to represent an expression such as "1 + 2" in the Puppet Programming Language. It has a left-expression (the '1' in the example), a right-expression ('2'), and an operator ('+'). It can be trivially implemented in Ruby as:

class ArithmeticExpression
  attr_accessor :left_expression
  attr_accessor :right_expression
  attr_accessor :operator
end

Clearly this implementation does a poor job of protecting its integrity; the operator can be anything, and so can the left and right expressions. We need to protect the setters by changing the use of attr_accessor to attr_reader and write setters that validate their arguments. It is also inconvenient to create as we need to set the three attributes individually. Apart from these problems it is a decent anemic design (in fact, it cannot really be more anemic than this).
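One way to address these problems by hand is sketched below. Expression and Literal are hypothetical stand-ins, and the operator list simply mirrors the one in the RGen model shown later; this illustrates the idea and is not the Puppet implementation.

```ruby
# Hypothetical minimal expression hierarchy for the sketch.
class Expression; end

class Literal < Expression
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class ArithmeticExpression < Expression
  OPERATORS = [:+, :-, :*, :%, :/, :<<, :>>].freeze

  # Readers only; the setters below validate their input.
  attr_reader :left_expression, :right_expression, :operator

  # Convenience constructor: set all three attributes in one call.
  def initialize(left_expression, operator, right_expression)
    self.left_expression = left_expression
    self.operator = operator
    self.right_expression = right_expression
  end

  def left_expression=(expr)
    @left_expression = check_expression(expr)
  end

  def right_expression=(expr)
    @right_expression = check_expression(expr)
  end

  def operator=(op)
    raise ArgumentError, "unsupported operator: #{op.inspect}" unless OPERATORS.include?(op)
    @operator = op
  end

  private

  def check_expression(expr)
    raise TypeError, "expected an Expression, got #{expr.class}" unless expr.is_a?(Expression)
    expr
  end
end

expr = ArithmeticExpression.new(Literal.new(1), :+, Literal.new(2))
```

This works, but it is exactly the kind of boilerplate that has to be repeated for every expression class, which is one reason the real implementation uses a modeling framework instead.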

The problems come when we start adding evaluate and to_s. What is the purpose of the to_s? The implementation below tries to recreate the source. That may be fine for something like a simple arithmetic expression, but what about an if-then-else expression, or indeed the top level construct containing all the expressions in a file? There we will need some sort of formatting if the ambition is to be able to recreate the source in human readable form. Is it really a good idea to implement this in one small piece per expression?

class ArithmeticExpression
  attr_accessor :left_expression
  attr_accessor :right_expression
  attr_accessor :operator

  # Recreates the source text of the expression
  def to_s
    "#{left_expression} #{operator} #{right_expression}"
  end

  # Evaluates both operands, then applies the operator to the results
  def evaluate
    left_expression.evaluate.send(operator, right_expression.evaluate)
  end

end

The evaluate method also has problems. Clearly it must be given some kind of input (a scope) to access variable values etc., but the real problem is that there is now only one way an ArithmeticExpression can be evaluated, and that evaluation is embedded in the evaluation of every expression that contains it. What if we want to control which implementation to use at runtime? What if we want to support the '+' operator on objects that do not implement this operation directly? How do we handle errors? What if we want to implement a debugger that allows us to step through the evaluation? Also, ArithmeticExpression is only one out of a hundred or so expressions in the Puppet Programming Language, and breaking something like a debugger concern into pieces spread over a hundred places is not particularly fun to implement and is costly to maintain. (We cannot simply inherit the behavior since it is intermixed with the particular evaluation of each expression and its subexpressions.)

While we could use language techniques such as inheritance to implement some common behavior, we then increase coupling, and we still cannot modify the behavior dynamically. We could also use "inversion of control" (or an injection pattern) and instantiate each expression with strategies for producing a string and for evaluation.

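class ArithmeticExpression
  attr_accessor :left_expression
  attr_accessor :right_expression
  attr_accessor :operator

  # The strategies are injected when the expression is created
  def initialize(label_provider, evaluator)
    @label_provider = label_provider
    @evaluator = evaluator
  end

  # Delegates evaluation to the injected evaluator strategy
  def evaluate
    @evaluator.evaluate(self)
  end

  # Delegates text production to the injected label provider strategy
  def to_s
    @label_provider.string(self)
  end
end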

Now we have delegated the production of a textual representation and the evaluation to separate strategies, and thus separated the concerns. However, we have also introduced bloat, since each expression now needs to carry two additional references, and we need to pass them to each constructor. We can improve on that by providing a default implementation that is used when the caller does not supply one, but that is more boilerplate code we need to write for each of the hundreds of expressions. (And we have not even begun handling debugging, or more advanced formatting.) While we did handle the concerns via delegation, our ArithmeticExpression is still aware of these concerns; it still has to have (albeit small) methods for them.

In this case, we really want a clean separation - the ArithmeticExpression simply should not know how to represent itself in textual form, nor be able to evaluate itself. We want something that is completely anemic to allow us to deal with the computational concerns more effectively.

Here is what the real implementation of ArithmeticExpression looks like. It is implemented using RGen, a modeling framework that (among other things) ensures the integrity of the objects (in this case that left and right are indeed Expressions, and that only supported operations can be assigned).

class BinaryExpression < Expression
  abstract
  contains_one_uni 'left_expr', Expression, :lowerBound => 1
  contains_one_uni 'right_expr', Expression, :lowerBound => 1
end

class ArithmeticExpression < BinaryExpression
  has_attr 'operator', RGen::MetamodelBuilder::DataTypes::Enum.new(
    [:'+', :'-', :'*', :'%', :'/', :'<<', :'>>' ]),
    :lowerBound => 1
end

The use of RGen and modeling is a separate topic, but I will jump ahead a bit to enable reading the above code:

abstract means that the class cannot be instantiated (there are no pure BinaryExpression objects in the system, only objects of concrete subclasses such as ArithmeticExpression).
'contains_one_uni' means a containment reference to at most one object of the stated type
- containment means that the referenced object may only be contained by one parent (compare to the example "a specific wheel can only be mounted on one car at a time"),
- 'uni' means that the reference is uni-directional; in general, an Expression does not know about all the places where it may be contained.
:lowerBound => 1 declares that the value is required.
An Enum data type allows one out of a set of given symbols to be assigned.

As you probably noted, there is no to_s and no evaluate method. These are instead implemented as separate strategies - e.g. there is an Evaluator class that has an evaluate method, there is a LabelProvider strategy when we want a textual representation to be used as a label, and yet another strategy for production of the text representation to use when performing expression interpolation into strings. How these work will be covered in posts to come.
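To give a feel for what "separate strategies" means ahead of those posts, here is a hedged sketch in plain Ruby. The anemic model is made of Structs, and each concern lives in its own class. The case-on-class dispatch is only a simple stand-in for the polymorphic dispatch the real implementation uses, and the label texts are invented.

```ruby
# Hypothetical anemic model: plain data, no to_s or evaluate of its own.
Literal = Struct.new(:value)
ArithmeticExpression = Struct.new(:left_expr, :operator, :right_expr)

# Strategy 1: evaluation (simple case-on-class dispatch).
class Evaluator
  def evaluate(expr)
    case expr
    when Literal
      expr.value
    when ArithmeticExpression
      evaluate(expr.left_expr).send(expr.operator, evaluate(expr.right_expr))
    else
      raise ArgumentError, "cannot evaluate #{expr.class}"
    end
  end
end

# Strategy 2: a short label, e.g. for use in error messages.
class LabelProvider
  def label(expr)
    case expr
    when Literal then "Literal Value"
    when ArithmeticExpression then "'#{expr.operator}' expression"
    else expr.class.name
    end
  end
end

# Strategy 3: a source-like text representation, kept apart from labels.
class SourceFormatter
  def string(expr)
    case expr
    when Literal
      expr.value.inspect
    when ArithmeticExpression
      "#{string(expr.left_expr)} #{expr.operator} #{string(expr.right_expr)}"
    else
      raise ArgumentError, "cannot format #{expr.class}"
    end
  end
end

expr = ArithmeticExpression.new(Literal.new(1), :+, Literal.new(2))
Evaluator.new.evaluate(expr)        # => 3
LabelProvider.new.label(expr)       # => "'+' expression"
SourceFormatter.new.string(expr)    # => "1 + 2"
```

This formatter ignores precedence, so a nested expression such as (1 + 2) * 4 comes out as "1 + 2 * 4". Fixing that is now a change in one class rather than in every expression class.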

Summary

In this post I have shown that it is desirable to separate concerns between content and the algorithms operating on content, and that it is desirable to implement content as anemic structures that only provide basic navigation of attributes and protection of their own integrity. We want our designs to have low coupling (i.e. interchangeable parts) and high cohesion (concerns that are functionally focused).

In the Next Post

In the next post I will be covering the technique called polymorphic dispatch since it plays an important role when implementing strategies.

Monday, December 30, 2013

Operating on Types

In the previous post about the Puppet 3.5 experimental feature Puppet Types I covered the Class and Resource types, which concluded the tour of all the currently available types.

This time, I am going to talk about what you can do with the types; the operators that accept types as well as briefly touch on how types are passed to custom functions.

The Match Operators

Almost all of the previous examples used the match operator =~ so it should already be familiar. When the RHS (right hand side) is a Type, it tests if the LHS (left hand side) expression is an instance of that type. Naturally !~ tests if the LHS is not an instance of the type.

Equality Operators

The equality operators == and != also work on types. It should be obvious that == tests if the types are equal and != that they are not. Equality for types means that they must have the same base type, and that they are parameterized the same way - essentially "do they represent the same type?".

Integer[1,10] == Integer[1,10] # true
Integer[1,10] == Integer       # false
Integer[1,10] != Integer[7,11] # true
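The two operators can be mimicked with a toy Ruby class. IntegerType is hypothetical and only models the integer range parameters: instance? plays the role of =~, and == compares base type and parameters.

```ruby
# Toy model of a parameterized integer type (hypothetical, not Puppet's code).
class IntegerType
  attr_reader :from, :to

  # nil bounds stand for "unbounded", like an unparameterized Integer.
  def initialize(from = nil, to = nil)
    @from = from
    @to = to
  end

  # Counterpart of `value =~ Integer[from, to]`: an instance-of test.
  def instance?(value)
    value.is_a?(Integer) &&
      (from.nil? || value >= from) &&
      (to.nil? || value <= to)
  end

  # Same base type and same parameters means the same type.
  def ==(other)
    other.is_a?(IntegerType) && other.from == from && other.to == to
  end
  alias eql? ==

  def hash
    [IntegerType, from, to].hash
  end
end

IntegerType.new(1, 10) == IntegerType.new(1, 10)  # => true
IntegerType.new(1, 10) == IntegerType.new         # => false
IntegerType.new(1, 10) != IntegerType.new(7, 11)  # => true
IntegerType.new(1, 10).instance?(5)               # => true
IntegerType.new(1, 10).instance?(11)              # => false
```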

Comparison Operators

The comparison operators <, <=, >, >= compare the generality of the types (i.e. whether one type is more general than, or a subtype of, the other). As you may recall, Object is at the top of the hierarchy and is the most general, so it is greater than all other types.

Object > Integer          # true
Object > Resource['file'] # true
Integer < Object          # true

Compare these two expressions:

Integer < Object         # true
Integer =~ Type[Object]  # true

They basically achieve the same thing, the first by comparing the types, and the second by first inferring the type of the LHS expression (i.e. Type[Integer]). Which operator to use (match or comparison) depends on style, and on whether you have an instance or a type to begin with.

There is currently (in what is on master as this is written) a difference: the comparison operators check for assignability, which always allows undef. This may change, since the rest of the type system now has solid handling of undef / Undef; currently it produces the somewhat surprising result:

Integer > Undef    # true

This is because the operator is implemented as "if an instance of the type on the right can be assigned to something type constrained by the type on the left, then the right type is less than (or equal to) the left".
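The rule can be reproduced with a few toy Ruby classes. They are all hypothetical, with assignability hard-coded for just Object, Integer, String and Undef. Because every type accepts undef, Integer turns out to be greater than Undef, exactly like the result above.

```ruby
# Toy types (hypothetical). assignable?(other) answers: can an instance of
# `other` be assigned to something constrained by `self`? Undef is always
# accepted, mirroring the current behavior described above.
class PType
  def assignable?(other)
    other.is_a?(UndefType) || assignable_type?(other)
  end

  # self >= other: instances of other are assignable to self.
  def >=(other)
    assignable?(other)
  end

  def <=(other)
    other >= self
  end

  # Strictly more general: assignable one way but not the other.
  def >(other)
    self >= other && !(other >= self)
  end

  def <(other)
    other > self
  end
end

class ObjectType < PType
  def assignable_type?(_other)
    true
  end
end

class IntegerType < PType
  def assignable_type?(other)
    other.is_a?(IntegerType)
  end
end

class StringType < PType
  def assignable_type?(other)
    other.is_a?(StringType)
  end
end

class UndefType < PType
  def assignable_type?(_other)
    false
  end
end

ObjectType.new > IntegerType.new   # => true
IntegerType.new < ObjectType.new   # => true
IntegerType.new > StringType.new   # => false
IntegerType.new > UndefType.new    # => true, the surprising result
```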

In Operator

The in operator searches for a match of the LHS in the RHS expression. When the LHS is a Type, the search checks whether the RHS has an entry that is an instance of the type. This makes it very easy to check, say, whether there is an undefined element in an array:

Undef in [1,2,undef]  # true
String in [1,2,undef] # false
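In Ruby terms (an analogy only, with nil standing in for undef and Ruby classes for Puppet types), the type form of in amounts to an any? search that uses the type as a matcher. type_in? is a hypothetical helper.

```ruby
# Hypothetical helper: does any element of the array match the type?
# Class#=== is an instance-of test, so NilClass === nil is true.
def type_in?(type, array)
  array.any? { |element| type === element }
end

type_in?(NilClass, [1, 2, nil])  # => true   (like Undef in [1,2,undef])
type_in?(String, [1, 2, nil])    # => false  (like String in [1,2,undef])
```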

Case Expression

The case expression also handles types. Normally, the case expression compares a test expression against a series of options using == (or =~ if the option is a regular expression). This has been extended to also treat the case when the option is a Type as a match (i.e. an instance-of match).

case 3 {
  Integer : { notice 'an integer value' }
}

If you do this using a Type:

case Integer {
  Type[Integer] : { notice 'an integer type' }
}

Selector Expression

The selector expression treats types the same way as the case expression:

notice 3 ? {
  Integer => 'an integer value'
}

notice Integer ? {
  Type[Integer] => 'an integer type'
}
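The option matching used by case and selector expressions can be written out as a small Ruby sketch. Ruby classes stand in for Puppet types, and case_match? and puppet_case are hypothetical helpers; the rules follow the description above (type: instance-of, regular expression: match, anything else: ==).

```ruby
# Hypothetical helper implementing the option-matching rule described above.
def case_match?(option, value)
  case option
  when Module then value.is_a?(option)                          # type: instance-of
  when Regexp then value.is_a?(String) && option.match?(value)  # regexp: =~
  else option == value                                           # otherwise: ==
  end
end

# Returns the result of the first matching option, or nil if none matches.
def puppet_case(value, options)
  options.each do |option, result|
    return result if case_match?(option, value)
  end
  nil
end

puppet_case(3, { Integer => 'an integer value' })   # => "an integer value"
puppet_case('abc', { /^a/ => 'starts with a' })     # => "starts with a"
puppet_case(4, { 3 => 'three' })                    # => nil
```

Ruby's own case statement reaches the same results through ===, which is an is_a? test for classes, a match for regular expressions, and equality for most other objects.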

Interpolation

You can perform string interpolation of a type - it is simply turned into its string form:

$x = Array[Integer]
notice "the type is: $x"

notice "the type is: ${Array[Integer]}"

Both print:

Notice: Scope(Class[main]): the type is: Array[Integer]

Accessing attributes of a Resource

You can access parameters of an instance specific Resource type:

notify { announcement: message => 'This works' }
notice Notify[announcement][message]

prints:

Notice: Scope(Class[main]): This works

Note that the use of this depends on evaluation order; the resource must have been evaluated and placed in the catalog.

It is also possible to access the parameter values of a class using this syntax, but not its variables. Again, this depends on evaluation order; the class must have been evaluated. It must naturally also be a parameterized class.

Resource Relationships, Override and Defaults

The Puppet 3.x statements/expressions involving resource references continue to work as before. You can use the relationship operators ->, <-, ~>, <~ between Resource types to establish relationships. Resource types also continue to work in resource defaults and resource override expressions.

Summary, and some open issues

In this blog series I have described the new Puppet Type System that is available in the experimental --parser future in Puppet 3.5. As noted in a few places, there may be some adjustments to some of the details. Specifically, there are some outstanding issues:

Should comparison operators handle undef differently?
Should Regexp be treated as Data since it cannot be directly serialized?
Do we need to handle Stage and Node as special types?
Is there a need for a combined type similar to Variant, but that requires instances to match all its types? (e.g. match a series of regular expressions)
Is it meaningful to have a Not variant type? (e.g. Not[Type, Ruby, Undef])
Should Size be a separate type (instead of baked into String, Array, Hash and Collection)?
What are very useful types in, say, Scala or Haskell that we should borrow?

Playing with the examples

If you want to play with the type system yourself - all the examples shown in the series work on the master branch of puppet. Simply do something like:

puppet apply --parser future -e 'notice Awesome =~ Resource'

That's it for now.

Sunday, December 29, 2013

Class and Resource Types

Type Hierarchy

In the previous post about the Puppet 3.5 experimental feature Puppet Types I covered the Variant and Data types along with the more special Type and Ruby types. Earlier posts in this series provide an introduction to the type system, an overview of the types, and a description of the general types Scalar, Collection, Array, Hash etc.

This time, I am going to talk about the types that describe things that end up in a Puppet catalog; Class and Resource, subtypes of Resource, and the common super type CatalogEntry.

Type Hierarchy

Here is a recap of the part of the type system being covered in this post.

 Object
   |- CatalogEntry
   |  |- Resource[resource_type_name, title]
   |  |   |- <resource_type_name>[title]
   |  |
   |  |- Class[class_name]
   |  |- Node[node_name]
   |  |- Stage[stage_name]

The Catalog Entry types in Puppet 3x

In Puppet 3x there is the notion of a reference to a class or resource type using an upper cased word, e.g. Class, File. In 3x it is also possible to refer to a specific instance of a class or resource by using the [] operator and the title of the wanted instance.

So, in a way, Puppet 3x has a type system, just a very small one with a very limited set of operations available.

Backwards Compatibility

It was important that the new Type System be backwards compatible. Existing Puppet logic frequently uses "resource references" and references to types using upper cased words. Fortunately, it was possible to extend the "resource reference" syntax to that of parameterized types (as explained in this series of blog posts), and popular type names (like String and Integer) did not collide with existing resource type names.

Hence, going forward, when there is an upper cased word (e.g. Class, File, Apache) you are looking at a type, and when it is followed by a [] operator, it is a parameterized type.

The catalog entry types are slightly more special than the general types, as it is possible to create an array of types by giving more than one class name or title.

The Catalog Entry Type

The CatalogEntry type is simply the common type for Class and Resource. It is not parameterized.

The Class Type

The Class type represents a Puppet (host) class. When not parameterized it matches all classes. When parameterized with the name of a class it matches that class. When parameterized with multiple class names the result is an array of Class types, each parameterized with a single class name.

class one { }
class two { }

Class[one] =~ Type[Class]      # true
Class[one] =~ Type[Class[one]] # true
Class[one] =~ Type[Class[two]] # false

Class[one, two] =~ Array[Type[Class]]        # true
Class[one, two] == [Class[one], Class[two]]  # true

The class name can be any string expression as long as the result is a valid class name.

The Resource Type

The Resource type is the base type for all resource types (as they exist in Puppet 3x). The Resource type is parameterized with a type name (e.g. 'File') when a reference to the resource type itself is wanted, and with a type name, and one or more titles to produce a reference to an instance (or array of instances) of the particular resource type. There is no distinction between a resource type defined in a ruby plugin, or a user defined resource type created with the define keyword in the Puppet Programming Language. The examples below use the well known File resource type, but it could just as well be MyModule::MyType.

file { '/tmp/a': }
file { '/tmp/b': }

Resource['File'] =~ Type[Resource['File']]  # true
Resource['file'] =~ Type[Resource['File']]  # true

Resource['file'] == File                    # true
Resource[File] == File                      # true
Resource[file] == File                      # true

Resource[file, '/tmp/a'] == File['/tmp/a']                    # true
Resource[file, '/tmp/a', '/tmp/b'] == File['/tmp/a', '/tmp/b'] # true
File['/tmp/a', '/tmp/b'] == [File['/tmp/a'], File['/tmp/b']]  # true
File['/tmp/a', '/tmp/b'] =~ Array[Type[File]]                 # true
Resource[file]['/tmp/a'] == File['/tmp/a']                    # true

As you can see, the syntax is quite flexible as it allows both direct (e.g. File) reference to a type, and indirect (e.g. Resource[<type-name-expression>]). The type name is case insensitive.

The general rules in the type system are:

A bare word that is upper cased is a reference to a type (e.g. Integer, Graviton)
If the type is not one of the types known to the type system (e.g. Integer, String) then it is a Resource type name (e.g. Graviton means Resource['graviton']).
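The two rules can be expressed as a tiny Ruby sketch. resolve_type_name is a hypothetical helper that returns a string, and KNOWN_TYPES is an illustrative, incomplete list of type names discussed in this series, not the type system's actual table.

```ruby
# Illustrative and incomplete; not the type system's real list.
KNOWN_TYPES = %w[Object Data Scalar Numeric Integer Float String Enum Pattern
                 Boolean Regexp Collection Array Hash Variant Optional Undef
                 Type Ruby CatalogEntry Class Resource].freeze

# Hypothetical resolver for an upper cased bare word.
def resolve_type_name(word)
  raise ArgumentError, "not an upper cased word: #{word}" unless word.match?(/\A[A-Z]/)

  if KNOWN_TYPES.include?(word)
    word
  else
    # Anything else names a resource type; such names are case insensitive.
    "Resource['#{word.downcase}']"
  end
end

resolve_type_name('Integer')   # => "Integer"
resolve_type_name('Graviton')  # => "Resource['graviton']"
resolve_type_name('File')      # => "Resource['file']"
```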

Naming Advice

The set of known types in the type system may increase over time. If this happens they will most likely represent (be named after) some well known data structure (e.g. Set, Tree) or computer science term (e.g. Any, All, Kind, Super). It is therefore best to avoid such names when creating new resource types. Resource types typically represent something far more concrete, so this should not be a problem in practice. In the unlikely event there is a clash it is always possible to reference such resource types via the longer Resource[<type-name>] syntax.

This problem may also be remedied by allowing resource types to be placed inside a module namespace. The type system is capable of handling this already, but the rest of the runtime does not yet support it. (E.g. if you insist on having a resource type called 'String', you could refer to it as MyModule::String.)

Just to be complete: fully qualified resource type names work for user defined resource types (i.e. when using the define keyword in the Puppet Programming Language).

Node and Stage

And finally, I have reached the frontier of the development of the Type System. The Node and Stage types are actually not yet implemented. The things they are intended to represent do exist in the catalog, but the question is what they really are: just specializations of resource types, or something different? This is something that will be sorted out in the weeks and months to come before Puppet 4.0 is released.

In the Next Post

So far, the examples have only used a handful of expressions to operate on types, i.e. ==, =~, [], and iteration. In the next post I will cover the additional operations that involve types: in, comparisons with <, <=, >, >=, and how types can be used in case expressions.

Saturday, December 28, 2013

Variant, Data, and Type - and a bit of Type Theory

In the previous post about the Puppet 3.5 experimental feature Puppet Types I covered how the type system handles undefined values and empty constructs. Earlier posts in this series present the rationale for the type system, and an overview of the fundamental types.

This time, I am going to talk about the remaining general types; the very useful Variant and Data types as well as the more esoteric Type type. I will also explain the Ruby type, the rationale and its role in the type system.

The Variant Type

Let's say you want to check if values in a hash are either one of the words "none" or "all", or is an array of strings. This is easily done with a Variant:

$my_structure =~ Hash[Variant[Enum[none, all], Array[String]]]

The Variant type considers an instance of any one of its types to be an instance of the variant. An unparameterized Variant matches nothing.

To accept either an array of strings, or an array of numeric values (i.e. we do not want a mix of numbers and strings in the same array) we can write:

Variant[Array[String], Array[Numeric]]

To accept a symbolically named color, or an RGB integer value (0 to 0xFFFFFF) we can write:

$colors = [foreground, background, highlight]
Variant[Enum[$colors], Integer[0, 0xFFFFFF]]

Variant and Optionality / Undef

If you want to make the variant optional, you can add Undef as a type parameter (i.e. there is no need to wrap the constructed variant type in an Optional type). In fact, Optional[T] is really a shorthand for Variant[T, Undef].
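A toy Ruby model of Variant shows why an empty Variant matches nothing and how Optional reduces to a Variant with Undef. The classes and the optional helper are hypothetical; nil stands in for undef and Ruby classes for Puppet types.

```ruby
# Toy Variant (hypothetical): matches when any member type matches.
class Variant
  attr_reader :types

  def initialize(*types)
    @types = types
  end

  # any? on an empty list is false, so an empty Variant matches nothing.
  def ===(value)
    types.any? { |type| type === value }
  end
end

# Optional[T] as shorthand for Variant[T, Undef].
def optional(type)
  Variant.new(type, NilClass)
end

# Stand-in for Array[T]: an array whose elements all match T.
ArrayOf = Struct.new(:type) do
  def ===(value)
    value.is_a?(Array) && value.all? { |element| type === element }
  end
end

strings_or_numbers = Variant.new(ArrayOf.new(String), ArrayOf.new(Numeric))
strings_or_numbers === %w[a b]     # => true
strings_or_numbers === [1, 2.5]    # => true
strings_or_numbers === ['a', 1]    # => false, no mixing
Variant.new === 1                  # => false, empty Variant
optional(Integer) === nil          # => true
```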

The type called Data

The Data type is a convenience type that represents the sane subset of the type system that deals with the regular types meaningful in the Puppet Programming Language. Behind the scenes Data is really a Variant with the following definition:

Variant[Undef, Scalar, Array[Data, 0], Hash[Scalar, Data, 0]]

This means that Data accepts Undef, any Scalar, and nested arrays and hashes where arrays and hashes may be empty or contain undef. A hash entry may however not have an undef key.
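The recursive definition translates almost directly into Ruby. This sketch assumes a mapping from Ruby values to Puppet values (nil for undef; String, Numeric, true/false and Regexp for Scalar); data? and scalar? are hypothetical helpers.

```ruby
# Scalar under the assumed mapping: String, Numeric, Boolean or Regexp.
def scalar?(value)
  value.is_a?(String) || value.is_a?(Numeric) ||
    value == true || value == false || value.is_a?(Regexp)
end

# Variant[Undef, Scalar, Array[Data, 0], Hash[Scalar, Data, 0]]
def data?(value)
  case value
  when nil then true                                    # Undef
  when Array then value.all? { |e| data?(e) }           # Array[Data, 0]
  when Hash then value.all? { |k, v| scalar?(k) && data?(v) } # Hash[Scalar, Data, 0]
  else scalar?(value)                                    # Scalar
  end
end

data?([1, 'two', nil, { 'a' => [] }])  # => true
data?({ nil => 1 })                    # => false, undef key
data?([:sym])                          # => false, a Ruby Symbol is outside this model
```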

Default in Array and Hash

The Data type is the default type of values in an Array or Hash when they are not parameterized, such that:

Array == Array[Data]         # true
Hash  == Hash[Scalar, Data] # true

Data vs. Object

If you are tempted to use Object to mean "any", you must be prepared to also handle all of the CatalogEntry subtypes (Class, Resource and its subtypes), the Type type, and the Ruby type, as well as possible future extensions to the type system for runtime platforms other than Ruby.

While the above mentioned types can be serialized in string form and parsed back to a type representation they can not be directly represented in most serialization formats.

The Type Type

Since all the various values that are being used have a type, and we allow types themselves to be used as values (assigned to variables etc.), we must also have a type that describes that the value is in fact a type. Unsurprisingly this type is called Type, and it is parameterized with the type it describes. This sounds more confusing than it is, and is best illustrated with an example:

Integer =~ Type[Integer]  # true

The next question is naturally what the type of Type is, and you probably guessed right: it is also Type (parameterized with yet another Type). For each step we take towards "type of", it gets wrapped in yet another Type.

Type[Integer] =~ Type[Type[Integer]] # true

And this does indeed go on to infinity. While this can be solved in various ways by "short circuiting" and erasing information, there is very little practical need for such a solution. We could state that the type of Type[Integer] is Type[Type], or one level above that by making the type of Type[Type[Integer]] be Type[Type]. We could also introduce a different abstraction like Kind, perhaps with subtypes like ParameterizedType, FirstOrderType, HigherOrderType and so on. This, however, has very little practical value in the Puppet Programming Language, since it is not a system in which one solves interesting type theory problems. There simply are no constructs in the language that would allow making any practical use of these higher order types.

That small excursion into type theory aside, there is practical value in being able to reason about the type of a type. As an example, we can write a function that expects the user to pass in a type reference, and we want to validate that we got the right ehrm... type. Let's say the function does something with numbers and you are willing to accept an Array of Integer and Float ranges, which is illustrated by this expression:

[Integer[1,2], Float[1.0, 2.0]] =~ Array[Type[Numeric]]  # true
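The nesting rule can be sketched as a small Ruby toy model. The classes PType and TypeOf are hypothetical, ranges are modeled simply as subtypes of Integer and Float, and this is not how Puppet implements it. The key point is that Type[T] is itself a type, so it can be wrapped again, and Type[A] accepts Type[B] whenever A accepts B:

```ruby
# Toy model (hypothetical classes, not Puppet's implementation).
class PType
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
  end

  # true if `other` is this type or one of its subtypes (walks the parent chain)
  def assignable?(other)
    t = other
    t = t.parent while t && !t.equal?(self)
    !t.nil?
  end
end

# Type[T]: its instances are types that are assignable to T.
# Because TypeOf is itself a PType, it can be wrapped again: Type[Type[T]].
class TypeOf < PType
  attr_reader :inner

  def initialize(inner)
    super("Type[#{inner.name}]")
    @inner = inner
  end

  # Type[A] is assignable from Type[B] when A is assignable from B
  def assignable?(other)
    other.is_a?(TypeOf) && inner.assignable?(other.inner)
  end

  # a value is an instance of Type[T] when it is a type assignable to T
  def instance?(value)
    value.is_a?(PType) && inner.assignable?(value)
  end
end

NUMERIC   = PType.new('Numeric')
INTEGER   = PType.new('Integer', NUMERIC)
FLOAT     = PType.new('Float', NUMERIC)
STRING    = PType.new('String')
INT_1_2   = PType.new('Integer[1,2]', INTEGER)   # a range modeled as a subtype
FLOAT_1_2 = PType.new('Float[1.0,2.0]', FLOAT)

p TypeOf.new(INTEGER).instance?(INTEGER)                          # => true
p TypeOf.new(TypeOf.new(INTEGER)).instance?(TypeOf.new(INTEGER))  # => true
p [INT_1_2, FLOAT_1_2].all? { |t| TypeOf.new(NUMERIC).instance?(t) } # => true
p TypeOf.new(NUMERIC).instance?(STRING)                           # => false
p TypeOf.new(NUMERIC).instance?(42)                               # => false, not a type
```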

This is about as far down this path as it is of practical value to go. The rest is left as a paradox, like the classic "Does the barber who shaves every man who does not shave himself shave himself?".

The Ruby Type

The type Ruby is used to represent a runtime system type. It exists in the type system primarily to handle configuration of the Puppet runtime, where it is desirable to be able to plug in behavior written in Ruby. When doing so, there must be a way to reference Ruby classes that can be expressed in ways other than in Ruby itself. The type system has the ability to describe a type in string form, and parse it back again.

The Ruby type also serves as a "catch all", in case someone writes extensions for Puppet that return objects that are instances of types the Puppet Programming Language was not designed to handle. What should the system do in this case? We don't want it to blow up, so something sensible has to be returned, if for no other reason than to be able to print an error message with reasonable information about the alien type.

There are also experiments with configuring the Puppet runtime in the Puppet Programming Language, but that is the topic of another series of blog posts.

While you can create a Ruby type in the Puppet Programming Language, there are currently no functions that operate on it, so it has very limited practical value at the moment. If you write your own custom functions, however, there is support in Ruby for using the Ruby type, instantiating a class, etc.

In the Puppet Programming Language, a Ruby type is parameterized with a string containing the fully qualified name of the Ruby class.

Ruby['MyModule::MyClass']
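On the Ruby side, a fully qualified name like this maps back to a class by looking up each `::` segment in turn. The sketch below is an assumption about how such a lookup could be done in plain Ruby, not a description of what Puppet does internally; MyModule::MyClass and resolve_ruby_class are hypothetical names:

```ruby
# Hypothetical module and class standing in for 'MyModule::MyClass'.
module MyModule
  class MyClass; end
end

# Resolve a fully qualified Ruby class name, such as the string parameter of
# Ruby['MyModule::MyClass'], by looking up each '::' segment in turn.
def resolve_ruby_class(qualified_name)
  qualified_name.split('::').reduce(Object) do |scope, part|
    scope.const_get(part, false)   # raises NameError if a segment is missing
  end
end

klass = resolve_ruby_class('MyModule::MyClass')
p klass                                          # => MyModule::MyClass
p klass.name                                     # => "MyModule::MyClass"
p resolve_ruby_class(klass.name).equal?(klass)   # => true, string form round-trips
```

The round trip through `Module#name` mirrors the point made above: the type can be written as a string and parsed back again.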

In the Next Post

After this adventure into "what is the type of all types", I am going to return to what the type system is really all about: supporting the types that end up in the catalog, Class and Resource.

Friday, December 27, 2013

Let's talk about Undef

In the previous post about the Puppet 3.5 experimental feature Puppet Types, I presented an overview of the types in the Puppet Type System and provided details about the Scalar types.
In this third post on the topic of the new Type System, I am going to present undef: what it means to have an undefined value, and what it means when something is empty. At least from a type perspective; if you are in a bar with a drink in your hand with undefined or empty contents, you have a very different problem.

Let's talk about Undef

All computer languages have to deal with "undefinedness": a variable that has no value, an array or hash that is empty, a hash with no value for a key, etc. When the language also has a symbol to denote "undefinedness", things get more complicated, since undefinedness is now also represented by a value that can be used as the value of a variable, as a key or value in a hash entry, etc.

Only undef is Undef

To start out simple, the literal undef in the Puppet Language is the only thing that is an instance of the Undef type. We can easily confirm this:

 undef =~ Undef     # true
 42    =~ Undef     # false
 hi    =~ Undef     # false
 ''    =~ Undef     # false
 []    =~ Undef     # false
                    # etc.

The undef value is also an Any.

 undef =~ Any    # true

The value undef is also produced when something looked up does not have a value.

 $hsh = { a => 10 }
 $hsh[b] =~ Undef    # true

So far, this is quite straightforward. The fun starts when considering collections of values that may be empty, contain a mix of values and undef, etc.

Combining undef with values

When the type system performs type inference (the act of figuring out the type of values), it combines types to produce a single type that describes the value or values. It does this by widening (i.e. making the type more general). For example, if we combine an Integer with a Float, the inference returns Numeric, since that is the type that is general enough to describe both of them. When undef is involved, the only more general type is Any.

 [1, 3.14]  =~ Array[Numeric]  # true
 [1, undef] =~ Array[Numeric]  # false
 [1, undef] =~ Array[Any]      # true
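Widening can be pictured as finding the most specific common ancestor of two types. The Ruby sketch below uses a small, hypothetical name-only hierarchy (nil stands in for undef, and Undef hangs directly under Any, as described above); it is a toy model, not Puppet's inference code:

```ruby
# Toy hierarchy (hypothetical, names only): each type maps to its supertype.
SUPERTYPE = {
  'Integer' => 'Numeric', 'Float' => 'Numeric',
  'Numeric' => 'Scalar',  'String' => 'Scalar',
  'Scalar'  => 'Any',     'Undef'  => 'Any'
}.freeze

# The type itself followed by all its supertypes, most specific first.
def ancestors_of(name)
  chain = [name]
  chain << SUPERTYPE[chain.last] while SUPERTYPE.key?(chain.last)
  chain
end

def type_name(value)
  case value
  when nil     then 'Undef'
  when Integer then 'Integer'
  when Float   then 'Float'
  when String  then 'String'
  else 'Any'
  end
end

# Widening: the most specific type that both a and b belong to.
def common_type(a, b)
  ancestors_of(a).find { |t| ancestors_of(b).include?(t) }
end

# Infer one element type for an array (nil for an empty array in this sketch).
def infer_element_type(values)
  values.map { |v| type_name(v) }.reduce { |a, b| common_type(a, b) }
end

p infer_element_type([1, 3.14])    # => "Numeric"
p infer_element_type([1, 'hi'])    # => "Scalar"
p infer_element_type([1, nil])     # => "Any"
p infer_element_type([nil, nil])   # => "Undef"
```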

We now have a problem if we do not want to accept all kinds of values just because we want to accept undef values among the numbers. Luckily, the type system has a type called Optional that does exactly what we want in this situation: it accepts something of a specific type, or Undef.

 [1, undef]    =~ Array[Optional[Numeric]] # true
 [1, a, undef] =~ Array[Optional[Numeric]] # false
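Optional can be modeled as a Variant of the given type and Undef. In this hypothetical Ruby toy (nil stands in for undef, and the helper names are made up for illustration), Optional is built literally that way:

```ruby
# Toy model (hypothetical helpers): nil stands in for undef.
UNDEF_T   = ->(v) { v.nil? }
NUMERIC_T = ->(v) { v.is_a?(Integer) || v.is_a?(Float) }

def variant_of(*types)
  ->(v) { types.any? { |t| t.call(v) } }
end

# Optional[T] expressed as Variant[T, Undef]
def optional_of(type)
  variant_of(type, UNDEF_T)
end

def array_of(elem_type)
  ->(v) { v.is_a?(Array) && v.all? { |e| elem_type.call(e) } }
end

p array_of(NUMERIC_T).call([1, nil])                    # => false
p array_of(optional_of(NUMERIC_T)).call([1, nil])       # => true
p array_of(optional_of(NUMERIC_T)).call([1, 'a', nil])  # => false
p array_of(UNDEF_T).call([nil, nil])                    # => true
```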

In case you are wondering, if the array only contains undef values, its type is Array[Undef].

 [undef, undef] =~ Array[Undef]  # true

Emptiness

"Emptiness" is closely related to "undefinedness". As an example, what is the type of the elements of an empty array? Clearly, there is a difference between an empty array and an array containing undef values.
The type system handles this by using a different property of the array: its size. The concept is generalized; Collection, Array, Hash, and String are types that consider the size of values, and they are said to be sized types.

By default a sized type allows the instance to be empty (as well as having unlimited size).
An empty sized collection (array, hash) has an element type that matches any type.

Here are some examples:

 [] =~ Array[Integer]         # true
 [] =~ Array[String]          # true
 {} =~ Hash[Scalar, String]   # true

We can make this stricter by also constraining the size; read on...

Constraining the Size

The Type System supports constraining the size of the sized types. This is done by using a range (like we have already seen when expressing Integer and Float ranges).
We can specify that a String should not be empty:

 String[1]        # at least one character
 '' =~ String[1]  # false

We can cap the upper limit:

 String[1,80]           # min 1, max 80 characters
 'abcd' =~ String[1,3]  # false, too long

For an Array, the size range comes after the element type:

 Array[Integer, 1]      # at least one Integer
 Array[Integer, 1, 10]  # at least one Integer, at most 10

The same is true for Hash:

 Hash[Scalar, Integer, 1]      # at least one Integer entry
 Hash[Scalar, Integer, 1, 10]  # at least one Integer entry, at most 10

The Collection type also accepts a range (but no type).

Collection[1]  # i.e. a non-empty collection (array or hash)
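The size rules and the "empty matches anything" rule fit together in a small Ruby toy model. The helper names are hypothetical, and a max of nil is this sketch's way of saying "unbounded". Note how `all?` is vacuously true for an empty collection, which is why an empty array matches any element type unless a minimum size is given:

```ruby
# Toy model of sized types (hypothetical helpers). max = nil means unbounded.
def size_ok?(size, min, max)
  size >= min && (max.nil? || size <= max)
end

def string_type(min = 0, max = nil)
  ->(v) { v.is_a?(String) && size_ok?(v.length, min, max) }
end

def array_type(elem, min = 0, max = nil)
  # all? is vacuously true for [], so an empty array matches any element type
  ->(v) { v.is_a?(Array) && size_ok?(v.size, min, max) && v.all? { |e| elem.call(e) } }
end

def hash_type(key, value, min = 0, max = nil)
  ->(v) { v.is_a?(Hash) && size_ok?(v.size, min, max) && v.all? { |k, x| key.call(k) && value.call(x) } }
end

def collection_type(min = 0, max = nil)
  ->(v) { (v.is_a?(Array) || v.is_a?(Hash)) && size_ok?(v.size, min, max) }
end

int = ->(v) { v.is_a?(Integer) }
str = ->(v) { v.is_a?(String) }

p array_type(int).call([])              # => true, empty by default
p array_type(int, 1).call([])           # => false, at least one needed
p array_type(int, 1, 10).call([1, 2])   # => true
p string_type(1).call('')               # => false
p string_type(1, 3).call('abcd')        # => false, too long
p hash_type(str, int, 1).call({})       # => false
p collection_type(1).call({ 'a' => 1 }) # => true
```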

The range can be specified as one or two integer values, using the literal default, by giving an Integer type with a range, or as an array containing the values. This means you can do things like these:

$range = Integer[1,10]
$arr =~ Array[Integer, $range]

$range = [$from, $to]
$arr =~ Array[Integer, $range]
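All of these forms boil down to a minimum and a maximum. The Ruby sketch below normalizes them under stated assumptions: IntegerRange is a hypothetical stand-in for an Integer[from, to] type value, the symbol :default stands in for Puppet's literal default, and default is assumed to mean 0 for the minimum and "unbounded" (nil here) for the maximum:

```ruby
# Hypothetical stand-ins: an Integer[from, to] type value, and Puppet's literal default.
IntegerRange = Struct.new(:from, :to)
DEFAULT = :default

# Normalize the accepted range forms into [min, max]; max nil means unbounded.
def normalize_size(*args)
  if args.size == 1
    case (arg = args.first)
    when IntegerRange then args = [arg.from, arg.to]
    when Array        then args = arg
    end
  end
  min, max = args
  min = 0   if min.nil? || min == DEFAULT
  max = nil if max == DEFAULT
  [min, max]
end

p normalize_size(1)                       # => [1, nil]
p normalize_size(1, 10)                   # => [1, 10]
p normalize_size(1, DEFAULT)              # => [1, nil]
p normalize_size(IntegerRange.new(1, 10)) # => [1, 10]
p normalize_size([1, 10])                 # => [1, 10]
```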

In the Next Post

In the next post I am going to talk about the Variant and Data types: types that represent a selection of other types, and how they can be used.

Friday, December 20, 2013

What Type of Type are You?

What Type of Type are You

In the upcoming Puppet 3.5.0 the experimental Type System (first introduced into the code base in Puppet 3.3.0) has been put to good use in the "future parser". In this post I will show some of the things that the type system can do to help you increase the quality of your Puppet logic. This is also an introduction to the concept of Types. I will come back with more posts about additional types, and how they can be used in various Puppet expressions.

But all series must have a beginning...

What is a Type System?

At first, the mention of "types" may make you feel nauseous as you think of statically typed programming languages littered with superfluous type declarations. This kind of "your grandfather's typing", like this horrible piece of C, is not at all what the new type system is about:

 char *(*(**foo[][8])())[]; // huh ?????

In programming languages, a type system is a collection of rules that assign a property called a type to the various constructs—such as variables, expressions, functions or modules — a computer program is composed of. The main purpose of a type system is to reduce bugs in computer programs by defining interfaces between different parts of a computer program, and then checking that the parts have been connected in a consistent way. This checking can happen statically (at compile time), dynamically (at run time), or as a combination thereof. -- Wikipedia

Don't worry: Puppet is a dynamically typed language and will remain so. The new type system is there to help you with certain tasks. It is not a straitjacket designed to help a static compiler.

You are already using types

Whenever you use a Puppet match expression to check if a String has a particular pattern, you are actually using typing! With a bit of type jargon, we can say that you are checking whether your particular string is an instance of a subtype of String: the subtype made up of all the strings that match the pattern.

 $my_string =~ /(blue)|(red)|(green)/

A type system is just that: a pattern system that is applied to certain properties of the objects it operates on. The example matches all kinds of red, green, and blue strings, e.g. 'rose red', 'deep red', 'dark blue', 'viridian green'. In other words, we have written a statement that checks "Is this string the type of string that has a color word in it?".

In Puppet 3.5's future parser we can take this a step further and name the pattern.

 $primary_color_string = /(blue)|(red)|(green)/

 $my_string =~ $primary_color_string

And look, we almost (kind of) created a Type.
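Ruby has a similar unification built in: `Regexp#===` and `Module#===` both answer "does this value match?", and `case`/`when` uses `===`, so a pattern and a class can be used interchangeably as a kind of "type". This is an analogy in plain Ruby, not Puppet code; describe is a hypothetical helper:

```ruby
primary_color_string = /(blue)|(red)|(green)/

# Regexp#=== and Module#=== both answer "does this value match?"
p primary_color_string === 'rose red'   # => true
p primary_color_string === 'purple'     # => false
p String === 'rose red'                 # => true
p Integer === 'rose red'                # => false

# case/when uses ===, so patterns and classes can be mixed as "types"
def describe(value)
  case value
  when /(blue)|(red)|(green)/ then 'color string'
  when String                 then 'other string'
  when Integer                then 'integer'
  else 'something else'
  end
end

p describe('dark blue')   # => "color string"
p describe('hello')       # => "other string"
p describe(42)            # => "integer"
```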

The Rationale for Types

Regular expressions are great, but they cannot help us with everything we need to check. For example, they can only be used with strings, and if we need to check a structure of some sort (say an array or a hash), it starts to become difficult: we need to iterate, we may need to call functions, and the task we tried to achieve starts to be overshadowed by general programming logic.

Let's say we want to check that an Array of values are all integers within a given range. The first problem in Puppet 3.x is that all numbers are string values, and users may write them in decimal, hex or octal, so you have to write regular expressions that can handle all of those (but let's skip that painful part of the problem). We do have the comparison operators <, > etc. that work on numbers, but there are issues when we do not know whether we are comparing text strings, numbers, arrays or hashes, so we must also call functions to check that values are indeed numeric. However, since there is no iteration in 3.x, we cannot loop over the array, and since we do not know how many elements there are, we cannot hardcode the checks (first check entry 0, then 1, and so on). (In practice, the path with the least extra work is to write a custom function in Ruby or find something on the Forge that suits your needs.)

 # hard-coded
 is_integer($my_array[0]) and $my_array[0] >= 0 and $my_array[0] <= 10
 # this is getting old quickly
 # give up...

With the future parser we can at least iterate:

$my_array.each |$element| { 
  if is_integer($element) and $element >= 0 and $element <= 10 {
    # do something
  } 
}

This is naturally much better, but still noisy.

If we are doing this to find elements in the array that do not comply with our rules, we can iterate to find those that do not match, and then use a function from stdlib to check if what we found is empty - for example:

unless $my_array.filter |$x| { !is_integer($x) or $x < 0 or $x > 10 }.empty {
  # we found non matching elements
}

which is a bit nicer, but still a bit too much code.

Example - an Array of Integers in a Range

Let's jump forward a bit. One of the types in the new type system is Integer, and it can be parameterized to describe a range. (A parameterized type is just like a more specific pattern: it narrows down the set of objects it matches. A parameter is typically another type, but can be something concrete like the numbers used to express a range.)

Another type is Array, which can also be parameterized with another type: the type of its elements. Parameters to a type are written in brackets after the type. We can put this to use in Puppet 3.5's future parser, since the match operator now also matches based on type.

$my_array = [1, 2, 3, 11]
$my_array =~ Array                 # true, it is an array
$my_array =~ Array[Integer]        # true, it is an array, and all elements are integers
$my_array =~ Array[Integer[0,100]] # true, all values are in the range 0-100
$my_array =~ Array[Integer[0,10]]  # false, one value, 11, is not <= 10
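For comparison, this is roughly the hand-written check that the single type match replaces, written in Ruby with a Range standing in for Integer[min, max]. The helper integers_in_range? is hypothetical:

```ruby
# Hypothetical helper mirroring $values =~ Array[Integer[min, max]]
def integers_in_range?(values, range)
  values.is_a?(Array) && values.all? { |v| v.is_a?(Integer) && range.cover?(v) }
end

my_array = [1, 2, 3, 11]
p integers_in_range?(my_array, 0..100)  # => true
p integers_in_range?(my_array, 0..10)   # => false, 11 is not <= 10
p integers_in_range?([1, '2'], 0..10)   # => false, '2' is a string
```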

Type Hierarchy

If you have done a bit of programming in other languages you already know that types (or Classes as they are typically called) follow a hierarchy. This is also true in the Puppet Type System.

As an example, all strings that match /(blue)|(red)|(green)/ also match /(lu)|(red)|(een)/, but not vice versa. We can say that those that match the more restrictive pattern 'colors' are also 'lu-red-eens', or that 'colors' is a subtype of 'luredeens'.

We do the same with Types. A Numeric (just like 'luredeen') is an abstract type, and it has two sub-types; Integer and Float.

 $my_array = [1, 2, 3.1415]
 $my_array =~ Array[Integer]   # nope, there is a float in there
 $my_array =~ Array[Float]     # nope, there are integers in there
 $my_array =~ Array[Numeric]   # yep, they are all numbers

Typically this is shown as a hierarchy:

Numeric
   +- Integer
   +- Float

Let's throw a String into the mix as well:

 $my_array = [1, 2, 3.1415, "hello"]
 $my_array =~ Array[Integer]   # nope, there is a float and a string in there
 $my_array =~ Array[Float]     # nope, there are integers and a string in there
 $my_array =~ Array[Numeric]   # no, there is a string in there
                               # then what?

To deal with this, the Type system has additional abstract types - Scalar which describes something that has a single value, and Object, the most abstract "anything" (there are more abstract types which I will come back to). Here is the updated hierarchy:

 Object
  +-Scalar
    +- String
    +- Numeric
       +- Integer
       +- Float

And now we can check:

$my_array =~ Array[Scalar]   # true
$my_array =~ Array[Object]    # true
$my_array =~ Object           # true
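The hierarchy check can be sketched in Ruby by walking up from an element's type until the wanted type is found or the root is passed. The name-only hierarchy and the helpers type_of, subtype? and array_of? are hypothetical, chosen to mirror the tree above rather than Puppet's code:

```ruby
# Toy model of the hierarchy above (hypothetical, names only): type => supertype.
PARENT_OF = {
  'Scalar' => 'Object', 'String' => 'Scalar', 'Numeric' => 'Scalar',
  'Integer' => 'Numeric', 'Float' => 'Numeric'
}.freeze

def type_of(value)
  case value
  when Integer then 'Integer'
  when Float   then 'Float'
  when String  then 'String'
  else 'Object'
  end
end

# true if sub is sup itself, or sits below sup in the hierarchy
def subtype?(sub, sup)
  t = sub
  t = PARENT_OF[t] until t.nil? || t == sup
  !t.nil?
end

def array_of?(values, elem_type)
  values.all? { |v| subtype?(type_of(v), elem_type) }
end

my_array = [1, 2, 3.1415, 'hello']
p array_of?(my_array, 'Integer')  # => false
p array_of?(my_array, 'Numeric')  # => false, there is a string in there
p array_of?(my_array, 'Scalar')   # => true
p array_of?(my_array, 'Object')   # => true
```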

So what good does checking against Object do, you may ask, since it will always be true? Not much, except that it makes clear that something that accepts Object is prepared to handle anything. It is also useful when error messages print out the type: if you see something like "type mismatch, an Array[Object] cannot be used where an Array[Integer] is expected", you know that the problem is that there is "all sorts of stuff" in that array.

In the Next Post

There are several other types to talk about: the scalars Boolean and Regexp, the abstract Collection with subtypes Array and Hash, types that deal with enumeration (Pattern and Enum), a type that allows different types called Variant, as well as Puppet-specific types such as Resource, Class, File, etc.

Oh, yes, there is an Undef type. We must definitely talk about undef, but that is for later.

Thursday, December 19, 2013

Fixing the Mixed Metaphors in Puppet 3.4

In Puppet 3.4 there will be a new version of the future. More specifically a new version of the future parser that again brings the Puppet Language a little bit closer to Puppet 4.

Iterative Functions

One of the changes is a fix of the mixed metaphors in the iterative functions. It turned out that "We did not have all our ducks on the same page" [sic] as we had mixed the names of the functions from two different schools. Here are the final iterative functions:

each - no change
map - was earlier called collect
reduce - no change
filter - was earlier called select
reject - dropped
slice - no change

Like any mixed metaphor this stuck out as a sore throat you would not want to touch with a 10 foot pole, so we have been burning the midnight oil from both ends to get this fixed [sic]. Sorry about the inconvenience this may cause regarding renaming - the functionality is the same though, so it should be easy to change.

Here are some examples to illustrate their use:

[1,2,3,4].each |$item| { notice $item }
# Result: notice 1, notice 2, notice 3, notice 4

[1,2,3,4].filter |$item| { $item % 2 == 0 }
# Result: [2, 4]

[1,2,3,4].map |$item| { $item * 2 }
# Result [2, 4, 6, 8]

[1,2,3,4].reduce |$memo, $item| { $memo + $item }
# Result: 10

[1,2,3,4].slice(2) |$first, $second| { notice $first + $second }
# Result: notice 3, notice 7
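If you also write Ruby, the final names have close counterparts in Ruby's Enumerable: map (also available as collect), select (also available as filter since Ruby 2.6), reduce, and each_slice. As a rough analogy only (these are Ruby's methods, not Puppet's), the examples above translate to:

```ruby
items = [1, 2, 3, 4]

items.each { |item| puts item }                                    # each
evens   = items.select { |item| item.even? }                       # filter (Ruby: select)
doubled = items.map { |item| item * 2 }                            # map (Ruby alias: collect)
sum     = items.reduce { |memo, item| memo + item }                # reduce
pairs   = items.each_slice(2).map { |first, second| first + second } # slice(2)

p evens    # => [2, 4]
p doubled  # => [2, 4, 6, 8]
p sum      # => 10
p pairs    # => [3, 7]
```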

One Syntax

The other mixed metaphor in the future parser was intentional; it had support for three different syntax styles for calling the iterative functions.

The recommended style with parameters outside the braces
a Java-8 like style using an additional arrow
and a Ruby like style with the parameters inside the braces.

Usability studies showed that the recommended syntax (as shown above) was also preferred by the majority of test pilots. In Puppet 3.4 the alternative syntax styles have been removed. Remember, "There is light at the end of the Rainbow as the road towards the future unfolds", to use a mixed metaphor.