Saturday, January 31, 2015

Puppet 4.0 Data in Modules and Environments

In Puppet 4.0.0 there is a new technology-agnostic mechanism for data lookup that makes it possible to provide default values for class parameters in modules and in environments. The mechanism looks first in the "global" data binding mechanism across all environments (i.e. the existing mechanism for data binding, which in practice means hiera, since this is the only available implementation). It then looks for data in the environment, and finally in the module.

The big thing here is that a user of a module does not have to know which implementation the module author has chosen - the module is simply installed (with its dependencies). The user is free to override values using an implementation of their choice (in the environment using the new mechanism, or with the existing data binding / hiera support).

It is expected that hiera-based implementations will also become available, delivered in modules.

In this part 1 about the new data binding feature I will show how it can be used in environments and modules. In the next part I will show how to make new data binding implementations.

How does it work?

Out of the box, the new feature:

  • provides module authors with a way to select which data binding implementation to use in their module without affecting how other modules get their data.

  • provides users configuring an environment with a way to select which data binding implementation to use in an environment (or all environments) - different environments can use different implementations, and the environment does not have to use the same implementation as the modules.

  • contains a data binding implementation named 'function' which calls a puppet function that returns a hash of data. The module author can select this mechanism and simply implement the function. A user can also configure an environment to use a function to provide the data - the function is then added to the environment.

  • provides module authors with a way to package and share a data binding implementation in a module. It can be delivered in the same module as regular content, or in a separate module just containing the data binding implementation.

Using a function to deliver data in an environment

This is the easiest, so I am starting with that. Two things are needed:

  • Configuring the environment to state that a function delivers data.
  • Writing the function

configuring the environment

The data provider to use for an environment is selected via the environment-specific setting environment_data_provider. The value is the name of the data provider implementation to use - in our example 'function'. If it is not set in an environment-specific environment.conf, the environment inherits the global setting, which is handy if all your environments work the same way.
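As a minimal sketch (assuming a standard directory environment layout), the configuration could look like this:

# <environment-root>/environment.conf
environment_data_provider = function

# or, as a global default for all environments, in puppet.conf
[main]
environment_data_provider = function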

writing the function

The function must be written using the 4x function API and placed in a file called lib/puppet/functions/environment/data.rb under the root directory of the environment.

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    # return a hash with key to value mappings 
    { 'abc::param_a' => 'default value for param a in class abc',
      'abc::param_b' => 'default value for param b in class abc',
    }
  end
end

Later in the 4x series of Puppet, it will be possible to also write such functions in the puppet language which makes authoring more accessible.

Note that the name of the function is always environment::data irrespective of what the actual name of the environment is. This is because it would not be good if the name of the function had to change as you test a new environment named 'dev' and later merge it into 'production'.

Using a function to deliver data in a module

The steps to deliver data with a function for a module are different because there are no individual settings for a module. Here are the steps:

  • Creating a binding using the Puppet Binder to declare that the module should use the 'function' data provider.
  • Writing the Function

Note that in the future, the data provider name may be made part of the module's metadata. This is however not the case in the Puppet 4.0.0 release.

writing the binding

The binding is very simple as it is all boilerplate except for the name of the module and the name of the data provider implementation - 'mymodule' and 'function' in the example below. The name of the file is lib/puppet/bindings/mymodule/default.rb where the mymodule part needs to reflect the name of the module it is placed in. (The file is always called 'default.rb' since it contains the default puppet bindings for this module).

# <moduleroot>/lib/puppet/bindings/mymodule/default.rb
#
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name         'mymodule'            # name of the module this is placed in
    to           'function'            # name of the data provider
    in_multibind 'puppet::module_data' # boiler-plate
  }
end

writing the function

This is exactly the same as for the environment, but the function is named mymodule::data where mymodule is the name of the module this function provides data for. The file name is lib/puppet/functions/mymodule/data.rb

# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Return a hash with parameter name to value mapping
    { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
      'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
    }
  end
end

Overriding a parameter in the environment

As you may have figured out already, it is easy to override the module's data in the environment. As an example we may want to provide a different value for mymodule::abc::param_b at the environment level. This is how that would look:

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data() 
    { # ... other keys and values
      'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
    }
  end
end

Getting the data

To get the data, there is absolutely nothing you need to do in your manifests. Just as before, if a class parameter does not have a value, it will be looked up as explained in this blog post. Finally, if there was no value to look up, the default parameter value given in the manifest is used.

Using the examples above - if you have this in your init.pp for the mymodule module:

class mymodule::abc($param_a, $param_b) {
  notice $param_a, $param_b
}

the two parameters $param_a and $param_b will be given their values from the hashes returned by the data functions, looking up mymodule::abc::param_a, and mymodule::abc::param_b.

Note that there is no need to use the "params pattern" now in common use in modules for Puppet 3x!

More about Functions

Since the new 'function' data provider is based on the general concept of calling functions and you can call other functions from them, you have a very powerful mechanism to help you organize data and to do advanced composition.

The data function is called once during a compilation for the purpose of producing a Hash that maps qualified name strings to data values. The function body can call other functions, use expressions, transformations, composition etc. When the data binding kicks in, it calls the function on the first request for a parameter in the compilation, then caches the returned hash and reuses it for lookup of additional parameters (in contrast to calling the function for each and every parameter, which would be much slower).

Note that the data function can be called like any other function! This means that a module or environment can use another module's data function, transform it, etc. before using its data.

Naturally, since we are dealing with functions it is easy to divide the composition of data into multiple functions, and then hierarchically compose them. Say that we want to divide the data up into two parts, one for osfamily, and one for common and we then want to combine them. We can now do a simple function composition and merge the result.

In the examples, the functions are written using the Puppet Language (even though such functions are not available in the 4.0.0 release). At the moment, it is left as an exercise to translate them into Ruby (a rough sketch of such a translation follows the examples below). What I want to show here is the power of combining data with functions without cluttering the examples with what you need to do in Ruby to get variables in scope, call other functions etc.

Data Composition with Puppet functions

When we add support for functions in the Puppet Language, data composition can look like this:

function mymodule::data() {
  mymodule::common() + mymodule::osfamily()
}

function mymodule::osfamily() {
  case $osfamily {
    'Debian': {
      { mymodule::abc::param_a => 'the debian value for a' }
    }
    'Darwin': {
      { mymodule::abc::param_a => 'the osx value for a' }
    }
    default: {
      { }  # empty hash
    }
  }
}

function mymodule::common() {
  { mymodule::abc::param_a => 'the default for param a',
    mymodule::abc::param_b => 'the default for param b',
  }
}
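For the impatient, here is a rough, hedged sketch of what a Ruby translation of the composition above could look like today. The helper methods are plain Ruby methods inside the function definition (not separate Puppet functions), and reading the osfamily fact directly via Facter is a simplification made for illustration only.

# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Merge the osfamily specific values on top of the common defaults
    common_data.merge(osfamily_data(Facter.value('osfamily')))
  end

  # Defaults that apply regardless of platform
  def common_data
    { 'mymodule::abc::param_a' => 'the default for param a',
      'mymodule::abc::param_b' => 'the default for param b',
    }
  end

  # Platform specific values (empty hash when there is nothing special)
  def osfamily_data(osfamily)
    case osfamily
    when 'Debian'
      { 'mymodule::abc::param_a' => 'the debian value for a' }
    when 'Darwin'
      { 'mymodule::abc::param_a' => 'the osx value for a' }
    else
      {}
    end
  end
end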

Naturally, the functions called from the data function can take parameters. The data() function itself however does not take any parameters.

Example - Module with multiple use cases

A module author wants to provide a set of default values for a base use case of the module, but also wants to offer defaults for other use cases. Clearly, there can only be one set of defaults applied at any given time, and the data() function in a module is for that module only, so these defaults must be provided at a higher level, i.e. in the environment (where it is known how the module is being used). If the environment is also using the function data provider, it is very simple to achieve this:

function environment::data() {
  # merge usecase_x from module with the overrides
  mymodule::usecase_x() + {
    mymodule::abc::param_b => 'default from environment for param_b'
  }
}

This illustrates that mymodule has a special data function named mymodule::usecase_x() that provides an alternate set of default values for classes inside mymodule; these are then overridden with a hash of the specific values wanted in this environment.

Example - Hierarchical keys

If you find it tedious to retype mymodule::classname::foo, mymodule::classname::bar, etc., you can instead construct the keys programmatically. Since the "data functions" are general functions, variables and interpolation can be used - e.g.:

function mymodule::data() {
  $m = 'mymodule::abc'
  { "${m}::param_a" => 'the value', 
    ...
  }
}

Or why not call a function that reorganizes a hierarchical hash? Say that we have param_a in the classes a::b::x, a::b::y, and a::b::z; we could then do something like this:

function mymodule::data() {
  $hierarchical = { 
    a => {
      b => {
        x => { param_a => 'default for a::b::x::param_a' },
        y => { param_a => 'default for a::b::y::param_a' },
        z => { param_a => 'default for a::b::z::param_a' },
  }}}
  # Calling a function that expands the hash (left as an exercise)
  expand_hierarchical_keys($hierarchical)
}

Trying out this new feature

At the time of writing, the new data binding feature is available in the nightlies for Puppet 4.0.0, or you can run it from source using Puppet's master branch. (The new feature will not be available for 3x with the future parser.) If you are reading this after Puppet 4.0.0 has been released, just get the release.

Summary

The new data provider mechanism is a technology agnostic way of defining default data for modules and environments without dictating that a particular technology is used by the users of a module.

The new mechanism comes with a built in implementation based on functions that provides a simple yet powerful way of delivering, using and composing data. Functions in Ruby provide a simple way to extend the functionality without having to write a complete data provider.

While data functions are already relatively easy to write in Ruby, since they consist mostly of boilerplate code, the function mechanism will become much more powerful and accessible when functions can be written in the Puppet Language.

In the next post about the new data binding feature I will show how to write a new implementation of a data provider.

Sunday, January 25, 2015

The Puppet 4x Function API - part 2

In the first post about the 4x Function API I showed the fundamentals of the new API. In this post I am going to show how you can write more advanced functions that take a code block / lambda as an argument and how you can call this block from Ruby. This can be used to create your own iterative functions or functions that make it possible to write puppet code in a more function oriented style.

Accepting a Code Block / Lambda

A 4x function can accept a code block / lambda. You can make it required by calling required_block_param in the definition of the dispatcher, or optional by calling optional_block_param.

Here is an example of a simple function called then, which takes one argument and a block, and calls the block with the argument unless the argument is nil.

Puppet::Functions.create_function(:then) do
  dispatch :then do
    param 'Any', :x
    required_block_param
  end

  def then(x)
    x.nil? ? nil : yield(x)
  end
end  

Note that Puppet blocks are passed the same way as Ruby blocks, so we can simply yield to the given block. Just as with Ruby blocks, the block can be captured in a parameter by having a &block parameter last, the block_given? method can be used, etc.
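As a small illustration of capturing the block instead of yielding to it (the function name apply_twice is a made-up example, not part of Puppet):

Puppet::Functions.create_function(:apply_twice) do
  dispatch :apply_twice do
    param 'Any', :x
    required_block_param
  end

  # Capture the given block/lambda in &block and call it explicitly
  def apply_twice(x, &block)
    block.call(block.call(x))
  end
end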

The then function is useful when looking up a nested value in a hash as it removes the need to check intermediate results for undef. Say there may or may not be a value in $hash such that $hash[a][b][c] exists, and we just want that value, or undef if either a, b, or c is not found, instead of an error if we, say, try to look up c in undef (because b did not exist).

Instead we use the then function we just defined - like this:

$result = $hash
 .then |$x| { $x[a] }
 .then |$x| { $x[b] }
 .then |$x| { $x[c] }

And for completeness, if you were to write that without the function, you end up with something like this:

$result =
if $hash[a] != undef and $hash[a][b] != undef and $hash[a][b][c] != undef {
  $hash[a][b][c]
}

...or worse if you start using variables for the intermediate steps

The block's number of parameters and their types

If nothing is specified about the number of parameters and types expected in the accepted block, the user can give the function any block. This is what you get by just calling required_block_param or optional_block_param. You still get type checking, but this takes place when the block is called.

If you want to involve the block's number of parameters and their types in the dispatching - i.e. selecting which Ruby method to call based on what the user defined in the block - you can do so by stating the Callable type of the block. (The Callable type was added in Puppet 3.7, and is described in this blog post.) In brief, Callable[2,2] means something that can be called with exactly two arguments of any type.

Here is the dispatcher part of the each function (from Puppet source code):

Puppet::Functions.create_function(:each) do
  dispatch :foreach_Hash_2 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Hash_1 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[1,1]', :block
  end

  dispatch :foreach_Enumerable_2 do
    param 'Any', :enumerable
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Enumerable_1 do
    param 'Any', :enumerable
    required_block_param 'Callable[1,1]', :block
  end

  def foreach_Hash_1(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(enumerator.next)
    end
    # produces the receiver
    hash
  end

And to be complete, here are the methods the dispatchers call - the actual implementation of the each function. As you can see, each variation on how this function can be called (with an Array, a Hash, or a String, and with one or two block parameters) is now handled in a small and precise method. (It is really just Hash that needs special treatment; all others are handled as enumerables, i.e. whatever the Puppet Type System has defined as something that can be enumerated / iterated over in the Puppet Language.)

  def foreach_Hash_2(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(*enumerator.next)
    end
    # produces the receiver
    hash
  end

  def foreach_Enumerable_1(enumerable)
    enum = asserted_enumerable(enumerable)
      begin
        loop { yield(enum.next) }
      rescue StopIteration
      end
    # produces the receiver
    enumerable
  end

  def foreach_Enumerable_2(enumerable)
    enum = asserted_enumerable(enumerable)
    index = 0
    begin
      loop do
        yield(index, enum.next)
        index += 1
      end
    rescue StopIteration
    end
    # produces the receiver
    enumerable
  end

  def asserted_enumerable(obj)
    unless enum = Puppet::Pops::Types::Enumeration.enumerator(obj)
      raise ArgumentError, ("#{self.class.name}(): wrong argument type (#{obj.class}; must be something enumerable.")
    end
    enum
  end
end

What about Dependent Types and Type Parameters?

If you read the above example carefully, or if you are already used to working with a rich type system, you may wonder about type parameters and whether it is possible to use dependent types.

The short answer is no; while the Puppet type system is capable of describing rich types, we have not added the ability to use type parameters. They would be really useful - take the hash example, where instead of:

    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block

could specify that the block must accept the key and value type of the given Hash - e.g. something like:

    param 'Hash[K Any, V Any]', :hash
    required_block_param 'Callable[K,V]', :block

This however requires quite a lot of complexity both in the type system itself and what users are exposed to. (The syntax has to be something more elaborate than what is shown above since the references to K and V must naturally find the declared K and V somehow - in the sample that is solved by magic :-).

If we do provide a mechanism to reference the type parameters of the actual types given in a call, we could fully support dependent types. As an example, this would enable declaring that a function takes two arrays of equal length.

How about Return Type?

Return type is also something we decided to leave out for the time being. In hindsight it should have been added from the start as this enables both advanced type inference and type checking to be performed. For this reason we may add this into the dispatch API early in the 4x series. The most difficult part will be figuring out the syntax for the Callable type since it also needs to be able to describe the return type of the callable.

The Puppet 4.0.0 Type System Changes

The Puppet Type System in Puppet 4.0.0 (and in 3.7 when the future parser/evaluator is in use) has undergone some change. I am posting this update for those that have already experimented with the Type system and that just want to know what has changed.

Also, the already published posts in this series about the type system will be updated with these changes (where not already done).

Object Renamed to Any

We felt that the word object had too many associations with an object oriented programming language and did not fit very well with the rest of the Puppet Language. There is already confusion over what a "class" is (especially if you come from an OO language).

From now on, the most abstract type in the Puppet Type System is Any. As the name implies, it accepts assignment of an instance of any other type, including Undef. Thus, there is no need to use Optional[Any].

"All Your Types are Belong to Any"2

All types are now also Any. Earlier you would have to use a Variant type if you wanted a type to be able to accept both Type and Object instances (i.e. Variant[Object, Type]), now you just use Any.

The Ruby[name] Type Renamed to Runtime['ruby', name]

The Puppet Type System supports references to types in an underlying runtime system. Currently only Ruby, but the Puppet master will run on JRuby on top of a JVM and there is then also expected to be the need to reference types in the JVM name-space. The implementation in 3.6 supports a type called Ruby, and the specification reserves names of other runtime systems (e.g. Java).

The 3.6 implementation does however block usage of those names (e.g. Ruby, Java) as the name of a resource type (plugin) or user defined resource type (define) using the short notation, and users would be required to use the longer notation form, e.g. Resource[java], to reference such a resource.

This is unfortunate as the obvious names of the types in the type system also are obvious names for managing these technologies with Puppet. References to runtime types are used far more seldom, so we decided to rename Ruby[class_name] to the more generic Runtime['ruby', type_name].

Runtime['ruby',class_name] is currently the only supported runtime type, but you can expect there to be a Runtime['jvm'] or Runtime['java'] when/if the need arises.

This change only affects those who have played with the advanced features in the puppet bindings system or played with advanced puppet functions where a reference to a Ruby type was passed using a Ruby type defined in .pp logic, or in internal ruby logic inside the puppet runtime.

The Default Type

We also realized that we forgot about one symbolic value in the Puppet Language, the default. It is a value in the language (represented by the Ruby symbol :default internally), and it can be passed around. In 3.6, the type of a default expression is Ruby['Symbol'], and would have been Runtime['ruby', 'Symbol'] in 4.0 unless we did something.

The solution was to add a type to the type system unsurprisingly called Default. There is only one value that is an instance of this class, and such an instance is only assignable to Any or Default.

Note that the value itself holds no magic powers unless it is used in a position that acts on it; like in a case expression where the case expression takes the default value to mean 'match against anything'. If you do this matching yourself, say 1 == default the result is false.

The default value is practical where there is a need to distinguish two different kinds of "unknown" in addition to given values. You can use it to get one behavior for undef/missing, one for given values, and one for the default value. Note that passing a value of default does not mean that the parameter's default value will be assigned; it means setting that parameter to the special value of Default type.

Also note that this is not a "default type" in the sense of a type used by default; that type is called Any.

The Callable Type

Also added is the Callable Type. It currently has no practical use in the Puppet Language since it is not possible to assign or pass a lambda/block as a value. It is however of importance when writing Puppet functions in Ruby using the 4x function API, since such functions can accept lambdas/blocks and there is a need to be able to define the types of an acceptable block's parameters.

You may see references to the Callable type in error messages, but if you are not writing functions using the 4x function API that accept lambdas/blocks, you can probably skip the rest of this post as such errors should be understandable from context.

Here is an excerpt from the Puppet Language Specification:

Callable is the type of callable elements; functions and lambdas. The Callable type will typically not be used literally in the Puppet Language until there is support for functions written in the Puppet Language. Callable is of importance for those who write functions in Ruby and want to type check lambdas that are given as arguments to functions in Ruby. They are also important in error messages when communicating why a given set of arguments do not match a signature.

The signature of a Callable denotes the type and multiplicity of the arguments it accepts and consists of a sequence of parameters; a list of types, where the three last entries may optionally be min count, max count, and a Callable (i.e. calling a lambda with another lambda).

  • If neither min nor max is specified, the number of given arguments must match the number of parameters exactly.
  • If min < size(params), the difference is optional (the trailing parameters may be omitted).
  • If max > size(params), the last type repeats up to the given max number of arguments.
  • If max is the literal default, the max value is unbounded (+Infinity).
  • If no types and no min/max are given, the Callable describes any callable, i.e. Callable[0, default] (no type constraint, and any number of parameters).
  • Callable[0,0] is a callable that does not accept any arguments.
  • If no types are given, and the min/max count is not [0,0], then the Callable describes only the untyped arity and places no constraints on the parameter types; e.g. Callable[2,2] means callable with exactly 2 parameters.
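As an illustration of these rules (my own reading of them, not examples taken from the specification):

Callable                      # any callable, same as Callable[0, default]
Callable[String]              # exactly one String parameter
Callable[String, 1, 2]        # one String, optionally followed by one more String
Callable[String, 1, default]  # one or more String parameters
Callable[2, 2]                # exactly two parameters of any type
Callable[0, 0]                # no parameters accepted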

Callable type algebra is different from other types as it seems to work in reverse. This is because its purpose is to describe the callability of the instance, not its essence (even if the type serves dual purpose by simply reversing the comparison). (This is known as Contravariance in computer science). As an example, a lambda that is Callable[Numeric] can be called with one argument being a Numeric, Float, or an Integer, but not with a Scalar, or Any. Thus, while it seems intuitive that a Callable[Integer] should be assignable to a Callable[Any] (since Any is a wider type), this is not true because it cannot be called with an Any. The reason for checking the type of a callable is to detect if it can be called a certain way - thus assignable?(Callable[Any], Callable[Integer]) really is a declaration that there is an intent to call the callable with one Any argument (which it does not accept).

This also means that generality works the opposite way; Callable[String] ∪ Callable[Scalar] yields Callable[String] - since both can be called with a String, but both cannot be called with any Scalar.

You can read the full specification text for Callable in the Puppet Language Specification.

Isn't something missing?

If you read all of the above about the Callable type, you may have wondered how the type system deals with callables that do not specify the types of the parameters. What are they? They cannot really be typed as Any for the reasons given above - are they just Undef or nil?

The answer is that there is a type that is used internally in the type system to represent this case. This type is known as Unit, and it is basically a chameleon that says 'I am whatever you want me to be' - technically the contravariant of Any.

It cannot be used directly from the Puppet Language; you can however observe instances of this type when specifying something like Callable[1,1] (a callable that accepts exactly one parameter) in your 4x function API for a block parameter and then introspect the created type.

You are not expected to ever use this internal type directly. If you type Unit in the Puppet Language, you actually get a reference to the resource type Resource[Unit]. The internal type is however required in the type system to avoid special cases, and since you may observe it or come across it when reading the source code of puppet I thought it was worth mentioning.

The Puppet 4x function API

In Puppet 4.0.0 there is a new API for writing Ruby functions that extend the functionality of the Puppet language. This API is available in the 3.7.x versions of Puppet when using --parser future, so you can try out this functionality today.

The new 4x API for functions was created to fix problems and add missing features in the 3x API:

  • The function runs as a method on Scope (and has access to too much non-API)
  • Undefined arguments are given to the function as empty strings, but as a :undef Symbol if undefined values are given inside collections.
  • There is no automatic type checking
  • Functions share a flat namespace and you have to ensure you use a unique name
  • Functions cannot be private to a module
  • Functions are defined in the Puppet::Parser::Functions namespace, but in the future functions will also be used where no parser is available. The concept of a "parser function" is just odd.
  • Methods defined in a Function pollute Scope - if you require helper logic it must be in a separate class.
  • There are problems with reloading complex functions
  • There is a distinction between functions of expression and statement kind, and this distinction is no longer meaningful.
  • The specification of arity (number of arguments) used in 3x to describe parameters to a function is a blunt tool (no typing, no overloading, and it cannot express a variable number of arguments that is capped).
  • Documentation cannot (at least not easily) be retrieved without running the Ruby code that defines the function.

The 4x function API solves all of these issues. (With the exception of private functions, which did not make it into 4.0.0, but will be added during the 4x series).

A simple function in the 4x API

The new API has many features, yet, for simple functions, it is very easy to use. Here is a basic example.

Puppet::Functions.create_function(:max) do
  def max(x, y)
    x >= y ? x : y
  end
end

This defines the function max taking two arguments (of type Any). As you can see, it is slightly different from the 3x function API in that the body of the function is expressed in a defined method.

Also different is that functions are now stored under <moduleroot>/lib/puppet/functions instead of under the terribly confusing <moduleroot>/lib/puppet/parser/functions in 3x, which has misled everyone to talk about "parser functions" - which I guess could mean some kind of function used for parsing. Neither the 3x nor the 4x function plays any role during parsing, and they should be referred to as "functions". So please, no more "parser function" crazy talk...

Automatic Type Checking

In the 4x API there is support for type checking. Here is the same function again, now with type checking:

Puppet::Functions.create_function(:max) do
  dispatch :max do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  def max(x, y)
    x >= y ? x : y
  end
end

As you can see, the max method is identical to the first version. A call to a dispatch method has been added to type the parameters. In addition to typing the parameters, the dispatch call also informs puppet that the call should be dispatched to a particular method (in the example above to :max). If we inside our function want to call the method max_num (instead of the method max) we would change the definition like this:

Puppet::Functions.create_function(:max) do
  dispatch :max_num do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  def max_num(x, y)
    x >= y ? x : y
  end
end

The function is still named max() in the Puppet Language, but internally, when it is called with two Numeric arguments, the call is now dispatched to the max_num method. As you will see in the next section, this is very useful when we want to write functions that have different implementations depending on the types of the arguments given to it when it is called.

When defining a parameter, the type is always given in a string using the Puppet Language Type System notation. This means you can be very detailed in your specification and get type checking with high fidelity.
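For example, here is a hypothetical sketch (the function name and behavior are made up for illustration) where the parameter accepts either one non-empty string, or a non-empty array of non-empty strings:

Puppet::Functions.create_function(:'mymodule::greet') do
  dispatch :greet do
    # A more detailed type expression than a plain 'String'
    param 'Variant[String[1], Array[String[1], 1]]', :names
  end

  def greet(names)
    # Normalize to an array and produce one greeting per name
    Array(names).map {|n| "Hello #{n}" }
  end
end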

Multiple Dispatch

The 4x API supports multiple dispatch; so far you have seen two examples. In the first there were no calls to dispatch and the system automatically figured out that the call should be dispatched to a method with the same name as the function.

In the second example we took over dispatching, and declared that a call requires two Numeric arguments.

What if you want to call max with either Numeric, or String arguments? We could certainly type the arguments as Variant[Numeric, String], but we would then also need to write the logic in our method to deal with all of the possible cases. A much simpler approach is to use multiple dispatch. Here is an example - this time for a min function:

Puppet::Functions.create_function(:min) do
  dispatch :min do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  dispatch :min_s do
    param 'String', :s1
    param 'String', :s2
  end

  def min(x,y)
    x <= y ? x : y
  end

  def min_s(x,y)
    cmp = (x.downcase <=> y.downcase)
    cmp <= 0 ? x : y
  end
end

Now the system will look at the types of the given arguments and pick the first matching dispatcher. Thus, in min we know that the arguments are Numeric, and in min_s we know that they are String. Everything is precise, small, clear and easy to read. We also did not have to spend time on dealing with error handling as type checking always takes place in all calls.

Variable Number of Arguments

The 4x API can handle a variable number of arguments. If you do not use a dispatcher the logic introspects the Ruby method declaration and checks the types of the arguments. If we change the max function to return the maximum of a variable number of arguments we can do that like this:

Puppet::Functions.create_function(:max) do
  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

If you want to also type the arguments, or cap the max number of arguments, then this is done in the dispatcher by defining the minimum and maximum argument count with a call to arg_count. In the example below a minimum of 1 argument is specified, and a maximum of :default (which means any number of arguments).

Puppet::Functions.create_function(:max) do
  dispatch :max do
    param 'Numeric', :args
    arg_count 1, :default
  end

  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

Note that the arg_count specifies the min required and max allowed number of arguments given to the function (i.e. it is not just for the last parameter). Also note that the method the call is dispatched to can be defined in any compatible way (i.e. it must handle missing arguments by using default values, or capture variable arguments in an array) as in the example below:

Puppet::Functions.create_function(:example) do
  dispatch :example do
    param 'Numeric', :name
    param 'String', :value
    param 'Numeric', :name2
    param 'String', :value2
    arg_count 2, 4
  end

  def example(name, value, *args)
  end
end

Namespaced Functions

In 3x it is not possible to give functions a name-spaced name. They all live in the same name space. This is a problem because one module may override functions in another module. In 4x, functions can be given a namespaced name. To do this, the function should be placed in a directory that corresponds to the name space, and it should be named accordingly in the call to create_function.

Here, the function max is placed in the namespace mymodule (which is also the name of the module).

# in <moduleroot>/lib/puppet/functions/mymodule/max.rb
Puppet::Functions.create_function(:'mymodule::max') do
  dispatch :max do
    param 'Numeric', :args
    arg_count 1, :default
  end

  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

Note that it is only the name of the function that needs to be fully qualified; in the dispatcher, the name of the Ruby method to dispatch to is still used, and it is not a fully qualified name.

You can nest namespaces further if you like.

To call a fully qualified function from the Puppet Language, simply use the full name - e.g.:

mymodule::max(1,2,3,4)

Helper Logic

You can have as many helper methods as you like in the function - it is only the methods being dispatched to that are used by the 4x function API. You are however not allowed to define nested Ruby classes or modules, or introduce constants inside the function definition. If you have that much code, you should deliver it elsewhere and call that logic. Note that such external logic is static across all environments.
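A hypothetical sketch (the function name shout and its helper are made up for illustration) of a plain helper method that is never dispatched to:

Puppet::Functions.create_function(:'mymodule::shout') do
  dispatch :shout do
    param 'String', :message
  end

  def shout(message)
    decorate(message.upcase)
  end

  # A plain helper method - not dispatched to, only used internally
  def decorate(text)
    "*** #{text} ***"
  end
end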

Documenting the Function

The new Puppet Doc tool (a.k.a. Puppet Strings) that will be released with Puppet 4.0.0 can produce documentation from functions written using the 4x function API. In the 3x API, functions are documented with a Ruby String that is given in the call that creates the function. The 4x API instead processes comments that are associated with the created function. This processing supports a set of YARD tags to make it possible to write documentation of higher quality; @param, @example, and @since are examples of such tags.
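A rough sketch of what a documented function could look like (the exact tag conventions are defined by Puppet Strings, so treat this as an approximation rather than a reference):

# Returns the maximum of two numeric values.
#
# @param a The first number
# @param b The second number
# @example Calling the function
#   mymodule::max(1, 2)
# @since 1.0.0
#
Puppet::Functions.create_function(:'mymodule::max') do
  dispatch :max do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  def max(a, b)
    a >= b ? a : b
  end
end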

See the Puppet Strings project on GitHub for examples and more information.

In the Next Post

In the next post I describe how you can pass code blocks to puppet functions and call them from within the function.

Wednesday, June 18, 2014

Optionally Typed Parameters

And now, the latest entry in the series about the new Puppet Type System introduces the capability to optionally type the parameters of defines, classes, lambdas, and EPP templates.

There are also some new abilities and changes to the type system that I will cover in this post.

Optionally Typed Parameters

Writing high quality Puppet code involves judicious type checking of given arguments. This is especially important when writing modules that are consumed by others. Anyone who has written a serious module knows that it is a chore not only to deal with all of the parameters in the first place, but also that type checking requires one extra call to a standard lib function per parameter. The result is simply a lower signal-to-noise ratio.

From Puppet 3.7 when using the future parser/evaluator (and from 4.0 when the future parser/evaluator becomes the standard), you can now (optionally) type the parameters of defines, classes, EPP templates, and lambdas. We have even thrown in support for a varargs/captures-rest/splat parameter in lambdas (excess arguments are delivered in an array).

Type Checking Defines and Classes

To opt in to type checking in defines and classes, simply give the type before the parameter declaration:

String $x        # $x must be a String
String[1] $x     # $x must be a String with at least one character
Array[String] $x # $x must be an Array, and all entries must be Strings

(See earlier posts in this series for other types, and the type system in general).

If you do not type a parameter, it defaults to the type Any (renamed in 3.7.0 from Object). And this type accepts any argument including undef.

define sinatra(String $regrets, Integer $amount) {
  notice "$regrets, I had $amount, I did it my way. Do bi do bi doo..."
}
sinatra{ frank:
  regrets => regrets,
  amount  => 2        # e.g. 'a few'
}

Which results in:

Notice: regrets, I had 2, I did it my way. Do bi do bi doo...

And if the wrong type is given:

sinatra{ frank:
  regrets => regrets,
  amount  => 'a few'
}

The result is:

Error: Expected parameter 'amount' of 'Sinatra[frank]' to have type Integer, got String ...

And while on this topic, here are a couple of details:

  • If you supply a default value, it is also type checked
  • The type expressions can use global variables - e.g. String[$minlength]

Type Checking Lambdas

Lambdas can now also have type checked parameters, and lambdas support the notion of captures-rest (a.k.a. varargs, or splat) by preceding the last parameter with a *. The type checking of lambdas, and the capabilities for passing arguments to lambdas, have been harmonized with the new function API (which I will present in a separate blog post).

Before showing how typed lambda parameters work, I want to tell you about a new function called with that I will use to illustrate the new type checking capabilities.

The 'with' function

Andrew Parker (@zaphod42) wrote a nifty little function called with that is very useful for illustrating (and testing) type checking. It is also very useful in classes, where you would like to make some logic (and in particular some variables) local/private to a block of code, in order to avoid leaking non-API variables from your classes.

The with function is very simple - it just passes any given arguments to the given lambda. Hence its name; you can think of it as "with these variables, do this...".

with(1) | Integer $x | { notice $x }

Which, calls the lambda with the argument 1, assigns it to $x after having checked for type compliance, and then notices it.

Now if you try this:

with(true) | Integer $x | { notice $x }

You get the error message:

Error while evaluating a Function Call, lambda called with mis-matched arguments
expected:
  lambda(Integer x) - arg count {1}
actual:
  lambda(Boolean) - arg count {1} at line 1:1 on node ...

Captures-Rest

As mentioned earlier, you can declare the last parameter with a preceding * to make it capture any excess arguments given in the call. The type that is given is the type of the elements of an Array that is constructed and passed to the lambda's body.

 with(1,2,3) | Integer *$x | { notice $x }

Which results in:

Notice: [1, 2, 3]

There is one special rule for captures-rest: if the type is an Array type, it is used as the type of the resulting array. Thus, if you want to accept elements of Array type, you must describe this as an Array of Arrays (or use the Tuple type). By declaring an Array type you can constrain the number of excess arguments that the captures-rest parameter accepts.

 with(1,2,3,4,5) | Array[Integer,0,3] *$x | { notice $x }

Will fail with the message:

 Error while evaluating a Function Call, lambda called with mis-matched arguments
 expected:
   lambda(Integer x{0,3}) - arg count {0,3}
 actual:
   lambda(Integer, Integer, Integer, Integer, Integer) - arg count {5} at line 1:6 on node ...

A couple of details:

  • The captures-rest does not affect how arguments are given (in the example above, the lambda could have been changed to have 3 individual parameters with a default value and still be called the same way, and it would accept the same given arguments).
  • Captures rest is not supported for classes, defines, or for EPP parameters

Using the assert_type Function

And finally, if the built-in type checking capabilities and the generic error messages they produce do not work for you, there is an assert_type function that gives you a lot more flexibility.

In its basic form, it performs the same type checking as for typed parameters. The assert_type function returns its second argument (the value) which means it can be used to type check and assign to a resource attribute at the same time:

 # Somewhere, there is this untyped definition (that does not work unless $x is
 # an Integer).
 define my_type($x) { ... }

 # And you want to create an instance of it
 #
 my_type { 'it':
   x => assert_type(Integer, hello)
 }

Which results in:

 Error: assert_type(): Expected type Integer does not match actual: String ...

The flexibility comes in the form of giving a lambda that is called if the assertion would fail (the lambda "takes over"). This can be used to customize the error message, to issue a warning, and possibly return a default sanitized value. Since the lambda takes over, you need to call fail to halt the execution (if that is what you want). The lambda is given two arguments; the expected type, and the actual type (inferred from the second argument given to assert_type).

 assert_type(Integer, hello) |$expected, $actual| {
   fail "The value was a $expected must be an Integer (like 1 or 2 or...)"
 }

Which results in:

 Error: The value was a String must be an Integer (like 1 or 2 or...) 

Type checking EPP

Type checking EPP works the same way as elsewhere: the type is simply stated before the parameter and defaults to Any. EPP parameters do not support captures-rest.

See the "Templating with Embedded Puppet Programming Language" for more information about EPP.

In this post

In this post I have shown how the new optionally typed parameters feature in Puppet 3.7.0's future parser/evaluator works and how type checking can be simplified in your Puppet logic.

The Type System Series of Blog Posts

You can find the rest of the blog posts about the type system here.

Tuesday, May 6, 2014

Puppet Internals - The Integer Type Ruby API

Here is a follow-up post about the Puppet Type System Ruby API. The Integer type (as you may recall from the earlier posts) has the ability to represent a range of values, and the earlier posts showed how this can be used in the Puppet Language. In this post, I will show you how the Integer range features can be used from Ruby.

Creating an Integer

An Integer type with a range can be created in Ruby a couple of different ways.

# Using the type factory
#
FACTORY = Puppet::Pops::Types::TypeFactory
range_t = FACTORY.range(100, 200)

# Using the type parser
#
TYPE_PARSER = Puppet::Pops::Types::TypeParser.new
range_t = TYPE_PARSER.parse('Integer[100,200]')

If you want to be explicit about an Infinite (open) range, use the symbol :default in Ruby, and the default Puppet language keyword in the string representation given to the type parser.

The integer type's class is Puppet::Pops::Types::PIntegerType.

Integer Type API

The Integer type has two attributes, from and to. In Ruby these values are either nil or have a Ruby Integer value. A value of nil means negative infinity in from, and positive infinity in to.

The Integer may also have a to value that is <= from (an inverted range).

The most convenient way to get the range in Numeric form is to call the method range, which returns an array of the two values with the smallest value first, and where nil values are replaced by the corresponding +/- Infinity value.
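A small sketch (the values shown in the comments are what I would expect given the description above, not verified output):

FACTORY = Puppet::Pops::Types::TypeFactory

range_t = FACTORY.range(200, 100)       # an "inverted" range
range_t.range                           # => [100, 200] (smallest value first)

open_t = FACTORY.range(100, :default)   # open ended towards positive infinity
open_t.range                            # => [100, INFINITY]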

A Note About Infinity

Infinity is a special numeric value in Ruby. You cannot write it as a literal, but it is the value produced by an operation such as 1.0/0.0. The great thing about this value is that it can be used in arithmetic, and naturally, the result of any arithmetic operation involving Infinity is still Infinity. This makes it easy to test if something is in range without having to treat the unbound ends in a special way.

The constants INFINITY, and NEGATIVE_INFINITY are available in Puppet::Pops::Types should you need them for comparisons.

Range Size

You can get the size of the range by calling size. If one of the to/from attributes is Infinity, the size is Infinity.

Iteration Support

The PIntegerType implements Ruby Enumerable, which enables you to directly iterate over its range. You can naturally use any of the iterative methods supported by Enumerable.

If one of the to/from attributes is Infinity, nothing is yielded (this to prevent you from iterating until the end of time).

range_t = FACTORY.range(1,3)
range_t.reduce {|memo, x| memo + x }  # => 6

Getting the String Representation

All types in the Puppet Type system can represent themselves in String form in a way that allows them to be parsed back again by the type parser. Simply call to_s to get the String representation.

Using Integer Range in Resources

Resources in the 3x Puppet Catalog cannot directly handle PIntegerType instances. Thus, if you would like to use ranges in a resource (type), you must use the string representation as the value stored in the resource, and then use the type parser to parse and interpret it as an Integer range.
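A hedged sketch of round-tripping a range through its string form (for example when the value has been stored as a string in a resource parameter):

FACTORY = Puppet::Pops::Types::TypeFactory
TYPE_PARSER = Puppet::Pops::Types::TypeParser.new

stored = FACTORY.range(100, 200).to_s   # e.g. "Integer[100, 200]"
parsed = TYPE_PARSER.parse(stored)      # back to a PIntegerType instance
parsed.range                            # => [100, 200]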

You can use the type system without also using the future parser for general parsing and evaluation. The only requirement is that the RGen gem is installed. And if you are going to use this in a Resource, you must also have RGen installed on the agent side. (In Puppet 4.0 the RGen gem is required everywhere).

Monday, May 5, 2014

Puppet Internals - Modeling and Custom Logic

In this post about modeling with Ecore and RGen I will show how to achieve various common implementation tasks. Don't worry if you glanced over the very technical previous post about the ECore model; if needed, I will repeat some of the information, but you may want to go back to it for details.

Derived Attributes

Sometimes we need to use derived (computed) attributes. Say we want to model a Person and record the birth-date, but we would also like to be able to ask how old a Person is right now. To do this we would have two attributes: birth_date, and a derived attribute age.

class Person < MyModelElement
  has_attr 'birth_date', Integer
  has_attr 'age', Integer, :derived => true
end

(Here I completely skip all aspects of handling date/time formats, time zones etc., and simply use a date/time converted to an Integer).

Since a derived attribute needs to be computed, and thus requires us to implement a method, we must define this method somewhere. All logic for a modeled class should be defined in a module called ClassModule nested inside the class. The definition in this module will be mixed into the runtime class.

A derived attribute is implemented by defining a method with the same name as the attribute plus the suffix '_derived'.

The full definition of the Person class then looks like this:

class Person < MyModelElement
  has_attr 'birth_date', Integer
  has_attr 'age', Integer, :derived => true

  module ClassModule
    def age_derived
      Time.now.year - birth_date.year
    end
  end
end

Derived attributes are good for a handful of intrinsic things like this (information that is very closely related to / an integral part of the class), but they should not be overused, as we in general want our models to be as anemic as possible; operations on models are best implemented outside of the model as functions, and the model should really just contain an implementation that maintains its integrity and provides intrinsic information about the objects.

Here is an another example from the new Puppet Type System:

class PRegexpType < PScalarType
  has_attr 'pattern', String, :lowerBound => 1
  has_attr 'regexp', Object, :derived => true

  module ClassModule
    def regexp_derived
      @_regexp = Regexp.new(pattern) unless @_regexp && @_regexp.source == pattern
      @_regexp
    end
  end
end

Here, we want to be able to get the real Ruby Regexp instance (the regexp attribute) from the PRegexpType based on the pattern that is stored in string form (pattern). Derived attributes are by default also virtual (not serialized), volatile (they have no storage in memory), and not changeable (there is no setter).

Here is an example of using the PRegexpType.

rt = Puppet::Pops::Types::PRegexpType.new(:pattern => '[a-z]+')
the_regexp = rt.regexp
the_regexp.is_a?(Regexp)      # => true

Going back to the implementation. Remember that all features (attributes and references) that are marked as being derived must have a defined method named after the feature and with the suffix _derived. Thus, in this example, since the attribute is called 'regexp', we implement the method 'regexp_derived'. Since we do not have any storage and no generated supporting methods to read/write the Regexp, we need to create this storage ourselves. (Note that we do not want to recompile the Regexp on each request unless the pattern has changed.) Thus, we assign the result to the instance variable @_regexp. The leading _ has no special technical semantics, but it is there to say 'hands off, this is private stuff'.

Adding Arbitrary Methods

You can naturally add arbitrary methods to the ClassModule; they do not have to be derived features. This does however go against the anemic principle, and it also means that the method is not reflected in the model. Such methods are sometimes useful as private implementation methods that are called from methods representing derived features, or that exist for purely technical Ruby runtime reasons (as you will see in the next example).

Using Modeled Objects as Hash Keys

In order for something to be useful as a hash key, it needs to have a hash value that reflects the significant parts of the object "as a key". Regular Ruby objects use a default that is typically not what we want.

Again, here is the PRegexpType, now also with support for being a hash key.

class PRegexpType < PScalarType
  has_attr 'pattern', String, :lowerBound => 1
  has_attr 'regexp', Object, :derived => true

  module ClassModule
    def regexp_derived
      @_regexp = Regexp.new(pattern) unless @_regexp && @_regexp.source == pattern
      @_regexp
    end

    def hash
      [self.class, pattern].hash
    end

    def ==(o)
      self.class == o.class && pattern == o.pattern
    end
  end
end

This implementation allows us to match PRegexpType instances if they are a) of the same class, and b) have the same source pattern. To support this, we simply create a hash based on the class and pattern in an Array. We also need to implement == since it is required that two objects that compute true on == also have the same hash.

Can you think of improvements to this implementation?

(We do compute the hash value on every request; we could cache it in an instance variable. We must then however ensure that if the pattern is changed, we do not use a stale hash. In order to know, we must measure whether it is faster to recompute the hash than to check if the pattern has changed - this is an exercise I have yet to do.)

Overriding Setters

Another use case is to handle setting of multiple values from a single given value - and, worst case, setting them cross-wise. (E.g. in the example with the Person, imagine wanting to either set the birth_date directly or compute it from a given age in years - yeah, it would be a dumb thing to do, but I had to come up with a simple example.)

Here is an example from the AST model - again dealing with regular expressions, but now in the form of an instruction to create one.

# A Regular Expression Literal.
#
class LiteralRegularExpression < LiteralValue
  has_attr 'value', Object, :lowerBound => 1, :transient => true
  has_attr 'pattern', String, :lowerBound => 1

  module ClassModule
    # Go through the gymnastics of making either value or pattern settable
    # with synchronization to the other form. A derived value cannot be serialized
    # and we want to serialize the pattern. When recreating the object we need to
    # recreate it from the pattern string.
    # The below sets both values if one is changed.
    #
    def value= regexp
      setValue regexp
      setPattern regexp.to_s
    end

    def pattern= regexp_string
      setPattern regexp_string
      setValue Regexp.new(regexp_string)
    end
  end
end

Here you can see that we override the regular setters value=, and pattern=, and that these methods in turn use the internal methods setValue, and setPattern. This implementation is however not ideal, since the setValue and setPattern methods are also exposed, and if they are called the attributes value and pattern will get out of sync!

We can improve this by doing a renaming trick. We want the original setters to be callable, but only from methods inside the class since we want the automatic type checking performed by the generated setters.

module ClassModule
  # Squirrel away the original (generated) setters under private names
  alias :_setPattern_private :setPattern
  private :_setPattern_private

  alias :_setValue_private :setValue
  private :_setValue_private

  def setPattern(regexp_string)
    _setPattern_private(regexp_string)
    _setValue_private(Regexp.new(regexp_string))
  end

  def setValue(regexp)
    _setValue_private(regexp)
    _setPattern_private(regexp.source)
  end
end

Here we squirrel away the original implementations by renaming them, and making them private. Since we did this, we do not have to implement the value= and pattern= methods since they default to calling the set methods we just introduced.

Now we have a safe version of the LiteralRegularExpression.

Defining Relationships Out of Band

Bi-directional references are sometimes tricky to define when there are multiple relations. The classes we are referencing must be known by Ruby, and sometimes the model is not a hierarchy. And even if it is, it is more natural to define it in top-down rather than bottom-up order.

To handle this, we need to specify the relationships out of band. This is very easy in Ruby since classes can be reopened, and it is especially easy with RGen since the builder methods are available for modifying the structure that is built while we are building it.

Here is an example (from RGen documentation):

class Person < RGen::MetamodelBuilder::MMBase
  has_attr 'name', String
  has_attr 'age', Integer
end

class House < RGen::MetamodelBuilder::MMBase
  has_attr 'address', String
end

Person.many_to_many 'homes', House, 'inhabitants'

What RGen does is simply to build the runtime model, for some constructs recording our intent of what the model should look like in intermediate meta-data. The runtime classes and intermediate meta-data are then mutated until we have completed the definition of the model. The runtime classes are ready to use as soon as they are defined, but caution should be taken when using the classes for anything while the module they are in is being defined (classes may be unfinished until the very end of the module's body). Then, the first request to get the meta-model (e.g. calling Person.class.ecore) triggers the building of the actual meta-model as an ECore model. It is computed on demand, since if it is not needed by the logic (only the concrete implementation of it is), there is little point in taking cycles to construct it, or having it occupy memory.

As you may have guessed, it is a terribly bad idea to modify the meta-model after it has been defined and there are live objects around. (There is nothing stopping you though if you know what you are doing). If you really need to jump through hoops like these, you need to come up with a scheme that safely creates new modules and classes in different "contexts".

In this Post

In this post I have shown some common tasks when using RGen. You should now have a grip on how derived attributes are handled and how to provide implementation logic for the declaratively modeled classes.

In a future post I will cover additional topics, such as dealing with custom data types, serialization of models, and how to work with fragmented models. It may take a while before I post on those topics as I have a bit of exploratory work to do regarding how these features work in RGen. Meanwhile, if you are curious, you can read about these topics in the EMF book mentioned in the ECore blog post.