Sunday, February 1, 2015

Puppet 4.0 Data in Modules part II - Writing a Data Provider

In Puppet 4.0.0 there is a new technology agnostic mechanism that makes it possible to provide default values for class parameters in modules and in environments. In the first post about this feature I showed how it is used. In this second post I will show how to write and deliver an implementation of a data provider.

The information in this post is only relevant if you are planning to extend puppet with additional types of data providers - you do not need to learn all that is presented here to use the services the new data provider feature provides.

How does it work?

The new data provider feature is built using the Puppet Binder which wires (i.e. binds) the various parts together in a composable way. You do not really need to know all the features of the Puppet Binder to be able to use it as the bindings needed for the data providers are mostly boilerplate and you just have to copy/paste and replace the example names with the names of things in your implementation.

The feature has two different kinds of data providers; one for environments, and one for modules. The steps to implement them are almost the same, so I am going to show both at the same time.

What you need to do:

  • Implement the data provider(s). They have a very simple API - basically just a method named lookup.
  • Register the bindings that make your data provider implementations available for use.

Implementing the Data Providers

Data providers are implemented as Ruby classes. The two classes (one for environments, and one for modules) have a very simple API - basically they must inherit from the correct base class (as shown in the examples below), and they must implement the method lookup(name, scope, merge).

In this example, I am creating data providers that users will know by the name 'sample'. There will be a provider called 'sample' that can be used for the environment, and one that can be used for modules. This will be made available in a module that I am going to name 'sampledata'.

For use in environments

# <modulepath>sampledata/lib/puppet_x/author/sample_env_data.rb
#
require 'puppet_x'
module PuppetX::Author
  class SampleEnvData < Puppet::Plugins::DataProviders::EnvironmentDataProvider
    def lookup(name, scope, merge)
      # return the value bound to the name/key
    end
  end
end

For use in modules

# <modulepath>sampledata/lib/puppet_x/author/sample_module_data.rb
#
require 'puppet_x'
module PuppetX::Author
  class SampleModuleData < Puppet::Plugins::DataProviders::ModuleDataProvider
    def lookup(name, scope, merge) 
      # return the value bound to the name/key
    end
  end
end

Note:

  • The data provider API guarantees that calls to lookup only occur for an environment that has opted in by setting the environment_data_provider to the key 'sample', and for a module that has opted in with a binding of 'sample' to the 'module_data' (just like we used 'function' in the earlier examples in the previous post on this topic).

  • The PuppetX namespace is available for 3rd party Ruby code. When using it, it should be followed by the name of the author (as defined by the Puppet Forge for modules) - i.e. in your code replace Author with your name.

  • When the implementations are loaded by the runtime, the data provider base classes have already been loaded, so there is no need to require 'puppet'.

  • The merge parameter is a string of type Enum[unique, hash, merge] or a hash with the key 'strategy' set to that string with additional keys that control the merge in detail (see the documentation of the lookup function). (In the sample implementation this parameter is ignored since it can only supply one value per key).
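As a minimal sketch of what the body of lookup could look like, the plain Ruby class below returns values from a hard coded hash. It deliberately does not inherit from the Puppet base classes so that it can be run on its own. The class name SampleLookupSketch, the DATA hash, and returning nil for an unknown key are all assumptions made for illustration; they are not part of the data provider API.

```ruby
# Hypothetical stand-in for a data provider. A real provider inherits
# from one of the base classes shown above.
class SampleLookupSketch
  # Hard coded data (an assumption made for this sketch)
  DATA = {
    'abc::param_a' => 'sample value for a',
    'abc::param_b' => 'sample value for b',
  }.freeze

  # Same signature as the provider method. The scope and merge arguments
  # are ignored since there is only one value per key.
  def lookup(name, scope, merge)
    DATA[name]
  end
end

provider = SampleLookupSketch.new
puts provider.lookup('abc::param_a', nil, 'unique')
```

A real implementation would typically read its data from somewhere else (files, a service) and that is where caching becomes relevant.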

There is more to say about how to implement the lookup to make it efficient. More about that later, after I have shown how to wire the implementation into puppet.

Registering the Data Provider Implementations

The first thing is to register the bindings that make it possible for other modules (or an environment) to declare that our new implementation should be used.

The Puppet Binder loads bindings from modules. By default the file <moduleroot>/lib/puppet/bindings/<modulename>/default.rb is loaded (if it exists). In this file, we need to create the bindings we want.

Since we have an implementation for both environment, and modules, the registration looks like this:

# <modulepath>sampledata/lib/puppet/bindings/sampledata/default.rb
#
Puppet::Bindings.newbindings('sampledata::default') do
  bind {
    name         'sample'                             # the name
    in_multibind 'puppet::environment_data_providers' # boilerplate (for env)
    to_instance  'PuppetX::Author::SampleEnvData'     # the classname as a string
  }
  bind {
    name          'sample'                            # the name
    in_multibind  'puppet::module_data_providers'     # boilerplate (for module)
    to_instance   'PuppetX::Author::SampleModuleData' # the classname as a string
  }
end

As before, replace Author with your name, and replace 'sample' with the name you want to give your bindings provider. The to_instance references should be the fully qualified class names of the implementations of the data providers.

The two bindings register the respective implementation classes with a symbolic name, which allows users to use this name instead of the more complicated class name of the data provider class we have implemented.

As there can be many implementations available and active at the same time, the Puppet Binder's multibind capability is used to bind the implementation for a given "extension point" (e.g. 'puppet::environment_data_providers').

Note:

  • The name you give your implementation must be unique among all implementations of the same type so you should really prefix the name with the module name to be safe.

Using the Implementations

As shown in the previous post, using a data provider implementation is simple. The examples in this post add a provider named 'sample', so simply change the use of 'function' in the previous post's examples to switch to the providers we just implemented.

The lifecycle of Data

The implementation of lookup probably needs to cache information (e.g. if we were writing an implementation for hiera it could be reading and caching the hiera.yaml file, and various data files).

Caching is somewhat complicated since we need to associate the cached data with something that has the same lifecycle as the data - we do not want to hold on to information that is stale and just occupies memory until Puppet's master process is restarted.

There are two things that it makes sense to associate a cache with:

  • the environment, if the data is static for the entire life of the environment. An environment goes out of scope when it times out (a configurable amount of time).
  • the compiler, if the data is static for the compilation (but varies from request to request for different nodes in the same environment instance). The compiler goes out of scope at the end of each catalog compilation.

It is not suitable to associate the cache with the data provider instance itself (e.g. in a class or instance variable in SampleModuleData).

The absolute best way of doing this is to use an Adapter. There is no reusable implementation of a caching adapter and the implementor of a data provider should design one for the specific purpose of handling its caching needs. This can be as simple as in this example:

class PuppetX::Author::MyCacheAdapter < Puppet::Pops::Adaptable::Adapter
  attr_accessor :cache
end

The provider implementation then associates the adapter with either the environment or the compiler. The adapter implementation can naturally have as many instance variables as it needs (the one in the example just has a cache variable), and additional methods. (If you want to look at a real implementation, the 'function' data provider built into Puppet 4.0 has a class called Puppet::DataBindings::DataAdapter that serves as a cache as well as performing the calls to the data functions.)

The approach of using adapters is much preferred over monkey patching existing code. For more information about adapters, see my blog post on the topic.

It is simple to use the adapter - here are examples for associating one with the environment, and the compiler.

adapter = MyCacheAdapter.adapt(Puppet.lookup(:current_environment))
cached = adapter.cache()

adapter = MyCacheAdapter.adapt(scope.compiler)
cached = adapter.cache()

I am stopping there, since what you need to cache and how will be specific to what you are implementing support for.
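To make the lifecycle idea concrete, here is a minimal sketch that can be run without Puppet. CacheAdapterSketch is a hypothetical stand-in that only mimics the "return the existing adapter or attach a new one" behavior of adapt; it is not Puppet's Adapter class, and the cached_data helper and the LOADS counter are assumptions made for illustration. The point is that the cache lives and dies with the object it is attached to.

```ruby
# Hypothetical stand-in that mimics the adapt-or-create behavior of an
# adapter. It is not Puppet's Adapter class.
class CacheAdapterSketch
  attr_accessor :cache

  # Returns the adapter already attached to host, or attaches a new one
  def self.adapt(host)
    host.instance_variable_get(:@cache_adapter_sketch) ||
      host.instance_variable_set(:@cache_adapter_sketch, new)
  end
end

# Records each time the (pretend) expensive load runs
LOADS = []

# Loads the data on first use for a given host, then reuses the cache
def cached_data(host)
  adapter = CacheAdapterSketch.adapt(host)
  adapter.cache ||= begin
    LOADS << :loaded
    { 'abc::param_a' => 10 }
  end
end

environment = Object.new # stands in for an environment or a compiler
cached_data(environment)
cached_data(environment)
puts LOADS.size # the load ran only once for this host
```

When the host object is garbage collected the adapter and its cache go with it, which is what makes this approach safe with respect to stale data.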

General notes about caching data content

Do not implement file watching. Directory environments use a stable state for the given timeout and everything is evicted when the environment times out. There can be a very large number of directory environments (users have reported using several hundred, e.g. for a master running various development branches), and directory environments may also be quite volatile. If you are not using the adapter approach to caching, you must ensure that your caching does not leak memory by holding on to stale data for environments that potentially never will be used again during the life cycle of the running process.

Experiment with the Sample in Puppet's code base

There are two test data fixtures in Puppet's code base (used when running spec tests) that you can also run from the command line. You can naturally make a copy of them for your own experiments (if you do not want to type in the examples in this blog post from scratch).

The 'function' example

The first tests the function data provider, and can be invoked like this (all on one line):

bundle exec puppet apply
--environmentpath=spec/fixtures/unit/data_providers/environments
--environment=production -e 'include abc'

The fixture has two parameterized classes: one that is not in a module, and one in a module. The module class gets two of its three parameters overridden by environment data.

You should see this printout

Notice: env_test1
Notice: /Stage[main]/Abc::Def/Notify[env_test1]/message: defined 'message' as 'env_test1'
Notice: env_test2
Notice: /Stage[main]/Abc::Def/Notify[env_test2]/message: defined 'message' as 'env_test2'
Notice: module_test3
Notice: /Stage[main]/Abc::Def/Notify[module_test3]/message: defined 'message' as 'module_test3'

The 'sample provider' example

The second example can be run like this (all on one line):

bundle exec puppet apply
--environmentpath=spec/fixtures/unit/data_providers/environments
--environment=sample
spec/fixtures/unit/data_providers/environments/sample/manifests/site.pp

This fixture uses parameterized classes and uses an implementation of the sample providers shown in this blog post, but with lookup functions that return hard coded values for the classes in the fixture.

You should see this printout:

Notice: env data param_a is 10, env data param_b is 20, 3
Notice: /Stage[main]/Test/Notify[env data param_a is 10, env data param_b is 20, 3]/message: defined 'message' as 'env data param_a is 10, env data param_b is 20, 3'
Notice: module data param_a is 100, module data param_b is 200, env data param_c is 300
Notice: /Stage[main]/Dataprovider::Test/Notify[module data param_a is 100, module data param_b is 200, env data param_c is 300]/message: defined 'message' as 'module data param_a is 100, module data param_b is 200, env data param_c is 300'

Saturday, January 31, 2015

Puppet 4.0 Data in Modules and Environments

In Puppet 4.0.0 there is a new technology-agnostic mechanism for data lookup that makes it possible to provide default values for class parameters in modules and in environments. The mechanism looks first in the "global" data binding mechanism across all environments (i.e. the existing mechanism for data binding, which in practice means hiera, since this is the only available implementation). It then looks for data in the environment, and finally in the module.
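The lookup order can be sketched in a few lines of plain Ruby. This is only a model of the order described above, with each layer represented as a simple hash; the method name layered_lookup and the sample data are assumptions made for illustration, and this is not how Puppet implements the lookup internally.

```ruby
# Returns the value from the first layer that has the key, searching the
# global data first, then the environment, and finally the module.
def layered_lookup(key, global, environment, mod)
  [global, environment, mod].each do |layer|
    return layer[key] if layer.key?(key)
  end
  nil # nothing found in any layer
end

global_data = {}
env_data    = { 'mymodule::abc::param_b' => 'from environment' }
module_data = { 'mymodule::abc::param_a' => 'from module',
                'mymodule::abc::param_b' => 'from module' }

puts layered_lookup('mymodule::abc::param_a', global_data, env_data, module_data)
puts layered_lookup('mymodule::abc::param_b', global_data, env_data, module_data)
```

In this model param_a comes from the module since no earlier layer defines it, while param_b comes from the environment since it is searched before the module.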

The big thing here is that a user of a module does not have to know which implementation the module author has chosen - the module is simply installed (with its dependencies). The user is free to override values using an implementation of their choice (in the environment using the new mechanism, or with the existing data binding / hiera support).

It is expected that an implementation for hiera will also be made available in a module.

In this part 1 about the new data binding feature I will show how it can be used in environments and modules. In the next part I will show how to make new data binding implementations.

How does it work?

Out of the box, the new feature:

  • provides module authors with a way to select which data binding implementation to use in their module without affecting how other modules get their data.

  • provides users configuring an environment with a way to select which data binding implementation to use in an environment (or all environments) - different environments can use different implementations, and the environment does not have to use the same implementation as the modules.

  • contains a data binding implementation named 'function' which calls a puppet function that returns a hash of data. The module author can select this mechanism and simply implement the function. A user can also configure an environment to use a function to provide the data - the function is then added to the environment.

  • provides module authors with a way to package and share a data binding implementation in a module. It can be delivered in the same module as regular content, or in a separate module just containing the data binding implementation.

Using a function to deliver data in an environment

This is the easiest, so I am starting with that. Two things are needed:

  • Configuring the environment to state that a function delivers data.
  • Writing the function

configuring the environment

The binding provider to use for an environment can be selected via the environment specific setting environment_data_provider. The value is the name of the data provider implementation to use. In our example this is 'function'. If not set in an environment specific environment.conf, the environment inherits the global setting - which is handy if all your environments work the same way.

writing the function

The function must be written using the 4x function API and placed in a file called lib/puppet/functions/environment/data.rb under the root directory of the environment.

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    # return a hash with key to value mappings 
    { 'abc::param_a' => 'default value for param a in class abc',
      'abc::param_b' => 'default value for param b in class abc',
    }
  end
end

Later in the 4x series of Puppet, it will be possible to also write such functions in the puppet language which makes authoring more accessible.

Note that the name of the function is always environment::data irrespective of what the actual name of the environment is. This is because it would not be good if the name of the function had to change as you test a new environment named 'dev' and later merge it into 'production'.

Using a function to deliver data in a module

The steps to deliver data with a function for a module are different because there are no individual settings for a module. Here are the steps:

  • Creating a binding using the Puppet Binder to declare that the module should use the 'function' data provider for this module.
  • Writing the Function

Note that in the future, the data provider name may be made part of the module's metadata. This is however not the case in the Puppet 4.0.0 release.

writing the binding

The binding is very simple as it is all boilerplate except for the name of the module and the name of the data provider implementation - 'mymodule' and 'function' in the example below. The name of the file is lib/puppet/bindings/mymodule/default.rb where the mymodule part needs to reflect the name of the module it is placed in. (The file is always called 'default.rb' since it contains the default puppet bindings for this module).

# <moduleroot>/lib/puppet/bindings/mymodule/default.rb
#
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name         'mymodule'            # name of the module this is placed in
    to           'function'            # name of the data provider
    in_multibind 'puppet::module_data' # boiler-plate
  }
end

writing the function

This is exactly the same as for the environment, but the function is named mymodule::data where mymodule is the name of the module this function provides data for. The file name is lib/puppet/functions/mymodule/data.rb

# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Return a hash with parameter name to value mapping
    { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
      'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
    }
  end
end

Overriding a parameter in the environment

As you may have figured out already, it is easy to override the module's data in the environment. As an example we may want to provide a different value for mymodule::abc::param_b at the environment level. This is how that would look:

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data() 
    { # ... other keys and values
      'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
    }
  end
end

Getting the data

To get the data, there is absolutely nothing you need to do in your manifests. Just as before, if a class parameter does not have a value, it will be looked up as explained in this blog post. Finally, if there is no value to look up, the default parameter value given in the manifest is used.

Using the examples above - if you have this in your init.pp for the mymodule module:

class mymodule::abc($param_a, $param_b) {
  notice $param_a, $param_b
}

the two parameters $param_a and $param_b will be given their values from the hashes returned by the data functions, looking up mymodule::abc::param_a, and mymodule::abc::param_b.

Note that there is no need to use the "params pattern" now in common use in modules for Puppet 3x!

More about Functions

Since the new 'function' data provider is based on the general concept of calling functions and you can call other functions from them, you have a very powerful mechanism to help you organize data and to do advanced composition.

The data function is called once during a compilation for the purpose of producing a Hash with qualified name strings to data values. The function body can call other functions, use expressions, transformations, composition etc. When the data binding kicks in, it will call the function on the first request to get a parameter in the compilation, it will then cache the returned hash and reuse it for lookup of additional parameters (this in contrast to calling the function for each and every parameter which would be much slower).
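The "call once, then reuse the hash" behavior can be sketched in plain Ruby as shown below. The class name DataFunctionCache and its calls counter are assumptions made for illustration; this is a model of the behavior described above, not the code used inside Puppet.

```ruby
# Wraps a data function (given as a block) and calls it only on the
# first lookup. Later lookups are served from the cached hash.
class DataFunctionCache
  attr_reader :calls

  def initialize(&data_function)
    @data_function = data_function
    @calls = 0
  end

  def lookup(key)
    @data ||= begin
      @calls += 1
      @data_function.call
    end
    @data[key]
  end
end

cache = DataFunctionCache.new do
  { 'abc::param_a' => 'a', 'abc::param_b' => 'b' }
end
cache.lookup('abc::param_a')
cache.lookup('abc::param_b')
puts cache.calls # the data function was called only once
```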

Note that the data function can be called like any other function! This means that a module or environment can use another module's data function, transform it etc. before using its data.

Naturally, since we are dealing with functions it is easy to divide the composition of data into multiple functions, and then hierarchically compose them. Say that we want to divide the data up into two parts, one for osfamily, and one for common and we then want to combine them. We can now do a simple function composition and merge the result.

In the examples, the functions are written using the puppet language (even though they are not available in the 4.0.0 release). At the moment, it is left as an exercise to translate them into Ruby. What I want to show here is the power of combining data with functions without cluttering the examples with what you need to do in Ruby to get variables in scope, call other functions etc.

Data Composition with Puppet functions

When we add support for functions in the Puppet Language data composition can look like this:

function mymodule::data {
  mymodule::common() + mymodule::osfamily()
}

function mymodule::osfamily() {
  case $osfamily {
    'Debian': {
      { mymodule::abc::param_a => 'the debian value for a' }
    }
    'Darwin': {
      { mymodule::abc::param_a => 'the osx value for a' }
    }
    default: {
      { }  # empty hash
    }
  }
}

function mymodule::common() {
  { mymodule::abc::param_a => 'the default for param a',
    mymodule::abc::param_b => 'the default for param b',
  }
}

Naturally, the functions called from the data function can take parameters. The data() function itself however does not take any parameters.
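For those who want a head start on the Ruby translation, here is a rough sketch of the same composition using plain Ruby methods. It intentionally leaves out the 4x function API wrapper and the scope handling: the method names are hypothetical, and passing osfamily in as an argument is an assumption made to keep the sketch runnable on its own (a real function would get it from the scope). Ruby's Hash#merge plays the role of + here, with the right hand side winning.

```ruby
# Defaults that apply on all platforms
def common_data
  { 'mymodule::abc::param_a' => 'the default for param a',
    'mymodule::abc::param_b' => 'the default for param b' }
end

# Platform specific values; an empty hash when there are none
def osfamily_data(osfamily)
  case osfamily
  when 'Debian' then { 'mymodule::abc::param_a' => 'the debian value for a' }
  when 'Darwin' then { 'mymodule::abc::param_a' => 'the osx value for a' }
  else {}
  end
end

# Common data overridden by the platform specific data
def module_data(osfamily)
  common_data.merge(osfamily_data(osfamily))
end

puts module_data('Debian')['mymodule::abc::param_a']
puts module_data('Debian')['mymodule::abc::param_b']
```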

Example - Module with multiple use cases

A module author wants to provide a set of default values for a base use case of the module, but also wants to offer defaults for other use cases. Clearly, there can only be one set of defaults applied at any given time, and the data() function in a module is for that module only, so these defaults must be provided at a higher level i.e. in the environment (where it is known how the module is getting used). If the environment is also using the function data provider, it is very simple to achieve this:

function environment::data() {
  # merge usecase_x from module with the overrides
  mymodule::usecase_x() + {
    mymodule::abc::param_b => 'default from environment for param_b'
  }
}

This illustrates that mymodule has a special data function named mymodule::usecase_x() that provides an alternate set of default values for classes inside the mymodule, these are then overridden with a hash of specific overrides wanted in this environment.

Example - Hierarchical keys

If you find it tedious to retype mymodule::classname::foo, mymodule::classname::bar, etc. etc. you can instead construct the keys programmatically. Since the "data functions" are general functions, variables and interpolation can be used - e.g:

function mymodule::data() {
  $m = 'mymodule::abc'
  { "${m}::param_a" => 'the value', 
    ...
  }
}

Or why not call a function that reorganizes a hierarchical hash; say that we have param_a in classes a::b::x, a::b::y, and a::b::z, we could then do something like this:

function mymodule::data() {
  $hierarchical = { 
    a => {
      b => {
        x => { param_a => 'default for a::b::x::param_a' },
        y => { param_a => 'default for a::b::y::param_a' },
        z => { param_a => 'default for a::b::z::param_a' },
  }}}
  # Calling a function that expands the hash (left as an exercise)
  expand_hierarchical_keys($hierarchical)
}
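One possible take on the exercise, written as a plain Ruby method rather than as a 4x function, is sketched below. The name matches the one used in the example, but the implementation is only an assumption about what such a function could do: it joins nested keys with '::' until it reaches a value that is not a hash.

```ruby
# Flattens a nested hash into a hash with '::' separated keys.
# Every hash value is treated as one more level of the hierarchy, so a
# parameter whose value is itself a hash cannot be expressed this way.
def expand_hierarchical_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), result|
    name = prefix ? "#{prefix}::#{key}" : key.to_s
    if value.is_a?(Hash)
      result.merge!(expand_hierarchical_keys(value, name))
    else
      result[name] = value
    end
  end
end

hierarchical = {
  'a' => {
    'b' => {
      'x' => { 'param_a' => 'default for a::b::x::param_a' },
      'y' => { 'param_a' => 'default for a::b::y::param_a' },
    }
  }
}
puts expand_hierarchical_keys(hierarchical).keys
```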

Trying out this new feature

When this is written, the new data binding feature is available in the nightlies for Puppet 4.0.0, or you can run it from source using Puppet's master branch. (The new feature will not be available for 3x with future parser). If you are reading this after Puppet 4.0.0 has been released, just get the release.

Summary

The new data provider mechanism is a technology agnostic way of defining default data for modules and environments without dictating that a particular technology is used by the users of a module.

The new mechanism comes with a built in implementation based on functions that provides a simple yet powerful way of delivering, using and composing data. Functions in Ruby provide a simple way to extend the functionality without having to write a complete data provider.

Data functions are relatively easy to write in Ruby since they consist mostly of boilerplate code, and the function mechanism will become much more powerful and accessible when functions can be written in the Puppet Language.

In the next post about the new data binding feature I will show how to write a new implementation of a data provider.

Sunday, January 25, 2015

The Puppet 4x Function API - part 2

In the first post about the 4x Function API I showed the fundamentals of the new API. In this post I am going to show how you can write more advanced functions that take a code block / lambda as an argument and how you can call this block from Ruby. This can be used to create your own iterative functions or functions that make it possible to write puppet code in a more function oriented style.

Accepting a Code Block / Lambda

A 4x function can accept a code block / lambda. You can make it required by calling required_block_param in the definition of the dispatcher, or optional by calling optional_block_param.

Here is an example of a simple function called then, that takes one argument and a block and calls the block with the argument unless the argument is nil.

Puppet::Functions.create_function(:then) do
  dispatch :then do
    param 'Any', :x
    required_block_param
  end

  def then(x)
    x.nil? ? nil : yield(x)
  end
end  

Note that: Puppet blocks are passed the same way as Ruby blocks are and we can simply yield to the given block. Just as with Ruby blocks, the block can be captured in a parameter by having a &block parameter last, the block_given? method can be used, etc.

The then function is useful when looking up a nested value in a hash as it removes the need to check intermediate results for undef. Say there may or may not be a value at $hash[a][b][c], and we just want that value, or undef if any of a, b, or c is not found, instead of an error if we, say, try to look up c in undef (if b did not exist).

Instead we use the then function we just defined - like this:

$result = $hash
 .then |$x| { $x[a] }
 .then |$x| { $x[b] }
 .then |$x| { $x[c] }

And for completeness, if you were to write that without the function, you end up with something like this:

$result =
if $hash[a] != undef and $hash[a][b] != undef and $hash[a][b][c] != undef {
  $hash[a][b][c]
}

...or worse if you start using variables for the intermediate steps
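Since Puppet blocks are passed the same way as Ruby blocks, the idea behind then can be tried out in plain Ruby without Puppet. The sketch below uses the hypothetical names then_value (to avoid clashing with Ruby's own Kernel#then) and dig_with_then; it also shows a block captured with &block and the use of block_given?.

```ruby
# Calls the block with x unless x is nil. The block is captured in a
# parameter here, but yield(x) would work just as well.
def then_value(x, &block)
  raise ArgumentError, 'a block is required' unless block_given?
  x.nil? ? nil : block.call(x)
end

# Chains then_value once per key, stopping at the first nil
def dig_with_then(hash, *keys)
  keys.reduce(hash) { |value, key| then_value(value) { |x| x[key] } }
end

hash = { 'a' => { 'b' => { 'c' => 42 } } }
puts dig_with_then(hash, 'a', 'b', 'c').inspect
puts dig_with_then(hash, 'a', 'x', 'c').inspect
```

The second call does not raise an error even though 'x' is missing, since the remaining steps are skipped once an intermediate result is nil.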

The block's number of parameters and their types

If nothing is specified about the number of parameters and types expected in the accepted block, the user can give the function any block. This is what you get by just calling required_block_param or optional_block_param without arguments. You still get type checking, but this takes place when the block is called.

If you want to involve the number of parameters and their types in the dispatching - i.e. selecting which Ruby method to call based on what the user defined in the block - you can do so by stating the Callable type of the block. (The Callable type was added in Puppet 3.7, and is described in this blog post.) In brief, Callable[2,2] means something that can be called with exactly two arguments of any type.

Here is the dispatcher part of the each function (from Puppet source code):

Puppet::Functions.create_function(:each) do
  dispatch :foreach_Hash_2 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Hash_1 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[1,1]', :block
  end

  dispatch :foreach_Enumerable_2 do
    param 'Any', :enumerable
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Enumerable_1 do
    param 'Any', :enumerable
    required_block_param 'Callable[1,1]', :block
  end

  def foreach_Hash_1(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(enumerator.next)
    end
    # produces the receiver
    hash
  end

And to be complete, here are the remaining methods the dispatchers call - the actual implementation of the each function. As you can see, each variation on how this function can be called (with an Array, a Hash, or a String, and with a block taking one or two arguments) is now handled in a small and precise method. It is really just Hash that needs special treatment; all others are handled as enumerables (i.e. whatever the Puppet Type System has defined as something that can be enumerated / iterated over in the Puppet Language).

  def foreach_Hash_2(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(*enumerator.next)
    end
    # produces the receiver
    hash
  end

  def foreach_Enumerable_1(enumerable)
    enum = asserted_enumerable(enumerable)
    begin
      loop { yield(enum.next) }
    rescue StopIteration
    end
    # produces the receiver
    enumerable
  end

  def foreach_Enumerable_2(enumerable)
    enum = asserted_enumerable(enumerable)
    index = 0
    begin
      loop do
        yield(index, enum.next)
        index += 1
      end
    rescue StopIteration
    end
    # produces the receiver
    enumerable
  end

  def asserted_enumerable(obj)
    unless enum = Puppet::Pops::Types::Enumeration.enumerator(obj)
      raise ArgumentError, ("#{self.class.name}(): wrong argument type (#{obj.class}; must be something enumerable.")
    end
    enum
  end
end
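To get a feel for what dispatching on the block's signature means, here is a rough plain Ruby analogy that can be run without Puppet. It selects behavior based on the arity of the given block, which is similar in spirit to choosing between Callable[1,1] and Callable[2,2]. The method name each_sketch is hypothetical, and this is not how the dispatcher in Puppet is implemented.

```ruby
# Calls the block with a [key, value] pair if it takes one parameter,
# or with key and value separately if it takes two.
def each_sketch(hash, &block)
  raise ArgumentError, 'a block is required' unless block
  case block.arity
  when 1 then hash.each_pair { |pair| block.call(pair) }
  when 2 then hash.each_pair { |key, value| block.call(key, value) }
  else raise ArgumentError, "block must take 1 or 2 parameters, got #{block.arity}"
  end
  hash # produces the receiver, like the each function
end

pairs = []
each_sketch({ 'a' => 1 }) { |pair| pairs << pair }

seen = []
each_sketch({ 'a' => 1 }) { |key, value| seen << "#{key}=#{value}" }

puts pairs.inspect
puts seen.inspect
```

The difference is that the 4x dispatcher makes this selection declaratively from the stated types, so the implementation methods do not need any such case logic.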

What about Dependent Types and Type Parameters?

If you read the above example carefully, or if you are already used to working with a rich type system, you may wonder about type parameters and if it is possible to use dependent types.

The short answer is no; while the Puppet type system is capable of describing rich types, we have not added the ability to use type parameters. They would be really useful - take the hash example, where we instead of:

    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block

could specify that the block must accept the key and value type of the given Hash - e.g. something like:

    param 'Hash[K Any, V Any]', :hash
    required_block_param 'Callable[K,V]', :block

This however requires quite a lot of complexity both in the type system itself and what users are exposed to. (The syntax has to be something more elaborate than what is shown above since the references to K and V must naturally find the declared K and V somehow - in the sample that is solved by magic :-).

If we do provide a mechanism to reference the type parameters of the actual types given in a call, we could fully support dependent types. As an example, this would enable declaring that a function takes two arrays of equal length.

How about Return Type?

Return type is also something we decided to leave out for the time being. In hindsight it should have been added from the start as this enables both advanced type inference and type checking to be performed. For this reason we may add this into the dispatch API early in the 4x series. The most difficult part will be figuring out the syntax for the Callable type since it also needs to be able to describe the return type of the callable.

The Puppet 4.0.0 Type System Changes

The Puppet Type System in Puppet 4.0.0 (and in 3.7 when the future parser/evaluator is in use) has undergone some change. I am posting this update for those that have already experimented with the Type system and that just want to know what has changed.

Also, the already published posts in this series about the type system will be updated with these changes (where not already done).

Object Renamed to Any

We felt that the word object had too many associations with an object oriented programming language and did not fit very well with the rest of the Puppet Language. There is already confusion over what a "class" is (especially if you come from an OO language).

From now on, the most abstract type in the Puppet Type System is Any. As the name implies, it accepts assignment of an instance of any other type, including Undef. Thus, there is no need to use Optional[Any].

"All Your Types are Belong to Any"

All types are now also Any. Earlier you would have to use a Variant type if you wanted a type to be able to accept both Type and Object instances (i.e. Variant[Object, Type]), now you just use Any.

The Ruby[name] Type Renamed to Runtime['ruby', name]

The Puppet Type System supports references to types in an underlying runtime system. Currently only Ruby, but the Puppet master will run on JRuby on top of a JVM and there is then also expected to be the need to reference types in the JVM name-space. The implementation in 3.6 supports a type called Ruby, and the specification reserves names of other runtime systems (e.g. Java).

The 3.6 implementation does however block usage of those names (e.g. Ruby, Java) as the name of a resource type (plugin), or user defined resource type (define) using the short notation, and users would be required to use the longer form, e.g. Resource[java], to reference such a resource.

This is unfortunate as the obvious names of the types in the type system also are obvious names for managing these technologies with Puppet. References to runtime types are used far less often, so we decided to rename Ruby[class_name] to the more generic Runtime['ruby', type_name].

Runtime['ruby',class_name] is currently the only supported runtime type, but you can expect there to be a Runtime['jvm'] or Runtime['java'] when/if the need arises.

This change only affects those who have played with the advanced features in the puppet bindings system or played with advanced puppet functions where a reference to a Ruby type was passed using a Ruby type defined in .pp logic, or in internal ruby logic inside the puppet runtime.

The Default Type

We also realized that we forgot about one symbolic value in the Puppet Language, the default. It is a value in the language (represented by the Ruby symbol :default internally), and it can be passed around. In 3.6, the type of a default expression is Ruby['Symbol'], and would have been Runtime['ruby', 'Symbol'] in 4.0 unless we did something.

The solution was to add a type to the type system unsurprisingly called Default. There is only one value that is an instance of this class, and such an instance is only assignable to Any or Default.

Note that the value itself holds no magic powers unless it is used in a position that acts on it, like in a case expression where the case expression takes the default value to mean 'match against anything'. If you do this matching yourself, say 1 == default, the result is false.

The 'default value' has practical value where there is a need to pass two different kinds of unknown values as well as values. You can use it to get one behavior for undef/missing, one for given values, and one for the default value. Note that passing a value of default does not mean that the parameter is assigned its default value; it means setting that parameter to the special value of Default type.
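
The three-way distinction can be sketched in plain Ruby. This is only an illustration: the method name effective_port and the port numbers are made up for the example, undef is shown as nil, and the only thing taken from Puppet is that default is represented internally by the Ruby symbol :default.

```ruby
# Hypothetical helper, not part of Puppet: one behavior per kind of input.
def effective_port(value)
  case value
  when nil      then 8080   # undef/missing: fall back to a built-in value
  when :default then 443    # the special default value: "use the standard choice"
  else value                # an ordinary given value is used as-is
  end
end
```

Calling effective_port(:default) takes a different branch than effective_port(nil), which is the point: the receiver decides what default means, the value carries no behavior of its own.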

Also note that this is not a default-type; a type that is used by default, that type is called Any.

The Callable Type

Also added is the Callable Type. It currently has no practical use in the Puppet Language since it is not possible to assign or pass a lambda/block as a value. It is however of importance when writing Puppet functions in Ruby using the 4x function API since it can accept lambdas/blocks and there is the need to also be able to define the types of an acceptable block's parameters.

Although you may see references to the Callable type in error messages, if you are not into writing functions using the 4x function API that accept lambdas/blocks, you can probably skip the rest of this post, as such errors should be understandable from context.

Here is an excerpt from the Puppet Language Specification:

Callable is the type of callable elements; functions and lambdas. The Callable type will typically not be used literally in the Puppet Language until there is support for functions written in the Puppet Language. Callable is of importance for those who write functions in Ruby and want to type check lambdas that are given as arguments to functions in Ruby. They are also important in error messages when communicating why a given set of arguments do not match a signature.

The signature of a Callable denotes the type and multiplicity of the arguments it accepts and consists of a sequence of parameters; a list of types, where the three last entries may optionally be min count, max count, and a Callable (i.e. calling a lambda with another lambda).

  • If neither min nor max is specified, the parameters must match exactly.
  • If min < size(params), the difference is optional.
  • If max > size(params), the last type repeats until the given max number of arguments.
  • If max is the literal default, the max value is unbound (+Infinity).
  • If no types and no min/max are given, the Callable describes any callable, i.e. Callable[0, default] (no type constraint, and any number of parameters).
  • Callable[0,0] is a callable that does not accept parameters.
  • If no types are given, and the min/max count is not [0,0], then the callable describes only the untyped arity and places no constraints on the parameter types, e.g. Callable[2,2] means a callable with exactly 2 parameters.
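
The counting rules in the list above can be sketched as a small toy model in plain Ruby. This is not Puppet's implementation: callable_accepts? is a hypothetical name, Ruby classes stand in for Puppet types, the optional trailing Callable (block) is left out, and the sketch assumes min and max are given together or not at all.

```ruby
# Toy model: would a callable with this signature accept these arguments?
def callable_accepts?(types, args, min = nil, max = nil)
  raise ArgumentError, 'give both min and max, or neither' if min.nil? != max.nil?

  if min.nil?
    # No counts given: exact match, except that "no types at all" means any callable.
    min, max = types.empty? ? [0, :default] : [types.size, types.size]
  end
  max = Float::INFINITY if max == :default   # literal default means unbound

  return false unless args.size.between?(min, max)

  args.each_with_index.all? do |arg, i|
    # Past the end of the list the last type repeats; with no types anything goes.
    expected = types[i] || types.last || Object
    arg.is_a?(expected)
  end
end
```

For example, callable_accepts?([String, Integer], ['a'], 1, 2) is true because min is smaller than the number of types, so the Integer is optional, while callable_accepts?([], [1], 0, 0) is false because Callable[0,0] takes no parameters.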

Callable type algebra is different from other types as it seems to work in reverse. This is because its purpose is to describe the callability of the instance, not its essence (even if the type serves dual purpose by simply reversing the comparison). (This is known as Contravariance in computer science). As an example, a lambda that is Callable[Numeric] can be called with one argument being a Numeric, Float, or an Integer, but not with a Scalar, or Any. Thus, while it seems intuitive that a Callable[Integer] should be assignable to a Callable[Any] (since Any is a wider type), this is not true because it cannot be called with an Any. The reason for checking the type of a callable is to detect if it can be called a certain way - thus assignable?(Callable[Any], Callable[Integer]) really is a declaration that there is an intent to call the callable with one Any argument (which it does not accept).

This also means that generality works the opposite way; Callable[String] ∪ Callable[Scalar] yields Callable[String] - since both can be called with a String, but both cannot be called with any Scalar.
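
The reversed comparison can be illustrated with a toy model in plain Ruby. The method names are hypothetical and this is not how Puppet's type calculator is written: a callable is reduced to the Ruby class of its single parameter, Object plays the role of Any, and Integer/Numeric stand in for the String/Scalar pair since Ruby has no Scalar class.

```ruby
# declared_param: the type we intend to call with (the "left" side of assignable?).
# actual_param:   the type the given callable accepts.
def callable_assignable?(declared_param, actual_param)
  # The callable must accept at least what we intend to pass, so the
  # usual direction is reversed: the declared type must be the narrower one.
  (declared_param <= actual_param) == true
end

# The common type of two callables is the one with the narrower parameter,
# since that is the only kind of argument both can be called with.
def common_callable_param(a, b)
  return a if (a <= b) == true
  return b if (b <= a) == true
  nil # unrelated parameter types: nothing in common in this toy model
end
```

Here callable_assignable?(Object, Integer) is false, mirroring assignable?(Callable[Any], Callable[Integer]), while callable_assignable?(Integer, Numeric) is true because a callable that accepts a Numeric can safely be called with an Integer.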

You can read the full specification text for Callable in the Puppet Language Specification.

Isn't something missing?

If you read all of the above about the Callable type, you may have wondered how the type system deals with callables that do not specify the types of the parameters. What are they? They cannot really be typed as Any for the reasons given above - are they just Undef or nil?

The answer is that there is a type that is used internally in the type system to represent this case. This type is known as Unit, and it is basically a chameleon that says 'I am whatever you want me to be' - technically the contravariant of Any.

It cannot be used directly from the Puppet Language; you can however observe instances of this type when specifying something like Callable[1,1] (a callable that accepts exactly one parameter) in your 4x function API for a block parameter and then introspect the created type.

You are not expected to ever use this internal type directly. If you type Unit in the Puppet Language, you actually get a reference to the resource type Resource[Unit]. The internal type is however required in the type system to avoid special cases, and since you may observe it or come across it when reading the source code of puppet I thought it was worth mentioning.

The Puppet 4x function API

In Puppet 4.0.0 there is a new API for writing Ruby functions that extend the functionality of the Puppet language. This API is available in the 3.7.x versions of Puppet when using --parser future, so you can try out this functionality today.

The new 4x API for functions was created to fix problems and add missing features in the 3x API:

  • The function runs as a method on Scope (and has access to too much non-API)
  • Undefined arguments are given to the function as empty strings, but as a :undef Symbol if undefined values are given inside collections.
  • There is no automatic type checking
  • Functions share a flat namespace and you have to ensure you use a unique name
  • Functions cannot be private to a module
  • Functions are defined in the Puppet::Parser::Functions namespace. Future use of functions is to also use them where no parser is available. The concept of "parser function" is just odd.
  • Methods defined in a Function pollute Scope - if you require helper logic it must be in a separate class.
  • There are problems with reloading complex functions
  • There is a distinction between functions of expression and statement kind, and this distinction is no longer meaningful.
  • The specification of arity (number of arguments) used in 3x to describe parameters to a function, is a blunt tool (no typing, no overloading, and it can not express a variable number of arguments that is capped).
  • Documentation can not (at least not easily) be retrieved without running the ruby code that defines the function.

The 4x function API solves all of these issues. (With the exception of private functions, which did not make it into 4.0.0, but will be added during the 4x series).

A simple function in the 4x API

The new API has many features, yet, for simple functions, it is very easy to use. Here is a basic example.

Puppet::Functions.create_function(:max) do
  def max(x, y)
    x >= y ? x : y
  end
end

This defines the function max taking two arguments (of Any kind). As you can see, it is slightly different from the 3x function API in that the body of the function is expressed in a defined method.

Also different is that functions are now stored under <moduleroot>/lib/puppet/functions instead of under the terribly confusing <moduleroot>/lib/puppet/parser/functions in 3x, which has misled everyone to talk about "parser functions" - which I guess could mean some kind of function used for parsing. Neither the 3x nor the 4x function plays any role during parsing, and they should be referred to as "functions". So please, no more "parser function" crazy talk...

Automatic Type Checking

In the 4x API there is support for type checking. Here is the same function again, now with type checking:

Puppet::Functions.create_function(:max) do
  dispatch :max do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  def max(x, y)
    x >= y ? x : y
  end
end

As you can see, the max method is identical to the first version. A call to a dispatch method has been added to type the parameters. In addition to typing the parameters, the dispatch call also informs puppet that the call should be dispatched to a particular method (in the example above to :max). If we inside our function want to call the method max_num (instead of the method max) we would change the definition like this:

Puppet::Functions.create_function(:max) do
  dispatch :max_num do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  def max_num(x, y)
    x >= y ? x : y
  end
end

The function is still named max() in the Puppet Language, but internally, when it is called with two Numeric arguments, the call is now dispatched to the max_num method. As you will see in the next section, this is very useful when we want to write functions that have different implementations depending on the types of the arguments given to it when it is called.

When defining a parameter, the type is always given in a string using the Puppet Language Type System notation. This means you can be very detailed in your specification and get type checking with high fidelity.

Multiple Dispatch

The 4x API supports multiple dispatch; so far you have seen two examples. In the first there were no calls to dispatch and the system automatically figured out that the call should be dispatched to a method with the same name as the function.

In the second example we took over dispatching, and declared that a call requires two Numeric arguments.

What if you want to call max with either Numeric, or String arguments? We could certainly type the arguments as Variant[Numeric, String], but we would then also need to write the logic in our method to deal with all of the possible cases. A much simpler approach is to use multiple dispatch. Here is an example - this time for a min function:

Puppet::Functions.create_function(:min) do
  dispatch :min do
    param 'Numeric', :a
    param 'Numeric', :b
  end

  dispatch :min_s do
    param 'String', :s1
    param 'String', :s2
  end

  def min(x,y)
    x <= y ? x : y
  end

  def min_s(x,y)
    cmp = (x.downcase <=> y.downcase)
    cmp <= 0 ? x : y
  end
end

Now the system will look at the types of the given arguments and pick the first matching dispatcher. Thus, in min we know that the arguments are Numeric, and in min_s we know that they are String. Everything is precise, small, clear and easy to read. We also did not have to spend time on dealing with error handling as type checking always takes place in all calls.
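
To make the "first matching dispatcher" idea concrete, here is a toy stand-in written in plain Ruby so it can be run without Puppet. It is not how Puppet implements dispatch: the class MiniMin and its DISPATCHERS table are hypothetical, and Ruby classes are used in place of Puppet types.

```ruby
# Toy model of multiple dispatch: try each dispatcher in order, call the first match.
class MiniMin
  # Each entry pairs the expected argument classes with the method to call.
  DISPATCHERS = [
    [[Numeric, Numeric], :min],
    [[String, String],   :min_s]
  ].freeze

  def call(*args)
    match = DISPATCHERS.find do |types, _target|
      types.size == args.size && types.zip(args).all? { |type, arg| arg.is_a?(type) }
    end
    unless match
      raise ArgumentError, "no dispatcher matches (#{args.map(&:class).join(', ')})"
    end
    send(match.last, *args)
  end

  def min(x, y)
    x <= y ? x : y
  end

  def min_s(x, y)
    (x.downcase <=> y.downcase) <= 0 ? x : y
  end
end
```

With this model, MiniMin.new.call(3, 2) is routed to min and MiniMin.new.call('b', 'A') to min_s, while a mixed call such as MiniMin.new.call(1, 'a') is rejected before any of the implementation methods run.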

Variable Number of Arguments

The 4x API can handle a variable number of arguments. If you do not use a dispatcher the logic introspects the Ruby method declaration and checks the types of the arguments. If we change the max function to return the maximum of a variable number of arguments we can do that like this:

Puppet::Functions.create_function(:max) do
  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

If you want to also type the arguments, or cap the max number of arguments, then this is done in the dispatcher by defining the minimum and maximum argument count with a call to arg_count. In the example below a minimum of 1 argument is specified, and a maximum of :default (which means any number of arguments).

Puppet::Functions.create_function(:max) do
  dispatch :max do
    param 'Numeric', :args
    arg_count 1, :default
  end

  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

Note that arg_count specifies the minimum required and maximum allowed number of arguments given to the function (i.e. it is not just for the last parameter). Also note that the method the call is dispatched to can be defined in any compatible way, i.e. it must handle missing arguments by using default values, or capture variable arguments in an array, as in the example below:

Puppet::Functions.create_function(:example) do
  dispatch :example do
    param 'Numeric', :name
    param 'String', :value
    param 'Numeric', :name2
    param 'String', :value2
    arg_count 2, 4
  end

  def example(name, value, *args)
  end
end
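
The other compatible way mentioned above, default values instead of a splat, could look like the following plain Ruby method. The body is invented purely for illustration (the original example has none); it just collects the given name/value pairs into a hash.

```ruby
# Same parameters as the dispatcher above, with the two optional ones
# given Ruby default values instead of being captured in *args.
def example(name, value, name2 = nil, value2 = nil)
  pairs = { name => value }
  pairs[name2] = value2 unless name2.nil?   # only present when a third argument was given
  pairs
end
```

Called with two arguments the optional parameters are simply nil; called with three, value2 is nil while name2 is set, which is a case the method has to be prepared for since arg_count 2, 4 allows it.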

Namespaced Functions

In 3x it is not possible to give functions a name-spaced name. They all live in the same name space. This is a problem because one module may override functions in another module. In 4x, the functions can be given a complex name. To do this, the function should be placed in a directory that corresponds to the name space, and it should be named accordingly in the call to create_function.

Here, the function max is placed in the namespace mymodule (which is also the name of the module).

# in <moduleroot>/lib/puppet/functions/mymodule/max.rb
Puppet::Functions.create_function(:'mymodule::max') do
  dispatch :max do
    param 'Numeric', :args
    arg_count 1, :default
  end

  def max(*args)
    args.reduce {|x, y| x >= y ? x : y }
  end
end

Note that it is only the name of the function that needs to be given the fully qualified name, in the dispatcher the name of the Ruby method to dispatch to is still used, and it is not a fully qualified name.

You can nest namespaces further if you like.
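
The relationship between a function's name and its file location can be sketched with a small helper. The helper function_file is hypothetical (it is not a Puppet API); it only assumes the layout described earlier, where 4x functions live under <moduleroot>/lib/puppet/functions and each namespace segment becomes a directory.

```ruby
# Hypothetical helper: path of a 4x function's file, relative to the module root.
def function_file(qualified_name)
  segments = qualified_name.split('::')
  File.join('lib', 'puppet', 'functions', *segments) + '.rb'
end
```

For mymodule::max this gives lib/puppet/functions/mymodule/max.rb, and a further nested name such as mymodule::math::max would simply add one more directory level.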

To call a fully qualified function from the Puppet Language, simply use the full name, e.g.:

mymodule::max(1,2,3,4)

Helper Logic

You can have as many helper methods as you like in the function - it is only the methods being dispatched to that are used by the 4x function API. You are however not allowed to define nested Ruby classes or modules, or to introduce constants inside the function definition. If you have that much code, you should deliver it elsewhere and call that logic. Note that such external logic is static across all environments.

Documenting the Function

The new Puppet Doc tool (a.k.a Puppet Strings) that will be released with Puppet 4.0.0 can produce documentation from functions written using the 4x function API. In the 3x API functions are documented with a Ruby String that is given in the call to create a function. The 4x API instead processes comments that are associated with the created function. This processing supports a set of YARD tags to make it possible to write documentation of higher quality. Tags for @param, @example, and @since are examples of such tags.

See the Puppet Strings project on GitHub for examples and more information.

In the Next Post

In the next post I describe how you can pass code blocks to puppet functions and call them from within the function.