In Puppet 4.0.0 there is a new technology-agnostic mechanism for data lookup that makes it possible to provide default values for class parameters in modules and in environments. The mechanism looks first in the "global" data binding mechanism across all environments (i.e. the existing mechanism for data binding, which in practice means hiera, since this is the only available implementation). It then looks for data in the environment, and finally in the module.
The big thing here is that a user of a module does not have to know which implementation the module author has chosen - the module is simply installed (with its dependencies). The user is free to override values using an implementation of their choice (in the environment using the new mechanism, or with the existing data binding / hiera support).
It is expected that a hiera-based implementation will also be made available in a module.
In this part 1 about the new data binding feature I will show how it can be used in environments and modules. In the next part I will show how to make new data binding implementations.
How does it work?
Out of the box, the new feature:
- provides module authors with a way to select which data binding implementation to use in their module without affecting how other modules get their data.
- provides users configuring an environment with a way to select which data binding implementation to use in that environment (or in all environments). Different environments can use different implementations, and an environment does not have to use the same implementation as the modules.
- contains a data binding implementation named 'function' which calls a puppet function that returns a hash of data. The module author can select this mechanism and simply implement the function. A user can also configure an environment to use a function to provide the data - the function is then added to the environment.
- provides module authors with a way to package and share a data binding implementation in a module. It can be delivered in the same module as regular content, or in a separate module just containing the data binding implementation.
Using a function to deliver data in an environment
This is the easiest, so I am starting with that. Two things are needed:
- Configuring the environment to state that a function delivers data.
- Writing the function
configuring the environment
The binding provider to use for an environment can be selected via the environment-specific setting environment_data_provider. The value is the name of the data provider implementation to use. In our example this is 'function'.
If not set in an environment-specific environment.conf, the environment inherits the global setting - which is handy if all your environments work the same way.
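As a minimal sketch, assuming the usual key = value syntax of environment.conf, the setting for this example would look like this:

```ini
# <environment-root>/environment.conf
environment_data_provider = function
```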
writing the function
The function must be written using the 4x function API and placed in a file called lib/puppet/functions/environment/data.rb under the root directory of the environment.
# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    # return a hash with key to value mappings
    { 'abc::param_a' => 'default value for param a in class abc',
      'abc::param_b' => 'default value for param b in class abc',
    }
  end
end
Later in the 4x series of Puppet, it will be possible to also write such functions in the puppet language which makes authoring more accessible.
Note that the name of the function is always environment::data irrespective of what the actual name of the environment is. This is because it would not be good if the name of the function had to change as you test a new environment named 'dev' and later merge it into 'production'.
Using a function to deliver data in a module
The steps to deliver data with a function for a module are different because there are no individual settings for a module. Here are the steps:
- Creating a binding using the Puppet Binder to declare that the module should use the 'function' data provider for this module.
- Writing the Function
Note that in the future, the data provider name may be made part of the module's metadata. This is however not the case in the Puppet 4.0.0 release.
writing the binding
The binding is very simple as it is all boilerplate except for the name of the module and the name of the data provider implementation - 'mymodule' and 'function' in the example below. The name of the file is lib/puppet/bindings/mymodule/default.rb where the mymodule part needs to reflect the name of the module it is placed in. (The file is always called 'default.rb' since it contains the default puppet bindings for this module).
# <moduleroot>/lib/puppet/bindings/mymodule/default.rb
#
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name 'mymodule'                    # name of the module this is placed in
    to 'function'                      # name of the data provider
    in_multibind 'puppet::module_data' # boiler-plate
  }
end
writing the function
This is exactly the same as for the environment, but the function is named mymodule::data where mymodule is the name of the module this function provides data for. The file name is lib/puppet/functions/mymodule/data.rb.
# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Return a hash with parameter name to value mapping
    { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
      'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
    }
  end
end
Overriding a parameter in the environment
As you may have figured out already, it is easy to override the module's data in the environment.
As an example we may want to provide a different value for mymodule::abc::param_b at the environment level. This is how that would look:
# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    { # ... other keys and values
      'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
    }
  end
end
Getting the data
To get the data, there is absolutely nothing you need to do in your manifests. Just as before, if a class parameter does not have a value, it will be looked up as explained in this blog post. Finally, if the lookup finds no value, the default parameter value given in the manifest is used.
Using the examples above - if you have this in your init.pp for the mymodule module:
class mymodule::abc($param_a, $param_b) {
  notice $param_a, $param_b
}
the two parameters $param_a and $param_b will be given their values from the hashes returned by the data functions, looking up mymodule::abc::param_a and mymodule::abc::param_b.
Note that there is no need to use the "params pattern" now in common use in modules for Puppet 3x!
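The lookup order used in these examples can be modeled in a few lines of plain Ruby. This is only a sketch of the precedence described in this post (the global data binding first, then the environment, then the module, and finally the default in the manifest). It is not how Puppet implements the lookup, and the helper lookup_parameter and the constant names are made up for illustration.

```ruby
# Plain-Ruby model of the lookup order; none of this is Puppet API.

# Stands in for the existing global data binding (hiera); empty in this example.
GLOBAL_DATA = {}.freeze

# The hash returned by environment::data in the override example.
ENVIRONMENT_DATA = {
  'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
}.freeze

# The hash returned by mymodule::data.
MODULE_DATA = {
  'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
  'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
}.freeze

# Hypothetical helper: the first layer that has the key wins; if no layer
# has it, fall back to the default given in the manifest.
def lookup_parameter(key, manifest_default = nil)
  [GLOBAL_DATA, ENVIRONMENT_DATA, MODULE_DATA].each do |layer|
    return layer[key] if layer.key?(key)
  end
  manifest_default
end

puts lookup_parameter('mymodule::abc::param_a') # value comes from the module
puts lookup_parameter('mymodule::abc::param_b') # value comes from the environment
puts lookup_parameter('mymodule::abc::param_c', 'manifest default')
```

Here param_a is only defined by the module, so the module's value is used, while param_b is found in the environment first and the module's value is never consulted.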
More about Functions
Since the new 'function' data provider is based on the general concept of calling functions and you can call other functions from them, you have a very powerful mechanism to help you organize data and to do advanced composition.
The data function is called once during a compilation for the purpose of producing a Hash that maps qualified name strings to data values. The function body can call other functions, use expressions, transformations, composition etc. When the data binding kicks in, it calls the function on the first request for a parameter in the compilation. It then caches the returned hash and reuses it for lookup of additional parameters (in contrast to calling the function for each and every parameter, which would be much slower).
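The call-once-and-cache behavior can be pictured with a small plain-Ruby model. The class CachedDataProvider below is hypothetical and only mimics the behavior described here; Puppet's actual implementation is not shown in this post.

```ruby
# Hypothetical model of "call the data function once per compilation".
class CachedDataProvider
  attr_reader :calls

  # data_function: any callable that returns the hash of data
  def initialize(data_function)
    @data_function = data_function
    @data = nil
    @calls = 0
  end

  # The data function is invoked only on the first lookup; every
  # later lookup is answered from the cached hash.
  def lookup(key)
    if @data.nil?
      @calls += 1
      @data = @data_function.call
    end
    @data[key]
  end
end

data_function = lambda do
  { 'mymodule::abc::param_a' => 'a', 'mymodule::abc::param_b' => 'b' }
end

provider = CachedDataProvider.new(data_function)
provider.lookup('mymodule::abc::param_a')
provider.lookup('mymodule::abc::param_b')
puts provider.calls # the data function ran once for two lookups
```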
Note that the data function can be called like any other function! This means that a module or environment can use another module's data function, transform it etc. before using its data.
Naturally, since we are dealing with functions it is easy to divide the composition of data into multiple functions, and then hierarchically compose them. Say that we want to divide the data up into two parts, one for osfamily, and one for common and we then want to combine them.
We can now do a simple function composition and merge the result.
In the examples, the functions are written using the puppet language (even though they are not available in the 4.0.0 release). At the moment, it is left as an exercise to translate them into Ruby. What I want to show here is the power of combining data with functions without cluttering the examples with what you need to do in Ruby to get variables in scope, call other functions etc.
Data Composition with Puppet functions
When we add support for functions in the Puppet Language, data composition can look like this:
function mymodule::data() {
  mymodule::common() + mymodule::osfamily()
}
function mymodule::osfamily() {
  case $osfamily {
    'Debian': {
      { mymodule::abc::param_a => 'the debian value for a' }
    }
    'Darwin': {
      { mymodule::abc::param_a => 'the osx value for a' }
    }
    default: {
      { } # empty hash
    }
  }
}
function mymodule::common() {
  { mymodule::abc::param_a => 'the default for param a',
    mymodule::abc::param_b => 'the default for param b',
  }
}
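While waiting for functions in the Puppet language, the same composition can be sketched in plain Ruby. The methods below are ordinary Ruby methods rather than functions written with the 4x function API, and the osfamily value is passed in as an argument instead of being read from scope; both are simplifying assumptions made for this sketch. Hash#merge plays the role of + here: on a key conflict, the value from the right-hand hash wins.

```ruby
# Plain-Ruby sketch of the composition; method names are illustrative only.

def common_data
  { 'mymodule::abc::param_a' => 'the default for param a',
    'mymodule::abc::param_b' => 'the default for param b' }
end

# Assumption: the osfamily fact is handed in as a plain string.
def osfamily_data(osfamily)
  case osfamily
  when 'Debian' then { 'mymodule::abc::param_a' => 'the debian value for a' }
  when 'Darwin' then { 'mymodule::abc::param_a' => 'the osx value for a' }
  else {} # empty hash
  end
end

# common first, then the osfamily specific values on top
def module_data(osfamily)
  common_data.merge(osfamily_data(osfamily))
end

module_data('Debian').each do |key, value|
  puts "#{key} => #{value}"
end
```

On Debian, param_a gets the Debian value while param_b keeps the common default. For an osfamily that is not listed, the result is simply the common hash.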
Naturally, the functions called from the data function can take parameters. The data() function itself however does not take any parameters.
Example - Module with multiple use cases
A module author wants to provide a set of default values for a base use case of the module, but also wants to offer defaults for other use cases. Clearly, there can only be one set of defaults applied at any given time, and the data() function in a module is for that module only, so these defaults must be provided at a higher level, i.e. in the environment (where it is known how the module is getting used). If the environment is also using the function data provider, it is very simple to achieve this:
function environment::data() {
  # merge usecase_x from module with the overrides
  mymodule::usecase_x() + {
    mymodule::abc::param_b => 'default from environment for param_b'
  }
}
This illustrates that mymodule has a special data function named mymodule::usecase_x() that provides an alternate set of default values for classes inside mymodule. These are then overridden with a hash of specific overrides wanted in this environment.
Example - Hierarchical keys
If you find it tedious to retype mymodule::classname::foo, mymodule::classname::bar, etc. you can instead construct the keys programmatically. Since the "data functions" are general functions, variables and interpolation can be used - e.g:
function mymodule::data() {
  $m = 'mymodule::abc'
  { "${m}::param_a" => 'the value',
    ...
  }
}
Or why not call a function that reorganizes a hierarchical hash; say that we have param_a in classes a::b::x, a::b::y, and a::b::z, we could then do something like this:
function mymodule::data() {
  $hierarchical = {
    a => {
      b => {
        x => { param_a => 'default for a::b::x::param_a' },
        y => { param_a => 'default for a::b::y::param_a' },
        z => { param_a => 'default for a::b::z::param_a' },
      }}}
  # Calling a function that expands the hash (left as an exercise)
  expand_hierarchical_keys($hierarchical)
}
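One possible take on that exercise, written in plain Ruby rather than in the Puppet language: the sketch below walks the nested hash and joins the keys with ::. It assumes that every nested hash is a namespace level, so a parameter whose value is itself a hash would need special handling that is not shown.

```ruby
# Flattens a nested hash into one hash with '::'-qualified keys.
# Assumption: any value that is a Hash is another namespace level.
def expand_hierarchical_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), result|
    qualified = prefix ? "#{prefix}::#{key}" : key.to_s
    if value.is_a?(Hash)
      # descend one level, carrying the qualified name so far
      result.merge!(expand_hierarchical_keys(value, qualified))
    else
      result[qualified] = value
    end
  end
end

HIERARCHICAL = {
  'a' => {
    'b' => {
      'x' => { 'param_a' => 'default for a::b::x::param_a' },
      'y' => { 'param_a' => 'default for a::b::y::param_a' },
      'z' => { 'param_a' => 'default for a::b::z::param_a' },
    },
  },
}.freeze

expand_hierarchical_keys(HIERARCHICAL).each do |key, value|
  puts "#{key} => #{value}"
end
```

The result has the three keys a::b::x::param_a, a::b::y::param_a and a::b::z::param_a, each mapped to its default.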
Trying out this new feature
At the time of writing, the new data binding feature is available in the nightlies for Puppet 4.0.0, or you can run it from source using Puppet's master branch. (The new feature will not be available for 3x with future parser). If you are reading this after Puppet 4.0.0 has been released, just get the release.
Summary
The new data provider mechanism is a technology agnostic way of defining default data for modules and environments without dictating that a particular technology is used by the users of a module.
The new mechanism comes with a built in implementation based on functions that provides a simple yet powerful way of delivering, using and composing data. Functions in Ruby provide a simple way to extend the functionality without having to write a complete data provider.
The function mechanism is relatively easy to use from Ruby, since data functions consist mostly of boilerplate code, but it will become much more powerful and accessible when functions can be written in the Puppet Language.
In the next post about the new data binding feature I will show how to write a new implementation of a data provider.