Tuesday, September 18, 2018

There is More Than 0 and 1 to Binary Data in Puppet

Sometimes you need to manage resources that contain binary data. Historically you needed luck on your side to make this work (more about this below), and more recently Puppet gained the ability to handle binary data by using the Puppet-specific PSON wire format instead of JSON.

The problems with binary data have two causes:

  • Ruby does not have a data type for binary data other than a String with ASCII-8BIT encoding.
  • There is no binary data type in textual data formats such as JSON or YAML, so such values must be Base64 encoded - but how do you then differentiate between a text string and a Base64 encoded one? You don’t want, for example, a binary file to end up with the Base64 encoded text representation of the binary content it was supposed to have.

Historically you needed to be lucky

Luck played a big part historically because the encoding of the catalog was undefined in very old Puppet versions, and in more modern versions it has been set to UTF-8. The problem was that binary content could contain bytes or byte sequences that were invalid in the catalog’s encoding (either UTF-8 or, when undefined, whatever happened to be the default encoding).

Thus, this “kind of” worked if your binary data did not contain any illegal byte sequences.

Enter PSON

To take luck out of the equation, the PSON format was invented to allow the catalog to contain ASCII-8BIT. It also allows Ruby specific data types to be included in the format. Using PSON is bad for several reasons:

  • It is implemented in Ruby, and is thus much slower than the native JSON support in modern Ruby.
  • Using Ruby serialization breaks JSON compatibility and it is not possible to correctly read the stream with an off-the-shelf JSON parser in other languages.
  • Falling back to ASCII-8BIT and then either keeping strings in that encoding, or trying to convert them to UTF-8 and leaving those that could not be converted as ASCII-8BIT, requires processing and is not robust: it leaves some binary content looking like UTF-8, and subsequent operations may fail.

Enter Rich Data

Puppet has for some time had the experimental option --rich_data which when turned on uses a fully JSON standard representation of data that cannot be expressed natively in a textual format like JSON or YAML.

Use of rich_data is the standard in Puppet 6.0.0. A Puppet 6 agent will request the catalog in rich data format from the master. (Before Puppet 6.0.0 using rich_data on an agent did not really work, but was fine when used with apply).

So now, finally, we can send values of all Puppet data types in a catalog, including binary data. Hooray !!

There is a lot more to say about the rich data format in the catalog - and I will get back to that after we looked at binary data in more detail.

Binary Data in Ruby

As an example - here is a string (in Ruby) that is binary (since \xFF cannot be a byte on its own in UTF-8).

bad_utf8 = "hi \xFF"
bad_utf8.bytes  # is [104, 105, 32, 255]
bad_utf8.split(' ') # fails with bad encoding error

If the bad_utf8 string was returned into Puppet Language and then used as a resource parameter value the behavior is undefined as we don’t know what operations someone may do on that value. If it reaches the catalog, it would (before Puppet 6.0.0) make Puppet switch to PSON since it is invalid UTF-8. In summary: bad stuff happens. What happens in Puppet 6.0.0 is described further on.

We can make the bad_utf8 value ok by forcing the encoding to be ASCII-8BIT like this:

ok_binary = bad_utf8.force_encoding(Encoding::ASCII_8BIT)
ok_binary.bytes  # is [104, 105, 32, 255]
ok_binary.split(' ') # works and results in ["hi", "\xFF"]

After the change of encoding we still have the exact same bytes, but now Ruby string operations know that we are dealing with just bytes and not variable length encoding of characters and operations will not fail.
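The difference can also be checked without triggering an error. The following is a minimal Ruby sketch; the helper text? is a hypothetical name used only for illustration and is not part of Puppet. It builds the same four bytes explicitly and uses dup, because force_encoding changes the string it is called on.

```ruby
# Hypothetical guard a function could use before doing text operations.
def text?(str)
  str.encoding != Encoding::ASCII_8BIT && str.valid_encoding?
end

# Build the bytes explicitly so the result does not depend on the
# encoding of the source file: "hi " followed by the byte 0xFF.
bytes    = [104, 105, 32, 255]
bad_utf8 = bytes.pack('C*').force_encoding(Encoding::UTF_8)

# dup first: force_encoding changes its receiver in place.
ok_binary = bad_utf8.dup.force_encoding(Encoding::ASCII_8BIT)

puts bad_utf8.valid_encoding?            # false: 0xFF is never valid in UTF-8
puts ok_binary.valid_encoding?           # true: any byte sequence is valid binary
puts ok_binary.bytes == bad_utf8.bytes   # true: only the encoding label differs
puts text?(ok_binary)                    # false: treat it as bytes, not text
```

Checking valid_encoding? up front is one way for a function to decide whether string operations are safe, instead of finding out through an exception.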

If we return ok_binary and use it in the Puppet Language, there is a small risk it could still fail if the value is given to a function that tries to change the encoding and does it wrong. If the value is assigned to a resource parameter, it will (before Puppet 6.0.0) require switching to PSON, but otherwise be fine. (Again, what happens in Puppet 6.0.0 is described further on).

Binary Data in the Puppet Language

Historically binary data in the Puppet Language was only possible by calling a function that returns a string with binary content (either possibly invalid UTF-8, ASCII-8BIT encoded, or any other encoding in fact since it can be set by a user). The file() function is the primary example of this.

The Binary data type was introduced in Puppet 4.8.0 together with the function binary_file() that returns a Binary. It is also possible to create a binary from a text string, from a Base64 encoded string, or from an array of integer byte values. See the documentation for Creating a Binary for the details.

No Poking of Byte Values into Strings

In contrast to Ruby, the Puppet Language does not support the \x escape sequence for strings as it is too easy to create illegal byte sequences. Puppet supports the \u escape to insert Unicode characters into a string. While you can specify a character that may not exist the result will never blow up later with a bad encoding error.

Going the other way is possible - if you have a Binary, it can be turned into a String with \xHH content for non UTF-8 Strings (where HH is two hex digits). This is for display purposes only, such a string cannot again be parsed by the Puppet Language.
See the documentation Binary value to String for the details.

The Rich Data Format

The rich data format used as standard in Puppet 6.0.0 is based on standard JSON.
When rich data values are encountered, they are encoded as a Hash (i.e. JSON Object). Data values that can be represented as a String will have two reserved keys in the resulting hash, __ptype and __pvalue, where __ptype is the name of the data type and __pvalue is the value in string form. For example - either of these two values (Puppet Language):

Binary('hello', '%s') # Make binary out of text (non base64 string)

or

Binary('aGVsbG8=') # Make binary from base64 string

would appear like this in a catalog:

{"__ptype": "Binary", "__pvalue": "aGVsbG8=\n"}

In Puppet 6.0.0: When serializing, both Binary values and ASCII-8BIT encoded strings are serialized as being Binary (as in the example above). When deserializing a catalog, the agent transforms all instances of Binary into ASCII-8BIT encoded strings. Types and providers thus only need to deal with String and possibly check the encoding of a string (to not mess things up).
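The round trip described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: to_rich and from_rich are hypothetical helper names, they handle nothing but the Binary case, and Puppet's real serializer is far more general.

```ruby
require 'json'

# Hypothetical helper: wrap ASCII-8BIT strings the way the catalog
# example above shows Binary values.
def to_rich(value)
  if value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT
    # pack('m') produces Base64 with a trailing newline.
    { '__ptype' => 'Binary', '__pvalue' => [value].pack('m') }
  else
    value
  end
end

# Hypothetical helper: turn a Binary hash back into an ASCII-8BIT string.
def from_rich(value)
  if value.is_a?(Hash) && value['__ptype'] == 'Binary'
    value['__pvalue'].unpack1('m')
  else
    value
  end
end

binary   = 'hello'.b                        # ASCII-8BIT encoded copy
wire     = JSON.generate(to_rich(binary))   # plain, standard JSON text
restored = from_rich(JSON.parse(wire))

puts wire                  # {"__ptype":"Binary","__pvalue":"aGVsbG8=\n"}
puts restored.encoding     # ASCII-8BIT
```

Because the type name travels with the value, the receiving side no longer has to guess whether a string is text or Base64 encoded bytes.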

Before Puppet 6.0.0: When serializing, Binary values and ASCII-8BIT strings were handled without any transformation, possibly falling back to PSON on the wire if the ASCII-8BIT strings were not UTF-8 compatible (i.e. all the mess discussed earlier). Types and providers needed to know about the Binary data type - only the File resource type supported this, and then only when using puppet apply.

Learn more

There is a lot more to say about the rich data format, and you can read all the details in the Pcore Data Representation specification, and specifically the Pcore Generic Data document that describes the format used in the Puppet 6.0.0 catalog.

Rich Data and PDB

So, what does PDB do with rich data? Can it be queried etc?

Before Puppet 6.0.0: If using --rich_data a catalog with rich data would be handled the same way as if the hashes representing rich data were just hashes. In practice this kind of worked since the data was stored and could be retrieved, but both viewing and querying was difficult. Also, Reports could contain rich data hashes, and this could cause downstream problems.

In Puppet 6.0.0: A hard decision had to be made, and the least bad option was to simply stringify rich data into most reasonable human consumable form. While this is lossy (you cannot read a catalog back again and get an actual catalog with the same data as the original), it serves the current use cases where a catalog is stored in PDB.
The same approach was taken for Reports.

So, to answer the question - “Can rich data be queried?”, well, kind of, if you base your query on the string representation.

Needless to say, there is work ahead of us in the entire domain of rich data; more advanced queries and more meaningful diffs in Reports are examples.

Summary

  • Use binary_file() to read binary content, or when you don’t know whether the file is UTF-8 compatible.
  • It is recommended to use the Binary data type to type binary input in all functions and to prefer returning Binary over returning ASCII-8BIT string.
  • From Puppet 6.0.0 expect binary contents to be ASCII-8BIT encoded String in types and providers, but not deferred “agent side” functions.
  • Before Puppet 6.0.0 expect binary content to be either ASCII-8BIT encoded strings or instances of the Puppet Binary data type as the representation differs depending on use of --rich_data or not, or evaluating the logic on the agent, or as part of a puppet apply.

Monday, September 17, 2018

More about undef

In my earlier blogpost Let’s talk about undef I covered the data type Undef itself. In this blog post I am going to cover what happens when you use undef in puppet manifests and in Ruby.

Over time the Puppet Language undef has been represented internally in different ways. Starting with Puppet 4 (and with future parser in Puppet 3) the compiler (i.e. the puppet language) represents undef as the Ruby nil value.

However, the resource API and the older so called 3.x function API required nil to be transformed to other values for backwards compatibility.

Undef and Functions

Puppet has two function APIs, the so called 3x (old) and 4x (since Puppet 3 with future parser).

Here is how the transformations are done for a function call to function f() in puppet and what the resulting Ruby values given to the function are in the 3x and 4x APIs:

puppet                 3x API < 6.0.0         3x API >= 6.0.0    4x API
f(undef)               ''                     ''                 nil
f([undef])             [:undef]               [nil]              [nil]
f({undef => undef})    {:undef => :undef}     {nil => nil}       {nil => nil}

As you can see from the table above:

  • In the 3x API (in all Puppet versions) a top level value of undef is translated to the empty string.
  • In versions before 6.0.0, an undef nested (at any depth) inside an Array or Hash is translated to the Ruby Symbol :undef.
  • From Puppet 6.0.0 nested undef in the 3x API is handled the same way as for the 4x API - using the Ruby runtime value nil.
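The pre-6.0.0 column of the table can be reproduced with a small Ruby sketch. The method name map_arg_3x_legacy is hypothetical and this is not Puppet's actual implementation; it only mimics the mapping described above.

```ruby
# Hypothetical helper mimicking how arguments reached a 3x function
# before Puppet 6.0.0: top-level undef becomes '', nested undef :undef.
def map_arg_3x_legacy(value, top_level = true)
  case value
  when nil
    top_level ? '' : :undef
  when Array
    value.map { |v| map_arg_3x_legacy(v, false) }
  when Hash
    value.map { |k, v| [map_arg_3x_legacy(k, false), map_arg_3x_legacy(v, false)] }.to_h
  else
    value
  end
end

p map_arg_3x_legacy(nil)            # ""
p map_arg_3x_legacy([nil])          # [:undef]
p map_arg_3x_legacy({ nil => nil }) # {:undef=>:undef} (newer Rubies print {undef: :undef})
```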

Returned values

Returning values from 4.x functions works as you expect; anything set to nil is semantically the same as Puppet’s undef. This is however royally screwed up when it comes to 3x functions! Since they are given the transformed undef values (in the form of either empty string or symbol :undef), they could return such values back to the compiler. And thus, the compiler may end up feeding values encoded that way to 4x functions - thus exposing the 4x functions to the 3x API encoding.

The compiler treats the :undef symbol as if it was nil in terms of type checking and it will also be serialized as if it was a nil, but since there is no transformation going on for 4x functions they were exposed to this problem.

From Puppet 5.5.7 all returned values from 3x functions are subject to a transformation such that the :undef symbol is transformed back to nil.

For 3x functions that you maintain, the recommendation is to change them to use the 4x API since that makes it sane; you get nil for Puppet’s undef and you return nil when you want to return undef values - and it all works in harmony with Ruby. (And if you have special processing of :undef you can remove that in favor of straightforward detection/removal of nil in Ruby). Although it is an action you need to take, it is quite easy to change a 3x function to a 4x function.

If you for some reason cannot do that, and you want to maintain your function as a 3x function supporting both old and new versions, you should treat both :undef and nil as being nil - which means you may need to do operations twice. You also need to make sure you are not returning structures with :undef in them if the function is used with any Puppet version from Puppet 3 with future parser up to Puppet 5.5.7.
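One way to follow that advice in a 3x function is to normalize values on the way in and on the way out. The helper below is a hypothetical sketch (the name undef_to_nil is not a Puppet API); it maps every :undef, at any depth, back to nil.

```ruby
# Hypothetical helper: replace the legacy :undef symbol with nil,
# recursing into arrays and hashes (keys as well as values).
def undef_to_nil(value)
  case value
  when :undef then nil
  when Array  then value.map { |v| undef_to_nil(v) }
  when Hash   then value.map { |k, v| [undef_to_nil(k), undef_to_nil(v)] }.to_h
  else value
  end
end

p undef_to_nil([:undef, 1, { 'a' => :undef }])   # [nil, 1, {"a"=>nil}]
```

Applying it to the arguments means the rest of the function only has to test for nil, and applying it to the returned structure avoids leaking :undef back to the compiler.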

Undef and Resources

Giving values to a resource is almost like giving values to functions, but not quite. A given top-level undef in the resource API means “I want the default value” - that is, it acts as if you had not given a value at all!

# Given this definition:
define myresource($param1 = 'green') { }
# These two will have exactly the same effect on param1
# setting it to the value 'green' in both resources
myresource { 'title1': param1 => undef }
myresource { 'title2':  }

If we change the default value expression to also be undef, we get the effect of not including a value at all for that parameter. (That is, no null value will appear in the JSON for the serialized catalog sent to the agent).

# Given this definition:
define myresource($param1 = undef) { }
# These two will have exactly the same effect on param1
# neither will have the param1 set at all
myresource { 'title1': param1 => undef }
myresource { 'title2':  }

The (abbreviated) output in the catalog looks like this:

{
  "type": "Myresource",
  "title": "title1",
},
{
  "type": "Myresource",
  "title": "title2",
}

Getting Undef from Hiera

When Automatic Parameter Lookup (APL) is used for a class, it is possible to bind an undef value (null in JSON and YAML data files). When a value for a parameter looked up with APL results in undef it will set the value of the parameter to undef (in contrast to when giving it in a manifest since that means - “use the default”).

The rationale for this is that APL kicks in to get a default value and it is then not meaningful to also let it return a value that means “use the default”. Instead, the result of the default value expression for a class parameter only gets used if there was no value bound at all for that parameter in hiera.

class car($color = 'blue') { }
class { 'car': color => undef }

  • If nothing was bound in hiera, this would result in the car’s color being 'blue' (the default expression kicks in)
  • If car::color was bound to 'green' in hiera, the result would be a 'green' car.
  • If car::color was bound to undef in hiera the result would be an undef color.
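The three outcomes can be modelled with a short Ruby sketch. It is a simplified model of the rules described in this post, not Puppet's implementation; resolve_param and the NOT_BOUND marker are hypothetical names, and nil stands in for undef.

```ruby
# Marker for "hiera has no binding at all", as opposed to a binding of nil.
NOT_BOUND = Object.new.freeze

# Hypothetical model of how a class parameter value is chosen.
def resolve_param(given, hiera_binding, default)
  return given unless given.nil?                          # explicit value wins
  return hiera_binding unless hiera_binding.equal?(NOT_BOUND)
  default                                                 # nothing bound in hiera
end

p resolve_param(nil, NOT_BOUND, 'blue')   # "blue"
p resolve_param(nil, 'green', 'blue')     # "green"
p resolve_param(nil, nil, 'blue')         # nil
```

The important detail is the separate marker: a binding of undef is still a binding, so the default expression is only used when there is no binding at all.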

If you want to accept a parameter value of undef then declare a default value expression of undef. Then you can either get that default by not specifying a parameter value (or giving undef, which is the same thing), or you can bind undef in hiera, and all of these give you the same result.

Something like this:

class car(Optional[String] $color = undef) { }

Summary

I hope this has provided you with some of the (otherwise) hard to find details about how undef actually works in different Puppet versions. (I am also a bit sad that something like this blog post is needed - but that is a different story).

Saturday, May 7, 2016

Converting and Formatting Data Like a Pro With Puppet 4.5.0

Before Puppet 4

Before Puppet 4.0.0 there were basically only the data types String, Boolean, Array, Hash, and Undef. Most notably missing were numeric types (Numeric, Integer, and Float). In Puppet 4.0.0 those and many other types were defined and implemented in a proper type system. This was all good, but a few practical problems were not solved; namely data conversion. In Puppet 4.5.0 there is a new feature that will greatly help with this task. But first let’s look at the state of what is available in prior versions.

Converting String to Number - the current way

The most concrete example is having to convert a String to Numeric. While not always required since Puppet performs arithmetic on Strings that look like numbers, that does not work for all operations.

The scanf function was added to handle general conversion. Thus if $str_nbr is a numeric value in string form you can convert it like this:

$nbr = scanf($str_nbr, "%d")[0]

That is quite a lot of excess violence to get the value because scanf is a general purpose function that can do lots of things:

  • get many values at once (hence the need to pick the first value from the result)
  • values can be embedded in text that is ignored
  • there are many formats to choose from but no defaults
  • if the conversion failed, the result is simply an empty array, so extra code is needed to validate the result and raise an error.

There is a much easier way to do the same, and this is now the idiomatic way of converting a numeric string:

$nbr = $str_nbr + 0   # makes it an integer
$nbr = $str_nbr + 0.0 # makes it a float

This works because Puppet automatically transforms strings that look like numeric information and because the + operator cannot be used to concatenate strings. When doing this, and the string is not numeric, an error with a reasonable error message is displayed. This does not cover every case, however:

  • what if the string is octal but does not start with a 0 ?
  • what if the string is in hex but does not start with 0x, and the actual string does not have any of the letters A-F in it?
  • what if the string is in binary format?
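All three questions come down to the radix being ambiguous when the string carries no prefix. Ruby’s own Kernel#Integer (plain Ruby, not a Puppet function) illustrates the problem and the usual remedy of passing the radix explicitly; the helper try_integer is a hypothetical name added for this sketch.

```ruby
# Hypothetical helper: return nil instead of raising on bad input.
def try_integer(str, radix = nil)
  radix ? Integer(str, radix) : Integer(str)
rescue ArgumentError
  nil
end

p try_integer('17')       # 17  - read as decimal
p try_integer('17', 8)    # 15  - same digits, read as octal
p try_integer('10', 16)   # 16  - hex without 0x and without letters
p try_integer('101', 2)   # 5   - binary
p try_integer('010')      # 8   - a leading 0 means octal by default
p try_integer('010', 10)  # 10  - an explicit radix overrides that
p try_integer('FF')       # nil - not a number without a radix
```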

Converting String to Boolean - the current way

Booleans in string form are also a bit tricky to convert. Since Puppet 4.0.0 the idiomatic way would be:

$bool = case $str_bool {
  "true" : { true }
  "false": { false }
  Boolean : { $str_bool }
  default : { fail("'$str_bool' cannot be converted to Boolean") }
}

Again, a lot more typing than what is necessary. In the above example, you may also want other values to be considered false/true like the empty string, an empty array, the literal value undef, etc. - they are easily added in the case expression. (You can write the above in several different ways: instead of capturing all booleans in a case option, the literal values true and false could be listed as alternative case options in the two entries above, that is using "true", true : { true }. The result would be the same.)

Note that the example works because string matching is case independent, so the above also covers “True”/“False”, “tRuE”/“falSE” etc. If you do not want that, it is trickier and we would need to use regular expressions to match the strings.

If you have lots of boolean conversions going on, you can package it up as a reusable function:

function mymodule::to_boolean($str_bool) {
  # the case expr from previous example goes here
}
# and then convert like this:
$bool = $str_bool.mymodule::to_boolean()

While this works, it leads down a path to a flea-market of functions for conversion to and from, this or that (just look at the stdlib module which has quite a large number of such functions).

‘New’ is the New ‘New Way’

In Puppet 4.5.0 there is a function called new. It unsurprisingly creates a new instance of a type, which means you can write something like:

$num = Integer.new($str_num)

Added in Puppet 4.5.0 is also the ability to directly “call a type” - and this means calling the new() function on this type. We can thus shorten the above example to this:

$num = Integer($str_num)

This works for most types, but some are ambiguous like Variant types, or Undef which you really do not have to convert to, or Scalar which is also ambiguous.

The Boolean conversion from before can now be written like this:

$bool = Boolean($str_bool)

More Coolness with New

Each type defines what the arguments to its new operation are. Typically they accept (in addition to the value to convert) a format specification that is compatible with what is used in functions like sprintf and scanf - but the set of formats has been expanded to suit the puppet language. Some conversions have other specific arguments. The entire set of options and what they mean can be found in the documentation of the new() function (see note 1 at the end of this post) - here are some examples:

$a_number = Integer("0xFF", 16)  # results in 255 (base 16)
$a_number = Integer("FF", 16)    # results in 255 (base 16)
$a_number = Integer("010")       # results in 8
$a_number = Integer("010", 10)   # results in 10 (base 10)
$a_number = Integer(true)        # results in 1
$a_number = Numeric(true)        # results in 1
$a_number = Numeric("0xFF")      # results in 255
$a_number = Numeric("010")       # results in 8
$a_number = Numeric("3.14")      # results in 3.14 (a float)

$a_bool = Boolean("yes")         # results in true
$a_bool = Boolean(1)             # results in true
$a_bool = Boolean(0)             # results in false

As you can see the conversions are flexible - you get a number 0 back for a boolean false. This is by design - the conversion tries its best to convert what it was given to the type you wanted.

Conversion performs assertion

When a conversion is performed it always ends with an assertion that the created value matches the type as in this example:

$port = Integer[1024]($some_string)

This will result in an error if the result is not an integer >= 1024.

To enable easier handling of optional/faulty values, if the type is Optional[T], the assertion that is made accepts an undef result, and the conversion will not error on faulty input but instead yield an undef result.

Conversion with Array and Hash

It is possible to convert between arrays and hashes. Here it is also possible to use Struct and Tuple types since those perform additional type assertion of the result.

$an_array = Array({a => 10, b => 20}) # results in [[a, 10],[b, 20]]
$a_hash = Hash([1,2,3,4])             # results in {1=>2, 3=>4}
$a_hash = Hash([[1,2],[3,4]])         # results in {1=>2, 3=>4}

The Array conversion also has a short form for “make it an array if it is not already an array”, which is selected by adding a boolean true argument:

$an_array = Array(1, true)    # results in [1]
$an_array = Array([1], true)  # results in [1]
$an_array = Array(1)          # error, cannot convert
$an_array = Array({1 => 2}, true) # [{1 => 2}]
$an_array = Array({1 => 2})   # [[1, 2]]

Conversion to String

Conversion to String has the most features. There are many different formats to choose from per type, and it supports mapping type to format for nested structures. That is, different formats can be used for values in arrays and hashes, if arrays/hashes are short or long etc. Several blog posts would be needed to cover all of the functionality, so here are some examples:

String(undef)        # produces "" (empty string)
String(undef, '%d')  # produces "NaN" (we asked for a number)

$data = [1, 2, undef]
String($data)        # produces '[1, 2, undef]'

# A format map defines type to format mappings, for
# array and hash, there is a specific map for contents
# that is applied recursively.
# (See documentation for full information).
#
$formats = { Array => {
  format => '%(a',
  string_formats => {
    Integer => '%#x',
    Undef => '%d'
}}}

String($data, $formats) # produces '(0x1, 0x2, NaN)'

# Formatting with indentation
String([1, [2, 3], 4], "%#a")
# produces:
# [1,
#  [2, 3],
#  4]

Conversion is easy to use in interpolation

Typical use of formatting is when interpolating values into strings. The normal interpolation uses a default string conversion mechanism and this does not always give what you want.
Using the new() function is especially convenient when flattening, or unrolling arrays into strings as the String conversion provides full control over start/end delimiters and separators.

$things = [
  'Cream colored ponies',
  'crisp apple strudels',
  'door bells',
  'sleigh bells',
  'schnitzel with noodles'
]
notice "${String($things,'% a')}. These are a few of my favourite things."

would notice

"Cream colored ponies", "crisp apple strudels", "door bells", "sleigh bells", "schnitzel with noodles". These are a few of my favourite things.

Not exactly what we wanted. We did get an array join with separator ", " by default, and the format "% a" removed the start and end delimiters from the array, but we got quotes around the favourite items. Also, to make this read like the Mary Poppins song, we would like to insert the word “and”. So, here is the next version where we define the format to use:

$formats = { Array => {
  format         => '% a',
  separator      => ', and ',
  string_formats => {
    # %s is unquoted string
    String => '%s',  
}}}
notice "${String($things, $formats)}. These are a few of my favourite things."

would notice:

Cream colored ponies, and crisp apple strudels, and door bells, and sleigh bells, and schnitzel with noodles. These are a few of my favourite things.

And just for the fun of it - let’s turn that into a function.

function silly::mary_poppinsify(String *$things) {
  $formats = {
    Array => {
      format         => '% a',
      separator      => ', and ',
      string_formats => {
        String => '%s',
  }}}
  "${String($things, $formats)}. These are a few of my favourite things."
}

So, finally, with a personal touch:

notice silly::mary_poppinsify(
  "Keys on pianos",
  "food in a bento",
  "progressive metal",
  "solos by Argento", 
)

(Printout left as an exercise).

Read more about type conversion in the specifications repository, where each type is documented, for instance String.new. The other types are in the same document.

When Puppet 4.5.0 is released this information will also show up in the regular documentation for function new().

Notes on a couple of advanced things

The String format map is processed in such a way that the formats given when calling new() are merged with the default formats. This merge takes type specificity into account such that types that are more specific have higher precedence: if the value to format matches two formats, one for type T and another for type T2, and T2 < T, then the format for T2 will be used. For example, {Any => '%p', Numeric => '%#d'} means that all values are output in programmatic form (strings are quoted, arrays and hashes have puppet language style delimiters, etc.), except numeric values, which are output in quoted numeric form (that is, "10" instead of the default %p, which would have resulted in just 10 without quotes).
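The “most specific type wins” rule can be sketched in Ruby by using class ancestry as a stand-in for Puppet’s type specificity. This is an analogy under that assumption, not how Puppet implements it; pick_format is a hypothetical helper and the format strings are only labels here.

```ruby
# Hypothetical helper: among the formats whose type matches the value,
# pick the one whose type is closest to the value's own class.
def pick_format(value, formats)
  ancestors = value.class.ancestors
  _type, format = formats
    .select { |type, _| ancestors.include?(type) }
    .min_by { |type, _| ancestors.index(type) }
  format
end

formats = { Object => '%p', Numeric => '%#d' }

p pick_format(10, formats)     # "%#d" - Numeric is more specific than Object
p pick_format('ten', formats)  # "%p"  - only Object matches a String
```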

Summing Up

The new() function supports creating new objects / values which can be used for data type transformation / casting and formatting. As you probably noticed, simple and common things are easily achieved while more complex things are possible. Conversions have become far more important in the Puppet Language now that there is EPP (templates in the puppet language), where the result is often some kind of configuration file with its own syntax and picky rules - so the details do matter.

The idea behind the more complex formats, and alternatives is to provide a rock bottom implementation that can be used to implement custom functions in the Puppet Language that can be reused in manifests as well as in templates.

There are probably a few common conversion tasks that occur frequently enough to warrant a format flag of their own, and that I failed to include in the first implementation. When writing this blog post for instance, it would have been nice if there was a format for “array with all things in it in %s format and no delimiters”; but then I would not have been able to show how that is done in long format. File tickets with wishes, or make Pull Requests with code; they are always welcome.

Hope you find this supercalifragilisticexpialidociously useful.


  1. Since 4.5.0 is not yet officially released, you can read the documentation in the source for new.rb, or in the specifications per type (link to String.new).

Thursday, May 5, 2016

Digging out data in style with puppet 4.5.0

In Puppet 4.5.0 there are a couple of new functions, dig, then and lest, that together with the existing assert_type and with functions make it easy to do a number of tasks that earlier required conditional logic and temporary variables.

You typically run into a problem in programming languages in general when you are given a data structure consisting of hashes/arrays (or other objects), and you need to “dig out” a particular value, but you do not know if the path you want from the root of the structure actually exists.

Say you are given a hash like this:

$data = {
  persons => {
    'Henrik' => {
      mother => 'Anna-Greta',
      father => 'Bengt',
    },
    'Anna-Greta' => {
      mother => 'Margareta',
      father => 'Harald',
      children => ['Henrik', 'Annika']
    },
    'Bengt' => {
      mother => 'Maja',
      father => 'Ivar'
    },
    'Maja' => {
      children => ['Bengt', 'Greta', 'Britta', 'Helge']
    },
  }
}

Now, you would like to access the first child of ‘Anna-Greta’ (in case you wonder this is part of my family tree). This is typically done like this in Puppet:

$first_child = $data['persons']['Anna-Greta']['children'][0]

Which will work just fine (and set $first_child to 'Henrik') given the $data above. But what if there was no ‘Anna-Greta’, or no ‘children’ key? We would get an undef result, and the next access would fail with an error.

To ward off the evil undef you would have to break up the logic and test at every step. For example, something like this:

$first_child = 
if $data['persons']
   and $data['persons']['Anna-Greta']
   and $data['persons']['Anna-Greta']['children'] =~ Array {
     $data['persons']['Anna-Greta']['children'][0]
   }

Is what you end up having to do. (Not nice).

This is where the dig function comes in. Using dig the same is done like this:

$first_child = $data.dig('persons', 'Anna-Greta', 'children', 0)

Which automatically handles all the conditional logic. (Yay). If one step happens to
result in an undef value, the operation stops and undef is returned. If this was all we wanted to do, we would be done. But what if we require that the outcome is not undef, or if we wanted a default value as the result if it was undef?
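Ruby happens to have a method with the same name and a very similar contract, Hash#dig, which makes for a compact illustration of the behavior. The snippet is plain Ruby, not Puppet, and uses a trimmed-down copy of the data as an assumption for the example.

```ruby
data = {
  'persons' => {
    'Anna-Greta' => { 'children' => ['Henrik', 'Annika'] }
  }
}

# dig stops at the first missing step and returns nil.
first_child = data.dig('persons', 'Anna-Greta', 'children', 0)
missing     = data.dig('persons', 'Nobody', 'children', 0)

# Plain indexing blows up as soon as a step is missing.
error =
  begin
    data['persons']['Nobody']['children'][0]
  rescue NoMethodError => e
    e.class
  end

p first_child   # "Henrik"
p missing       # nil
p error         # NoMethodError
```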

There is already the function assert_type that can assert the result (and optionally return a new value if the assertion fails). If we use that we can write:

$first_child = NotUndef.assert_type(
  $data.dig('persons', 'Anna-Greta', 'children', 0)
)

Which would give us an automated error like “expected a NotUndef value”. While functional
we can do better by customizing the error:

$first_child = NotUndef.assert_type(
  $data.dig(
    'persons', 
    'Anna-Greta', 
    'children', 
    0)) |$expected_type, $actual_type | {
      fail ("Did not find first child of 'Anna-Greta'")
    }

But that is quite tedious to write because the block given to assert_type is designed to take two arguments - the expected type (NotUndef in this example), and the actual type of the argument (in this case Undef). But we already knew that would be the only possible outcome, so there is lots of excess code for this simple (and common) case.

This is where the lest function comes in. It takes one argument, and if this argument matches NotUndef, the argument is returned. Otherwise it will call a code block (that takes no arguments), and return what that returns. Thus, this is a specialized variant of assert_type that makes our task easier. Now we can write:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {
      fail("Did not find first child of 'Anna-Greta'")
  }

Much better - it now reads nicely from left to right, and it is clear what is going on.
If we wanted a default value instead of a custom fail, we can do that:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {'Cain'}

Now - let’s do something more difficult. What if we want to use the value
of the first child of Anna-Greta (that is, ‘me’) to find my aunts and uncles on
my father’s side? That is if we first computed $first_child, we would continue with:

$first_childs_fathers_mother = 
  $data.dig('persons', $first_child, 'father', 'mother')
$first_childs_fathers_mothers_children =
  $data.dig('persons', $first_childs_fathers_mother, 'children')

That works, but we had to use the temporary variables. To be correct we also need to
remove my father (‘Bengt’) from the set of children returned by the last step.

I am not even going to bother writing that out in longhand to handle all the possible ‘sad’ paths. (Left as an exercise if you have run out of regular navel fluff).

Instead, we are going to write out the entire sequence, now using the function then, which is the opposite of lest. It accepts a single value, and if it matches NotUndef it calls the block with that value as its single argument, and returns what the block returns. If the given value is undef, it simply returns undef (to be dealt with by the next step in the chain of calls).

$data.dig('persons', 'Anna-Greta', 'children', 0)
.then |$x| { $data.dig('persons', $x, 'father', 'mother')}
.then |$x| { $data.dig('persons', $x, 'children')}
.then |$x| { $x - 'Bengt' }
.lest | | { fail("Could not find aunts and uncles...") }

We have an obvious flaw here since the name of my father is hard coded.
There is also no handling of the ‘sad’ path of ‘children’ not being an Array as we did
not type the data.

For the final example, let's make this into a generic function that finds the aunts and uncles on the father’s side of any mother’s first child.

We then end up with this function that performs five distinct steps:

function custom_family_search(String $mother) {
  # 1. start by finding the mother's children and pick the first
  $data.dig('persons', $mother, 'children', 0)

  # 2. Get the father of the child (needs to be looked up since
  #    $x here is just the name of the person).
    .then |$x| { $data.dig('persons', $x, 'father') }

  # 3. Look up the siblings of found father, and return those
  #    as well as the father (needed to eliminate father in
  #    the next step). $x is father from previous step.
    .then |$x| { [ $data.dig(
                      'persons', 
                      $data.dig('persons', $x, 'mother'),
                      'children'),
                    $x
                  ] }

  # 4. Eliminate father from siblings
  # Previous step is never undef since we construct an array,
  # but the first slot in the array may be undef, or something that
  # is not an array! Thus, we don't need the conditional 'then'
  # function, and can instead use the 'with' function.
  # A 'case' expression is used to match the 'happy' path where the
  # name of the father is 'subtracted'/removed
  # from the array of his siblings. The 'sad' path produces
  # 'undef' and lets the next step deal with it.
  #
    .with |$x| { case $x {
                 [Array[String], String] : { $x[0] - $x[1] }
                 default                 : { undef }
                 }
               }
   # 5. we fail if we did not get a result
   #
    .lest | | { fail("Could not find aunts and uncles...") }

  # Function returns the value of the last call in the chain
}

notice custom_family_search('Anna-Greta')

And now we can test:

> puppet apply blog.pp
Notice: Scope(Class[main]): [Greta, Britta, Helge]

Full Final Example Source.

In Summary:

  • dig - digs into structure with mix of hash keys and array indexes, may return undef
  • then - calls the block on the ‘happy’ path, undef otherwise
  • lest - calls the block on the ‘sad’ path, given value otherwise
  • with - (unconditional), passes on its given value to the block and returns its result
  • assert_type - checks path is ‘happy’ (matches type) and calls block on ‘sad’ path
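For readers who think in Ruby, the behavior of then, lest, and with can be approximated in a few lines of plain Ruby. This is only a sketch of the semantics summarized above, not how Puppet implements them. The module name Chain, its method names, and the sample data are made up for illustration (then_ has a trailing underscore only because then is a reserved word in Ruby).

```ruby
# Plain-Ruby approximations of Puppet's then, lest, and with.
# Chain and its method names are hypothetical, for illustration only.
module Chain
  # 'happy' path: call the block only when the value is not nil
  def self.then_(value)
    value.nil? ? nil : yield(value)
  end

  # 'sad' path: call the block only when the value is nil
  def self.lest(value)
    value.nil? ? yield : value
  end

  # unconditional: always pass the value to the block
  def self.with(value)
    yield(value)
  end
end

# Made-up sample data in the same shape as the examples
data = { 'persons' => { 'Anna-Greta' => { 'children' => ['Henrik'] } } }

# Hash#dig plays the role of Puppet's dig and returns nil for a missing step
found   = Chain.lest(data.dig('persons', 'Anna-Greta', 'children', 0)) { 'Cain' }
missing = Chain.lest(data.dig('persons', 'Nobody', 'children', 0)) { 'Cain' }
skipped = Chain.then_(data.dig('persons', 'Nobody')) { |person| person['children'] }
```

Here found is the value that was dug out, missing falls back to the default from the block, and skipped stays nil because then_ never calls its block for a nil value.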

In case you wonder about the lines of code that start with a period like this:

.then ...

This is simply a continuation from the line above - puppet is generally not whitespace significant (with only a few exceptions). Thus it does not matter where the ‘.’ is placed.
I choose to align the .then steps to make it readable. If you have something short
you can make it a one-liner:

# These are all the same
$x = $facts['myfact'].lest | | { 'default value for myfact'}

$x = $facts['myfact']  .  lest | | { 'default value for myfact'}

$x = $facts['myfact']
     .lest | | { 'default value for myfact'}

$x = $facts['myfact'].
     lest | | { 'default value for myfact'}

Hope this will be useful for you, and that it gives you an additional tool in your Puppet language toolchest.

This was also the first time I used StackEdit to write a blog post. I hope all the formatting of code turns out ok.

Best,

Sunday, February 1, 2015

Puppet 4.0 Data in Modules part II - Writing a Data Provider

In Puppet 4.0.0 there is a new technology agnostic mechanism that makes it possible to provide default values for class parameters in modules and in environments. In the first post about this feature I show how it is used. In this second post I will show how to write and deliver an implementation of a data provider.

The information in this post is only relevant if you are planning to extend puppet with additional types of data providers - you do not need to learn all that is presented here to use the services the new data provider feature provides.

How does it work?

The new data provider feature is built using the Puppet Binder which wires (i.e. binds) the various parts together in a composable way. You do not really need to know all the features of the Puppet Binder to be able to use it as the bindings needed for the data providers are mostly boilerplate and you just have to copy/paste and replace the example names with the names of things in your implementation.

The feature has two different kinds of data providers; one for environments, and one for modules. The steps to implement them are almost the same, so I am going to show both at the same time.

What you need to do:

  • Implement the data provider(s). They have a very simple API - basically just a method named lookup.
  • Register the bindings that make your data provider implementations available for use.

Implementing the Data Providers

Data providers are implemented as Ruby classes. The two classes (one for environments, and one for modules) have a very simple API - basically they must inherit from the correct base class (as shown in the examples below), and they must implement the method lookup(name, scope, merge).

In this example, I am creating data providers that users will know by the name 'sample'. There will be a provider called 'sample' that can be used for the environment, and one that can be used for modules. This will be made available in a module that I am going to name 'sampledata'.

For use in environments

# <modulepath>sampledata/lib/puppet_x/author/sample_env_data.rb
#
require 'puppet_x'
module PuppetX::Author
  class SampleEnvData < Puppet::Plugins::DataProviders::EnvironmentDataProvider
    def lookup(name, scope, merge)
      # return the value bound to the name/key
    end
  end
end

For use in modules

# <modulepath>sampledata/lib/puppet_x/author/sample_module_data.rb
#
require 'puppet_x'
module PuppetX::Author
  class SampleModuleData < Puppet::Plugins::DataProviders::ModuleDataProvider
    def lookup(name, scope, merge) 
      # return the value bound to the name/key
    end
  end
end

Note:

  • The data provider API guarantees that calls to lookup only occur for an environment that has
    opted in by setting the environment_data_provider to the key 'sample', and for a module that has opted in with a binding of 'sample' to the 'module_data' (just like we used 'function' in the earlier examples in the previous post on this topic).

  • The PuppetX namespace is available for 3rd party Ruby code. When using it, it should be followed by the name of the author (as defined by the Puppet Forge for modules) - i.e. in your code replace Author with your name.

  • When the implementations are loaded by the runtime, the data provider base classes have already been loaded, so there is no need to require 'puppet'.

  • The merge parameter is a string of type Enum[unique, hash, merge] or a hash with the key 'strategy' set to that string with additional keys that control the merge in detail (see the documentation of the lookup function). (In the sample implementation this parameter is ignored since it can only supply one value per key).
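As a rough idea of what the body of lookup could look like, here is a sketch that returns hard coded values. To keep it runnable without Puppet, it inherits from an empty stand-in class instead of the real base class, and the class names, keys, and values are made up. Returning nil for an unknown key is an assumption made for the sketch; this post does not specify how a missing key should be reported.

```ruby
# Stand-in for the Puppet base class so that the sketch runs on its own.
# In a real module the class would inherit from
# Puppet::Plugins::DataProviders::ModuleDataProvider instead.
class StandInModuleDataProvider
end

class SampleModuleDataSketch < StandInModuleDataProvider
  # Hard coded data; a real provider would read this from its backend.
  DATA = {
    'sampledata::param_a' => 100,
    'sampledata::param_b' => 200
  }.freeze

  # name  - the fully qualified key being looked up
  # scope - the calling scope (unused in this sketch)
  # merge - the merge strategy (ignored: there is only one value per key)
  def lookup(name, scope, merge)
    # Assumption: nil signals "no value"; the post does not specify this.
    DATA[name]
  end
end

provider = SampleModuleDataSketch.new
value    = provider.lookup('sampledata::param_a', nil, 'unique')
```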

There is more to say about how to implement the lookup to make it efficient. More about that later after I have shown how to wire the implementation into puppet.

Registering the Data Provider Implementations

The first thing is to register the bindings that make it possible for other modules (or an environment) to declare that our new implementation should be used.

The Puppet Binder loads bindings from modules. By default the file <moduleroot>/lib/puppet/bindings/<modulename>/default.rb is loaded (if it exists). In this file, we need to create the bindings we want.

Since we have an implementation for both environment, and modules, the registration looks like this:

# <modulepath>sampledata/lib/puppet/bindings/sampledata/default.rb
#
Puppet::Bindings.newbindings('sampledata::default') do
  bind {
    name         'sample'                             # the name
    in_multibind 'puppet::environment_data_providers' # boilerplate (for env)
    to_instance  'PuppetX::Author::SampleEnvData'     # the classname as a string
  }
  bind {
    name          'sample'                            # the name
    in_multibind  'puppet::module_data_providers'     # boilerplate (for module)
    to_instance   'PuppetX::Author::SampleModuleData' # the classname as a string
  }
end

As before, replace Author with your name, and replace 'sample' with the name you want to give your bindings provider. The to_instance references should be the fully qualified class names of the implementations of the data providers.

The two bindings register the respective implementation class with a symbolic name, which allows users to use this name instead of the more complicated class name of the data provider class we have implemented.

As there can be many implementations available and active at the same time, the Puppet Binder's multibind capability is used to bind the implementation for a given "extension point" (e.g 'puppet::environment_data_providers').

Note:

  • The name you give your implementation must be unique among all implementations of the same type so you should really prefix the name with the module name to be safe.

Using the Implementations

As shown in the previous post, using a data provider implementation is simple. The examples in this post add a provider named 'sample'; so simply change the use of 'function' in the previous post's examples to switch to the providers we just implemented.

The lifecycle of Data

The implementation of lookup probably needs to cache information (e.g. if we were writing an implementation for hiera it could be reading and caching the hiera.yaml file, and various data files).

Caching is somewhat complicated since we need to associate the cached data with something that has the same lifecycle as the data - we do not want to hold on to information that is stale and just occupies memory until Puppet's master process is restarted.

There are two things that it makes sense to associate a cache with:

  • the environment, if the data is static for the entire life of the environment. An environment goes out of scope when it times out (a configurable amount of time).
  • the compiler, if the data is static for the compilation (but varies from request to request for different nodes in the same environment instance). The compiler goes out of scope at the end of each catalog compilation.

It is not suitable to associate the cache with the data provider instance itself (e.g. in a class or instance variable in SampleModuleData).

The absolute best way of doing this is to use an Adapter. There is no reusable implementation of a caching adapter and the implementor of a data provider should design one for the specific purpose of handling its caching needs. This can be as simple as in this example:

class PuppetX::Author::MyCacheAdapter < Puppet::Pops::Adaptable::Adapter
  attr_accessor :cache
end

The provider implementation then associates the adapter with either the environment, or the compiler. The adapter implementation can naturally have as many instance variables as it needs (the one in the example just has a cache variable), and additional methods. (If you want to look at a real implementation, the 'function' data provider built into Puppet 4.0 has a class called Puppet::DataBindings::DataAdapter that serves as a cache as well as performing the calls to the data functions).

The approach of using adapters is much preferred over monkey patching existing code. For more information about adapters, see my blog post on the topic.

It is simple to use the adapter - here are examples for associating one with the environment, and the compiler.

adapter = MyCacheAdapter.adapt(Puppet.lookup(:current_environment))
cached = adapter.cache()

adapter = MyCacheAdapter.adapt(scope.compiler)
cached = adapter.cache()
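To make the lifecycle idea concrete, here is a sketch that runs without Puppet. StandInAdapter only imitates the get-or-create behavior of adapt that the calls above rely on; it is not the real Puppet::Pops::Adaptable::Adapter. The helper cached_data and the plain objects standing in for an environment are likewise made up for illustration.

```ruby
# Stand-in for Puppet::Pops::Adaptable::Adapter so the sketch runs without
# Puppet. Only the get-or-create behavior of adapt is imitated here.
class StandInAdapter
  def self.adapt(obj)
    ivar = :"@#{name.downcase}"
    unless obj.instance_variable_defined?(ivar)
      obj.instance_variable_set(ivar, new)
    end
    obj.instance_variable_get(ivar)
  end
end

class SketchCacheAdapter < StandInAdapter
  attr_accessor :cache
end

# Hypothetical helper: returns the cached data for owner, loading it
# (by calling the block) only the first time.
def cached_data(owner)
  adapter = SketchCacheAdapter.adapt(owner)
  adapter.cache ||= yield
end

loads = 0
environment = Object.new # stands in for an environment (or a compiler)

2.times { cached_data(environment) { loads += 1; { 'abc::param_a' => 10 } } }
loads_after_reuse = loads # the loading block ran only once

# A new owner starts without a cache, so the data is loaded again
cached_data(Object.new) { loads += 1; { 'abc::param_a' => 10 } }
```

Because the cache hangs off the owner object, it disappears together with that object; nothing needs to be evicted by hand.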

I am stopping there, since what you need to cache and how will be specific to what you are implementing support for.

General notes about caching data content

Do not implement file watching. Directory environments use a stable state for the given timeout and everything is evicted when the environment times out. There can be a very large number of directory environments (users have reported using several hundred, e.g. for a master running various development branches), and directory environments may also be quite volatile. If you are not using the adapter approach to caching, you must ensure that your caching does not leak memory by binding stale data for environments that potentially never will be used again during the running process' life cycle.

Experiment with the Sample in Puppet's code base

There are two test data fixtures in Puppet's code base (used when running spec tests) that you can also run from the command line. You can naturally make a copy of them for your own experiments (if you do not want to type in the examples in this blog post from scratch).

The 'function' example

The first tests the function data provider, and can be invoked like this (all on one line):

bundle exec puppet apply
--environmentpath=spec/fixtures/unit/data_providers/environments
--environment=production -e 'include abc'

The fixture has two parameterized classes: one that is not in a module, and one in a module. The module class gets two of its three parameters overridden by environment data.

You should see this printout

Notice: env_test1
Notice: /Stage[main]/Abc::Def/Notify[env_test1]/message: defined 'message' as 'env_test1'
Notice: env_test2
Notice: /Stage[main]/Abc::Def/Notify[env_test2]/message: defined 'message' as 'env_test2'
Notice: module_test3
Notice: /Stage[main]/Abc::Def/Notify[module_test3]/message: defined 'message' as 'module_test3'

The 'sample provider' example

The second example can be run like this (all on one line):

bundle exec puppet apply
--environmentpath=spec/fixtures/unit/data_providers/environments
--environment=sample
spec/fixtures/unit/data_providers/environments/sample/manifests/site.pp

This fixture uses parameterized classes and an implementation of the sample providers shown in this blog post, but with lookup functions that return hard coded values for the classes in the fixture.

You should see this printout:

Notice: env data param_a is 10, env data param_b is 20, 3
Notice: /Stage[main]/Test/Notify[env data param_a is 10, env data param_b is 20, 3]/message: defined 'message' as 'env data param_a is 10, env data param_b is 20, 3'
Notice: module data param_a is 100, module data param_b is 200, env data param_c is 300
Notice: /Stage[main]/Dataprovider::Test/Notify[module data param_a is 100, module data param_b is 200, env data param_c is 300]/message: defined 'message' as 'module data param_a is 100, module data param_b is 200, env data param_c is 300'

Saturday, January 31, 2015

Puppet 4.0 Data in Modules and Environments

In Puppet 4.0.0 there is a new technology-agnostic mechanism for data lookup that makes it possible to provide default values for class parameters in modules and in environments. The mechanism looks first in the "global" data binding mechanism across all environments (i.e. the existing mechanism for data binding, which in practice means hiera, since this is the only available implementation). It then looks for data in the environment, and finally in the module.

The big thing here is that a user of a module does not have to know which implementation the module author has chosen - the module is simply installed (with its dependencies). The user is free to override values using an implementation of their choice (in the environment using the new mechanism, or with the existing data binding / hiera support).

It is expected that an implementation for hiera will also become available in a module.

In this part 1 about the new data binding feature I will show how it can be used in environments and modules. In the next part I will show how to make new data binding implementations.

How does it work?

Out of the box, the new feature:

  • provides module authors with a way to select which data binding implementation to use in their module without affecting how other modules get their data.

  • provides users configuring an environment with a way to select which data binding implementation to use in an environment (or all environments) - different environments can use different implementations, and the environment does not have to use the same implementation as the modules.

  • contains a data binding implementation named 'function' which calls a puppet function that returns a hash of data. The module author can select this mechanism and simply implement the function. A user can also configure an environment to use a function to provide the data - the function is then added to the environment.

  • provides module authors with a way to package and share a data binding implementation in a module. It can be delivered in the same module as regular content, or in a separate module just containing the data binding implementation.

Using a function to deliver data in an environment

This is the easiest, so I am starting with that. Two things are needed:

  • Configuring the environment to state that a function delivers data.
  • Writing the function

configuring the environment

The binding provider to use for an environment can be selected via the environment specific setting environment_data_provider. The value is the name of the data provider implementation to use. In our example this is 'function'. If not set in an environment specific environment.conf, the environment inherits the global setting - which is handy if all your environments work the same way.

writing the function

The function must be written using the 4x function API and placed in a file called lib/puppet/functions/environment/data.rb under the root directory of the environment.

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data()
    # return a hash with key to value mappings 
    { 'abc::param_a' => 'default value for param a in class abc',
      'abc::param_b' => 'default value for param b in class abc',
    }
  end
end

Later in the 4x series of Puppet, it will be possible to also write such functions in the puppet language which makes authoring more accessible.

Note that the name of the function is always environment::data irrespective of what the actual name of the environment is. This is because it would not be good if the name of the function had to change as you test a new environment named 'dev' and later merge it into 'production'.

Using a function to deliver data in a module

The steps to deliver data with a function for a module are different because there are no individual settings for a module. Here are the steps:

  • Creating a binding using the Puppet Binder to declare that the module should use the 'function' data provider for this module.
  • Writing the Function

Note that in the future, the data provider name may be made part of the module's metadata. This is however not the case in the Puppet 4.0.0 release.

writing the binding

The binding is very simple as it is all boilerplate except for the name of the module and the name of the data provider implementation - 'mymodule' and 'function' in the example below. The name of the file is lib/puppet/bindings/mymodule/default.rb where the mymodule part needs to reflect the name of the module it is placed in. (The file is always called 'default.rb' since it contains the default puppet bindings for this module).

# <moduleroot>/lib/puppet/bindings/mymodule/default.rb
#
Puppet::Bindings.newbindings('mymodule::default') do
  bind {
    name         'mymodule'            # name of the module this is placed in
    to           'function'            # name of the data provider
    in_multibind 'puppet::module_data' # boiler-plate
  }
end

writing the function

This is exactly the same as for the environment, but the function is named mymodule::data where mymodule is the name of the module this function provides data for. The file name is lib/puppet/functions/mymodule/data.rb

# <moduleroot>/lib/puppet/functions/mymodule/data.rb
#
Puppet::Functions.create_function(:'mymodule::data') do
  def data()
    # Return a hash with parameter name to value mapping
    { 'mymodule::abc::param_a' => 'default value for param a in class mymodule::abc',
      'mymodule::abc::param_b' => 'default value for param b in class mymodule::abc',
    }
  end
end

Overriding a parameter in the environment

As you may have figured out already, it is easy to override the module's data in the environment. As an example we may want to provide a different value for mymodule::abc::param_b at the environment level. This is how that would look:

# <environment-root>/lib/puppet/functions/environment/data.rb
#
Puppet::Functions.create_function(:'environment::data') do
  def data() 
    { # ... other keys and values
      'mymodule::abc::param_b' => 'env specific value for param b in class mymodule::abc',
    }
  end
end

Getting the data

To get the data, there is absolutely nothing you need to do in your manifests. Just as before, if a class parameter does not have a value, it will be looked up as explained in this blog post. Finally, if there was no value to lookup the default parameter value given in the manifest is used.

Using the examples above - if you have this in your init.pp for the mymodule module:

class mymodule::abc($param_a, $param_b) {
  notice $param_a, $param_b
}

the two parameters $param_a and $param_b will be given their values from the hashes returned by the data functions, looking up mymodule::abc::param_a, and mymodule::abc::param_b.
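The order in which the layers are consulted can be pictured with a small plain-Ruby sketch. It is not Puppet's lookup code; the helper resolve and the three hashes are made up, and the only thing taken from this post is the order: global first, then environment, then module, and finally the default from the manifest.

```ruby
# Made-up data for the three layers, searched in this order.
global_data      = {}
environment_data = { 'mymodule::abc::param_b' => 'env specific value for param b' }
module_data      = {
  'mymodule::abc::param_a' => 'default value for param a',
  'mymodule::abc::param_b' => 'default value for param b'
}

# Hypothetical helper: the first layer that has the key wins; otherwise the
# default from the manifest is used.
def resolve(key, layers, manifest_default = nil)
  layer = layers.find { |data| data.key?(key) }
  layer ? layer[key] : manifest_default
end

layers  = [global_data, environment_data, module_data]
param_a = resolve('mymodule::abc::param_a', layers)
param_b = resolve('mymodule::abc::param_b', layers)
param_c = resolve('mymodule::abc::param_c', layers, 'manifest default')
```

In this sketch param_a comes from the module, param_b from the environment (which overrides the module), and param_c falls through to the default given in the manifest.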

Note that there is no need to use the "params pattern" now in common use in modules for Puppet 3x!

More about Functions

Since the new 'function' data provider is based on the general concept of calling functions and you can call other functions from them, you have a very powerful mechanism to help you organize data and to do advanced composition.

The data function is called once during a compilation for the purpose of producing a Hash with qualified name strings to data values. The function body can call other functions, use expressions, transformations, composition etc. When the data binding kicks in, it will call the function on the first request to get a parameter in the compilation, it will then cache the returned hash and reuse it for lookup of additional parameters (this in contrast to calling the function for each and every parameter which would be much slower).

Note that the data function can be called like any other function! This means that a module or environment can use another module's data function, transform it etc. before using its data.

Naturally, since we are dealing with functions it is easy to divide the composition of data into multiple functions, and then hierarchically compose them. Say that we want to divide the data up into two parts, one for osfamily, and one for common and we then want to combine them. We can now do a simple function composition and merge the result.

In the examples, the functions are written using the puppet language (even though they are not available in the 4.0.0 release). At the moment, it is left as an exercise to translate them into Ruby. What I want to show here is the power of combining data with functions without cluttering the examples with what you need to do in Ruby to get variables in scope, call other functions etc.

Data Composition with Puppet functions

When we add support for functions in the Puppet Language data composition can look like this:

function mymodule::data {
  mymodule::common() + mymodule::osfamily()
}

function mymodule::osfamily() {
  case $osfamily {
    'Debian' : {
      { mymodule::abc::param_a => 'the debian value for a' }
    }
    'Darwin': {
      { mymodule::abc::param_a => 'the osx value for a' }
    }
    default: {
      { }  # empty hash
    }
  }
}

function mymodule::common() {
  { mymodule::abc::param_a => 'the default for param a',
    mymodule::abc::param_b => 'the default for param b',
  }
}

Naturally, the functions called from the data function can take parameters. The data() function itself however does not take any parameters.
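Until functions can be written in the Puppet Language, the same composition could be approximated in Ruby. The sketch below is deliberately plain Ruby with no Puppet function API around it, so the method names are hypothetical and osfamily is passed in as an argument instead of being read from the scope.

```ruby
# Hypothetical plain-Ruby counterparts of the Puppet functions above.
def common_data
  { 'mymodule::abc::param_a' => 'the default for param a',
    'mymodule::abc::param_b' => 'the default for param b' }
end

# osfamily is passed in here; a real function would read it from the scope.
def osfamily_data(osfamily)
  case osfamily
  when 'Debian' then { 'mymodule::abc::param_a' => 'the debian value for a' }
  when 'Darwin' then { 'mymodule::abc::param_a' => 'the osx value for a' }
  else {}
  end
end

# Hash#merge lets the right-hand side win, like + on hashes in Puppet.
def module_data(osfamily)
  common_data.merge(osfamily_data(osfamily))
end

debian = module_data('Debian')
other  = module_data('RedHat')
```

For 'Debian' the osfamily value replaces the common value of param_a while param_b keeps its common default; for an osfamily without specific data the result is just the common hash.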

Example - Module with multiple use cases

A module author wants to provide a set of default values for a base use case of the module, but also wants to offer defaults for other use cases. Clearly, there can only be one set of defaults applied at any given time, and the data() function in a module is for that module only, so these defaults must be provided at a higher level i.e. in the environment (where it is known how the module is getting used). If the environment is also using the function data provider, it is very simple to achieve this:

function environment::data() {
  # merge usecase_x from module with the overrides
  mymodule::usecase_x() + {
    mymodule::abc::param_b => 'default from environment for param_b'
  }
}

This illustrates that mymodule has a special data function named mymodule::usecase_x() that provides an alternate set of default values for classes inside mymodule; these are then overridden with a hash of specific overrides wanted in this environment.

Example - Hierarchical keys

If you find it tedious to retype mymodule::classname::foo, mymodule::classname::bar, etc. etc. you can instead construct the keys programmatically. Since the "data functions" are general functions, variables and interpolation can be used - e.g:

function mymodule::data() {
  $m = 'mymodule::abc'
  { "${m}::param_a" => 'the value', 
    ...
  }
}

Or why not call a function that reorganizes a hierarchical hash; say that we have param_a in classes a::b::x, a::b::y, and a::b::z, we could then do something like this:

function mymodule::data() {
  $hierarchical = { 
    a => {
      b => {
        x => { param_a => 'default for a::b::x::param_a' },
        y => { param_a => 'default for a::b::y::param_a' },
        z => { param_a => 'default for a::b::z::param_a' },
  }}}
  # Calling a function that expands the hash (left as an exercise)
  expand_hierarchical_keys($hierarchical)
}
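One possible take on the exercise, written as a plain Ruby method, is sketched below. It assumes that every nested hash is another level of the name, so a parameter whose value is itself a hash cannot be expressed this way; that limitation is a choice made for the sketch.

```ruby
# Flattens a nested hash into a hash with '::'-separated keys.
# Assumption: every Hash value is a namespace level, never a parameter value.
def expand_hierarchical_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), result|
    name = prefix ? "#{prefix}::#{key}" : key.to_s
    if value.is_a?(Hash)
      result.merge!(expand_hierarchical_keys(value, name))
    else
      result[name] = value
    end
  end
end

hierarchical = {
  'a' => {
    'b' => {
      'x' => { 'param_a' => 'default for a::b::x::param_a' },
      'y' => { 'param_a' => 'default for a::b::y::param_a' }
    }
  }
}

flat = expand_hierarchical_keys(hierarchical)
```

The result maps 'a::b::x::param_a' and 'a::b::y::param_a' to their respective values.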

Trying out this new feature

When this is written, the new data binding feature is available in the nightlies for Puppet 4.0.0, or you can run it from source using Puppet's master branch. (The new feature will not be available for 3x with future parser). If you are reading this after Puppet 4.0.0 has been released, just get the release.

Summary

The new data provider mechanism is a technology agnostic way of defining default data for modules and environments without dictating that a particular technology is used by the users of a module.

The new mechanism comes with a built in implementation based on functions that provides a simple yet powerful way of delivering, using and composing data. Functions in Ruby provide a simple way to extend the functionality without having to write a complete data provider.

The function mechanism is relatively easy to use from Ruby for delivering data, since the functions consist mostly of boilerplate code, but it will become much more powerful and accessible when functions can be written in the Puppet Language.

In the next post about the new data binding feature I will show how to write a new implementation of a data provider.

Sunday, January 25, 2015

The Puppet 4x Function API - part 2

In the first post about the 4x Function API I showed the fundamentals of the new API. In this post I am going to show how you can write more advanced functions that take a code block / lambda as an argument and how you can call this block from Ruby. This can be used to create your own iterative functions or functions that make it possible to write puppet code in a more function oriented style.

Accepting a Code Block / Lambda

A 4x function can accept a code block / lambda. You can make it required by calling required_block_param in the definition of the dispatcher, or optional by calling optional_block_param.

Here is an example of a simple function called then, that takes one argument and a block and calls the block with the argument unless the argument is nil.

Puppet::Functions.create_function(:then) do
  dispatch :then do
    param 'Any', :x
    required_block_param
  end

  def then(x)
    x.nil? ? nil : yield(x)
  end
end  

Note that: Puppet blocks are passed the same way as Ruby blocks are and we can simply yield to the given block. Just as with Ruby blocks, the block can be captured in a parameter by having a &block parameter last, the block_given? method can be used, etc.
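The three Ruby mechanisms mentioned in the note can be tried out in plain Ruby, outside of Puppet::Functions.create_function. The method names below are made up for this sketch; only yield, the &block parameter, and block_given? are the point.

```ruby
# 1. yield to the given block directly (here twice in a row)
def twice(x)
  yield(yield(x))
end

# 2. capture the block in a parameter and call it explicitly
def apply(x, &block)
  block.call(x)
end

# 3. an optional block, detected with block_given?
def value_or_transformed(x)
  block_given? ? yield(x) : x
end

a = twice(3) { |n| n + 1 }
b = apply(3) { |n| n * 2 }
c = value_or_transformed(3)
d = value_or_transformed(3) { |n| n - 1 }
```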

The then function is useful when looking up a nested value in a hash as it removes the need to check intermediate results for undef. Say there may or may not be a value in a $hash at $hash[a][b][c], and we just want that value, or undef if either a, b, or c is not found, instead of an error if we, say, try to look up c in undef (if b did not exist).

Instead we use the then function we just defined - like this:

$result = $hash
 .then |$x| { $x[a] }
 .then |$x| { $x[b] }
 .then |$x| { $x[c] }

And for completeness, if you were to write that without the function, you end up with something like this:

$result =
if $hash[a] != undef and $hash[a][b] != undef and $hash[a][b][c] != undef {
  $hash[a][b][c]
}

...or worse if you start using variables for the intermediate steps

The block's number of parameters and their types

If nothing is specified about the number of parameters and types expected in the accepted block, the user can give the function any block. This is what you get by just calling required_block_param, or optional_block_param. You still get type checking, but this takes place when the block is called.

If you want to involve the number of parameters and their types in the dispatching - i.e. selecting which Ruby method to call based on what the user defined in the block - you can do so by stating the Callable type of the block. (The Callable type was added in Puppet 3.7, and is described in this blog post.) In brief, Callable[2,2] means something that can be called with exactly two arguments of any type.

Here is the dispatcher part of the each function (from Puppet source code):

Puppet::Functions.create_function(:each) do
  dispatch :foreach_Hash_2 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Hash_1 do
    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[1,1]', :block
  end

  dispatch :foreach_Enumerable_2 do
    param 'Any', :enumerable
    required_block_param 'Callable[2,2]', :block
  end

  dispatch :foreach_Enumerable_1 do
    param 'Any', :enumerable
    required_block_param 'Callable[1,1]', :block
  end

  def foreach_Hash_1(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(enumerator.next)
    end
    # produces the receiver
    hash
  end

And to be complete, here are the methods the dispatchers call - the actual implementation of the each function. As you can see, each variation on how this function can be called (with an Array, a Hash, or a String, and with one or two block parameters) is now handled in a small and precise method. It is really just Hash that needs special treatment; all others are handled as enumerables, i.e. whatever the Puppet Type System has defined as something that can be enumerated / iterated over in the Puppet Language.

  def foreach_Hash_2(hash)
    enumerator = hash.each_pair
    hash.size.times do
      yield(*enumerator.next)
    end
    # produces the receiver
    hash
  end

  def foreach_Enumerable_1(enumerable)
    enum = asserted_enumerable(enumerable)
    begin
      loop { yield(enum.next) }
    rescue StopIteration
    end
    # produces the receiver
    enumerable
  end

  def foreach_Enumerable_2(enumerable)
    enum = asserted_enumerable(enumerable)
    index = 0
    begin
      loop do
        yield(index, enum.next)
        index += 1
      end
    rescue StopIteration
    end
    # produces the receiver
    enumerable
  end

  def asserted_enumerable(obj)
    unless enum = Puppet::Pops::Types::Enumeration.enumerator(obj)
      raise ArgumentError, ("#{self.class.name}(): wrong argument type (#{obj.class}; must be something enumerable.")
    end
    enum
  end
end
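
The external-enumerator pattern used in the two Enumerable methods can be tried outside Puppet. The sketch below is a simplified stand-in, not the Puppet code: each_with_position is a hypothetical name, and it uses Ruby's own each to obtain an enumerator instead of Puppet::Pops::Types::Enumeration. It also relies on the fact that Kernel#loop itself rescues StopIteration, so no explicit rescue is needed here.

```ruby
# Hypothetical stand-in for foreach_Enumerable_2: yield (index, value)
# for each element, pulling values from an external enumerator.
def each_with_position(enumerable)
  enum = enumerable.each # an Enumerator, since no block is given
  index = 0
  # Kernel#loop ends quietly when enum.next raises StopIteration.
  loop do
    yield(index, enum.next)
    index += 1
  end
  # Produces the receiver, like the original methods.
  enumerable
end

seen = []
result = each_with_position(%w[x y z]) { |i, v| seen << [i, v] }
```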

What about Dependent Types and Type Parameters?

If you read the above example carefully, or if you already are used to working with a rich type system, you may wonder about type parameters and whether it is possible to use dependent types.

The short answer is no. While the Puppet type system is capable of describing rich types, we have not added the ability to use type parameters. They would be really useful - take the hash example, where instead of:

    param 'Hash[Any, Any]', :hash
    required_block_param 'Callable[2,2]', :block

could specify that the block must accept the key and value type of the given Hash - e.g. something like:

    param 'Hash[K Any, V Any]', :hash
    required_block_param 'Callable[K,V]', :block

This however requires quite a lot of complexity both in the type system itself and what users are exposed to. (The syntax has to be something more elaborate than what is shown above since the references to K and V must naturally find the declared K and V somehow - in the sample that is solved by magic :-).

If we do provide a mechanism to reference the type parameters of the actual types given in a call, we could fully support dependent types. As an example, this would enable declaring that a function takes two arrays of equal length.
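
Until something like that exists, a constraint such as "two arrays of equal length" has to be checked inside the function body. A minimal plain-Ruby sketch of such a runtime check follows; pairwise_sum is a hypothetical function made up for this illustration and is not part of Puppet.

```ruby
# Hypothetical function whose contract "both arrays have the same length"
# cannot be stated in the signature, so it is asserted at runtime instead.
def pairwise_sum(left, right)
  unless left.size == right.size
    raise ArgumentError,
          "expected arrays of equal length, got #{left.size} and #{right.size}"
  end
  left.zip(right).map { |l, r| l + r }
end

sums = pairwise_sum([1, 2, 3], [10, 20, 30])
```

With dependent types, the mismatch would instead be reported when the call is matched against the signature, before the method body is entered.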

How about Return Type?

Return type is also something we decided to leave out for the time being. In hindsight it should have been added from the start, as it enables both advanced type inference and type checking. For this reason we may add it to the dispatch API early in the 4.x series. The most difficult part will be figuring out the syntax for the Callable type, since it also needs to be able to describe the return type of the callable.