Saturday, May 7, 2016

Converting and Formatting Data Like a Pro With Puppet 4.5.0

Before Puppet 4

Before Puppet 4.0.0 there were basically only the data types String, Boolean, Array, Hash, and Undef. Most notably missing were the numeric types (Numeric, Integer, and Float). In Puppet 4.0.0 those and many other types were defined and implemented in a proper type system. This was all good, but a few practical problems were not solved, namely data conversion. In Puppet 4.5.0 there is a new feature that will greatly help with this task. But first let's look at the state of what is available in prior versions.

Converting String to Number - the current way

The most concrete example is having to convert a String to Numeric. This is not always required, since Puppet performs arithmetic on Strings that look like numbers, but that does not work for all operations.

The scanf function was added to handle general conversion. Thus if $str_nbr is a numeric value in string form you can convert it like this:

$nbr = scanf($str_nbr, "%d")[0]

That is quite a lot of excess violence to get the value because scanf is a general purpose function that can do lots of things:

  • get many values at once (hence the need to pick the first value from the result)
  • values can be embedded in text that is ignored
  • there are many formats to choose from but no defaults
  • if the conversion failed, the result is simply an empty array, so extra code is needed to validate the result and raise an error.

There is a much easier way to do the same, and this is now the idiomatic way of converting a numeric string:

$nbr = $str_nbr + 0   # makes it an integer
$nbr = $str_nbr + 0.0 # makes it a float

This works because Puppet automatically transforms strings that look like numeric information, and because the + operator cannot be used to concatenate strings. If the string is not numeric, an error with a reasonable error message is displayed. This approach still leaves a few questions open:

  • what if the string is octal but does not start with a 0 ?
  • what if the string is in hex but does not start with 0x, and the actual string does not have any of the letters A-F in it?
  • what if the string is in binary format?

Converting String to Boolean - the current way

Booleans in string form are also a bit tricky to convert. Since Puppet 4.0.0 the idiomatic way would be:

$bool = case $str_bool {
  "true" : { true }
  "false": { false }
  Boolean : { $str_bool }
  default : { fail("'$str_bool' cannot be converted to Boolean") }
}

Again, a lot more typing than what is necessary. In the above example, you may also want other values to be considered false/true like the empty string, an empty array, the literal value undef, etc. - they are easily added in the case expression. (You can write the above in several different ways. Instead of capturing all booleans in a case option, the literal values true and false could be listed as alternative case options in the two entries above, that is using "true", true : { true }. The result would be the same.)

Note that the example works because string matching is case independent, so the above also covers "True"/"False", "tRuE"/"falSE" etc. If you do not want that, it is trickier and we would need to use regular expressions to match the strings.
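A minimal sketch of such a case sensitive variant, assuming that only the exact lower case strings should be accepted. The Boolean option is listed first so that real booleans never reach the regular expressions:

```puppet
$bool = case $str_bool {
  Boolean   : { $str_bool }   # already a boolean, pass it through
  /^true$/  : { true }        # regular expressions match case sensitively
  /^false$/ : { false }
  default   : { fail("'${str_bool}' cannot be converted to Boolean") }
}
```

With this version "True" and "tRuE" end up in the default option and fail.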

If you have lots of boolean conversions going on, you can package it up as a reusable function:

function mymodule::to_boolean($str_bool) {
  # the case expr from previous example goes here
}
# and then convert like this:
$bool = $str_bool.mymodule::to_boolean()

While this works, it leads down a path to a flea-market of functions for conversion to and from, this or that (just look at the stdlib module which has quite a large number of such functions).

‘New’ is the New ‘New Way’

In Puppet 4.5.0 there is a function called new. It unsurprisingly creates a new instance of a type, which means you can write something like:

$num = Integer.new($str_num)

Added in Puppet 4.5.0 is also the ability to directly “call a type” - and this means calling the new() function on this type. We can thus shorten the above example to this:

$num = Integer($str_num)

This works for most types, but some are ambiguous like Variant types, or Undef which you really do not have to convert to, or Scalar which is also ambiguous.

The Boolean conversion from before can now be written like this:

$bool = Boolean($str_bool)

More Coolness with New

Each type defines what the arguments to its new operation are. Typically they accept (in addition to the value to convert) a format specification that is compatible with what is used in functions like sprintf and scanf - but the set of formats has been expanded to suit the puppet language. Some conversions have other specific arguments. The entire set of options and what they mean can be found in the documentation of the new() function 1 - here are some examples:

$a_number = Integer("0xFF", 16)  # results in 255 (base 16)
$a_number = Integer("FF", 16)    # results in 255 (base 16)
$a_number = Integer("010")       # results in 8
$a_number = Integer("010", 10)   # results in 10 (base 10)
$a_number = Integer("true")      # results in 1
$a_number = Numeric("true")      # results in 1
$a_number = Numeric("0xFF")      # results in 255
$a_number = Numeric("010")       # results in 8
$a_number = Numeric("3.14")      # results in 3.14 (a float)

$a_bool = Boolean("yes")         # results in true
$a_bool = Boolean(1)             # results in true
$a_bool = Boolean(0)             # results in false

As you can see the conversions are flexible - you get the number 1 back for the string "true" (and likewise 0 for a boolean false). This is by design - the conversion tries its best to convert what it was given to the type you wanted.
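The radix argument also takes care of the questions raised earlier about octal, hex, and binary strings that lack a prefix. A small sketch, assuming the radix values 8, 16, and 2 are given explicitly as in the Integer examples above (the variable names are only for illustration):

```puppet
$from_octal  = Integer('777', 8)    # 511
$from_hex    = Integer('10', 16)    # 16, even though there are no letters A-F
$from_binary = Integer('1010', 2)   # 10
notice("${from_octal} ${from_hex} ${from_binary}")
```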

Conversion performs assertion

When a conversion is performed it always ends with an assertion that the created value matches the type as in this example:

$port = Integer[1024]($some_string)

This will result in an error if the result is not an integer >= 1024.

To enable easier handling of optional/faulty values, if the type is Optional[T], the assertion that is made accepts an undef result, and the conversion will not error on faulty input and instead yields an undef result.
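To try the port assertion out, here is a short sketch. The values '8080' and '80' are made up for the illustration:

```puppet
$some_string = '8080'
$port = Integer[1024]($some_string)   # 8080, passes the assertion
notice($port)

# Uncommenting the next line should make the compilation fail, since
# 80 is an Integer but not an Integer[1024]:
# $low_port = Integer[1024]('80')
```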

Conversion with Array and Hash

It is possible to convert between arrays and hashes. Here it is also possible to use Struct and Tuple types since those perform additional type assertion of the result.

$an_array = Array({a => 10, b => 20}) # results in [[a, 10],[b, 20]]
$a_hash = Hash([1,2,3,4])             # results in {1=>2, 3=>4}
$a_hash = Hash([[1,2],[3,4]])         # results in {1=>2, 3=>4}

The Array conversion also has a short form for "make it an array if it is not already an array" by adding a boolean true argument:

$an_array = Array(1, true)    # results in [1]
$an_array = Array([1], true)  # results in [1]
$an_array = Array(1)          # error, cannot convert
$an_array = Array({1 => 2}, true) # [{1 => 2}]
$an_array = Array({1 => 2})   # [[1, 2]]
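The short form is handy when a value may be given either as a single item or as an array of items. A sketch, where $packages is a hypothetical variable and not something from the examples above:

```puppet
# $packages could just as well have been ['ntp', 'openssh']
$packages = 'ntp'

Array($packages, true).each |$pkg| {
  notice("would manage package ${pkg}")
}
```

Since the value is wrapped only when it is not already an array, the same code works for both forms of input.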

Conversion to String

Conversion to String has the most features. There are many different formats to choose from per type, and it supports mapping type to format for nested structures. That is, different formats can be used for values in arrays and hashes, if arrays/hashes are short or long etc. Several blog posts would be needed to cover all of the functionality, so here are some examples:

String(undef)        # produces "" (empty string)
String(undef, '%d')  # produces "NaN" (we asked for a number)

$data = [1, 2, undef]
String($data)        # produces '[1, 2, undef]'

# A format map defines type to format mappings, for
# array and hash, there is a specific map for contents
# that is applied recursively.
# (See documentation for full information).
#
$formats = { Array => {
  format => '%(a',
  string_formats => {
    Integer => '%#x',
    Undef => '%d'
}}}

String($data, $formats) # produces '(0x1, 0x2, NaN)'

# Formatting with indentation
String([1, [2, 3], 4], "%#a")
# produces:
# [1,
#  [2, 3],
#  4]

Conversion is easy to use in interpolation

Typical use of formatting is when interpolating values into strings. The normal interpolation uses a default string conversion mechanism and this does not always give what you want.
Using the new() function is especially convenient when flattening, or unrolling arrays into strings as the String conversion provides full control over start/end delimiters and separators.

$things = [
  'Cream colored ponies',
  'crisp apple strudels',
  'door bells',
  'sleigh bells',
  'schnitzel with noodles'
]
notice "${String($things,'% a')}. These are a few of my favourite things."

would notice

"Cream colored ponies", "crisp apple strudels", "door bells", "sleigh bells", "schnitzel with noodles". These are a few of my favourite things.

Not exactly what we wanted. We did get an array join with separator ", " by default, the format "% a" removed the start and end delimiters from the array, but we got quotes around the favourite items. Also to make this read like the Mary Poppins song, we like to insert the word “and”. So, here is the next version where we define the format to use:

$formats = { Array => {
  format         => '% a',
  separator      => ', and ',
  string_formats => {
    # %s is unquoted string
    String => '%s',  
}}}
notice "${String($things, $formats)}. These are a few of my favourite things."

would notice:

Cream colored ponies, and crisp apple strudels, and door bells, and sleigh bells, and schnitzel with noodles. These are a few of my favourite things.

And just for the fun of it - let's turn that into a function.

function silly::mary_poppinsify(String *$str) {
  $formats = {
    Array => {
      format         => '% a',
      separator      => ', and ',
      string_formats => {
        String => '%s',
  }}}
  # $str is an array holding all of the given arguments
  "${String($str, $formats)}. These are a few of my favourite things."
}

So, finally, with a personal touch:

notice silly::mary_poppinsify(
  "Keys on pianos",
  "food in a bento",
  "progressive metal",
  "solos by Argento", 
)

(Printout left as an exercise).

Read more about type conversion in the specifications repository, where each type is documented, for instance String.new. The other types are in the same document.

When Puppet 4.5.0 is released this information will also show up in the regular documentation for function new().

Notes on a couple of advanced things

The String format map is processed in such a way that the formats given when calling new() are merged with the default formats. This merge takes type specificity into account such that types that are more specific have higher precedence. For example, if the value to format matches two formats, one for type T and another for type T2, and T2 < T, then the format for T2 will be used. Take for example {Any => '%p', Numeric => '%#d'}, which means that all values are formatted in programmatic form (strings are quoted, arrays and hashes have puppet language style delimiters, etc.), and all numeric values in quoted numeric form (that is "10", instead of the default %p which would have resulted in just 10 without quotes).

Summing Up

The new() function supports creating new objects / values which can be used for data type transformation / casting and formatting. As you probably noticed, simple and common things are easily achieved while more complex things are possible. Conversions have become far more important in the Puppet Language now that there is EPP (templates in the puppet language), where the result is often some kind of configuration file with its own syntax and picky rules - so the details do matter.

The idea behind the more complex formats, and alternatives is to provide a rock bottom implementation that can be used to implement custom functions in the Puppet Language that can be reused in manifests as well as in templates.

There are probably a few common conversion tasks that occur frequently enough to warrant a format flag of their own that I failed to include in the first implementation. When writing this blog post for instance, it would have been nice if there was a format for "array with all things in it in %s format and no delimiters"; but then I would not have been able to show how that is done in long format. File tickets with wishes, or make Pull Requests with code, as they are always welcome.

Hope you find this supercalifragilisticexpialidociously useful.


  1. Since 4.5.0 is not yet officially released, you can read the documentation in the source for new.rb, or in the specifications per type (link to String.new).

Thursday, May 5, 2016

Digging out data in style with puppet 4.5.0

In Puppet 4.5.0 there are a couple of new functions, dig, then and lest, that together with the existing assert_type and with functions make it easy to do a number of tasks that earlier required conditional logic and temporary variables.

You typically run into a problem in programming languages in general when you are given a data structure consisting of hashes/arrays (or other objects), and you need to “dig out” a particular value, but you do not know if the path you want from the root of the structure actually exists.

Say you are given a hash like this:

$data = {
  persons => {
    'Henrik' => {
      mother => 'Anna-Greta',
      father => 'Bengt',
    },
    'Anna-Greta' => {
          mother => 'Margareta',
          father => 'Harald',
          children => ['Henrik', 'Annika']
    },
    'Bengt' => {
      mother => 'Maja',
      father => 'Ivar'
    },
    'Maja' => {
      children => ['Bengt', 'Greta', 'Britta', 'Helge']
    },
  }
}

Now, you would like to access the first child of ‘Anna-Greta’ (in case you wonder this is part of my family tree). This is typically done like this in Puppet:

$first_child = $data['persons']['Anna-Greta']['children'][0]

Which will work just fine (and set $first_child to 'Henrik') given the $data above. But what if there was no ‘Anna-Greta’, or no ‘children’ keys? We would get an undef result, and the next access would fail with an error.

To ward off the evil undef you would have to break up the logic and test at every step. For example, something like this:

$first_child = 
if $data['persons']
   and $data['persons']['Anna-Greta']
   and $data['persons']['Anna-Greta']['children'] =~ Array {
     $data['persons']['Anna-Greta']['children'][0]
   }

Is what you end up having to do. (Not nice).

This is where the dig function comes in. Using dig the same is done like this:

$first_child = $data.dig('persons', 'Anna-Greta', 'children', 0)

Which automatically handles all the conditional logic. (Yay). If one step happens to
result in an undef value, the operation stops and undef is returned. If this was all we wanted to do, we would be done. But what if we require that the outcome is not undef, or if we wanted a default value as the result if it was undef?

There is already the function assert_type that can assert the result (and optionally return a new value if the assertion fails). If we use that we can write:

$first_child = NotUndef.assert_type(
  $data.dig('persons', 'Anna-Greta', 'children', 0)
)

Which would give us an automated error like "expected a NotUndef value". While functional, we can do better by customizing the error:

$first_child = NotUndef.assert_type(
  $data.dig(
    'persons', 
    'Anna-Greta', 
    'children', 
    0)) |$expected_type, $actual_type | {
      fail ("Did not find first child of 'Anna-Greta'")
    }

But that is quite tedious to write because the block given to the assert_type function is designed to take two arguments - the expected type (NotUndef in this example), and the actual type of the argument (in this case Undef). But we already knew that would be the only possible outcome, so there is lots of excess code for this simple (and common) case.

This is where the lest function comes in. It takes one argument, and if this argument matches NotUndef, the argument is returned. Otherwise it will call a code block (that takes no arguments), and return what that returns. Thus, this is a specialized variant of assert_type that makes our task easier. Now we can write:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {
      fail("Did not find first child of 'Anna-Greta'")
  }

Much better - it now reads nicely from left to right, and it is clear what is going on.
If we wanted a default value instead of a custom fail, we can do that:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {'Cain'}

Now - let's do something more difficult. What if we want to use the value
of the first child of Anna-Greta (that is, ‘me’) to find my aunts and uncles on
my father’s side? That is if we first computed $first_child, we would continue with:


$first_childs_father = $data.dig('persons', $first_child, 'father')
$first_childs_fathers_mother = $data.dig('persons', $first_childs_father, 'mother')
$first_childs_fathers_mothers_children =
  $data.dig('persons', $first_childs_fathers_mother, 'children')

That works, but we had to use the temporary variables. To be correct we also need to
remove my father (‘Bengt’) from the set of children returned by the last step.

I am not even going to bother writing that out in longhand to handle all the possible ‘sad’ paths. (Left as an exercise if you have run out of regular navel fluff).

Instead, we are going to write out the entire sequence, now using the function then, which is the opposite of lest. It accepts a single value, and if it matches NotUndef it calls the block with a single argument, and returns what the block returns. If the given value is undef, it simply returns this (to be dealt with by the next step in the chain of calls).

$data.dig('persons', 'Anna-Greta', 'children', 0)
.then |$x| { $data.dig('persons', $x, 'father')}
.then |$x| { $data.dig('persons', $x, 'mother')}
.then |$x| { $data.dig('persons', $x, 'children')}
.then |$x| { $x - 'Bengt' }
.lest | | { fail("Could not find aunts and uncles...") }

We have an obvious flaw here since the name of my father is hard coded.
There is also no handling of the ‘sad’ path of ‘children’ not being an Array as we did
not type the data.

For the final example, let's make this into a generic function that finds the aunts and uncles on the father's side of any mother's first child.

We then end up with this function that performs five distinct steps:

function custom_family_search(String $mother) {
  # 1. start by finding the mother's children and pick the first
  $data.dig('persons', $mother, 'children', 0)

  # 2. Get the father of the child (needs to be looked up since
  #    $x here is just the name of the person).
    .then |$x| { $data.dig('persons', $x, 'father') }

  # 3. Look up the siblings of found father, and return those
  #    as well as the father (needed to eliminate father in
  #    the next step). ($x is father from previous step).
    .then |$x| { [ $data.dig(
                      'persons', 
                      $data.dig('persons', $x, 'mother'),
                      'children'),
                    $x
                  ] }

  # 4. Eliminate father from siblings
  # Previous step is never undef since we construct an array,
  # but the first slot in the array may be undef, or something that
  # is not an array! Thus, we don't need the conditional 'then'
  # function, and can instead use the 'with' function.
  # A 'case' expression is used to match the 'happy' path where the
  # name of the father is 'subtracted'/removed
  # from the array of his siblings. The 'sad' path produces
  # 'undef' and lets the next step deal with it.
  #
    .with |$x| { case $x {
                 [Array[String], String] : { $x[0] - $x[1] }
                 default                 : { undef }
                 }
               }
   # 5. we fail if we did not get a result
   #
    .lest | | { fail("Could not find aunts and uncles...") }

  # Function returns the value of the last call in the chain
}

notice custom_family_search('Anna-Greta')

And now we can test:

> puppet apply blog.pp
Notice: Scope(Class[main]): [Greta, Britta, Helge]

Full Final Example Source.

In Summary:

  • dig - digs into structure with mix of hash keys and array indexes, may return undef
  • then - calls the block on the ‘happy’ path, undef otherwise
  • lest - calls the block on the ‘sad’ path, given value otherwise
  • with - (unconditional), passes on its given value to the block and returns its result
  • assert_type - checks path is ‘happy’ (matches type) and calls block on ‘sad’ path
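The summary can be condensed into a small sketch. The hash $h and the variable names are made up for this illustration, and the comments show the value each call is expected to produce:

```puppet
$h = { 'a' => [10, 20] }

$found   = $h.dig('a', 1)                        # 20
$missing = $h.dig('b', 1)                        # undef, there is no 'b' key
$doubled = $h.dig('a', 1).then |$x| { $x * 2 }   # 40, block called on 'happy' path
$skipped = $h.dig('b', 1).then |$x| { $x * 2 }   # undef, block is not called
$default = $h.dig('b', 1).lest | | { 0 }         # 0, block called on 'sad' path
$always  = $h.dig('a', 0).with |$x| { $x + 1 }   # 11, block is always called
$checked = Integer.assert_type($h.dig('a', 0))   # 10, the value matches Integer

notice("${found} ${doubled} ${default} ${always} ${checked}")
```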

In case you wonder about the lines of code that start with a period like this:

.then ...

This is simply a continuation from the line above - puppet is generally not whitespace significant (with only a few exceptions). Thus it does not matter where the ‘.’ is placed.
I choose to align the .then steps to make it readable. If you have something short
you can make it a one-liner:

# These are all the same
$x = $facts['myfact'].lest | | { 'default value for myfact'}

$x = $facts['myfact']  .  lest | | { 'default value for myfact'}

$x = $facts['myfact']
     .lest | | { 'default value for myfact'}

$x = $facts['myfact'].
     lest | | { 'default value for myfact'}

Hope this will be useful for you, and that it gives you an additional tool in your Puppet language toolchest.

This was also the first time I used StackEdit to write a blog post. I hope all the formatting of code turns out ok.

Best,