Saturday, May 7, 2016

Converting and Formatting Data Like a Pro With Puppet 4.5.0

Before Puppet 4

Before Puppet 4.0.0 there were basically only five data types: String, Boolean, Array, Hash, and Undef. Most notably missing were the numeric types (Numeric, Integer, and Float). In Puppet 4.0.0 those and many other types were defined and implemented in a proper type system. This was all good, but a few practical problems remained unsolved, data conversion chief among them. Puppet 4.5.0 adds a new feature that greatly helps with this task. But first, let's look at what is available in prior versions.

Converting String to Number - the current way

The most concrete example is having to convert a String to Numeric. This is not always required, since Puppet performs arithmetic on Strings that look like numbers, but that does not work for all operations.

The scanf function was added to handle general conversion. Thus if $str_nbr is a numeric value in string form you can convert it like this:

$nbr = scanf($str_nbr, "%d")[0]

That is quite a lot of excess violence to get the value because scanf is a general purpose function that can do lots of things:

  • get many values at once (hence the need to pick the first value from the result)
  • values can be embedded in text that is ignored
  • there are many formats to choose from but no defaults
  • if the conversion failed, the result is simply an empty array, so extra code is needed to validate the result and raise an error.
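
The validation burden described in the last bullet can be sketched as follows. This is a minimal illustration, not a recommended pattern; the sample value of $str_nbr and the error message are made up for the example.

```puppet
# Hypothetical input; in practice this would come from a fact or a parameter.
$str_nbr = '8080'

# scanf returns an array of converted values, or an empty array on failure.
$scanned = scanf($str_nbr, '%d')
if $scanned == [] {
  fail("'${str_nbr}' cannot be converted to Integer")
}
$nbr = $scanned[0]
```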

There is a much easier way to do the same, and this is now the idiomatic way of converting a numeric string:

$nbr = $str_nbr + 0   # makes it an integer
$nbr = $str_nbr + 0.0 # makes it a float

This works because Puppet automatically converts strings that look like numbers, and because the + operator cannot be used to concatenate strings. If the string is not numeric, an error with a reasonable message is raised. The trick has its limits, though:

  • what if the string is octal but does not start with a 0 ?
  • what if the string is in hex but does not start with 0x, and the actual string does not have any of the letters A-F in it?
  • what if the string is in binary format?

Converting String to Boolean - the current way

Booleans in string form are also a bit tricky to convert. Since Puppet 4.0.0 the idiomatic way would be:

$bool = case $str_bool {
  "true" : { true }
  "false": { false }
  Boolean : { $str_bool }
  default : { fail("'$str_bool' cannot be converted to Boolean") }
}

Again, this is a lot more typing than should be necessary. In the above example, you may also want other values to be considered false/true, like the empty string, an empty array, the literal value undef, etc. - they are easily added in the case expression. (You can write the above in several different ways. Instead of capturing all booleans in a separate case option, the literal values true and false could be listed as alternatives in the first two options, that is, using "true", true : { true }. The result would be the same.)

Note that the example works because string matching is case independent, so the above also covers “True”/“False”, “tRuE”/“falSE”, etc. If you do not want that, it is trickier, and we would need to use regular expressions to match the strings.
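
A case sensitive variant could look something like the sketch below. It assumes that only the exact lowercase spellings should be accepted; the sample value is hypothetical. The Boolean option is listed first so that real booleans never reach the regular expressions.

```puppet
# Hypothetical input value.
$str_bool = 'true'

$bool = case $str_bool {
  Boolean     : { $str_bool }   # already a boolean, pass it through
  /\Atrue\z/  : { true }        # regular expressions match case sensitively
  /\Afalse\z/ : { false }
  default     : { fail("'${str_bool}' cannot be converted to Boolean") }
}
```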

If you have lots of boolean conversions going on, you can package it up as a reusable function:

function mymodule::to_boolean($str_bool) {
  # the case expr from previous example goes here
}
# and then convert like this:
$bool = $str_bool.mymodule::to_boolean()

While this works, it leads down a path to a flea-market of functions for conversion to and from, this or that (just look at the stdlib module which has quite a large number of such functions).

‘New’ is the New ‘New Way’

In Puppet 4.5.0 there is a function called new. It unsurprisingly creates a new instance of a type, which means you can write something like:

$num = Integer.new($str_num)

Added in Puppet 4.5.0 is also the ability to directly “call a type” - and this means calling the new() function on this type. We can thus shorten the above example to this:

$num = Integer($str_num)

This works for most types, but some are ambiguous like Variant types, or Undef which you really do not have to convert to, or Scalar which is also ambiguous.

The Boolean conversion from before can now be written like this:

$bool = Boolean($str_bool)
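
As a small sketch of how this tidies up typical code, consider settings that all arrive as strings. The $raw hash and its keys are made up for the illustration.

```puppet
# Hypothetical settings that all arrive in string form.
$raw = {
  'port'    => '8080',
  'enabled' => 'true',
  'ratio'   => '0.75',
}

$port    = Integer($raw['port'])     # the integer 8080
$enabled = Boolean($raw['enabled'])  # the boolean true
$ratio   = Float($raw['ratio'])      # the float 0.75
```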

More Coolness with New

Each type defines what the arguments to its new operation are. Typically a type accepts, in addition to the value to convert, a format specification compatible with what is used in functions like sprintf and scanf - but the set of formats has been expanded to suit the Puppet language. Some conversions have other specific arguments. The entire set of options and what they mean can be found in the documentation of the new() function (see note 1) - here are some examples:

$a_number = Integer("0xFF", 16)  # results in 255 (base 16)
$a_number = Integer("FF", 16)    # results in 255 (base 16)
$a_number = Integer("010")       # results in 8
$a_number = Integer("010", 10)   # results in 10 (base 10)
$a_number = Integer("true")      # results in 1
$a_number = Numeric("true")      # results in 1
$a_number = Numeric("0xFF")      # results in 255
$a_number = Numeric("010")       # results in 8
$a_number = Numeric("3.14")      # results in 3.14 (a float)

$a_bool = Boolean("yes")         # results in true
$a_bool = Boolean(1)             # results in true
$a_bool = Boolean(0)             # results in false

As you can see, the conversions are flexible - you get a number 0 back for a boolean false. This is by design - the conversion tries its best to convert what it was given to the type you wanted.

Conversion performs assertion

When a conversion is performed it always ends with an assertion that the created value matches the type as in this example:

$port = Integer[1024]($some_string)

This will result in an error if the result is not an integer >= 1024.

To enable easier handling of optional/faulty values, if the type is Optional[T], the assertion that is made accepts an undef result and the conversion will not error on faulty input and instead yield an undef result.
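
The range assertion can be sketched with both a lower and an upper bound. The sample value is hypothetical, and the upper bound of 65535 is just an assumption about what a valid port is.

```puppet
# Hypothetical input value.
$some_string = '8080'

# Converts to 8080, then asserts that the result is in the range 1024..65535.
$port = Integer[1024, 65535]($some_string)

# By contrast, Integer[1024, 65535]('80') would fail: '80' converts
# to 80 without problems, but 80 is outside of the asserted range.
```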

Conversion with Array and Hash

It is possible to convert between arrays and hashes. Here it is also possible to use Struct and Tuple types since those perform additional type assertion of the result.

$an_array = Array({a => 10, b => 20}) # results in [[a, 10],[b, 20]]
$a_hash = Hash([1,2,3,4])             # results in {1=>2, 3=>4}
$a_hash = Hash([[1,2],[3,4]])         # results in {1=>2, 3=>4}

The Array conversion also has a short form for “make it an array if it is not already an array”, which is selected by adding a boolean true argument:

$an_array = Array(1, true)    # results in [1]
$an_array = Array([1], true)  # results in [1]
$an_array = Array(1)          # error, cannot convert
$an_array = Array({1 => 2}, true) # [{1 => 2}]
$an_array = Array({1 => 2})   # [[1, 2]]
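
The wrapping form is handy when a value may be given either as a single item or as an array of items. A minimal sketch, with a made up variable name and message:

```puppet
# Hypothetical value: either a single name or an array of names.
$packages = 'ntp'

# Wrap only if needed, then iterate the same way in both cases.
Array($packages, true).each |$pkg| {
  notice("would manage package ${pkg}")
}
```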

Conversion to String

Conversion to String has the most features. There are many different formats to choose from per type, and it supports mapping type to format for nested structures. That is, different formats can be used for values in arrays and hashes, if arrays/hashes are short or long etc. Several blog posts would be needed to cover all of the functionality, so here are some examples:

String(undef)        # produces "" (empty string)
String(undef, '%d')  # produces "NaN" (we asked for a number)

$data = [1, 2, undef]
String($data)        # produces '[1, 2, undef]'

# A format map defines type to format mappings, for
# array and hash, there is a specific map for contents
# that is applied recursively.
# (See documentation for full information).
#
$formats = { Array => {
  format => '%(a',
  string_formats => {
    Integer => '%#x',
    Undef => '%d'
}}}

String($data, $formats) # produces '(0x1, 0x2, NaN)'

# Formatting with indentation
String([1, [2, 3], 4], "%#a")
# produces:
# [1,
#  [2, 3],
#  4]
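
For a single value, the format is simply given as a string. A small sketch using the sprintf style integer formats:

```puppet
$plain = String(10)         # produces '10'
$hex   = String(10, '%x')   # produces 'a'
$alt   = String(10, '%#x')  # produces '0xa' (alternate form adds the 0x prefix)
```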

Conversion is easy to use in interpolation

Typical use of formatting is when interpolating values into strings. The normal interpolation uses a default string conversion mechanism and this does not always give what you want.
Using the new() function is especially convenient when flattening, or unrolling arrays into strings as the String conversion provides full control over start/end delimiters and separators.

$things = [
  'Cream colored ponies',
  'crisp apple strudels',
  'door bells',
  'sleigh bells',
  'schnitzel with noodles'
]
notice "${String($things,'% a')}. These are a few of my favourite things."

would notice

"Cream colored ponies", "crisp apple strudels", "door bells", "sleigh bells", "schnitzel with noodles". These are a few of my favourite things.

Not exactly what we wanted. We did get an array join with the separator ", " by default, and the format "% a" removed the start and end delimiters from the array, but we got quotes around the favourite items. Also, to make this read like the Mary Poppins song, we would like to insert the word “and”. So, here is the next version where we define the format to use:

$formats = { Array => {
  format         => '% a',
  separator      => ', and ',
  string_formats => {
    # %s is unquoted string
    String => '%s',  
}}}
notice "${String($things, $formats)}. These are a few of my favourite things."

would notice:

Cream colored ponies, and crisp apple strudels, and door bells, and sleigh bells, and schnitzel with noodles. These are a few of my favourite things.

And just for the fun of it - let's turn that into a function.

function silly::mary_poppinsify(String *$str) {
  $formats = {
    Array => {
      format         => '% a',
      separator      => ', and ',
      string_formats => {
        String => '%s',
  }}}
  # $str is the array of all strings given as arguments
  "${String($str, $formats)}. These are a few of my favourite things."
}

So, finally, with a personal touch:

notice silly::mary_poppinsify(
  "Keys on pianos",
  "food in a bento",
  "progressive metal",
  "solos by Argento", 
)

(Printout left as an exercise).

Read more about type conversion in the specifications repository, where each type is documented; see String.new, for instance. The other types are in the same document.

When Puppet 4.5.0 is released this information will also show up in the regular documentation for function new().

Notes on a couple of advanced things

The String format map is processed in such a way that the formats given when calling new() are merged with the default formats. This merge takes type specificity into account, so that more specific types have higher precedence. If the value to format matches two formats, one for type T and another for type T2, and T2 < T, then the format for T2 is used. For example, {Any => '%p', Numeric => '%#d'} formats all values in programmatic form (strings are quoted, arrays and hashes have Puppet language style delimiters, etc.), except numeric values, which come out in quoted numeric form (that is, "10" instead of the plain 10, without quotes, that the default %p would have produced).

Summing Up

The new() function supports creating new objects / values which can be used for data type transformation / casting and formatting. As you probably noticed, simple and common things are easily achieved while more complex things are possible. Conversions have become far more important in the Puppet Language now that there is EPP (templates in the Puppet language), where the result is often some kind of configuration file with its own syntax and picky rules - so the details do matter.

The idea behind the more complex formats, and alternatives is to provide a rock bottom implementation that can be used to implement custom functions in the Puppet Language that can be reused in manifests as well as in templates.

There are probably a few common conversion tasks that occur frequently enough to warrant a format flag of their own, and that I failed to include in the first implementation. When writing this blog post, for instance, it would have been nice to have a format for “array with all things in it in %s format and no delimiters”; but then I would not have been able to show how that is done in long format. File tickets with wishes, or make Pull Requests with code; they are always welcome.

Hope you find this supercalifragilisticexpialidociously useful.


  1. Since 4.5.0 is not yet officially released, you can read the documentation in the source for new.rb, or in the specifications per type (link to String.new).

3 comments:

  1. meh, not happy with the code syntax highlighting... need to do something about that

  2. This is awesome, except the separator adds a space after the separator string. In your example the separator is specified as `, and` while the output separator is `, and `. Note the trailing space.

    1. Super thanks! I updated the code and added the missing space (in several places).
