Thursday, May 5, 2016

Digging out data in style with puppet 4.5.0

In Puppet 4.5.0 there are a couple of new functions dig, then and lest that together with the existing assert_type and with functions makes it easy to do a number of tasks that earlier required conditional logic and temporary variables.

You typically run into a problem in programming languages in general when you are given a data structure consisting of hashes/arrays (or other objects), and you need to “dig out” a particular value, but you do not know if the path you want from the root of the structure actually exists.

Say you are given a hash like this:

$data = {
  persons => {
    'Henrik' => {
      mother => 'Anna-Greta',
      father => 'Bengt',
    },
    'Anna-Greta' => {
          mother => 'Margareta',
          father => 'Harald',
          children => ['Henrik', 'Annika']
    },
    'Bengt' => {
      mother => 'Maja',
      father => 'Ivar'
    },
    'Maja' => {
      children => ['Bengt', 'Greta', 'Britta', 'Helge']
    },
  }
}

Now, you would like to access the first child of ‘Anna-Greta’ (in case you wonder this is part of my family tree). This is typically done like this in Puppet:

$first_child = $data['persons']['Anna-Greta']['children'][0]

Which will work just fine (and set $first_child to 'Henrik') given the $data above. But what if there was no ‘Anna-Greta’, or no ‘children’ keys? We would get an undef result, and the next access would fail with an error.

To ward of the evil undef you would have to break up the logic and test at every step. For example, something like this:

$first_child = 
if $data['persons']
   and $data['persons']['Anna-Greta']
   and $data['persons']['Anna-Greta']['children'] =~ Array {
     $data['persons']['Anna-Greta']['children'][0]
   }

Is what you end up having to do. (Not nice).

This is where the dig function comes in. Using dig the same is done like this:

$first_child = $data.dig('persons', 'Anna-Greta', 'children', 0)

Which automatically handles all the conditional logic. (Yay). If one step happens to
result in an undef value, the operation stops and undef is returned. If this was all we wanted to do, we would be done. But what if we require that the outcome is not undef, or if we wanted a default value as the result if it was undef?

There is already the function assert_type that can assert the result (and optionally return a new value if the assertion fails). If we use that we can write:

$first_child = NotUndef.assert_type(
  $data.dig('persons', 'Anna-Greta', 'children', 0)
)

Which would give us an automated error like “expected a NotUndef value”. While functional
we can do better by customizing the error:

$first_child = NotUndef.assert_type(
  $data.dig(
    'persons', 
    'Anna-Greta', 
    'children', 
    0)) |$expected_type, $actual_type | {
      fail ("Did not find first child of 'Anna-Greta'")
    }

But that is quite tedious to write because the assert_type function is designed to take
two arguments - the expected type (NotUndef in this example), and the actual type of the argument (in this case Undef). But we already knew that would be the only possible outcome, so there is lots of excess code for this simple (and common) case.

This is where the lest function comes in. It takes one argument, and if this argument matches NotUndef, the argument is returned. Otherwise it will call a code block (that takes no arguments), and return what that returns. Thus, this is a specialized variant of assert_type that makes our task easier. Now we we can write:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {
      fail("Did not find first child of 'Anna-Greta'")
  }

Much better - it now reads nicely from left to right, and it is clear what is going on.
If we wanted a default value instead of a custom fail, we can do that:

$first_child = 
  $data.dig('persons', 'Anna-Greta', 'children', 0).lest | | {'Cain'}

Now - lets do something more difficult. What if we want to use the value
of the first child of Anna-Greta (that is, ‘me’) to find my aunts and uncles on
my father’s side? That is if we first computed $first_child, we would continue with:


$first_childs_father = $data.dig('persons', $first_child, 'father')
$first_childs_fathers_mother = $data.dig('persons', $first_childs_father, 'mother')
$first_childs_fathers_mothers_children =
  $data.dig('persons', $first_childs_fathers_mother, 'children')

That works, but we had to use the temporary variables. To be correct we also need to
remove my father (‘Bengt’) from the set of children returned by the last step.

I am not even going to bother writing that out in longhand to handle all the possible ‘sad’ paths. (Left as an exercise if you have run out of regular navel fluff).

Instead, we are going to write out the entire sequence, and now using the function then, which is the opposite of lest. It accepts a single value, and if it matches NotUndef it calls the block with a single argument, and returns what the block returns. If the given value is undef, it simply returns this (to be dealt with by the next step in the chain of calls.

$data.dig('persons', 'Anna-Greta', 'children', 0)
.then |$x| { $data.dig('persons', $x, 'father')}
.then |$x| { $data.dig('persons', $x, 'mother')}
.then |$x| { $data.dig('persons', $x, 'children')}
.then |$x| { $x - 'Bengt' }
.lest | | { fail("Could not find aunts and uncles...") }

We have an obvious flaw here since the name of my father is hard coded.
There is also no handling of the ‘sad’ path of ‘children’ not being an Array as we did
not type the data.

For the final example, lets make this into a generic function that finds the aunts and uncles on the father’s side of any mother’s first child.

We then end up with this function that performs five distinct steps:

function custom_family_search(String $mother) {
  # 1. start by finding the mother's children and pick the first
  $data.dig('persons', $mother, 'children', 0)

  # 2. Get the father of the child (needs to be looked up since
  #    $x here is just the name of the person).
    .then |$x| { $data.dig('persons', $x, 'father') }

  # 3. Look up the siblings of found father, and return those
  #    as well as the father (needed to eliminate father in
  #    the next step. ($x is father from previous step).
    .then |$x| { [ $data.dig(
                      'persons', 
                      $data.dig('persons', $x, 'mother'),
                      'children'),
                    $x
                  ] }

  # 4. Eliminate father from siblings
  # Previous step is never undef since we construct an array,
  # but the first slot in the array may be undef, or something that
  # is not an array! Thus, we don't need the conditional 'then'
  # function, and can instad use the 'with' function.
  # A 'case' expredssion is used to match the 'happy' path where the
  # name of the father is 'subtracted'/removed
  # from the array of his siblings. The 'sad' path produces
  # 'undef' and lets the next step deal with it.
  #
    .with |$x| { case $x {
                 [Array[String], String] : { $x[0] - $x[1] }
                 default                 : { undef }
                 }
               }
   # 5. we fail if we did not get a result
   #
    .lest | | { fail("Could not find aunts and uncles...") }

  # Function returns the value of the last call in the chain
}

notice custom_family_search('Anna-Greta')

And now we can test:

> puppet apply blog.pp
puppet apply blog.pp
Notice: Scope(Class[main]): [Greta, Britta, Helge]

Full Final Example Source.

In Summary:

  • dig - digs into structure with mix of hash keys and array indexes, may return undef
  • then - calls the block on the ‘happy’ path, undef otherwise
  • lest - calls the block on the ‘sad’ path, given value otherwise
  • with - (unconditional), passes on its given value to the block and returns its result
  • assert_type - checks path is ‘happy’ (matches type) and calls block on ‘sad’ path

In case you wonder about the lines of code that start with a period like this:

.then ...

This is simply a continuation from the line above - puppet is generally not whitespace significant (with only a few exceptions). Thus it does not matter where the ‘.’ is placed.
I choose to align the .then steps to make it readable. If you have something short
you can make it a one-liner:

# These are all the same
$x = $facts['myfact'].lest | | { 'default value for myfact'}

$x = $facts['myfact']  .  lest | | { 'default value for myfact'}

$x = $facts['myfact']
     .lest | | { 'default value for myfact'}

$x = $facts['myfact'].
     lest | | { 'default value for myfact'}

Hope this will be useful for you, and that it gives you an additional tool in your Puppet language toolchest.

This was also the first time I used StackEdit to write a blog post. I hope all the formatting of code turns out ok.

Best,

2 comments:

  1. thanks for the writeup.

    I was looking at the new "dig" function to parse nested hashes, I had the same problem in python, where I would get errors trying to pull out value from an empty or nonexistent index,

    print data['somevalue'][2]['blah']

    IndexError

    I wrote up a function for this, with some additional functionality (like NotUndef)

    this is python, but the idea is the same,

    https://medium.com/@mike.reider/python-dictionaries-get-nested-value-the-sane-way-4052ab99356b

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete