Thursday, November 15, 2018

Puppet 6 Type System - Object Inheritance

Introduction

This is the third posting in the series about the Object data type in the Puppet Type System - Pcore. The first post introduced the Object data type and the history behind Pcore. You probably want to read that first. The second post covers more about how attributes are specified. In this post, I will cover inheritance as well as one feature I forgot in the second post.

Constant Attributes

Yeah, so, I forgot to mention that there is a short form for specifying constants. In the second post I showed that a constant can be defined like this:

attributes => {
  avg_life_expectancy => {
    type => Float,
    kind => constant,
    value => 70.5,
  }
}

Which is a bit much when really all that is needed is the name and the value (since the type can be inferred from the value). For that reason there is a constants section that works the same way as attributes, but its short form expects a value rather than a type.

This means that the above example can be written:

constants => {
  avg_life_expectancy => 70.5
}
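
As a quick usage sketch (the Person type here is made up for illustration), a constant reads like any other attribute, but cannot be given a value when creating an instance:

```puppet
type Person = {
  attributes => { name => String },
  constants  => { avg_life_expectancy => 70.5 },
}
$p = Person('Ada')
notice($p.avg_life_expectancy)  # notices 70.5
# Person('Ada', 80.0) would be an error - constants cannot be set
```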

Ok - with that bit out of the way, moving on to the main attraction of this post… Inheritance.

Object Inheritance

The Puppet Type system supports classic OO inheritance such that an object type can inherit the definition of exactly one parent object type (which in turn can inherit, and so on). None of the OO concepts “interface”, “multiple inheritance”, or “extension” exists in Pcore, although an Object definition is in fact an interface from which an implementation is generated or for which an implementation is registered at runtime.

Notably, it is only possible to inherit from data types that are in turn some kind of Object. It is for example not possible to inherit from Integer or String - for those it is possible to instead create a type alias - but it is not possible to create new kinds of integers or strings.

Here are examples of using type aliases:

type PositiveInteger = Integer[0, default]
type Age = PositiveInteger

Specifying Inheritance

To make an Object type inherit another, specify its parent type. The parent type must itself be an Object type:

type Vehicle = {
  attributes => {
    manufacturer => String,
  }
}
type Car = {
  parent => Vehicle,
  attributes => {
    reg_nbr => String,
  }
}

Creating Instances

When creating an instance, arguments can either be given by position or as a hash. This is the same as when there is no parent type, but when there is a parent, all of the parent's attributes must be given before the attributes of the type itself. This also applies when the inheritance chain is longer.

Giving attributes by position - ancestors in order:

notice(Car('Lamborghini', 'XYZ 666'))

Giving attributes as a hash - order does not matter:

notice(Car(
  reg_nbr => 'XYZ 666',
  manufacturer => 'Lamborghini'
))

As you have probably already figured out, giving arguments by position is really useful when there are just one or two arguments, but it becomes difficult to read and maintain when there are many attributes. Also, giving arguments in hash form is required when a parent type has optional attributes.

Using ‘override’

A type may override inherited attributes (including constants) and operations (to be described in a separate post). There are a number of rules regarding how and with what something can be overridden. These rules are in place to assert that the contract put in place by a parent type is not violated by a child type. For example, say we have a Vehicle type with an attribute weight => Integer[0, default]; it would not be good if some child type changed that to a String, since then not all kinds of Vehicle would honor the same contract!

When specifying override => true for an attribute, the attribute being overridden must exist in a parent. This is a code safety measure, as logic could otherwise think it is overriding something but instead end up adding a new attribute that probably does not get read or used as intended. Likewise, if override is not set for an attribute, it is an error if that attribute already exists in a parent. This is also about code safety: the author of a parent type may not be aware of your subtype, and likewise you may not be aware of new attributes added in a later version. In summary, these rules protect code from potential problems with both accidental and ineffective overrides.

When overriding, all properties of the attribute must be specified - there is no "merging" of the parent's specification and the overriding child's.

The rules are:

  • The given type must be the same or a narrower type than the type specified for the parent attribute. For example, if the parent specifies an attribute as Integer, an override can narrow that to Integer[0, 100], since all values valid for the child are also valid for the parent.
  • It is allowed to override a non constant attribute (implied required or optional, derived, or given_or_derived) with a constant.
  • It is not allowed to override a constant as that would break the contract.
  • It is not allowed to override an attribute that is specified with final => true.
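
The rules above can be sketched like this (the ElectricCar type and the Integer bounds are made-up illustrations):

```puppet
type Vehicle = {
  attributes => {
    weight => Integer[0, default],
  }
}
type ElectricCar = {
  parent => Vehicle,
  attributes => {
    weight => {
      type     => Integer[500, 3000], # narrower than the parent's Integer[0, default]
      override => true,               # required, since weight exists in the parent
    },
  }
}
```

Note that the override restates all of the attribute's properties; nothing is merged from the parent's declaration.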

There is a little more to this, as attributes and object functions (a.k.a. methods) share the same name space and certain forms of overriding functions with attributes and vice versa are possible, but the rules above apply in those circumstances as well.

Using ‘final’

As noted in the section about ‘override’, specifying final => true will remove the possibility to override that attribute. This feature should be used sparingly as it makes it more difficult to reuse types, but it may be warranted for things related to assertions or security - for example you would not want a child type to be able to “lie”.
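
A minimal sketch (the Asset type is hypothetical):

```puppet
type Asset = {
  attributes => {
    checksum => {
      type  => String,
      final => true,  # no subtype may override this attribute
    },
  }
}
# A subtype declaring checksum - even with override => true -
# would raise an error.
```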

Matching on Type

With inheritance, matching naturally succeeds against the type itself and all of its parent types. For example, using the Car and Vehicle example:

$a_car = Car(
  reg_nbr => 'ABC 123',
  manufacturer => 'Lamborghini'
)
notice $a_car =~ Car     # notices true
notice $a_car =~ Vehicle # notices true
notice $a_car =~ Object  # notices true

It is currently not possible to match on individual traits/interfaces - for example, this returns false:

notice $a_car =~ Object['attributes' => { 'reg_nbr' => String }]

This does not match even though the value $a_car is a Car and has an attribute named reg_nbr, since matching is done on the type itself and the Object specification used for matching creates another (anonymous) subtype of Object.

In the Ruby API it is possible to get the type information (i.e. there is “reflection”), but that is not yet available in the Puppet Language without having to jump through hoops.

Why use Objects

You probably wonder when you should use Objects instead of plain old hashes; they are, after all, quite similar - especially if you typed the hash as a Struct and gave it an alias:

type MyThing = Struct[ 'reg_nbr' => String, 'manufacturer' => String]

That is pretty much the same as the Car data type we defined earlier. Here are the differences:

  • The struct value needs to be type checked every time it is given to a function, as the underlying value is just a Hash - it does not know whether it matches any particular Struct at all.
  • In contrast, an Object is "type checked" when it is created, and therefore it is enough to just check if something is a Car, as that implies that the data types of all of its attributes have been checked and found to be ok.
  • There is no inheritance among Hash/Struct - if you want something like Car and Vehicle you have to repeat all of the "parent" struct definitions in a "child" struct. This becomes a chore and a maintenance nightmare if there are many different data types with many common traits. (Some technologies that you may want to manage with Puppet have very rich configuration data, for example.)
  • Objects support polymorphic behavior - i.e. you can define methods / functions that operate on objects such that different types of objects have different behavior. While you can write functions that operate on structs of certain subtypes of Struct, you cannot select which one to call based on the kind of struct without having prior knowledge of all such structs and having them in a large case expression (or a bunch of if-then-else). More about this in a later blog post.

Summary

This blog post introduced the concept of inheritance between Object based data types in Puppet’s type system (Pcore). Puppet uses a classic Object Oriented single inheritance scheme.

Monday, November 12, 2018

Puppet 6 Type System - More about Object Attributes

Introduction

This is the second posting in the series about the Object data type in the Puppet Type System - Pcore. The first post introduced the Object data type and the history behind Pcore. You probably want to read that first.

In this post I am going to show how attributes of Objects work in more detail.

Recap Defining an Object data type in Puppet

As you may recall from the earlier post - an Object data type can be created like this in Puppet:

type Car = Object[attributes => {
  reg_nbr => String,
  # ...
}]

(And, if done in Ruby, the part between the brackets goes into the interface section of the create_type body (see the first post).)

When defining an Object, the above can be shortened:

# The brackets can be omitted
type Car = Object {attributes => { reg_nbr => String }}
# The type Object can be omitted
type Car = {attributes => { reg_nbr => String }}

Attributes

Attributes are the instance/member variables of instances of an Object data type. They come in different flavours - required, optional, derived (two kinds), and constant - and they can be marked as being an override or being final, all explained below.

Recap of type creation, creating a new instance, and getting attributes:

# This defines the type
type Car = Object {attributes => { reg_nbr => String }}
# This creates an instance
$a_car = Car('ABC 123')
# This gets that instance's variable/attribute reg_nbr so
# 'ABC 123' will be noticed
notice($a_car.reg_nbr)

Attribute Definition - Short and Long Form

Attribute Name

The name is given as a hash key in both the short and long form.
The attribute’s name is a String and it must be unique within the object among both attributes and operations. This rule extends to the parent’s attributes and operations, since a redeclaration is an override of the parent’s definition and must be marked as such to be accepted. The name must start with a lowercase letter and cannot be qualified (i.e. defined as being in a namespace).

Short Form

You have already seen the short form of attribute definition:

reg_nbr => String

This is a mapping from attribute name to data type. An attribute specified like this is always a regular, required attribute. All other kinds of definitions require the long form.

Long Form

In the long form of attribute declaration the mapping is from an attribute name to a Hash of attribute options. The equivalent of the short form above is:

reg_nbr => { type => String }

When using the long form, at least type must be defined.

Attribute Options

  • annotations (type: -) - Advanced: allows association of annotations; to be explained in a separate blog post.
  • final (type: Boolean) - A final attribute cannot be overridden in sub types.
  • kind (type: Enum) - See “Attribute Kind” below.
  • override (type: Boolean) - Must be set to true when overriding a parent type attribute or operation; an error is raised if the attribute/operation does not exist in a parent type.
  • type (type: Type) - An upper cased type reference.
  • value (type: Any) - A literal default value constrained by kind and type.

Note: Inheritance will be covered in a coming blog post and I will explain the importance of final and override then.
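
To illustrate how the options combine (the Account type is made up), here is a long form declaration using final, type, and value:

```puppet
type Account = { attributes => {
  # required, and cannot be overridden in sub types
  id => {
    type  => String,
    final => true,
  },
  # optional, with a default other than undef
  plan => {
    type  => Enum['free', 'paid'],
    value => 'free',
  },
}}
```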

Attribute Kind

  • constant - The attribute is a constant; it cannot be set when creating an instance, and must have value specified.
  • derived - The attribute’s value is computed; there must exist a method at runtime that computes the value. The value cannot be given when creating an instance.
  • given_or_derived - Same as derived, but the value may be given when an instance is created. Think of this as a computed default value.
  • reference - Advanced: default is false; true means that the value is not contained, and is thus serialized as a reference to a value that must exist elsewhere (typically in the same serialization). To be explained in another blog post.

Note: derived was covered in the first blog post in this series.

Multi Valued Attributes

Multi valued attributes are simply defined as being of an Array/Tuple or Hash/Struct data type, where the type parameters are used to constrain the number of allowed values and their data type, which can be any type in Pcore.

This is a big win compared to some other modeling technologies where multi valued attributes must be scalars.

type Garage = { attributes => {
  parked_cars => Array[Car]
}}
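
Assuming the Car type defined earlier in this post, an instance is then created by giving the array like any other attribute value:

```puppet
$garage = Garage([Car('ABC 123'), Car('XYZ 666')])
notice($garage.parked_cars[0].reg_nbr)  # notices ABC 123
```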

Union Values are Variants

Since Pcore has a Variant data type - describing that a value must be one of several possible data types - it is easy to model more complicated data models:

means_of_transportation => Variant[Car, Boat, Bike]

Extra attributes/values

The Object type does not allow “extra attributes” as in some modeling technologies, where it is possible to specify a required set plus any number of additional named attributes. With a Pcore Object you have to model that as an Object with a hash attribute where the extra “optional” values go.
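
A sketch of that pattern (the Service type and its attribute names are made up): the known attributes are modeled normally, and any additional named values go into a typed hash with an empty default:

```puppet
type Service = { attributes => {
  name   => String,
  # catch-all for any additional named values
  extras => {
    type  => Hash[String, Any],
    value => {},
  },
}}
```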

Typical Usage

Typically, attributes are either required and specified using the short form, or optional and specified either:

  • in short form using Optional[T], if it is acceptable to have undef as the default value, or…
  • in long form with type and value (if the default value should be something other than undef). In this case the type should not be Optional[T] unless you want to be able to explicitly assign undef even though the default value is something else.

Here is an example showing different kinds of attributes:

type Person = { attributes => {
  # name is required
  name => String,

  # optional with undef default value
  nick_name => Optional[String],

  # fav color is optional and defaults to 'blue'
  fav_color => {
    type => Enum['blue', 'red', 'green'],
    value => 'blue', 
  },
  # avg_life_expectancy is a constant
  avg_life_expectancy => {
    type => Float,
    kind => constant,
    value => 70.5,    # globally
  }
}}

Summary

In this post I covered the details of specifying Object attributes and the various kinds of attributes “required”, “optional”, “constant”, and “derived”. In the next post I will cover inheritance of Object types.

Tuesday, October 23, 2018

Puppet 6 Type System - Object and Custom Data Types

Type System In Retrospect

In 2014 and 2015, when I was busy implementing the Puppet Type System, I was not sure how it would be received. Today I am very happy with how it turned out, as it has been very well received and is now extensively used in Puppet modules for both Puppet and Bolt. (I am talking about data types like Integer, String, Hash, Array, parameterized types like Array[String], plus all the more specialized data types. This blog post is not about those - you can go and read about all of them in the official documentation: Puppet Documentation - Data Types.) This post is about something much more exciting. Jumping back in history a bit…

A couple of things were bothering us:

  • There was no way to extend the type system with custom data types; you really would have to contribute the data type to Puppet’s code base for that to work.
  • We were using RGen for meta modeling (how to describe a model using another model (“meta model”)). RGen is an implementation of UML/Ecore metamodeling, and we were not super happy with the performance and implications of using it. The primary use case for us was to model the > 100 classes in the Puppet AST; the data structure being the result of parsing Puppet Language Logic. While RGen is great, it was not a perfect fit.
  • Serialization in Puppet just sucked in general, and was especially difficult to use with data types not having a 1:1 representation in JSON.

As we were discussing this back and forth (“we” being me and Thomas Hallgren at Puppet), he came up with the brilliant idea to base meta modeling on the Puppet Type System itself, and to replace RGen with our own implementation rather than trying to fit UML/Ecore style modeling into the Puppet type system. One major incompatibility and headache in trying to marry the type system with RGen was that the type system assumes immutability and RGen/Ecore does not really do that. Further, Ecore sprang more or less from the Java type system, and while Java/Ecore has generics, they are nowhere near as powerful as the Puppet type system’s parameterized types.

In 2015 and 2016 we worked out the design for what we call Pcore - a term we now use as the name of the Puppet type system. The specification for Pcore turned out to be a major opus with lots of things to work out and explain, and while I was working on that, Thomas Hallgren did a herculean job on the implementation (i.e. my already brilliant implementation 😎 got even more so as a result) and on the related serialization protocols.

We had two early use cases for Pcore: we built the puppet generate types feature for environment isolation using Pcore, and we had one internal project using it to model network devices. Our major use case was however to move the Puppet AST from RGen to Pcore. That was committed in February 2016 - and with that, the Puppet Language AST became implemented in the Puppet Language itself 🤯 - read the source of ast.pp here. In Puppet 5.0.0 we switched and dropped the use of RGen.

In the Puppet 5.x time frame it was possible to experiment with the features by setting rich_data=true in the configuration - but this only worked for puppet apply and puppet resource; you still could not use it in an agent/master scenario. And while it worked to send rich data to PDB, it was not exactly what we wanted.

Now in the fall of 2018, with the Puppet 6.0.0 release the use of rich_data=true is on by default and the work we started in 2015 can now finally be used! 🎉

Earlier, it was not terribly meaningful to blog about the wonderful things you can do with Pcore since - well, you could not really use it in practice. But now you can!

A Blog Series about Pcore

I intend to blog about Pcore in a series of posts - this being the first.

There are a lot of things to cover, as you can see if you go and read the quite long specification (74 pages), but I am going to take a more pragmatic approach and show useful examples rather than serving you reference material. I also have a lot of work to do taking the Pcore specification in its current form and turning it into a more formal specification for the puppet-specifications repository. That work is quite tedious, so I am going to mix it with blogging about the features.

The Object Data Type

At the heart of the type system there is the Object data type. You can create one in Puppet if you like, or in Ruby. It can be a simple object only having attributes, or a more complex one also supporting callable methods.

A Car data type in Puppet

For simplicity this example is in a single manifest. Your real data types in Puppet should use locations of the form <moduleroot>/types/<typename>.pp, as that makes them autoloaded.

In example1.pp:

type MyModule::Car = Object[{
  attributes => {
    reg_nbr => String,
    color => String,
  }
}]
$my_car = MyModule::Car('abc123', 'pink')
notice $my_car

Notices the car:

$ puppet apply example1.pp
Notice: Scope(Class[main]): MyModule::Car({'reg_nbr' => 'abc123', 'color' => 'pink'})

You can use code like this while compiling. Puppet will even autoload the data type, just like it does with type aliases (i.e. something like type MyType = Array[String]). You cannot, however, use such a data type on the agent side yet, because there is no pluginsync of data types defined in the Puppet Language. If you try, you will get an error like this:

Could not intern from rich_data_json: No implementation mapping found for Puppet Type MyModule::Car

It does however work if your data type is implemented in Ruby since everything under lib/puppet in your module is synced to the agent! Let’s implement the same data type in Ruby.

A Car data type in Ruby

In <mymodule>/lib/puppet/datatypes/car.rb:

Puppet::DataTypes.create_type('MyModule::Car') do
  interface <<-PUPPET
    attributes => {
      reg_nbr => String,
      color => String,
    }
  PUPPET
end

A note about file location: as you see it is under lib/puppet/datatypes since lib/puppet/types is for resource types (for historical reasons).

Now we try applying a manifest using that - site.pp:

notify { "example":
  message => MyModule::Car("abc123", "pink")
}

Which we can try out most easily with apply:

puppet apply site.pp

Which results in this:

Notice: /Stage[main]/Main/Notify[example]/message: defined 'message' as MyModule::Car({
  'reg_nbr' => 'abc123',
  'color' => 'pink'
})

You can try that with an agent as well - you should get the same result.

If you look inside the catalog - the notify looks like this:

    {
      "type": "Notify",
      "title": "example",
      "tags": [
        "notify",
        "example",
        "class"
      ],
      "line": 8,
      "exported": false,
      "parameters": {
        "message": {
          "__ptype": "MyModule::Car",
          "reg_nbr": "abc123",
          "color": "pink"
        }
      }
    }

This is the rich_data serialization format which is a “Pcore in human readable JSON” serialization. If you want to learn everything there is to know about serialization and the rich-data format look at the specification for Pcore Data Representation.

An Object data type with methods

In the Puppet Language you cannot yet implement methods of an Object data type. While it is possible to specify the interface for methods in Puppet, the data type cannot be used unless there is an implementation available for the methods.

We can do this in Ruby however. There are a couple of options:

  • The implementation can be done inside the code block given to create_type. This is what I am showing in this blog post.
  • The implementation can be any Ruby class that implements (at least) the interface.
  • The implementation can be autoloaded from inside the module’s lib. (This should be the last resort, as you must use the same version in all environments.)

Defining methods in implementation

Methods for instances of the data type are easily added inside
a block given to a call to the implementation method.

The first kind of method I am showing is one that is needed when we declare an attribute to be of kind derived. Note that the specification for the attribute age is now a hash with more details than just the type. The kind derived means that the value of the attribute is computed/derived from other attributes and that it cannot be given when creating an instance of the data type. Since it needs to be computed, there must be an implementation of that computation.

Puppet::DataTypes.create_type('MyModule::Person') do
  interface <<-PUPPET
    attributes => {
      name => String,
      year_of_birth => Integer,
      age => { type => Integer, kind => derived }
    }
  PUPPET

  implementation do
    def age
      DateTime.now.year - @year_of_birth
    end
  end
end

We can use that in a manifest like this:

$p = MyModule::Person('Henrik', 1959) # yeah, that old...
notice "Name: ${p.name}, Age: ${p.age}"

As you may have figured out, with this approach, Pcore will automatically provide a constructor and methods to get the attributes - all we had to do was to supply the missing age computation.

The constructor takes either positional arguments, given in the order
they are specified in the interface, or a Hash of attribute name to value.
Thus we can create the same Person like this: Person('name' => 'Henrik', 'year_of_birth' => 1959)

The implementation block can also be used to add additional methods - they must be specified in the interface if you want them to be available in the Puppet Language. Methods not specified in the interface are still available to the Ruby code.

Defining functions in the interface

In order to enable calling methods on a data type (other than those implied
by the attributes, and the general API of all objects) they must be defined in the
data type’s interface.

Puppet::DataTypes.create_type('MyModule::Image') do
  interface <<-PUPPET
    attributes => {
      image_url => URI,
    }
    functions => {
      # resize is an operation that takes two integers (min 1)
      # for x, and y, and returns a new MyModule::Image
      # for the resized result.
      resize => Callable[[Integer[1], Integer[1]], MyModule::Image],
      # image_bytes returns a Binary containing the image
      image_bytes => Callable[[], Binary]
    }
  PUPPET

  implementation do
    def resize(x, y)
      # an imaginary service uploads the image, resizes
      # it, and provides an url to the resized image
      new_url = SomeService::process(@image_url, 'resize', x, y)
      # Return a new MyModule::Image based on the new url
      self.class.new(new_url)
    end
    def image_bytes()
      # an imaginary service gets the image as Base64 encoded string
      bits_base_64 = SomeService::process(@image_url, 'get')
      Binary(bits_base_64)
    end
  end
end

Summary

This post introduced the Object data type, showed how it is defined in Puppet and Ruby, and showed how the Ruby implementation also allows defining behavior in methods that can be used from the Puppet Language.

The use of Objects with methods provides a richer extension mechanism for Puppet than functions, and when using the provided support to implement them, they are completely (Puppet) environment friendly, since each environment can have a different version of the implementation (still: any external gems you require must be the same for all environments).

While there is a lot more to say about how you can specify attributes, their data types, default values, and derived values, how to define operations/methods, and how to map an object data type to an existing Ruby class, I hope this blog post gives you enough to be able to experiment.

Look out for more posts in this series.

Monday, October 15, 2018

Puppet PAL wants to be your friend

PAL stands for Puppet As a Library. It is a new Ruby API in Puppet that gives an application written in Ruby access to Puppet Language related operations, ranging from full scale features such as compiling a catalog down to fine grained parsing and evaluation of Puppet Language logic.

PAL was introduced as an experimental feature in the 5.x series (primarily to support Bolt). Now that both Puppet 6.0 and Bolt 1.0 have been released, the experimental status of PAL is lifted and it will follow SemVer. And - it is about time this post got written to make the features of PAL more widely known.

This first blog post introduces PAL and contains reference material for its use. I will come back with more posts with additional examples as this blog post is already quite long…

Yet another API ?

You may ask why PAL is needed when Puppet already has APIs for (almost) everything. I would characterize the problem as: the existing APIs are either too high level or too low level:

  • the high level APIs are not flexible enough - sure, you can ask for a catalog just like the agent does, but you have very little say over how that is done and it is very hard to mix in your custom variations.
  • the lower level APIs naturally work, but using them is like getting a dump of Lego pieces to assemble any way you like.

As a result, those who wanted some kind of variation of a “puppet apply” or “puppet master compile” application would typically copy long sequences of code from one of the implementations in Puppet (yes, there are several). This creates a problem because it also means copying bugs and missing features, and then having to play catch-up whenever the implementation in Puppet changes.

A design goal for PAL was to come up with an API that would work even if the underlying implementation of Puppet was written in another language, or for a remote service. That in turn means that PAL cannot expose the underlying implementation classes directly to the user of the API.

I think we succeeded with the ambitions for PAL, but as always, time constraints required us to make a couple of trade offs. The one part that comes to mind is that PAL still requires the Puppet settings system to be initialized, and is thus not free from concern for the rest of Puppet. A number of helper classes used in Puppet do not have wrappers, and it did not make sense to create those - they may need to change in some distant future - if anything, at this point it is a bit strange/ugly/confusing to see the odd class popping up in PAL from deeper down in the puppet module hierarchy. Notably, an API for querying the catalog is missing (the Catalog has an API, but it exposes your logic to many implementation details). We wish to fix these things in future versions of PAL.

A Conceptual View of Puppet Internals

The following graph illustrates what is going on inside Puppet when a catalog is being compiled (or, for that matter, when evaluating something as seemingly trivial as the Puppet Language expression 1+1).

[Diagram: the components involved in compilation - Node, Environment, Facts, TopScope, Settings, Code, Compiler, ModulePath, Hiera, Modules, Certificate, Evaluator, Parser, Lexer, Result, Catalog, EppEvaluator, AST, and Context - connected by edges labeled “defines what is loadable”, “produces”, “produces with side effect”, and “evaluates”.]
PAL is an API that abstracts this complex internal configuration. While the parts have their own API it is difficult to assemble them correctly (and in the right order). (Note that the graph is a simplification as many of the arrows are bidirectional).

The Context deserves a note, as it is something that exists in the Puppet implementation - it is simply a way to set and override what can be thought of as global variables: key/value bindings that can be obtained anywhere inside the code in a particular context. The context is used to enable access to things that would otherwise have to be passed around in every call inside Puppet.

Script and Catalog Compilers

PAL has the concept of a Compiler - being either a ScriptCompiler or a CatalogCompiler. As you can guess, the catalog compiler produces a Catalog, and the script compiler does not. The script compiler is more lightweight and allows use of tasks, plans, and the apply keyword but not any of the catalog building expressions (except when they are inside an apply clause).

While some operations can be done with PAL directly, you almost always will need one of the compilers.

Examples

Evaluating a string from the command line

This small sample is all that is needed to evaluate a string of Puppet Language logic given on the command line (similar to what a puppet apply -e does):

eval_arg_script.rb:

require 'puppet_pal'
Puppet.initialize_settings
result = Puppet::Pal.in_tmp_environment('pal_env',
  modulepath: [],
  facts: {}
) do |pal|
  pal.with_script_compiler { |c| c.evaluate_string(ARGV[0]) }
end
puts result

Let’s try it out on the command line:

bundle exec ruby eval_arg_script.rb '1+1'
2

Note: I am leaving out everything related to setting up an environment
with puppet and its dependencies, getting a particular Ruby version,
etc., as that would require a series of blog posts of its own. I have rbenv
installed, I run puppet from source, and I use bundle install (or update) as I shift
between puppet versions. You will most likely install puppet as a gem and use that. (Note that puppet_pal comes from the puppet gem; there is an unrelated gem named puppet_pal that has nothing to do with this PAL.)

Here is a breakdown of the example:

require 'puppet_pal'

Here PAL is required, and it in turn requires puppet. It is done this way because, right now, a require 'puppet' pulls in almost everything inside puppet, and we may change that so only the relevant parts of puppet are required when using PAL.

Puppet.initialize_settings

Sadly, this is needed, as we did not have time to change the puppet code base so that values from settings can be given to PAL. Thus a full initialization of the settings is required, which in turn requires a configured puppet installation from which the settings are read.

result = Puppet::Pal.in_tmp_environment('pal_env',

Here we are telling PAL that we are going to do things in a temporary environment. We let PAL create a temporary location for an environment that we name pal_env. This environment will be empty. As you will see later, there are other ways of specifying an environment to operate in. The name of the environment is not really important here, but you may want to avoid production so it is not confused with the default Puppet environment of the same name.

  modulepath: [],
  facts: {}

Here we give the environment two important inputs. We don't have any modules we want to use, so we pass an empty array as the modulepath. We also initialize the facts to an empty hash - this is done to speed up loading, as PAL runs facter to obtain the facts if they are not specified, which can take something like 0.5-1 seconds. The downside is naturally that $facts will be empty. There are other ways to specify the facts. As you can see in the diagram, a node is actually required in most situations, and in our simple example we did not specify anything related to the node - PAL will then assume that the host the script is running on is the node to use. Thus, in the example we get "localhost" (whatever its name is) and an empty set of facts. More about this later.

  ) do |pal|
    pal.with_script_compiler {|c| c.evaluate_string(ARGV[0])}
  end

Here we give a lambda to the call to in_tmp_environment; it gets an instance of PAL as its argument - pal thus represents the environment in which we are going to be doing something. We then call with_script_compiler to get a script compiler; it takes a lambda that is called with an instantiated compiler, so c is our interface to getting things done. We call evaluate_string with ARGV[0] (the puppet language string from the command line). evaluate_string will lex, parse, and validate the resulting AST before evaluating it. The result is returned, and we are back at:

result = Puppet::Pal.in_tmp_environment('pal_env',

We now have the result, and the script ends with:

puts result

Which prints the result (the output “2” in the example above).

Getting a catalog in JSON

Now, a slightly more elaborate example where we want the Catalog that is built as a side effect of evaluating Puppet Language logic. We will now use the catalog compiler instead of the script compiler and we want the built Catalog in JSON as a result:

require 'puppet_pal'
Puppet.initialize_settings
result = Puppet::Pal.in_tmp_environment('pal_env', modulepath: [], facts: {}) do |pal|
  pal.with_catalog_compiler do |c|
    c.evaluate_string(ARGV[0])
    c.compile_additions # eval lazy constructs and validate again
    c.with_json_encoding { |encoder| encoder.encode }
  end
end
puts result

As you can see this has the same structure. Here are the details for the differences:

pal.with_catalog_compiler do |c|

Here we use with_catalog_compiler instead of with_script_compiler since we want a catalog to be built. The next line is the same - it evaluates the argument string.

c.compile_additions # eval lazy constructs and validate again

Then we call compile_additions to make PAL evaluate all lazy constructs and apply the subsequent side effects to the catalog that were introduced by the call to evaluate_string. For example, if the evaluated logic declares a resource of a user defined type, that resource would not be evaluated unless compile_additions was called.

As you will see later, there are other ways to specify the puppet logic ("the code") to evaluate that do not require compile_additions to be called. It is only required when evaluating extra snippets of logic, as in this example.

What actually happens in the example is that when the string is evaluated, an almost empty catalog has already been compiled, and compile_additions integrates the side effects of the just evaluated string into that catalog.

When compile_additions is called, any references to resources not yet in the catalog raise an error, as compile_additions also validates the result for dangling resource references.

c.with_json_encoding { |encoder| encoder.encode }

This gets a “json encoder” for the catalog. This encoder’s encode will produce the desired JSON representation of the catalog. By default the result is a pretty printed JSON string. Since this is the last thing in the block, that string becomes the result, and it is assigned to result. At the very end this is output to stdout with puts.
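To see concretely what "pretty printed" means here, compare Ruby's stdlib compact and pretty JSON output (a stand-alone illustration using the json stdlib, not PAL's encoder itself):

```ruby
require 'json'

# A tiny stand-in hash; PAL's encoder works on the whole catalog
data = { 'type' => 'Notify', 'title' => 'awesome' }

compact = JSON.generate(data)        # one line, no extra whitespace
pretty  = JSON.pretty_generate(data) # multiple lines, indented

puts compact
# {"type":"Notify","title":"awesome"}
puts pretty
```

The content is identical; only whitespace differs, so pretty printing is purely a readability choice.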

So, when we try this out on the command line:

bundle exec ruby to_catalog.rb 'notify { "awesome": }'

We get this output:

{
  "tags": [
    "settings"
  ],
  "name": "example.com",
  "version": 1539340088,
  "code_id": null,
  "catalog_uuid": "7d80fa68-05eb-4684-93e2-6f61529b7571",
  "catalog_format": 1,
  "environment": "production",
  "resources": [
    {
      "type": "Stage",
      "title": "main",
      "tags": [
        "stage",
        "class"
      ],
      "exported": false,
      "parameters": {
        "name": "main"
      }
    },
    {
      "type": "Class",
      "title": "Settings",
      "tags": [
        "class",
        "settings"
      ],
      "exported": false
    },
    {
      "type": "Class",
      "title": "main",
      "tags": [
        "class"
      ],
      "exported": false,
      "parameters": {
        "name": "main"
      }
    },
    {
      "type": "Notify",
      "title": "awesome",
      "tags": [
        "notify",
        "awesome",
        "class"
      ],
      "line": 1,
      "exported": false
    }
  ],
  "edges": [
    {
      "source": "Stage[main]",
      "target": "Class[Settings]"
    },
    {
      "source": "Stage[main]",
      "target": "Class[main]"
    },
    {
      "source": "Class[main]",
      "target": "Notify[awesome]"
    }
  ],
  "classes": [
    "settings"
  ]
}
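Once you have the catalog as JSON, inspecting it is ordinary JSON processing. As a quick sketch (using only Ruby's standard library, with the catalog abbreviated to two of the resources shown above), here is how you could extract resource references from it:

```ruby
require 'json'

# Abbreviated catalog, following the structure shown in the output above
catalog_json = <<~CATALOG
  {
    "resources": [
      {"type": "Stage",  "title": "main"},
      {"type": "Notify", "title": "awesome"}
    ]
  }
CATALOG

catalog = JSON.parse(catalog_json)
# Build "Type[title]" style references from each resource entry
refs = catalog['resources'].map { |r| "#{r['type']}[#{r['title']}]" }
puts refs
# Stage[main]
# Notify[awesome]
```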

Variations on “environment”

The examples used in_tmp_environment, but there are other options for specifying the environment to use.

Using a tmp environment

The in_tmp_environment method takes an environment name (required) and the following optional named arguments:

  • env_name String – the required first argument; a name to use for the temporary environment - this only shows up in errors
  • modulepath Array[String] – an array of directory paths containing Puppet modules, may be empty, defaults to an empty array
  • settings_hash Hash – a hash of settings - currently not used, defaults to an empty hash
  • facts Hash – a map of fact name to fact value - if not given, the facts will be initialized (which is a slow operation)
  • variables Hash – an optional map of fully qualified variable name to value

It returns:

  • Any – returns what the given block returns

It yields:

  • Puppet::Pal pal – a context that responds to Puppet::Pal methods

Sadly, the settings part did not get done. In the future this will be how settings are fed into PAL instead of requiring a call to Puppet.initialize_settings.

It should be quite clear what the purpose of each option is. One note though: variables allows setting any fully qualified variable in any scope. This can be used to test a snippet that references variables that would be set by included classes in a real compilation - i.e. there is nothing stopping you from passing in {'apache::port' => 666}, thus allowing the tested logic to reference $apache::port without having a complete apache class declared in the catalog. (Naturally, also including the class would then result in errors, as the variable would already be set.)

Using a named, real environment

The alternative to using a tmp environment is to use an existing configured environment on disk that is found on the environment path.

The name of an environment (env_name) is always given. The location of that environment on disk is then determined either by:

  • searching a given envpath, where a directory named env_name is a child of a directory on that path, or
  • using the directory given in env_dir (which must exist).

(The env_dir and envpath options are mutually exclusive.)

The in_environment method takes an environment name (required), which must be an existing environment on disk, and the following optional named arguments:

  • modulepath Array[String] – an array of directory paths containing Puppet
    modules; overrides the modulepath of an existing environment. Defaults to
    {env_dir}/modules if env_dir is given.
  • pre_modulepath Array[String] – like modulepath, but is prepended to the modulepath
  • post_modulepath Array[String] – like modulepath, but is appended to the modulepath
  • settings_hash Hash – a hash of settings - currently not used for anything, defaults to empty hash
  • env_dir String – a reference to a directory being the named environment (mutually exclusive with envpath)
  • envpath String – a path of directories in which to search for env_name (mutually exclusive with env_dir). Should be a single directory, or several directories separated with the platform specific File::PATH_SEPARATOR character.
  • facts Hash – optional map of fact name to fact value - if not given will initialize the facts (which is a slow operation).
  • variables Hash – optional map of fully qualified variable name to value

Returns:

  • Any – returns what the given block returns

Yields:

  • Puppet::Pal pal – a context that responds to Puppet::Pal methods

In practice:

  • either:
    • use an environment name and let PAL search the envpath
    • or give an environment directory that does not have to be on an environment path
  • and either:
    • specify the module path
    • or use the default module path (defined by the environment, or the modules directory under the given env_dir)
    • and then use one of:
      • pre_modulepath to push additional modules first on the path
      • post_modulepath to push additional modules last on the path
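An envpath string itself is assembled like any PATH-style value. A minimal sketch (the directories are hypothetical):

```ruby
# Hypothetical directories that each contain one or more environments
env_dirs = ['/etc/puppetlabs/code/environments', '/home/dev/test_envs']

# Join with the platform specific separator (':' on POSIX, ';' on Windows)
envpath = env_dirs.join(File::PATH_SEPARATOR)

# envpath could then be passed as the envpath named argument, e.g.:
#   Puppet::Pal.in_environment('myenv', envpath: envpath) { |pal| ... }
```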

Inside the PAL context (advanced)

I included this for those that have some familiarity with the internals of Puppet - you can safely skip this section…

Before PAL calls the block given to in_tmp_environment or in_environment it will set values in the global Puppet context like this:

environments: environments, # The env being used is the only one...
pal_env: env, # provide as convenience
pal_current_node: node, # to allow it to be picked up instead of created
pal_variables: variables, # common set of variables across several inner contexts
pal_facts: facts # common set of facts across several inner contexts (or nil)

Thus Puppet.lookup() (not to be confused with hiera lookup) can get those values when needed.

The keys in the context are part of the PAL API, but the values are not. The values for environments and env are not part of PAL as they expose classes in Puppet that may or may not be strictly specified as API.

The pal_current_node allows code to override the automatically created Node object with a custom created one by pushing this onto a context wrapping further operations. This cannot be done from outside PAL as a Node needs some of the other components when it is created. (Not perfect, but this is how far we got on this).
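To illustrate the pattern (this is a simplified stand-in, not Puppet's actual implementation), a context of this kind is essentially a stack of key/value bindings where the innermost override wins:

```ruby
# Simplified illustration of a push-context / lookup style mechanism --
# NOT the real Puppet implementation.
class MiniContext
  def initialize
    @stack = [{}]
  end

  # Run a block with extra bindings layered on top of the current ones
  def override(bindings)
    @stack.push(bindings)
    yield
  ensure
    @stack.pop
  end

  # Look up a key, innermost binding first
  def lookup(key)
    @stack.reverse_each { |frame| return frame[key] if frame.key?(key) }
    raise KeyError, "no binding for #{key}"
  end
end

ctx = MiniContext.new
ctx.override(pal_current_node: 'mynode.example.com') do
  ctx.lookup(:pal_current_node)  # available anywhere inside this block
end
```

This is the shape that lets PAL set pal_current_node and friends before calling your block, and lets code deep inside Puppet fetch them without having them passed through every call.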

The API of the Compilers

The ScriptCompiler and CatalogCompiler share many methods in an abstract Compiler class. The script compiler is created with a call to PAL’s with_script_compiler, and the catalog compiler with a call to with_catalog_compiler. Both methods take exactly the same (optional) named arguments:

  • configured_by_env Boolean – whether the environment in use (as determined by the call to PAL) determines the manifest/code to evaluate. Defaults to false.
  • manifest_file String – the path to a .pp file to use as the main manifest.
  • code_string String – a string with puppet logic.
  • facts Hash[String, Any] – a Hash of facts. If not given PAL will run facter to get the facts for localhost.
  • variables Hash[String, Any] – a Hash of variable names (can be fully qualified) to values that will be set before any evaluation takes place.

The parameters code_string, manifest_file and configured_by_env are mutually exclusive.

Here is a look at what you can do with both of the compilers:

Call a function

call_function(function_name, *args, &block)

Calls a function given by name with arguments specified in an Array, and optionally accepts a code block.

  • function_name String – the name of the function to call.
  • *args Any – the arguments to the function.
  • block Proc – an optional callable block that is given to the called function.

Returns:

  • Any – what the called function returns.

Get a function signature

function_signature(function_name)

Returns a Puppet::Pal::FunctionSignature object or nil if function is not found. The returned FunctionSignature has information about all overloaded signatures of the function.

# returns true if 'myfunc' is callable with
# three integer arguments 1, 2, 3
compiler.function_signature('myfunc').callable_with?([1,2,3])

List available functions

list_functions(filter_regex = nil, error_collector = nil)

Returns an array of TypedName objects (see below) for all functions, optionally filtered by a regular expression. The returned array has more information than just the leaf name; the typical thing is to just get the name, as shown in the following example.

Errors that occur during function discovery will either be logged as warnings or added to the optional error_collector array. When provided, it will be appended with Puppet::DataTypes::Error instances describing each error in detail and no warnings will be logged.

# getting the names of all functions
puts compiler.list_functions.map {|tn| tn.name }

Parameters:

  • filter_regex Regexp – an optional regexp that filters based on name (matching names are included in the result).
  • error_collector Array[Puppet::DataTypes::Error] – an optional array that will get errors appended during load.

Returns:

  • Array[Puppet::Pops::Loader::TypedName] – an array of typed names.

A TypedName is, as the name suggests, a combination of a name and a data type.
It has methods to get name and type, which are self-explanatory.
It also has the method name_parts, which returns an array of each part of a
qualified (name-spaced) name; name_authority, which is a reference to
what defined this type; and compound_name, which is a unique identifier.
Instances of TypedName are suitable as keys in hashes and are used extensively by the loaders.
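As an illustration only (this is a hypothetical stand-in, not the real Puppet::Pops::Loader::TypedName class), such an object behaves roughly like this:

```ruby
# Hypothetical sketch of a TypedName-like value -- the real class lives in
# Puppet::Pops::Loader and has more to it (name_authority, etc.).
TypedName = Struct.new(:type, :name) do
  # Each segment of the qualified, name-spaced name
  def name_parts
    name.split('::')
  end

  # A unique identifier combining type and name
  def compound_name
    "#{type}/#{name}"
  end
end

tn = TypedName.new(:function, 'mymodule::utils::max')
tn.name_parts     # => ["mymodule", "utils", "max"]
tn.compound_name  # => "function/mymodule::utils::max"
```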

Evaluate a string

evaluate_string(puppet_code, source_file = nil)

Evaluates a string of Puppet Language code in top scope. A "source_file" reference to a source can be given - if it is not an actual file name, by convention the name should be bracketed with < > to indicate that it is something symbolic; for example <commandline> if the string was given on the command line.

If the given puppet_code is nil or an empty string, nil is returned, otherwise the result of evaluating the puppet language string.

The given string must form a complete and valid expression/statement, as an error is raised otherwise. That is, it is not possible to divide a compound expression into lines and evaluate each line individually.

Parameters:

  • puppet_code Optional[String] – the puppet language code to evaluate, must be a complete expression/statement.
  • source_file Optional[String] – an optional reference to a source (a file or symbolic name/location).

Returns

  • Any – what the puppet_code evaluates to.

Evaluate a file

evaluate_file(file)

Evaluates a Puppet Language file in top scope. The file must exist and contain valid Puppet Language code or an error is raised.

Parameters:

  • file String – an absolute path to a file with puppet language code, must exist.

Returns:

  • Any – what the last evaluated expression in the file evaluated to.

Evaluate AST

evaluate(ast)

Evaluates an AST obtained from parse_string or parse_file in top scope. If the ast is a Puppet::Pops::Model::Program (what is returned from the parse methods), any definitions in the program (that is, any function, plan, etc.) become available for use.

Parameter:

  • ast Puppet::Pops::Model::PopsObject – typically the returned Program from the parse methods, but can be any Expression if you want to evaluate only part of the returned AST.

Returns:

  • Any – whatever the ast evaluates to.

AST stands for Abstract Syntax Tree - which is the result from parsing. The Puppet AST is described using Puppet Pcore and it is thus a model – a term often used interchangeably with AST when it is clear from context that the only model it could refer to is a particular AST. See Introduction to Modeling for more about modeling.

Evaluate a literal value

evaluate_literal(ast)

Produces a literal value if the AST obtained from parse_string or parse_file does not require any actual evaluation. Raises an error if the given ast does not represent a literal value.

This method is useful when the user is expected to give a literal value in puppet form, so that the AST represents literal values such as string, integer, float, boolean, regexp, array, or hash - for example, when a string representation of an array or hash has been read from the command line or from values in some file.

Parameters:

  • ast Puppet::Pops::Model::PopsObject – typically the returned Program from the parse methods, but can be any Expression.

Returns:

  • Any – whatever literal value the ast evaluates to.

Parse a String

parse_string(code_string, source_file = nil)

Parses and validates a puppet language string and returns an instance of Puppet::Pops::Model::Program on success (i.e. AST). If the content is not valid an error is raised.

Parameters:

  • code_string String – a puppet language string to parse and validate.

  • source_file Optional[String] – an optional reference to a file or other location in angled brackets, only used for information.

Returns:

  • Puppet::Pops::Model::Program – returns a Program instance on success

Parse a File

parse_file(file)

Parses and validates a puppet language file and returns an instance of Puppet::Pops::Model::Program on success. If the content is not valid an error is raised.

Parameters:

  • file String – a file with puppet language content to parse and validate.

Returns:

  • Puppet::Pops::Model::Program – returns a Program instance on success.

Parse a data type

type(type_string)

Parses a puppet data type given in string format and returns that type, or raises an error. A type is needed in calls to new to create an instance of the data type, or to perform type checking of values - typically using type.instance?(obj) to check if obj is an instance of the type.

# Verify if obj is an instance of a data type
pal.type('Enum[red, blue]').instance?("blue") # returns true

Parameters:

  • type_string String – a puppet language data type.

Returns:

  • Type – the data type

Create a data type

create(data_type, *arguments) – Creates a new instance of a given data type.

Parameters:

  • data_type Variant[String, Type] – the data type as a data type or in String form.
  • *arguments Any – one or more arguments to the called new function.

Returns:

  • Any – an instance of the given data type; raises an error if it was not possible to parse the data type or create an instance.

# Create an instance of a data type (using an already created type)
t = pal.type('Car')
pal.create(t, 'color' => 'black', 'make' => 't-ford')

# same thing, but type is given in String form
pal.create('Car', 'color' => 'black', 'make' => 't-ford')

Check if this is a catalog compiler

has_catalog? – Returns true if this is a compiler that compiles a catalog.

Script Compiler

The Script Compiler has these additional methods:

Get the signature of a plan by name

plan_signature(plan_name)

Parameters:

  • plan_name String – the name of the plan to get the signature of.

Returns:

  • Optional[Puppet::Pal::PlanSignature] – returns a PlanSignature, or nil if plan is not found.

Get a list of available plans with optional filtering on name

list_plans(filter_regex = nil, error_collector = nil)

Returns an array of TypedName objects for all plans, optionally filtered by a regular expression. The returned array has more information than just the leaf name although the typical thing is to just get the name as shown in the following example.

Errors that occur during plan discovery will either be logged as warnings or collected in the optional error_collector array. When provided, it will get Puppet::DataTypes::Error instances appended (i.e. the data type known as Error in the Puppet language) describing each error in detail and no warnings will be logged.

# Example: getting the names of all plans
puts compiler.list_plans.map {|tn| tn.name }

Parameters:

  • filter_regex Regexp – an optional regexp that filters based on name (matching names are included in the result).
  • error_collector Array[Error] – an optional array that will get errors appended during load.

Returns:

  • Array[Puppet::Pops::Loader::TypedName] – an array of typed names.

Get the signature of a task by name

task_signature(task_name)

Returns the callable signature of the given task (that is, the arguments it accepts, and the data type it returns).

Parameters:

  • task_name String – the name of the task to get the signature of.

Returns:

  • Optional[Puppet::Pal::TaskSignature] – returns a TaskSignature, or nil if task is not found.

Get a list of available tasks with optional filtering on name

list_tasks(filter_regex = nil, error_collector = nil)

Returns an array of TypedName objects for all tasks, optionally filtered by a regular expression. The returned array has more information than just the leaf name - the typical thing is to just get the name as shown in the following example:

# Example getting the names of all tasks
compiler.list_tasks.map {|tn| tn.name }

Errors that occur during task discovery will either be logged as warnings or appended to the optional error_collector array. When provided, it will get Error instances appended describing each error in detail and no warnings will be logged.

Parameters:

  • filter_regex Regexp – an optional regexp that filters based on name (matching names are included in the result).
  • error_collector Array[Error] – an optional array that will get errors appended during load.

Returns:

  • Array[Puppet::Pops::Loader::TypedName] – an array of typed names.

Catalog Compiler methods

Produce a Catalog in JSON

with_json_encoding(pretty: true, exclude_virtual: true)

Calls a block of code and yields a configured JsonCatalogEncoder to the block.

Parameters:

  • pretty Boolean – whether the resulting JSON should be pretty printed. Defaults to true.
  • exclude_virtual Boolean – whether the resulting catalog should have virtual resources filtered out. The default is true.

# Example: Get resulting catalog as pretty printed JSON
Puppet::Pal.in_environment('production') do |pal|
  pal.with_catalog_compiler do |compiler|
    compiler.with_json_encoding { |encoder| encoder.encode }
  end
end

Compiler additions to the catalog - handle lazy evaluation

compile_additions()

Compiles the result of additional evaluation taking place in a PAL catalog compilation. This will evaluate all lazy constructs until all have been evaluated, and then validate the resulting catalog.

This should be called after having evaluated extra strings or files of puppet logic once the initial compilation (from a manifest or code string given to PAL) has taken place.

This method should be called when a series of evaluations is thought to have reached a valid state (a point where there should be no relationships to resources that do not exist).

As an alternative, the method evaluate_additions can be called without any requirements on consistency, followed by a call to validate at the end. (Both can be called multiple times.)

Note: A catalog compilation needs to start by creating a catalog and declaring some initial things. The standard compilation then continues by evaluating either what was given as the main manifest or a string of Puppet Language code (internally referred to as "the initial import"). Normally this defines the entire compilation: the main manifest plus definitions from an ENC include all of the wanted classes (and then what they include, etc.) via autoloading. When using PAL you may have a use case where you want to do that first and then continue with additions, or you may want the initial compilation to be as small as possible and build the catalog from a series of calls you make to PAL. Again depending on use case, you may require that what you include in the catalog has been fully evaluated before taking the next step, or you can simply finalize your catalog building at the very end with a compile_additions.

Validating the catalog (after additions)

validate()

Validates the state of the catalog (without performing evaluation of any elements requiring lazy evaluation). Can be called multiple times. Call this if you want to validate the catalog's state after having done one or more calls to evaluate_additions(). Raises an error if the catalog is not valid.

Evaluate additions, but do not validate

evaluate_additions()

Evaluates all lazy constructs that were produced as a side effect of evaluating puppet logic. Can be called multiple times. Call this instead of compile_additions() if you want to hold off on validating the catalog's state. May raise an error from the evaluation.

Summary

Oh my, that turned out to be one long post! Sorry about that - simply a lot to cover…
There is probably a lot more you would like to know about how you can use this, and especially if you are interested in writing tooling around language stuff. While I have written about Language internals and modeling in past blog posts, I will probably come back with examples of useful utilities that can easily be written using PAL. Ping me in comments below, or hit me up on one of the #puppet channels on Slack if there is something you would like to see.