The cerberus-cpp documentation

Getting Started

Installation

Cerberus-cpp is header-only, so it should be fairly easy to get up and running. It requires the following software to be available:

  • A C++14-compliant C++ compiler

  • CMake >= 3.11

  • The yaml-cpp library, version >= 0.6

  • git

The easiest way to get yaml-cpp is:

  • On Debian, Ubuntu: sudo apt install libyaml-cpp-dev

  • On macOS: brew install yaml-cpp

With these prerequisites met, cerberus-cpp is installed like any other CMake project, e.g.:

git clone https://github.com/dokempf/cerberus-cpp.git
cd cerberus-cpp
mkdir build
cd build
cmake ..
make
make install

Usage example

This is the most basic usage example that validates a given document against a schema.

#include <cerberus-cpp/validator.hh>
#include <yaml-cpp/yaml.h>

#include <iostream>
#include <string>

int main()
{
  YAML::Node schema = YAML::Load(
    "answer:          \n"
    "  type: integer  \n"
    "  default: 42    \n"
    "question:        \n"
    "  type: string   \n"
  );

  YAML::Node document;
  document["question"] = "What is 6x9?";

  cerberus::Validator validator(schema);
  if (validator.validate(document))
  {
    YAML::Node doc = validator.getDocument();
    std::cout << doc["question"].as<std::string>() << " " << doc["answer"].as<int>() << std::endl;
  }
  else
    std::cerr << validator << std::endl;
  return 0;
}

As you can see, both the schema and the document are defined using the YAML::Node data structure. To work with cerberus-cpp, you need to be familiar with the basic usage of yaml-cpp, as described e.g. in the yaml-cpp documentation. In the above example, we load the schema from inline YAML with YAML::Load, while we construct the document programmatically. In practice, your document will often come from user input, e.g. by loading it from disk with YAML::LoadFile.

Basic Usage

Validation

The most important component provided by cerberus-cpp is the cerberus::Validator class. An instance of this validator is given a schema and a document. The document is then validated against this schema using the validate method, which returns a boolean value indicating success. If validation fails, the errors can be written to a stream using the validator’s printErrors method, or more conveniently by passing the validator itself into the stream:

cerberus::Validator validator;
if(!validator.validate(document, schema))
  std::cerr << validator << std::endl;

The schema and the document are both provided as instances of YAML::Node. Using the same data structure for schemas and documents is considered a feature of cerberus-cpp. A tutorial on how to construct these documents from YAML files, from inline strings or programmatically can be found in the yaml-cpp documentation. Both schemas and documents are always expected to be mappings.

The validator class has the following configurable validation policies:

  • whether or not unknown fields in the document (fields that do not appear in the schema) should make the validation process fail (can be toggled using the setAllowUnknown(value) method). The default is false, i.e. unknown fields make validation fail.

  • whether or not all fields are considered to be required fields (can be toggled using the setRequireAll(value) method). The default is false.
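The interplay of the two policies can be pictured with a small standalone model. This is only a sketch in plain standard C++ that does not use cerberus-cpp at all: the names Policy and check_fields are hypothetical, documents are reduced to sets of field names, and the real validator reports errors in its own format.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the two validator policies; not a cerberus-cpp type.
struct Policy {
  bool allow_unknown = false;  // mirrors the documented default of setAllowUnknown
  bool require_all = false;    // mirrors the documented default of setRequireAll
};

// schema:   field name -> whether the field is individually marked as required
// document: names of the fields present in the document
// Returns one message per violation; an empty result means the check passed.
std::vector<std::string> check_fields(
    const std::map<std::string, bool>& schema,
    const std::set<std::string>& document,
    const Policy& policy)
{
  std::vector<std::string> errors;

  // Unknown fields: present in the document, absent from the schema.
  for (const auto& field : document)
    if (schema.count(field) == 0 && !policy.allow_unknown)
      errors.push_back("unknown field: " + field);

  // Missing fields: required individually, or because require_all is set.
  for (const auto& entry : schema)
    if ((entry.second || policy.require_all) && document.count(entry.first) == 0)
      errors.push_back("missing required field: " + entry.first);

  return errors;
}
```

With the default policy, a document containing a field that the schema does not mention yields one error in this model. Setting allow_unknown to true silences it, and setting require_all to true additionally reports every schema field the document lacks.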

In the following, we provide a summary of the validation rules available in cerberus-cpp. As cerberus-cpp aims for compatibility with the Python package cerberus, you can read more on the semantics of these rules in the Cerberus Validation Rule Documentation. For inconsistencies between the Python package and cerberus-cpp, see Compatibility with cerberus. This is the list of implemented validation rules in order of relevance for most applications:

  • type specifies the expected type for this field. Possible values are integer, float, string, boolean, list, dict or any identifier of a custom type (see Custom Validation Rules).

  • required forces the existence of the field in the document.

  • schema specifies a schema for a submapping or a sublist

  • allowed specifies the list of allowed values for the field.

  • forbidden, in contrast, specifies a list of forbidden values for the field

  • min and max specify a minimum and maximum for the value and require type to be set to something that allows comparison

  • regex matches the field’s value against the given regular expression

  • keysrules and valuesrules specify rules that apply only to the keys or only to the values of a submapping, respectively

  • minlength and maxlength constrain the allowed length of a list or mapping

  • items specifies a list of schemas that the entries of a list need to fulfill

  • contains forces the existence of a value within a list

  • dependencies forces the existence of another field, if the given field is present

  • excludes forces the absence of another field, if the given field is present

  • nullable specifies whether the field accepts a null value

  • allow_unknown changes the validator’s policy regarding unknown fields, e.g. for a submapping

  • require_all changes the validator’s policy regarding requiring all fields, e.g. for a submapping

Normalization

Cerberus-cpp can not only perform validation, but also modify the document according to normalization rules. Cerberus-cpp does not do this in-place. Instead you need to use the validator’s getDocument() method to access the normalized document.

  YAML::Node schema = YAML::Load(
    "name:                  \n"
    "  type: string         \n"
    "  default: John Doe    \n"
    "  rename: user         \n"
  );

  cerberus::Validator validator(schema);
  if (validator.validate(YAML::Node()))
    std::cout << "The normalized document: " << validator.getDocument() << std::endl;
  else
    std::cerr << validator << std::endl;

The normalized output document of the above example would be user: John Doe.

Additionally, the validator class has a configurable policy whether or not unknown fields should be purged from the normalized document (can be toggled using the setPurgeUnknown(value) method). The default is false.

This is a list of normalization rules available in cerberus-cpp:

  • default provides the field’s default value

  • rename renames a given field in the normalized document

  • purge_unknown changes the validator’s policy regarding purging unknown fields, e.g. for a submapping
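To make the effect of default and rename concrete, here is a minimal standalone model of the example above. It is a sketch in plain standard C++, not the library's implementation: FieldRules, Document and normalize are hypothetical names, all values are plain strings, and the order "fill in the default, then rename" is an assumption inferred from the documented output user: John Doe.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified description of one field's normalization rules.
struct FieldRules {
  bool has_default = false;
  std::string default_value;  // only used if has_default is true
  std::string rename;         // empty means "keep the original name"
};

using Document = std::map<std::string, std::string>;

// Returns a normalized copy; the input is left untouched, mirroring the fact
// that normalization is not performed in-place.
Document normalize(const Document& input,
                   const std::map<std::string, FieldRules>& schema)
{
  Document output;

  for (const auto& entry : schema) {
    const std::string& name = entry.first;
    const FieldRules& rules = entry.second;

    // Take the value from the document, or fall back to the default.
    std::string value;
    const auto it = input.find(name);
    if (it != input.end())
      value = it->second;
    else if (rules.has_default)
      value = rules.default_value;
    else
      continue;  // field absent and no default: nothing to emit

    // Store the value under its new name if a rename is requested.
    output[rules.rename.empty() ? name : rules.rename] = value;
  }

  // Fields unknown to the schema are carried over unchanged (no purging).
  for (const auto& entry : input)
    if (schema.count(entry.first) == 0)
      output.insert(entry);

  return output;
}
```

Normalizing an empty document against a schema whose name field has the default John Doe and the rename target user produces a single entry user with the value John Doe, while the empty input stays empty.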

Advanced Usage

This section is only relevant to users who seek to enhance the capabilities of cerberus-cpp by e.g. providing custom rules and types. All customizations described in this documentation operate on instances of cerberus::Validator. You may also apply these in the constructor of a derived class.

Custom Validation Rules

Custom validation rules can be registered on instances of cerberus::Validator. This is an example that registers a custom rule oddity that only accepts odd integer values:

  cerberus::Validator validator;
  validator.registerRule(
    YAML::Load(
      "oddity:            \n"
      "  type: boolean    \n"
      "  dependencies:    \n"
      "    type: integer  \n"
    ),
    [](auto& v) {
      if(!v.getDocument().IsDefined())
        return;

      // Compare the remainder against 0: for negative odd values, % 2 yields -1 in C++.
      if((v.getDocument().template as<int>() % 2 != 0) != v.getSchema().template as<bool>())
        v.raiseError("oddity-Rule violated!");
    }
  );

The first argument defines a schema that is used to validate uses of the rule in user-provided schemas (a meta-schema, so to speak). On the one hand, this defines the name of the rule (here: oddity); on the other hand, it rules out misuse (e.g. providing oddity: 42, where only boolean arguments are allowed). You can use all available schema rules, though typically only the name is required. Here, we additionally use a dependencies rule to require that a field using oddity also declares type: integer.

The second argument is expected to be a templated callable (here: a generic lambda) that implements the rule. The only argument is typically a reference to an instance of the ValidationRuleInterface API, although the type is accepted as a template parameter to integrate well with custom derived validator classes. In our example, only the most relevant methods of the ValidationRuleInterface API are used:

  • getDocument() gives the YAML::Node that describes the document snippet that is currently validated.

  • getSchema() provides the YAML::Node that describes the schema snippet for this validation.

  • raiseError() reports a validation error
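The check at the heart of such a rule can be kept in a plain function, which makes it easy to test without a validator. The following sketch uses only standard C++; violates_oddity is a hypothetical helper name and not part of cerberus-cpp. It also handles a detail that is easy to miss: in C++ the remainder of a negative odd number is -1, so comparing value % 2 directly against a boolean misclassifies negative odd values.

```cpp
// Returns true if `value` violates a rule `oddity: expect_odd`.
// Hypothetical helper for illustration; not part of cerberus-cpp.
bool violates_oddity(int value, bool expect_odd)
{
  // value % 2 is -1 for negative odd values, so test against 0
  // instead of comparing the remainder with a bool.
  const bool is_odd = (value % 2 != 0);
  return is_odd != expect_odd;
}
```

Inside the lambda passed to registerRule, such a helper would be called with the values obtained from getDocument() and getSchema(), and raiseError() would be invoked whenever it returns true.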

Some rules need to be applied before or after certain other rules in order to implement the correct semantics. Cerberus-cpp gives control over this by providing a number of hooks at which rules execute. The hook at which a custom rule executes can be selected by passing a third argument; the possible values are listed in the enumeration cerberus::RulePriority.

Custom Types

By default, cerberus-cpp supports integers, floating point types, strings, boolean values, as well as sequences and mappings. You can, however, provide custom types as well. We illustrate this by implementing a simple date type that only stores a year. While this could of course be achieved with an integer as well, we use it to illustrate how a custom class is validated.

struct SimpleDate {
  int year;

  bool operator==(const SimpleDate& other) const
  {
    return year == other.year;
  }

  bool operator<(const SimpleDate& other) const
  {
    return year < other.year;
  }
};

This simple implementation also documents the minimum requirement on the interface of eligible C++ types: operator== and operator< need to be defined. On top of that, yaml-cpp’s (de)serialization needs to be implemented for this type as described in the yaml-cpp guide:

namespace YAML {
  template<>
  struct convert<SimpleDate>
  {
    static Node encode(const SimpleDate& rhs)
    {
      return Node(rhs.year);
    }

    static bool decode(const Node& node, SimpleDate& rhs)
    {
      return convert<int>::decode(node, rhs.year);
    }
  };
}

Registration of the new type with a given validator is then as simple as this:

  cerberus::Validator validator;
  validator.registerType<SimpleDate>("date");

Now, this type can be referenced with a rule type: date and validation will fail if the YAML deserialization of the input fails.
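Why operator== and operator< are sufficient can be seen from a small standalone sketch. It uses only the standard library and is not the library's implementation: is_one_of and within are hypothetical helpers that show how a membership test (as needed for rules like allowed) can be written with equality alone, and an inclusive range test (as needed for min and max, assuming the inclusive bounds of the Python package) with less-than alone.

```cpp
#include <algorithm>
#include <vector>

// Same minimal type as above, repeated to keep the sketch self-contained.
struct SimpleDate {
  int year;

  bool operator==(const SimpleDate& other) const { return year == other.year; }
  bool operator<(const SimpleDate& other) const { return year < other.year; }
};

// Membership test: std::find only needs operator==.
template <typename T>
bool is_one_of(const T& value, const std::vector<T>& candidates)
{
  return std::find(candidates.begin(), candidates.end(), value) != candidates.end();
}

// Inclusive range test expressed purely in terms of operator<.
template <typename T>
bool within(const T& value, const T& lower, const T& upper)
{
  return !(value < lower) && !(upper < value);
}
```

Both helpers work for SimpleDate without any further operators, which is why no operator>, operator<= or operator!= has to be provided by a custom type.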

Schema Registration

If you intend to reuse schemas a lot, you can also have a validator instance store them by using the registerSchema method. Later on, the schema can either be retrieved by using the special signature of validate that specifies the schema with a string or by specifying the string as the value for a schema validation rule.

  cerberus::Validator validator;
  validator.registerSchema("user", schema);
  if (!validator.validate(document, "user"))
    std::cerr << validator << std::endl;

Compatibility with cerberus

Cerberus-cpp tries to be compatible with the Python package cerberus. In practice, some inconsistencies exist. If you have a use case where cerberus-cpp differs from cerberus in a way that cannot be explained by one of the following reasons, please open an issue and attach YAML files with schema and data:

  • Several validation rules require the type rule to be present as well. These are the rules that require equality or comparison to be implemented e.g.:

    • min and max

    • allowed

    Your safest bet is to always define the type rule.

  • The allowed rule does not validate iterables, because that would lead to conflicting semantics of the type field.

  • The contains rule currently has no access to the item type information. It currently assumes string values, although it could be changed to inspect a given schema rule for type information. I am not sure yet whether I want to go that route.

  • Some of the types built into cerberus are hard to implement in C++ and are therefore omitted from the library. If you need these, register a custom type and choose the correct C++ data structure yourself. These are:

    • date and datetime: With C++ lacking standardization of these types and completely missing a parser for such types, it would be unwise to implement these.

    • binary: There is no sensible C++ equivalent of a Python bytes object, so it seems wise to skip this one.

    • set: This type seems to be inaccessible when starting from serialized YAML. I am currently not planning to add this.

  • The regex rule is not guaranteed to accept exactly the same dialect of regular expressions as in the Python package. Currently, the C++ implementation uses plain std::regex. Maybe this can be fixed by picking the correct grammar for std::regex.

  • The following rules are currently considered a won’t fix for one reason or the other:

    • allof, anyof, noneof, oneof: These rules are a major headache to implement. Yet, the cerberus documentation actively warns users that the need for such rules hints at a design flaw. Also, these rules disable normalization. Currently, I would rather opt not to implement these rules at all.

    • readonly: Just from reading the documentation, I understand neither the semantics nor the use case for this rule. So, I am omitting it until somebody urges me to implement it.

    • check_with: In the context of cerberus-cpp, I fail to see how this rule differs from applying a custom rule, which you should do in that case.

    • coerce: Similarly to check_with, a custom coercer is not really different from a custom normalization rule. I might add a coerce rule later for compatibility with Python cerberus, though.
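Regarding the regex dialect mentioned in the list above, the following standalone sketch shows how one might check in advance whether a pattern written for the Python package is accepted by std::regex. It uses only the standard library; compiles and matches are hypothetical helper names, and whether cerberus-cpp itself matches the whole value or only a part of it is not stated here, so the whole-value match is an assumption.

```cpp
#include <regex>
#include <string>

// True if `pattern` is accepted by std::regex under the given grammar
// (ECMAScript is the default grammar of std::regex).
bool compiles(const std::string& pattern,
              std::regex::flag_type grammar = std::regex::ECMAScript)
{
  try {
    std::regex re(pattern, grammar);
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}

// True if the whole of `value` matches `pattern` (assumed semantics).
bool matches(const std::string& value, const std::string& pattern,
             std::regex::flag_type grammar = std::regex::ECMAScript)
{
  return std::regex_match(value, std::regex(pattern, grammar));
}
```

Simple character classes and quantifiers such as [a-z]+[0-9]{2} behave the same in both worlds, whereas Python-specific syntax such as the named group (?P<year>[0-9]{4}) is rejected by the ECMAScript grammar with a std::regex_error. Checking patterns with a helper like compiles before putting them into a schema makes such differences visible early.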

API documentation

Cerberus-cpp has two core APIs:

  • The Validator API is the end user interface that is used when validating data against schemas.

  • The ValidationRuleInterface API is the interface used when developing custom rules. It gives access to the internal state of the validation process that is necessary to implement custom validation logic.

If you do not intend to implement custom rules, there is no need to understand the latter.

Validator API

ValidationRuleInterface API

Contributing

Cerberus-cpp welcomes contributions. Before contributing, please read the following guidelines:

  • If you have a use case that does not work in cerberus-cpp, but it does work in the Python package cerberus, please open a bug report and attach YAML files with a schema and some data. Ideally, the file follows the syntax that cerberus-cpp tests use (see the test/testdata.yml file).

  • Bear in mind that cerberus-cpp tries to stay compatible with the Python package cerberus. Pull requests that increase incompatibilities will not be considered, while pull requests that remove these are highly welcome.

  • If you are implementing a custom rule and you need to extend the ValidationRuleInterface API, please provide a description of your use case, so that we can better discuss the interface design.

  • When opening a pull request against the cerberus-cpp repository, please add your name to COPYING.md as well.