This section explains the different locks that exist in NSO and how they interact. To understand how the different locks fit into the picture, it is important to understand the NSO architecture, with its management backplane and the transaction state machine, as described in Package Development in the Development Guide.
The NSO management backplane keeps a lock on the datastore: running. This lock is usually referred to as the global lock and it provides a mechanism to grant exclusive access to the datastore.
The global lock is the only lock that can be explicitly taken through a northbound agent, for example by the NETCONF <lock> operation, or by calling Maapi.lock().
A global lock can be taken for the whole datastore, or it can be a partial lock (for a subset of the data model). Partial locks are exposed through NETCONF and MAAPI and are only supported for operations toward the running datastore.
An agent can request a global lock to ensure that it has exclusive write access. When a global lock is held by an agent, it is not possible for anyone else to write to the datastore the lock guards - this is enforced by the transaction engine. A global lock on running is granted to an agent if there are no other holders of it (including partial locks), and if all data providers approve the lock request. Each data provider (CDB and/or external data providers) will have its lock() callback invoked to get a chance to accept or refuse the lock. The output of ncs --status includes the locking status: for each user session, the locks (if any) held per datastore are listed.
A northbound agent starts a user session towards NSO's management backplane. Each user session can then start multiple transactions. A transaction is either read/write or read-only.
The transaction engine has its own internal locks towards the running datastore. These transaction locks exist to serialize configuration updates towards the datastore and are separate from the global locks.
When a northbound agent wants to update the running datastore with a new configuration, it implicitly grabs and releases the transactional lock. The transaction engine takes care of managing the locks as it moves through the transaction state machine, and there is no API that exposes the transactional locks to the northbound agents.
When the transaction engine wants to take a lock for a transaction (for example when entering the validate state), it first checks that no other transaction has the lock. It then checks that no user session has a global lock on that datastore. Finally, each data provider is invoked through its transLock() callback.
In contrast to the implicit transactional locks, some northbound agents expose explicit access to the global locks. This is done a bit differently by each agent.
The management API exposes the global locks by providing the Maapi.lock() and Maapi.unlock() methods (and the corresponding Maapi.lockPartial() and Maapi.unlockPartial() for partial locking). Once a user session is established (or attached to), these functions can be called.
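As a minimal sketch, the Java program below takes and releases the global lock on running via MAAPI. It assumes a local NSO with default IPC settings and an existing admin user, and that lock()/unlock() take a datastore constant, mirroring the C MAAPI; see the Java API Overview for the authoritative signatures.

import java.net.InetAddress;
import java.net.Socket;
import com.tailf.conf.Conf;
import com.tailf.maapi.Maapi;
import com.tailf.maapi.MaapiUserSessionFlag;

public class GlobalLockExample {
    public static void main(String[] args) throws Exception {
        // Connect to NSO's IPC port (default 127.0.0.1:4569)
        Socket socket = new Socket("127.0.0.1", Conf.NCS_PORT);
        Maapi maapi = new Maapi(socket);
        // A user session must exist before the lock can be requested
        maapi.startUserSession("admin", InetAddress.getByName("localhost"),
                               "system", new String[] { "admin" },
                               MaapiUserSessionFlag.PROTO_TCP);
        maapi.lock(Conf.DB_RUNNING);       // take the global lock on running
        try {
            // ... exclusive write access to the running datastore ...
        } finally {
            maapi.unlock(Conf.DB_RUNNING); // always release the global lock
        }
        socket.close();
    }
}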
In the CLI, the global locks are taken when entering the different configure modes, as follows:
- config exclusive - The global lock on the running datastore is taken.
- config terminal - Does not grab any locks.
The global lock is then kept by the CLI until the configure mode is exited.
The Web UI behaves in the same way as the CLI: it presents edit tabs called "Edit private" and "Edit exclusive", which correspond to the CLI modes described above.
The NETCONF agent translates the <lock> operation into a request for the global lock on the requested datastore. Partial locks are also exposed, through the partial-lock RPC.
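For reference, a standard NETCONF <lock> request towards the running datastore looks as follows (RFC 6241 syntax; the message-id is arbitrary):

<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <lock>
    <target>
      <running/>
    </target>
  </lock>
</rpc>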
Implementing the lock() and unlock() callbacks is not required of an external data provider. NSO will never try to initiate the transLock() state transition (see the transaction state diagram in Package Development in the Development Guide) towards a data provider while a global lock is taken - so the reason for a data provider to implement the locking callbacks is that someone else can write (or lock, for example to take a backup) to the data provider's database.
CDB ignores the lock() and unlock() callbacks (since the data-provider interface is the only write interface towards it).
CDB has its own internal locks on the database. The running datastore has a single write lock and multiple read locks. It is not possible to grab the write lock on a datastore while there are active read locks on it. The locks in CDB exist to make sure that a reader always gets a consistent view of the data (in particular, it becomes very confusing if another user is able to delete configuration nodes in between calls to getNext() on YANG list entries).
During a transaction, transLock() takes a CDB read lock towards the transaction's datastore, and writeStart() tries to release the read lock and grab the write lock instead.
A CDB external reader client implicitly takes a CDB read lock between Cdb.startSession() and Cdb.endSession(). This means that while a CDB client is reading, a transaction cannot pass through writeStart() (and conversely, a CDB reader cannot start while a transaction is in between writeStart() and commit() or abort()).
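A minimal Java sketch of such a reader, assuming a local NSO with default IPC settings; the session holds the CDB read lock until endSession() is called:

import java.net.Socket;
import com.tailf.cdb.Cdb;
import com.tailf.cdb.CdbDBType;
import com.tailf.cdb.CdbSession;
import com.tailf.conf.Conf;

public class CdbReaderExample {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("127.0.0.1", Conf.NCS_PORT);
        Cdb cdb = new Cdb("cdb-reader", socket);
        // startSession() implicitly takes the CDB read lock
        CdbSession session = cdb.startSession(CdbDBType.CDB_RUNNING);
        try {
            // ... read configuration data here ...
        } finally {
            session.endSession(); // releases the read lock
        }
        socket.close();
    }
}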
The Operational store in CDB does not have any locks. NSO's transaction engine can only read from it, and the CDB client writes are atomic per write operation.
When a session tries to modify a data store that is locked in some way, it will fail. For example, the CLI might print:
admin@ncs(config)# commit
Aborted: the configuration database is locked
Since some of the locks are short-lived (such as a CDB read lock), NSO is by default configured to retry the failing operation for a short period of time. If the datastore is still locked after this time, the operation fails.
To configure this, set /ncs-config/commit-retry-timeout in ncs.conf.
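A sketch of the corresponding ncs.conf fragment; the 5-second value is an assumed example, and the exact value syntax is described in the ncs.conf(5) manual page:

<ncs-config xmlns="http://tail-f.com/yang/tailf-ncs-config">
  ...
  <commit-retry-timeout>PT5S</commit-retry-timeout>
  ...
</ncs-config>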
CDB implements write-ahead logging to provide durability in the datastores, appending a new log entry for each CDB transaction to the target datastore (A.cdb for configuration, O.cdb for operational, and S.cdb for the snapshot datastore). Depending on the size and number of transactions towards the system, these files grow in size, leading to increased disk utilization, longer boot times, and longer initial data synchronization time when setting up a high-availability cluster.

Compaction is a mechanism used to reduce the size of the write-ahead logs to a minimum. It works by replacing an existing write-ahead log, composed of a number of consecutive transaction logs created at run-time, with a single transaction log representing the full current state of the datastore. From this perspective, a compaction acts similarly to a write transaction towards a datastore. To ensure data integrity, write transactions towards the datastore are not permitted while compaction takes place.
By default, compaction is handled automatically by the CDB. After each transaction, CDB evaluates whether compaction is required for the affected datastore.
This is done by examining the number of added nodes as well as the file size changes since the last performed compaction. The thresholds used can be modified in the ncs.conf file by configuring the /ncs-config/compaction/file-size-relative, /ncs-config/compaction/file-size-absolute, and /ncs-config/compaction/num-node-relative settings.
It is also possible to automatically trigger compaction after a set number of transactions, by setting the /ncs-config/compaction/num-transaction property.
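A sketch of where these settings live in ncs.conf; the values are left as placeholders rather than recommended thresholds - see the ncs.conf(5) manual page for the exact value syntax and defaults:

<ncs-config xmlns="http://tail-f.com/yang/tailf-ncs-config">
  ...
  <compaction>
    <file-size-relative>...</file-size-relative>
    <file-size-absolute>...</file-size-absolute>
    <num-node-relative>...</num-node-relative>
    <num-transaction>...</num-transaction>
  </compaction>
  ...
</ncs-config>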
Compaction may require a significant amount of time, during which write transactions cannot be performed. In certain use cases, it may be preferable to disable automatic compaction by CDB and instead trigger compaction manually according to specific needs. If you do so, it is highly recommended to have another automated compaction mechanism in place.
The CDB C API provides a set of functions which may be used to create an external mechanism for compaction. See cdb_initiate_journal_compaction(), cdb_initiate_journal_dbfile_compaction(), and cdb_get_compaction_info() in confd_lib_cdb(3) in Manual Pages.
Compaction can be automated using a scheduling mechanism such as cron, or by using the NCS scheduler; see Scheduler in the Development Guide for more information.
By default, CDB may perform compaction during its boot process. If required, this may be disabled by starting NSO with the --disable-compaction-on-start flag.
In the configuration datastore, compaction is by default delayed by 5 seconds when the threshold is reached, in order to prevent any upcoming write transaction from being blocked. If the system is idle during these 5 seconds, meaning that there is no new transaction, the compaction will initiate. Otherwise, compaction is delayed by another 5 seconds. The delay time can be configured in ncs.conf by setting the /ncs-config/compaction/delayed-compaction-timeout property.
Client libraries connect to NSO using TCP. We tell NSO which address to use for these connections through the /ncs-config/ncs-ipc-address/ip (default value 127.0.0.1) and /ncs-config/ncs-ipc-address/port (default value 4569) elements in ncs.conf.
It is possible to change these values, but doing so requires a number of steps to also configure the clients. There are also security implications; see the section called "Security issues" below.
Some clients read the environment variables NCS_IPC_ADDR and NCS_IPC_PORT to determine if something other than the default is to be used; others might need to be recompiled. This is a list of clients which communicate with NSO, and what needs to be done when ncs-ipc-address is changed.
| Client | Changes required |
| --- | --- |
| Remote commands via the ncs command | Remote commands, such as ncs --reload, check the environment variables NCS_IPC_ADDR and NCS_IPC_PORT. |
| CDB and MAAPI clients | The address supplied to Cdb.connect() and Maapi.connect() must be changed. |
| Data provider API clients | The address supplied to the Dp constructor socket must be changed. |
| ncs_cli | The Command Line Interface (CLI) client, ncs_cli, checks the environment variables NCS_IPC_ADDR and NCS_IPC_PORT. |
| Notification API clients | The new address must be supplied to the socket for the Notif constructor. |
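As a hedged sketch, a Java client could honor the same environment variables before opening its socket; Conf.NCS_PORT is the default IPC port constant from the NSO Java API:

import java.net.Socket;
import com.tailf.conf.Conf;
import com.tailf.maapi.Maapi;

public class IpcAddressExample {
    public static void main(String[] args) throws Exception {
        // Fall back to the defaults when the variables are not set
        String addr = System.getenv().getOrDefault("NCS_IPC_ADDR", "127.0.0.1");
        int port = Integer.parseInt(
            System.getenv().getOrDefault("NCS_IPC_PORT",
                                         String.valueOf(Conf.NCS_PORT)));
        Socket socket = new Socket(addr, port);
        Maapi maapi = new Maapi(socket);
        // ... use the MAAPI connection as usual ...
    }
}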
To run more than one instance of NSO on the same host (which can be useful in development scenarios), each instance needs its own IPC port. For each instance, set /ncs-config/ncs-ipc-address/port in ncs.conf to something different.
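For example, a second instance could use port 4570 (an arbitrary free port, chosen here purely for illustration) in its ncs.conf:

<ncs-ipc-address>
  <ip>127.0.0.1</ip>
  <port>4570</port>
</ncs-ipc-address>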
Two more sets of ports will have to be modified: NETCONF and CLI over SSH. The NETCONF SSH and TCP ports that NSO listens to by default are 2022 and 2023, respectively. Modify /ncs-config/netconf/transport/ssh and /ncs-config/netconf/transport/tcp, either by disabling them or by changing the ports they listen to. The CLI over SSH by default listens to port 2024; modify /ncs-config/cli/ssh, either by disabling it or by changing the default port.
By default, the clients connecting to the IPC port are considered trusted, i.e. no authentication is required, and we rely on the use of 127.0.0.1 for /ncs-config/ncs-ipc-address/ip to prevent remote access. In case this is not sufficient, it is possible to restrict access to the IPC port by configuring an access check.
The access check is enabled by setting the ncs.conf element /ncs-config/ncs-ipc-access-check/enabled to "true", and specifying a filename for /ncs-config/ncs-ipc-access-check/filename. The file should contain a shared secret, i.e. a random character string. Clients connecting to the IPC port will then be required to prove that they have knowledge of the secret through a challenge handshake, before they are allowed access to the NSO functions provided via the IPC port.
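A sketch of the relevant ncs.conf fragment; the filename is a hypothetical path used for illustration:

<ncs-ipc-access-check>
  <enabled>true</enabled>
  <filename>/etc/ncs/ipc_access_secret</filename>
</ncs-ipc-access-check>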
Note
Obviously the access permissions on this file must be restricted via OS file permissions, such that it can only be read by the NSO daemon and client processes that are allowed to connect to the IPC port. E.g. if both the daemon and the clients run as root, the file can be owned by root and have only "read by owner" permission (i.e. mode 0400). Another possibility is to have a group that only the daemon and the clients belong to, set the group ID of the file to that group, and have only "read by group" permission (i.e. mode 040).
To provide the secret to the client libraries, and to inform them that they need to use the access check handshake, we have to set the environment variable NCS_IPC_ACCESS_FILE to the full pathname of the file containing the secret. This is sufficient for all the clients mentioned above, i.e. there is no need to change application code to support or enable this check.
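For example, using the same hypothetical path as above:

$ export NCS_IPC_ACCESS_FILE=/etc/ncs/ipc_access_secret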
Note
The access check must be either enabled or disabled for both the daemon and the clients. E.g. if /ncs-config/ncs-ipc-access-check/enabled in ncs.conf is not set to "true", but clients are started with the environment variable NCS_IPC_ACCESS_FILE pointing to a file with a secret, the client connections will fail.
The service manager executes in a Java VM outside of NSO. The NcsMux initializes a number of sockets towards NSO at startup: these are MAAPI sockets and data provider sockets. NSO may choose to close any of these sockets whenever a task that NSO has requested the service manager to perform does not finish within the stipulated timeout. If that happens, the service manager must be restarted. The timeouts are controlled by several ncs.conf parameters found under /ncs-config/japi.
NSO requires some privileges to perform certain tasks. The following tasks may, depending on the target system, require root privileges.
- Binding to privileged ports. The ncs.conf configuration file specifies which port numbers NSO should bind(2) to. If any of these port numbers are lower than 1024, NSO usually requires root privileges, unless the target operating system allows NSO to bind to these ports as a non-root user.
- If PAM is to be used for authentication, the program installed as $NCS_DIR/lib/ncs/priv/pam/epam acts as a PAM client. Depending on the local PAM configuration, this program may require root privileges. If PAM is configured to read the local passwd file, the program must either run as root or be setuid root. If the local PAM configuration instructs NSO to run, for example, pam_radius_auth, root privileges are possibly not required, depending on the local PAM installation.
- If the CLI is used and we want to create CLI commands that run executables, we may want to modify the permissions of the $NCS_DIR/lib/ncs/lib/core/confd/priv/cmdptywrapper program. To be able to run an executable as root or as a specific user, we need to make cmdptywrapper setuid root, i.e.:

  # chown root cmdptywrapper
  # chmod u+s cmdptywrapper

  Failing that, all programs will be executed as the user running the ncs daemon. Consequently, if that user is root, we do not have to perform the chmod operations above.
  The same applies to executables run via actions, but then we may want to modify the permissions of the $NCS_DIR/lib/ncs/lib/core/confd/priv/cmdwrapper program instead:

  # chown root cmdwrapper
  # chmod u+s cmdwrapper
- NSO can be instructed to terminate NETCONF over clear text TCP. This is useful for debugging, since the NETCONF traffic can then be easily captured and analyzed. It is also useful if we want to provide some local proprietary transport mechanism which is not SSH. Clear text TCP termination is not authenticated; the clear text client simply tells NSO which user the session should run as. The idea is that authentication is already done by some external entity, such as an SSH server. If clear text TCP is enabled, it is very important that NSO binds to localhost (127.0.0.1) for these connections, as sketched below.
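A hedged ncs.conf sketch that enables clear text NETCONF on the loopback address only; port 2023 is the default NETCONF TCP port mentioned earlier, and the element names follow the /ncs-config/netconf-north-bound path used later in this section:

<netconf-north-bound>
  <transport>
    <tcp>
      <enabled>true</enabled>
      <ip>127.0.0.1</ip>
      <port>2023</port>
    </tcp>
  </transport>
</netconf-north-bound>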
Client libraries connect to NSO. For example, the CDB API is TCP-based and a CDB client connects to NSO. We instruct NSO which address to use for these connections through the ncs.conf parameters /ncs-config/ncs-ipc-address/ip (default address 127.0.0.1) and /ncs-config/ncs-ipc-address/port (default port 4569).
NSO multiplexes different kinds of connections on the same socket (IP and port combination). The following programs connect on the socket:
- Remote commands, such as ncs --reload
- CDB clients
- External database API clients
- MAAPI, the Management Agent API clients
- The ncs_cli program
By default, all of the above are considered trusted. MAAPI clients and ncs_cli are expected to authenticate the user before connecting to NSO, whereas CDB clients and external database API clients are considered trusted and do not have to authenticate.
Thus, since the ncs-ipc-address socket allows full unauthenticated access to the system, it is important to ensure that the socket is not accessible from untrusted networks. It is also possible to restrict access to this socket by means of an access check; see the section called "Restricting access to the IPC port".
A common misfeature found on UN*X operating systems is the restriction that only root can bind to ports below 1024. Many a dollar has been wasted on workarounds and often the results are security holes.
Both FreeBSD and Solaris have elegant configuration options to turn this feature off. On FreeBSD:
# sysctl net.inet.ip.portrange.reservedhigh=0
The above is best added to your /etc/sysctl.conf.
Solaris can be configured similarly. Assuming we want to run NSO under the non-root user "ncs", on Solaris we can grant the specific right to bind privileged ports below 1024 (and only that) to the "ncs" user using:
# /usr/sbin/usermod -K defaultpriv=basic,net_privaddr ncs
And check that we get what we want through:
# grep ncs /etc/user_attr
ncs::::type=normal;defaultpriv=basic,net_privaddr
Linux doesn't have anything like the above. There are a couple of options on Linux; the best is to use an auxiliary program like authbind (http://packages.debian.org/stable/authbind) or privbind (http://sourceforge.net/projects/privbind/).
These programs are run by root. To start ncs under e.g. privbind, we can do:
# privbind -u ncs /opt/ncs/current/bin/ncs -c /etc/ncs.conf
The above command starts NSO as user ncs and allows it to bind to ports below 1024.
NSO supports access to all northbound interfaces via IPv6. In the simplest case, i.e. IPv6-only access, this is just a matter of configuring an IPv6 address (typically the wildcard address "::") instead of IPv4 for the respective agents and transports in ncs.conf, e.g. /ncs-config/cli/ssh/ip for SSH connections to the CLI, or /ncs-config/netconf-north-bound/transport/ssh/ip for SSH to the NETCONF agent. The SNMP agent is configured via one of the other northbound interfaces rather than via ncs.conf; see The NSO SNMP Agent in Northbound APIs. For example, via the CLI we would set 'snmp agent ip' to the desired address. All these addresses default to the IPv4 wildcard address "0.0.0.0".
In most IPv6 deployments, however, it will be necessary to support IPv6 and IPv4 access simultaneously. This requires that both IPv4 and IPv6 addresses are configured, typically "0.0.0.0" plus "::". To support this, there is, in addition to the ip and port leafs, also an extra-listen list for each agent and transport, where additional IP address and port pairs can be configured. Thus, to configure the CLI to accept SSH connections to port 2024 on any local IPv6 address, in addition to the default (port 2024 on any local IPv4 address), we can add an <extra-listen> section under /ncs-config/cli/ssh in ncs.conf:
<cli>
  <enabled>true</enabled>
  <!-- Use the built-in SSH server -->
  <ssh>
    <enabled>true</enabled>
    <ip>0.0.0.0</ip>
    <port>2024</port>
    <extra-listen>
      <ip>::</ip>
      <port>2024</port>
    </extra-listen>
  </ssh>
  ...
</cli>
To configure the SNMP agent to accept requests to port 161 on any local IPv6 address, we could similarly use the CLI and give the command:
admin@ncs(config)# snmp agent extra-listen :: 161
The extra-listen list can take any number of address/port pairs. Thus, this method can also be used when we want to accept connections/requests on several specified (IPv4 and/or IPv6) addresses instead of the wildcard address, or when we want to use multiple ports.