OUTDATED
This documentation corresponds to an older version of the product, is no longer updated, and may contain outdated information.
Please access the latest versions from https://cisco-tailf.gitbook.io/nso-docs and update your bookmarks.
NSO embeds a generic alarm manager. It manages NSO native alarms and can easily be extended with application-specific alarms. Alarm sources can be notifications from devices, undesired service states that have been detected, or anything provided via the Java API.
The Alarm Manager has three main components:
- Alarm List: a list of alarms in NSO. Each list entry represents an alarm state for a specific device, an object within the device, and an alarm type.
- Alarm Model: for each alarm type, you can configure the mapping to, for example, X.733 alarm standard parameters that are sent as notifications northbound.
- Operator Actions: actions to set operator states on alarms, such as acknowledgement, and also actions to administratively manage the alarm list, such as deleting alarms.
The alarm manager is accessible over all northbound interfaces. A read-only view including an SNMP alarm table and alarm notifications is available in an SNMP Alarm MIB. This MIB is suitable for integration with SNMP-based alarm systems.
In order to populate the alarm list there is a dedicated Java API. This API lets a developer add alarms, change states on alarms etc. A common usage pattern is to use the SNMP notification receiver to map a subset of the device traps into alarms.
First of all, it is important to clearly define what an alarm means: “An alarm denotes an undesirable state in a resource for which an operator action is required.” Alarms are often confused with general logging and event mechanisms, which floods the operator with alarms. In NSO, the alarm manager shows undesired resource states that an operator should investigate. NSO contains other mechanisms for logging in general. Therefore, NSO does not naively populate the alarm list with traps received in the SNMP notification receiver.
Before looking into how NSO handles alarms, it is important to define the fundamental concepts. We make a clear distinction between alarms and events in general. Alarms should be taken seriously and be investigated. Alarms have states: they go active with a specific severity, they change severity, and they are cleared by the resource. The same alarm may become active again. A common mistake is to confuse the operator view with the resource view. The model described so far is the resource view. The resource itself may consider the alarm cleared. The alarm manager does not automatically delete cleared alarms, because an alarm that has existed in the network may still need investigation. There are dedicated actions an operator can use to manage the alarm list, for example to delete alarms based on criteria such as cleared state and date. These actions can be performed over all northbound interfaces.
Rather than viewing alarms as a list of alarm notifications, NSO defines alarms as states on objects. The NSO alarm list uses four keys for alarms: the device, the alarming object within the device, the alarm type, and an optional specific problem.
Alarm types are normally unique identifiers for a specific alarm state and are defined statically. An alarm type corresponds to the well-known X.733 alarm standard tuple of event type and probable cause. Specific problem is an optional string-based key that can further refine an alarm type at run-time. This is needed for alarms that are not known before a system is deployed.
Imagine a system with general digital inputs. A MIB might specify traps called input-high and input-low. When defining the SNMP notification reception, an integrator might define an alarm type called "External-Alarm". Input-high might imply a major alarm and input-low might imply clear.
At installation, some detectors report "fire-alarm" and some "door-open" alarms. This is configured at the device and sent as free text in the SNMP var-binds. The specific problem field of the NSO alarm manager is then used to separate these different alarm types.
The data model for the alarm manager is outlined in Figure 5, “Alarm Model”.
This means that we have a list with key: (managed device, managed object, alarm type, specific problem). In the example above we might have the following different alarms:
- Device: House1; Managed Object: Detector1; Alarm-Type: External Alarm; Specific Problem: Smoke
- Device: House1; Managed Object: Detector2; Alarm-Type: External Alarm; Specific Problem: Door Open
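The digital-input example can be sketched as a small mapping function. This is a standalone illustration in plain Python, not the NSO Java or Python API; the names `AlarmKey`, `AlarmUpdate`, `TRAP_SEVERITY`, and `map_trap` are hypothetical and introduced only for this sketch.

```python
from typing import NamedTuple, Optional


class AlarmKey(NamedTuple):
    """The four keys that identify one entry in the alarm list."""
    device: str
    managed_object: str
    alarm_type: str
    specific_problem: str  # empty string when not used


class AlarmUpdate(NamedTuple):
    key: AlarmKey
    severity: str  # for example "major" or "cleared"


# Hypothetical trap-to-severity table for the digital-input example.
TRAP_SEVERITY = {"input-high": "major", "input-low": "cleared"}


def map_trap(device: str, detector: str, trap: str,
             free_text: str) -> Optional[AlarmUpdate]:
    """Turn a trap into an alarm update, or None if the trap is not an alarm."""
    severity = TRAP_SEVERITY.get(trap)
    if severity is None:
        return None  # not every trap should populate the alarm list
    # The free text from the var-binds becomes the specific problem key.
    key = AlarmKey(device, detector, "External-Alarm", free_text)
    return AlarmUpdate(key, severity)


smoke = map_trap("House1", "Detector1", "input-high", "Smoke")
door = map_trap("House1", "Detector2", "input-high", "Door Open")
cleared = map_trap("House1", "Detector1", "input-low", "Smoke")

print(smoke.key != door.key)     # two distinct alarm entries
print(cleared.key == smoke.key)  # the clear updates the same entry
```

Because the clear trap produces the same key as the raise trap, both end up as state changes on one alarm entry instead of two unrelated notifications.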
Each alarm entry shows the last status change for the alarm and also a child list with all status changes sorted in chronological order.
- is-cleared: was the last state change clear?
- last-status-change: time stamp for the last status change
- last-perceived-severity: last severity (not equal to clear)
- last-alarm-text: the last alarm text (not equal to clear)
- status-change, event-time: the time reported by the device
- status-change, received-time: the time the state change was received by NSO
- status-change, perceived-severity: the new perceived severity
- status-change, alarm-text: descriptive text associated with the new alarm status
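How the summary leaves relate to the status-change list can be sketched as follows. This is a simplified standalone model in plain Python, not NSO code; the class and method names are hypothetical, and it assumes state changes arrive in chronological order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatusChange:
    event_time: str          # time reported by the device
    received_time: str       # time the change was received by the manager
    perceived_severity: str  # "cleared", "minor", "major", ...
    alarm_text: str


@dataclass
class Alarm:
    status_changes: List[StatusChange] = field(default_factory=list)
    is_cleared: bool = False
    last_status_change: Optional[str] = None
    last_perceived_severity: Optional[str] = None
    last_alarm_text: Optional[str] = None

    def add_status_change(self, change: StatusChange) -> None:
        """Record a state change and refresh the summary leaves."""
        self.status_changes.append(change)
        self.is_cleared = change.perceived_severity == "cleared"
        self.last_status_change = change.event_time
        if not self.is_cleared:
            # Severity and text keep the last non-clear values.
            self.last_perceived_severity = change.perceived_severity
            self.last_alarm_text = change.alarm_text


alarm = Alarm()
alarm.add_status_change(StatusChange(
    "2015-02-18T08:02:49", "2015-02-18T08:02:49",
    "major", "connection refused"))
alarm.add_status_change(StatusChange(
    "2015-02-18T08:05:04", "2015-02-18T08:05:04",
    "cleared", "Connected as admin"))

print(alarm.is_cleared, alarm.last_perceived_severity,
      len(alarm.status_changes))  # True major 2
```

The cleared alarm keeps its history and its last non-clear severity, which is what allows an operator to investigate it after the resource has recovered.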
It is fundamental to define alarm types (and specific problems) and managed objects with a fine-grained mechanism that is still extensible. For managed objects, we allow a YANG instance-identifier, an SNMP OID, or a string. Strings can be used when the underlying object is not modeled. We use YANG identities to define alarm types. This has the benefit that alarm types can be defined in a named hierarchy and thereby provide an extensible mechanism. To support "dynamic alarm types", so that alarms can be separated by information only available at run-time, the string-based specific problem field can also be used.
So far we have described the model based on the resource view. It is common practice to let operators manipulate the alarms according to the operator's investigation. We clearly separate the resource view and the operator view. For example, there is no such thing as an operator "clearing an alarm". Rather, the alarm entries can have a corresponding alarm handling state. Operators may want to acknowledge an alarm, set the alarm state to closed, or similar.
We also support some alarm list administrative actions:
- Synchronize alarms: try to read the alarm states in the underlying resources and update the alarm list accordingly (this action needs to be implemented by user code for specific applications)
- Purge alarms: delete entries in the alarm list based on several different filter criteria
- Filter alarms: with an XPath expression as filter input, this action returns all alarms fulfilling the filter
- Compress alarms: since every entry may contain a large number of state change entries, this action compresses the history to the latest state change
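The purge and compress actions can be illustrated with a minimal standalone sketch. It is plain Python over an in-memory list, not the NSO implementation; the `Alarm` class, the `compress` and `purge` functions, and the two filter parameters are assumptions chosen for illustration (the real purge action accepts more filter criteria).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Alarm:
    device: str
    is_cleared: bool
    handling_state: str = "none"  # latest operator state
    # (event_time, severity) pairs, oldest first
    status_changes: List[Tuple[str, str]] = field(default_factory=list)


def compress(alarms: List[Alarm]) -> int:
    """Keep only the latest state change per alarm; return entries removed."""
    removed = 0
    for alarm in alarms:
        if len(alarm.status_changes) > 1:
            removed += len(alarm.status_changes) - 1
            alarm.status_changes = alarm.status_changes[-1:]
    return removed


def purge(alarms: List[Alarm], cleared: Optional[bool] = None,
          handling_state: Optional[str] = None) -> int:
    """Delete alarms matching every given criterion; return number purged."""
    def matches(alarm: Alarm) -> bool:
        if cleared is not None and alarm.is_cleared != cleared:
            return False
        if handling_state is not None and alarm.handling_state != handling_state:
            return False
        return True

    kept = [a for a in alarms if not matches(a)]
    purged = len(alarms) - len(kept)
    alarms[:] = kept  # modify the list in place
    return purged


alarm_list = [
    Alarm("pe0", True, "closed", [("08:02:49", "major"), ("08:05:04", "cleared")]),
    Alarm("pe1", True, "none", [("08:02:49", "major"), ("08:05:04", "cleared")]),
]

print(compress(alarm_list))                                      # 2
print(purge(alarm_list, cleared=True, handling_state="closed"))  # 1
print([a.device for a in alarm_list])                            # ['pe1']
```

Note that purging combines a resource-side criterion (cleared) with an operator-side criterion (handling state), so an alarm that is cleared but not yet closed by an operator stays in the list.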
Alarms can be forwarded over NSO northbound interfaces. In many telecom environments, alarms need to be mapped to X.733 parameters. We provide an alarm model where every alarm type is mapped to the corresponding X.733 parameters, such as event type and probable cause. In this way, it is easy to integrate the NSO alarms using whatever X.733 enumerated values the upper fault management system requires.
The central part of the YANG Alarm model, tailf-ncs-alarms.yang, has the following structure.
    module tailf-ncs-alarms {
      namespace "http://tail-f.com/ns/ncs-alarms";
      prefix "al";
      ...
      typedef managed-object-t {
        type union {
          type instance-identifier {
            require-instance false;
          }
          type yang:object-identifier;
          type string;
        }
      ...
      typedef event-type {
        type enumeration {
          enum other {value 1;}
          enum communicationsAlarm {value 2;}
          enum qualityOfServiceAlarm {value 3;}
          enum processingErrorAlarm {value 4;}
          enum equipmentAlarm {value 5;}
          ...
        }
        description "...";
        reference
          "ITU Recommendation X.736, 'Information Technology - Open
           Systems Interconnection - System Management: Security
           Alarm Reporting Function', 1992";
      }
      typedef severity-t {
        type enumeration {
          enum cleared {value 1;}
          enum indeterminate {value 2;}
          enum critical {value 3;}
          enum major {value 4;}
          enum minor {value 5;}
          enum warning {value 6;}
        }
        description "...";
      }
      ...
      identity alarm-type {
        description "Base identity for alarm types." ...
      }
      identity ncs-dev-manager-alarm {
        base alarm-type;
      }
      identity ncs-service-manager-alarm {
        base alarm-type;
      }
      identity connection-failure {
        base ncs-dev-manager-alarm;
        description "NCS failed to connect to a device";
      }
      ....
      container alarm-model {
        list alarm-type {
          key "type";
          leaf type {
            type alarm-type-t;
          }
          uses alarm-model-parameters;
        }
      }
      ...
      container alarm-list {
        config false;
        leaf number-of-alarms {
          type yang:gauge32;
        }
        leaf last-changed {
          type yang:date-and-time;
        }
        list alarm {
          key "device type managed-object specific-problem";
          uses common-alarm-parameters;
          leaf is-cleared {
            type boolean;
            mandatory true;
          }
          leaf last-status-change {
            type yang:date-and-time;
            mandatory true;
          }
          leaf last-perceived-severity {
            type severity-t;
          }
          leaf last-alarm-text {
            type alarm-text-t;
          }
          list status-change {
            key event-time;
            min-elements 1;
            uses alarm-state-change-parameters;
          }
          leaf last-alarm-handling-change {
            type yang:date-and-time;
          }
          list alarm-handling {
            key time;
            leaf time {
              tailf:info "Time stamp for operator action";
              type yang:date-and-time;
            }
            leaf state {
              tailf:info "The operators view of the alarm state";
              type alarm-handling-state-t;
              mandatory true;
              description "The operators view of the alarm state.";
            }
            ...
          }
          ...
      notification alarm-notification {
        ...
      rpc synchronize-alarms {
        ...
      rpc compress-alarms {
        ...
      rpc purge-alarms {
The first part of the YANG listing above shows the definition of the managed-object type, which lets alarms refer to YANG, SNMP, and other resources. We also see basic definitions from the X.733 standard for severity levels.
Note well the definition of alarm-type using YANG identities. In this way we can create a structured alarm type hierarchy, all rooted at alarm-type. To add your specific alarm types, define your own alarm types YANG file and add identities using alarm-type as base.
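The benefit of a rooted hierarchy can be illustrated by analogy with class inheritance. The sketch below is plain Python and only mimics the identity structure; it is not how alarm types are defined in NSO (that is done in YANG). `NcsDevManagerAlarm` and `ConnectionFailure` mirror identities from the listing, while `MyAppAlarm` and `LicenseExpired` are hypothetical application-specific types.

```python
class AlarmType:
    """Stands in for the base identity alarm-type."""


class NcsDevManagerAlarm(AlarmType):
    """Mirrors identity ncs-dev-manager-alarm."""


class ConnectionFailure(NcsDevManagerAlarm):
    """Mirrors identity connection-failure."""


class MyAppAlarm(AlarmType):
    """Hypothetical root for application-specific alarm types."""


class LicenseExpired(MyAppAlarm):
    """Hypothetical application-specific alarm type."""


def derived_from_or_self(alarm_type: type, base: type) -> bool:
    """True if alarm_type is base or is derived from it at any depth."""
    return issubclass(alarm_type, base)


# Every alarm type is rooted at the common base.
print(derived_from_or_self(ConnectionFailure, AlarmType))         # True
print(derived_from_or_self(LicenseExpired, AlarmType))            # True
# Sub-trees can be told apart without listing every leaf type.
print(derived_from_or_self(LicenseExpired, NcsDevManagerAlarm))   # False
```

New alarm types can be added under any node without changing code that filters on an ancestor, which is the extensibility the identity hierarchy is meant to provide.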
The alarm-model container contains the mapping from alarm types to X.733 parameters used for northbound interfaces.
The alarm-list container is the actual alarm list, where we maintain a list mapping (device, managed-object, alarm-type, specific-problem) to the corresponding alarm state changes [(time, severity, text)].
Finally, we see the northbound alarm notification and alarm administrative actions.
The NSO alarm manager has support for the operator to acknowledge alarms. We call this alarm handling. Each alarm has an associated list of alarm handling entries, as follows:
    container alarms {
      ....
      container alarm-list {
        config false;
        ....
        list alarm {
          key "device type managed-object specific-problem";
          .....
          list alarm-handling {
            key time;
            leaf time {
              type yang:date-and-time;
              description "Time-stamp for operator action on alarm.";
            }
            leaf state {
              mandatory true;
              type alarm-handling-state-t;
              description "The operators view of the alarm state";
            }
            leaf user {
              description "Which user has acknowledged this alarm";
              mandatory true;
              type string;
            }
            leaf description {
              description
                "Additional optional textual information regarding
                 this new alarm-handling entry";
              type string;
            }
          }
          tailf:action handle-alarm {
            tailf:info "Set the operator state of this alarm";
            description
              "An action to allow the operator to add an entry to the
               alarm-handling list. This is a means for the operator
               to indicate the level of human intervention on an alarm.";
            input {
              leaf state {
                type alarm-handling-state-t;
                mandatory true;
              }
            }
          }
        }
The following typedef defines the different states an alarm can be set into.
    typedef alarm-handling-state-t {
      type enumeration {
        enum none {
          value 1;
        }
        enum ack {
          value 2;
        }
        enum investigation {
          value 3;
        }
        enum observation {
          value 4;
        }
        enum closed {
          value 5;
        }
      }
      description "Operator actions on alarms";
    }
It is of course also possible to manipulate the alarm handling list from either Java code or JavaScript code running in the web browser using the js_maapi library.
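The separation between the operator view and the resource view can be sketched in a few lines. This is a standalone Python model, not the Java API or js_maapi; the class names and the timestamps are made up for illustration, while the enumeration values follow the typedef above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class HandlingState(Enum):
    """Operator states, with the values used in alarm-handling-state-t."""
    NONE = 1
    ACK = 2
    INVESTIGATION = 3
    OBSERVATION = 4
    CLOSED = 5


@dataclass
class HandlingEntry:
    time: str
    state: HandlingState
    user: str
    description: Optional[str] = None


@dataclass
class Alarm:
    is_cleared: bool  # resource view: changed only by the resource
    alarm_handling: List[HandlingEntry] = field(default_factory=list)

    def handle_alarm(self, time: str, state: HandlingState, user: str,
                     description: Optional[str] = None) -> None:
        """Append an operator entry; the resource state is left untouched."""
        self.alarm_handling.append(
            HandlingEntry(time, state, user, description))

    @property
    def handling_state(self) -> HandlingState:
        """Latest operator state, or NONE if nobody has handled the alarm."""
        if not self.alarm_handling:
            return HandlingState.NONE
        return self.alarm_handling[-1].state


alarm = Alarm(is_cleared=False)
alarm.handle_alarm("2015-02-18T08:10:00+00:00", HandlingState.ACK, "admin")
alarm.handle_alarm("2015-02-18T08:20:00+00:00", HandlingState.CLOSED,
                   "admin", "Fixed")

print(alarm.handling_state.name, alarm.is_cleared)  # CLOSED False
```

Closing the alarm from the operator side does not clear it: `is_cleared` stays false until the resource itself reports a clear.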
Below follows a simple scenario to illustrate the alarm concepts. The example can be found in examples.ncs/service-provider/simple-mpls-vpn.
    $ make stop clean all start
    $ ncs-netsim stop pe0
    $ ncs-netsim stop pe1
    $ ncs_cli -u admin -C
    admin connected from 127.0.0.1 using console on host
    admin@ncs# devices connect
    ...
    connect-result {
        device pe0
        result false
        info Failed to connect to device pe0: connection refused
    }
    connect-result {
        device pe1
        result false
        info Failed to connect to device pe1: connection refused
    }
    ...
    admin@ncs# show alarms alarm-list
    alarms alarm-list number-of-alarms 2
    alarms alarm-list last-changed 2015-02-18T08:02:49.162436+00:00
    alarms alarm-list alarm pe0 connection-failure /devices/device[name='pe0'] ""
     is-cleared              false
     last-status-change      2015-02-18T08:02:49.162734+00:00
     last-perceived-severity major
     last-alarm-text         "Failed to connect to device pe0: connection refused"
     status-change 2015-02-18T08:02:49.162734+00:00
      received-time      2015-02-18T08:02:49.162734+00:00
      perceived-severity major
      alarm-text         "Failed to connect to device pe0: connection refused"
    alarms alarm-list alarm pe1 connection-failure /devices/device[name='pe1'] ""
     is-cleared              false
     last-status-change      2015-02-18T08:02:49.162436+00:00
     last-perceived-severity major
     last-alarm-text         "Failed to connect to device pe1: connection refused"
     status-change 2015-02-18T08:02:49.162436+00:00
      received-time      2015-02-18T08:02:49.162436+00:00
      perceived-severity major
      alarm-text         "Failed to connect to device pe1: connection refused"
In the above scenario we stop two of the devices and then ask NSO to connect to all devices. This results in two alarms, one for pe0 and one for pe1. Note that the key for each alarm is the device name, the alarm type, the full path to the object (in this case the device itself and not an object within the device), and finally an empty string for the specific problem.
In the next command sequence we start the devices and request NSO to connect. This will clear the alarms.
    admin@ncs# exit
    $ ncs-netsim start pe0
    DEVICE pe0 OK STARTED
    $ ncs-netsim start pe1
    DEVICE pe1 OK STARTED
    $ ncs_cli -u admin -C
    admin@ncs# devices connect
    ...
    connect-result {
        device pe0
        result true
        info (admin) Connected to pe0 - 127.0.0.1:10028
    }
    connect-result {
        device pe1
        result true
        info (admin) Connected to pe1 - 127.0.0.1:10029
    }
    ...
    admin@ncs# show alarms alarm-list
    alarms alarm-list number-of-alarms 2
    alarms alarm-list last-changed 2015-02-18T08:05:04.942637+00:00
    alarms alarm-list alarm pe0 connection-failure /devices/device[name='pe0'] ""
     is-cleared              true
     last-status-change      2015-02-18T08:05:04.942637+00:00
     last-perceived-severity major
     last-alarm-text         "Failed to connect to device pe0: connection refused"
     status-change 2015-02-18T08:02:49.162734+00:00
      received-time      2015-02-18T08:02:49.162734+00:00
      perceived-severity major
      alarm-text         "Failed to connect to device pe0: connection refused"
     status-change 2015-02-18T08:05:04.942637+00:00
      received-time      2015-02-18T08:05:04.942637+00:00
      perceived-severity cleared
      alarm-text         "Connected as admin"
    alarms alarm-list alarm pe1 connection-failure /devices/device[name='pe1'] ""
     is-cleared              true
     last-status-change      2015-02-18T08:05:04.84115+00:00
     last-perceived-severity major
     last-alarm-text         "Failed to connect to device pe1: connection refused"
     status-change 2015-02-18T08:02:49.162436+00:00
      received-time      2015-02-18T08:02:49.162436+00:00
      perceived-severity major
      alarm-text         "Failed to connect to device pe1: connection refused"
     status-change 2015-02-18T08:05:04.84115+00:00
      received-time      2015-02-18T08:05:04.84115+00:00
      perceived-severity cleared
      alarm-text         "Connected as admin"
Note that there are now two status-change entries for each alarm and that both alarms are cleared. In the following scenario we set the handling state of the pe0 alarm to closed and finally purge (delete) all alarms that are cleared and closed. (Again, note the distinction between operator states and the states from the underlying resources.)
    admin@ncs# alarms alarm-list alarm pe0 connection-failure /devices/device[name='pe0'] "" handle-alarm state closed description Fixed
    admin@ncs# show alarms alarm-list alarm alarm-handling
    DEVICE  TYPE                STATE   USER   DESCRIPTION
    ---------------------------------------------------------
    pe0     connection-failure  closed  admin  Fixed
    admin@ncs# alarms purge-alarms alarm-handling-state-filter { state closed }
    Value for 'alarm-status' [any,cleared,not-cleared]: cleared
    purged-alarms 1
Assume you need to configure the northbound parameters. This is done using the alarm-model. A logical mapping of the connection problem above is to map it to X.733 probable cause connectionEstablishmentError (22). This is done in the NSO CLI in the following way:
    admin@ncs# config
    Entering configuration mode terminal
    admin@ncs(config)# alarms alarm-model alarm-type connection-failure probable-cause 22
    admin@ncs(config-alarm-type-connection-failure/*)# commit
    Commit complete.
    admin@ncs(config-alarm-type-connection-failure/*)# show full-configuration
    alarms alarm-model alarm-type connection-failure *
     event-type     communicationsAlarm
     has-clear      true
     kind-of-alarm  root-cause
     probable-cause 22
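To make the role of the alarm model concrete, the lookup it enables can be sketched as a simple table. This is a standalone Python illustration, not an NSO API; `AlarmModelEntry`, `ALARM_MODEL`, and `x733_parameters` are hypothetical names, and the data is copied from the configuration and the event-type enumeration shown earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Event-type enumeration values as listed in the alarm model excerpt.
EVENT_TYPE_VALUES = {
    "other": 1,
    "communicationsAlarm": 2,
    "qualityOfServiceAlarm": 3,
    "processingErrorAlarm": 4,
    "equipmentAlarm": 5,
}


@dataclass(frozen=True)
class AlarmModelEntry:
    event_type: str
    probable_cause: int
    has_clear: bool
    kind_of_alarm: str


# One entry per alarm type, mirroring the configuration shown above.
ALARM_MODEL: Dict[str, AlarmModelEntry] = {
    "connection-failure": AlarmModelEntry(
        "communicationsAlarm", 22, True, "root-cause"),
}


def x733_parameters(alarm_type: str) -> Optional[Dict[str, int]]:
    """Return the numeric X.733 parameters for an alarm type, if mapped."""
    entry = ALARM_MODEL.get(alarm_type)
    if entry is None:
        return None  # no mapping configured for this alarm type
    return {
        "event-type": EVENT_TYPE_VALUES[entry.event_type],
        "probable-cause": entry.probable_cause,
    }


print(x733_parameters("connection-failure"))
# {'event-type': 2, 'probable-cause': 22}
```

Keeping the mapping in configuration rather than in code means the values sent northbound can be adapted to the receiving fault management system without touching the alarm sources.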