EPICURE Design Note 131.7

Alarm Monitor Process Upgrades

David M. Kline

Introduction

This paper outlines the proposed upgrades for the Alarm Monitor Process (AMP). The upgrades can be implemented in two phases: the first phase deals with modifications internal to AMP; the second covers enhancements that relate to AMP and the alarm system. The following sections describe the items for each of the phases.

Phase-I

Phase-I upgrades are intended to modify AMP internals and leave hooks for Phase-II upgrades. Listed below are the proposed items:

a) Set requests would be separated by device class and dispatched to lower-level Alarm Monitoring Tasks (AMT). Separate AMTs would be implemented to provide data acquisition for normal and synthetic devices. Normal devices would use the QVI or TVI common memory services, whereas synthetic devices would use the EPICURE da_ services. Since the AMT that services normal devices uses the common memory services, AMP would be required to implement some DAR and DAS functionality to build and manage DALs. This upgrade has the additional benefits of distributing the load of data acquisition requests from other layers of EPICURE subsystems and of allowing operation to continue if DAS terminates abnormally. Intelligent devices, such as ones which can communicate over ARCnet, could be identified using the class property and submitted to another AMT for processing (VMEbus or VAX/AXP).

b) An image rundown handler (IRH) would be declared to expunge DAL lists and terminate da_ services in the event that AMP terminates abnormally. Additional functionality, such as flushing file buffers, can be included.

c) AMP would declare itself as a network object, allowing another path to query alarms. This also provides a hook into AMP for Phase-II upgrades and other possible functionality.

d) Currently, AMP checks for the existence of DAS every second and terminates when it finds DAS missing. This can be accomplished more efficiently by having DAS define an image rundown handler which sends a "DAS down" message to the AMP process. Another possibility is for the AMP UTI to provide an automatic image rundown handler (IRH) that would be invoked upon process termination. The routine would automatically send a message indicating the termination of the process. In addition, the UTI could provide an optional parameter that allows a user-specified routine to be invoked before the automatic IRH defined by the AMP UTI. This would avoid conflict between the AMP UTI and applications.

This would conclude Phase-I. I would estimate that it would take 5-6 weeks to implement the above items and debug them.
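The dispatch step proposed in item a) can be sketched roughly as follows. This is only an illustration: the type names, the enum values, and the functions `select_amt` and `partition_requests` are hypothetical and do not come from the existing AMP sources, and the sketch assumes that the class property is already available for each request.

```c
#include <stddef.h>

/* Hypothetical device classes. The note names normal, synthetic, and
 * intelligent (e.g. ARCnet-capable) devices but gives no encoding. */
typedef enum {
    DEV_CLASS_NORMAL,
    DEV_CLASS_SYNTHETIC,
    DEV_CLASS_INTELLIGENT
} dev_class_t;

/* Hypothetical identifiers for the lower-level Alarm Monitoring Tasks. */
typedef enum {
    AMT_COMMON_MEMORY,   /* normal devices: QVI or TVI common memory services */
    AMT_DA_SERVICES,     /* synthetic devices: EPICURE da_ services */
    AMT_INTELLIGENT,     /* intelligent devices: a separate AMT */
    AMT_COUNT
} amt_id_t;

typedef struct {
    unsigned long dipi;       /* device and property index (assumed layout) */
    dev_class_t   dev_class;  /* taken from the device's class property */
} set_request_t;

/* Choose the AMT that should service one set request. */
amt_id_t select_amt(const set_request_t *req)
{
    switch (req->dev_class) {
    case DEV_CLASS_SYNTHETIC:   return AMT_DA_SERVICES;
    case DEV_CLASS_INTELLIGENT: return AMT_INTELLIGENT;
    case DEV_CLASS_NORMAL:
    default:                    return AMT_COMMON_MEMORY;
    }
}

/* Count how many requests in a list would go to each AMT.
 * counts[] must have room for AMT_COUNT entries. */
void partition_requests(const set_request_t *reqs, size_t n,
                        size_t counts[AMT_COUNT])
{
    size_t i;
    for (i = 0; i < (size_t)AMT_COUNT; i++)
        counts[i] = 0;
    for (i = 0; i < n; i++)
        counts[select_amt(&reqs[i])]++;
}
```

A real implementation would hand each request to the selected task rather than count it; the counting form is used here only to keep the sketch self-contained.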

Phase-II

Phase-II upgrades would make use of hooks in place from Phase-I and would encompass other components of the alarm system. This section lists the items which comprise the Phase-II upgrade and ideas which have been presented:

a) A UTI can be written to request alarm reads, communicate DAE messages from the DEMServer process, and implement a consistent interface for setting alarms from DAS or other requesters. In addition, since the alarm information can be read from AMP, the ASB report facility no longer needs to access AMP's database directly.

b) Repeat reads have been determined to be necessary for alarms on page.

c) Multiple alarms for a given device and property (READING, SETTING, STATUS) can be implemented by defining Alarm State Descriptors (ASD) for individual alarm limits. The ASDs follow the ASB header and represent multiple alarms for the given device/property pair. Refer to the appendix for the data structure definitions.

d) The user can be notified specifically about device alarms by using the User-Identification-Code (UIC). These codes are consistent between VAX clusters and remote nodes which are managed by the EPICURE system management staff. The user UIC is contained in the ASD (see appendix) and is used by the ARD process to route the alarm message. In addition, wildcards or a UIC value of zero can be used to indicate that the alarm is to be sent globally. Alarms which have been set remotely from nodes that are not managed by the EPICURE system management staff are identified by a bit ("foreign") located in the ASD. This bit instructs the ARD process to use the information in the ASB extension to determine where to send the alarm.

e) Events which occur on the DAE can be communicated to the alarm system by the DEMServer sending the status message codes to the AMP. DEMServer would use the AMP UTI to connect and communicate the status message. AMP would construct a device name from the front end that sent the message and retrieve the device index property. The status code contained in the message would be mapped into a particular bit of the device's status property. AMP would set the bit which represents the message and send the appropriate information to the master ARD. ARD would treat the message as a standard alarm and pass the information to the remote ARD and alarm displays.

f) DAS could monitor the response of lists from the DAE and drop the links when lists have not been received within a predetermined amount of time. Dropping the links would notify applications that the front end has crashed. The DEM UTI can be used to communicate this to the AMP, or DAS can use the AMP UTI and communicate the exception directly.

g) The alarm severity residing in the ARB header is passed to the alarm display for interpretation. The user of the alarm display will decide which color represents which severity. The severity for a given alarm is defined using ALCON.

h) Since the meaning of events is unclear at this point, a full implementation cannot be accomplished. Some ideas are for events to indicate when a device exceeds a particular limit, when a quench occurs, or status messages. The "eventmo" bit located in the ARB header indicates whether ARD is to hold on to the event until it expires. AMP will be responsible for determining an event and holding on to it for a timeout specified by the user (from ALCON). If a timeout was specified, AMP will set the bit, indicating that ARD should hold on to the event until AMP sends a clear message. When the event expires, AMP will signal ARD and both will dismiss the alarm. If no timeout was specified, AMP immediately signals the event to ARD and both dismiss the event. The event timeout can be a default, or set up in ALCON, and can be identified in the ASD structure as part of the ASB extension ("time" member).

i) The alarm display and local ARD will communicate and execute an application when particular criteria are met.

j) An attempt will be made to support tiered alarms. The user will define the tiered alarm from ALCON, which includes a root device and others to be disabled or enabled (as determined by the user) when the root device goes into alarm. If AMP determines that the root device is in alarm, it signals ARD and disables or enables monitoring of the devices associated with it until the root device returns to a good state. The ASD data structure contains two members which support tiered alarms. The "tiered" bit is used as an indicator that the particular device is a member of a tiered alarm. The "override_dipi" longword contains the device and property index of the root device. Additionally, this is useful for tiered alarms which contain several levels.

k) Activating an application (program) given particular alarm conditions can be implemented by a separate application which interfaces to the ARD, or specified as part of the alarm setup in ALCON. Another possibility is for users to write EQL command procedures and run them in a batch environment. At this point, what the users really want is unclear and further discussion of the definition is needed.

l) A lower-level AMP is placed on the VMEbus using a Cyclone i960, or resides as a software entity (thread) in the new Alpha/PCI-based front end.

This would conclude Phase-II upgrades to the AMP and the alarm system. I would estimate that items which affect AMP would take about another 6-8 weeks to implement. However, testing may be staggered, since integration with other components is necessary.
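As an illustration of how the ASD members mentioned in items d), h), and j) might be used, the following sketch shows one possible shape. Only the member names "tiered", "override_dipi", and "time", the "foreign" bit, and the UIC come from this note; the field widths, their ordering, the `asd_t` type, and the functions `asd_route`, `asd_hold_event`, and `asd_top_root` are assumptions made for the example and are not the definitions intended for Appendix A. Wildcard UICs are not modeled.

```c
#include <stdint.h>

/* Hypothetical Alarm State Descriptor; widths and ordering are assumed. */
typedef struct {
    uint32_t uic;            /* user to notify; 0 means send globally */
    uint32_t override_dipi;  /* device/property index of the tiered root */
    uint32_t time;           /* event timeout; 0 taken to mean "none" */
    unsigned tiered  : 1;    /* device is a member of a tiered alarm */
    unsigned foreign : 1;    /* set from a node outside EPICURE management */
} asd_t;

typedef enum { ROUTE_GLOBAL, ROUTE_USER, ROUTE_ASB_EXTENSION } route_t;

/* Item d): decide how ARD would route the alarm message. */
route_t asd_route(const asd_t *asd)
{
    if (asd->foreign)
        return ROUTE_ASB_EXTENSION;  /* destination comes from the ASB extension */
    if (asd->uic == 0)
        return ROUTE_GLOBAL;
    return ROUTE_USER;
}

/* Item h): the "eventmo" bit would be set only when a timeout was specified. */
int asd_hold_event(const asd_t *asd)
{
    return asd->time != 0;
}

/* Item j): follow override_dipi links to the top-level root of a multi-level
 * tiered alarm. For simplicity the table is indexed directly by dipi, and the
 * top-level root is assumed to have its "tiered" bit clear. The hop limit
 * guards against a cycle in a misconfigured table. */
uint32_t asd_top_root(const asd_t *table, uint32_t n, uint32_t dipi)
{
    uint32_t hops;
    for (hops = 0; hops < n && dipi < n && table[dipi].tiered; hops++)
        dipi = table[dipi].override_dipi;
    return dipi;
}
```

With three entries where device 2 points at device 1 and device 1 points at device 0, `asd_top_root(table, 3, 2)` walks both levels and returns 0.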

Appendix A: Alarm Header File

This page was intentionally left blank.

Appendix B: Proposed AMP UTI

The proposed AMP UTI is meant for direct communication with the AMP process from remote or local applications. The primary clients of the UTI would be the DAS process, the ASB report application, and other diagnostic applications. This appendix describes the proposed AMP UTI routines and their calling sequences.

amp_connect

(int)status = amp_connect( irh, irhp, ccount )

This routine establishes a connection with the alarm server network objects and initializes internal data structures. An image rundown handler (IRH) is declared which notifies the alarm server that the caller process has terminated. The caller can specify another routine to be part of the termination procedure by specifying the "irh" parameter. The alarm server IRH executes the user-specified "irh" before it sends the termination notification to the alarm server. Furthermore, the IRH is executed regardless of the method used to terminate the process.

irh      address of the image rundown handler that is executed as part of process termination. Passed by reference.
irhp     value to be passed to the image rundown handler. Passed by value.
ccount   number of connections made to remote alarm servers. Passed by reference.

status returns a condition value:

  SS$_NORMAL      success
  SS$_NOPRIV      fatal, user does not possess authorization
  SS$_NOSUCHOBJ   fatal, network object is unknown at remote node
  AMP__INIT       warning, interface already initialized
  others          as returned by EPICURE and VMS system services
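The ordering described for amp_connect (the user-specified handler runs first, then the termination notification is sent) can be sketched as below. This is a stand-alone model, not the UTI itself: `sketch_connect`, `sketch_rundown`, `sketch_send_termination_notice`, `example_user_irh`, and the handler signature are all hypothetical, and no VMS services are called.

```c
#include <stddef.h>

typedef void (*irh_fn)(long irhp);   /* assumed handler signature */

static irh_fn user_irh       = NULL; /* optional user-specified handler */
static long   user_irhp      = 0;    /* value handed to that handler */
static int    notice_sent    = 0;    /* set once the notification is "sent" */
static int    user_ran_first = 0;    /* recorded by the example handler */
static long   seen_irhp      = 0;    /* parameter the example handler saw */

/* Stand-in for sending the termination notification to the alarm server. */
void sketch_send_termination_notice(void)
{
    notice_sent = 1;
}

/* Stand-in for the part of amp_connect that records the user handler. */
void sketch_connect(irh_fn irh, long irhp)
{
    user_irh    = irh;
    user_irhp   = irhp;
    notice_sent = 0;
}

/* Stand-in for the IRH declared by the UTI: user routine first, then notify. */
void sketch_rundown(void)
{
    if (user_irh != NULL)
        user_irh(user_irhp);
    sketch_send_termination_notice();
}

/* Example user handler: notes whether it ran before the notification. */
void example_user_irh(long irhp)
{
    seen_irhp      = irhp;
    user_ran_first = !notice_sent;
}
```

Calling `sketch_connect(example_user_irh, 7)` followed by `sketch_rundown()` leaves `user_ran_first` set, which is the ordering the routine description asks for. Passing NULL for the handler skips the user step and sends only the notification.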

amp_disconnect

(int)status = amp_disconnect( )

This routine cancels pending I/Os, disconnects from the AMP network object, restores the previous image rundown handler routine, and releases dynamic memory allocated by previous calls to AMP UTI routines. No termination notification is sent to the alarm server. Applications and processes that do not want to notify the alarm server of their termination MUST call this procedure before exiting.

status returns a condition value:

  SS$_NORMAL      success
  SS$_NOSUCHOBJ   fatal, no connection established
  others          as returned by EPICURE and VMS system services

amp_flush_queued

(void)amp_flush_queued( )

This routine is used to flush messages which are queued to be sent to the alarm server without performing a disconnect.
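One way to picture the relationship between queued messages, the AMP__QUEUED and AMP__OVRFLOW status codes, and amp_flush_queued is the small model below. It is a sketch under several assumptions: that "flush" means the queued messages are transmitted rather than discarded, that the number of outstanding messages is bounded by a fixed limit, and that a message is queued whenever the link is not ready. The names `sk_link_t`, `sk_send`, `sk_flush_queued`, and the numeric status values are hypothetical.

```c
/* Hypothetical status values; odd values are successes, following the VMS
 * convention that the low bit of a condition value indicates success. */
enum {
    SK_NORMAL  = 1,   /* stands in for SS$_NORMAL */
    SK_QUEUED  = 3,   /* stands in for AMP__QUEUED */
    SK_OVRFLOW = 0    /* stands in for AMP__OVRFLOW (a warning) */
};

#define SK_MAX_OUTSTANDING 4   /* assumed limit on queued messages */

typedef struct {
    int pending[SK_MAX_OUTSTANDING]; /* queued message identifiers */
    int count;                       /* number currently queued */
    int link_ready;                  /* nonzero when the link can send now */
    int sent;                        /* total messages sent so far */
} sk_link_t;

/* Send immediately when possible, otherwise queue or report overflow. */
int sk_send(sk_link_t *l, int msg)
{
    if (l->link_ready && l->count == 0) {
        l->sent++;
        return SK_NORMAL;
    }
    if (l->count >= SK_MAX_OUTSTANDING)
        return SK_OVRFLOW;
    l->pending[l->count++] = msg;
    return SK_QUEUED;
}

/* Analogue of amp_flush_queued: transmit everything queued, stay connected. */
void sk_flush_queued(sk_link_t *l)
{
    l->sent += l->count;
    l->count = 0;
}
```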

amp_get_status

(int)status = amp_get_status( ast [,astp] )

This routine requests the current status block that is maintained by the alarm server. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-specified parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid. Furthermore, the AST routine is called for each logical link that exists with an alarm server.

ast      address of a routine that is called when the status block has returned (see user_ast). Passed by reference.
astp     optional parameter that is passed to the routine specified by the "ast" parameter (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_INSFMEM     fatal, insufficient virtual memory
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_get_users

(int)status = amp_get_users( ast [,astp] )

This routine requests the current user list that is maintained by the alarm server. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-specified parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid. Furthermore, the AST routine is called for each logical link that exists with an alarm server.

ast      address of a routine that is called when the user list has returned (see user_ast). Passed by reference.
astp     optional parameter that is passed to the routine specified by the "ast" parameter (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_INSFMEM     fatal, insufficient virtual memory
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_read_alarm

(int)status = amp_read_alarm( dar, ast [,astp] )

This routine requests alarm reads for the list specified by the "dar" parameter. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-provided parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid.

dar      address of the request message (alarm read). Passed by reference.
ast      address of the routine called when a list returns (see user_ast). Passed by reference.
astp     parameter passed to the AST routine (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_NOPRIV      fatal, no privilege for attempted operation
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__TOOMANY    fatal, too many connections
  AMP__INVFTD     warning, invalid FTD specified
  AMP__INVREQ     warning, invalid request type
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_read_device_records

(int)status = amp_read_device_records( ast [,astp] )

This routine requests the current list of Alarm Status Blocks that are maintained by each alarm server. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-provided parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid. Furthermore, the AST routine is called for each logical link that exists with an alarm server. The status code AMP__NOMORE will be passed to the AST routine when the end of the Alarm Status Block list is reached.

ast      address of a routine that is called when an Alarm Status Block returns (see user_ast). Passed by reference.
astp     optional parameter that is passed to the routine specified by the "ast" parameter (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__NOMORE     success, no more data
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_repeat_read_alarm

(int)status = amp_repeat_read_alarm( dar, ast [,astp] )

This routine requests repeat alarm reads for the list specified by the "dar" parameter. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-provided parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid.

dar      address of the request message (alarm repeat read). Passed by reference.
ast      address of the routine called when a list returns (see user_ast). Passed by reference.
astp     parameter passed to the AST routine (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_NOPRIV      fatal, no privilege for attempted operation
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__INVRQDATA  fatal, invalid request data
  AMP__TOOMANY    fatal, too many connections
  AMP__INVFTD     warning, invalid FTD specified
  AMP__INVREQ     warning, invalid request type
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_request_alarm

(int)status = amp_request_alarm( darm, ast [,astp] )

This routine requests alarm sets or reads for the list specified by the "darm" parameter. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The AST routine is executed for every list contained in the request list. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-provided parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid.

darm     address of the alarm request message. Passed by reference.
ast      address of the routine called when a list returns (see user_ast). Passed by reference.
astp     parameter passed to the AST routine (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_NOPRIV      fatal, no privilege for attempted operation
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__TOOMANY    fatal, too many connections
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

amp_send_dae_status

(int)status = amp_send_dae_status( code )

This routine is specifically used by the DEMServer process and others which monitor exceptions received from the DAE. The code retrieved from the HERMES queue is hashed to a particular bit number and communicated to the AMP. The source node name and message facility code are used to derive a device name. The bit number of the device represents the specific message code. If the bit number exceeds the maximum number of allowable bits, an overflow message is communicated.

code     value of the status code received from the DAE. Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_NOSUCHOBJ   fatal, no connection established
  SS$_NOPRIV      fatal, no privilege for attempted operation
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services
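The mapping from a DAE status code to a device name and bit number could look something like the sketch below. The hashing scheme is not specified in this note, so the example simply uses the message number field as the bit number. It also assumes that DAE status codes follow the VMS condition value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27), that the status property is 32 bits wide, and that a device name is formed as the node name, an underscore, and the facility number in hexadecimal. The functions `sk_code_to_bit` and `sk_device_name` are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

#define SK_MAX_STATUS_BITS 32u   /* assumed width of the status property */

/* Message number: bits 3..15 of a VMS-style condition value. */
static uint32_t sk_msg_number(uint32_t code)
{
    return (code >> 3) & 0x1FFFu;
}

/* Facility number: bits 16..27 of a VMS-style condition value. */
static uint32_t sk_facility(uint32_t code)
{
    return (code >> 16) & 0x0FFFu;
}

/* Return the status bit for this code, or -1 to signal overflow. The message
 * number is used directly as a stand-in for the unspecified hash. */
int sk_code_to_bit(uint32_t code)
{
    uint32_t bit = sk_msg_number(code);
    return bit < SK_MAX_STATUS_BITS ? (int)bit : -1;
}

/* Build a device name from the source node and the facility code, using a
 * hypothetical "<node>_<facility in hex>" scheme. Returns the name length as
 * reported by snprintf. */
int sk_device_name(const char *node, uint32_t code, char *out, size_t outlen)
{
    return snprintf(out, outlen, "%s_%03X", node, (unsigned)sk_facility(code));
}
```

A caller would set the returned bit in the device's status property, or send the overflow message when -1 comes back.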

amp_set_alarm

(int)status = amp_set_alarm( dar, ast [,astp] )

This routine requests alarm sets for the list specified by the "dar" parameter. Notification of the return data is accomplished through a user-provided AST routine (see user_ast) specified by the "ast" parameter. The parameters passed to the AST routine include: completion status, node name string (source of data), return data, return data length, and the user-provided parameter ("astp"). If an error occurs on the logical link, only the completion status, node name, and user-provided parameter are valid.

dar      address of the request message (alarm set). Passed by reference.
ast      address of the routine called when a list returns (see user_ast). Passed by reference.
astp     parameter passed to the AST routine (see user_ast). Passed by value.

status returns a condition value:

  SS$_NORMAL      success
  AMP__QUEUED     success, message queued but not sent
  SS$_BADPARAM    fatal, bad parameter value
  SS$_NOPRIV      fatal, no privilege for attempted operation
  SS$_NOSUCHOBJ   fatal, no connection established
  AMP__TOOMANY    fatal, too many connections
  AMP__INVFTD     warning, invalid FTD specified
  AMP__INVREQ     warning, invalid request type
  AMP__OVRFLOW    warning, too many outstanding messages
  others          as returned by EPICURE and VMS system services

user_ast

(void)user_ast( status, node, data, datalen, usrprm )

This user-provided routine is called at AST level (IPL 2) on behalf of the user and indicates that return data has arrived for the corresponding UTI routine. The parameters are described below.

status   completion status of the operation. Passed by value.
node     a character string of the source node name. Passed by reference.
data     address of the return data. Passed by reference.
datalen  size of the return data. Passed by value.
usrprm   user parameter. Passed by value.
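A user_ast routine written for amp_read_device_records might be structured as in the following sketch. It relies on the VMS convention that a condition value with its low bit set indicates success. The numeric value chosen for AMP__NOMORE, the C parameter types, the `sk_tally_t` structure, and the use of the user parameter to carry the address of that structure are all assumptions made for the example.

```c
#include <stddef.h>

#define SK_AMP_NOMORE 0x0801   /* hypothetical value for AMP__NOMORE (odd, so a success) */

/* Caller-owned bookkeeping, reached through the user parameter. */
typedef struct {
    int records;   /* Alarm Status Blocks received so far */
    int errors;    /* completions that reported a failure */
    int done;      /* set when the end of the list is reported */
} sk_tally_t;

/* Follows the documented parameter order: status, node, data, datalen, usrprm. */
void sk_user_ast(int status, const char *node, const void *data,
                 int datalen, void *usrprm)
{
    sk_tally_t *t = (sk_tally_t *)usrprm;

    (void)node;   /* the source node name is not needed for counting */

    if (!(status & 1)) {
        /* On a link error only status, node, and usrprm are valid. */
        t->errors++;
        return;
    }
    if (status == SK_AMP_NOMORE) {
        t->done = 1;   /* end of the Alarm Status Block list */
        return;
    }
    if (data != NULL && datalen > 0)
        t->records++;
}
```

Because the routine runs at AST level, it only updates counters and returns; any longer processing would be better left to the main line of the application once `done` is set.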

Keywords: ALCON, AMP, ARD, alarms, EPICURE

Distribution:

Normal
