RTN-051: Rubin Observatory Risk Management Tool User Guide

  • Matthew Rumore

Latest Revision: 2023-02-09

Breakdown of Risk, Plan and Action Objects

This page explains and defines the fields associated with risks.

The tables that define the categories used when analyzing risks are provided in Risk Tool Tables. The source of this information is the Risk Tool itself.

Breakdown of a risk

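This section breaks down an example risk into:

  • Risk identification
  • Risk initial impact
  • Risk score and quantitative analysis
  • Risk residual impact
  • Risk comments, notify list and history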

Risk identification

The first section is used to identify and categorize the risk and those responsible for its management.

_images/Risk-Example-Risk-Identification.png

Figure 10 Risk Identification section using an example risk.

Project

Rubin Operations.

Risk ID

Automatically generated unique identifier.

Risk Type

Threats or Opportunities.

Status

Candidate, Active, Retire, Realized, or Depreciated.

Risk Department

Rubin Observatory Department that owns the risk and is responsible for its management.

Risk Category; Sub Category

Categorizes the risk using the information in Risk Category and Sub Category Table (defined by Risk Tool). Click the information button next to the field to display the information within the Risk Tool webapp.

Risk Title

Short, descriptive title for the risk.

Risk Statement

“IF-THEN” statement describing the risk.

The statement should present the possible risk event or condition (“if”) and the potential outcome or consequences (“then”).

Date Entered; Date Last Modified; Last Modified By

Automatically generated and updated.

Share Risk Externally

Yes or No, depending on whether the risk is shared outside Rubin Observatory.

Parent

Automatically generated list of associated Parent Risks of a Child Risk.

Parent Risks are considered a “headline risk” to allow management to drill down to the Child Risk(s) that are of concern. Parent Risks are not assessed directly, and they inherit the risk level of the highest-level Child Risk.
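The inheritance rule can be illustrated with a minimal sketch. It assumes risk levels are represented as integers from 1 (lowest) to 5 (highest); the Risk class, its field names and the IDs are hypothetical and do not reflect the Risk Tool's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Risk:
    """Hypothetical stand-in for a Risk Tool record (not the tool's data model)."""
    risk_id: str
    level: Optional[int] = None  # assessed level of a Child Risk, assumed 1 to 5
    children: List["Risk"] = field(default_factory=list)


def effective_level(risk: Risk) -> Optional[int]:
    """Return a Child Risk's own level, or the highest level among a Parent's children."""
    if not risk.children:
        return risk.level
    # Parent Risks are not assessed directly: collect the levels of their children.
    child_levels = [effective_level(child) for child in risk.children]
    assessed = [lvl for lvl in child_levels if lvl is not None]
    return max(assessed) if assessed else None


parent = Risk("PARENT-1", children=[Risk("CHILD-1", level=2), Risk("CHILD-2", level=4)])
print(effective_level(parent))  # 4
```

Here the Parent Risk reports level 4 because that is the highest level among its Child Risks.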

Risk initial impact

Risks are analyzed by their Cost Impact and Schedule Impact to Rubin Observatory and by the Likelihood that they will be realized. These are categorized into five levels of severity, as defined in Risk Tool Tables. The categories are defined within the Risk Tool; click the information button next to the field to display the information within the Risk Tool webapp.

The risk should first be analyzed under the initial condition of realization, i.e., before responses take effect. The impact categorizations will automatically generate the Risk Score fields as the product of the impact and likelihood.
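As a rough illustration of "the product of the impact and likelihood", the sketch below assumes that each of the five categories maps to an integer level from 1 to 5 and that the governing impact is the highest of the supplied impact levels. Both assumptions, and the function names, are illustrative only; the Risk Tool's internal calculation is not documented here and may differ.

```python
def impact_level(cost_impact: int, schedule_impact: int, overall_impact: int = 0) -> int:
    """Assumed rule: the governing impact is the highest of the supplied levels."""
    return max(cost_impact, schedule_impact, overall_impact)


def risk_score(impact: int, likelihood: int) -> int:
    """Product of an impact level and a likelihood level, each assumed to be 1 to 5."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return impact * likelihood


# Hypothetical example: cost level 2, schedule level 4, likelihood level 3.
impact = impact_level(cost_impact=2, schedule_impact=4)
print(risk_score(impact, likelihood=3))  # 12
```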

_images/Risk-Example-Analyze-Risk-Impacts.png

Figure 11 Risk Impacts sections using an example risk.

Overall Impact

Optional field to categorize the overall impact of the risk to Rubin Observatory before any response plans take effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

Overall Impact can be used to increase the Impact Severity field in the Risk Score, as shown in this example (Figure 11 and Figure 12).

Cost Impact

Categorization of cost impact, relative to the Rubin Observatory FY Baseline operating budget of $70,000,000, before any response plans take effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

Cost impacts are categorized relative to the annual baseline, even though in practice the cost of the realized risk may be felt and/or accumulated over multiple years.
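With a $70,000,000 baseline, the percentage thresholds in the Risk Impact Category Table correspond to $3,500,000 (5%), $7,000,000 (10%) and $10,500,000 (15%). The sketch below applies those thresholds. The function name and the minimal flag are hypothetical: the table defines Low only as "Minimal consequence", so that judgment is left to the assessor rather than computed.

```python
FY_BASELINE_USD = 70_000_000  # annual operating baseline quoted in this guide

# (upper limit as a percentage of the baseline, category), from the impact table.
_THRESHOLDS = ((5, "Moderate"), (10, "Significant"), (15, "Damaging/Major"))


def cost_impact_category(annual_cost_usd: float, minimal: bool = False) -> str:
    """Map an annual cost to a Cost Impact category.

    `minimal` is a hypothetical flag for the assessor's judgment that the
    consequence is minimal, which the table labels Low without a percentage.
    """
    if annual_cost_usd < 0:
        raise ValueError("cost must be non-negative")
    if minimal:
        return "Low"
    for percent, category in _THRESHOLDS:
        # Compare cost * 100 with percent * baseline to avoid rounding issues.
        if annual_cost_usd * 100 <= percent * FY_BASELINE_USD:
            return category
    return "Catastrophic/Extreme"


for cost in (2_000_000, 5_000_000, 9_000_000, 12_000_000):
    print(f"${cost:,}: {cost_impact_category(cost)}")
```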

Schedule Impact

Categorization of schedule impact, relative to the critical path of the Rubin Observatory’s schedule (e.g., the data release cycle, the summit maintenance schedule, the start of operations, or the completion of the LSST survey) before any response plans take effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

You should discuss the specifics with your department’s Associate Director to determine the schedule impact. For example, some delays may have an inconsequential impact on the Observatory’s operations if they can be absorbed into the data release cycle, while others may require extending the LSST survey or delaying a data release as an action if the risk is realized. The latter affect the Observatory’s operational critical path and crucial milestones; these impacts are the most important ones to capture accurately.

Likelihood

Categorization of overall chance of risk being realized before any response plans take effect.

See Likelihood Category Table (defined by Risk Tool) for categories.

Existential Risk

Yes or No, depending on whether the risk is existential to NOIRLab.

You should make an initial assessment for the Rubin Observatory Risk and Opportunity Board to review; the board will then confirm whether it is appropriate.

Schedule/Cost Impact Description

Text fields to describe and comment on the reasoning behind the impact categorizations.

Risk score and quantitative analysis

The fields under Risk Score are automatically generated based on input selections from risk impacts. These are categorized into five levels of severity, as defined in Risk Tool Tables. The categories are defined within the Risk Tool — click the information button next to the field to display the information within the Risk Tool webapp.

The Analyze Risk Quantitative section will not affect values and categories; however, the section will record the impact justification and provide information needed to categorize the impacts. In practice, you should assess and adjust the impact categorizations after completing the Analyze Risk Quantitative section.

_images/Risk-Example-Analyze-Risk-Score-and-Quantitative.png

Figure 12 Risk Score and Analysis Quantitative sections using an example risk.

Impact Severity

Automatically generated category based on Overall Impact, Cost Impact, Schedule Impact and Likelihood.

Impact Score

Automatically generated value based on Cost Impact, Schedule Impact and Likelihood.

Likelihood Score

Automatically generated value based on Likelihood.

Probability

Automatically generated value based on Likelihood.

Initial Risk Score

Automatically generated value based on Overall Impact, Cost Impact, Schedule Impact and Likelihood.

Minimum Delay (Months); Maximum Delay (Months); Likely Delay (Months)

Minimum, maximum and likely delay if risk is realized, in months (round to the nearest integer).

Expected Schedule Delay (Months)

Automatically generated value based on Minimum Delay, Maximum Delay and Likely Delay.

Impact Time

Date when realized risk would impact the schedule.

Impacted Event/Milestone

Event or milestone impacted by the realized risk.

This is important because it makes the meaning of the schedule delay clear. Some examples include LSST Survey Start, Data Release 1 (DR1), DR2, Year 1 Annual Maintenance and LSST Survey Finish.

Basis of Estimate

Reference to basis of estimate capturing impact of realized risk.

Minimum Cost (US Dollars); Maximum Cost (US Dollars); Likely Cost (US Dollars)

Minimum, maximum and likely annual cost of realized risk.

Costs should be estimated as they would occur, i.e., on an approximate, time-averaged, annual basis over the likely time period of impact and in approximate then-year dollars. Rubin Observatory needs to know how much funding to hold in reserve each year in order to address risks as they are realized.

Cost estimates need only be precise to the nearest $1,000,000, although higher precision is appreciated. This resolution is chosen because the cost estimates are multiplied by the estimated likelihood, and the product is expected to be uncertain by at least a factor of two.

Expected Cost Impact

Automatically generated value based on the following formula:

(Minimum Cost + 4 × Likely Cost + Maximum Cost) ÷ 6
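As a worked illustration, the sketch below treats the formula as the usual three-point weighted average, i.e., the sum of Minimum Cost, four times Likely Cost and Maximum Cost, all divided by 6. The function name and the sample figures are hypothetical and are not taken from the Risk Tool.

```python
def expected_cost_impact(minimum: float, likely: float, maximum: float) -> float:
    """Three-point weighted average: (minimum + 4 * likely + maximum) / 6."""
    if not minimum <= likely <= maximum:
        raise ValueError("expected minimum <= likely <= maximum")
    return (minimum + 4 * likely + maximum) / 6


# Hypothetical annual cost estimates in US dollars.
print(expected_cost_impact(1_000_000, 2_000_000, 6_000_000))  # 2500000.0
```

The Likely Cost carries four times the weight of either extreme, so the result stays close to the likely value while still reflecting a long tail toward the maximum.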

Expected Monetary Value

Automatically generated value based on the following formula:

[(Minimum Cost + 4 × Likely Cost + Maximum Cost) ÷ 6] × [(Likelihood Score × 0.20) - 0.10]
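The second factor converts the Likelihood Score into a probability. Assuming the score is an integer from 1 to 5 (one per likelihood category, which is an assumption rather than something stated in this guide), the factor evaluates to 0.10, 0.30, 0.50, 0.70 and 0.90. A minimal sketch, with hypothetical function names and sample figures:

```python
def probability_from_score(likelihood_score: int) -> float:
    """Probability factor (Likelihood Score * 0.20) - 0.10; score assumed 1 to 5."""
    if not 1 <= likelihood_score <= 5:
        raise ValueError("likelihood_score must be between 1 and 5")
    return likelihood_score * 0.20 - 0.10


def expected_monetary_value(minimum: float, likely: float, maximum: float,
                            likelihood_score: int) -> float:
    """Three-point expected cost multiplied by the probability factor."""
    expected_cost = (minimum + 4 * likely + maximum) / 6
    return expected_cost * probability_from_score(likelihood_score)


# Probability factor for each assumed score.
for score in range(1, 6):
    print(score, round(probability_from_score(score), 2))

# Hypothetical estimates: $1M minimum, $2M likely, $6M maximum, score 3.
print(round(expected_monetary_value(1_000_000, 2_000_000, 6_000_000, 3)))  # 1250000
```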

Financial Provision

This field is not used by Rubin Observatory.

Number of Possible Occurrences

Number of times this risk could be realized, as an integer.

Risk residual impact

After a risk is identified, response plans (also known as responses) are used to address it. The Residual Cost Impact, Residual Schedule Impact and Residual Likelihood analyze the realized risk impact after the plan is activated. These are categorized into five levels of severity, as defined in Risk Tool Tables. The categories are defined within the Risk Tool — click the information button next to the field to display the information within the Risk Tool webapp.

Related Actions (also known as actions) are the steps taken to implement a response plan if a risk is realized. Actions can be associated with risks and/or responses.

Within this section, the risk is analyzed under the condition of realization after the response plans take effect. The impact categorizations will automatically generate the Residual Risk Score fields as the product of the impact and likelihood.

_images/Risk-Example-Plans-Actions-Residual-Risk.png

Figure 13 Residual Risk Impacts, Response Plans and Related Actions sections using an example risk.

Plan Type

Strategic process of controlling the identified risks via response plans.

Figure 14 shows the four types of processes; their implementation depends on whether the risk is a threat or an opportunity. Note that a risk may still be realized after a response plan is implemented: for example, the difference between mitigating or accepting a threat (or enhancing or ignoring an opportunity) before the risk is realized can be summarized as “do something” or “do nothing.”

Some risks may include multiple response plans. In this case, specify the plan type of the costliest plan present. For threats, the plan types in order of increasing costliness are: Accept (least cost), Transfer, Avoid, Mitigate (greatest cost).
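The selection rule for threats can be sketched as follows. The function and constant names are hypothetical; only the ordering of the four plan types comes from this guide.

```python
# Threat plan types in order of increasing costliness, as listed in this guide.
THREAT_PLAN_COST_ORDER = ["Accept", "Transfer", "Avoid", "Mitigate"]


def plan_type_to_record(plan_types):
    """Return the costliest plan type among a threat's response plans."""
    if not plan_types:
        raise ValueError("at least one plan type is required")
    unknown = set(plan_types) - set(THREAT_PLAN_COST_ORDER)
    if unknown:
        raise ValueError(f"unknown threat plan type(s): {sorted(unknown)}")
    # The position in the ordered list acts as the costliness rank.
    return max(plan_types, key=THREAT_PLAN_COST_ORDER.index)


# A threat with three response plans of different types.
print(plan_type_to_record(["Accept", "Avoid", "Transfer"]))  # Avoid
```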

_images/Response-Plan-Types.png

Figure 14 Four processes and respective threat or opportunity response plan types.

Response Types for Threats

Avoid

Changing your strategy or plans to avoid the risk. This risk response strategy is about removing the threat by any means. That can mean changing your management plan to avoid the risk because it’s detrimental to the project/program.

Transfer

Passing ownership and/or liability to a third party to resolve the risk, e.g., purchase fire insurance for an unfinished building.

Mitigate

Reducing the probability and/or impact of the risk below a threshold of acceptability. Some risks cannot be avoided, so action is needed to reduce their impact, e.g., work procedures and equipment designed to reduce workplace safety risks.

Accept

Recognizing residual risks and devising responses to control and monitor them. This risk response strategy consists of identifying a risk and documenting all the risk management information about it, but not taking any action unless the risk is realized.

Response Types for Opportunities

Exploit

Exploiting a risk to make use of the opportunity that becomes available if that risk occurs.

Share

Distributing the risk across multiple stakeholders (teams/projects/programs).

Enhance

An action that is taken to increase the chance of the opportunity occurring.

Ignore

Opportunities that cannot be actively addressed through other opportunity response types can be ignored, with no special measures being taken to address them.

Escalation

Yes or No, depending on whether the risk is escalated to the NOIRLab Directorate or other programs/services for their attention.

Related Response Plans

Automatically generated list of response plans associated with this risk.

Related Actions

Automatically generated list of actions associated with this risk.

Residual Overall Impact

Optional field to categorize the overall impact of the risk to Rubin Observatory after a response plan is in effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

Residual Overall Impact can be used to increase the Residual Impact Severity field.

Residual Cost Impact

Categorization of cost impact, relative to the Rubin Observatory FY Baseline operating budget of $70,000,000, after a response plan is in effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

Cost impacts are categorized relative to the annual baseline, even though in practice the cost of the realized risk may be felt and/or accumulated over multiple years.

Residual Likelihood

Categorization of overall chance of risk being realized after a response plan is in effect.

See Likelihood Category Table (defined by Risk Tool) for categories.

Residual Schedule Impact

Categorization of schedule impact, relative to the critical path of the Rubin Observatory’s schedule (e.g., the data release cycle, the summit maintenance schedule, the start of operations, or the completion of the LSST survey) after a response plan is in effect.

See Risk Impact Category Table (defined by Risk Tool) for categories.

You should discuss the specifics with your department’s Associate Director to determine the schedule impact. For example, some delays may have an inconsequential impact on the Observatory’s operations if they can be absorbed into the data release cycle, while others may require extending the LSST survey or delaying a data release as an action if the risk is realized. The latter affect the Observatory’s operational critical path and crucial milestones; these impacts are the most important ones to capture accurately.

Residual Impact Severity

Automatically generated category based on Residual Overall Impact, Residual Cost Impact, Residual Schedule Impact and Residual Likelihood.

See Risk Impact Category Table (defined by Risk Tool) for categories.

You must include Residual Cost Impact, Residual Schedule Impact and Residual Likelihood because the residual impact scores will not include inputs from the Risk initial impact section.

Residual Impact Score

Automatically generated value based on Residual Cost Impact, Residual Schedule Impact and Residual Likelihood.

Residual Likelihood Score

Automatically generated value based on Residual Likelihood.

Residual Probability

Automatically generated value based on Residual Likelihood.

Residual Risk Score

Automatically generated value based on Residual Overall Impact, Residual Cost Impact, Residual Schedule Impact and Residual Likelihood.

Risk comments, notify list and history

A text field is available to include additional comments on the risk and its status. Email notifications are possible and can be customized to project/program/service group needs to notify the appropriate internal stakeholders of ongoing changes, scheduled events and distribution of reports on necessary dates or recurring timeframes. All changes are tracked by the History Trail section to capture the history of modification by users and when the modification occurred.

_images/Risk-Example-Comments-History.png

Figure 15 Risk comment, notification and History Trail sections using an example risk.

Status Description

Text field to describe and comment on the status and status changes.

Realized Risk Plan

Text field to describe and comment on planning for if and when the risk becomes realized.

Conclusion

Text field to describe and comment on the conclusion of a retired or depreciated risk.

Updates/Comments

Text field to describe and comment on updates. These comments are logged once they are saved; see “Nov 10 2022” entry in Figure 15.

Notify List

List of users on the Notify List (left) and tools to add/remove users (right) for the risk.

History Trail

Log of all modifications to the risk, including user making the change, the nature of the change and the date/time the change was made; see entry “[16]” in Figure 15.

Risk Tool Tables

This section includes the tables from the Risk Tool. Additional information specific to Rubin Observatory may be found here.

Table 1 Risk Category and Sub Category Table (defined by Risk Tool)

Category — Sub Category

Sub Category Description

Program Science — Astronomy and Astrophysics Community

Priorities, needs and expectations of the community and the changes related to it.

Program Science — Science Related

Science produced by the organization and its relevance and impact.

Technical — Scope

Related to Scope changes of the Organization/Project objectives.

Technical — Requirements

Identifying/missing/not well defined requirements.

Technical — Processes

Inadequate or not well defined technical or operational processes.

Technical — Technology

Technology readiness level and related.

Technical — Interfaces

Technical interfaces, infrastructure and complexity of the interfaces, and related.

Technical — Quality

Verification of the requirements and concept of operations, to ensure the performance. How well the as-built system compares against the requirements.

Management — Program/Project Management

Anything related to project and program management, such as schedule, planning, monitoring and controlling.

Management — NOIRLab/AURA Management

AURA, NOIRLab rates and other AURA and NOIRLab management related.

Management — Operations Management

Portfolio Management, finance, ITOps, safety group, and other operations related.

Management — Resourcing

Labor resourcing, shared resources availability, conflicts between fraction of shared resources.

Management — Communication

Internal communication within the organization and external communication.

Management — Health & Safety Environment

Mental health of the employees due to pandemic, safety in the observatories, etc.

Commercial/Organizational — Contractual/Procurement

All contractual and procurement related events, liabilities, warranties, legal, compliance.

Commercial/Organizational — Partnerships and Joint Ventures

Any risks associated with tenants, partners and in-kind support, relationship.

Commercial/Organizational — Subcontracts and Suppliers

Any risk related to subcontractor and supplier issues (non-contractual); e.g., supplier going bankrupt.

External — Financial

All sources of risks related to funding and cash flow.

External — Legislation and Regulatory

Lease renewals, political, sites & facilities, applicable law.

External — Exchange Rates

Exchange rates for currency, e.g., USD-CLP, USD-EUR.

External — Natural Environmental Factors

Risks related to weather, earthquakes, tsunami and other natural factors.

External — Human Environmental Factors

Risks related to light pollution, satellites, air pollution and other passive human factors.

External — External Stakeholders

Influence of external stakeholders such as funding agencies, the public, protest groups, hackers, hostile competitors or other active human factors.

Table 2 Likelihood Category Table (defined by Risk Tool)

Likelihood Category | Percent Chance | Definition
Remote | 10-20% | Extremely unlikely to occur.
Unlikely | 21-40% | May occur only in exceptional circumstances.
Possible | 41-60% | Could occur in certain circumstances.
Likely | 61-80% | Probably will occur in many circumstances.
Very Likely | 81-90% | Expected to occur in most circumstances.
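The table can be read as a simple range lookup. The sketch below assumes whole-number percentages and raises an error for values below 10% or above 90%, which the table does not cover; the function and constant names are hypothetical.

```python
# (lower bound, upper bound, category) in percent, from the Likelihood Category Table.
LIKELIHOOD_TABLE = [
    (10, 20, "Remote"),
    (21, 40, "Unlikely"),
    (41, 60, "Possible"),
    (61, 80, "Likely"),
    (81, 90, "Very Likely"),
]


def likelihood_category(percent_chance: int) -> str:
    """Return the likelihood category for a whole-number percent chance."""
    for low, high, category in LIKELIHOOD_TABLE:
        if low <= percent_chance <= high:
            return category
    raise ValueError(f"{percent_chance}% is outside the 10-90% range covered by the table")


print(likelihood_category(50))  # Possible
```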

Table 3 Risk Impact Category Table (defined by Risk Tool)

Impact Category: Low

  • Overall Impact: Any other impacts with respect to operations, project, or initiative within the Programs or Services.
  • Cost Impact: Minimal consequence.
  • Schedule Impact: Minimal consequence.
  • Performance Impact: Minimal consequence to objectives/goals.
  • Safety Impact: Asset has no sign of physical damage and/or personnel discomfort/nuisance.
  • Safety Human Impact: Discomfort or nuisance.
  • Safety Asset Impact: No sign of physical damage.

Impact Category: Moderate

  • Overall Impact: Any impacts with respect to delivering an NOIRLab Program or Service POP milestone.
  • Cost Impact: Cost variance less than or equal to 5% of total approved FY baseline.
  • Schedule Impact: Critical path does not slip; total slack of slipped tasks will not impact critical path in less than 10 days.
  • Performance Impact: Minor consequence to objectives/goals.
  • Safety Impact: Asset has cosmetic damage and is repairable and/or first aid event per OSHA criteria.
  • Safety Human Impact: First aid event per OSHA criteria.
  • Safety Asset Impact: Cosmetic damage and is repairable.

Impact Category: Significant

  • Overall Impact: Any impacts with respect to Key Performance Evaluation metric(s) of an NOIRLab Program or Service.
  • Cost Impact: Cost variance greater than 5% but less than or equal to 10% of total approved FY baseline.
  • Schedule Impact: Critical path does not slip; total slack of slipped tasks is within 10 days of impacting the critical path.
  • Performance Impact: Unable to achieve a particular objective/goal, but remaining objective goals represent better than minimum success or outcome.
  • Safety Impact: Asset damaged but repairable and/or no personnel lost time injury or illness per OSHA criteria.
  • Safety Human Impact: No lost time injury or illness per OSHA criteria.
  • Safety Asset Impact: Damaged but repairable.

Impact Category: Damaging/Major

  • Overall Impact: Any impacts with respect to priorities in the POP or LRP of an NOIRLab Program or Service.
  • Cost Impact: Cost variance greater than 10% but less than or equal to 15% of total approved FY baseline.
  • Schedule Impact: Critical path slips.
  • Performance Impact: Unable to achieve multiple objectives/goals, but minimum success can still be achieved or claimed.
  • Safety Impact: Asset is substantially damaged but repairable and/or personnel lost time injury or illness per OSHA criteria.
  • Safety Human Impact: Lost time injury or illness per OSHA criteria.
  • Safety Asset Impact: Substantially damaged but repairable.

Impact Category: Catastrophic/Extreme

  • Overall Impact: Any impacts with respect to NOIRLab Center CA and Programs CSAs, and/or impacting the mission of NOIRLab or any constituent Programs or Services.
  • Cost Impact: Cost variance greater than 15% of total approved FY baseline.
  • Schedule Impact: Critical path slips and one or more critical milestones or events cannot be met.
  • Performance Impact: Unable to achieve objectives/goals such that minimum success cannot be achieved or claimed.
  • Safety Impact: Asset is compromised and unrepairable: a total loss and/or personnel loss of life.
  • Safety Human Impact: Loss of life.
  • Safety Asset Impact: Asset is compromised and unrepairable; a total loss.