Hey, Habr!
Murphy's laws state:
- Anything that can go wrong will go wrong.
- If there is a possibility of several things going wrong, the one that will cause the most damage will be the one to go wrong. Corollary: If there is a worse time for something to go wrong, it will happen then.
- If anything simply cannot go wrong, it will anyway.
Even when IT supports every automated aspect of your organization's operations and you are confident that it performs well, there is always a chance that something will go wrong, either in the technology itself or in the related IT processes. The more complex the IT environment, the more varied the risk scenarios that can arise. This article summarizes a number of trustworthy sources and aims to give you a high-level understanding of what can go wrong with IT and how to anticipate it more consciously.
My name is Maxim Tornov. I have worked in various IT areas for a long time, and for over 14 years I have been working in IT/IS risk management, focusing on IT/IS audits and on the implementation and assessment of internal controls.
I am sure that IT risk management has become more important than ever. An organization's effectiveness in managing IT risk directly affects the achievement of its goals that depend on IT, such as reliable and efficient business processes, compliance with regulatory requirements, the integrity of financial reporting, and many others.
I sincerely hope that this material will be useful to you and will give you some new ideas for your personal development and for the development of your organization's risk management culture.
What will you learn from this article?
What risk is, and the categories of risk
What is IT risk
How IT risk is linked to business functions
The difference between "RISK CAPACITY" and "RISK APPETITE"
IT risk management
Who are the participants in the risk management process
IT risk management steps
The value of effective IT risk management
Risk - what can go wrong? Definition of risk
Risk has many definitions*. In my opinion, the most accurate one was given by the Committee of Sponsoring Organizations of the Treadway Commission (COSO), which defines risk as "the possibility that events will occur and affect the achievement of strategy and business objectives".
*For myself, I formulated the following definition of risk: "Due to action / inaction, the result will not meet expectations or plans." A little further on we will look at variations of risks and risk events in more detail.
Risk can be discussed in qualitative and/or quantitative terms, and its definition may differ depending on the source. However, the essence of risk is always the same: it combines the probability (likelihood) that an event will occur with the consequences that event would have for the organization.
Risk categories
There are many categories and types of risks. For clarity, I will note the most important ones, in my opinion, and give you some examples of what can go wrong:
Financial - errors in accounting, the organization's financial statements contain errors, inaccuracies, or do not contain important information for stakeholders
Credit - non-repayment of the loan by the borrower
Market - the price of an investment instrument drops lower than expected
Operational - deviations or inefficiencies in the organization's operating activities. The operational category can include, for example, Contractor/Contract risk (the contractor completed the project at a lower quality or did not complete it at all). Very often it also includes IT/IS risk (a key IT system is working with errors, personal data has been leaked, etc.). In general, IT risk is a subgroup of business risks.
Relation between IT and business functions of the organization
The main purpose of IT is to help the business achieve the organization's mission and goals. Each line of business relies on IT systems that support its business functions. Therefore, the more the organization's processes are automated, the higher the likelihood that something will go wrong in the automated processes or tools, i.e. in information technologies.
What is IT risk?
The European Banking Authority gives, in my opinion, the most accurate definition. IT risk is the risk of loss to an organization caused by:
breach of confidentiality,
failure of the integrity of systems and data,
inappropriateness or unavailability of systems and data,
or inability to change IT within a reasonable time and at a reasonable cost when the environment and/or business requirements change (i.e. agility)
IT risk also includes information security (IS) risk resulting from:
inadequate or failed internal processes, or external events, including cyber attacks or inadequate physical security
Relationship between IT risk and other risk categories
When an IT risk materializes, the IT-related risk event can, like a house of cards, trigger the realization of risks from other business risk categories. An illustrative example is shown in the picture below.
*We will discuss what can be done in each specific case in more detail in several examples in the Appendix to this article.
Risk capacity, risk appetite and risk tolerance
Risk can be measured and managed, and there are various tools for doing so. In my opinion, the most important are:
Acceptable level of risk (RISK CAPACITY) is the maximum amount of loss an organization can withstand before the successful functioning of its business is called into question
Based on the acceptable level of risk, the business owners or the board of directors set the risk appetite (RISK APPETITE). Risk appetite is the amount of risk an organization is willing to accept in pursuit of its mission
Risk tolerance is the acceptable deviation from the risk appetite. Such deviations are not desirable, but they are known to remain well below the acceptable level of risk (risk capacity)
For more clarity, I will give you some examples:
Acceptable downtime of a particular IT system per year: the total downtime does not exceed 100 minutes per year, i.e. the IT system is available about 99.98% of the time (a 99.99% target would allow only about 53 minutes of downtime per year).
Permissible amount of monetary losses from downtime/failure of the IT system per year: No more than 0.00001% of the revenue stream generated by this system.
Permissible number of failures/errors of a particular type in an IT system/report: no more than 2 failures/errors per week of operation of the IT system/reports
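Availability percentages and downtime budgets are easy to mix up, so it can help to convert between them explicitly. The Python sketch below does that and also classifies observed yearly downtime against appetite, tolerance and capacity limits. The threshold values used (100, 150 and 600 minutes) are hypothetical and only illustrate how the three levels relate to each other.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


def availability_pct(downtime_minutes: float) -> float:
    """Yearly availability implied by a given amount of downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


def classify_downtime(minutes: float, appetite: float, tolerance: float, capacity: float) -> str:
    """Compare observed downtime with hypothetical appetite < tolerance < capacity limits."""
    if minutes <= appetite:
        return "within risk appetite"
    if minutes <= tolerance:
        return "within risk tolerance"
    if minutes <= capacity:
        return "beyond tolerance, escalate"
    return "exceeds risk capacity"


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.99):.1f} min/year")  # about 52.6
    print(f"{availability_pct(100):.2f} %")                     # about 99.98
    # Hypothetical limits, in minutes of downtime per year
    for observed in (80, 130, 400, 700):
        print(observed, classify_downtime(observed, appetite=100, tolerance=150, capacity=600))
```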
IT risk can be measured
As mentioned earlier, risk can be measured both quantitatively and qualitatively. Various metrics are used for quantitative measurement; here are several metrics recommended by ISC2:
Exposure Factor (EF) - the percentage of the asset's value that the organization would lose if the risk materializes for that asset
Single Loss Expectancy (SLE) - the monetary loss expected from a single realization of the risk against the asset
Asset Value (AV) - the value of the asset
Annualized Rate of Occurrence (ARO) - the expected number of times per year the risk materializes
Annualized Loss Expectancy (ALE) - the expected annual loss from the realization of the risk
Risk Quantification
Using the metrics above, it is possible to quantify the potential losses if a risk inherent in IT materializes, for example:
AV = $200,000
EF = 45%
ARO = 2 (times per year)
SLE = AV * EF
ALE = SLE * ARO
In this way:
SLE = $200,000 * 45% = $90,000: if the risk materializes for the asset once, the organization can expect to lose $90,000
ALE = $90,000 * 2 = $180,000: since the risk is expected to materialize twice a year, the expected annual loss is twice the single loss
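The same calculation as a small Python sketch. The function names are my own, and EF is passed as a fraction (0.45 rather than 45%):

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV * EF, where EF is a fraction between 0 and 1."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE * ARO, where ARO is the expected number of occurrences per year."""
    return sle * aro


if __name__ == "__main__":
    sle = single_loss_expectancy(200_000, 0.45)
    ale = annualized_loss_expectancy(sle, 2)
    print(f"SLE = ${sle:,.0f}")  # SLE = $90,000
    print(f"ALE = ${ale:,.0f}")  # ALE = $180,000
```

An ARO below 1 works the same way: a risk expected once every four years has ARO = 0.25, so its ALE is a quarter of the SLE.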
Well, what does it give us? Knowing the potential losses, even approximately, the organization's management can allocate costs more accurately, focus the necessary resources, expertise and effort, and make more informed management decisions.
Qualitative risk assessment
Qualitative risk assessment is a simpler way to assess the likelihood that a risk event will occur. However, it may require more expertise and the involvement of experts from various fields, including representatives of the business, IT, information security, and external consultants.
A qualitative risk assessment is usually performed by assigning a level to the risk's likelihood and to its impact on a particular goal of the organization, e.g.:
High / Red marker
Medium / Yellow marker
Low / Green marker
Usually the most accurate risk assessment is achieved by combining quantitative and qualitative assessments.
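A common way to combine the two ratings is a likelihood × impact matrix. The sketch below assumes a three-point scale and hypothetical cut-offs (a score of 6 or more is High, 3 to 4 is Medium, below 3 is Low); each organization calibrates its own matrix.

```python
SCALE = {"Low": 1, "Medium": 2, "High": 3}


def risk_level(likelihood: str, impact: str) -> str:
    """Map qualitative likelihood and impact ratings to an overall risk level."""
    score = SCALE[likelihood] * SCALE[impact]  # possible scores: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "High (Red)"
    if score >= 3:
        return "Medium (Yellow)"
    return "Low (Green)"


if __name__ == "__main__":
    # Print the full 3x3 matrix: rows are likelihood, columns are impact
    for likelihood in SCALE:
        row = [risk_level(likelihood, impact) for impact in SCALE]
        print(f"{likelihood:>6}: {row}")
```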
Examples of IT Risks
In my opinion, the risks inherent in IT are most fully and accurately formulated in International Standard on Auditing 315 (ISA 315). All other IT and IS risks can be derived from these general categories:
Reliance on systems or programs that are inaccurately processing data, processing inaccurate data, or both
Unauthorized access to data that may result in destruction of data or improper changes to data, including the recording of unauthorized or nonexistent transactions, or inaccurate recording of transactions. Particular risks may arise where multiple users access a common database.
The possibility of IT personnel gaining access privileges beyond those necessary to perform their assigned duties thereby breaking down segregation of duties
Unauthorized changes to data in master files
Unauthorized changes to systems or programs
Failure to make necessary changes to systems or programs
Inappropriate manual intervention
Potential loss of data or inability to access data as required
Examples of IT risks realization (risk events occurrence)
To give you a better feeling for what can go wrong, I will quote some recent excerpts from publicly available sources. As I mentioned at the very beginning of the article, "Due to action / inaction, the result will not meet expectations or plans." This phrase applies to every event below:
IT risk management
So this brings us to the question: what can or should you do? As I mentioned earlier, IT risk can and should be managed, and the overall approach differs little from the approach to managing any other risk category.
Let's clarify the term "Risk Management". ISACA (CRISC) gives the following definition - "Risk management is defined as the coordinated activities to direct and control an enterprise with regard to risk".
Risk can be viewed in terms of the likelihood that the organization's objectives will not be achieved. "Risk Management" is therefore a way of predicting this likelihood, reducing the chances of the risk occurring, and/or reducing its consequences. At the same time, effective risk management can maximize the organization's opportunities.
Three lines of defense. Participants in the risk management process
There is a generally accepted approach to organizing the risk management process. Its main goal is to increase the efficiency of risk management and to reduce the likelihood of violating the principle of segregation of duties. The approach separates the functions of the participants in the risk management process and divides them logically into "Three lines of defense" within the organization. Let's look at them in a little more detail:
The first line is the business: everyone who participates in the day-to-day activities of the organization, for example sales/purchasing managers, account managers, IT departments, finance department employees, etc.
The second line is risk management experts: specialists with deep knowledge of risk management, for example an internal control function or a risk management function.
The third line is the internal audit function: the organization's independent review body, which verifies that the first line effectively follows the rules set by the second line.
There may also be an informal fourth line of defense: regulatory bodies, the external auditor and other external stakeholders.
IT risk management steps
IT risk management is a cyclical process. Next, we will analyze each step in a little more detail.
IT Risk Identification and Risk Appetite Determination
IT risk identification is the process of detecting, recognizing and documenting the risk to which an organization is exposed.
Assessing and prioritizing identified IT risks
IT risk assessment is the analysis of risk scenarios, their prioritization and evaluation. The assessment can be either qualitative (high/medium/low) or quantitative (inaccessibility of the IT system in minutes/monetary loss from the unavailability of the IT system, loss of data).
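Identified risks are usually documented in a risk register and then ranked. Below is a minimal sketch of a register that prioritizes entries by a likelihood × impact score; the fields, the 1-5 scales, the owners and the sample entries (paraphrased from the ISA 315 list above) are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    owner: str
    likelihood: int  # 1 (rare) .. 5 (almost certain), illustrative scale
    impact: int      # 1 (negligible) .. 5 (severe), illustrative scale

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritize(register: list[RiskEntry]) -> list[RiskEntry]:
    """Highest-scoring risks first; ties keep their original order (sorted is stable)."""
    return sorted(register, key=lambda r: r.score, reverse=True)


# Hypothetical entries and owners
REGISTER = [
    RiskEntry("IT-01", "Unauthorized changes to systems or programs", "Head of IT", 3, 4),
    RiskEntry("IT-02", "Potential loss of data or inability to access data", "Head of IT", 2, 5),
    RiskEntry("IT-03", "IT personnel with access privileges beyond their duties", "CISO", 4, 4),
]

if __name__ == "__main__":
    for r in prioritize(REGISTER):
        print(f"{r.risk_id} score={r.score:2d} owner={r.owner}: {r.description}")
```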
IT Risk mitigation or reduction
IT risk mitigation is the development of measures that reduce the likelihood and/or impact of the risk scenarios identified at the "Risk Identification" step. Measures can include policies and procedures, restricting access and monitoring the actions of IT system users, security settings of IT systems, backup processes, and many other measures aimed at reducing the probability of risk scenarios occurring.
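One way to judge whether a mitigation measure is worth its cost is to compare the ALE before and after it. The sketch below reuses the figures from the quantification example (SLE $90,000, ARO 2) and assumes, purely hypothetically, a control that costs $40,000 per year and reduces the ARO to 0.5. The ratio computed at the end is the commonly used return on security investment (ROSI) formula, which is not taken from the sources cited in this article.

```python
def ale(sle: float, aro: float) -> float:
    """Annualized loss expectancy: ALE = SLE * ARO."""
    return sle * aro


def rosi(ale_before: float, ale_after: float, annual_control_cost: float) -> float:
    """Return on security investment: (risk reduction - cost) / cost."""
    risk_reduction = ale_before - ale_after
    return (risk_reduction - annual_control_cost) / annual_control_cost


if __name__ == "__main__":
    sle = 90_000
    before = ale(sle, aro=2.0)  # $180,000 per year without the control
    after = ale(sle, aro=0.5)   # hypothetical: the control cuts occurrences to 0.5 per year
    cost = 40_000               # hypothetical annual cost of the control
    print(f"ALE before: ${before:,.0f}, after: ${after:,.0f}")
    print(f"ROSI: {rosi(before, after, cost):.3f}")  # 2.375
```

A positive ROSI means the expected reduction in losses exceeds the cost of the measure; a negative value suggests looking for a cheaper measure or consciously accepting the risk within the risk appetite.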
IT risk monitoring and control, risk reporting
IT risk monitoring includes developing key risk indicators, monitoring and evaluating the effectiveness of processes and procedures aimed at reducing risk, and keeping the risk profile (the list of risks inherent in IT) up to date.
This is the final stage of the risk management cycle. Normally the whole cycle is repeated at least annually, and more often when needed.
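Key risk indicators are typically compared with warning and escalation thresholds. A minimal sketch, assuming two hypothetical KRIs with red/amber/green thresholds; some indicators get worse as they grow and others as they fall, hence the higher_is_worse flag.

```python
from dataclasses import dataclass


@dataclass
class KeyRiskIndicator:
    name: str
    amber: float                 # warning threshold (hypothetical)
    red: float                   # escalation threshold (hypothetical)
    higher_is_worse: bool = True

    def status(self, value: float) -> str:
        """Return GREEN, AMBER or RED for an observed value."""
        if self.higher_is_worse:
            breached_red, breached_amber = value >= self.red, value >= self.amber
        else:
            breached_red, breached_amber = value <= self.red, value <= self.amber
        if breached_red:
            return "RED"
        if breached_amber:
            return "AMBER"
        return "GREEN"


if __name__ == "__main__":
    kris = [
        (KeyRiskIndicator("Privileged accounts not reviewed in 90 days", amber=1, red=5), 3),
        (KeyRiskIndicator("Critical patches installed within 30 days, %", amber=95, red=85,
                          higher_is_worse=False), 90),
    ]
    for kri, observed in kris:
        print(f"{kri.name}: {observed} -> {kri.status(observed)}")
```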
The value of IT risk management
So what does it give us? What does the organization's IT risk management process provide in return for substantial investments? Effective and regular IT risk management is important to the organization for the tangible benefits it brings. Here are some examples:
Creating a risk-driven culture with less reliance on individuals, which in turn increases the stability of business processes and the likelihood of successful project completion
Prioritizing risk response efforts in line with the organization’s goals and priorities enhances the organization's ability to achieve goals and create value
Proactive identification of risks, i.e. threats, vulnerabilities and their consequences, in turn enhances control and security over the organization's assets, reduces potential losses, and systematizes the effort and approach to compliance with regulatory requirements
Improving the performance of systems, incident management processes and business continuity increases the predictability and reliability of the organization's processes
Enhancing control, monitoring and reporting processes, including access to more accurate and timely information, simplifies the decision-making processes and, as a result, increases shareholder confidence
In conclusion
Risks associated with IT must be managed, and managed regularly, consistently and as effectively as possible. Efficient IT risk management brings tangible benefits and advantages to the business.
Thank you for reading this material to the end. I sincerely hope that you have learned something new and that this information will be useful to you!
Appendix. Examples of insufficient attention to IT risk management
After consulting with readers and colleagues, I decided to add some real, but anonymized and slightly simplified, examples from my practice in which specific IT risks materialized, or could easily have materialized, due to a lack of control, and to one degree or another led, or could have led, to real losses for the organizations. I hope these examples help you better understand what can go wrong with IT.
Example 1
Russia. The company is a developer of technical, industrial equipment. The company conducted a regular audit of financial statements.
As part of an audit of financial statements, an independent auditor must evaluate the impact of a number of risks inherent in IT on the reliability of the financial reporting process and of the financial statements themselves (ISA 315, PCAOB AS 2110).
At some point during the financial year, the company's systems were threatened by the PETYA ransomware (more detailed information about PETYA is widely available online). Because the company's management regularly ignored the deficiencies in internal control over IT reported by the auditor, the malware penetrated the company's systems and encrypted a little more than 50% of them. The affected systems included the backup systems and some of the key systems involved in generating the financial statements.
The company survived this situation, but incurred significant operational losses. In addition, specialists spent several weeks manually restoring part of the accounting data from the available paper-based records. This could have been avoided with effective controls over IT risks, such as:
Availability of anti-virus software and its timely and regular update
Having a timely and regular process for installing updates and critical patches for company key systems
Existence of a process for informing and educating users about the basics of information security
Availability of well-protected backup storage for the company's key, significant information
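Controls like these only help if they actually operate. As a hypothetical illustration, a periodic check over an inventory export could flag hosts whose antivirus signatures or critical patches are out of date. The inventory format, host names and the 7- and 30-day limits below are assumptions.

```python
from datetime import date

# Hypothetical inventory export: host -> (last AV signature update, last critical patch installed)
INVENTORY = {
    "erp-db-01": (date(2024, 5, 30), date(2024, 5, 20)),
    "fileserver-02": (date(2024, 3, 1), date(2023, 12, 15)),
    "backup-01": (date(2024, 5, 31), date(2024, 2, 1)),
}


def outdated_hosts(inventory, today, max_av_age_days=7, max_patch_age_days=30):
    """Return {host: [issues]} for hosts that breach the hypothetical age limits."""
    findings = {}
    for host, (av_updated, patched) in inventory.items():
        issues = []
        if (today - av_updated).days > max_av_age_days:
            issues.append("antivirus signatures outdated")
        if (today - patched).days > max_patch_age_days:
            issues.append("critical patches outdated")
        if issues:
            findings[host] = issues
    return findings


if __name__ == "__main__":
    for host, issues in outdated_hosts(INVENTORY, today=date(2024, 6, 1)).items():
        print(host, "->", ", ".join(issues))
```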
Example 2
USA. A large retail chain. At the time of the observation, the company had a network of about 800 stores in different US states. Each store had POS terminals and POS servers that collected sales information. Because there were so many stores, the POS system for both terminals and servers was configured centrally at the head office. When a new version of the system was ready, specialists installed it in all stores, either remotely or from physical media.
A potential problem was discovered during the audit: the image (a ready-made, preconfigured POS system for installation from disk) contained the accounts of administrators who had been dismissed, retired or transferred to other positions or departments several years before the audit.
Thus, for several years, these dismissed, retired or transferred administrators had full access to POS terminals and servers in all 800 stores in different states.
In this particular case no damage was found. However, had a dismissed or transferred employee with access to the company's network wanted to cause harm, for example out of resentment, the retail chain could have lost some of its key in-store systems and, as a result, incurred financial and reputational losses and faced sanctions from regulators.
This risk could be mitigated with effective IT risk controls, such as:
Reduce the number of pre-installed accounts to the minimum necessary and check their relevance each time a new image is prepared for rollout
Implement regular checks of systems for active (not locked) accounts of employees who have been dismissed or transferred to another position/department
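The first control can be partly automated by comparing the accounts baked into a new image with the list of currently active administrators before rollout. A minimal sketch, assuming both lists can be exported as plain user names; all names below are hypothetical.

```python
from collections.abc import Iterable


def orphaned_accounts(image_accounts: Iterable[str],
                      active_admins: Iterable[str],
                      approved_service_accounts: Iterable[str]) -> list[str]:
    """Accounts in the image that match neither a current admin nor an approved service account."""
    return sorted(set(image_accounts) - set(active_admins) - set(approved_service_accounts))


if __name__ == "__main__":
    # All names are hypothetical
    image_accounts = ["pos_svc", "a.ivanova", "j.smith", "m.brown", "k.lee"]
    active_admins = ["a.ivanova", "k.lee"]  # e.g. exported from the HR/IAM system
    approved_service_accounts = ["pos_svc"]
    print(orphaned_accounts(image_accounts, active_admins, approved_service_accounts))
    # ['j.smith', 'm.brown']
```

Running such a check as a mandatory step before each image is released turns the review from a periodic audit finding into a routine gate.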
Example 3
Europe. A major beer producer. One of the largest brewing companies in the world had implemented SAP ERP to automate most of its business processes. During an external audit, discrepancies were found in the accounting and reporting records: significant discrepancies in the proceeds from beer sales on the company's accounts, unknown counterparties in the master data, and discounts set up for a number of counterparties without anyone's authorization.
Analysis of the collected information revealed that the contractor team that configured SAP ERP had unlimited permissions (SAP_ALL, DEBUG), while the actions of users with such permissions were not reviewed regularly. After an internal investigation, the company terminated its contract with the contractor and significantly changed the IT and management teams responsible for product sales and master data management (MDM).
This situation could be avoided by reducing the risks of violating the principles of separation of duties and unauthorized changes, for example:
Remove, or restrict as much as possible, access to SAP_ALL and DEBUG
Implement a formalized procedure for granting/blocking user access to such profiles
Implement regular monitoring of the activity of users with such critical permissions
Implement a mechanism to ensure that only tested and approved changes reach the production system
Log critical actions in a separate storage that is not accessible to users with SAP_ALL or DEBUG access
Limit the contractor's access to a fixed period of time or the duration of the contract, and grant only the necessary level of authority
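Several of these controls come down to regularly reviewing who holds critical access. Below is a sketch of such a review over a hypothetical export of users and their profiles. The export format, user names, approval list and contract end date are assumptions; in a real SAP system the data would come from the system's own user and authorization reports.

```python
from datetime import date

CRITICAL_PROFILES = {"SAP_ALL", "DEBUG"}  # the critical access named in this example


def critical_access_findings(users, approved_users, contract_end):
    """Flag unapproved critical access and contractor access that outlives the contract."""
    findings = []
    for u in users:
        critical = sorted(u["profiles"] & CRITICAL_PROFILES)
        if critical and u["user"] not in approved_users:
            findings.append((u["user"], "unapproved critical access: " + ", ".join(critical)))
        if u["type"] == "contractor" and u["valid_to"] > contract_end:
            findings.append((u["user"], "access outlives contract"))
    return findings


if __name__ == "__main__":
    # Hypothetical export
    users = [
        {"user": "CONTR_01", "type": "contractor", "profiles": {"SAP_ALL", "DEBUG"},
         "valid_to": date(2025, 12, 31)},
        {"user": "FIREFIGHTER_01", "type": "emergency", "profiles": {"SAP_ALL"},
         "valid_to": date(2024, 12, 31)},
        {"user": "SALES_07", "type": "employee", "profiles": {"Z_SALES"},
         "valid_to": date(2099, 12, 31)},
    ]
    for user, finding in critical_access_findings(users, approved_users={"FIREFIGHTER_01"},
                                                  contract_end=date(2024, 12, 31)):
        print(user, "->", finding)
```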
Example 4
Finally, a slightly comical story told by my colleague Peter M. from one of his audit projects. During the audit it turned out that the company had no formal document regulating the backup procedure, in particular what to back up, when, and where to store it. As a result, the owners of systems and data had one view of what needed to be protected and preserved, including the priority and depth of the stored information, while the IT team had its own view of what was important and required backups. When something went wrong, the company simply could not recover the necessary information because no one had saved it. Conclusion: formalize the priorities and objectives of backups, and coordinate and agree the requirements with all participants of the process.
And a little icing on the cake for those who have read the appendix to the end: I recommend watching the movie Office Space (just google it). The film gives an example of an IT risk materializing; I will not reveal the details. Enjoy a great movie!
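Once the requirements are formalized, checking them against what IT actually does can be simple. A sketch, assuming the data owners express their requirements as a maximum acceptable data loss per dataset (a recovery point objective, RPO, in hours) and IT records how often each dataset is actually backed up; dataset names and values are hypothetical.

```python
def backup_gaps(required_rpo_hours: dict[str, float],
                backup_interval_hours: dict[str, float]) -> dict[str, str]:
    """Compare the owners' RPO requirements with IT's actual backup schedule."""
    gaps = {}
    for dataset, rpo in required_rpo_hours.items():
        interval = backup_interval_hours.get(dataset)
        if interval is None:
            gaps[dataset] = "not backed up at all"
        elif interval > rpo:
            gaps[dataset] = f"backed up every {interval} h, owner requires {rpo} h"
    return gaps


if __name__ == "__main__":
    owners = {"general_ledger": 24, "customer_master": 24, "sales_orders": 4}  # hypothetical
    it_plan = {"general_ledger": 24, "sales_orders": 24, "mail_archive": 168}  # hypothetical
    for dataset, gap in backup_gaps(owners, it_plan).items():
        print(f"{dataset}: {gap}")
```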