International Journal of Engineering

Improving the tools and mathematical methods to diversify the portfolio of oil and gas assets in the face of limited investment, high market volatility, increasing risks and uncertainties at the current level of technology development is a very urgent task. In order to form an effective investment portfolio, the authors proposed asset diversification using cluster analysis, which implies grouping sample objects according to a set of specific features. The method under consideration involves five stages of asset valuation in order to consider those assets in a three-dimensional space, taking into account the specifics of the oil and gas business, including determination of individual asset trajectory, performing spatial approximation, calculating the clustering coefficient, ranking the resulting pairs


INTRODUCTION
Formation of investment portfolios for oil and gas companies is considered as a vital necessity to optimize investment activities and achieve maximum positive effect (MPE). The formation of investment portfolios is possible using various methods depending on the final goals [1][2][3][4][5][6][7]. Large consulting companies are increasingly using and offering their clients a variety of robo-advisers and decision support systems in order to form the most suitable asset portfolios for investors [8,9].
This fact confirms the relevance of the ongoing research aimed at improving the mathematical methods applicable to the formation, rebalancing and diversification of investment portfolios [10][11][12][13].
*Corresponding Author Email: nikolaychuk_la@pers.spmi.ru (L. Nikolaichuk)
A relevant area for detailed study is the field of data science, which considers various intelligent algorithms for solving optimization problems and ranking large arrays of input data that characterize the totality of available alternatives [14][15][16][17][18]. In this regard, the described problem can be solved using cluster analysis [19][20][21].
One of the necessary criteria for the formation of an effective investment portfolio is the observance of the principle of diversification, that is, the inclusion in the portfolio of projects, the chance of success of which depends little or does not depend on each other at all [22]. In the described situation, the approach using cluster analysis is well applicable, which implies the grouping of sample objects according to certain criteria established by the analyst [23,24].
The present study applies the cluster analysis method proposed by Haddad [25], an innovative method based on the use of analytical geometry for calculating a common indicator of a clustering model. One of the main advantages of this method is the ability to measure the level of similarity between assets over time. Moreover, through the selection of relevant metrics, it can distinguish investment alternatives consistently and show the level of similarity between projects in a clear graphical representation.
Within the framework of this study, it seems appropriate to apply the Haddad [25] cluster analysis method to oil and gas projects [26,27]. The study therefore aims to describe the use of cluster analysis in diversifying the portfolios of global oil and gas companies. It should be noted that the described methodology and the proposed set of metrics are used to diversify oil and gas projects for the first time, and that this work proposes two new steps of cluster analysis that take into account the specifics and characteristics of the oil and gas business.
The study aims to improve the existing methodology for conducting cluster analysis for making investment decisions in the oil and gas business. The work includes an analysis of key features that reflect the specifics of oil and gas industry projects in four main areas: geological, environmental, social and economic. The most relevant metrics, those that best reflect the specifics of oil and gas projects regardless of the national characteristics of states, are selected and analyzed.

MATERIALS AND METHODS
Making investment decisions is an important part of a company's development strategy. Making a profit from invested capital is the main purpose of the investment. To eliminate erroneous actions, it is necessary to focus on an investment strategy. There are various investment strategies. This work is focused on a balanced investment strategy and includes asset portfolio diversification and a discounted cash flow model [28][29][30].
The methods of multivariate analysis include cluster, factor, correlation and discriminant analysis. Cluster analysis allows objects to be divided into several sets not by one parameter, but by a set of features [31,32]. Haddad [25] carried out a practical calculation based on synthetic data, which made it possible to identify two explicit clusters from the pairwise intersection of the spheres. The stated goal of that author's further research is to test the proposed method on real data.
According to the proposed methodology, projects are evaluated sequentially in three stages, and the result is the representation of each asset as a sphere in three-dimensional Euclidean space.
The present study diversifies a portfolio of international oil and gas assets using the described method of cluster analysis. Data on projects of the onshore segment were collected from Wood Mackenzie. First, the trajectory of the assets over time was analyzed on the basis of these data, followed by calculation of the change in the Individual Risk Factor (IRF) [33]. This stage consists of collecting up-to-date information about the assets and calculating the relevant metrics for constructing the geometric shapes.
Next, total asset risk factor (ARF) was determined through complex geometric correlations. It is necessary to approximate in space the assets represented by the spheres and determine the volume of the lenses, in case they are formed due to the intersection of the spheres.
At the last stage, degree of similarity of assets was assessed dynamically, in case of any intersection between spheres, through calculation of common clustering indicator limited between zero and one and based on calculated volumes of spheres [34].
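The lens volume mentioned above can be sketched with the standard closed-form expression for the intersection of two spheres. This is a minimal illustration, not the authors' implementation: the function names `sphere_volume` and `lens_volume` are hypothetical, and the radii and centre distance are assumed to be already known.

```python
import math


def sphere_volume(r: float) -> float:
    """Volume of a sphere of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def lens_volume(r1: float, r2: float, d: float) -> float:
    """Volume shared by two spheres with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        # The spheres are separate or only touch: no lens is formed.
        return 0.0
    if d <= abs(r1 - r2):
        # One sphere lies entirely inside the other.
        return sphere_volume(min(r1, r2))
    # Standard closed form for two partially overlapping spheres.
    return (
        math.pi
        * (r1 + r2 - d) ** 2
        * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
        / (12.0 * d)
    )
```

The two early returns cover the cases in which no lens is formed or one sphere contains the other, so the closed form is only evaluated for a genuine partial overlap.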
The study substantiates a set of four metrics and their location in space in such a way that the cluster analysis takes into account the specifics of the oil and gas business. They are selected according to the following algorithm, which is described in detail in the results section:
1) Substantiation of the resulting indicator, chosen among the classic indicators of the economic efficiency of an investment project.
2) Analysis of the specifics of oil and gas projects in the geological, environmental, social and economic areas.
3) Justification of the three variables that form the three-dimensional space.
4) Checking the relevance of the proposed metrics and their location by conducting cluster analysis for international oil and gas investment projects.
5) Formation of conclusions on the proposed metrics and analysis, and making adjustments.

1. Determination of Individual Assets Trajectory
The calculation of axial variables and the subsequent determination of the relative position based on the indicators of the individual risk factor allows you to determine the spatial trajectory of each asset. Haddad [25] notes the need for a careful choice of variables when using cluster analysis. It is necessary to give preference to those technical and economic indicators, the growth rates of which are comparable. If one of the parameters obviously changes more than the others, the three-dimensional space is stretched, which leads to a reduction in the number of intersections of the spheres and an artificial decrease in the number of clusters. In other words, the results of the study become inaccurate. Based on this provision, we provide a justification for choosing a set of four indicators, the use of which in the selected methodology of cluster analysis of oil and gas projects allows us to obtain relevant results that adequately reflect the degree of diversification between the assets being evaluated.
An advantage of the Haddad [25] methodology is its use of normalized variables. The metrics applied in the research are expressed not in absolute values, but as the rate of change of the indicator over a period of time (the term of the investment project). This condition seems to be the key one, as the dynamic study of clustering patterns allows the data set to be investigated over time and not only at a single point in time.
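As a rough illustration of expressing a metric as a rate of change instead of an absolute value, the sketch below computes a relative change between the start and end of a period. The exact normalization used in the study is not given in the text, so this definition, and the helper name `rate_of_change`, are assumptions.

```python
def rate_of_change(start: float, end: float) -> float:
    """Relative change of an indicator over a period (assumed definition).

    Dividing by abs(start) keeps the sign of the result equal to the
    direction of the change even when the starting value is negative.
    """
    if start == 0:
        raise ValueError("relative change is undefined for a zero starting value")
    return (end - start) / abs(start)


# Illustrative values only: production rising from 100 to 120 units.
growth = rate_of_change(100.0, 120.0)
```

Because the result is dimensionless, indicators measured in different units can be placed on the axes of the same space.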
To determine the key indicator reflecting the economic efficiency of the project, dynamic methods of evaluating investment projects are considered, as this assumption is initially included in the conditions of the study. It is also taken into account that the proposed tool will be used by the owners (investors) or management of the company, who are interested in profit growth, an increase in the value of assets, growth of the return on invested capital, and the stable operation of the company. This condition justifies the chosen theoretical apparatus, which consists of the following indicators of the commercial attractiveness of the project. Table 1 summarizes the criteria and indicators for the commercial effectiveness of an oil and gas project.

2. Choice of Metrics
Analysis of the commercial effectiveness of projects showed that the present value of the project (PV) is the most appropriate parameter to set the volume of the sphere located in the three-dimensional space, since it is the resulting indicator for the other parameters.
The change in the total present value of the projects (ΔPV) was therefore used as the variable reflecting the volume of the figure.
To identify the other three variables (i.e. the axes), the specifics of oil and gas projects in the geological, environmental, social and economic areas were taken into account.
Each direction, in turn, can be expressed by metrics presented in Table 2. Taking into account the need to limit to only three explanatory variables, generalized metrics have been proposed.
In the present study, social and environmental metrics were excluded, mainly because tax laws and environmental legislation are not comparable across countries and because reliable data on employment rates are difficult to obtain.
Thus, the indicator reflecting geological specificity, the change in hydrocarbon production volumes (ΔProduction), was taken as the second variable and placed on the X-axis.
The components of the economic direction of the specifics of oil and gas projects were also analyzed, as a result of which the payback period was excluded (see the explanation above). Since investment projects of oil and gas enterprises are highly capital-intensive, the change in specific capital expenditures per ton of oil equivalent (ΔCAPEX/toe) was adopted for the Z-axis. The Y-axis is the change in specific operating costs per ton of oil equivalent (ΔOPEX/toe), since the main share of costs is determined by remoteness from populated areas and the sales market and by the level of infrastructure development (availability of roads, power supply, etc.). Relative values were taken intentionally to bring information on different projects located in different states into a comparable form.
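One way to picture the resulting geometry is sketched below: the three axis metrics give the centre of the sphere and ΔPV sets its volume, from which a radius follows. Using the absolute value of ΔPV as the volume is an assumption made here for illustration, and `AssetSphere`, `make_sphere` and `centre_distance` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetSphere:
    name: str
    x: float       # ΔProduction
    y: float       # ΔOPEX/toe
    z: float       # ΔCAPEX/toe
    radius: float  # derived from ΔPV (assumed to set the sphere volume)


def make_sphere(name: str, d_production: float, d_opex_toe: float,
                d_capex_toe: float, d_pv: float) -> AssetSphere:
    """Build a sphere whose volume equals |ΔPV| (illustrative assumption)."""
    volume = abs(d_pv)
    # Invert V = 4/3 * pi * r^3 to obtain the radius.
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return AssetSphere(name, d_production, d_opex_toe, d_capex_toe, radius)


def centre_distance(a: AssetSphere, b: AssetSphere) -> float:
    """Euclidean distance between the centres of two asset spheres."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```

Two assets can intersect only if the distance between their centres is smaller than the sum of their radii, which is why comparable growth rates on the three axes matter.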
Specific risks inherent in the fuel and energy complex are reflected quantitatively in the value of the discount rate, which is inversely related to the value of PV. Traditionally, in view of the above features of oil and gas industry projects, the structure of funding sources is dominated by borrowed funds. In addition, the high riskiness of these activities increases the influence of the discount rate on project design, so it is important that the discount rate is reflected indirectly in one of the selected cluster analysis metrics, which in this case is the PV value. As a result, the use of other metrics in the analysis of oil and gas projects is deemed inappropriate. Additional metrics selection parameters are summarized in Table 3.
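The inverse relation between the discount rate and PV can be seen in the textbook discounted cash flow formula. The sketch below assumes end-of-year cash flows and a constant rate; the function name `present_value` and the cash flow figures are illustrative and do not come from the study.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount a series of end-of-year cash flows at a constant rate."""
    return sum(cf / (1.0 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


# Illustrative cash flows only: the same project valued at two discount rates.
flows = [100.0, 100.0, 100.0]
pv_low_risk = present_value(flows, 0.10)
pv_high_risk = present_value(flows, 0.15)
```

For positive cash flows, a higher risk premium in the rate yields a lower PV, so a riskier project is represented by a smaller sphere, all else being equal.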

Hydrocarbon Production Volume
Along with changes in hydrocarbon production volume, specific operating or capital costs will inevitably lead to fluctuations in the value of the total cost of the oil and gas project. On the other hand, it is worth noting the directions in which the dependent variables affect the value of the resulting indicator. For instance, an increase in CAPEX/toe, OPEX/toe can lead to a decrease in the final value of the present value of the project while an increase in hydrocarbon production provides an increase in PV assuming a fixed condition for other parameters. However, at the same time, an increase in output in physical units in the oil and gas sector will naturally increase the level of capital and operating costs. Thus, the variables that form a three-dimensional space in the cluster analysis impose a multidirectional effect on project costs while they also interact with each other.
At the same time, local managers of oil and gas projects seek to ensure a greater growth rate of PV compared to CAPEX and OPEX in the case of incremental hydrocarbon production. Consequently, the rate of sphere volume increase, when using cluster analysis, will usually be greater than the rate of change in the selected variables that form the axes of three-dimensional space.
Thus, using the described technical and economic indicators in the proposed combination for cluster analysis of oil and gas projects reduces the probability of distorted results and makes it possible to obtain relevant results.
It should be noted that the use of the specified variables is not mandatory, and other technical and economic indicators can also be used to best reflect the relevant parameters.
Diversification of the project portfolio is an important direction of investment policy for a modern company due to the volatility of world hydrocarbon markets. Therefore, methods are being developed to correctly form a stable portfolio. To test the approach, data from an international oil and gas corporation were used. The proposed metrics can be applied in cluster analysis not only by country, but also by region.
In the presented study, calculations are carried out in MS Excel using the Power Query add-in for analyzing and structuring data. All estimated input parameters are listed in Table 4.

| Criterion | Metrics | Justification |
|---|---|---|
| [X] | ∆Production | Reflects the key risk of the industry: mining and geological. Shows the income component of the project. |
| [Y] | ∆OPEX/toe | Shows the cost part of the project. Prevails in the cost structure. |
| [Z] | ∆CAPEX/toe | Shows the cost part of the project. Reflects the environmental risks inherent in the industry. |

4. Spatial Approximation
The total risk factor of the assets (the volume of the sphere intersection) can be calculated from the individual risk factors wherever a spatial intersection of spheres occurs. Moreover, a non-zero value of the total risk factor suggests that the oil and gas projects in the pair under consideration face similar problems at the given point in time.

5. Calculation of Clustering Coefficient
Clustering coefficient (Rt) was calculated based on the outcome of previous steps to determine the degree of similarity of the spatial trajectories of a set of assets in a given period of time.
According to Haddad [25], the clustering coefficient is calculated from the volumes of the spheres and of their intersection. The coefficient takes values in the range from zero to one: a value of zero indicates that the analyzed spheres do not intersect, and a value of one indicates that the spheres completely coincide.
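Since the equation itself is not reproduced here, the sketch below uses one ratio that has the stated properties: the intersection volume divided by the volume of the union of the two spheres. This choice is an assumption for illustration and is not necessarily the formula given by Haddad [25]; the function names are hypothetical.

```python
import math


def _sphere_volume(r: float) -> float:
    return 4.0 / 3.0 * math.pi * r ** 3


def _intersection_volume(r1: float, r2: float, d: float) -> float:
    """Volume shared by two spheres with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return _sphere_volume(min(r1, r2))
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def clustering_coefficient(r1: float, r2: float, d: float) -> float:
    """Assumed coefficient: intersection volume over union volume.

    Returns 0.0 for disjoint spheres and 1.0 only for identical spheres.
    """
    inter = _intersection_volume(r1, r2, d)
    union = _sphere_volume(r1) + _sphere_volume(r2) - inter
    return inter / union if union > 0 else 0.0
```

With this ratio, a small sphere lying entirely inside a much larger one still scores well below one, which matches the reading that only coinciding spheres reach the upper bound.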
Spatial visualization of the analyzed data in three-dimensional space is shown in Figure 1, as implemented in Jupyter Notebook using the Python programming language.
Algorithm for calculating the clustering coefficient is shown in Figure 2.
Based on the calculations, conclusions can be drawn about the existence of relationships between various oil and gas assets (by country). Finally, the resulting pairs were ranked with the aim of subsequently removing from the sample alternative assets with a low degree of diversification.
First, using the Power Query tool, the crosstab was converted into a flat format with one pairwise comparison of countries per row, together with the value of the clustering coefficient, after which the data were ranked from the least to the most diversified pair. Then, in order to form a portfolio, pairs of countries were selected with a clustering coefficient equal to zero, that is, pairs completely independent of each other in terms of the dynamics of the indicators under consideration.
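The study performs this step in Power Query; the same transformation can be sketched in plain Python as follows. The nested-dictionary layout of the crosstab, the placeholder labels A, B and C, and the coefficient values are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical symmetric crosstab of clustering coefficients (made-up values).
crosstab = {
    "A": {"A": 1.0, "B": 0.3, "C": 0.0},
    "B": {"A": 0.3, "B": 1.0, "C": 0.0},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0},
}


def unpivot_pairs(table):
    """Turn the crosstab into one row per unordered pair: (a, b, R)."""
    names = sorted(table)
    return [(a, b, table[a][b]) for a, b in combinations(names, 2)]


def rank_pairs(rows):
    """Order rows from the least diversified pair (highest R) to the most."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def independent_pairs(rows):
    """Keep only pairs whose clustering coefficient is exactly zero."""
    return [(a, b) for a, b, r in rows if r == 0.0]


ranked = rank_pairs(unpivot_pairs(crosstab))
candidates = independent_pairs(ranked)
```

Each unordered pair appears once, so the symmetric half of the crosstab and its diagonal are dropped before ranking.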

Figure 2. Algorithm for calculating the clustering coefficient
For the selected pairs, the metric required for the final ranking of projects was calculated, which is equal to the ratio of the total present value of the pair to the total production potential of the pair (PV/Q). After that, a cross matrix was compiled for those countries between which the clustering coefficient is equal to zero. The gaps that appear in the matrix (intersections of countries that were filtered out because of a non-zero clustering coefficient) are automatically assigned the value "Null", which is replaced by zeros during further transformations.
The calculated ratios of the total present value to the production potential (thousand rubles/toe) were used as the main characteristic attribute of the pair. The final matrix for portfolio analysis is shown in Table 5. The algorithm of the methodology for selecting valid metrics for the evaluation of investment projects in the oil and gas industry is shown in Figure 3.
The next and final step in the formation of an investment portfolio of international oil and gas assets with the maximum degree of diversification in terms of the estimated indicators is to solve an optimization problem: select the portfolio that provides the highest total PV/Q and, at the same time, includes only countries whose intersections in the matrix contain no zero cells, i.e. countries that have no pairwise clustering coefficient R other than zero.
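For a small number of countries, this selection can be sketched as an exhaustive search over candidate portfolios. The sketch assumes a fixed portfolio size and takes the portfolio score to be the sum of pairwise PV/Q values; neither detail is specified in the text, and `best_portfolio` is a hypothetical helper.

```python
from itertools import combinations


def best_portfolio(countries, r, pv_q, size):
    """Pick the group of `size` countries with the highest summed PV/Q.

    r and pv_q map frozenset({a, b}) to the pairwise clustering
    coefficient and to the pairwise PV/Q. A group is feasible only if
    every pair inside it has a clustering coefficient of zero.
    Returns (None, -inf) when no feasible group exists.
    """
    best, best_score = None, float("-inf")
    for group in combinations(sorted(countries), size):
        pairs = [frozenset(p) for p in combinations(group, 2)]
        if any(r[p] != 0.0 for p in pairs):
            continue  # at least one pair of spheres intersects
        score = sum(pv_q[p] for p in pairs)
        if score > best_score:
            best, best_score = group, score
    return best, best_score
```

Exhaustive search is adequate for a few dozen countries; for larger samples the problem becomes a maximum-weight clique search and would need a dedicated solver.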

DISCUSSION
Analyzing Table 2, namely the values of the PV/Q indicator, it can be concluded that the projects of the pairs Libya-Turkmenistan, Libya-Saudi Arabia, as well as Libya-Chad and Libya-Yemen are the most attractive for investment. Investment in oil and gas projects in Kuwait, on the other hand, is undesirable due to a negative present value of projects implemented in this country.
The analysis of the PV/Q ratio shows that the most profitable investments are in the countries Kazakhstan-Saudi Arabia. In their portfolio there is not a single pairwise clustering coefficient other than zero, i.e. none of the pairs in this foursome has zeros in the resulting matrix when intersecting with each other. At the same time, investments in each pair assume a positive PV/Q, which, together with a high degree of diversification, makes it possible to provide income for the investor company with a significant degree of confidence. It is important to understand that the results of the analysis are a demonstration of how the proposed method works and are not real investment recommendations, since the forecast values are generated synthetically. The results of the analysis are illustrated in Figure 4.
It is worth noting that portfolio management in the context of high market volatility in the current environment and increasing risks in the global oil and gas business requires the development of new approaches to the formation and diversification of the investment portfolio. In order to further develop the methodology, a simulation model can be developed. Simulating the growth rates of the indicators considered in the work will make it possible to track the change in the position of clusters in space and in the volume of the spheres over time.
Consequently, on the basis of such analysis, it becomes possible to move away from averaging and assess how clusters interact with each other from year to year. The experiments carried out with the simulation model will make it possible to quantitatively assess the stability of a differentiated portfolio in the event that one or more of the projects included in it undergo changes in the indicators used in the cluster analysis. The sets of scenarios developed based on the results of experimentation with the simulation model will reduce the uncertainty in investment decisions.

CONCLUSION
The expansion of the used methodology through the use of simulation modeling methods will ensure a quick response to changes in external environment and timely adaptation of the portfolio to new conditions. Implementation of multi-criteria sensitivity analysis performed by creating a simulation model improves the quality of portfolio risk management. In this regard, it becomes possible to quantitatively assess the probability of portfolio stability reduction and develop in advance the order of actions in case of risk situation occurrence. The methodology modernized in this way allows increasing the accuracy of economic analysis in conditions of high volatility.
The set of metrics and their location in space proposed in this work, on the one hand, reflects the features of the oil and gas sector and, on the other hand, allows the attractiveness of investment projects implemented in different countries to be ranked. The article presents a modernization of the cluster analysis method focused specifically on the oil and gas industry. The most adequate results are obtained when using the value of ∆PV as the individual risk factor, since it is the resulting indicator for the other metrics and accumulates the key risks of the project. The total risk factor is determined by using ∆OPEX/toe, ∆CAPEX/toe and ∆Production as metrics. These metrics prevent unrealistic results and allow identification of countries in which investments in the oil and gas business are highly diversified.
The study has limitations. Since the analysis includes only three variables, the proposed metrics are generalized. The study does not take into account force majeure, natural disasters or market price collapses, the occurrence of which would affect the values of the selected metrics. Political risks and changes in legislation in each individual country are also not taken into account; the proposed metrics are aimed primarily at comparing the attractiveness of investment projects from a financial point of view.
Due to the fact that the management of an oil and gas company is interested in the growth rate of the present value of the project ahead of the growth rate of costs when increasing production volumes, it is considered appropriate to use these variables in the combination discussed in this paper.
The application of the described methodology with the set of indicators justified in the work can be recommended for implementation by large oil and gas companies in order to diversify their business internationally in an effort to maximize profitability at the lowest risk and to provide the highest return on invested capital.
This study is aimed at project managers of oil and gas companies, namely those who decide whether to include a particular project in the investment portfolio.