Joint optimization of condition-based maintenance policy and buffer capacity for a two-component self-repairable serial system

Highlights
▪ A new competitive interaction process between external shocks and internal degradation, driven by random external shocks, is proposed.
▪ A new self-repair mechanism is proposed.
▪ An inside-out method for calculating the reliability of the system at any time during this process is given.
▪ The time and effect of each repair are dynamically related to the magnitude of the external shocks on the component.

Abstract
This paper investigates a self-repairable serial system with two components and a buffer. Competing failure processes are considered, arising from the internal degradation and external shock processes of the components. The system reliability is calculated by integrating the internal degradation process and the external shock process. When one of the components deteriorates to the PM or CM threshold, it is restored to an imperfect state under dynamic time limitations that depend on the previous internal degradation and external shocks. Whether the other component is repaired depends on its reliability, and whether it is shut down depends on the buffer status and the allocation of the component in the system. The optimal initial buffer capacity and PM threshold are found by minimizing the system's total cost in a given running cycle. Finally, numerical and case studies are provided to demonstrate the feasibility and superiority of the presented model.


Introduction
In modern manufacturing enterprises, serial production systems are widely used. For example, in a serial automobile production line, any failure or maintenance activity can lead to a shutdown, which disrupts the production plan, affects the production schedule, and causes large-scale production loss. To improve the sustainability of production and system reliability, buffered serial systems are applied to promote the productivity of enterprises [1][2][3]. In recent years, the optimization of maintenance policy for buffered serial systems has received attention from scholars [4][5][6][7]. Fitouhi [4] et al. (2017) considered a two-component flow system with a finite buffer capacity. The degradation of a component is divided into several discrete states, and maintenance is triggered when the component reaches a specified degradation state. The optimal maintenance strategy is obtained by minimizing the total cost of maintenance.
Zhang [5] et al. (2022) analyzed all the system states of a serial manufacturing system with fixed buffer capacity. The transfer probabilities and sojourn times between states of the system were analyzed under a semi-Markov process. The optimal maintenance threshold and buffer capacity were derived by minimizing the long-run average cost of the system. Wei [6] et al. (2023) presented a condition-based maintenance policy for a serial system consisting of two components and an intermediate buffer with finite capacity; various side-effect costs, including the cost related to quality loss, are considered. Gan and Shen [7] (2023) proposed a maintenance strategy for a serial system operating in a shock environment; two types of stochastic shocks are considered, which either increase the defect rate or cause failures. However, the articles mentioned above usually assume that components or systems suffer from degradation or shock processes, and dependent failure processes of components are rarely considered. In practical engineering, systems commonly undergo multiple failure processes due to various internal and external factors [8][9][10][11][12][13][14][15]. For instance, a car tire may fail because of internal degradation caused by wear and tear, or because it is punctured by a nail on the road, which is an external shock. Therefore, many scholars have proposed competing failure processes to analyze such systems. Zhou [8] et al. (2016) analyzed a kind of leasable equipment with an indefinite lease term, where the equipment is subjected to a competing failure process of continuous internal degradation and external random shocks during the lease term. The optimal preventive maintenance program is determined by minimizing the cumulative maintenance cost over the lease term. Yang [9] et al. (2017) investigated a system with competing failure modes of internal degradation and external shock. The failure process of the system was categorized into three states, with a different degradation rate in each state. The optimal combination of system maintenance strategies and inspection intervals was obtained by minimizing the expected cost per unit time. Liu [11] et al. (2020) studied an uncertain complex system with competing failure processes of internal degradation and external shocks. The internal degradation process was described as an uncertain degradation process, and the external shock process as an uncertain renewal reward process. The reliability of the system was solved and evaluated based on uncertainty theory.
Li [12] et al. (2021) investigated a phased mission system (PMS) in which the reliability of the system is affected by both internal degradation and random external shocks. They proposed a PMS reliability assessment model based on the Markov regenerative process (MRGP) and verified the model using Monte Carlo simulation. Zhang and Zhang [14] (2021) investigated a system with a competing failure process consisting of continuous internal degradation and external random shocks. A degradation model based on an uncertain stochastic reward process is developed for the external random shock process. An optimal preventive maintenance strategy is obtained by solving for the system's reliability and minimizing the average maintenance cost. Lyu [15] et al. (2021) established a system with interdependent competing failure processes. Soft failure is determined by the amount of degradation and the failure threshold, hard failure is represented by three different shock models, and the degradation rate of soft failure changes when the number of shocks reaches a certain value. Combining the soft and hard failure reliability models, a closed-form reliability function is derived. Finally, the validity of the model is verified by taking a microelectromechanical system as an example. Although all of the above studies consider the competing failure process, most of them assume that internal degradation and external shocks are not related. However, in real production, internal degradation and external shocks are interdependent. For example, a large external shock on a machine may trigger consequences such as an instantaneous increase in the amount of internal degradation. This observation motivates the first research question.
Research Question 1: Are the internal degradation process and the external stochastic shock process merely competing? Is there some kind of interaction between them?
After a component fails, maintenance or complete replacement (CR) must be performed [16]. Maintenance is usually categorized into several types, such as preventive maintenance (PM) [17][18][19][20], corrective maintenance (CM), opportunistic maintenance (OM) [21], and breakdown maintenance (BM), and each type can be perfect or imperfect. Unlike in the past, maintenance strategies are now combined. For example, a long-lived machine can be maintained differently at different times: initially, the machine has a small amount of degradation, and only simple PMs need to be performed, but as the machine ages, PM needs to be combined with other types of maintenance to ensure that the machine works normally.
Wang [22] et al. (2020) studied an optimal PM policy for a general time-distributed system, in which PM is periodic and can be performed within a specified time window. A certain number of imperfect maintenance (IM) actions needs to be performed before each CR. The state transfer equation and the system steady-state availability are derived based on the supplementary variable method. The optimal PM policy is obtained by maximizing the steady-state availability. Chen [23] et al. (2022) studied a system with a multi-component continuous degradation process. The components are maintained by setting fixed PM, CM, and OM thresholds during a two-stage inspection of the system. The optimal combination of inspection intervals and PM, CM, and OM thresholds was obtained by minimizing the long-term expected cost of the system. Gao [24] et al. (2023) investigated a system whose degradation process obeys a linear drift Wiener process, with periodic inspection and PM as the maintenance strategy. A method for optimizing periodic inspection and PM for the detection window is proposed, and the optimal maintenance strategy is obtained by minimizing the long-run cost rate of the system.
Although the above articles give reasonable maintenance strategies, they usually assume that maintenance is instantaneous or takes a fixed amount of time. However, in real production, the time for maintenance is variable, and it is affected by a variety of factors, such as the proficiency of the workers, the time that the equipment has been in use, and the environment of the equipment. Different types of maintenance may require different times, and maintenance at different life stages of the system may also require different times [25,26].
Chen [25] et al. (2021) analyzed a multi-component system. The degradation of components follows a chi-square continuous Markov process. Maintenance is restricted to being performed serially, with an arbitrary distribution of time for each maintenance. The optimal maintenance strategy is obtained by taking the stationary availability and the expected performance capacity of the system as objectives under the constraint of the average maintenance cost per unit of time. However, this way of randomizing maintenance times does not work for production systems with limited, variable buffer capacity, because in such a system the maintenance time directly affects the buffer stock. It must be ensured that the buffer neither becomes full nor becomes empty during the maintenance time. Otherwise, the system will be shut down, which can cause huge economic losses. This observation motivates the second research question.
Research Question 2: How do we represent the dynamic maintenance time? How do we set the appropriate initial buffer so that the system does not shut down during maintenance?
All of the above maintenance is performed by humans. However, in the Industry 4.0 era, maintenance is no longer limited to human labor. To reduce the cost and time of system maintenance, modern systems are mostly equipped with the ability to self-repair. For example, an advanced modular robot can be viewed as a serial multi-component system. When a component fails, the robot can remove or replace a specific part of the failed component and reorganize the normal components into a new independent robot entity. Different systems self-repair in different ways, and in recent years, scholars have been working to explore more possible ways of self-repair [27][28][29][30]. Cui [27] et al. (2018) introduced the concept of self-repair and gave a quantitative measure of the effect of self-repair. They considered that the effect of self-repair may be permanent or limited. A cumulative shock model based on the counting process is developed and described.
Zhao [28] (2020) investigated optimal task abort strategies for systems subject to controllable shocks. She expanded the definition of self-repair: when the risk of system failure is too high, the system's tasks are aborted to protect the system. The failure of a system is a competing failure mode of major and minor failures. The decision to abort a task is based on the task duration and the number of minor failures experienced. The optimal task abort policy is obtained by minimizing the expected total cost of the system. Shen [30] et al. (2023) analyzed a system operating in a shock environment with limited self-repair resources. Two self-repair strategies and their trigger conditions were designed separately, and a new reliability assessment model was established under each strategy. In the studies mentioned above, the self-repair mechanisms all address external shocks. However, as mentioned earlier, the failure process of today's devices is mostly described as a competing failure process of internal degradation and external shock. The above self-repair mechanisms are therefore no longer sufficient for devices with competing failure processes.
Moreover, these studies assume that the effect of self-repair is positive. However, in real production, there is a certain probability that self-repair has a negative effect on the machine. For example, the external degradation threshold of a device may decrease rather than increase after self-repair, which is likely to happen because of some error in the device. This observation motivates the third research question.
Research Question 3: Does self-repair have an impact on external degradation processes? How do we quantitatively represent the uncertainty of self-repair effects? Under these uncertain conditions, how do we find the optimal maintenance strategy?
In summary, our contributions to the existing theoretical and practical research are as follows:
• A new mechanism for the interaction of the internal degradation process and the external shock process is proposed, addressing the shortcoming that existing models treat the two processes as merely competing, without any interconnection.
• A new self-repair mechanism is proposed: a self-repair coefficient is introduced into the external shock threshold, which obeys a truncated normal distribution over a certain interval. This interval varies continuously with component degradation and moves to the right by a fixed length after each CM to indicate the self-protection effect of the machine at low reliability.
• Inside-out reliability calculation method: the internal degradation process reliability and the external shock process reliability of the machine are calculated separately, and the total reliability of the machine in both degradation environments is finally obtained.
• The time and effect of each maintenance action are dynamically related to the magnitude of the external shocks on the component, and the optimal initial buffer is obtained by minimizing the system cost in a given running cycle.
The rest of the paper is organized as follows. Section 2 describes the notation used in this paper. In Section 3, the composition of the serial system with a buffer, the operation process, the principle of the competitive interaction failure process, and the system self-repair mechanism are described, and a numerical case illustration is given. Section 4 gives the inside-out method for calculating system reliability at any moment. Section 5 presents the dynamic time and effect of imperfect PMs and CMs based on the magnitude of external shocks; their quantitative descriptions and a numerical case are also given. Section 6 summarizes the dynamics of the buffer and the cost of the system for all 23 possible repair cases, and gives the objective function and the necessary constraints on the total cost with respect to the initial buffer and PM threshold. In Section 7, the feasibility and superiority of the model are illustrated by comparing and analyzing the system's minimum cost in a given maintenance cycle, and a sensitivity analysis of the parameters is performed. In Section 8, the paper is summarized, and future research directions are given.

Notations and explanations
Some important mathematical notations used in this paper are listed in Tab. 1.

Notations | Explanations
$a$ | The number (index) of the machine, $a = 1, 2$
$X_{a1}(t)$ | The overall cumulative internal degradation of the $a$-th machine at time $t$
$X_{a2}(i)$ | The magnitude of the $i$-th shock to a machine before time $t$
$D_a(t)$ | The cumulative internal degradation of the $a$-th machine itself
$\Delta x_{a1}$ | The instantaneous increase in the amount of internal degradation caused by an external shock
$N(t)$ | The total number of random shocks to the system before time $t$
$T_{a1}(t)$ | The internal failure threshold of the $a$-th machine at time $t$
$T_{a2}(t)$ | The external failure threshold of the $a$-th machine at time $t$
$\Delta T_{a1}$ | The instantaneous decrease in the internal failure threshold caused by an external shock
$\Delta T_{a2}$ | The instantaneous decrease in the external failure threshold caused by an external shock
$T_{a1}$ | The original internal failure threshold of the $a$-th machine
$T_{a2}$ | The original external failure threshold of the $a$-th machine
$t_i$ | The time of occurrence of the $i$-th shock
$R_{a1}(t)$ | The reliability of the $a$-th machine for the internal degradation failure process at time $t$
$R_{a2}(t)$ | The reliability of the $a$-th machine for the external degradation failure process at time $t$
$R_a(t)$ | The total reliability of the $a$-th machine at time $t$
$\lambda_a$ | The parameter of the Poisson-distributed random shocks suffered by the $a$-th machine
$\mu_{a2}$ | The expected value of the distribution of the random shocks
$\sigma_{a2}$ | The variance of the distribution of the random shocks
$\rho_a$ | The slope of the change in the external failure threshold during the self-repair process
$\varphi_{ai}$ | The external failure threshold at the moment before the self-repair process of the $a$-th machine occurs
$\delta_{ajI}$ | The time for the $a$-th machine to perform the internal part of the $j$-th PM
$\delta_{ajO}$ | The time for the $a$-th machine to perform the external part of the $j$-th PM
$\delta_{aj}$ | The total time for the $a$-th machine to perform the $j$-th PM
$\tau_{ajI}$ | The time for the $a$-th machine to perform the internal part of the $j$-th CM
$\tau_{ajO}$ | The time for the $a$-th machine to perform the external part of the $j$-th CM
$\tau_{aj}$ | The total time for the $a$-th machine to perform the $j$-th CM
[…]$_{1}^{PM}$ | The effect of the internal part of the $j$-th PM on the $a$-th machine
[…]$_{2}^{PM}$ | The effect of the external part of the $j$-th PM on the $a$-th machine
[…]$_{1}^{CM}$ | The effect of the internal part of the $j$-th CM on the $a$-th machine
[…]$_{2}^{CM}$ | The effect of the external part of the $j$-th CM on the $a$-th machine
$T_1$ | The reliability threshold below which the machine needs to perform PM
$T_2$ | The reliability threshold below which the machine needs to perform CM
$l$ | The detection interval
[…]$_{1}^{CM}$ | The effect of the $j$-th CM on the overall cumulative internal degradation of the $a$-th machine at time $t$
$V_1$ | The processing speed of M1
$V_2$ | The processing speed of M2
$V_{max}$ | The maximum capacity of B
$C_{PM}$ | The cost of a PM
$C_{CM}$ | The cost of a CM
$C_S$ | The cost of downtime per unit of time
$X_{PM}$ | The number of times the system performs PM
$X_{CM}$ | The number of times the system performs CM
$X_{ij}$ | The number of times the $j$-th sub-case of the $i$-th case occurs
$T_S$ | The total time the system is shut down
$C_T$ | The total cost
[…] | The given system operation cycle time

System and the competitive interaction failure process
A serial system with a buffer device can be simplified to a 2M1B system (Fig. 1), which consists of three main parts: an upstream machine (M1), a downstream machine (M2), and a buffer device (B). Raw materials are fed into M1; after processing, the semi-finished products are stored in B; M2 then processes the semi-finished products from the buffer and outputs the finished products. In practice, M1 and M2 each have a fixed production speed. Note that a certain amount of initial stock needs to be placed in the buffer device before the system is started. This prevents the system from being shut down because one machine cannot work while the other is being repaired. For example, if the stock in B becomes empty during the maintenance period after the failure of M1, the system shuts down because it cannot output, and if the stock in B becomes full during the maintenance period after the failure of M2, the system also shuts down because it cannot take input.

Eksploatacja i Niezawodność - Maintenance and Reliability Vol. 26, No. 2, 2024

In terms of the machine failure process, the external shock failure threshold undergoes several positive self-repair processes before failure, and therefore its trend is a stepped line with a slope. In terms of the result, since the failure time of the external shock process is earlier than that of the internal degradation process, the machine fails due to the external shock. In the total time simulated, the maximum expected running time of the machine is the internal degradation failure time of 590, and the actual running time is the external degradation failure time of 430, which reaches 72.9% of the total runnable time (430/590). This shows that the self-repair mechanism can extend the life of the machine to more than 72% of its maximum runnable time in extreme cases, which makes it a necessary and effective measure to ensure the normal operation of the machine.

Inside-out Reliability Calculation
In the internal degradation failure process, let $X_{a1}(t)$ denote the overall cumulative internal degradation of the $a$-th machine at time $t$, and let $D_a(t)$ denote the cumulative internal degradation of the $a$-th machine itself, which is not affected by external shocks. Therefore, at any time $t$, the overall cumulative internal degradation of the machine can be calculated when the number of external shocks to the machine before time $t$ is known: [...] (1)
Let $N(t)$ denote the total number of random shocks to the system before time $t$. Therefore, the reliability of the $a$-th machine for the internal degradation failure process at time $t$ is: [...] (2)
In the case of external degradation failure, assuming that the [...] Therefore, $R_{a1}(t)$ can be further expressed as: [...], $(a = 1, 2)$ (4)
Let $X_{a2}(i)$ denote the magnitude of the $i$-th shock to a machine before time $t$; then the probability density function of its magnitude is: [...]
External shocks can also trigger the self-repair process. In [...] Therefore, the reliability of the $a$-th machine for the external shock failure process [33] at time $t$ is: [...]
In terms of the machine as a whole, a sufficient condition for the machine to work properly at a given point in time is that neither the internal degradation process nor the external shock process has failed the machine. We can think of these two processes as being in series: the machine as a whole is only safe if neither process reaches its threshold. In summary, the total reliability of the $a$-th machine at time $t$ can be calculated as: [...]
Let $T_1$ denote the reliability threshold at which the machine needs to perform PM, and let $T_2$ denote the reliability threshold at which the machine needs to perform CM. This idea of setting a common maintenance threshold for the two machines is based on conclusions drawn from the research of Wei [6], who pointed out that "In a continuous flow manufacturing system with two machines and a buffer device (2M1B system), the selection of maintenance actions should be considered from the point of view of the system as a whole rather than a single machine." We assume that $T_1 > T_2$, so that PM is triggered before CM as reliability falls; the specific values depend on the actual situation. The reliability of the system is not known at all times; it is observed at inspections separated by a detection interval, denoted as $l$.
However, when a machine fails due to internal degradation or external shock, the machine performs maintenance immediately rather than waiting until the next inspection time arrives. The flow chart for system detection and maintenance is shown in Fig. 4.
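The series view of the two failure processes can be illustrated with a small Monte Carlo sketch. This is not the closed-form inside-out calculation of this section: it assumes a stationary gamma degradation process, Poisson shock arrivals, normally distributed shock magnitudes, a fixed degradation jump per shock, and constant thresholds, and it omits the self-repair mechanism and the threshold shifts. All function and parameter names are hypothetical.

```python
import random


def simulate_survival(t_end, shape_rate, scale, shock_rate, mu, sigma,
                      internal_threshold, external_threshold, delta_x, rng):
    """Return True if one simulated machine survives until t_end.

    Assumptions (not taken from the paper): the machine's own degradation is a
    stationary gamma process, shocks arrive as a Poisson process, every shock
    adds a fixed jump delta_x to the internal degradation, and a shock whose
    magnitude exceeds external_threshold causes an immediate hard failure.
    """
    n_shocks = 0
    t = rng.expovariate(shock_rate)          # arrival time of the first shock
    while t <= t_end:
        if rng.gauss(mu, sigma) > external_threshold:
            return False                     # failure of the external process
        n_shocks += 1
        t += rng.expovariate(shock_rate)
    # Degradation is non-decreasing, so checking its value at t_end is enough.
    own = rng.gammavariate(shape_rate * t_end, scale)
    return own + n_shocks * delta_x < internal_threshold


def estimate_reliability(t_end, n_runs=5000, seed=0, **params):
    """Fraction of simulated machines for which neither process has failed."""
    rng = random.Random(seed)
    survived = sum(simulate_survival(t_end, rng=rng, **params)
                   for _ in range(n_runs))
    return survived / n_runs


params = dict(shape_rate=2.0, scale=1.0, shock_rate=0.1, mu=1.0, sigma=0.5,
              internal_threshold=100.0, external_threshold=3.0, delta_x=1.0)
print(estimate_reliability(5.0, **params), estimate_reliability(60.0, **params))
```

The machine counts as surviving only if both checks pass, which mirrors the statement that the machine is safe only if neither process reaches its threshold.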

Fig. 4. Flowchart of system detection and maintenance
When the machine's reliability $R_a(t)$ is detected to be lower than $T_1$ and higher than $T_2$, a PM will be performed. To represent the effect of the PM, we define the effects of the $j$-th PM on the $a$-th machine's thresholds $T_{a1}$ and $T_{a2}$. The effect on $T_{a1}$ is proportional to the magnitude of that threshold at the point of maintenance initiation. This corresponds to reality: when the reliability of the machine is low, maintenance is less effective. The effect on $T_{a2}$ depends on the magnitude of the external shocks before the current time: if the machine has been subjected to a larger shock, the PM effect is worse; if it has been subjected to a smaller shock, the effect is better. Then, the maintenance effect of the $j$-th PM can be expressed as: [...] $(a = 1, 2,\ t \in [t_i, t_{i+1}])$ (10)
$\delta_{aj} = \delta_{ajI} + \delta_{ajO}$ (11)
When the reliability of the machine is detected to be lower than $T_2$, a CM will be performed. Since CM affects $X_{a1}$, $T_{a1}$, and $T_{a2}$, [...]
$\tau_{aj} = \tau_{ajI} + \tau_{ajO}$ (16)

Fig. 5. Maintenance process of the $a$-th machine.
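The inspection rule described above can be written as a short sketch. The function names are hypothetical, and the treatment of the boundary values (reliability exactly equal to a threshold) is an assumption, since the text only specifies the strict cases.

```python
def maintenance_action(reliability, pm_threshold, cm_threshold):
    """Action for one machine at an inspection, assuming T_2 < T_1.

    pm_threshold plays the role of T_1 and cm_threshold the role of T_2.
    Boundary handling is an assumption: a reliability exactly equal to a
    threshold is treated as being above it.
    """
    if not 0.0 <= cm_threshold < pm_threshold <= 1.0:
        raise ValueError("expected 0 <= T_2 < T_1 <= 1")
    if reliability < cm_threshold:
        return "CM"
    if reliability < pm_threshold:
        return "PM"
    return "none"


def system_case(r1, r2, pm_threshold, cm_threshold):
    """Pair of actions (M1, M2); 3 x 3 = 9 combinations are possible."""
    return (maintenance_action(r1, pm_threshold, cm_threshold),
            maintenance_action(r2, pm_threshold, cm_threshold))


print(system_case(0.9, 0.6, pm_threshold=0.73, cm_threshold=0.4))
```

Because both machines share the same pair of thresholds, the nine action pairs returned by `system_case` correspond to the maintenance cases discussed in the cost analysis.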

Discussion of cost and buffer capacity
Maintenance takes time, and a machine cannot continue to work during that time. The buffer device then keeps the production line flowing. For example, if M1 is under maintenance, M2 can process the semi-finished products temporarily stored in the buffer device; once the maintenance of M1 is completed, the system works normally again. If M2 is under maintenance and the buffer is not full, M1 can temporarily store its semi-finished products in the buffer, and the system works normally again once the maintenance of M2 is completed.
Define $V_1$ and $V_2$ as the processing speeds of M1 and M2, and suppose that $V_1 \ge V_2$. $B_0$ is the initial buffer stock in B, $V_{max}$ is the maximum capacity of B, and $B(t)$ denotes the buffer stock in B at time $t$. The costs of a PM and a CM are denoted as $C_{PM}$ and $C_{CM}$, and if the system is shut down, the cost of downtime per unit of time is $C_S$. Suppose that $C_{PM} < C_{CM} < C_S$. In this section, all the combinations of maintenance actions of the two machines and their costs are discussed. Finally, a formula for the system's total cost over a given running cycle and some constraints are given.

No need for maintenance 1 𝑹 𝟏 > 𝑻 𝟏 , 𝑹 𝟐 > 𝑻 𝟏
This is the optimal situation and is usually the result of the first detection after the start of operation. Since the reliability of both machines is greater than the PM threshold $T_1$, neither machine needs maintenance. Because the machines have different production rates, $V_1 \ge V_2$, the buffer stock in B increases per unit of time by
$\Delta B = V_1 - V_2$ (17)
Since no maintenance is required, the cost of this stage is $C_1 = 0$.

One machine needs maintenance 1 𝑹 𝟏 > 𝑻 𝟏 , 𝑻 𝟏 > 𝑹 𝟐 > 𝑻 𝟐
In this case, PM is required for M2 since only M2 has reliability less than the PM threshold $T_1$. There are three sub-cases, depending on the state of the buffer device.
If M1 keeps outputting semi-finished products into the buffer device and the buffer device does not become full during the maintenance time, that is, $B(t) + V_1\delta_2 < V_{max}$, where $B(t)$ is the buffer stock when maintenance starts and $\delta_2$ is the PM time of M2, the system does not shut down, and the change in buffer stock per unit of time is
$\Delta B = V_1$ (18)
The total cost is the cost of the PM performed on M2, that is, $C_{21} = C_{PM}$.
If M1 keeps outputting semi-finished products into the buffer device but the stock reaches $V_{max}$ at some time $t'$ during the maintenance of M2, that is, $B(t) + V_1 t' = V_{max}$ $(t' < \delta_2)$, the system is shut down because the buffer device is full. The buffer increment per unit time before the shutdown is the same as in Eq. 18. The total cost includes the PM cost of M2 and the cost of downtime:
$C_{22} = C_{PM} + C_S(\delta_2 - t')$ (19)
If the buffer device is full from the beginning, M1 cannot work during the entire maintenance period, so the system is shut down throughout. The buffer increment per unit time is 0. The total cost in this case is:
$C_{23} = C_{PM} + C_S\delta_2$ (20)
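The three sub-cases above differ only in when the buffer becomes full, so they can be covered by one short routine. The sketch below is an illustration under the assumptions stated in its comments; the function name and argument names are hypothetical.

```python
def downstream_maintenance(stock, v1, v_max, duration, action_cost, downtime_cost):
    """Cost, final buffer stock and downtime while M2 is under maintenance.

    Assumptions: M1 keeps filling the buffer at speed v1 until the stock
    reaches v_max; from that moment until the end of the maintenance the
    whole system is down and downtime is charged at downtime_cost per unit
    of time. action_cost is the PM (or CM) cost of M2.
    """
    time_to_full = max((v_max - stock) / v1, 0.0)   # 0 if already full
    running = min(duration, time_to_full)           # time M1 can still work
    downtime = duration - running
    final_stock = min(v_max, stock + v1 * running)
    return action_cost + downtime_cost * downtime, final_stock, downtime


# Sub-case 1: buffer never fills; sub-case 2: fills after 4 time units;
# sub-case 3: buffer full from the start.
print(downstream_maintenance(40, 5, 100, 10, 20, 8))
print(downstream_maintenance(80, 5, 100, 10, 20, 8))
print(downstream_maintenance(100, 5, 100, 10, 20, 8))
```

The same routine applies when M2 performs CM instead of PM: only the maintenance duration and the action cost change.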

𝑹 𝟏 > 𝑻 𝟏 , 𝑻 𝟐 > 𝑹 𝟐
In this case, M2 needs to perform CM. Similar to the previous case, three sub-cases are discussed separately.
If M1 keeps putting semi-finished products into the buffer device and the device does not become full while M2 is performing CM, that is, $B(t) + V_1\tau_2 < V_{max}$, where $B(t)$ is the buffer stock when maintenance starts and $\tau_2$ is the CM time of M2, the system does not shut down, and the change in buffer stock per unit of time is the same as in Eq. 18. The total cost is the cost of the CM performed on M2, that is, $C_{31} = C_{CM}$.
If M1 keeps outputting semi-finished products into the buffer device but the stock reaches $V_{max}$ at some time $t'$ during the maintenance of M2, that is, $B(t) + V_1 t' = V_{max}$ $(t' < \tau_2)$, the system is shut down because the buffer device is full. The buffer increment per unit time before the shutdown is also the same as in Eq. 18. The total cost includes the CM cost of M2 and the cost of downtime:
$C_{32} = C_{CM} + C_S(\tau_2 - t')$ (21)
If the buffer device is full from the beginning, M1 cannot work during the entire maintenance period, so the system is shut down throughout. The total cost in this case is:
$C_{33} = C_{CM} + C_S\tau_2$ (22)

𝑻 𝟏 > 𝑹 𝟏 > 𝑻 𝟐 , 𝑹 𝟐 > 𝑻 𝟏
This case is very similar to case 1, except that now M1 performs PM and M2 does not need to be maintained. Two sub-cases need to be considered.
If M2 processes the semi-finished products in the buffer device into finished products and there is always a surplus in the buffer device while M1 is performing maintenance, that is, $B(t) - V_2\delta_1 \ge 0$, where $B(t)$ is the buffer stock when maintenance starts and $\delta_1$ is the PM time of M1, the system is not shut down, and the change in buffer stock per unit of time can be expressed as:
$\Delta B = -V_2$ (23)
The total cost is the cost of the PM performed on M1, that is, $C_{41} = C_{PM}$.
If M2 uses up the semi-finished products in the buffer device at some time $t'$ while M1 is performing maintenance, that is, $B(t) - V_2 t' = 0$ $(t' < \delta_1)$, the system is then shut down. The total cost is the cost of the PM performed on M1 plus the cost of the system's downtime. It can be expressed as:
$C_{42} = C_{PM} + C_S(\delta_1 - t')$ (24)

𝑻 𝟐 > 𝑹 𝟏 , 𝑹 𝟐 > 𝑻 𝟏
In this case, M1 needs to perform CM, and similar to the previous case, two sub-cases are discussed separately.
If M2 processes the semi-finished products in the buffer device into finished products and there is always a surplus in the buffer device while M1 is performing maintenance, that is, $B(t) - V_2\tau_1 \ge 0$, where $B(t)$ is the buffer stock when maintenance starts and $\tau_1$ is the CM time of M1, the system is not shut down, and the change in buffer stock per unit of time can be expressed as in Eq. 23.
The total cost is the cost of the CM performed on M1, that is, $C_{51} = C_{CM}$.
If M2 uses up the semi-finished products in the buffer device at some time $t'$ while M1 is performing maintenance, that is, $B(t) - V_2 t' = 0$ $(t' < \tau_1)$, the system is then shut down. The total cost is the cost of the CM performed on M1 plus the cost of downtime. It can be expressed as:
$C_{52} = C_{CM} + C_S(\tau_1 - t')$ (25)

Both machines need maintenance 1 𝑻 𝟏 > 𝑹 𝟏 > 𝑻 𝟐 , 𝑻 𝟏 > 𝑹 𝟐 > 𝑻 𝟐
In this case, both machines need maintenance, and both perform PM, so the system must be shut down. There are a total of three sub-cases that need to be discussed.
If both machines have the same maintenance time, that is, $\delta_1 = \delta_2$, the total cost is the cost of performing PM on both machines plus the cost of downtime, that is:
$C_{61} = 2C_{PM} + C_S\delta_1$ (26)
If the maintenance time of M1 is less than that of M2, that is, $\delta_1 < \delta_2$, M1 finishes its maintenance and starts working first. See case 1 in Section 6.2 for a discussion of the period after M1 has completed its maintenance. The total cost can be expressed as: [...]

𝑻 𝟏 > 𝑹 𝟏 > 𝑻 𝟐 , 𝑻 𝟐 > 𝑹 𝟐
In this case, M1 needs to perform PM, and M2 needs to perform CM, so the system must be shut down. There are a total of three sub-cases that need to be discussed.
If both machines have the same maintenance time, that is, $\delta_1 = \tau_2$, [...]

𝑻 𝟐 > 𝑹 𝟏 , 𝑻 𝟏 > 𝑹 𝟐 > 𝑻 𝟐
In this case, M1 needs to perform CM, and M2 needs to perform PM, so the system must be shut down. There are a total of three sub-cases that need to be discussed.
If both machines have the same maintenance time, that is, $\tau_1 = \delta_2$, [...] The total cost can be expressed as: [...] (37)
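The cases in which M1 alone, or both machines, are under maintenance can be sketched in the same spirit as the single-machine cases of Section 6.2. The code below is an illustration only and does not reproduce the equations of this section: it assumes that both repairs start at the same time, that the system is down while both machines are under maintenance, and that the remaining time of the longer repair is handled as the corresponding single-machine case. All names are hypothetical.

```python
def one_machine_down(stock, v1, v2, v_max, duration, upstream_down):
    """Return (downtime, final_stock) while exactly one machine is repaired.

    If M1 (upstream) is down, M2 drains the buffer at speed v2 until it is
    empty; if M2 (downstream) is down, M1 fills the buffer at speed v1 until
    it is full. Once the buffer limit is hit, the system is down.
    """
    if upstream_down:
        running = min(duration, stock / v2)
        return duration - running, stock - v2 * running
    running = min(duration, (v_max - stock) / v1)
    return duration - running, stock + v1 * running


def both_machines_down(stock, v1, v2, v_max, d1, d2, cost1, cost2, downtime_cost):
    """Total cost and final stock when M1 and M2 start maintenance together.

    d1, d2 are the maintenance times of M1 and M2; cost1, cost2 are their
    PM or CM costs (assumed inputs).
    """
    shared = min(d1, d2)        # both machines down: certain shutdown
    rest = abs(d1 - d2)         # only the longer repair is still running
    extra, final_stock = one_machine_down(
        stock, v1, v2, v_max, rest, upstream_down=d1 > d2)
    return cost1 + cost2 + downtime_cost * (shared + extra), final_stock


print(both_machines_down(50, 5, 4, 100, 3, 3, 20, 20, 8))   # equal times
print(both_machines_down(50, 5, 4, 100, 2, 6, 20, 50, 8))   # M1 finishes first
print(both_machines_down(8, 5, 4, 100, 6, 2, 50, 20, 8))    # M2 finishes first
```

With equal maintenance times the second stage has zero length, so the cost reduces to the two maintenance costs plus the downtime cost over the shared repair time.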

Parametric equations for total cost and constraints
Since the costs incurred while the system is running are ignored (including the cost of inspecting the two machines at intervals of $l$ and the cost of storing semi-finished products in the buffer device), the total cost of the system is the sum of the [...]
For a given system operation cycle time, the total cost for that period can be calculated.
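One aggregate form that is consistent with the notation of Tab. 1 is $C_T = C_{PM} X_{PM} + C_{CM} X_{CM} + C_S T_S$. This form is an assumption made here for illustration, not the paper's equation, which may instead sum the costs of the individual sub-cases weighted by their counts $X_{ij}$. A minimal sketch with hypothetical names:

```python
def total_cost(n_pm, n_cm, shutdown_time, c_pm, c_cm, c_s):
    """Assumed aggregate form: C_T = C_PM * X_PM + C_CM * X_CM + C_S * T_S.

    n_pm and n_cm are the numbers of PM and CM actions in the operation
    cycle, and shutdown_time is the total time the system is shut down.
    """
    return c_pm * n_pm + c_cm * n_cm + c_s * shutdown_time


# Example: 3 PMs, 1 CM and 2.5 time units of downtime in one cycle.
print(total_cost(3, 1, 2.5, c_pm=20, c_cm=50, c_s=80))
```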

Determination of the optimal solution
In this section, a specific numerical example will be used to illustrate the feasibility and superiority of the optimization model proposed in this paper.
The individual parameters of the model are set as shown in Tab. 2. These parameters are based on the PAB system (a system consisting of production units (P), assembly units (A), and intermediate buffers (B), represented in Fig. 6) in the most productive cycle model based on equipment degradation and on-the-fly demand proposed by Zhou et al. According to Zhou's study [31], the internal degradation of the system obeys the gamma process [32] with parameters $\alpha = 2$ and $\beta = 1$, and the production rate of M1 is $V_1 = 5$. At the same time, Zhou proposes that "The production rate of the downstream machine determines how fast the production system changes from controlled to out-of-control, and the smaller the production rate of the downstream machine, the faster the rate of the production system changes to out-of-control. The smaller the speed, the faster the rate of the production system transforms to out-of-control." To test the maintenance effect of this paper's model in long-time production, we assume that $V_2 = 4$.
where $\Gamma(\cdot)$ is the gamma function, which is defined as $\Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x}\,dx$.
The final optimal solution is shown in Fig. 7. If the self-repair of the machine is not considered, the minimum cost of the system is shown in Fig. 8.
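A gamma degradation process of the kind used in this example can be simulated by summing independent gamma-distributed increments. The sketch below is an illustration only: it assumes the two parameters quoted from Zhou's study act as a shape rate of 2 per unit time and a scale of 1, and the function names, step size, and threshold are hypothetical.

```python
import random


def simulate_gamma_process(alpha, beta, horizon, dt, seed=None):
    """Return a sampled degradation path on a grid of step dt.

    Assumption: the increment over an interval of length dt follows a
    gamma distribution with shape alpha * dt and scale beta.
    """
    rng = random.Random(seed)
    n_steps = int(round(horizon / dt))
    path = [0.0]
    for _ in range(n_steps):
        increment = rng.gammavariate(alpha * dt, beta)
        path.append(path[-1] + increment)
    return path


def first_passage_time(path, dt, threshold):
    """First grid time at which the path reaches the threshold, else None."""
    for k, level in enumerate(path):
        if level >= threshold:
            return k * dt
    return None


path = simulate_gamma_process(alpha=2.0, beta=1.0, horizon=10.0, dt=0.5, seed=1)
# Hypothetical internal failure threshold, chosen only for illustration.
t_fail = first_passage_time(path, dt=0.5, threshold=15.0)
```

Because gamma increments are non-negative, the sampled path never decreases, which matches the monotone wear that the internal degradation process is meant to describe.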
From Fig. 7, the minimum system cost in the current case occurs at a PM threshold of 0.73 and an initial buffer capacity of 51. In addition, near a PM threshold of 0.73 the system cost dips briefly regardless of the initial buffer capacity. This is a consequence of the gamma process: the reliability of the system declines slowly at first and then quickly, and a reliability of about 0.5 is the turning point at which the rate of decline shifts sharply. Most of the time the machine's reliability is above 0.5, so no repair is needed; once the reliability falls below 0.5, repair is already too late, so the system ages quickly and shuts down or fails. The number of repairs is therefore lower, which produces the short-lived reduction in cost.
By comparing Fig. 7 and Fig. 8, it can be seen that the self-repair function of the system can significantly reduce its operating cost. In addition, the reliability and buffer inventory of the system at each inspection during the simulation were collected, and the average reliability and average buffer inventory were calculated. The values of these two quantities with and without self-repair were compared, and the results are presented in Fig. 9. As can be seen in Fig. 9, the average reliability of the system increases when self-repair is considered. At the same time, the average buffer stock is reduced, which means that the buffer is less likely to become full or empty, and the probability of a machine shutdown caused by a full or empty buffer is greatly reduced. This also shows that the self-repair mechanism proposed in this paper not only reduces cost but also improves system reliability and stability.
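One simple way to locate a cost minimum over the PM threshold and the initial buffer capacity is an exhaustive grid search. The sketch below is not the paper's procedure: `grid_search` and `placeholder_cost` are hypothetical names, and the placeholder cost is a quadratic bowl centred on the reported optimum purely so that the search has something to find. In practice the cost function would be the simulated total cost over the operation cycle.

```python
from itertools import product


def grid_search(cost_fn, thresholds, capacities):
    """Evaluate cost_fn on every (threshold, capacity) pair; return the best.

    Returns a tuple (threshold, capacity, cost) with the lowest cost.
    """
    best = None
    for r1, b0 in product(thresholds, capacities):
        cost = cost_fn(r1, b0)
        if best is None or cost < best[2]:
            best = (r1, b0, cost)
    return best


def placeholder_cost(r1, b0):
    # NOT the paper's model: a stand-in surface with a single minimum.
    return 1000.0 * (r1 - 0.73) ** 2 + (b0 - 51) ** 2


thresholds = [i / 100 for i in range(50, 100)]  # PM thresholds 0.50 .. 0.99
capacities = range(30, 71)                      # initial buffer capacities
best_r1, best_b0, best_cost = grid_search(placeholder_cost, thresholds, capacities)
```

When the cost comes from a stochastic simulation, each grid point would normally be averaged over several replications before points are compared.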

Sensitivity analysis
In this section, we analyze the effect of varying different parameters on the results, in preparation for taking our model to realistic applications.
This paper involves three parameters related to maintenance cost,   ,   , and   , and a fixed threshold  2 at which the system performs CM. Each of these is varied in turn by different proportions, and the resulting change in the total cost of the system is shown in Fig. 10.
As shown in Fig. 10, the cost of shutdown has the greatest impact on the system cost. This is mainly because the cost of shutdown is much larger than the other two cost parameters and acts as a penalty function in the model: when the machine is shut down, the system applies a penalty mechanism that adds a large cost. Meanwhile, the threshold  2 at which the system performs CM increases the system cost whether it varies in the positive or the negative direction. If it varies in the positive direction, the system performs CM more often, which increases the cost. If it varies in the negative direction, the system cannot perform maintenance in time because reliability decreases quickly, which leads to a shutdown cost. In the experimental results, the cost increase for a positive change in  2 is greater than that for a negative change.
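The one-at-a-time procedure used in this kind of sensitivity analysis can be sketched as follows. This is an illustration under assumptions: the parameter names, baseline values, and the `toy_cost` function (which fixes the numbers of PM and CM actions and the shutdown time) are hypothetical and do not reproduce the paper's model, in which a change in the CM threshold would also change those counts.

```python
def one_at_a_time_sensitivity(cost_fn, baseline, rel_changes):
    """Vary each parameter alone by the given relative changes.

    Returns {parameter name: [cost change for each relative change]},
    with every change measured against the baseline cost.
    """
    base_cost = cost_fn(**baseline)
    result = {}
    for name, value in baseline.items():
        changes = []
        for delta in rel_changes:
            perturbed = dict(baseline)
            perturbed[name] = value * (1 + delta)
            changes.append(cost_fn(**perturbed) - base_cost)
        result[name] = changes
    return result


def toy_cost(c_pm, c_cm, c_down):
    # Hypothetical fixed policy outcome: 3 PMs, 1 CM, 2 time units of shutdown.
    return 3 * c_pm + 1 * c_cm + 2 * c_down


baseline = {"c_pm": 10.0, "c_cm": 40.0, "c_down": 100.0}
sensitivity = one_at_a_time_sensitivity(toy_cost, baseline, [-0.5, 0.5])
```

With these toy numbers the shutdown cost produces the largest change in total cost, which mirrors the qualitative pattern described above without reproducing the reported values.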

Conclusions and future research
For a serial two-component system with a buffer device and self-repairing characteristics, a joint optimization model of buffer capacity and maintenance thresholds is presented. Finally, a sensitivity analysis of the parameters is performed and reasonable conclusions are given to support the results of the analysis. The results show that self-repair of the system can reduce costs by nearly half in a given period of time. It also improves the average reliability of the system and reduces the average buffer stock.
In future research, scholars could consider the effect that one machine in a two-component system has on the reliability of the other in the event of a failure. For example, when the upstream (or downstream) machine fails or undergoes maintenance, the shock to the downstream (or upstream) machine surges and its reliability decreases even more sharply. Joint optimization models could also be constructed for systems consisting of more components, e.g., a system of four upstream and downstream machines and three buffer devices.
et al. (2018) proposed a two-stage shock model with a self-repair mechanism. The original cumulative and delta shocks are further categorized into effective and ineffective shocks. The self-repair conditions for effective shocks at different stages are developed. The probability density function, distribution function, and mean of the shock length are solved based on the finite Markov chain embedding method. The optimal preventive maintenance strategy is obtained by minimizing the long-term average running cost. Qiu [29] et al.

Fig. 1. The 2M1B system and its workflow.
To simplify the expression, the two machines in series are numbered, with M1 being number 1 and M2 being number 2, which are denoted by ( = 1,2). The overall failure process of the machines can be divided into two parts: the internal degradation failure process and the external shock failure process. The internal degradation process is one in which the amount of internal degradation of a machine increases as its age grows; it is also known as the soft failure process (for example, wear and tear during machining, or corrosion of the machine body by coolant). When the amount of degradation is greater than the internal failure threshold, the system fails (a soft failure) due to excessive internal degradation. It is an internal cause of machine failure. The external shock process is one in which the external failure threshold of a machine decreases due to random external shocks; it is also known as the hard failure process (for example, unstable input current during machine processing, or incorrect operation by personnel). When the size of an

Fig. 3 represents a numerical example of the competition between the internal degradation failure process and the external shock failure process of a machine. In this figure, the upper and lower bounds of the variation of the thresholds for internal degradation and external shock failure are plotted, which makes the trend of the thresholds easy to observe. In this numerical case, the frequency of external shocks is set very high to simulate equipment in extreme operating environments.

() =  1 () 2 (){() = }, ( = 1,2) =  1 () 2 ()

Quantification of the dynamic time and effects of different maintenance methods
In this paper, two maintenance methods are considered: imperfect preventive maintenance (PM) and corrective maintenance (CM). Imperfect PM can increase the machine's internal failure threshold  1 and external failure threshold  2 , while CM can affect almost all aspects of the machine, including  1 and  2 , the cumulative internal degradation  1 , and the self-repair coefficient   . Both maintenance methods have their own required time and effect. The time for the  ℎ machine to perform the  ℎ PM is divided into two parts, the internal maintenance time   and the external maintenance time   , and the total maintenance time   is the sum of the two parts. Similarly for CM, the total time is denoted as   , where   is the internal maintenance time and   is the external maintenance time.

Fig. 5 illustrates the process when the  ℎ machine performs maintenance. The first graph shows the total time for the machine to perform PM and CM, and the effect of maintenance on the reliability   (). As stated above, the time the machine spends on maintenance is divided into two parts, internal maintenance time and external maintenance time, and the machine performs internal maintenance before external maintenance. The second graph shows the time required for internal maintenance of the machine and the effect of maintenance on the amount of internal degradation  1 (). The third graph shows the time required for external maintenance of the machine and the effect of maintenance on the external shock failure threshold  2 (). As the time of use increases, the reliability of the machine falls below the PM threshold  1 at moments 2 and 7 (the second and seventh inspections), which triggers the imperfect PM, and the machine

Fig. 9. Comparison of the average reliability and average buffer inventory of the system with and without self-repair.
self-repairing characteristics, a new approach to self-repair and a competitive failure process of internal degradation and external shocks are proposed, and a joint optimization model of buffer capacity and maintenance thresholds is presented, taking into account the dynamic maintenance time and the dynamic maintenance effect. Data such as the reliability and buffer stock of the system are obtained from equally spaced system inspections, and maintenance decisions are made at the same inspection intervals. The reliability range is divided by two thresholds, and the methods for calculating the dynamic maintenance effect and time for the different maintenance types are defined. All 23 repair cases and the buffer stock changes in each repair state are also listed. The maintenance strategy with the least cost over a specific operation cycle can be obtained from the objective function. Next, a numerical case is used to illustrate the feasibility of the present model and to compare it with the optimal cost in a given operation cycle without self-repair, which in turn verifies the superiority of the present model.

 t and   . Let  1  denote the effect of the  ℎ CM on  1 ; the effects on  1 ,  2 and  1 are the same as those of PM. The effect of CM on   is reflected in the interval of the truncated normal distribution that it follows, which changes from the previous [, ] to [ + ,  + ], indicating that the self-repair ability of the machine is improved by CM. Then, the
PM, CM and downtime. Let   and   denote the number of times the system performs PM and CM,   represent the number of  ℎ sub-cases in which the  ℎ case occurs, and   denote the total time the system is shut down. The total cost is denoted as   . Expressed as an equation, it is   =     +     +