A genetic algorithm-based approach for flexible job shop rescheduling problem with machine failure interference

Highlights
◼ Based on the genetic algorithm, a complete and detailed method for solving FJSP is proposed.
◼ A rescheduling strategy for FJSP in a dynamic environment is established.
◼ Complete rescheduling yields better results than right-shift rescheduling.
◼ The proposed method responds effectively to flexible job-shop rescheduling with machine failure interference.

Abstract
Rescheduling guarantees the reliable operation of the production system. In a production system, the original scheduling scheme cannot be carried out when a machine breaks down. The production tasks that fall within the failure period must be transferred and the production path replanned, so that the tasks are completed on time and the production system remains stable. To address this issue, we studied the event-driven rescheduling policy in a dynamic environment and established usage rules for right-shift rescheduling and complete rescheduling based on the type of interference event. We then proposed a rescheduling decision method based on the genetic algorithm for solving the flexible job shop scheduling problem with machine fault interference. In addition, we extended the "mk" series of instances by introducing machine fault interference information. The solution data show that the complete rescheduling method responds effectively to the rescheduling of the flexible job shop scheduling problem with machine failure interference.


Introduction
As the core of the decision-making stage in production process management, shop scheduling plays a key role in realizing intelligent manufacturing and digital production management. Obtaining a good scheduling solution through scientific scheduling decision theory helps to tap the capacity of existing production resources and to improve the utilization rate of job shop equipment, which has important theoretical significance and engineering value for shortening the production cycle and reducing the production cost of enterprises.
Highly integrated functionality is a major feature of production equipment in the manufacturing industry, such as CNC machine tools and machining centers.
One piece of equipment can handle a variety of processing [...] [8] proposed an improved genetic algorithm to solve FJSP, in which a new local search-based operator improves the quality of the available solutions by optimizing the most promising individuals in each generation. Ziaee [35] developed an efficient heuristic based on a constructive procedure that obtains high-quality schedules very quickly. It can also be used to improve the initial feasible solution of a metaheuristic algorithm, since the choice of a good initial solution strongly affects the performance of such an algorithm. Xing et al. [31] proposed a co-evolutionary algorithm that combines the ant colony algorithm and the genetic algorithm.
The two algorithms evolved their respective populations independently to improve the performance of solving FJSP. Sun et al. [28] considered FJSP with uncertain processing times represented by fuzzy numbers, and combined particle swarm optimization with the genetic algorithm to improve convergence. Zeng and Wang [33] embedded the particle swarm optimization algorithm as an operator in an artificial immune algorithm to maintain population diversity and avoid being trapped in local optima when solving FJSP.
Denkena et al. [9] applied quantum-computing-based optimization to FJSP, and the new approach showed good performance and practicability in a realistic use case. Li and Lei [21] developed an imperialist competitive algorithm with feedback to solve the multiobjective optimization problem of FJSP. Li and Gao [20] proposed a multi-strategy slime mould algorithm named GCSMA for global optimization; simulation experiments verified that GCSMA can be applied effectively to FJSP with satisfactory optimization results. Huo and Wang [17] proposed a hybrid dynamic scheduling method with a digital twin and an improved bacterial foraging algorithm. Sharifi and Taghipour [27] used the genetic algorithm, the simulated annealing algorithm and the teaching-learning-based optimization algorithm to solve the scheduling problem, and demonstrated the superiority of the genetic algorithm by comparison with an enumeration method. Similarly, solution methods based on the genetic algorithm have been adopted in other studies and have shown their superiority in solving the job shop scheduling problem and other combinatorial optimization problems [10,14,16,19,23,24,26,32].
In static scheduling, all manufacturing resources are persistent, that is, it is assumed that the production environment is an ideal interference-free environment, and the machines can run continuously according to the original scheduling plan.
However, in real production, the manufacturing system encounters unexpected disturbances, such as machine breakdowns and emergency orders. In this case, the original schedule loses its optimality or even becomes inexecutable [7,11,25]. The purpose of rescheduling is to formulate a corresponding rescheduling scheme, through the re-selection of machines, to counter the deterioration of the initial scheduling scheme caused by interference factors.
Ghaleb et al. [12] considered processing times and energy consumption affected by machine deterioration and failures, built maintenance and scheduling decisions on the machine's degradation level, and proposed an effective genetic algorithm to solve the problem. Wang et al. [30] studied the scheduling problem for flexible manufacturing systems under uncertain machine failure disruptions and proposed a robust scheduling optimization model, based on the concept of a threshold scenario, that meets a set of production due-date requirements as well as possible. Tubilla and Gershwin [29] studied a variety of scheduling policies for a failure-prone machine and shed light on the most suitable operating conditions for their implementation.
Azimpoor [5] proposed a branch and bound algorithm, studied an integrated optimization problem of condition-based preventive maintenance and production rescheduling with multi-phase processing speed selection and old machine scrap [4], and researched the joint optimization of preventive maintenance and flexible job-shop rescheduling with processing speed selection, in which the dynamic arrival of a new machine is considered to enhance productivity [3].
With the motivations noted above, we considered the machine flexibility in real working environment, in this paper,

Problem description and modeling
Flexibility here refers to machine flexibility: every workpiece to be processed consists of multiple operations, the processing system contains multiple machines, and each operation has several candidate machines, of which exactly one must be selected. Because the same operation can be processed on different machines, its processing time generally depends on the machine chosen. A scheduling problem with this property is the flexible job-shop scheduling problem (FJSP). A schematic diagram of the flexible job-shop scheduling problem is shown in Fig. 1.
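To make the notion of machine flexibility concrete, the following minimal sketch represents a small instance in which each operation has its own set of candidate machines with machine-dependent processing times, and evaluates the makespan of one machine assignment. The instance data, the names `proc_time`, `assignment`, `order` and the function `makespan` are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
# Hypothetical 2-job, 3-machine instance (illustrative data only).
# proc_time[job][h] maps each candidate machine of operation h to its processing time.
proc_time = {
    "J1": [{"M1": 3, "M2": 5}, {"M2": 4, "M3": 2}],
    "J2": [{"M1": 4, "M3": 6}, {"M1": 2, "M2": 3, "M3": 4}],
}


def makespan(proc_time, assignment, order):
    """Schedule operations in the given job order and return the makespan.

    assignment[(job, h)] is the single machine chosen for operation h of job.
    order lists job names, each repeated once per operation of that job.
    """
    next_op = {job: 0 for job in proc_time}    # index of the next operation per job
    job_ready = {job: 0 for job in proc_time}  # completion time of the job's last operation
    machine_ready = {}                         # time at which each machine becomes free
    for job in order:
        h = next_op[job]
        machine = assignment[(job, h)]
        p = proc_time[job][h][machine]  # raises KeyError if the machine is not a candidate
        start = max(job_ready[job], machine_ready.get(machine, 0))
        job_ready[job] = machine_ready[machine] = start + p
        next_op[job] = h + 1
    return max(job_ready.values())


assignment = {("J1", 0): "M1", ("J1", 1): "M3", ("J2", 0): "M3", ("J2", 1): "M2"}
order = ["J1", "J2", "J1", "J2"]
print(makespan(proc_time, assignment, order))  # 9
```

Choosing a different candidate machine for an operation changes both its processing time and the machine it competes for, which is why machine selection and operation sequencing have to be optimized together.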
For the flexible job-shop scheduling problem, the parameters are defined: [...] then the scheduling model has the following constraints: [...]
where Sjh is the start time of operation Ojh, pijh is the processing time of operation Ojh on machine Mi, and cjh is the completion time of operation Ojh;
j = 1, 2, ..., n, and hj is the number of operations that job Jj contains;
L is a sufficiently large positive number.
In the above equations, Eqs. (3) and (4) [...] Eqs. (6) and (7) ensure that a machine processes only one operation at a time. Eq. (8) ensures that an operation is processed by only one machine at a time.
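The constraint equations themselves are not reproduced in this text. For orientation only, a commonly used formulation that is consistent with the symbols defined above can be sketched as follows. The binary variables $x_{ijh}$ (equal to 1 if operation $O_{jh}$ is assigned to machine $M_i$) and $y_{ijhkl}$ (equal to 1 if $O_{jh}$ precedes $O_{kl}$ on machine $M_i$) are assumptions introduced for this sketch, and the lines below are deliberately unnumbered because they are not claimed to match the paper's Eqs. (1) to (8) one by one.

```latex
\[
\begin{aligned}
& \min \; C_{\max} \\
& S_{jh} + \sum_{i} x_{ijh}\, p_{ijh} \le c_{jh}
    && \text{an operation ends no earlier than start plus processing time} \\
& c_{jh} \le S_{j(h+1)}, \quad h = 1, \dots, h_j - 1
    && \text{operations of one job follow their precedence order} \\
& c_{j h_j} \le C_{\max}, \quad j = 1, \dots, n
    && \text{the makespan bounds every job's last operation} \\
& c_{jh} \le S_{kl} + L\,(1 - y_{ijhkl})
    && \text{if } O_{jh} \text{ precedes } O_{kl} \text{ on } M_i \text{, it finishes first} \\
& y_{ijhkl} + y_{ikljh} \ge x_{ijh} + x_{ikl} - 1
    && \text{two operations on the same machine must be ordered} \\
& \sum_{i} x_{ijh} = 1
    && \text{each operation is assigned to exactly one machine}
\end{aligned}
\]
```

The large constant $L$ deactivates the machine-capacity inequality whenever $y_{ijhkl} = 0$, which is the usual role of a "sufficiently large positive number" in such models.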

Basic solution flow of GA
The steps of genetic algorithm [15,22,34]

Initial population:
The machine selection of partial FJSP (P-FJSP) is irregular, as in the instance in Table 1. Kacem [18] set the processing time of non-selectable machines to "999" during encoding, so that P-FJSP is converted into total FJSP (T-FJSP). This makes the algorithm for encoding machine chains more general, and the approach has been adopted in some subsequent studies. Although the elimination mechanism of GA can discard individuals in which a non-selectable machine has been chosen, the conversion adds a lot of redundant information and increases both the amount of computation and the search difficulty.
Therefore, this paper designs a machine chain initialization method for P-FJSP. The optional machines of all operations are collected and stored in "Ms_celldata", and an index value is randomly generated during encoding to produce the machine chain.
The details are given in Algorithm 1.
[Figure: decoding operation (legend: operation in same job, decoded operation)]
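Algorithm 1 is not reproduced in this text. The idea described above, storing the optional machines of every operation and drawing a random index into that list, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the list `ms_celldata` mirrors the name "Ms_celldata" but its contents are hypothetical, the chain is assumed to store the index rather than the machine number, and indices are 0-based.

```python
import random

# Hypothetical optional-machine lists, one entry per operation in a fixed order.
# Only selectable machines are stored, so no "999" placeholder is needed.
ms_celldata = [
    [1, 3],      # operation 1 may run on M1 or M3
    [2],         # operation 2 may run on M2 only
    [1, 2, 4],   # operation 3 may run on M1, M2 or M4
    [3, 4],      # operation 4 may run on M3 or M4
]


def init_machine_chain(optional_machines, rng=random):
    """Draw, for every operation, a random index into its optional-machine list."""
    return [rng.randrange(len(machines)) for machines in optional_machines]


def decode_machine_chain(chain, optional_machines):
    """Translate index genes back into machine numbers."""
    return [machines[gene] for gene, machines in zip(chain, optional_machines)]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
chain = init_machine_chain(ms_celldata, rng)
machines = decode_machine_chain(chain, ms_celldata)
```

Because every gene is an index into the list of selectable machines, each generated chain is feasible by construction, and the search does not spend effort on individuals that assign an operation to a machine it cannot use.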

Rescheduling strategy in dynamic environment
Periodic rescheduling is a scheduling method that assigns tasks to resources periodically based on a rolling horizon. In essence, the static scheduling problem is divided into multiple scheduling time windows, and static scheduling is carried out in each window. This method makes the production system highly robust: when a disturbance occurs, the scheduling system can respond in time. The smaller the time window, the more aggressive the response and the higher the computational cost. The disadvantage is that unnecessary computation is performed when no external disturbance occurs, and the optimum within a time window is a local optimum that cannot represent the global optimum.
Event-driven rescheduling is a scheduling method in which the scheduling system regenerates the scheduling scheme when an external disturbance occurs. In a production environment where interference events are infrequent, this method saves computing resources and responds actively to interference events. Event-driven rescheduling includes:
(1) Right-shift rescheduling
After a disturbance event such as order insertion or machine failure, a simple rescheduling method is the right shift: the operations that follow the disturbance on the affected machine are delayed. In essence, this postpones the related links in the production system without taking any other measure against the disturbance. When the interference duration is short, the idle time of the machines in the schedule can absorb it, and the impact on the makespan of the overall schedule is small. When the interference duration is long, the idle time cannot absorb the processing delay, so the overall schedule becomes tardy and production is delayed.
The principle of right-shift rescheduling is shown in Fig. 3. According to the type of disturbance event, it is necessary to make assumptions about the operation being processed by the disturbed machine. The workpiece being processed must be processed on the disturbed machine before it can be rescheduled.
If the machine is disturbed by failure, the processing of the operation in process stops immediately on the disturbed machine, and the operation needs to be reprocessed on another machine under rescheduling.
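The right-shift rule described in this section can be sketched in a few lines. The sketch keeps every machine assignment and every processing order of the original schedule and only pushes start times to the right. The class `Op`, the function `right_shift` and the example data are hypothetical, and the sketch assumes that an operation interrupted by the failure restarts from the beginning on the same machine once the repair is finished.

```python
from dataclasses import dataclass


@dataclass
class Op:
    job: str
    idx: int        # position of the operation within its job
    machine: str
    start: float
    end: float


def right_shift(schedule, failed_machine, t_fail, t_repair):
    """Delay operations hit by the breakdown window and propagate the delay."""
    job_ready, machine_ready, shifted = {}, {}, []
    # In a feasible schedule, original start order respects job and machine precedence.
    for op in sorted(schedule, key=lambda o: (o.start, o.job, o.idx)):
        duration = op.end - op.start
        start = max(op.start, job_ready.get(op.job, 0), machine_ready.get(op.machine, 0))
        # Assumption: an operation overlapping the breakdown restarts after the repair.
        if op.machine == failed_machine and start < t_repair and start + duration > t_fail:
            start = t_repair
        end = start + duration
        job_ready[op.job] = machine_ready[op.machine] = end
        shifted.append(Op(op.job, op.idx, op.machine, start, end))
    return shifted


original = [
    Op("J1", 1, "M1", 0, 3), Op("J1", 2, "M2", 3, 5),
    Op("J2", 1, "M2", 0, 2), Op("J2", 2, "M1", 3, 6),
]
# Hypothetical breakdown: M1 fails at t = 4 and is available again at t = 7.
result = right_shift(original, "M1", t_fail=4, t_repair=7)
print(max(op.end for op in result))  # 10
```

In this example the original makespan of 6 grows to 10 because the delayed operation is the last one on its machine and no idle time is available to absorb the breakdown, which is the situation in which right-shift rescheduling performs poorly.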

Rescheduling decision method for FJSP with machine failure interference
The FJSP rescheduling problem in dynamic environment considers a variety of disturbance factors, which are random and uncertain, and make the production mobilization process fluctuate. The above sections give the selection rules of scheduling strategies according to different working conditions.
Considering the most common machine failure interruption factor in production scheduling, this paper studies the specific implementation method of the complete rescheduling strategy, and adopts the studied complete rescheduling method to reduce the impact of machine failure disturbance.

Performance metrics and assumptions
The most direct impact of a machine failure is a delay in completion time, so the difference between the actual scheduling scheme and the original scheduling scheme can be used as a performance index to measure the rescheduling, as shown in Eq. (9).
where Sr is the rescheduling scheme, Sp is the original scheduling scheme, Cmax is the makespan of a scheduling scheme, and the left-hand side of Eq. (9) is the difference in objective function value between the rescheduling scheme and the original scheduling scheme. To make Eq. (9) more general, the relative deviation is used to represent the difference in the performance index between the two scheduling schemes, as shown in Eq. (10).
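Eqs. (9) and (10) are not reproduced in this text. Based on the definitions above, one plausible reading is the following. The left-hand-side symbols $\Delta$ and $RM$ are assumed names for this sketch ($RM$ is the label used for the comparison value in the results section), and the percentage scaling is also an assumption.

```latex
\[
\Delta(S_r, S_p) = C_{\max}(S_r) - C_{\max}(S_p)
\]
\[
RM = \frac{C_{\max}(S_r) - C_{\max}(S_p)}{C_{\max}(S_p)} \times 100\%
\]
```

As a purely hypothetical illustration, an original makespan of $C_{\max}(S_p) = 40$ and a rescheduled makespan of $C_{\max}(S_r) = 50$ would give $\Delta = 10$ and $RM = 25\%$. The relative form allows instances with very different makespans to be compared on the same scale.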
To simplify the problem, the following assumptions are made for the FJSP rescheduling problem with machine failures:
(1) Only one machine is down at a time.
(2) It takes negligible time to transfer the workpiece from the failed machine to a functioning machine, and the operation needs to be reprocessed.
(3) The machine is repaired immediately after it fails.

Principle and algorithm of complete rescheduling
The optimization algorithm used to solve the rescheduling problem under machine fault interference is the same as the algorithm used to obtain the initial scheduling scheme. The difference is that the rescheduling input also contains the machine fault information: the faulty machine cannot be selected during the fault period and can be selected again after the fault is repaired. Therefore, one solution cycle of rescheduling starts by determining the chromosome gene positions that correspond to the faulty machine. Fig. 6 shows the Gantt chart of the optimal solution for the FJSP instance shown in Table 1.
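The first step of a rescheduling cycle, locating the gene positions affected by the fault, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the tuple layout `(job, idx, machine, start, end)`, the function names `split_at_failure` and `earliest_start`, and the example schedule are hypothetical. Operations finished before the failure, or running on a healthy machine, are kept; the interrupted operation on the failed machine and all operations not yet started are handed back to the optimizer.

```python
def split_at_failure(schedule, failed_machine, t_fail):
    """Return (frozen, to_reschedule) gene positions at the failure time."""
    frozen, to_reschedule = [], []
    for pos, (job, idx, machine, start, end) in enumerate(schedule):
        finished = end <= t_fail
        running_elsewhere = start < t_fail < end and machine != failed_machine
        if finished or running_elsewhere:
            frozen.append(pos)          # keep the original decision
        else:
            to_reschedule.append(pos)   # interrupted or not yet started
    return frozen, to_reschedule


def earliest_start(ready, duration, machine, failed_machine, t_fail, t_repair):
    """Earliest start on a machine that is unavailable during [t_fail, t_repair)."""
    if machine == failed_machine and ready < t_repair and ready + duration > t_fail:
        return t_repair  # the faulty machine can be selected again only after repair
    return ready


schedule = [
    ("J1", 1, "M1", 0, 3),
    ("J2", 1, "M2", 0, 3),
    ("J1", 2, "M2", 3, 7),   # in process on M2 when it fails at t = 4
    ("J2", 2, "M1", 3, 6),   # in process on a healthy machine
    ("J2", 3, "M2", 7, 9),   # not yet started
]
frozen, to_reschedule = split_at_failure(schedule, "M2", t_fail=4)
print(frozen, to_reschedule)  # [0, 1, 3] [2, 4]
```

During decoding of a candidate rescheduling solution, `earliest_start` can be used to push any operation placed on the faulty machine past the repair time, so that the optimizer is free to weigh waiting for the repair against moving the operation to another candidate machine.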

Results and discussion
To evaluate the performance of the proposed algorithm in solving FJSP rescheduling optimization, we extended the "mk" instances proposed by Brandimarte [6] and [...] As shown in Table 2, [...] value of the rescheduling scheme also differs from that of the initial optimal scheduling scheme, and it does not stabilize around a certain percentage; that is, "40%" may be the optimal rescheduling (Fig. 8 mk04-1), or it may not be optimal (Fig. 8 mk01-1).
In order to visually compare the results of right-shift rescheduling with complete rescheduling, the RM value is calculated for each group of results, and the comparison results are shown in Fig. 8.
The machine fault information and rescheduling results are shown in Table 2.
[Table 2 fragment; column headers not recoverable]
5  3  20  56  42
4  18  19  61  54
1  22  20  49  44
3  23  14  55  46
5  13  10
[...] rescheduling result is 54, and the added value is 21, because the machine failure happened while the machine was processing, so the operation had to be reprocessed in rescheduling. At this time, the 5th operation of job J10 is being processed (Fig. 9). Based on the solution results of 50 rescheduling events in Table 2 [...] Fig. 10 shows that, for machine M5, which is greatly affected by fault interference, moving the 2nd operation of job J1 and the 3rd operation of job J10 to the right directly delays the subsequent operations. In the complete rescheduling scheme shown in Fig. 11

Conclusion
In this paper, we studied a rescheduling method based on the genetic algorithm for FJSP with machine failure, which supports the operational reliability of shop floor production systems.
More precisely, the mathematical model of FJSP, the detailed process of solving FJSP by the genetic algorithm and the event-driven rescheduling policy in dynamic environments are established. The rescheduling decision method based on the genetic algorithm for FJSP with machine fault interference is then proposed and verified on extended instances. The test data show that the average delay ratio of right-shift rescheduling is 49.70% and that of complete rescheduling is 27.29%. We conclude that complete rescheduling is superior to right-shift rescheduling and that the proposed complete rescheduling decision method responds effectively to the rescheduling of FJSP with machine fault interference.
As machines age, machine failures and maintenance problems in the production system become common. Combining the event-triggered rescheduling theory in this paper with condition-based preventive maintenance is an effective means of ensuring the stability of the production system. Future work is to predict machine maintenance from the running state of the machine and to combine it with the rescheduling method in this paper to improve the maintenance function module of the production and manufacturing system.