A physics-informed and data-driven framework for robotic welding in manufacturing (2025)

Introduction

The development of industrial artificial intelligence (AI)-driven data models is pivotal to achieving full-process automation in manufacturing scenarios. It also serves as a critical driving force for deeply integrating digital transformation with general AI in industrial applications1,2,3,4,5. However, the process of constructing such models is constrained by the intricate interplay among the data quality6,7, model accuracy8, and generalizability aspects9. Data quality, as the foundation of model performance, is intricately linked to accuracy and generalization in a relationship akin to a balance scale: data quality forms the base, whereas accuracy and generalization occupy opposing sides. As the data volume increases, the structural stability of the model may improve, but simultaneously enhancing both its accuracy and generalizability requires exponentially greater data resources10,11. Throughout the process of implementing AI in manufacturing, striking an effective balance among these three factors remains a persistent and formidable challenge12. In practical industrial scenarios, models must not only resolve the tradeoffs between data quality and performance but also satisfy multidimensional requirements such as low defect detection rates13, high robustness14, and extensive scalability15. These challenges are particularly pronounced in robotic welding-based manufacturing situations, where automated welding methods encounter significant hurdles under complex welding paths and operational conditions16. In scenarios involving intricate processing routes, scarce instability data, and high costs associated with model misjudgments, achieving large-scale and reliable robotic welding applications in complex component manufacturing is difficult17,18. For example, in the commercial aerospace sector, the rapid increase in launch demands necessitates the precise fabrication of structures such as spacecraft fuel tanks and sealed outer casings for space stations19,20. 
These processes must reconcile the complexities of low-volume production with those of dynamic processing scenarios21. This context imposes stringent demands on the precision and response speed of the welding method. However, under dynamically changing conditions, the current technologies face significant limitations, as they fail to meet the high reliability requirements of advanced manufacturing tasks.

To address these challenges, researchers have extensively utilized numerical simulations22,23,24 to analyze the distributions of the thermal, mass, and force fields induced during welding processes, aiming to identify instability factors. Concurrently, the potential of data-driven models for conducting real-time prediction in welding applications is being actively explored. Numerical simulations, combined with prior welding parameter knowledge, offer robust support for understanding welding stability25,26. However, their high computational costs and limitations with respect to spanning various spatial and temporal scales significantly hinder their widespread application in industrial scenarios. In contrast, the rise of AI has progressively reduced experimental costs and supplemented numerical computations. Deep learning has been increasingly applied to predict welding stability27, detect weld defects28, and optimize welding parameters29.

The application of deep learning in welding manufacturing follows a structured four-stage workflow: Process Data Acquisition, Data-Driven Modeling, Inference Model Deployment, and Dynamic Model Optimization. Process Data Acquisition (Fig.1b) integrates voltage and current sensors to capture one-dimensional signals, while machine vision-based imaging and infrared sensors provide multimodal data for analyzing dynamic melt pool behavior30,31. Data-Driven Modeling (Fig.1c) involves the development and training of models tailored for welding instability detection and defect monitoring32. Inference Model Deployment (Fig.1e) is executed either through end-to-end integration within specific task environments or via distributed architectures to enhance scalability33. Dynamic Model Optimization (Fig.1d) refines deployed models by updating databases and adjusting model weights, ensuring adaptability to varying welding conditions34. However, this workflow heavily relies on high-quality data and lacks sufficient modeling capabilities for complex thermofluidic coupling in real-world scenarios, limiting its generalizability and stability. To overcome these limitations, integrating the strengths of physical models with data-driven methods has emerged as an effective pathway35,36,37. Physics-informed data-driven models incorporate physical laws into the machine learning process, thereby significantly reducing their demand for high-quality data while enhancing their generalizability38,39,40,41. Under highly variable conditions, this approach enables complex scenarios to be efficiently modeled with minimal data, optimizes model performance and provides a scalable solution for addressing dynamic industrial challenges.

The PHOENIX framework is proposed to comprehensively integrate physical knowledge into data-driven models, enhancing their predictive performance and adaptability in welding-based manufacturing processes. This framework reduces the reliance on high-quality, large-scale datasets while significantly improving the prediction accuracy, recall, and generalization capabilities of the constructed model under complex operating conditions. a The PHOENIX framework systematically extracts physical information, including engineering expertise, welding knowledge, and conservation laws, to embed these insights into data-driven models. This approach effectively guides the model training and optimization processes, enhances the accuracy of the model and ensures consistency with physical information. b By leveraging physics-informed data-driven models, the framework substantially decreases the dependency on high-quality data and a large number of features, addressing the challenges posed by small-scale and low-quality datasets in model applications. c During the training process, PHOENIX incorporates physical constraints to optimize the model normalization, parameter tuning, and loss function design procedures. Moreover, it translates physical laws into explicit model constraints, strengthening the ability of the framework to physically represent welding processes. d PHOENIX establishes an intergroup dynamic learning mechanism by integrating historical prior data with real-time collected data. This approach enhances the stability and reliability of the model in complex industrial scenarios, ensuring robust and dependable predictive results. e The framework enables the precise perception of transient and sequential behaviors in manufacturing processes by fusing physical information with data-driven models. This capability supports proactive predictions of melt pool dynamics, ensuring high stability and weld quality during the welding process.
f The PHOENIX framework exhibits remarkable versatility and is applicable to real-time welding process monitoring, predictive melt pool state modeling, dynamic weld defect detection, and adaptive welding parameter regulation. Furthermore, it holds the potential for expansion to other welding methods and additive manufacturing technologies, offering a novel solution for implementing process optimization and quality control across diverse manufacturing techniques.

Robotic variable polarity plasma arc (VPPA) welding technology has emerged as a highly reliable method for aluminum alloy welding and has demonstrated significant advantages, particularly in terms of effectively controlling welding defects42. By employing high-frequency polarity switching over short durations, VPPA not only efficiently removes oxide films from material surfaces but also leverages the intense stirring action of the welding arc to substantially reduce porosity and inclusions. This ensures high-quality welds while achieving minimal deformation and residual stress43,44. Despite its application in high-precision welding tasks, VPPA technology still faces considerable challenges under complex operational conditions due to the intricate multi-physics coupling effects and highly dynamic characteristics of the process. These challenges are particularly pronounced in scenarios involving robotic welding tasks with continuous spatial position changes. In industrial applications, welding process instability often leads to interruptions, requiring manual interventions for onsite repairs, along with the recalibration of the employed robotic paths and process parameters. These issues significantly prolong production cycles and increase manufacturing costs. Therefore, achieving advanced predictive capabilities to attain welding process stability and providing sufficient response time for online regulation are critical to effectively addressing these challenges.

Building on these insights, this paper proposes an innovative AI framework for robotic welding manufacturing, namely a Physics-informed Hybrid Optimization framework for Efficient Neural Intelligence in Welding (PHOENIX), as illustrated in Fig.1. By embedding physical knowledge into multisource data inputs, model architecture design, and optimization processes, PHOENIX mitigates the dependence on extensive, high-cost datasets while enhancing predictive accuracy and robustness in complex industrial scenarios. The PHOENIX framework embeds physical information through a hierarchical structure, encompassing engineering expertise, welding knowledge, conservation laws, and physical models (Fig.1a). This enables the seamless integration of physical constraints throughout the input (Fig.1b), training (Fig.1c), inference (Fig.1e), and optimization (Fig.1d) stages of data-driven models. By deeply combining physical information with data-driven approaches, the framework effectively addresses the critical bottlenecks that traditional models encounter in industrial intelligent applications. Using robotic VPPA welding as a representative application, the PHOENIX framework demonstrates reliable predictive performance and adaptability. By leveraging the deep integration of physical information, the framework achieves precise instability warnings 0.05 s in advance with a prediction accuracy of 98.1%, even when trained on small batches of data. The PHOENIX framework transcends the traditional tradeoffs among data quality, prediction accuracy, and model generalization, and offers a universal solution for a range of welding and additive manufacturing technologies (Fig.1f). Its application in commercial aerospace scenarios notably alleviates the tension between high experimental costs and the need for rapid manufacturing, and provides a technological pathway for achieving fast responses and intelligent manufacturing capabilities.

Results

PHOENIX framework for robotic VPPA welding

To handle these challenges, we integrated physical information into the in situ monitoring task of the VPPA welding process within the PHOENIX framework, achieving highly accurate and generalizable melt pool stability prediction capabilities under scenarios with complex operating conditions and limited data (Fig.2). The PHOENIX framework is composed of four key modules: a machine vision module for capturing the dynamic and morphological features of the melt pool, a time-ahead prediction module with a physical information input, a data-driven physical saddle point modeling module, and an incremental learning module for dynamically tuning the model parameters through the integration of prior and newly acquired data.

Robotic VPPA welding represents a prototypical application scenario of the PHOENIX framework. Through three distinct pathways, the framework effectively integrates physical information to enable highly accurate and generalizable early melt pool instability predictions, even with small datasets. The blue pathway involves the development of a time-ahead melt pool instability prediction module based on the dynamic and morphological features of the melt pool. The red pathway concerns the establishment of a flow model for melt pool dynamics using quasistatic physical features as constraints, leveraging data-driven models to obtain precise dynamic feature predictions. The yellow pathway enables dynamic model hyperparameter tuning through intergroup incremental learning; this process integrates the historical data with the newly acquired information, enhancing the performance and adaptability of the model. a An in situ high-speed X-ray acquisition system captures high-precision melt pool dynamics data. While they are costly to obtain, these data provide a robust foundation for understanding melt pool behavior. b A highly adaptable machine vision module, which is equipped with transfer learning capabilities, extracts multisource image features in real time, delivering reliable data support for online melt pool monitoring. c An early instability detection module was developed; this module leverages physical information inputs to efficiently predict melt pool instability during robotic VPPA welding. d Melt pool flow trajectories and saddle point information at varying depths were recorded using a particle tracking method via an in situ high-speed X-ray system. e An industrial camera acquisition system was employed to collect low-cost morphological data from the melt pool, providing a viable solution for the rapid acquisition of cost-effective data. 
f A data-driven model constrained by quasistatic welding parameter features was developed to substitute expensive data with cost-effective alternatives, thereby reducing the reliance on high-quality data while maintaining strong model performance. g A schematic illustration of the full-position robotic VPPA welding and monitoring system under practical operational conditions. h Incremental learning was employed to integrate historical experience with the newly collected data, enabling proactive temporal information correction and dynamic model optimization.

A machine vision module with transfer learning capabilities was employed (Fig.2e) to efficiently extract expensive insights from in situ X-ray data and cost-effective information from industrial camera data. The in situ X-ray system captures the dynamic features of the melt pool (Fig.2a), whereas the industrial camera captures the morphological features of the keyhole (Fig.2c). By applying transfer learning45,46 in the semantic segmentation network47,48, the module overcomes the performance limitations associated with small training datasets, enabling the effective and accurate extraction of features from diverse input sources under limited data conditions. This design provides the PHOENIX framework with reliable physical information, significantly enhancing the predictive accuracy and overall ability of the system.

On the basis of the PHOENIX framework, an online time-ahead prediction module49,50 was constructed to achieve early instability detection during the robotic VPPA welding process (Fig.2f). This model uses the physical information-rich dynamic and morphological features of the melt pool transmitted from the machine vision module as its inputs. By performing multi-information fusion51 at the feature level and applying a sliding window to accumulate temporal data from the welding process, the module predicts short-term future welding states with high precision. The online deployment of this anticipatory module effectively addresses the lag that is inherent in traditional monitoring systems, enabling control strategies to be preemptively implemented before instability trends manifest.

The dynamic behavior of the melt pool was captured via the particle tracking method (Fig.2b), and a data-driven melt pool dynamics model was constructed by integrating quasistatic welding parameters (Fig.2d). By combining cost-effective data with the expensive data obtained through the data-driven double-saddle physical module, the trained sequential prediction module was directly applied to anticipate melt pool behavior. This approach significantly reduces the reliance on costly data while ensuring reliable melt pool stability and flow behavior predictions, thereby improving the data utilization efficiency of the model.

A group-based training method that combines incremental learning52,53 with historical knowledge and new data was proposed within the PHOENIX framework (Fig.2h). This approach enables the model to optimize and update the parameters of its modules in complex scenarios. Through distributed deployment across dual-edge and cloud platforms54, the model enhances the performance of the time-ahead prediction module via sample replay strategies and fine-tuning mechanisms, thereby improving its generalizability. This design supports dynamic adjustments of the anticipatory module in complex welding scenarios (Fig.2g) while granting producers precise control over their proprietary models and product specifications. Furthermore, the integration of cloud-based incremental learning significantly reduces the local deployment costs of the framework, offering an efficient and economical solution for intelligently manufacturing small-to-medium batches of complex components.
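The sample replay idea mentioned above can be illustrated with a minimal sketch. The source does not specify how replayed samples are selected, so the reservoir-sampling buffer below is an assumption, and the names `ReplayBuffer`, `add`, and `mixed_batch` are hypothetical. The sketch only shows how a bounded store of historical samples might be mixed with a newly acquired group before a fine-tuning step; the fine-tuning itself is omitted.

```python
import random


class ReplayBuffer:
    """Bounded store of historical samples (hypothetical helper, not from the paper).

    Uses reservoir sampling so that every sample seen so far has the same
    chance of being retained, regardless of when it arrived.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Offer one historical sample to the buffer."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def mixed_batch(self, new_samples, n_replay):
        """Return the new samples plus up to n_replay replayed historical ones."""
        n = min(n_replay, len(self.items))
        return list(new_samples) + self.rng.sample(self.items, n)


# Usage: fill the buffer from earlier welding groups, then build a mixed batch
# for fine-tuning on a new group (the labels here are placeholders).
buffer = ReplayBuffer(capacity=5)
for i in range(100):
    buffer.add(("old", i))
batch = buffer.mixed_batch([("new", 0), ("new", 1)], n_replay=3)
```

Mixing replayed and new samples in one batch is a common way to limit forgetting of earlier welding groups during fine-tuning; the ratio of old to new samples would need to be tuned for a real deployment.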

Robotic VPPA welding dataset construction

In VPPA welding manufacturing, the dynamic equilibrium between arc shear forces and gravitational effects ensures the directional flow of molten metal, thereby maintaining process stability53. However, in robotic VPPA welding of large-scale aluminum alloy structures, multiposition welding induces shifts in the relative gravitational direction of the melt pool, leading to flow oscillations. When these oscillations accumulate beyond a stability threshold, melt pool collapse becomes likely, disrupting the welding process. Such instability necessitates robot path reconfiguration and defect repair, significantly increasing both time and economic costs for industrial applications. Moreover, real-world welding of large components presents additional challenges, including instability induced by acceleration and deceleration during robotic motion, non-uniform heat dissipation caused by asymmetric structures, and gap formation in the weld region due to stress concentration. Given the highly dynamic and unpredictable nature of the welding process, reactive adjustments made after instability has already occurred often prove insufficient in mitigating its adverse effects. Therefore, anticipating the onset of instability and allowing adequate response time before reaching a critical state is essential for enhancing welding stability and process reliability. By predicting melt pool behavior in advance, it becomes possible to implement preemptive measures, thereby reducing process uncertainties, improving weld quality, and optimizing the overall performance of automated and intelligent welding systems.

To address the inherent challenges in robotic VPPA welding tasks and advance the development of the PHOENIX framework, cost-effective data were collected using industrial cameras integrated into the robotic VPPA welding system (Fig.3c). These data encompassed welding parameters and melt pool morphology features, providing invaluable support for the model development process. Concurrently, in collaboration with the Joining and Welding Research Institute (JWRI) at Osaka University, expensive data were obtained via an in situ high-speed X-ray acquisition system, which targeted a limited set of fixed welding positions. Compared to industrial cameras used in manufacturing, in situ high-speed X-ray imaging systems that are typically developed for research purposes involve substantially higher costs in terms of deployment, operation, and maintenance (see Supplementary Fig.1). Accordingly, in this study, data acquired from industrial cameras are referred to as cost-effective data, while data obtained from in situ high-speed X-ray systems are classified as expensive data. Meanwhile, for data labeling purposes, welding engineers assessed the final quality of the weld seams, integrating welding speed and instability location evaluations to calculate weld seam lengths. These assessments served as the basis for categorizing the melt pool states into three distinct labels: the quasistable state (Fig.3d), the nonstationary state (Fig.3e), and the instability state (Fig.3f). Notably, while quasistable and nonstationary melt pools sustain weld seam formation, instability states lead to seam severance (Fig.3b). During the VPPA welding process, the welding zone often exhibits a keyhole phenomenon akin to that observed in laser welding scenarios (Fig.3a). However, unlike the blind holes that are typical of laser welding55, the elevated arc pressure of the plasma arc in VPPA welding results in the formation of a through-hole. 
To ensure high welding stability, the entire process must operate within a flow behavior range corresponding to the quasistable state.

The VPPA welding technique has become crucial for manufacturing spacecraft components, particularly lightweight aluminum alloy structures. As the payload capacity requirements of commercial aerospace applications have increased, the performance requirements of aluminum alloys in terms of volume and weight have become more stringent. Simultaneously, the manufacturing industry, driven by customized solutions, demands greater flexibility, precision, and adaptability from welding technologies in complex scenarios. However, VPPA welding faces challenges, including multi-position welding, adaptation issues derived from preassembly errors in large components, and external disturbances that affect welding stability, necessitating process optimization and performance enhancement techniques. a The core principle of VPPA welding lies in the ionization of argon to form a stable plasma arc, thereby achieving efficient energy transfer and precise heat input control. This high-energy density arc provides robust technical support for aluminum alloy welding. b A distinctive feature of the VPPA welding process is the formation of a unique keyhole (through-hole) structure. The keyhole, which is achieved through the concentration of heat, facilitates efficient melting and penetration, maintaining its stability throughout the welding process. However, the instability of the molten metal flow can disrupt the maintenance of the keyhole structure, leading to weld cracking or incomplete fusion defects (see Supplementary Movie3). c The application of robotic VPPA welding in large, complex, spatially curved aluminum alloy structures is exemplified, demonstrating the flexibility and precision of this technology under intricate geometric conditions. d During quasistable VPPA welding, the molten metal flows stably, forming a distinct rear saddle point. This process is typically accompanied by well-formed welds, reflecting the high stability and controllability of the welding process. 
e Under the nonstationary state, the saddle point of the melt pool exhibits some fluctuations, yet the weld formation remains relatively well formed, indicating the adaptability of the system to localized instabilities. f When the VPPA welding process enters an instability state, the saddle point of the melt pool is significantly impacted, and the keyhole structure becomes difficult to maintain, leading to incomplete weld closure or the occurrence of cutting defects.

Physics-informed instability time-ahead prediction module

In traditional manufacturing processes, online monitoring systems rely primarily on visual methods to conduct in situ welding state monitoring56. However, this approach is limited because it provides only a static assessment of the current welding conditions, making it difficult to apply real-time adjustments. When irreversible errors occur during welding, the time available for correction is often insufficient, severely restricting the advancement of smart manufacturing technologies. To address this issue, combining machine vision with advanced predictive techniques has emerged as an efficient solution. We propose a machine vision-driven time-ahead prediction module based on the PHOENIX framework (Fig.4a). By integrating machine vision, the experience of welding engineers, and welding knowledge, this module enables the early detection of welding instability. It leverages the automated feature extraction capabilities of machine vision, requires no human intervention, and efficiently processes physical information inputs under the guidance of welding knowledge, significantly optimizing the training process. Specifically, in VPPA welding applications, this module accurately identifies the nonstationary features preceding instability, providing sufficient response time for online control.

Under the guidance of the PHOENIX framework, the time-ahead prediction module built with physical information and integrated with machine vision significantly simplifies the process of constructing in situ online monitoring systems. The model achieved high welding state prediction accuracy and greatly reduced its missed detection rates through a feature engineering mechanism that incorporates engineers’ prior experience and welding knowledge. a The machine vision model constructed through transfer learning worked in synergy with the time-ahead prediction module based on physical information inputs, demonstrating an efficient operational flow. The model exhibited flexibility in acquiring and processing time series data. b Feature engineering, which is based on welding knowledge and engineers’ experience, significantly enhanced the generalizability of the model. During this process, expensive melt pool dynamics features derived from X-rays and cost-effective morphological features captured by industrial cameras were efficiently integrated via the machine vision module, supporting the multisource information processing ability of the model. c Performance evaluation results obtained with respect to the time-ahead prediction module in various data usage scenarios revealed significant prediction accuracy differences when all data, expensive data, cost-effective data, and physical models (replacing the expensive data) were used. These evaluations also demonstrated the dynamic relationship between the time length of the time series data accumulated in the sliding window and the predictions of future changes. The error bars represent the standard error of time-ahead prediction results across multiple independent data groups.

To achieve this goal, we utilized a Visual Geometry Group 16 (VGG16) + U-Net57,58,59,60 model with transfer learning capabilities (Fig.2e) as the input data preprocessing unit of the machine vision module. This model efficiently extracts relevant features from the outputs of the X-ray system and industrial cameras. The pretrained VGG16 model quickly adapts to both expensive and cost-effective data sources. Through this machine vision module, we could not only capture the dynamic and morphological features of the melt pool but also quantify the key features containing physical information. Feature engineering algorithms (see “Methods”) were used to extract expensive data features from X-ray images (Fig.4bi), including the depth (Dj) and offset distance (Lj) of the flow saddle point on the rear wall of the melt pool (see “Discussion”), the metal flow channel area within the melt pool, and the flow channel areas of the front (A1j) and rear walls (A2j and A3j), which correspond to the flow regions near the inlet and outlet of the melt pool, respectively. Additionally, by deploying a camera near the torch of the welding robot, we gathered cost-effective data (Fig.4bii and iii), such as the inlet and outlet side areas (A4j and A5j) and perimeters (C1j and C2j) of the melt pool formed by the arc-melted metal, as well as the lengths of the major and minor axes (a1j, a2j, b1j, b2j) along and perpendicular to the welding direction. Furthermore, the compactness and eccentricity of the inlet and outlet sides of the melt pool were automatically measured. These cost-effective data contributed to uncovering correlations within the obtained information, further enhancing the breadth of the machine vision module. After performing standardization and normalization, all the data were fused at the feature level to provide the input conditions for the time-ahead prediction module.
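As an illustration of the morphological features and the feature-level fusion described above, a minimal sketch follows. The source does not give the formulas used, so the definitions here are assumptions: compactness is taken as the isoperimetric quotient 4πA/C², eccentricity is taken as that of an ellipse with the measured major and minor axes, and standardization is taken as a per-feature z-score. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def compactness(area, perimeter):
    """Isoperimetric quotient 4*pi*A / C^2 (assumed definition).

    Equals 1.0 for a circle and decreases for elongated or ragged outlines.
    """
    return 4.0 * math.pi * area / (perimeter ** 2)


def eccentricity(major_axis, minor_axis):
    """Eccentricity of an ellipse with the given axis lengths (assumed definition).

    Equals 0.0 for a circle and approaches 1.0 for a very elongated shape.
    """
    a, b = max(major_axis, minor_axis), min(major_axis, minor_axis)
    return math.sqrt(1.0 - (b / a) ** 2)


def standardize(columns):
    """Z-score each feature column; a constant column maps to all zeros."""
    out = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        out.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return out


def fuse(expensive_rows, cost_effective_rows):
    """Feature-level fusion: concatenate the two per-frame feature vectors."""
    return [e + c for e, c in zip(expensive_rows, cost_effective_rows)]


# Usage with placeholder numbers (not measured data): one frame with two
# X-ray features and two camera-derived shape descriptors.
frame_expensive = [[1.2, 0.4]]
frame_cheap = [[compactness(12.0, 14.0), eccentricity(5.0, 3.0)]]
fused = fuse(frame_expensive, frame_cheap)
```

Standardizing each feature before concatenation keeps features with large physical units, such as areas, from dominating features bounded between 0 and 1, such as eccentricity.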

We subsequently developed a long short-term memory network-multilayer perceptron (LSTM-MLP) model (see “Methods”) based on a sliding window approach, further refining the design of the time-ahead prediction module (Fig.2f). The sliding window technique enables the real-time prediction of sequential behaviors. LSTM is utilized to capture temporal features during the welding process, effectively mitigating the gradient explosion and vanishing problems that are often encountered by traditional recurrent neural networks when handling time series data61,62. The MLP, on the other hand, focuses on capturing spatial features that the LSTM may fail to extract, further exploring the interrelationships between the input features63. Through a comparative analysis of various common time series prediction models, we found that the LSTM-MLP model exhibits superior predictive accuracy in capturing the nonstationary behavior of robotic VPPA welding. Under nonstationary states, the LSTM-MLP model achieves an accuracy of 98.1%, outperforming the gated recurrent unit-MLP (GRU-MLP) (97%), RNN-MLP (96.9%), and LSTM (96.5%) models (see Supplementary Tables15). Moreover, the LSTM-MLP model achieved a specificity of 98.9% in identifying nonstationary states (Fig.5a). The time-ahead prediction module built using the sliding-window LSTM-MLP model significantly outperformed the traditional time series prediction models in terms of capturing both temporal and spatial information. Furthermore, a t-distributed stochastic neighbor embedding (t-SNE) visualization analysis of the robotic VPPA welding states (Fig.5d) revealed that the LSTM output layer clearly distinguished between the quasisteady and nonstationary states, effectively capturing the correlations contained within the time series data.
Moreover, the MLP layer further enhanced the ability to distinguish between the nonstationary and instability states, widening the boundaries between the three states; this highlights the advantage of the MLP in terms of capturing spatial information.
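The sliding-window pairing of past frames with a future welding state can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the frame rate is an assumed parameter (the acquisition rate is not stated in this section), the function name `make_windows` is hypothetical, and the LSTM-MLP network that would consume these samples is omitted.

```python
def make_windows(features, labels, frame_rate_hz, window_s=0.2, horizon_s=0.05):
    """Pair each window of past frames with the state label horizon_s later.

    features: list of per-frame fused feature vectors, in time order.
    labels:   list of per-frame state labels, same length as features.
    frame_rate_hz: assumed acquisition rate, used to convert seconds to frames.
    Returns a list of (window, future_label) samples.
    """
    window = round(window_s * frame_rate_hz)
    horizon = round(horizon_s * frame_rate_hz)
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must each span at least one frame")

    samples = []
    # 'end' is the exclusive end index of the window; the last frame in the
    # window is end - 1, and the target is 'horizon' frames after that frame.
    for end in range(window, len(features) - horizon + 1):
        x = features[end - window:end]
        y = labels[end - 1 + horizon]
        samples.append((x, y))
    return samples


# Usage with placeholder data: 30 frames at an assumed 100 Hz give
# 20-frame windows and a 5-frame prediction horizon.
frames = [[float(i)] for i in range(30)]
states = list(range(30))
samples = make_windows(frames, states, frame_rate_hz=100)
```

Each sample contains only frames that precede the predicted instant, so a model trained on these pairs cannot see the future state it is asked to predict.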

Statistical analysis was conducted on the model to investigate the sensitivity of the time-ahead prediction module to different input features. a The accuracy and specificity of GRU-MLP, RNN-MLP, LSTM-MLP, and LSTM were compared. b A confusion matrix analysis was conducted on the task of predicting the welding state 0.05 s in the future using a 0.2 s sliding window. c Ablation experiments were implemented to visualize the importance rankings of the expensive and cost-effective features, showing the critical roles of these features under different welding states. d A t-SNE-based dimensionality reduction scheme was executed, and the result visualization clearly presents the temporal and spatial information capture capabilities of the time series prediction steps under different melt pool states.

The time-ahead prediction module, which integrates multiple information sources, could extract both temporal and spatial features from time series images and predict welding stability, even over longer time spans (Fig.4c). With the inclusion of both expensive and cost-effective data, the prediction accuracies achieved for future times of 0.01, 0.05, 0.1, 0.2, 0.5, and 1 s were 98.3%, 98.1%, 97.1%, 93.9%, 86.9%, and 86.3%, respectively. Although the accuracy remained above 86% for predictions within 1 s, a significant decrease in accuracy occurred once the prediction time exceeded 0.05 s. When the sliding window was changed to accumulate data over 0.05, 0.1, and 0.5 s, the prediction accuracies attained for the next 0.5 s were 94%, 97.8%, and 96.7%, respectively. After balancing the computational cost of the sliding window with the prediction accuracy, the best approach was to use 0.2 s of data to predict the welding state 0.05 s ahead. This provided sufficient redundancy for the subsequent control system design task. Additionally, we compared the prediction accuracies attained when different data sources were used (Fig.4c). Under the same conditions, when only cost-effective or expensive data were used, the prediction accuracies were 96.2% and 87.4%, respectively. The detection precision for the nonstationary state can reach as high as 95.4% when all data sources are used, while the precision drops to 87.69% and 67.69% when only partial data are used (Fig.5b). The results demonstrate a tendency to predict quasi-steady states when trained exclusively on expensive data, whereas training based solely on cost-effective inputs leads to a bias toward predicting unstable states. To further analyze this phenomenon, we conducted ablation experiments (Fig.5c) on the time-ahead prediction module, extracting the top 6 features that were most impactful to the welding states out of a total of 18 features (see Supplementary Fig.2).
The five expensive features containing melt pool flow behavior information consistently ranked highly in terms of the prediction accuracy achieved at different stages, especially in the quasisteady states, where they played a dominant role in the decisions made by the model. Therefore, the time-ahead prediction module should incorporate as many expensive features as possible to support accurate future welding state predictions. Although the model already provided good predictions, it had not yet fully learned the relationships between the cost-effective and expensive features.

Data-driven physical saddle point modeling

During the welding process, by combining expensive and cost-effective data, we successfully utilized the time-ahead prediction module to predict future states. Although this achievement is promising, certain challenges remain in practical applications. For example, owing to its high cost and potential health risks, in situ X-ray imaging is difficult to apply broadly in engineering practice. Therefore, finding alternative methods for obtaining melt pool flow channel and saddle point information becomes an important task. However, techniques such as line scanning, light detection and ranging (LiDAR), and stereo vision face significant challenges in practical use, particularly in capturing flow channel information, due to arc interference during welding. Ignoring these data sources would result in substantial declines in the achieved prediction accuracy and recall rates. To address this issue, we conducted an in-depth analysis of how the flow behavior of the molten metal in the melt pool changes under the welding heat input and employed particle tracking (Fig.6a and “Methods”) to analyze the resulting flow behavior and patterns.

To further reveal the intrinsic mechanisms of melt pool instability and provide reliable data support for the construction of data-driven models, this study employed a combination of particle tracking and high-speed in situ X-ray acquisition systems to capture the flow behavior and flow channels of the molten metal contained within the melt pool. This method enabled the movement and flow paths of the molten metal to be precisely tracked, thus providing crucial data for constructing physical models. a Particle tracking, which uses the high X-ray absorption capabilities of tungsten particles, reveals the flow behavior of the molten metal. This method accurately captures the microscopic flow features of the melt pool, offering new insights into the flow mechanisms and providing a novel perspective for constructing physical models. b Flow trajectories of tracer particles released at different depths in the melt pool, annotated via an image‑accumulation method (see Supplementary Movie1 and Supplementary Movie2). c The flow behavior and flow regions of the tracking particles significantly differ at various depths, and they present phenomena such as bifurcation at the front wall, a directional flow along the sidewalls, and complex merging and bifurcation behaviors at the rear wall. d The dual-saddle point flow model obtained from the experiments further reveals the key flow mechanisms in VPPA welding.


In a practical operation scenario, we first use particle tracking and in situ X-ray imaging to capture the flow behavior information of the target melt pool, which assists in the construction of the data-driven model. By tracking the trajectories of tracer particles released at different depths, the directional flow patterns of liquid metal within the melt pool are obtained (Fig.6b). Research has shown that the molten metal on the front wall of a melt pool splits after melting and then flows toward both sidewalls, ultimately converging at the rear wall of the melt pool. Once the molten metal converges, it flows toward both the inlet and outlet sides of the melt pool, where it solidifies to form the rear wall surface. After the molten metal merges at the rear wall, splitting occurs at the saddle point position where the curvature of the morphology changes, and the flow velocity at this position approaches zero (Fig.6c). Based on this phenomenon, we propose a dual-saddle-point flow model for the robotic VPPA welding process (Fig.6d). Through multiple experiments, we found that the ion gas and current (EN and EP) significantly affect both the saddle point location and the flow channel area. Increasing the current leads to more metal melting, which increases the width and depth of the melt pool, thereby expanding the flow channel area. Moreover, higher temperatures enhance the heat and mass transfer process of the molten metal, further increasing its flow velocity. The increase in ion gas improves the penetration and rigidity of the plasma arc, intensifying the arc shear force imposed on the molten metal in the melt pool, which promotes the flow of molten metal and increases the area at the outlet side of the melt pool. Additionally, the shear force and pressure of the arc within the confined keyhole drive the molten metal, causing a bidirectional reverse flow at the rear wall, thus resulting in the splitting phenomenon64. 
The stability of the mass transfer behavior in molten metal during the welding process is a key factor for maintaining smooth operations. We also discovered that a nonstationary state phenomenon occurs during the transition from a stable melt pool to an unstable melt pool, and owing to the complex thermodynamic coupling scenario of the welding process, it is difficult to directly derive the saddle point variation rules through physical reasoning. To validate these findings, we conducted numerical simulations in our previous research. The results showed that increases in the current and ion gas levels lead to shifts in the saddle point location and changes in the flow channel area65.

Furthermore, we incorporated quasistatic welding features (such as EN current, EP current, and ion gas parameters) as physical constraints to construct a data-driven melt pool dynamics model known as a data-driven physical saddle point model (Fig.7a and “Methods”). This physical modeling process enabled the prediction of expensive features via cost-effective data. In the time-ahead prediction module, by combining cost-effective data with expensive data obtained through data-driven physical double-saddle modeling, we achieved precise predictions with an average accuracy rate of 97.4% (Fig.4c), outperforming the prediction methods using only cost-effective or expensive data. The model maintained high accuracy (Fig.5b). The advantage of this approach lies in the fact that once the model is trained, it no longer relies on directly using expensive data, thus reducing its costs and complexity in practical applications. Moreover, no additional architectural adjustments or optimizations are required for the already constructed time-ahead prediction module.

By incorporating quasistatic welding parameters as constraints, a data-driven VPPA welding-based melt pool dynamics model was developed. This model successfully substitutes expensive data, allowing the constructed time series prediction model to deliver reliable results under lower-cost conditions. a A flowchart based on physical information constraints outlines the construction process of the data-driven model, providing a detailed explanation of its specific application in the time series prediction model. b CBN was used to embed physical static information, such as the welding current and ion gas, into the model. Upon combining the CBN and BPNN, the melt pool dynamics features were precisely modeled. c A comparative analysis indicated significant error performance differences between the MISO CBN-BPNN model (achieved through multiple training sessions) and the MIMO CBN-BPNN model (achieved through a single training session). Boxplots visually present the error distributions of the various melt pool dynamics features, further validating the superiority of the MIMO model. d A comparison with multiple machine learning methods (in terms of the MAE, MSE, RMSE, and R2 metrics) demonstrates the significant advantages of the MIMO CBN-BPNN model, especially in terms of prediction accuracy and model robustness, providing a more competitive solution for melt pool dynamics modeling.


In the traditional methods, the welding parameters are typically input as independent features directly into data-driven models. However, this type of approach does not effectively guide the model optimization process or improve the obtained prediction results. Since welding parameters are quasistatic features, their impact on the performance of a model is limited. Parameters such as the current, ion gas, and shielding gas are difficult to incorporate directly into the model. Inspired by conditional generative adversarial networks66, we propose a Condition-Based Neuromodulation (CBN)67,68 -Backpropagation Neural Network (BPNN) method. This approach effectively injects physical information into the data-driven model, enhancing the generalizability of the model and reducing its reliance on large datasets (Fig.7b). The core of the CBN lies in its ability to automatically adjust the output modulation based on the input conditions. Dynamically adapting the neuromodulation mechanism enhances the model’s expressiveness and stability while optimizing its training process and predictive accuracy by selectively applying condition-driven modulation to welding input parameters such as ion gas, EN current, and EP current. Through training, the CBN-BPNN model can independently operate without relying on expensive data inputs while still accurately capturing the melt pool dynamics and other high-cost features, with quasi-static information also effectively integrated into the main network architecture. The CBN-BPNN model, with its multiple-input multiple-output (MIMO) capabilities, better utilizes the correlations between cost-effective features, thereby enhancing the learning ability of the model. This ability significantly reduces the parallel computation costs incurred in engineering applications and effectively improves the resulting computational efficiency.
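The idea of letting quasistatic welding parameters modulate a network, rather than feeding them in as ordinary features, can be illustrated with a feature-wise scale-and-shift layer. This is only a minimal sketch under stated assumptions: the exact form of the CBN layer is not given in this section, so the affine modulation below, the function name `modulate`, and all weight values are hypothetical stand-ins, with the condition vector assumed to hold normalized EN current, EP current, and ion gas values.

```python
def modulate(hidden, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each hidden unit with values computed from the condition.

    hidden:    activations of one layer (list of floats)
    condition: quasistatic parameters, assumed here to be normalized
               [EN current, EP current, ion gas] (an illustrative choice)
    w_*, b_*:  weights and biases of two small linear maps (hypothetical)
    """
    def affine(weights, bias):
        # One linear map from the condition vector to one value per hidden unit.
        return [sum(w * c for w, c in zip(row, condition)) + b
                for row, b in zip(weights, bias)]

    gamma = affine(w_gamma, b_gamma)  # per-unit scale driven by the condition
    beta = affine(w_beta, b_beta)     # per-unit shift driven by the condition
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]


# Illustrative values only.
hidden = [1.0, 2.0]
condition = [0.5, 1.0, 2.0]
out = modulate(hidden, condition,
               w_gamma=[[1, 0, 0], [0, 1, 0]], b_gamma=[0, 0],
               w_beta=[[0, 0, 1], [0, 0, 0]], b_beta=[0, 0])
```

With zero modulation weights, a scale bias of one, and a shift bias of zero, the layer reduces to the identity, so the conditions only alter the features once the modulation weights have been trained.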

To further validate the robustness of the model, we compared the output error performances of the multiple-input single-output (MISO) model (multiple predictions) and the MIMO model (single prediction) (Fig.7c). The results showed that although the MISO model produced slightly smaller prediction errors in certain output dimensions, the MIMO model demonstrated more stable and reliable performance overall. Compared with the MISO strategy, the MIMO model maintains lower prediction variance and tighter error distributions across all outputs. This unified training approach enables the model to effectively leverage inter-dimensional correlations, improving generalization and mitigating the overfitting risk commonly observed in independently trained models. This feature makes the MIMO model more practical for real-world engineering applications. Additionally, to assess how this method compares with alternatives, we compared our model with other machine learning models. The CBN-BPNN (MIMO) outperformed the other methods (see Supplementary Note) in terms of the mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), and coefficient of determination (R2) metrics (Fig.7d). In conclusion, the model based on data-driven physical saddle point modeling, which is trained on expensive data under physical constraints, can maintain high prediction accuracy and stability in practical applications without relying on expensive data once trained. This approach successfully substitutes expensive data with a small dataset, providing a reliable predictive tool for optimizing and stabilizing the welding process.
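For reference, the four comparison metrics named above follow their standard definitions and can be computed for a single output dimension as in the sketch below. The function name `regression_metrics` and the sample values are illustrative assumptions, not taken from the study; for a MIMO model the function would be applied to each output separately.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for one output dimension."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = mse * n                                    # residual sum of squares
    # R2 is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up values for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```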

Dynamic model parameter tuning with an integrated prior and new data

Thus far, we have validated the time-ahead prediction module built based on the PHOENIX framework as an effective tool for preemptively predicting melt pool instability, thereby offering robust support for real-time control tasks implemented in robotic welding applications. However, the generalizability and real-time performance of deep learning models remain critical limitations regarding their practical industrial deployment, particularly in the domain of robotic VPPA welding. Industrial scenarios are inherently complex and dynamic, with factors such as spatial positioning variations, imprecise fixture alignments, thermal strain-induced gaps due to heat accumulation, asymmetric heat dissipation in complex structural components, and wire feeding anomalies during welding (Fig.8b) adversely affecting the performance of the utilized model in real-world conditions. Although the time-ahead prediction module exhibits effectiveness in certain industrial scenarios, its performance degrades under shifting conditions. This decline primarily stems from variations in melt pool flow behaviors across different environments, which in turn alter the spatiotemporal characteristics of the morphological features and reduce the consistency of input patterns. This underscores the pressing need to increase the accuracy of the model in data-scarce settings and to extend its generalizability across diverse operating conditions.

To address the challenges posed by complex welding conditions and further enhance the generalizability of time-ahead prediction models, a dynamic model parameter tuning method was proposed based on the PHOENIX framework. This approach integrates prior knowledge with new experiential data through incremental learning, enabling the autonomous optimization of model parameters across different groups. This optimization ensures that the time-ahead prediction module maintains high efficiency and accuracy in dynamic and unpredictable industrial welding scenarios, effectively mitigating the issues related to environmental variability, tool misalignment, thermal strain, and other operational complexities. a Utilizing a distributed dual-edge and cloud-coordinated method, the proposed workflow rectifies the omissions and misclassifications arising in complex new scenarios. By fine-tuning the model through intergroup incremental learning, the generalization ability of the time-ahead prediction module is significantly enhanced. The theoretical foundation for this approach lies in the temporal causal relationships that are present among the melt pool, welding state, and weld seam during the welding process. b In practical applications, robotic VPPA welding often encounters unique environmental challenges, including different welding positions (e.g., horizontal or overhead welding), dynamic variations in the welding speed during directional changes, gaps caused by thermal strain in the base material, asymmetric heat dissipation leading to uneven thermal distributions, and anomalies in the wire feeding speed.


To address these challenges, we propose a dynamic model parameter tuning method based on the PHOENIX framework. This approach integrates prior knowledge with newly acquired experience through incremental learning, enabling the autonomous optimization of the model parameters across different operating groups and ensuring that the time series prediction module maintains high efficiency and precision in complex scenarios (Fig.8a). Freezing certain layers of the model allows it to retain predictive capability for prior scenarios while significantly reducing computational overhead and improving training efficiency. Simultaneously, by rebalancing the proportion of data from new and prior scenarios, the model undergoes incremental retraining, enabling it to adapt to new scenarios while preserving its predictive accuracy in previously encountered scenarios, thereby enhancing its overall adaptability and generalization. Leveraging a distributed dual-edge computing system in conjunction with cloud technology, the prediction module is fine-tuned via incremental learning to enhance adaptability. Specifically, two edge computing devices are deployed for industrial applications. The first device facilitates preemptive prediction in robotic VPPA welding scenarios, generating melt pool labels, whereas the second device conducts online weld seam condition monitoring and establishes corresponding melt pool labels. The small datasets generated by these devices are transmitted to a cloud-based server for conducting rapid incremental learning in small batches, enabling updates to the time-ahead prediction module. The updated model weights are subsequently transmitted back to the edge devices for adaptive adjustments. This design enables the preemptive prediction module to undergo adaptive parameter tuning in expansive welding scenarios, allowing it to better accommodate the current working conditions. 
This approach is particularly well-suited for the manufacturing of complex, large-scale components in both small- and medium-batch production. Beyond enhancing model adaptability, it preserves manufacturers’ control over proprietary models and product-specific configurations, thereby mitigating risks associated with model leakage and data contamination. Moreover, centralized, cloud-based training and optimization strategies significantly lower the costs of local deployment across multiple devices, improving both the economic feasibility and scalability of the system.

A gyroscope and a pre-deployed machine vision module are used to assist with scene perception and weld seam status monitoring, respectively, to support data correction. The weld seam detection module distinguished the states of the weld seam and melt pool under varying welding conditions, and the results were compared with those derived from the time series prediction module (Fig.9b). The accuracy of weld seam detection was nearly 100%, as the weld seam represents the final solidified outcome of the melt pool, allowing for labeling based on observed results. In contrast, the prediction accuracy of the time-ahead prediction module was lower due to the inherently dynamic and evolving nature of the melt pool. By leveraging the temporal and causal relationship between the melt pool and the weld seam, the module enabled labeling of melt pool states (Fig.8a). The results acquired from the weld seam detection module and the predicted welding states were transmitted to the cloud. Since the data were reduced to a one-dimensional format, the transmission volume was minimal, rendering the latency negligible. In the cloud, the labels of missed and erroneous melt pool predictions were corrected via synchronized timestamps. Additionally, a data pruning technique was employed to align and quantify the dataset, preventing contamination caused by an excessive data volume and thereby effectively preserving the performance of the model. This design enhanced the data processing efficiency and robustness of the model, enabling it to more accurately handle data anomalies and errors in complex scenarios. The automated correction mechanism allowed welding errors to be promptly rectified during the operation process, thereby improving the quality and stability of welding tasks in industrial applications.
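The timestamp-based label correction described above can be sketched as a nearest-timestamp match between predicted melt pool states and later weld seam observations. This is an illustration under stated assumptions rather than the deployed procedure: the fixed lag `lag_s` between a melt pool state and its solidified seam, the matching tolerance `tol_s`, and the function name `correct_labels` are all hypothetical.

```python
import bisect


def correct_labels(predictions, seam_labels, lag_s, tol_s=0.005):
    """Overwrite predicted melt pool labels with time-aligned seam labels.

    predictions: list of (t, label) pairs for the melt pool state at time t
    seam_labels: list of (t_seam, label) pairs; the seam seen at t_seam is
                 assumed to reflect the melt pool at t_seam - lag_s
    Returns a list of (t, corrected_label, was_changed).
    """
    shifted = sorted((t - lag_s, label) for t, label in seam_labels)
    times = [t for t, _ in shifted]
    corrected = []
    for t_pred, predicted in predictions:
        i = bisect.bisect_left(times, t_pred)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - t_pred), default=None)
        if best is not None and abs(times[best] - t_pred) <= tol_s:
            truth = shifted[best][1]
        else:
            truth = predicted  # no seam observation close enough: keep the prediction
        corrected.append((t_pred, truth, truth != predicted))
    return corrected


# Made-up timestamps and state codes for illustration only.
result = correct_labels(
    predictions=[(1.00, 0), (1.05, 0), (1.10, 2)],
    seam_labels=[(3.00, 0), (3.05, 1), (3.50, 2)],
    lag_s=2.0,
)
```

Only the flagged entries would need to be sent on to incremental learning, which keeps the one-dimensional payload small.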

The figure illustrates the performance of dynamic model tuning via incremental learning, highlighting the impact of different layer-freezing strategies on the model’s adaptability, retention of prior knowledge, and overall predictive accuracy across evolving scenarios. a Experiments explored strategies for freezing different model parameter layers to preserve historical knowledge. Under the guidance of incremental learning, the model underwent adaptive optimization. Additionally, various sample replay strategies were evaluated, alongside the performance of the model in terms of adapting to complex new scenarios relative to its accuracy in familiar settings. The error bars represent the standard error of time-ahead prediction results across multiple independent data groups. b A comparison was conducted among three approaches for evaluating the cutting quality based on weld seam information: direct assessment using weld data, the time-ahead prediction module, and the time-ahead prediction module with incremental learning.


In the cloud, we deployed an incremental learning module leveraging fine-tuning to adaptively adjust the model parameters via corrected data. This fine-tuning process for small batches employed a low learning rate to minimize the modifications made to the initial time series prediction model while ensuring that new knowledge was integrated without erasing prior experiences. Specifically, the learning process was controlled by freezing certain layers of the time-ahead prediction module to accommodate new data distributions. By evaluating the effects of different layer-freezing configurations (Fig.9a), we observed that freezing a single layer significantly enhanced the adaptability of the model to new scenarios, albeit at the expense of reduced sensitivity to the original scenario. Conversely, increasing the number of frozen layers restored the ability of the model to perceive its original scenario but diminished its adaptability to new conditions. Freezing two layers provided the optimal balance, enabling the model to retain the prior knowledge while effectively learning from new data. Additionally, the sample replay strategy enhanced the adaptability of the model by combining new samples with a subset of older data to form small training batches. This approach prevented the model from overfitting on new data at the cost of forgetting its understanding of previous scenarios. The experimental results revealed that a New-to-Prior data ratio of 1:2 yielded the best tradeoff, ensuring both stability and adaptability for the model. Leveraging this incremental learning strategy, the accuracy of the time-ahead prediction module was maintained at approximately 96% in similarly complex scenarios (Fig.9b).
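The sample replay strategy with a 1:2 New-to-Prior ratio can be sketched as a batch builder that mixes each small group of new samples with twice as many replayed prior samples. This is a minimal sketch under assumptions: the batch size, the random sampling scheme, and the function name `replay_batches` are illustrative and not taken from the study.

```python
import random


def replay_batches(new_samples, prior_samples, batch_size=6,
                   new_to_prior=(1, 2), seed=0):
    """Yield mixed batches of new and replayed prior samples at a fixed ratio."""
    new_part, prior_part = new_to_prior
    unit = new_part + prior_part
    if batch_size % unit != 0:
        raise ValueError("batch_size must be a multiple of the ratio total")
    n_new = batch_size * new_part // unit
    n_prior = batch_size - n_new
    if len(prior_samples) < n_prior:
        raise ValueError("not enough prior samples to replay")

    rng = random.Random(seed)
    shuffled = list(new_samples)
    rng.shuffle(shuffled)
    # Leftover new samples that cannot fill a complete group are dropped.
    for start in range(0, len(shuffled) - n_new + 1, n_new):
        batch = shuffled[start:start + n_new]
        batch += rng.sample(list(prior_samples), n_prior)  # replayed prior data
        rng.shuffle(batch)
        yield batch


# Placeholder samples tagged by origin, for illustration only.
new = [("new", i) for i in range(4)]
prior = [("prior", i) for i in range(10)]
batches = list(replay_batches(new, prior, batch_size=6))
```

Each batch here contains two new and four prior samples, so every fine-tuning update sees prior scenarios alongside the new one.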

Discussion

In real-world manufacturing scenarios, proactive anomalous state detection remains a critical challenge related to intelligent manufacturing, particularly when working with small-scale datasets. Achieving high accuracy, broad generalizability, and low missed detection rates under these constraints is exceedingly difficult. To address these issues, the PHOENIX framework offers an innovative solution by integrating physical information with data-driven models, effectively achieving the following key objectives: (1) ensuring high accuracy and a low missed detection rate in proactive anomalous state detection tasks, (2) reducing the reliance of the approach on expensive data by constructing robust physical models in data-scarce scenarios, and (3) enhancing the adaptive capabilities of models in complex manufacturing scenarios through autonomous optimization. With these advantages, the PHOENIX framework provides a perspective for intelligent manufacturing systems and offers practical pathways for overcoming the challenges faced by traditional models in industrial production settings.

The effectiveness and practicality of incorporating physical information into the model development and deployment procedures were validated through the application of the PHOENIX framework to robotic VPPA welding processes. The physics‑informed time‑ahead prediction module achieved 98.1% accuracy in predicting melt pool instability. Even when forecasting welding states up to one second ahead, the model achieved a predictive accuracy of 86%, demonstrating its proactive sensing ability. Furthermore, the physics-informed data-driven model effectively reduced the reliance on expensive data by constructing a physical VPPA saddle point model. By substituting costly data with inputs derived from proxy physical models, the time-ahead prediction module maintained an accuracy of 97.4%, highlighting the potential to minimize data dependencies without compromising the resulting performance. Finally, incremental learning, which includes strategies such as sample replay and data pruning, enabled the autonomous optimization of the model parameters. When subjected to six sudden and complex environmental scenarios, the model achieved an accuracy of approximately 96%, significantly enhancing its generalizability. Through a hierarchical deployment scheme involving dual-edge and cloud architectures, the PHOENIX framework effectively addresses the challenge of rapidly and proactively optimizing small-scale, complex products in manufacturing enterprises. This demonstrates its potential as an effective tool for advancing intelligent manufacturing systems.

In this study, we further analyzed the mechanisms of melt pool instability by combining real-time X-ray imaging data concerning the flow behavior of molten metal in a melt pool with the ablation test results of the proactive prediction model. The dynamic stability of the directional flow and saddle point behavior of the molten metal within the VPPA welding melt pool is crucial for maintaining process stability. In particular, the convergence and divergence of molten metal at the saddle point position of the rear wall contribute to the formation of a stable weld. Welding position variations can destabilize the balance of the molten metal flow within the melt pool, which in turn affects the stability of the saddle point position and ultimately impacts the resulting weld quality. During gap welding, the lack of sufficient molten metal at the front wall for replenishing the convergence of the rear wall makes it difficult to maintain dynamic equilibrium at the saddle point. This phenomenon underscores the importance of maintaining dynamic equilibrium at the saddle point in robotic VPPA welding tasks. By incorporating the dynamic features of the melt pool, including saddle point behavior, as constraints in the construction and optimization procedures of the full-process model, the stability of the pool can be enhanced, and the overall performance of the model can be significantly improved. This in turn further strengthens the precision control and quality prediction capabilities of the welding process.

However, some challenges still warrant further exploration. First, the effective application of physical information in a manufacturing process relies on prior knowledge and an understanding of the process itself, which stands in contrast with the fully black-box approach of AI models. In data-scarce scenarios, we aim to achieve improved model performance by integrating physical information; thus, finding the correct balance between physical information and the black-box nature of the constructed model remains a critical challenge. Second, while data replay and pruning strategies have been implemented to mitigate overfitting in new scenarios, the progressive deployment and iterative learning of the time-ahead prediction model in real-world applications necessitate further investigation into effectively reducing the influence of historical data on the current model, ensuring adaptive learning without compromising generalization. Meanwhile, further research is warranted on enhancing adaptability to various production scales while simultaneously reducing deployment costs to improve overall manufacturing efficiency.

This study represents the integration of physical information with data-driven models in robotic welding scenarios, providing a solid foundation for the realization of intelligent welding technologies. The proposed PHOENIX framework demonstrates high generalizability and holds significant potential for broader applications, including arc-based additive manufacturing and other advanced manufacturing domains. Additionally, the framework is applicable to rapidly predicting mechanical properties and microstructures, as well as for the online detection of defects such as pores and cracks, depending on the specific detection and monitoring objectives. By constraining the input information with physical information, the model can effectively mitigate data quantity and quality issues, significantly improving the accuracy of the model, lowering its missed detection rates, and enhancing its adaptability and generalizability across diverse scenarios.

Methods

Melt pool dynamic behavior acquisition

As the core physical information for guiding the construction of physical models, the dynamic stability exhibited by the behavior of the VPPA welding melt pool was monitored via an in situ X-ray high-speed acquisition system (Fig.2a). To obtain clearer X-ray projection results and reflect the thickness of the welding process as accurately as possible, the dimensions of the base material, 5052 aluminum alloy, were set to 100 mm × 20 mm ×  mm. In the experiment, the X-ray source was set to a tube current of 1.5 mA and a tube voltage of 230 kV, with images recorded by a high-speed camera at a rate of 1000 FPS after penetration (see Supplementary Fig.1). The resolution of the camera was 800 × 600 pixels, capturing an area of 22 mm × 20 mm behind the image intensifier. To visualize the flow behavior of the molten metal in the melt pool, particle tracking was employed (Fig.5b). Through the high-speed in situ X-ray acquisition system, tungsten particles (0.3 mm) were used to track the flow behavior and velocity of the molten metal. Owing to their relatively high atomic number and density, tungsten particles exhibited significant X-ray absorption, thereby enhancing image contrast and making them ideal tracer particles for tracking material flows.

During the welding process, holes with diameters of 0.8 mm were drilled in the base material, into which four tungsten particles were placed. The holes were then sealed with welding wire. When the arc heated the area and melted the metal, the tungsten particles were released into the melt pool. By tracking and marking the position of each particle, the flow trajectory was obtained. The displacement distance of the particles over a fixed time interval was calculated to determine the flow velocity of the molten metal. To simplify this process, a continuous tracking system based on the YOLO+DeepSort model69 was developed to automatically extract relevant information. The experimental results indicate that the four tungsten particles, when placed within the same hole, did not significantly affect the stability of the melt pool. A dual-saddle-point flow model was constructed on the basis of the flow velocity and flow trajectory. Additionally, the relative flow channel area of the front wall of the melt pool was obtained by calculating the projection information of the three-dimensional space occupied by the melt pool (Fig.4bii). The relative flow channel area was divided into two regions: the flow channel at the front wall and the saddle point located in region A1. The flow channel at the rear wall was divided into two reverse flow regions on the basis of the saddle point position; the region flowing toward the melt pool entrance was labeled A2, whereas the region flowing toward the melt pool exit was labeled A3.
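The velocity calculation described above, displacement over a fixed time interval, can be sketched as follows. The pixel-to-millimeter scale is an assumption derived from the stated 800 × 600 pixel resolution and 22 mm × 20 mm field of view, taking the mapping to be uniform; the function name `particle_speeds` is hypothetical, and the result is the speed in the projected image plane rather than a full three-dimensional velocity.

```python
import math

# Assumed calibration: 800 x 600 px covering 22 mm x 20 mm, uniformly mapped.
MM_PER_PX_X = 22.0 / 800.0
MM_PER_PX_Y = 20.0 / 600.0
FRAME_RATE_HZ = 1000.0  # high-speed camera rate stated in the text


def particle_speeds(track_px, frame_step=1):
    """Return projected speeds (mm/s) between positions frame_step frames apart.

    track_px: list of (x, y) pixel positions of one particle, one per frame.
    """
    if frame_step < 1:
        raise ValueError("frame_step must be at least 1")
    dt = frame_step / FRAME_RATE_HZ  # fixed time interval in seconds
    speeds = []
    for (x0, y0), (x1, y1) in zip(track_px, track_px[frame_step:]):
        dx_mm = (x1 - x0) * MM_PER_PX_X
        dy_mm = (y1 - y0) * MM_PER_PX_Y
        speeds.append(math.hypot(dx_mm, dy_mm) / dt)
    return speeds


# Made-up track for illustration only.
speeds = particle_speeds([(0, 0), (4, 0), (4, 3)])
```

A larger `frame_step` averages over a longer interval, which trades temporal resolution for less sensitivity to single-pixel localization noise in the tracker output.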

Machine vision with transfer learning

In this study, we integrated the melt pool dynamics features derived from an in situ X-ray high-speed acquisition system with the melt pool morphology obtained from an industrial camera. As a result, constructing a highly versatile machine vision-assisted system for efficiently extracting effective features became a key task. Although the image quality was optimized by reducing the thickness of the base material, image blurring still occurred at the edges of the X-ray images output from the in situ system. Additionally, during the welding polarity transitions, the image intensifier was affected by periodic fluctuations, making the accuracy of image segmentation an important metric for evaluating the performance of the system (Fig.4bi). To address this challenge, we employed a combination consisting of the VGG16 and U-Net network models to precisely segment the melt pool edges and base material from the high-speed image acquisition system. This model integrates the convolutional layers of VGG16 into the encoder part of U-Net, thereby fusing both models and simplifying the U-Net architecture through transfer learning (Fig.2e). Since the U-Net decoder requires skip connections with the encoder to ensure compatibility in terms of the number of channels and the resolution, we added additional convolutional layers to ensure that the output of the VGG16 encoder could effectively interface with the U-Net decoder. VGG16, a classic pretrained network, was initialized with weights trained on large datasets (ImageNet), accelerating the convergence of the melt pool segmentation model and significantly improving the segmentation performance achieved on expensive datasets.

The model achieved segmentation accuracies of 98.9% and 99.6% on the validation sets of the expensive and cost-effective datasets, respectively. When deployed on edge computing hardware, the processing time required for a single image was 0.0255 s. This process was realized through the online detection program we developed, which was integrated as a plugin into the machine vision module. In this program, for the expensive data, the point at the upper-left corner of the melt pool on the inlet side was set as a fixed origin, and a coordinate system defined relative to this point was used to compute all physical information inputs. For the cost-effective data, feature information was extracted from pixel values.

Sliding-window time-ahead prediction module

In this study, we developed a multi-input time series prediction system based on a sliding window approach, using a combined LSTM-MLP model (Fig.2f). The dynamic features used for LSTM-MLP-based prediction are captured continuously in real time by the machine vision module and temporarily stored. Once the accumulated data reach the preset sliding window size, they are formatted as an array and transmitted to the time-ahead prediction module, which forecasts the melt pool state at a designated future time step. As time progresses, the sliding window is updated by replacing the oldest data, so that the time-ahead prediction module always receives the most recent time-series inputs and generates stable and coherent predictions. The LSTM captures the temporal dependencies of the input physical information features, whereas the MLP further processes the spatial information acquired from these inputs. To integrate the expensive and cost-effective data, we standardized the input physical information features and constructed an input data matrix. The LSTM part of the model contains three layers, each with a hidden state size of 64. The output of the LSTM is passed to the MLP, which consists of two fully connected layers: the first expands the hidden state dimension to 128, and the second maps the result to the three output classes corresponding to the welding states. By combining the temporal processing ability of the LSTM with the nonlinear mapping ability of the MLP, the model extracts meaningful features from complex time series data for accurate classification.
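The sliding-window buffering can be sketched as follows. The class name `SlidingWindow`, the window size of 3, and the two-element feature vectors are hypothetical; the sketch only illustrates how the oldest entry is replaced once the window is full and is not the implementation used in the study.

```python
from collections import deque

class SlidingWindow:
    """Fixed-size buffer that yields the latest window once it is full."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest entry automatically.
        self.buffer = deque(maxlen=size)

    def push(self, features):
        """Add one feature vector; return the window as a list of rows
        once the buffer is full, otherwise None."""
        self.buffer.append(list(features))
        if len(self.buffer) == self.buffer.maxlen:
            return [row[:] for row in self.buffer]
        return None

# Hypothetical stream of feature vectors arriving one per time step.
window = SlidingWindow(size=3)
outputs = [window.push([t, t * 0.5]) for t in range(5)]
```

Each non-`None` entry of `outputs` is the array that would be handed to the prediction model at that time step.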

Both the LSTM and MLP layers share a loss function, specifically the cross-entropy loss function. Through joint optimization, the output of the LSTM layer provides valuable information to the MLP layer, improving the resulting classification accuracy. To further process the physical information features and evaluate the performance of the time series prediction module, we used t-SNE for dimensionality reduction and visualization purposes (Fig.5d). Additionally, t-SNE was used for prognostic stratification validation, demonstrating the discriminative ability of the model across different welding states. Furthermore, we performed an ablation study by gradually removing physical information features, aiming to evaluate the contributions of the expensive and cost-effective features to the performance of the model (Fig.5c). This also deepens the understanding of the physical models within the time-ahead model framework. In this study, a confusion matrix was used as a tool to evaluate the classification performance of the time series prediction module across different welding states, particularly in terms of accuracy, precision, recall, and the false-negative rate (Fig.5b).
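As a brief illustration of how the reported metrics follow from a confusion matrix, the sketch below computes the accuracy, precision, recall, and false-negative rate for a three-class problem. The labels and predictions are made-up values, and the function names are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_metrics(matrix, cls):
    """One-vs-rest precision, recall, and false-negative rate for class cls."""
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                   # true cls, predicted otherwise
    fp = sum(row[cls] for row in matrix) - tp    # predicted cls, true otherwise
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "false_negative_rate": fnr}

# Made-up labels for three welding states encoded as 0, 1, 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
```

Note that the false-negative rate is simply one minus the recall for each class.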

Physics-infused data-driven modeling

To construct a highly versatile physical model for replacing expensive data, we designed a MIMO CBN-BPNN. This network handles various welding conditions and features via CBN, which dynamically adjusts the output during training using condition-dependent scaling. Specifically, quasistatic features, including the EN current (\({I}_{{EN}}\)), EP current (\({I}_{{EP}}\)), and ion gas flow rate (\(Q\)), were incorporated as additional input channels to modulate the output. A learnable projection module transforms the condition vector into a scaling vector through a sigmoid activation function (Fig.2f):

$$\alpha \left({I}_{{EN}},{I}_{{EP}},Q\right)={\sigma} ({W}_{\alpha }\cdot \left[{I}_{{EN}},{I}_{{EP}},Q\right]+{b}_{\alpha })$$

(1)

Outputs are additionally tuned at each step in the training process:

$${y}_{i}=\hat{{y}_{i}}\cdot \left(1+{\alpha }_{i}({I}_{{EN}},{I}_{{EP}},Q)\right)$$

(2)

where \(\hat{{y}_{i}}\) is the raw output from the main network branch, \({\alpha }_{i}\) is the learned condition-dependent modulation factor, and \({W}_{\alpha }\) and \({b}_{\alpha }\) are the projection weights and bias, respectively. This approach allows the model to adjust its output prediction according to varying process conditions, thereby enhancing its expressiveness and adaptability under different welding environments. The backbone of the BPNN consists of three fully connected layers with 128, 64, and 32 neurons, respectively. To evaluate the performance of the model on both the training and validation sets, we use the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (\({R}^{2}\)).
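Equations (1) and (2) can be traced numerically with a minimal sketch. The weights, bias, condition values, and raw outputs below are placeholders chosen for illustration, and the function names are assumptions; in the actual model, \({W}_{\alpha }\) and \({b}_{\alpha }\) would be learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def modulation(conditions, weights, bias):
    """Eq. (1): alpha = sigmoid(W_alpha . [I_EN, I_EP, Q] + b_alpha).

    weights has one row per output; each row has one entry per condition.
    """
    return [
        sigmoid(sum(w * c for w, c in zip(row, conditions)) + b)
        for row, b in zip(weights, bias)
    ]

def modulate_outputs(raw_outputs, alpha):
    """Eq. (2): y_i = y_hat_i * (1 + alpha_i)."""
    return [y * (1.0 + a) for y, a in zip(raw_outputs, alpha)]

# Placeholder values: three (assumed normalized) conditions, two outputs.
conditions = [0.5, 0.2, 0.1]            # stand-ins for I_EN, I_EP, Q
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
bias = [0.0, 0.0]
alpha = modulation(conditions, weights, bias)   # sigmoid(0) = 0.5 for each output
y = modulate_outputs([2.0, 4.0], alpha)
```

Because the sigmoid output lies strictly between 0 and 1, the factor \(1+{\alpha }_{i}\) in Eq. (2) scales each raw output by a value between 1 and 2.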

Incremental learning via cloud deployment

To facilitate the model’s application in complex scenarios, we propose an intergroup incremental learning strategy (Fig.2h). Incremental learning enables fine-tuning of model parameters to adapt to new samples without requiring a complete retraining of all layer weights. The core principle of intergroup incremental learning is that with each newly acquired data group, the model undergoes training based on previously learned parameters, ensuring that the updated model serves as the foundation for subsequent training. This iterative process allows the model to progressively learn the characteristics of different data groups, leveraging automated labeling to enhance data efficiency while maintaining adaptability and preserving prior knowledge. The update mechanism is structured as follows:

$${\theta }_{t}={\theta }_{t-1}-\eta {\nabla }_{\theta }{L}_{{CEL}}({\theta }_{t-1},{X}_{{G}_{t}},{y}_{{G}_{t}})$$

(3)

where \({\theta }_{t}\) denotes the parameters of the time series prediction model after training on the t-th data group, and \({\theta }_{t-1}\) represents the parameters obtained after training on the previous group. \(\eta\) is the learning rate, and \({L}_{{CEL}}({\theta }_{t-1},{X}_{{G}_{t}},{y}_{{G}_{t}})\) is the loss function for the t-th data group, which is computed using the parameters \({\theta }_{t-1}\) acquired from the previous update. To reduce the computational overhead, we freeze the LSTM layers of the time series prediction model, allowing the model to focus on updating specific components. The frozen layers do not participate in gradient updates, whereas the remaining layers continue to be updated. To determine the optimal number of frozen layers in the LSTM for the current data group, we systematically freeze different layers to identify the best configuration, ensuring that the performance of the model is maximized while enabling fine-grained control (Fig.9a). To prevent catastrophic forgetting and enhance the efficiency of the model, we implement a sample replay strategy and a data pruning strategy. The data pruning strategy balances the distribution of the three-state data after self-correction, addressing the significant data imbalances across different states. The sample replay strategy (Fig.9a) retains past samples and mixes them with the new samples during training, which helps the model maintain its memory of previous tasks while improving its generalization ability and preventing overfitting to new tasks. Additionally, sample replay increases the diversity of the samples, which enhances the adaptability of the model to unknown scenarios and further improves its performance in incremental learning tasks.
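The three mechanisms described above (the update of Eq. (3) with frozen layers, sample replay, and data pruning) can be sketched in a framework-free form. All names, the parameter grouping, and the specific pruning rule (random downsampling to the rarest class) are assumptions for illustration; the study does not specify these details here.

```python
import random

def sgd_step(params, grads, lr, frozen):
    """Eq. (3) applied per parameter group; frozen groups are left unchanged."""
    return {
        name: (list(value) if name in frozen
               else [v - lr * g for v, g in zip(value, grads[name])])
        for name, value in params.items()
    }

def mix_with_replay(new_samples, replay_buffer, n_replay, rng):
    """Sample replay: combine the new group with a random subset of past samples."""
    replayed = rng.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    batch = list(new_samples) + replayed
    rng.shuffle(batch)
    return batch

def prune_to_balance(samples, rng):
    """Data pruning (assumed rule): downsample each class to the rarest class size."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced

rng = random.Random(0)
# Hypothetical parameter groups and gradients; the LSTM group is frozen.
params = {"lstm": [1.0, 2.0], "mlp": [0.5, -0.5]}
grads = {"lstm": [10.0, 10.0], "mlp": [1.0, -1.0]}
updated = sgd_step(params, grads, lr=0.1, frozen={"lstm"})

old_group = [([0.1], 0), ([0.2], 1), ([0.3], 2), ([0.4], 0)]
new_group = [([0.5], 0), ([0.6], 0), ([0.7], 1), ([0.8], 2)]
batch = mix_with_replay(new_group, old_group, n_replay=2, rng=rng)
balanced = prune_to_balance(batch, rng)
```

In a deep learning framework, freezing would typically be done by disabling gradient tracking for the chosen layers rather than by skipping them in the update loop.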
