Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA
B. Zhang, P.E. Davis, N. Morales, Z. Zhang, K. Teranishi, M. Parashar. In Euro-Par 2023: Parallel Processing, Springer Nature Switzerland, pp. 323–338. 2023.
The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.
Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric
J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer. In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023.
The National Science Data Fabric (NSDF) is our solution to the problem of addressing the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites with NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, which can further improve the efficiency of data sharing.
Multi-Omic Integration of Blood-Based Tumor-Associated Genomic and Lipidomic Profiles Using Machine Learning Models in Metastatic Prostate Cancer
S. Fang, S. Zhe, H.M. Lin, A.A. Azad, H. Fettke, E.M. Kwan, L. Horvath, B. Mak, T. Zheng, P. Du, S. Jia, R.M. Kirby, M. Kohli. In Clinical Cancer Informatics, 2023.
On the Decentralized Stochastic Gradient Descent with Markov Chain Sampling
T. Sun, D. Li, B. Wang. In IEEE Transactions on Signal Processing, IEEE, July, 2023.
The decentralized stochastic gradient method emerges as a promising solution for solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD), a variant of the decentralized stochastic gradient method, which draws random samples along the trajectory of a Markov chain. DMGD arises when obtaining independent samples is costly or impossible, excluding the use of the traditional stochastic gradient algorithms. Specifically, we consider the DMGD over a connected graph, where each node only communicates with its neighbors by sending and receiving the intermediate results. We establish both ergodic and nonergodic convergence rates of DMGD, which elucidate the critical dependencies on the topology of the graph that connects all nodes and the mixing time of the Markov chain. We further numerically verify the sample efficiency of DMGD.
Development of Large-Scale Scientific Cyberinfrastructure and the Growing Opportunity to Democratize Access to Platforms and Data
J. Luettgau, G. Scorzelli, V. Pascucci, M. Taufer. In Distributed, Ambient and Pervasive Interactions, Springer Nature Switzerland, pp. 378–389. 2023.
As researchers across scientific domains rapidly adopt advanced scientific computing methodologies, access to advanced cyberinfrastructure (CI) becomes a critical requirement in scientific discovery. Lowering the entry barriers to CI is a crucial challenge in interdisciplinary sciences requiring frictionless software integration, data sharing from many distributed sites, and access to heterogeneous computing platforms. In this paper, we explore how the challenge is not merely a factor of availability and affordability of computing, network, and storage technologies but rather the result of insufficient interfaces with an increasingly heterogeneous mix of computing technologies and data sources. With more distributed computation and data, scientists, educators, and students must invest their time and effort in coordinating data access and movements, often penalizing their scientific research. Investments in the interfaces’ software stack are necessary to help scientists, educators, and students across domains take advantage of advanced computational methods. To this end, we propose developing a science data fabric as the standard scientific discovery interface that seamlessly manages data dependencies within scientific workflows and CI.
Interpreting and generalizing deep learning in physics-based problems with functional linear models
Subtitled arXiv:2307.04569, A. Arzani, L. Yuan, P. Newell, B. Wang. 2023.
Although deep learning has achieved remarkable success in various scientific machine learning applications, its black-box nature poses concerns regarding interpretability and generalization capabilities beyond the training data. Interpretability is crucial and often desired in modeling physical systems. Moreover, acquiring extensive datasets that encompass the entire range of input features is challenging in many physics-based learning tasks, leading to increased errors when encountering out-of-distribution (OOD) data. In this work, motivated by the field of functional data analysis (FDA), we propose generalized functional linear models as an interpretable surrogate for a trained deep learning model. We demonstrate that our model could be trained either based on a trained neural network (post-hoc interpretation) or directly from training data (interpretable operator learning). A library of generalized functional linear models with different kernel functions is considered and sparse regression is used to discover an interpretable surrogate model that could be analytically presented. We present test cases in solid mechanics, fluid mechanics, and transport. Our results demonstrate that our model can achieve comparable accuracy to deep learning and can improve OOD generalization while providing more transparency and interpretability. Our study underscores the significance of interpretability in scientific machine learning and showcases the potential of functional linear models as a tool for interpreting and generalizing deep learning.
Error Estimation for the Material Point and Particle in Cell Methods
M. Berzins. In admos2023, 2023.
The Material Point Method (MPM) is widely used for challenging applications in engineering and animation. The complexity of the method makes error estimation challenging. Error analysis of a simple MPM method is undertaken and the global error is shown to be first order in space and time for a widely-used variant of the method. Computational experiments illustrate the estimated accuracy.
Ensemble physics informed neural networks: A framework to improve inverse transport modeling in heterogeneous domains
M. Aliakbari, M.S. Sadrabadi, P. Vadasz, A. Arzani. In Physics of Fluids, AIP, 2023.
Modeling fluid flow and transport in heterogeneous systems is often challenged by unknown parameters that vary in space. In inverse
Thicket: Seeing the Performance Experiment Forest for the Individual Run Trees
S. Brink, M. McKinsey, D. Boehme, C. Scully-Allison, I. Lumsden, D. Hawkins, T. Burgess, V. Lama, J. Luettgau, K.E. Isaacs, M. Taufer, O. Pearce. In HPDC ’23, ACM, 2023.
Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data.
Orchestration of materials science workflows for heterogeneous resources at large scale
N. Zhou, G. Scorzelli, J. Luettgau, R.R. Kancharla, J. Kane, R. Wheeler, B. Croom, B. Newell, V. Pascucci, M. Taufer. In The International Journal of High Performance Computing Applications, Sage, 2023.
In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage, scaling up compute resources on the cloud. The multi-layer software structure of our framework includes three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A low layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud’s software stack from the user, who would otherwise need expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.
Extending Hedgehog’s dataflow graphs to multi-node GPU architectures
N. Shingde, M. Berzins, T. Blattner, W. Keyrouz, A. Bardakoff. In Workshop on Asynchronous Many-Task Systems and Applications (WAMTA23), 2023.
Asynchronous task-based systems offer the possibility of making it easier to take advantage of scalable heterogeneous architectures.
Learning Proper Orthogonal Decomposition of Complex Dynamics Using Heavy-ball Neural ODEs
J. Baker, E. Cherkaev, A. Narayan, B. Wang. In Journal of Scientific Computing, Vol. 95, No. 14, 2023.
Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. This paper extends the recently proposed heavy-ball neural ODEs (HBNODEs) [Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots constructed by solving full-order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-range dependencies effectively from sequential observations, which is crucial for learning intrinsic patterns from sequential data, and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von Kármán Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.
An approximate control variates approach to multifidelity distribution estimation
Subtitled arXiv:2303.06422v1, R. Han, A. Narayan, Y. Xu. 2023.
Forward simulation-based uncertainty quantification that studies the output distribution of quantities of interest (QoI) is a crucial component for computationally robust statistics and engineering. There is a large body of literature devoted to accurately assessing statistics of QoI, and in particular, multilevel or multifidelity approaches are known to be effective, leveraging cost-accuracy tradeoffs between a given ensemble of models. However, effective algorithms that can estimate the full distribution of outputs are still under active development. In this paper, we introduce a general multifidelity framework for estimating the cumulative distribution functions (CDFs) of vector-valued QoI associated with a high-fidelity model under a budget constraint. Given a family of appropriate control variates obtained from lower fidelity surrogates, our framework involves identifying the most cost-effective model subset and then using it to build an approximate control variates estimator for the target CDF. We instantiate the framework by constructing a family of control variates using intermediate linear approximators and rigorously analyze the corresponding algorithm. Our analysis reveals that the resulting CDF estimator is uniformly consistent and budget-asymptotically optimal, with only mild moment and regularity assumptions. The approach provides a robust multifidelity CDF estimator that is adaptive to the available budget, does not require a priori knowledge of cross-model statistics or model hierarchy, and is applicable to general output dimensions. We demonstrate the efficiency and robustness of the approach using several test examples.
The effects of passive design on indoor thermal comfort and energy savings for residential buildings in hot climates: A systematic review
M. Hu, K. Zhang, Q. Nguyen, T. Tasdizen. In Urban Climate, Vol. 49, pp. 101466. 2023.
In this study, a systematic review and meta-analysis were conducted to identify, categorize, and investigate the effectiveness of passive cooling strategies (PCSs) for residential buildings. Forty-two studies published between 2000 and 2021 were reviewed; they examined the effects of PCSs on indoor temperature decrease, cooling load reduction, energy savings, and thermal comfort hour extension. In total, 30 passive strategies were identified and classified into three categories: design approach, building envelope, and passive cooling system. The review found that using various passive strategies can achieve, on average, (i) an indoor temperature decrease of 2.2 °C, (ii) a cooling load reduction of 31%, (iii) energy savings of 29%, and (iv) a thermal comfort hour extension of 23%. Moreover, the five most effective passive strategies were identified as well as the differences between hot and dry climates and hot and humid climates.
A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions
Subtitled arXiv:2302.14227v1, M. Penwarden, A.D. Jagtap, S. Zhe, G.E. Karniadakis, R.M. Kirby. 2023.
Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges – in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation giving an inferior, and sometimes trivial, solution when solving forward time-dependent PDEs with no data. This problem is also found in, and in some sense more difficult, with domain decomposition strategies such as temporal decomposition using XPINNs. To address this problem, we first enable a general categorization for previous causality methods, from which we identify a gap (e.g., opportunity) in the previous approaches. We then furnish examples and explanations for different training challenges, their cause, and how they relate to information propagation and temporal decomposition. We propose a solution to fill this gap by reframing these causality concepts into a generalized information propagation framework in which any prior method or combination of methods can be described. This framework is easily modifiable via user parameters in the open-source code accompanying this paper. Our unified framework moves toward reducing the number of PINN methods to consider and the reimplementation and retuning cost for thorough comparisons rather than increasing it. Using the idea of information propagation, we propose a new stacked-decomposition method that bridges the gap between time-marching PINNs and XPINNs. We also introduce significant computational speed-ups by using transfer learning concepts to initialize subnetworks in the domain and loss tolerance-based propagation for the subdomains. 
Finally, we formulate a new time-sweeping collocation point algorithm inspired by the previous PINNs causality literature, which our framework can still describe, and provides a significant computational speed-up via reduced-cost collocation point segmentation. The proposed methods overcome training challenges in PINNs and XPINNs for time-dependent PDEs by respecting the causality in multiple forms and improving scalability by limiting the computation required per optimization iteration. Finally, we provide numerical results for these methods on baseline PDE problems for which unmodified PINNs and XPINNs struggle to train.
Genetic Programming Based Symbolic Regression for Analytical Solutions to Differential Equations
Subtitled arXiv:2302.03175v1, H. Oh, R. Amici, G. Bomarito, S. Zhe, R. Kirby, J. Hochhalter. 2023.
In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning, we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is verified by assessing its ability to recover known analytic solutions for two separate differential equations. The developed method is compared to a conventional, purely data-driven genetic programming based symbolic regression algorithm. The reliability of successful evolution of the true solution, or an algebraic equivalent, is demonstrated.
Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils
Subtitled arXiv:2302.00807v1, K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis. 2023.
Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work.
A Metalearning Approach for Physics-Informed Neural Networks (PINNs): Application to Parameterized PDEs
M. Penwarden, S. Zhe, A. Narayan, R.M. Kirby. In Journal of Computational Physics, Elsevier, 2023.
Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) – and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs on new tasks, for which parameterized PDEs provide a good testbed application as tasks can be easily defined in this context. Following the ML world, we introduce metalearning of PINNs with application to parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs as well as implementation considerations and algorithmic complexity. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.
Multi-Task Classification for Improved Health Outcome Prediction Based on Environmental Indicators
M. Alirezaei, Q.C. Nguyen, R. Whitaker, T. Tasdizen. In IEEE Access, 2022.
The influence of the neighborhood environment on health outcomes has been widely recognized in various studies. Google street view (GSV) images offer a unique and valuable tool for evaluating neighborhood environments on a large scale. By annotating the images with labels indicating the presence or absence of certain neighborhood features, we can develop classifiers that can automatically analyze and evaluate the environment. However, labeling GSV images on a large scale is a time-consuming and labor-intensive task. Considering these challenges, we propose using a multi-task classifier to improve training a classifier with limited supervised GSV data. Our multi-task classifier utilizes readily available, inexpensive online images collected from Flickr as a related classification task. The hypothesis is that a classifier trained on multiple related tasks is less likely to overfit to small amounts of training data and generalizes better to unseen data. We leverage the power of multiple related tasks to improve the classifier’s overall performance and generalization capability. Here we show that, with the proposed learning paradigm, predicted labels for GSV test images are more accurate. Across different environment indicators, the accuracy, F1 score and balanced accuracy increase up to 6 % in the multi-task learning framework compared to its single-task learning counterpart. The enhanced accuracy of the predicted labels obtained through the multi-task classifier contributes to a more reliable and precise regression analysis determining the correlation between predicted built environment indicators and health outcomes. The R² values calculated for different health outcomes improve by up to 4 % using multi-task learning detected indicators.
Accelerating Physics Schemes in Numerical Weather Prediction Codes and Preserving Positivity in the Physics-Dynamics coupling
Timbwaoga Aime Judicael (TAJO) Ouermi. University of Utah, 2022.