UncertainSCI: A Python Package for Noninvasive Parametric Uncertainty Quantification of Simulation Pipelines|
J. Tate, Z. Liu, J.A. Bergquist, S. Rampersad, D. White, C. Charlebois, L. Rupp, D. Brooks, R. MacLeod, A. Narayan. In Journal of Open Source Software, Vol. 8, No. 90, 2023.
We have developed UncertainSCI (UncertainSCI, 2020) as an open-source tool designed to make modern uncertainty quantification (UQ) techniques more accessible in biomedical simulation applications. UncertainSCI is implemented in Python with a noninvasive interface to meet our software design goals of 1) numerical accuracy, 2) simple application programming interface (API), 3) adaptability to many applications and methods, and 4) interfacing with diverse simulation software. Using a Python implementation in UncertainSCI allowed us to utilize the popularity and low barrier-to-entry of Python and its common packages and to leverage the built-in integration and support for Python in common simulation software packages and languages. Additionally, we used noninvasive UQ techniques and created a similarly noninvasive interface to external modeling software that can be called in diverse ways, depending on the complexity and level of Python integration in the external simulation pipeline. We have developed and included examples applying UncertainSCI to relatively simple 1D simulations implemented in Python, and to bioelectric field simulations implemented in external software packages, which demonstrate the use of UncertainSCI and the effectiveness of the architecture and implementation in achieving our design goals. UncertainSCI differs from similar software, notably UQLab, Uncertainpy, and SimNIBS, in that it can be efficiently and noninvasively used with external simulation software, specifically with the high-resolution 3D simulations often used in bioelectric field simulations. Figure 1 illustrates the use of UncertainSCI in computing UQ with modeling pipelines for bioelectricity simulations.
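To make the idea of a noninvasive interface concrete, the following is a minimal sketch of one common noninvasive technique, a polynomial chaos projection for a single parameter. It is not UncertainSCI's API or its sampling method: the names `legendre`, `pce_statistics`, and `simulate` are hypothetical, the parameter is assumed to be uniform on [-1, 1], and a fixed 3-point Gauss-Legendre rule stands in for a real sampling strategy. The point it illustrates is that the simulation is only ever called as a black box at chosen parameter values.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1] (exact for polynomials up to degree 5).
NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
WEIGHTS = (5 / 9, 8 / 9, 5 / 9)


def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def pce_statistics(model, degree=2):
    """Return (mean, variance) of model(X) for X uniform on [-1, 1].

    Assumption: `model` is any black-box callable. With only three nodes,
    the projection is reliable for `degree` <= 2 and smooth models.
    """
    # The only contact with the simulation: evaluate it at the chosen nodes.
    outputs = [model(x) for x in NODES]

    # Project the outputs onto Legendre polynomials:
    # c_n = (2n + 1) / 2 * integral of model(x) * P_n(x) over [-1, 1].
    coeffs = []
    for n in range(degree + 1):
        integral = sum(
            w * y * legendre(n, x) for w, y, x in zip(WEIGHTS, outputs, NODES)
        )
        coeffs.append((2 * n + 1) / 2 * integral)

    # For a uniform density, E[P_n^2] = 1 / (2n + 1), so the statistics
    # follow directly from the coefficients.
    mean = coeffs[0]
    variance = sum(c * c / (2 * n + 1) for n, c in enumerate(coeffs) if n > 0)
    return mean, variance


def simulate(x):
    """Hypothetical stand-in for an external simulation pipeline."""
    return x * x


mean, variance = pce_statistics(simulate)  # mean = 1/3, variance = 4/45
```

Because the model is touched only through `model(x)`, the same routine would work whether the simulation is a Python function or a wrapper that launches external software and reads back its result.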
Instance-wise Linearization of Neural Network for Model Interpretation|
Z. Li, S. Liu, K. Bhavya, T. Bremer, V. Pascucci. arXiv:2310.16295v1, 2023.
Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models remains a major bottleneck to deploying such techniques in daily life. The challenge stems from the non-linear behavior of neural networks, which raises the critical question of how a model uses input features to make a decision. The classical approach to addressing this challenge is feature attribution, which assigns an importance score to each input feature to reveal its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed by the model internally. These attribution approaches therefore often raise the concern of whether they highlight the correct features for a model prediction.
Attribute-Aware RBFs: Interactive Visualization of Time Series Particle Volumes Using RT Core Range Queries|
N. Morrical, S. Zellmann, A. Sahistan, P. Shriwise, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2023.
Smoothed-particle hydrodynamics (SPH) is a mesh-free method used to simulate volumetric media in fluids, astrophysics, and solid mechanics. Visualizing these simulations is problematic because these datasets often contain millions, if not billions of particles carrying physical attributes and moving over time. Radial basis functions (RBFs) are used to model particles, and overlapping particles are interpolated to reconstruct a high-quality volumetric field; however, this interpolation process is expensive and makes interactive visualization difficult. Existing RBF interpolation schemes do not account for color-mapped attributes and are instead constrained to visualizing just the density field. To address these challenges, we exploit ray tracing cores in modern GPU architectures to accelerate scalar field reconstruction. We use a novel RBF interpolation scheme to integrate per-particle colors and densities, and leverage GPU-parallel tree construction and refitting to quickly update the tree as the simulation animates over time or when the user manipulates particle radii. We also propose a Hilbert reordering scheme to cluster particles together at the leaves of the tree to reduce tree memory consumption. Finally, we reduce the noise of volumetric shadows by adopting a spatially temporal blue noise sampling scheme. Our method can provide a more detailed and interactive view of these large, volumetric, time-series particle datasets than traditional methods, leading to new insights into these physics simulations.
Ray Tracing Spherical Harmonics Glyphs|
C. Peters, T. Patel, W. Usher, C. R. Johnson. In Vision, Modeling, and Visualization, The Eurographics Association, 2023.
Spherical harmonics glyphs are an established way to visualize high angular resolution diffusion imaging data. Starting from a unit sphere, each point on the surface is scaled according to the value of a linear combination of spherical harmonics basis functions. The resulting glyph visualizes an orientation distribution function. We present an efficient method to render these glyphs using ray tracing. Our method constructs a polynomial whose roots correspond to ray-glyph intersections. This polynomial has degree 2k + 2 for spherical harmonics bands 0, 2, ..., k. We then find all intersections in an efficient and numerically stable fashion through polynomial root finding. Our formulation also gives rise to a simple formula for normal vectors of the glyph. Additionally, we compute a nearly exact axis-aligned bounding box to make ray tracing of these glyphs even more efficient. Since our method finds all intersections for arbitrary rays, it lets us perform sophisticated shading and uncertainty visualization. Compared to prior work, it is faster, more flexible and more accurate.
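One way to see where the degree 2k + 2 can come from is sketched below. This is a reading of the abstract, not necessarily the paper's own derivation: the symbols f, p, o, d, and the homogeneous extensions \tilde{Y}_l^m are notation introduced here, and the glyph is assumed to be centered at the origin.

```latex
\begin{align*}
% Radius function on the unit sphere, using only even bands 0, 2, ..., k
f(\omega) &= \sum_{l \in \{0, 2, \ldots, k\}} \; \sum_{m=-l}^{l} c_l^m \, Y_l^m(\omega),
  \qquad \|\omega\| = 1 \\
% Homogeneous polynomial of degree k; \tilde{Y}_l^m is the degree-l homogeneous
% polynomial that agrees with Y_l^m on the unit sphere, and k - l is even
p(x) &= \sum_{l \in \{0, 2, \ldots, k\}} \|x\|^{k-l} \sum_{m=-l}^{l} c_l^m \, \tilde{Y}_l^m(x),
  \qquad f\!\left(\frac{x}{\|x\|}\right) = \frac{p(x)}{\|x\|^{k}} \\
% A point x lies on the glyph when its distance from the center matches f
\|x\| = f\!\left(\frac{x}{\|x\|}\right)
  &\;\Longrightarrow\; \|x\|^{k+1} = p(x)
  \;\Longrightarrow\; \left(\|x\|^{2}\right)^{k+1} = p(x)^{2} \\
% Substituting the ray x = o + t d gives a polynomial equation in t
0 &= \left(\|o + t\,d\|^{2}\right)^{k+1} - p(o + t\,d)^{2}
\end{align*}
```

Both terms in the last line are polynomials in t of degree 2(k + 1) = 2k + 2, since \|o + t d\|^2 is quadratic in t and p has degree k. The squaring step also admits points where p(x) = -\|x\|^{k+1}, which under this reading correspond to directions in which the linear combination is negative.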
"Yeah, this graph doesn't show that": Analysis of Online Engagement with Misleading Data Visualizations|
M. Lisnic, A. Lex, M. Kogan. In OSF Preprints, 2023.
Attempting to make sense of a phenomenon or crisis, social media users often share data visualizations and interpretations that can be erroneous or misleading. Prior work has studied how data visualizations can mislead, but do misleading visualizations reach a broad social media audience? And if so, do users amplify or challenge misleading interpretations? To answer these questions, we conducted a mixed-methods analysis of the public’s engagement with data visualization posts about COVID-19 on Twitter. Compared to posts with accurate visual insights, our results show that posts with misleading visualizations garner more replies in which the audiences point out nuanced fallacies and caveats in data interpretations. Based on the results of our thematic analysis of engagement, we identify and discuss important opportunities and limitations to effectively leveraging crowdsourced assessments to address data-driven misinformation.
Strengthening the US Department of Energy's Recruitment Pipeline: The DOE/NNSA Predictive Science Academic Alliance Program (PSAAP) Experience|
J. K. Holmen, V. G. Vergara Larrea, E. W. Draeger, E. T. Phipps, P. J. Smith, M. Berzins, S. T. Smith, J. N. Thornock, S. Parete-Koon. In Practice and Experience in Advanced Research Computing, ACM, pp. 137-144. 2023.
The US Department of Energy (DOE) oversees a system of 17 national laboratories responsible for developing unique scientific capabilities beyond the scope of academic and industrial institutions. These labs strive to keep America at the forefront of discovery and are home to some of the Nation’s best minds and the world’s best scientific and research facilities. Collaborations between national laboratories and academic institutions are critical to develop and recruit talent for the DOE workforce. Academia’s cooperative education model poses challenges for DOE recruitment pipelines centered around traditional internships. This paper discusses a promising DOE recruitment pipeline, the National Nuclear Security Administration’s (NNSA) Predictive Science Academic Alliance Program (PSAAP) initiative. As a part of this, experiences capturing the successes and challenges faced by the University of Utah’s Carbon Capture Multidisciplinary Simulation Center (CCMSC) through their participation in the PSAAP-II initiative are shared. These experiences demonstrate the success of Utah’s PSAAP center as a recruitment pipeline with approximately 43% of CCMSC students going to a national laboratory after graduation. Potential opportunities to strengthen the DOE’s recruitment pipeline are also discussed.
reVISit: Supporting Scalable Evaluation of Interactive Visualizations|
Y. Ding, J. Wilburn, H. Shrestha, A. Ndlovu, K. Gadhave, C. Nobre, A. Lex, L. Harrison. In OSF Preprints, 2023.
reVISit is an open-source software toolkit and framework for creating, deploying, and monitoring empirical visualization studies. Running a quality empirical study in visualization can be demanding and resource-intensive, requiring substantial time, cost, and technical expertise from the research team. These challenges are amplified as research norms trend towards more complex and rigorous study methodologies, alongside a growing need to evaluate more complex interactive visualizations. reVISit aims to ameliorate these challenges by introducing a domain-specific language for study set-up, and a series of software components, such as UI elements, behavior provenance, and an experiment monitoring and management interface. Together with interactive or static stimuli provided by the experimenter, these are compiled to a ready-to-deploy web-based experiment. We demonstrate reVISit's functionality by re-implementing two studies – a graphical perception task and a more complex, interactive study. reVISit is an open-source community project, available at https://revisit.dev/
Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric|
J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer. In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023.
The National Science Data Fabric (NSDF) is our solution to the problem of addressing the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites with NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, which can further improve the efficiency of data sharing.
AI for Scientific Visualization|
C. R. Johnson, H. Shen. In Artificial Intelligence for Science, Edited by Alok Choudhary, Geoffrey Fox, and Tony Hey, World Scientific, pp. 535-552. 2023.
Fiber Uncertainty Visualization for Bivariate Data With Parametric and Nonparametric Noise Models|
T. M. Athawale, C.R. Johnson, S. Sane, D. Pugmire. In IEEE Transactions on Visualization and Computer Graphics, Vol. 29, No. 1, IEEE, pp. 613-23. 2023.
Visualization and analysis of multivariate data and their uncertainty are top research challenges in data visualization. Constructing fiber surfaces is a popular technique for multivariate data visualization that generalizes the idea of level-set visualization for univariate data to multivariate data. In this paper, we present a statistical framework to quantify positional probabilities of fibers extracted from uncertain bivariate fields. Specifically, we extend the state-of-the-art Gaussian models of uncertainty for bivariate data to other parametric distributions (e.g., uniform and Epanechnikov) and more general nonparametric probability distributions (e.g., histograms and kernel density estimation) and derive corresponding spatial probabilities of fibers. In our proposed framework, we leverage Green’s theorem for closed-form computation of fiber probabilities when bivariate data are assumed to have independent parametric and nonparametric noise. Additionally, we present a nonparametric approach combined with numerical integration to study the positional probability of fibers when bivariate data are assumed to have correlated noise. For uncertainty analysis, we visualize the derived probability volumes for fibers via volume rendering and extracting level sets based on probability thresholds. We present the utility of our proposed techniques via experiments on synthetic and simulation datasets.
FunMC2: A Filter for Uncertainty Visualization of Marching Cubes on Multi-Core Devices|
Z. Wang, T. M. Athawale, K. Moreland, J. Chen, C. R. Johnson, D. Pugmire. In Eurographics Symposium on Parallel Graphics and Visualization, 2023.
Visualization is an important tool for scientists to extract understanding from complex scientific data. Scientists need to understand the uncertainty inherent in all scientific data in order to interpret the data correctly. Uncertainty visualization has been an active and growing area of research to address this challenge. Algorithms for uncertainty visualization can be expensive, and research efforts have been focused mainly on structured grid types. Further, support for uncertainty visualization in production tools is limited. In this paper, we adapt an algorithm for computing key metrics for visualizing uncertainty in Marching Cubes (MC) to multi-core devices and present the design, implementation, and evaluation of a Filter for uncertainty visualization of Marching Cubes on Multi-Core devices (FunMC2). FunMC2 accelerates the uncertainty visualization of MC significantly, and it is portable across multi-core CPUs and GPUs. Evaluation results show that FunMC2 based on OpenMP runs around 11× to 41× faster on multi-core CPUs than the corresponding serial version using one CPU core. FunMC2 based on a single GPU is around 5× to 9× faster than FunMC2 running with OpenMP. Moreover, FunMC2 is flexible enough to process ensemble data with both structured and unstructured mesh types. Furthermore, we demonstrate that FunMC2 can be seamlessly integrated as a plugin into ParaView, a production visualization tool for post-processing.
A Visual Environment for Data Driven Protein Modeling and Validation|
M. Falk, V. Tobiasson, A. Bock, C. Hansen, A. Ynnerman. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1-11. 2023.
In structural biology, validation and verification of new atomic models are crucial and necessary steps which limit the production of reliable molecular models for publications and databases. An atomic model is the result of meticulous modeling and matching and is evaluated using a variety of metrics that provide clues to improve and refine the model so it fits our understanding of molecules and physical constraints. In cryo electron microscopy (cryo-EM) the validation is also part of an iterative modeling process in which there is a need to judge the quality of the model during the creation phase. A shortcoming is that the process and results of the validation are rarely communicated using visual metaphors. This work presents a visual framework for molecular validation. The framework was developed in close collaboration with domain experts in a participatory design process. Its core is a novel visual representation based on 2D heatmaps that shows all available validation metrics in a linear fashion, presenting a global overview of the atomic model and providing domain experts with interactive analysis tools. Additional information stemming from the underlying data, such as a variety of local quality measures, is used to guide the user's attention toward regions of higher relevance. Linked with the heatmap is a three-dimensional molecular visualization providing the spatial context of the structures and chosen metrics. Additional views of statistical properties of the structure are included in the visual framework. We demonstrate the utility of the framework and its visual guidance with examples from cryo-EM.
Data Abstraction Elephants: The Initial Diversity of Data Representations and Mental Models|
K. Williams, A. Bigelow, K.E. Isaacs. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), ACM, 2023.
Two people looking at the same dataset will create different mental models, prioritize different attributes, and connect with different visualizations. We seek to understand the space of data abstractions associated with mental models and how well people communicate their mental models when sketching. Data abstractions have a profound influence on the visualization design, yet it’s unclear how universal they may be when not initially influenced by a representation. We conducted a study about how people create their mental models from a dataset. Rather than presenting tabular data, we presented each participant with one of three datasets in paragraph form, to avoid biasing the data abstraction and mental model. We observed various mental models, data abstractions, and depictions from the same dataset, and how these concepts are influenced by communication and purpose-seeking. Our results have implications for visualization design, especially during the discovery and data collection phase.
Orchestration of materials science workflows for heterogeneous resources at large scale|
N. Zhou, G. Scorzelli, J. Luettgau, R.R. Kancharla, J. Kane, R. Wheeler, B. Croom, B. Newell, V. Pascucci, M. Taufer. In The International Journal of High Performance Computing Applications, Sage, 2023.
In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage, scaling up compute resources on the cloud. The multi-layer software structure of our framework includes three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A low layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud’s software stack from the user, who would otherwise need expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.
Here’s what you need to know about my data: Exploring Expert Knowledge’s Role in Data Analysis|
H. Lin, M. Lisnic, D. Akbaba, M. Meyer, A. Lex. 2023.
Data-driven decision making has become the gold standard in science, industry, and public policy. Yet data alone, as an imperfect and partial representation of reality, is often insufficient to make good analysis decisions. Knowledge about the context of a dataset, its strengths and weaknesses, and its applicability for certain tasks is essential. In this work, we present an interview study with analysts from a wide range of domains and with varied expertise and experience inquiring about the role of contextual knowledge. We provide insights into how data is insufficient in analysts' workflows and how they incorporate other sources of knowledge into their analysis. We also suggest design opportunities to better and more robustly consider both knowledge and data in analysis processes.
Progressive Tree-Based Compression of Large-Scale Particle Data|
D. Hoang, H. Bhatia, P. Lindstrom, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1-18. 2023.
Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.
Protein-metabolite interactomics of carbohydrate metabolism reveal regulation of lactate dehydrogenase|
K. G. Hicks, A. A. Cluntun, H. L. Schubert, S. R. Hackett, J. A. Berg, P. G. Leonard, M. A. Ajalla Aleixo, Y. Zhou, A. J. Bott, S. R. Salvatore, F. Chang, A. Blevins, P. Barta, S. Tilley, A. Leifer, A. Guzman, A. Arok, S. Fogarty, J. M. Winter, H. Ahn, K. N. Allen, S. Block, I. A. Cardoso, J. Ding, I. Dreveny, C. Gasper, Q. Ho, A. Matsuura, M. J. Palladino, S. Prajapati, P. Sun, K. Tittmann, D. R. Tolan, J. Unterlass, A. P. VanDemark, M. G. Vander Heiden, B. A. Webb, C. Yun, P. Zhap, B. Wang, F. J. Schopfer, C. P. Hill, M. C. Nonato, F. L. Muller, J. E. Cox, J. Rutter. In Science, Vol. 379, No. 6636, pp. 996-1003. 2023.
Metabolic networks are interconnected and influence diverse cellular processes. The protein-metabolite interactions that mediate these networks are frequently low affinity and challenging to systematically discover. We developed mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically (MIDAS) to identify such interactions. Analysis of 33 enzymes from human carbohydrate metabolism identified 830 protein-metabolite interactions, including known regulators, substrates, and products as well as previously unreported interactions. We functionally validated a subset of interactions, including the isoform-specific inhibition of lactate dehydrogenase by long-chain acyl–coenzyme A. Cell treatment with fatty acids caused a loss of pyruvate-lactate interconversion dependent on lactate dehydrogenase isoform expression. These protein-metabolite interactions may contribute to the dynamic, tissue-specific metabolic flexibility that enables growth and survival in an ever-changing nutrient environment. Understanding how metabolic state influences cellular processes requires systematic analysis of low-affinity interactions of metabolites with proteins. Hicks et al. describe a method called MIDAS (mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically), which allowed them to probe such interactions for 33 enzymes of human carbohydrate metabolism and more than 400 metabolites. The authors detected many known and many new interactions, including regulation of lactate dehydrogenase by ATP and long-chain acyl coenzyme A, which may help to explain known physiological relations between fat and carbohydrate metabolism in different tissues. —LBR A mass spectrometry and dialysis method detects metabolite-protein interactions that help to control physiology.
Exploring Classification of Topological Priors with Machine Learning for Feature Extraction|
S. Leventhal, A. Gyulassy, M. Heimann, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, pp. 1-12. 2023.
In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects allows researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have been focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first, create geometric priors, and then apply machine learning to classify. This approach is empirically motivated since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this paper, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, with similar accuracy, improved execution time, and requiring marginal training data.
Troubling Collaboration: Matters of Care for Visualization Design Study|
D. Akbaba, D. Lange, M. Correll, A. Lex, M. Meyer. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), pp. 23-28. April, 2023.
A common research process in visualization is for visualization researchers to collaborate with domain experts to solve particular applied data problems. While there is existing guidance and expertise around how to structure collaborations to strengthen research contributions, there is comparatively little guidance on how to navigate the implications of, and power produced through, the socio-technical entanglements of collaborations. In this paper, we qualitatively analyze reflective interviews of past participants of collaborations from multiple perspectives: visualization graduate students, visualization professors, and domain collaborators. We juxtapose the perspectives of these individuals, revealing tensions about the tools that are built and the relationships that are formed — a complex web of competing motivations. Through the lens of matters of care, we interpret this web, concluding with considerations that both trouble and necessitate reformation of current patterns around collaborative work in visualization design studies to promote more equitable, useful, and care-ful outcomes.
Accelerated Probabilistic Marching Cubes by Deep Learning for Time-Varying Scalar Ensembles|
M. Han, T.M. Athawale, D. Pugmire, C.R. Johnson. In 2022 IEEE Visualization and Visual Analytics (VIS), IEEE, pp. 155-159. 2022.
Visualizing the uncertainty of ensemble simulations is challenging due to the large size and multivariate and temporal features of ensemble data sets. One popular approach to studying the uncertainty of ensembles is analyzing the positional uncertainty of the level sets. Probabilistic marching cubes is a technique that performs Monte Carlo sampling of multivariate Gaussian noise distributions for positional uncertainty visualization of level sets. However, the technique suffers from high computational time, making interactive visualization and analysis impossible to achieve. This paper introduces a deep-learning-based approach to learning the level-set uncertainty for two-dimensional ensemble data with a multivariate Gaussian noise assumption. We train the model using the first few time steps from time-varying ensemble data in our workflow. We demonstrate that our trained model accurately infers uncertainty in level sets for new time steps and is up to 170X faster than the original probabilistic model with serial computation and 10X faster than the original parallel computation.
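For context, the Monte Carlo baseline that the learned model replaces can be sketched for a single 2D grid cell as follows. This is a minimal illustration, not the authors' implementation: the function names `cholesky` and `crossing_probability` are hypothetical, the four cell corners are assumed to follow a multivariate Gaussian with a given mean vector and covariance matrix, and a cell counts as crossed when its sampled corner values are neither all at or above nor all below the isovalue.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be positive definite)."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - s)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower


def crossing_probability(mean, cov, isovalue, n_samples=10000, seed=0):
    """Estimate the probability that the level set passes through one cell.

    mean: expected values at the cell corners
    cov:  covariance matrix of the corner values
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    lower = cholesky(cov)
    n = len(mean)
    hits = 0
    for _ in range(n_samples):
        # Draw correlated corner values: x = mean + L z with z ~ N(0, I).
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        corners = [
            mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(n)
        ]
        # The level set crosses the cell if the corners straddle the isovalue.
        above = [value >= isovalue for value in corners]
        if any(above) and not all(above):
            hits += 1
    return hits / n_samples


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
probability = crossing_probability([0.0, 0.0, 0.0, 0.0], identity, isovalue=0.0)
```

Repeating this sampling loop for every cell and every time step is what makes the baseline expensive, which is the cost the abstract reports reducing with a trained model.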