Visualization, sometimes referred to as visual data analysis, uses the graphical representation of data as a means of gaining understanding and insight into the data. Visualization research at SCI has focused on applications spanning computational fluid dynamics, medical imaging and analysis, biomedical data analysis, healthcare data analysis, weather data analysis, poetry, network and graph analysis, financial data analysis, etc.

Research ranges from developing novel algorithms and techniques to building tools and systems that assist in the comprehension of massive amounts of (scientific) data. We also research the process of creating successful visualizations.

We strongly believe in the role of interactivity in visual data analysis. Therefore, much of our research is concerned with creating visualizations that are intuitive to interact with and also render at interactive rates.

Visualization at SCI includes the academic subfields of Scientific Visualization, Information Visualization and Visual Analytics.


Charles Hansen

Volume Rendering
Ray Tracing

Valerio Pascucci

Topological Methods
Data Streaming
Big Data

Chris Johnson

Scalar, Vector, and
Tensor Field Visualization,
Uncertainty Visualization


Mike Kirby

Uncertainty Visualization

Ross Whitaker

Topological Methods
Uncertainty Visualization

Miriah Meyer

Information Visualization


Yarden Livnat

Information Visualization

Alex Lex

Information Visualization


Bei Wang

Information Visualization
Scientific Visualization
Topological Data Analysis

Visualization Project Sites:

Associated Labs:

Publications in Visualization:

enRoute: Dynamic Path Extraction from Biological Pathway Maps for Exploring Heterogeneous Experimental Datasets
C. Partl, A. Lex, M. Streit, D. Kalkofen, K. Kashofer, D. Schmalstieg. In BMC Bioinformatics, Vol. 14, No. Suppl 19, Nov, 2013.
ISSN: 1471-2105
DOI: 10.1186/1471-2105-14-S19-S3

Jointly analyzing biological pathway maps and experimental data is critical for understanding how biological processes work in different conditions and why different samples exhibit certain characteristics. This joint analysis, however, poses a significant challenge for visualization. Current techniques are either well suited to visualize large amounts of pathway node attributes, or to represent the topology of the pathway well, but do not accomplish both at the same time. To address this we introduce enRoute, a technique that enables analysts to specify a path of interest in a pathway, extract this path into a separate, linked view, and show detailed experimental data associated with the nodes of this extracted path right next to it. This juxtaposition of the extracted path and the experimental data allows analysts to simultaneously investigate large amounts of potentially heterogeneous data, thereby solving the problem of joint analysis of topology and node attributes. As this approach does not modify the layout of pathway maps, it is compatible with arbitrary graph layouts, including those of hand-crafted, image-based pathway maps. We demonstrate the technique in context of pathways from the KEGG and the Wikipathways databases. We apply experimental data from two public databases, the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA) that both contain a wide variety of genomic datasets for a large number of samples. In addition, we make use of a smaller dataset of hepatocellular carcinoma and common xenograft models. To verify the utility of enRoute, domain experts conducted two case studies where they explore data from the CCLE and the hepatocellular carcinoma datasets in the context of relevant pathways.

Ray Tracing and Volume Rendering Large Molecular Data on Multi-core and Many-core Architectures.
A. Knoll, I. Wald, P. Navratil, M.E. Papka, K.P. Gaither. In Proc. 8th International Workshop on Ultrascale Visualization at SC13 (Ultravis), 2013.

Visualizing large molecular data requires efficient means of rendering millions of data elements that combine glyphs, geometry and volumetric techniques. The geometric and volumetric loads challenge traditional rasterization-based vis methods. Ray casting presents a scalable and memory-efficient alternative, but modern techniques typically rely on GPU-based acceleration to achieve interactive rendering rates. In this paper, we present bnsView, a molecular visualization ray tracing framework that delivers fast volume rendering and ball-and-stick ray casting on both multi-core CPUs and many-core Intel® Xeon Phi™ co-processors, implemented in a SPMD language that generates efficient SIMD vector code for multiple platforms without source modification. We show that our approach running on co-processors is competitive with similar techniques running on GPU accelerators, and we demonstrate large-scale parallel remote visualization from TACC's Stampede supercomputer to large-format display walls using this system.

International Journal for Uncertainty Quantification,
Subtitled “Special Issue on Working with Uncertainty: Representation, Quantification, Propagation, Visualization, and Communication of Uncertainty,” C.R. Johnson, A. Pang (Eds.). In Int. J. Uncertainty Quantification, Vol. 3, No. 3, Begell House, Inc., 2013.
ISSN: 2152-5080
DOI: 10.1615/Int.J.UncertaintyQuantification.v3.i3

International Journal for Uncertainty Quantification,
Subtitled “Special Issue on Working with Uncertainty: Representation, Quantification, Propagation, Visualization, and Communication of Uncertainty,” C.R. Johnson, A. Pang (Eds.). In Int. J. Uncertainty Quantification, Vol. 3, No. 2, Begell House, Inc., pp. vii--viii. 2013.
ISSN: 2152-5080
DOI: 10.1615/Int.J.UncertaintyQuantification.v3.i2

The impact of display bezels on stereoscopic vision for tiled displays
J. Grüninger, J. Krüger. In Proceedings of the 19th ACM Symposium on Virtual Reality Software and Technology (VRST), pp. 241--250. 2013.
DOI: 10.1145/2503713.2503717

In recent years high-resolution tiled display systems have gained significant attention in scientific and information visualization of large-scale data. Modern tiled display setups are based on either video projectors or LCD screens. While LCD screens are the preferred solution for monoscopic setups, stereoscopic displays almost exclusively consist of some kind of video projection. This is because projections can significantly reduce gaps between tiles, while LCD screens require a bezel around the panel. Projection setups, however, suffer from a number of maintenance issues that are avoided by LCD screens. For example, projector alignment is a very time-consuming task that needs to be repeated at intervals, and different aging states of lamps and filters cause color inconsistencies. The growing availability of inexpensive stereoscopic LCDs for television and gaming allows one to build high-resolution stereoscopic tiled display walls with the same dimensions and resolution as projection systems at a fraction of the cost, while avoiding the aforementioned issues. The only drawback is the increased gap size between tiles.

In this paper, we investigate the effects of bezels on stereo perception with three surveys and show that smaller LCD bezels and larger displays significantly increase stereo perception on display wall systems. We also show that the bezel color is not very important and that bezels can negatively affect the adaptation times to the stereoscopic effect but improve task completion times. Finally, we present guidelines for the setup of tiled stereoscopic display wall systems.

Visualization Collaborations: What Works and Why
R.M. Kirby, M.D. Meyer. In IEEE Computer Graphics and Applications: Visualization Viewpoints, Vol. 33, No. 6, pp. 82--88. 2013.

In 1987, Bruce McCormick and his colleagues outlined the current state and future vision of visualization in scientific computing [1]. That same year, Donna Cox pioneered her concept of the "Renaissance team," a multidisciplinary team of experts focused on solving visualization problems [2]. Even if a member of the visualization community has never read McCormick and his colleagues' report or heard Donna Cox speak, he or she has probably been affected by some of their ideas.

Of particular interest to us is their vision for collaboration. McCormick and his colleagues envisioned an interdisciplinary team that through close interaction would develop visualization tools that not only were effective in the context of their immediate collaborative environment but also could be reused by scientists and engineers in other fields. McCormick and his colleagues categorized the types of researchers they imagined constituting these teams, one type being the "visualization scientist/engineer." They even commented on the skills these individuals might have. However, they provided little guidance on how to make such teams successful.

In the more than 25 years since the report, researchers have refined the concepts of interaction versus collaboration [3], interdisciplinary versus multidisciplinary teams [4,5], and independence versus interdependence [6]. Here, we use observations from our collective 18 years of collaborative visualization research to help shed light on not just the composition of current and future visualization collaborative teams but also pitfalls and recommendations for successful collaboration. Although our statements might reflect what seasoned visualization researchers are already doing, we believe that reexpressing and possibly reaffirming basic collaboration principles provide benefits.

Scalable Visualization and Interactive Analysis Using Massive Data Streams
V. Pascucci, P.-T. Bremer, A. Gyulassy, G. Scorzelli, C. Christensen, B. Summa, S. Kumar. In Cloud Computing and Big Data, Advances in Parallel Computing, Vol. 23, IOS Press, pp. 212--230. 2013.

Historically, data creation and storage have always outpaced the infrastructure for data movement and utilization. This trend is increasing now more than ever, with the ever-growing size of scientific simulations, increased resolution of sensors, and large mosaic images. Effective exploration of massive scientific models demands the combination of data management, analysis, and visualization techniques, working together in an interactive setting. The ViSUS application framework has been designed as an environment that allows the interactive exploration and analysis of massive scientific models in a cache-oblivious, hardware-agnostic manner, enabling processing and visualization of possibly geographically distributed data using many kinds of devices and platforms.

For general purpose feature segmentation and exploration we discuss a new paradigm based on topological analysis. This approach enables the extraction of summaries of features present in the data through abstract models that are orders of magnitude smaller than the raw data, providing enough information to support general queries and perform a wide range of analyses without access to the original data.

Keywords: Visualization, data analysis, topological data analysis, Parallel I/O

Uncertainty Visualization in HARDI based on Ensembles of ODFs
F. Jiao, J.M. Phillips, Y. Gur, C.R. Johnson. In Proceedings of 2013 IEEE Pacific Visualization Symposium, pp. 193--200. 2013.
PubMed ID: 24466504
PubMed Central ID: PMC3898522

In this paper, we propose a new and accurate technique for uncertainty analysis and uncertainty visualization based on fiber orientation distribution function (ODF) glyphs, associated with high angular resolution diffusion imaging (HARDI). Our visualization applies volume rendering techniques to an ensemble of 3D ODF glyphs, which we call SIP functions of diffusion shapes, to capture their variability due to underlying uncertainty. This rendering elucidates the complex heteroscedastic structural variation in these shapes. Furthermore, we quantify the extent of this variation by measuring the fraction of the volume of these shapes, which is consistent across all noise levels, the certain volume ratio. Our uncertainty analysis and visualization framework is then applied to synthetic data, as well as to HARDI human-brain data, to study the impact of various image acquisition parameters and background noise levels on the diffusion shapes.

Characterization and modeling of PIDX parallel I/O for performance optimization
S. Kumar, A. Saha, V. Vishwanath, P. Carns, J.A. Schmidt, G. Scorzelli, H. Kolla, R. Grout, R. Latham, R. Ross, M.E. Papka, J. Chen, V. Pascucci. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 67. 2013.

Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time consuming and dependent on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.

Keywords: I/O, Network Characterization, Performance Modeling

Comprehensible Presentation of Topological Information
G.H. Weber, K. Beketayev, P.-T. Bremer, B. Hamann, M. Haranczyk, M. Hlawitschka, V. Pascucci. No. LBNL-5693E, Lawrence Berkeley National Laboratory, 2013.

Topological information has proven very valuable in the analysis of scientific data. An important challenge that remains is presenting this highly abstract information in a way that is comprehensible even if one does not have an in-depth background in topology. Furthermore, it is often desirable to combine the structural insight gained by topological analysis with complementary information, such as geometric information. We present an overview of methods that use metaphors to make topological information more accessible to non-expert users, and we demonstrate their applicability to a range of scientific data sets. With the increasingly complex output of exascale simulations, the importance of having effective means of providing a comprehensible, abstract overview of data will grow. The techniques that we present will serve as an important foundation for this purpose.

Topology analysis of time-dependent multi-fluid data using the Reeb graph
F. Chen, H. Obermaier, H. Hagen, B. Hamann, J. Tierny, V. Pascucci. In Computer Aided Geometric Design, Vol. 30, No. 6, pp. 557--566. 2013.
DOI: 10.1016/j.cagd.2012.03.019

Liquid–liquid extraction is a typical multi-fluid problem in chemical engineering where two types of immiscible fluids are mixed together. Mixing of two-phase fluids results in a time-varying fluid density distribution, quantitatively indicating the presence of liquid phases. For engineers who design extraction devices, it is crucial to understand the density distribution of each fluid, particularly flow regions that have a high concentration of the dispersed phase. The propagation of regions of high density can be studied by examining the topology of isosurfaces of the density data. We present a topology-based approach to track the splitting and merging events of these regions using the Reeb graphs. Time is used as the third dimension in addition to two-dimensional (2D) point-based simulation data. Due to low time resolution of the input data set, a physics-based interpolation scheme is required in order to improve the accuracy of the proposed topology tracking method. The model used for interpolation produces a smooth time-dependent density field by applying Lagrangian-based advection to the given simulated point cloud data, conforming to the physical laws of flow evolution. Using the Reeb graph, the spatial and temporal locations of bifurcation and merging events can be readily identified supporting in-depth analysis of the extraction process.

Keywords: Multi-phase fluid, Level set, Topology method, Point-based multi-fluid simulation

The CommonGround visual paradigm for biosurveillance
Y. Livnat, E. Jurrus, A.V. Gundlapalli, P. Gestland. In Proceedings of the 2013 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 352--357. 2013.
ISBN: 978-1-4673-6214-6
DOI: 10.1109/ISI.2013.6578857

Biosurveillance is a critical area in the intelligence community for real-time detection of disease outbreaks. Identifying epidemics enables analysts to detect and monitor disease outbreaks that might spread from natural causes or from possible biological warfare attacks. Containing these events and disseminating alerts requires the ability to rapidly find, classify and track harmful biological signatures. In this paper, we describe a novel visual paradigm to conduct biosurveillance using an Infectious Disease Weather Map. Our system provides a visual common ground in which users can view, explore and discover emerging concepts and correlations such as symptoms, syndromes, pathogens and geographic locations.

Keywords: biosurveillance, visualization, interactive exploration, situational awareness

Uncertainty Visualization in Forward and Inverse Cardiac Models
B. Burton, B. Erem, K. Potter, P. Rosen, C.R. Johnson, D. Brooks, R.S. Macleod. In Computing in Cardiology CinC, pp. 57--60. 2013.
ISSN: 2325-8861

Quantification and visualization of uncertainty in cardiac forward and inverse problems with complex geometries is subject to various challenges. Specific to visualization is the observation that occlusion and clutter obscure important regions of interest, making visual assessment difficult. In order to overcome these limitations in uncertainty visualization, we have developed and implemented a collection of novel approaches. To highlight the utility of these techniques, we evaluated the uncertainty associated with two examples of modeling myocardial activity. In one case we studied cardiac potentials during the repolarization phase as a function of variability in tissue conductivities of the ischemic heart (forward case). In a second case, we evaluated uncertainty in reconstructed activation times on the epicardium resulting from variation in the control parameter of Tikhonov regularization (inverse case). To overcome difficulties associated with uncertainty visualization, we implemented linked-view windows and interactive animation to the two respective cases. Through dimensionality reduction and superimposed mean and standard deviation measures over time, we were able to display key features in large ensembles of data and highlight regions of interest where larger uncertainties exist.
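The mean and standard deviation measures over time mentioned in this abstract can be illustrated with a minimal sketch. It assumes an ensemble stored as a list of equally long time series (one per ensemble member); the function name `ensemble_summary` and the toy values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

def ensemble_summary(ensemble):
    """Return a (mean, standard deviation) pair for every time step.

    `ensemble` is a list of time series of equal length, one per member.
    """
    # zip(*ensemble) regroups the values so that each tuple holds
    # all ensemble members at a single time step.
    return [(mean(step), pstdev(step)) for step in zip(*ensemble)]

# Toy ensemble (hypothetical values): the members agree early on and diverge later.
ensemble = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 4.0],
    [0.0, 1.0, 6.0],
]

summary = ensemble_summary(ensemble)

# The time step with the largest spread is a candidate region of interest.
most_uncertain = max(range(len(summary)), key=lambda t: summary[t][1])
```

Plotting the mean curve with a band of one standard deviation around it is one common way to superimpose both measures in a single view.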

Evaluation of Interactive Visualization on Mobile Computing Platforms for Selection of Deep Brain Stimulation Parameters
C. Butson, G. Tamm, S. Jain, T. Fogal, J. Krüger. In IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 1, pp. 108--117. January, 2013.
DOI: 10.1109/TVCG.2012.92
PubMed ID: 22450824

In recent years there has been significant growth in the use of patient-specific models to predict the effects of neuromodulation therapies such as deep brain stimulation (DBS). However, translating these models from a research environment to the everyday clinical workflow has been a challenge, primarily due to the complexity of the models and the expertise required in specialized visualization software. In this paper, we deploy the interactive visualization system ImageVis3D Mobile, which has been designed for mobile computing devices such as the iPhone or iPad, in an evaluation environment to visualize models of Parkinson’s disease patients who received DBS therapy. Selection of DBS settings is a significant clinical challenge that requires repeated revisions to achieve optimal therapeutic response, and is often performed without any visual representation of the stimulation system in the patient. We used ImageVis3D Mobile to provide models to movement disorders clinicians and asked them to use the software to determine: 1) which of the four DBS electrode contacts they would select for therapy; and 2) what stimulation settings they would choose. We compared the stimulation protocol chosen from the software versus the stimulation protocol that was chosen via clinical practice (independently of the study). Lastly, we compared the amount of time required to reach these settings using the software versus the time required through standard practice. We found that the stimulation settings chosen using ImageVis3D Mobile were similar to those used in standard of care, but were selected in drastically less time. We show how our visualization system, available directly at the point of care on a device familiar to the clinician, can be used to guide clinical decision making for selection of DBS settings. In our view, the positive impact of the system could also translate to areas other than DBS.

Keywords: Biomedical and Medical Visualization, Mobile and Ubiquitous Visualization, Computational Model, Clinical Decision Making, Parkinson’s Disease, SciDAC, ImageVis3D

Contour Boxplots: A Method for Characterizing Uncertainty in Feature Sets from Simulation Ensembles
R.T. Whitaker, M. Mirzargar, R.M. Kirby. In IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 12, pp. 2713--2722. December, 2013.
DOI: 10.1109/TVCG.2013.143
PubMed ID: 24051838

Ensembles of numerical simulations are used in a variety of applications, such as meteorology or computational solid mechanics, in order to quantify the uncertainty or possible error in a model or simulation. Deriving robust statistics and visualizing the variability of an ensemble is a challenging task and is usually accomplished through direct visualization of ensemble members or by providing aggregate representations such as an average or pointwise probabilities. In many cases, the interesting quantities in a simulation are not dense fields, but are sets of features that are often represented as thresholds on physical or derived quantities. In this paper, we introduce a generalization of boxplots, called contour boxplots, for visualization and exploration of ensembles of contours or level sets of functions. Conventional boxplots have been widely used as an exploratory or communicative tool for data analysis, and they typically show the median, mean, confidence intervals, and outliers of a population. The proposed contour boxplots are a generalization of functional boxplots, which build on the notion of data depth. Data depth approximates the extent to which a particular sample is centrally located within its density function. This produces a center-outward ordering that gives rise to the statistical quantities that are essential to boxplots. Here we present a generalization of functional data depth to contours and demonstrate methods for displaying the resulting boxplots for two-dimensional simulation data in weather forecasting and computational fluid dynamics.
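To make the notion of data depth more concrete, the following is a minimal sketch of band depth for one-dimensional curves, the functional-boxplot ingredient that contour boxplots generalize. It is not the contour (set-based) depth introduced in the paper; the function name `band_depth` and the toy ensemble are assumptions made for illustration.

```python
from itertools import combinations

def band_depth(curves):
    """Band depth of every curve, using bands spanned by pairs of curves.

    A curve lies inside the band of a pair if, at every sample position,
    its value is between the pointwise minimum and maximum of the pair.
    The depth is the fraction of pairs whose band contains the curve.
    """
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for curve in curves:
        inside = 0
        for a, b in pairs:
            contained = all(
                min(u, v) <= value <= max(u, v)
                for value, u, v in zip(curve, curves[a], curves[b])
            )
            if contained:
                inside += 1
        depths.append(inside / len(pairs))
    return depths

# Toy ensemble (hypothetical values): four flat curves and one crossing outlier.
ensemble = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0],
    [5.0, -5.0, 5.0],
]

depths = band_depth(ensemble)

# Sorting by decreasing depth gives the center-outward ordering;
# the deepest curve plays the role of the median.
order = sorted(range(len(depths)), key=lambda i: -depths[i])
```

In this toy ensemble the two middle curves receive the highest depth, while the crossing curve is contained only in bands that it helps to span and therefore ends up at the outer end of the ordering.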

ManyVis: Multiple Applications in an Integrated Visualization Environment
A. Rungta, B. Summa, D. Demir, P.-T. Bremer, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 19, No. 12, pp. 2878--2885. December, 2013.

As the visualization field matures, an increasing number of general toolkits are developed to cover a broad range of applications. However, no general tool can incorporate the latest capabilities for all possible applications, nor can the user interfaces and workflows be easily adjusted to accommodate all user communities. As a result, users will often choose either substandard solutions presented in familiar, customized tools or assemble a patchwork of individual applications glued through ad-hoc scripts and extensive, manual intervention. Instead, we need the ability to easily and rapidly assemble the best-in-task tools into custom interfaces and workflows to optimally serve any given application community. Unfortunately, creating such meta-applications at the API or SDK level is difficult, time consuming, and often infeasible due to the sheer variety of data models, design philosophies, limits in functionality, and the use of closed commercial systems. In this paper, we present the ManyVis framework which enables custom solutions to be built both rapidly and simply by allowing coordination and communication across existing unrelated applications. ManyVis allows users to combine software tools with complementary characteristics into one virtual application driven by a single, custom-designed interface.

2D Vector Field Simplification Based on Robustness
SCI Technical Report, P. Skraba, Bei Wang, G. Chen, P. Rosen. No. UUSCI-2013-004, SCI Institute, University of Utah, 2013.

Vector field simplification aims to reduce the complexity of the flow by removing features in order of their relevance and importance, to reveal prominent behavior and obtain a compact representation for interpretation. Most existing simplification techniques based on the topological skeleton successively remove pairs of critical points connected by separatrices using distance or area-based relevance measures. These methods rely on the stable extraction of the topological skeleton, which can be difficult due to instability in numerical integration, especially when processing highly rotational flows. Further, the distance and area-based metrics are used to determine the cancellation ordering of features from a geometric point of view. Specifically, these metrics do not consider the flow magnitude, which is an important physical property of the flow. In this paper, we propose a novel simplification scheme derived from the recently introduced topological notion of robustness, which provides a complementary flow structure hierarchy to the traditional topological skeleton-based approach. Robustness enables the pruning of sets of critical points according to a quantitative measure of their stability, that is, the minimum amount of vector field perturbation required to remove them within a local neighborhood. This leads to a natural hierarchical simplification scheme with more physical consideration than purely topological-skeleton-based methods. Such a simplification does not depend on the topological skeleton of the vector field and therefore can handle more general situations (e.g. centers and pairs not connected by separatrices). We also provide a novel simplification algorithm based on degree theory with fewer restrictions and so can handle more general boundary conditions. We provide an implementation under the piecewise-linear setting and apply it to both synthetic and real-world datasets.

Rethinking Abstractions for Big Data: Why, Where, How, and What
M. Hall, R.M. Kirby, F. Li, M.D. Meyer, V. Pascucci, J.M. Phillips, R. Ricci, J. Van der Merwe, S. Venkatasubramanian. In Cornell University Library, 2013.

Big data refers to large and complex data sets that, under existing approaches, exceed the capacity and capability of current compute platforms, systems software, analytical tools and human understanding [7]. Numerous lessons on the scalability of big data can already be found in asymptotic analysis of algorithms and from the high-performance computing (HPC) and applications communities. However, scale is only one aspect of current big data trends; fundamentally, current and emerging problems in big data are a result of unprecedented complexity: in the structure of the data and how to analyze it, in dealing with unreliability and redundancy, in addressing the human factors of comprehending complex data sets, in formulating meaningful analyses, and in managing the dense, power-hungry data centers that house big data.

The computer science solution to complexity is finding the right abstractions, those that hide as much triviality as possible while revealing the essence of the problem that is being addressed. The "big data challenge" has disrupted computer science by stressing to the very limits the familiar abstractions which define the relevant subfields in data analysis, data management and the underlying parallel systems. Efficient processing of big data has shifted systems towards increasingly heterogeneous and specialized units, with resilience and energy becoming important considerations. The design and analysis of algorithms must now incorporate emerging costs in communicating data driven by IO costs, distributed data, and the growing energy cost of these operations. Data analysis representations as structural patterns and visualizations surpass human visual bandwidth, structures studied at small scale are rare at large scale, and large-scale high-dimensional phenomena cannot be reproduced at small scale.

As a result, not enough of these challenges are revealed by isolating abstractions in a traditional software stack or standard algorithmic and analytical techniques, and attempts to address complexity either oversimplify or require low-level management of details. The authors believe that the abstractions for big data need to be rethought, and this reorganization needs to evolve and be sustained through continued cross-disciplinary collaboration.

In what follows, we first consider the question of why big data and why now. We then describe the where (big data systems), the how (big data algorithms), and the what (big data analytics) challenges that we believe are central and must be addressed as the research community develops these new abstractions. We equate the biggest challenges that span these areas of big data with big mythological creatures, namely cyclops, that should be conquered.

Adaptive Sampling with Topological Scores
D. Maljovec, Bei Wang, A. Kupresanin, G. Johannesson, V. Pascucci, P.-T. Bremer. In Int. J. Uncertainty Quantification, Vol. 3, No. 2, Begell House, pp. 119--141. 2013.
DOI: 10.1615/int.j.uncertaintyquantification.2012003955

Understanding and describing expensive black box functions such as physical simulations is a common problem in many application areas. One example is the recent interest in uncertainty quantification with the goal of discovering the relationship between a potentially large number of input parameters and the output of a simulation. Typically, the simulation of interest is expensive to evaluate and thus the sampling of the parameter space is necessarily small. As a result, choosing a "good" set of samples at which to evaluate is crucial to glean as much information as possible from the fewest samples. While space-filling sampling designs such as Latin hypercubes provide a good initial cover of the entire domain, more detailed studies typically rely on adaptive sampling: Given an initial set of samples, these techniques construct a surrogate model and use it to evaluate a scoring function which aims to predict the expected gain from evaluating a potential new sample. There exist a large number of different surrogate models as well as different scoring functions, each with their own advantages and disadvantages. In this paper we present an extensive comparative study of adaptive sampling using four popular regression models combined with six traditional scoring functions compared against a space-filling design. Furthermore, for a single high-dimensional output function, we introduce a new class of scoring functions based on global topological rather than local geometric information. The new scoring functions are competitive in terms of the root mean squared prediction error but are expected to better recover the global topological structure. Our experiments suggest that the most common point of failure of adaptive sampling schemes is ill-suited regression models. Nevertheless, even given well-fitted surrogate models many scoring functions fail to outperform a space-filling design.
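The general adaptive sampling loop described in this abstract (a space-filling initial design, a surrogate model, and a scoring function that picks the next sample) can be sketched as follows. This is only an illustration under stated assumptions: the nearest-neighbor surrogate and the distance-times-disagreement score are simple stand-ins and are not among the regression models or scoring functions studied in the paper, and all function names and the toy simulation are hypothetical.

```python
import random

def latin_hypercube(n, dims, rng):
    """n points in the unit cube with exactly one point per stratum on each axis."""
    columns = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

def dist(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def score(candidate, samples, values):
    """Hypothetical score: far from existing samples AND in a region where
    the two nearest sampled values (a nearest-neighbor surrogate) disagree."""
    ranked = sorted(range(len(samples)), key=lambda i: dist(candidate, samples[i]))
    nearest, second = ranked[0], ranked[1]
    disagreement = abs(values[nearest] - values[second])
    return dist(candidate, samples[nearest]) * (disagreement + 1e-9)

def adaptive_sampling(f, n_init, n_add, dims=2, seed=0, n_candidates=200):
    """Start from a Latin hypercube design, then add the best-scoring candidate n_add times."""
    rng = random.Random(seed)
    samples = latin_hypercube(n_init, dims, rng)
    values = [f(p) for p in samples]
    for _ in range(n_add):
        candidates = [tuple(rng.random() for _ in range(dims))
                      for _ in range(n_candidates)]
        best = max(candidates, key=lambda c: score(c, samples, values))
        samples.append(best)
        values.append(f(best))  # the only place the expensive function is called
    return samples, values

def expensive_simulation(p):
    """Stand-in for a costly black box: a step across the diagonal x + y = 1."""
    x, y = p
    return 1.0 if x + y > 1.0 else 0.0

samples, values = adaptive_sampling(expensive_simulation, n_init=8, n_add=12)
```

With this toy score, new samples tend to be placed where neighboring evaluations disagree, which for the step function above is near the diagonal. A purely space-filling design would ignore the sampled values altogether, which is the baseline the abstract compares against.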

Exploration of High-Dimensional Scalar Function for Nuclear Reactor Safety Analysis and Visualization
D. Maljovec, Bei Wang, V. Pascucci, P.-T. Bremer, M. Pernice, D. Mandelli, R. Nourgaliev. In Proceedings of the 2013 International Conference on Mathematics and Computational Methods Applied to Nuclear Science & Engineering (M&C), pp. 712--723. 2013.

The next generation of methodologies for nuclear reactor Probabilistic Risk Assessment (PRA) explicitly accounts for the time element in modeling the probabilistic system evolution and uses numerical simulation tools to account for possible dependencies between failure events. The Monte-Carlo (MC) and the Dynamic Event Tree (DET) approaches belong to this new class of dynamic PRA methodologies. A challenge of dynamic PRA algorithms is the large amount of data they produce which may be difficult to visualize and analyze in order to extract useful information. We present a software tool that is designed to address these goals. We model a large-scale nuclear simulation dataset as a high-dimensional scalar function defined over a discrete sample of the domain. First, we provide structural analysis of such a function at multiple scales and provide insight into the relationship between the input parameters and the output. Second, we enable exploratory analysis for users, where we help the users to differentiate features from noise through multi-scale analysis on an interactive platform, based on domain knowledge and data characterization. Our analysis is performed by exploiting the topological and geometric properties of the domain, building statistical models based on its topological segmentations and providing interactive visual interfaces to facilitate such explorations. We provide a user's guide to our software tool by highlighting its analysis and visualization capabilities, along with a use case involving data from a nuclear reactor safety simulation.

Keywords: high-dimensional data analysis, computational topology, nuclear reactor safety analysis, visualization