A. Humphrey, M. Berzins.
An Evaluation of An Asynchronous Task Based Dataflow Approach For Uintah, In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, pp. 652-657. July, 2019.
The challenge of running complex physics code on the largest computers available has led to dataflow paradigms being explored. While such approaches are often applied at smaller scales, the challenge of extreme-scale dataflow computing remains. The Uintah dataflow framework has consistently used dataflow computing at the largest scales on complex physics applications. At present, Uintah contains two main dataflow models, both based on asynchronous communication: one uses a static graph-based approach, and the other uses a more dynamic approach that was introduced almost a decade ago. Subsequent changes within the Uintah runtime system, combined with many more large-scale experiments, have necessitated a reevaluation of these two approaches in the context of large-scale problems. While the static approach has worked well for some large-scale simulations, the dynamic approach is seen to offer performance improvements over the static case for a challenging large-scale fluid-structure interaction problem that involves fluid flow and a moving solid represented using a particle method on an adaptive mesh.
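A toy illustration of the static versus dynamic contrast, as a minimal sketch: it assumes a single worker, unit-cost tasks, and a known arrival time for each task's remote input (all hypothetical simplifications; this is not Uintah's scheduler). The static schedule runs tasks in a precomputed order and stalls whenever the next task's message is late, while the dynamic one runs any task whose dependencies and inputs are already satisfied.

```python
def static_makespan(order, deps, arrival, cost=1.0):
    """Run tasks in a precomputed (topological) order on one worker; each task
    waits for its remote input even if other tasks could run meanwhile."""
    done = {}
    clock = 0.0
    for t in order:
        start = max([clock, arrival[t]] + [done[p] for p in deps[t]])
        clock = start + cost
        done[t] = clock
    return clock


def dynamic_makespan(deps, arrival, cost=1.0):
    """Run whichever task is ready: all local dependencies finished and its
    remote input already arrived. Idle only when nothing is runnable."""
    done = {}
    clock = 0.0
    pending = set(deps)
    while pending:
        unblocked = [t for t in pending if all(p in done for p in deps[t])]
        ready = [t for t in unblocked if arrival[t] <= clock]
        if not ready:
            # Nothing runnable yet: idle until the next input arrives.
            clock = min(arrival[t] for t in unblocked)
            continue
        t = min(ready, key=lambda x: (arrival[x], x))  # deterministic tie-break
        clock += cost
        done[t] = clock
        pending.remove(t)
    return clock


# Hypothetical 4-task graph: C needs A, D needs B; A's input arrives late.
deps = {"A": [], "B": [], "C": ["A"], "D": ["B"]}
arrival = {"A": 5.0, "B": 0.0, "C": 0.0, "D": 0.0}
print(static_makespan(["A", "B", "C", "D"], deps, arrival))  # 9.0
print(dynamic_makespan(deps, arrival))                       # 7.0
```

The dynamic run fills the wait for A's input with B and D, which is the kind of overlap the abstract attributes to the dynamic approach.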
Deep brain stimulation (DBS) can be an effective therapy for tics and comorbidities in select cases of severe, treatment-refractory Tourette syndrome (TS). Clinical responses remain variable across patients, which may be attributed to differences in the location of the neuroanatomical regions being stimulated. We evaluated active contact locations and regions of stimulation across a large cohort of patients with TS in an effort to guide future targeting.
B. Peterson. Portable and Performant GPU/Heterogeneous Asynchronous Many-task Runtime System, Ph.D. Dissertation, University of Utah, School of Computing, Dec, 2019.
Asynchronous many-task (AMT) runtimes are maturing as a model for computing simulations on a diverse range of architectures at large scale. The Uintah AMT framework is driven by a philosophy of maintaining an application layer distinct from the underlying runtime while operating on an adaptive mesh grid. This model has enabled task developers to focus on writing task code while minimizing their interaction with MPI transfers, halo processing, data stores, coherency of simulation variables, and proper ordering of task execution. Further, Uintah is implementing an architecture-portable solution by utilizing the Kokkos programming portability layer so that application tasks can be written in one codebase and performantly executed on CPUs, GPUs, Intel Xeon Phis, and other future architectures.
Of these architectures, it is perhaps Nvidia GPUs that introduce the greatest usability and portability challenges for AMT runtimes. Specifically, Nvidia GPUs require code to adhere to a proprietary programming model, use separate high capacity memory, utilize asynchrony of data movement and execution, and partition execution units among many streaming multiprocessors. Numerous novel solutions to both Uintah and Kokkos are required to abstract these GPU features into an AMT runtime while preserving an application layer and enabling portability.
The focus of this AMT research is largely split into two main parts, performance and portability. Runtime performance comes from 1) minimizing runtime overhead when preparing simulation variables for tasks prior to execution, and 2) executing a heterogeneous mixture of tasks to keep compute node processing units busy. Preparation of simulation variables, especially halo processing, receives significant emphasis as Uintah’s target problems heavily rely on local and global halos. In addition, this work covers automated data movement of simulation variables between host and GPU memory as well as distributing tasks throughout a GPU for execution.
Portability is a productivity necessity, as application developers struggle to maintain three sets of code per task, namely code for single CPU core execution, CUDA code for GPU tasks, and a third set of code for Xeon Phi parallel execution. Programming portability layers, such as Kokkos, provide a framework for this portability; however, Kokkos itself requires modifications to support GPU execution of the finer-grained tasks typical of AMT runtimes like Uintah. Currently, Kokkos GPU parallel loop execution is bulk-synchronous. This research demonstrates a model for portable loops that is asynchronous, nonblocking, and performant. Additionally, integrating GPU portability into Uintah required further modifications to help application developers avoid Kokkos-specific details.
This research concludes by demonstrating a GPU-enabled AMT runtime that is both performant and portable. Further, application developers are not burdened with additional architecture specific requirements. Results are demonstrated using production task codebases written for CPUs, GPUs, and Kokkos portability and executed in GPU homogeneous and CPU/GPU heterogeneous environments.
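Halo processing, named above as a major focus for reducing runtime overhead, can be illustrated with a serial sketch that assumes a 1D field, one ghost cell, and zero values outside the domain (all hypothetical simplifications). In Uintah the halo copies would be MPI messages or host/GPU transfers managed by the runtime; here they are plain list slices, and the final check confirms that a patch-local stencil with halos matches the same stencil on the undivided field.

```python
def split_into_patches(field, patch_size):
    """Partition a 1D field into contiguous patches (the last may be shorter)."""
    return [field[i:i + patch_size] for i in range(0, len(field), patch_size)]


def with_halos(patches, ghost=1, boundary=0.0):
    """Pad each patch with `ghost` halo cells copied from its neighbors, or with
    `boundary` values at the domain edges. Assumes every patch has >= ghost cells."""
    padded = []
    for i, interior in enumerate(patches):
        left = patches[i - 1][-ghost:] if i > 0 else [boundary] * ghost
        right = patches[i + 1][:ghost] if i + 1 < len(patches) else [boundary] * ghost
        padded.append(list(left) + list(interior) + list(right))
    return padded


def smooth(padded):
    """Three-point average over the interior of a patch padded with one halo cell."""
    return [(padded[j - 1] + padded[j] + padded[j + 1]) / 3.0
            for j in range(1, len(padded) - 1)]


field = [float(i) for i in range(8)]
local = [v for p in with_halos(split_into_patches(field, 4)) for v in smooth(p)]
reference = smooth([0.0] + field + [0.0])  # same stencil on the undivided field
print(local == reference)  # True
```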
D. Sahasrabudhe, M. Berzins, J. Schmidt.
Node failure resiliency for Uintah without checkpointing, In Concurrency and Computation: Practice and Experience, pp. e5340. 2019.
The frequency of failures in upcoming exascale supercomputers may well be greater than at present due to many-core architectures if component failure rates remain unchanged. This potential increase in failure frequency, coupled with I/O challenges at exascale, may prove problematic for current resiliency approaches such as checkpoint restarting, although the use of fast intermediate memory may help. Algorithm-Based Fault Tolerance (ABFT) using Adaptive Mesh Refinement (AMR) is one resiliency approach used to address these challenges. For adaptive mesh codes, a coarse mesh version of the solution may be used to restore the fine mesh solution. This paper addresses the implementation of the ABFT approach within the Uintah software framework, both at a software level within Uintah and in the data reconstruction method used to recover lost data. This reconstruction has two problems: inaccuracies introduced during the reconstruction propagate forward in time, and the physical consistency of variables, such as positivity or boundedness, may be violated during interpolation. These challenges can be addressed by combining two techniques: 1. a fault-tolerant MPI implementation to recover from runtime node failures, and 2. high-order interpolation schemes to preserve the physical solution and reconstruct lost data. The approach considered here uses a "Limited Essentially Non-Oscillatory" (LENO) scheme along with AMR to rebuild the lost data without checkpointing using Uintah. Experiments were carried out using ULFM, a fault-tolerant MPI, to recover from runtime failures, and LENO to recover data on patches belonging to failed ranks, while the simulation continued to the end. Results show that this ABFT approach is up to 10x faster than the traditional checkpointing method. The new interpolation approach is more accurate than linear interpolation and not subject to the overshoots found in other interpolation methods.
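The boundedness issue can be illustrated with a 1D coarse-to-fine reconstruction at refinement ratio 2. This sketch deliberately uses a simpler stand-in, minmod-limited linear slopes, not the LENO scheme from the paper; the function names are hypothetical. The unlimited central slope produces a negative value next to a sharp front, while the limited reconstruction stays within the coarse data's bounds and still averages back to the coarse values.

```python
def minmod(a, b):
    """Return the smaller-magnitude argument, or 0 if the signs differ."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def prolong(coarse, limited=True):
    """Rebuild fine cells (refinement ratio 2) from 1D coarse cell averages.
    Each pair of fine cells averages back to its coarse value."""
    fine = []
    n = len(coarse)
    for i, u in enumerate(coarse):
        left = coarse[i - 1] if i > 0 else u        # zero-gradient domain edges
        right = coarse[i + 1] if i + 1 < n else u
        if limited:
            delta = minmod(right - u, u - left) / 4.0   # limited one-sided slopes
        else:
            delta = (right - left) / 8.0                # plain central slope
        fine.extend([u - delta, u + delta])
    return fine


coarse = [0.0, 0.0, 1.0, 1.0]  # e.g. a non-negative quantity with a sharp front
print(prolong(coarse, limited=False))  # [0.0, 0.0, -0.125, 0.125, 0.875, 1.125, 1.0, 1.0]
print(prolong(coarse, limited=True))   # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```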
D. Sahasrabudhe, E. T. Phipps, S. Rajamanickam, M. Berzins. A Portable SIMD Primitive using Kokkos for Heterogeneous Architectures, In Sixth Workshop on Accelerator Programming Using Directives (WACCPD), 2019.
As computer architectures are rapidly evolving (e.g. those designed for exascale), multiple portability frameworks have been developed to avoid new architecture-specific development and tuning. However, portability frameworks depend on compilers for auto-vectorization and may lack support for explicit vectorization on heterogeneous platforms. Alternatively, programmers can use intrinsics-based primitives to achieve more efficient vectorization, but the lack of a GPU back-end for these primitives makes such code non-portable. A unified, portable Single Instruction Multiple Data (SIMD) primitive proposed in this work allows intrinsics-based vectorization on CPUs and many-core architectures such as Intel Knights Landing (KNL), and also facilitates Single Instruction Multiple Threads (SIMT) based execution on GPUs. This unified primitive, coupled with the Kokkos portability ecosystem, makes it possible to develop explicitly vectorized code that is portable across heterogeneous platforms. The new SIMD primitive is used on different architectures to test the performance boost against a hard-to-auto-vectorize baseline, to measure the overhead against an efficiently vectorized baseline, and to evaluate a new feature called the "logical vector length" (LVL). The SIMD primitive provides portability across CPUs and GPUs without any performance degradation being observed experimentally.
As application scientists develop and deploy simulation codes onto leadership-class computing resources, there is a need to instrument these codes to better understand their performance and use these resources efficiently. This instrumentation may come from independent third-party tools that generate and store performance metrics or from custom instrumentation tools built directly into the application. The metrics collected are then available for visual analysis, typically in the domain in which they were collected. In this paper, we introduce an approach to visualize and analyze the performance metrics in situ in the context of the machine, application, and communication domains (MAC model) using a single visualization tool. This visualization model provides a holistic view of application performance in the context of the resources on which it is executing.
G. S. Smith, K. A. Mills, G. M. Pontone, W. S. Anderson, K. M. Perepezko, J. Brasic, Y. Zhou, J. Brandt, C. R. Butson, D. P. Holt, W. B. Mathews, R. F. Dannals, D. F. Wong, Z. Mari. Effect of STN DBS on vesicular monoamine transporter 2 and glucose metabolism in Parkinson's disease, In Parkinsonism and Related Disorders, Elsevier, 2019.
Deep brain stimulation (DBS) is an established treatment for Parkinson's disease (PD). Despite its widespread use and the improvement of motor symptoms in most patients treated with subthalamic nucleus (STN) DBS, the neurobiological mechanisms are not completely understood. The objective of the present study was to elucidate the effects of STN DBS in PD on the dopamine system and neural circuitry using high-resolution positron emission tomography (PET) imaging. The hypotheses tested were that STN DBS would decrease striatal vesicular monoamine transporter 2 (VMAT2), secondary to an increase in dopamine concentrations, and would decrease striatal cerebral metabolism and increase cortical metabolism.
Q. Tran, M. Berzins, W. Solowski. An improved moving least squares method for the Material Point Method, In Proceedings of the 2nd International Conference on the Material Point Method for Modelling Soil-Water-Structure Interaction (MPM 2019), 2019.
The paper presents an improved moving least squares reconstruction technique for the Material Point Method. Moving least squares (MLS) reconstruction can improve spatial accuracy in simulations involving large deformations. However, the MLS algorithm relies on computing the inverse of the moment matrix. This is both expensive and potentially unstable when there are not enough material points to reconstruct the high-order least squares function, which leads to a singular or ill-conditioned matrix. The proposed formulation can overcome this limitation while retaining the same order of accuracy as the conventional moving least squares reconstruction. Numerical experiments demonstrate the accuracy improvements and compare the method with the original Material Point Method and the Convected Particle Domain Interpolation method.
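A 1D sketch of a moving least squares evaluation with a linear basis shows where the moment matrix enters and how it degenerates when too few material points lie inside the support. The hat weight, the tolerance, and the constant-basis fallback are assumptions made for illustration: the fallback is a common safeguard that lowers the local order of accuracy, so it is not the improved formulation proposed in the paper, which is designed to avoid exactly that loss.

```python
def hat_weight(r, h):
    """Compactly supported weight: 1 at r = 0, falling linearly to 0 at |r| = h."""
    return max(0.0, 1.0 - abs(r) / h)


def mls_value(x, xs, us, h, rtol=1e-12):
    """Moving least squares estimate of u(x) with the shifted linear basis
    [1, xi - x]. If the 2x2 moment matrix is (nearly) singular, fall back to
    the constant basis, i.e. a Shepard-weighted average (hypothetical safeguard)."""
    m00 = m01 = m11 = b0 = b1 = 0.0
    for xi, ui in zip(xs, us):
        w = hat_weight(xi - x, h)
        d = xi - x
        m00 += w
        m01 += w * d
        m11 += w * d * d
        b0 += w * ui
        b1 += w * d * ui
    if m00 == 0.0:
        raise ValueError("no material points inside the support")
    det = m00 * m11 - m01 * m01
    if abs(det) <= rtol * m00 * m11:
        return b0 / m00            # constant-basis fallback
    # Solve M a = b by Cramer's rule; with the shifted basis, u(x) = a0.
    return (m11 * b0 - m01 * b1) / det


xs = [0.0, 1.0, 2.0, 3.0]
us = [2.0 * x + 1.0 for x in xs]
print(mls_value(1.4, xs, us, h=1.5))                  # approximately 3.8
print(mls_value(0.3, [0.0, 5.0], [2.0, 7.0], h=1.0))  # 2.0: one point, fallback
```

With three points in the support, linear data are reproduced; with a single point, the moment matrix is singular and only the fallback keeps the evaluation defined.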
Q. A. Tran, W. Sołowski, M. Berzins, J. Guilkey. A convected particle least square interpolation material point method, In International Journal for Numerical Methods in Engineering, Wiley, October, 2019.
Applying convected particle domain interpolation (CPDI) to the material point method has many advantages over the original material point method, including significantly improved accuracy. However, in the large deformation regime, CPDI may still not retain the expected convergence rate. The paper proposes an enhanced CPDI formulation based on a least squares reconstruction technique. The convected particle least square interpolation (CPLS) material point method treats the velocity field inside the material point domain as nonconstant. This velocity field in the material point domain is mapped to the background grid nodes with a moving least squares reconstruction. In this paper, we apply the improved moving least squares method to avoid the instability of the conventional moving least squares method due to a singular matrix. The proposed algorithm can improve the convergence rate, as illustrated by numerical examples using the method of manufactured solutions.
W. Usher, I. Wald, J. Amstutz, J. Gunther, C. Brownlee, V. Pascucci. Scalable Ray Tracing Using the Distributed FrameBuffer, In Eurographics Conference on Visualization (EuroVis) 2019, Vol. 38, No. 3, 2019.
Image- and data-parallel rendering across multiple nodes on high-performance computing systems is widely used in visualization to provide higher frame rates, support large data sets, and render data in situ. Specifically for in situ visualization, reducing bottlenecks incurred by the visualization and compositing is of key concern to reduce the overall simulation runtime. Moreover, prior algorithms have been designed to support either image- or data-parallel rendering and impose restrictions on the data distribution, requiring different implementations for each configuration. In this paper, we introduce the Distributed FrameBuffer, an asynchronous image-processing framework for multi-node rendering. We demonstrate that our approach achieves performance superior to the state of the art for common use cases, while providing the flexibility to support a wide range of parallel rendering algorithms and data distributions. By building on this framework, we extend the open-source ray tracing library OSPRay with a data-distributed API, enabling its use in data-distributed and in situ visualization applications.
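The per-pixel arithmetic behind data-parallel (sort-last) compositing can be sketched as front-to-back "over" blending of the fragments that each rank rendered. This is a minimal sketch under stated assumptions: one fragment per rank per pixel, premultiplied colors, and a single depth per fragment (valid when ranks own disjoint convex regions of the data). The function names are hypothetical, and the sketch does not reflect the Distributed FrameBuffer's API, tiling, or asynchronous messaging.

```python
def composite_pixel(fragments):
    """Front-to-back 'over' compositing of (depth, rgb, alpha) fragments that
    different ranks rendered for one pixel; rgb is premultiplied by alpha."""
    out_rgb = [0.0, 0.0, 0.0]
    out_a = 0.0
    for _, rgb, a in sorted(fragments, key=lambda f: f[0]):
        t = 1.0 - out_a                       # transparency left in front
        out_rgb = [c + t * ci for c, ci in zip(out_rgb, rgb)]
        out_a += t * a
    return out_rgb, out_a


def composite_images(per_rank_images):
    """Composite same-sized images (lists of pixels), one image per rank."""
    return [composite_pixel(list(pixels)) for pixels in zip(*per_rank_images)]


rank0 = [(2.0, (0.0, 0.0, 0.5), 0.5)]  # far, half-transparent blue
rank1 = [(1.0, (0.5, 0.0, 0.0), 0.5)]  # near, half-transparent red
print(composite_images([rank0, rank1]))  # [([0.5, 0.0, 0.25], 0.75)]
```

Because fragments are sorted by depth, the result does not depend on the order in which ranks contribute, which is what allows compositing to proceed as tiles arrive.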
A Retrospective Evaluation of Automated Optimization of Deep Brain Stimulation Settings, In Brain Stimulation, Vol. 12, No. 2, Elsevier, pp. e54-e55. March, 2019.
Influence of head tissue conductivity uncertainties on EEG dipole reconstruction, In Frontiers in Neuroscience, 2019.
Reliable EEG source analysis depends on sufficiently detailed and accurate head models. In this study, we investigate how uncertainties inherent to the experimentally determined conductivity values of the different conductive compartments influence the results of EEG source analysis. In a single source scenario, the superficial and focal somatosensory P20/N20 component, we analyze the influence of varying conductivities on dipole reconstructions using a generalized polynomial chaos (gPC) approach. We find that in particular the conductivity uncertainties for skin and skull have a significant influence on the EEG inverse solution, leading to variations in source localization by several centimeters. The conductivity uncertainties for gray and white matter were found to have little influence on the source localization, but a strong influence on the strength and orientation of the reconstructed source, respectively. As the CSF conductivity is the most accurately determined of all conductivities in a realistic head model, CSF conductivity uncertainties had a negligible influence on the source reconstruction. This small uncertainty is a further benefit of distinguishing the CSF in realistic volume conductor models.
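The generalized polynomial chaos idea can be sketched for one uncertain parameter: assuming the conductivity varies uniformly within its range, rescale it to a variable on [-1, 1], project the output onto Legendre polynomials using Gauss-Legendre quadrature, and read the mean and variance off the coefficients. In this minimal sketch the quadratic function is a hypothetical stand-in for a full forward-plus-inverse EEG computation, and the expansion is truncated at second order.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def legendre(k, x):
    """Legendre polynomials P_0, P_1, P_2."""
    return (1.0, x, 0.5 * (3.0 * x * x - 1.0))[k]


def gpc_mean_var(f, order=2):
    """Mean and variance of f(xi), xi ~ Uniform(-1, 1), from a Legendre gPC
    expansion whose coefficients are computed by quadrature (non-intrusive)."""
    if order > 2:
        raise ValueError("this sketch only tabulates P_0..P_2")
    coeffs = []
    for k in range(order + 1):
        # E[f P_k] = (1/2) * integral of f P_k over [-1, 1]
        proj = 0.5 * sum(w * f(x) * legendre(k, x) for x, w in zip(NODES, WEIGHTS))
        coeffs.append(proj * (2 * k + 1))   # divide by E[P_k^2] = 1 / (2k + 1)
    mean = coeffs[0]
    var = sum(c * c / (2 * k + 1) for k, c in enumerate(coeffs) if k > 0)
    return mean, var


# Hypothetical output depending quadratically on a normalized conductivity xi.
mean, var = gpc_mean_var(lambda xi: xi * xi)
print(mean, var)  # close to 1/3 and 4/45
```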
Objective: We performed a retrospective analysis of an optimization algorithm for the computation of patient-specific multipolar stimulation configurations employing multiple independent current/voltage sources. We evaluated whether the obtained stimulation configurations align with clinical data and whether the optimized stimulation configurations have the potential to stimulate the target region as well as or better than manual programming, while reducing the time required for programming sessions. Methods: For three patients (five electrodes) diagnosed with essential tremor, we derived optimized multipolar stimulation configurations using an approach that is suitable for application in clinical practice. To evaluate the automatically derived stimulation settings, we compared them to the results of the monopolar review. Results: We observe good agreement between the findings of the monopolar review and the optimized stimulation configurations: in all cases, the algorithm assigned the maximal voltage in the optimized multipolar pattern to the contact that was found to yield the best therapeutic effect in the clinical monopolar review. Additionally, our simulation results predict that the optimized stimulation settings activate an equal or larger volume fraction of the target than the manually determined settings in all cases. Conclusions: Our results demonstrate the feasibility of automatically determining optimal DBS configurations and motivate further evaluation of the applied optimization algorithm.
J. Vorwerk, D. McCann, J. Krüger, C.R. Butson.
Interactive computation and visualization of deep brain stimulation effects using Duality, In Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, Taylor & Francis, 2019.
Deep brain stimulation (DBS) is an established treatment for movement disorders such as Parkinson’s disease or essential tremor. Currently, optimal stimulation settings are selected by iteratively adjusting the stimulation parameters, a time-consuming procedure that requires multiple clinic visits of several hours. Recently, computational models to predict and visualize the effect of DBS have been developed with the goal of simplifying and accelerating this procedure by providing visual guidance, and such models have also been made available on mobile devices. However, currently available visualization software still lacks either mobility, i.e. it runs on desktop computers and is not easily available in clinical practice, or flexibility, as the simulations that are visualized on mobile devices have to be precomputed. The goal of the pipeline presented in this paper is to close this gap: using Duality, a newly developed software for the interactive visualization of simulation results, we implemented a pipeline that computes DBS simulations in near-real time and instantly visualizes the results on a tablet computer. We carry out a performance analysis and present the results of a case study in which the pipeline was applied.
Adaptive mesh refinement (AMR) is a key technology for large-scale simulations that allows for adaptively changing the simulation mesh resolution, resulting in significant computational and storage savings. However, visualizing such AMR data poses a significant challenge due to the difficulties introduced by the hierarchical representation when reconstructing continuous field values. In this paper, we detail a comprehensive solution for interactive isosurface rendering of block-structured AMR data. We contribute a novel reconstruction strategy—the octant method—which is continuous, adaptive and simple to implement. Furthermore, we present a generally applicable hybrid implicit isosurface ray-tracing method, which provides better rendering quality and performance than the built-in sampling-based approach in OSPRay. Finally, we integrate our octant method and hybrid isosurface geometry into OSPRay as a module, providing the ability to create high-quality interactive visualizations combining volume and isosurface representations of BS-AMR data. We evaluate the rendering performance, memory consumption and quality of our method on two gigascale block-structured AMR datasets.
A. Warner, J. Tate, B. Burton, C.R. Johnson.
A High-Resolution Head and Brain Computer Model for Forward and Inverse EEG Simulation, In bioRxiv, Cold Spring Harbor Laboratory, Feb, 2019.
To conduct computational forward and inverse EEG studies of brain electrical activity, researchers must construct realistic head and brain computer models, which is both challenging and time consuming. The availability of realistic head models and corresponding imaging data is limited in terms of imaging modalities and patient diversity. In this paper, we describe a detailed head modeling pipeline and provide a high-resolution, multimodal, open-source, female head and brain model. The modeling pipeline specifically outlines image acquisition, preprocessing, registration, and segmentation; three-dimensional tetrahedral mesh generation; finite element EEG simulations; and visualization of the model and simulation results. The dataset includes both functional and structural images and EEG recordings from two high-resolution electrode configurations. The intermediate results and software components are also included in the dataset to facilitate modifications to the pipeline. This project will contribute to neuroscience research by providing a high-quality dataset that can be used for a variety of applications and a computational pipeline that may help researchers construct new head models more efficiently.
L. Zhou, D. Weiskopf, C. R. Johnson.
Perceptually guided contrast enhancement based on viewing distance, In Journal of Computer Languages, Vol. 55, Elsevier, pp. 100911. 2019.
We propose an image-space contrast enhancement method for color-encoded visualization. The contrast of an image is enhanced through a perceptually guided approach that exposes a single, intuitive parameter to the user: the virtual viewing distance. To this end, we analyze a multiscale contrast model of the input image and test the visibility of bandpass images of all scales at a virtual viewing distance. By adapting weights of bandpass images with a threshold model of spatial vision, this image-based method enhances contrast to compensate for contrast loss caused by viewing the image at a certain distance. Relevant features in the color image can be further emphasized by the user using overcompensation. The weights can be assigned with a simple band-based approach, or with an efficient pixel-based approach that reduces ringing artifacts. The method is efficient and can be integrated into any visualization tool as it is a generic image-based post-processing technique. Using highly diverse datasets, we show the usefulness of perception compensation across a wide range of typical visualizations.
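A 1D sketch of the bandpass reweighting idea: decompose a signal into bandpass layers plus a low-pass residual, then reconstruct with one gain per layer. In the paper the weights come from a threshold model of spatial vision evaluated at the virtual viewing distance; here the gains are hand-picked hypothetical values, a single signal stands in for an image channel, and the layers are computed without downsampling. With unit gains the input is recovered, and boosting the finest band sharpens an edge while also producing the overshoot (ringing) that the pixel-based weighting is meant to reduce.

```python
def blur(signal):
    """One [1, 2, 1] / 4 smoothing pass with clamped edges."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2.0 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]


def bandpass_decompose(signal, levels):
    """Split a signal into `levels` bandpass layers (fine to coarse) plus a
    low-pass residual; residual + sum(layers) telescopes back to the input."""
    bands, current = [], list(signal)
    for _ in range(levels):
        smoother = blur(current)
        bands.append([a - b for a, b in zip(current, smoother)])
        current = smoother
    return bands, current


def reweight(bands, residual, gains):
    """Reconstruct with one gain per band; gains > 1 boost that band's contrast."""
    out = list(residual)
    for band, g in zip(bands, gains):
        out = [o + g * b for o, b in zip(out, band)]
    return out


edge = [0.0] * 4 + [1.0] * 4
bands, low = bandpass_decompose(edge, levels=2)
restored = reweight(bands, low, [1.0, 1.0])   # unit gains: recovers the input
sharpened = reweight(bands, low, [2.0, 1.0])  # finest band doubled
print(max(abs(a - b) for a, b in zip(restored, edge)))
print(min(sharpened), max(sharpened))         # values below 0 and above 1
```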
In this paper, we propose a perceptually guided visualization sharpening technique. We analyze the spectral behavior of an established comprehensive perceptual model to arrive at our approximated model based on an adapted weighting of the bandpass images from a Gaussian pyramid. The main benefit of this approximated model is its controllability and predictability for sharpening color-mapped visualizations. Our method can be integrated into any visualization tool as it adopts generic image-based post-processing, and it is intuitive and easy to use as viewing distance is the only parameter. Using highly diverse datasets, we show the usefulness of our method across a wide range of typical visualizations.
O. Abdullah, L. Dai, J. Tippetts, B. Zimmerman, A. Van Hoek, S. Joshi, E. Hsu.
High resolution and high field diffusion MRI in the visual system of primates (P3.086), In Neurology, Vol. 90, No. 15 Supplement, Wolters Kluwer Health, Inc, 2018.
Objective: Establishing a primate multiscale genetic brain network linking key microstructural brain components to social behavior remains an elusive goal.
Background: Diffusion MRI, which quantifies the magnitude and anisotropy of water diffusion in brain tissues, offers an unparalleled opportunity to link the macroconnectome (resolution of ~0.5 mm) to the histology-based microconnectome at synaptic resolution.
Design/Methods: We tested the hypothesis that the simplest (and most clinically used) reconstruction technique, diffusion tensor imaging (DTI), would yield brain connectivity patterns in the visual system (from optic chiasm to visual cortex) similar to those of more sophisticated and accurate reconstruction methods, including diffusion spectrum imaging (DSI), q-ball imaging (QBI), and generalized q-sampling imaging. We obtained high-resolution diffusion MRI data on ex vivo brain from Macaca fascicularis: MRI 7T, resolution 0.5 mm isotropic, 515 diffusion volumes up to a b-value (i.e. diffusion sensitivity) of 40,000 s/mm², with a scan time of ~100 hrs.
Results: Tractography results show that despite the limited ability of DTI to resolve crossing fibers at the optic chiasm, DTI-based tracts mapped to the known projections of layers in the lateral geniculate nucleus and to the primary visual cortex. The other reconstructions were superior in resolving crossing fibers in localized regions.
Conclusions: Despite its simplifying assumptions, DTI-based fiber tractography can be used to generate accurate brain connectivity maps that conform to established neuroanatomical features in the visual system.
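For reference, the two scalar quantities DTI usually reports, mean diffusivity (magnitude) and fractional anisotropy, follow from the three eigenvalues of the fitted diffusion tensor using the standard formulas sketched below. Fitting the tensor to the diffusion volumes is omitted, the function name is hypothetical, and the eigenvalues in the usage lines are arbitrary illustrative numbers. Because a single tensor has only one principal direction per voxel, it cannot represent two crossing fiber populations, which is the limitation noted at the optic chiasm.

```python
import math


def md_fa(l1, l2, l3):
    """Mean diffusivity (MD) and fractional anisotropy (FA) from the three
    eigenvalues of a fitted diffusion tensor."""
    md = (l1 + l2 + l3) / 3.0
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(0.5 * num / den) if den > 0.0 else 0.0
    return md, fa


print(md_fa(1.0, 1.0, 1.0))  # (1.0, 0.0): isotropic diffusion
print(md_fa(1.0, 0.0, 0.0))  # (0.333..., 1.0): diffusion along one axis only
```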
K. A. Aiello, S. P. Ponnapalli, O. Alter.
Mathematically universal and biologically consistent astrocytoma genotype encodes for transformation and predicts survival phenotype, In APL Bioengineering, Vol. 2, No. 3, AIP Publishing, pp. 031909. September, 2018.
DNA alterations have been observed in astrocytoma for decades. A copy-number genotype predictive of a survival phenotype was only discovered by using the generalized singular value decomposition (GSVD) formulated as a comparative spectral decomposition. Here, we use the GSVD to compare whole-genome sequencing (WGS) profiles of patient-matched astrocytoma and normal DNA. First, the GSVD uncovers a genome-wide pattern of copy-number alterations, which is bounded by patterns recently uncovered by the GSVDs of microarray-profiled patient-matched glioblastoma (GBM) and, separately, lower-grade astrocytoma and normal genomes. Like the microarray patterns, the WGS pattern is correlated with an approximately one-year median survival time. By filling in gaps in the microarray patterns, the WGS pattern reveals that this biologically consistent genotype encodes for transformation via the Notch together with the Ras and Shh pathways. Second, like the GSVDs of the microarray profiles, the GSVD of the WGS profiles separates the tumor-exclusive pattern from normal copy-number variations and experimental inconsistencies. These include the WGS technology-specific effects of guanine-cytosine content variations across the genomes that are correlated with experimental batches. Third, by identifying the biologically consistent phenotype among the WGS-profiled tumors, the GBM pattern proves to be a technology-independent predictor of survival and response to chemotherapy and radiation, statistically better than the patient's age and tumor's grade, the best other indicators, and MGMT promoter methylation and IDH1 mutation. We conclude that by using the complex structure of the data, comparative spectral decompositions underlie a mathematically universal description of the genotype-phenotype relations in cancer that other methods miss.