Scalable Large-scale Fluid-structure Interaction Solvers in the Uintah Framework via Hybrid Task-based Parallelism Algorithms|
SCI Technical Report, Q. Meng, M. Berzins. No. UUSCI-2012-004, SCI Institute, University of Utah, 2012.
Uintah is a software framework that provides an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale science and engineering problems involving the solution of partial differential equations. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, together with adaptive meshing and a novel asynchronous task-based approach with fully automated load balancing. When applying Uintah to fluid-structure interaction problems with mesh refinement, the combination of adaptive meshing and the movement of structures through space present a formidable challenge in terms of achieving scalability on large-scale parallel computers. With core counts per socket continuing to grow along with the prospect of less memory per core, adopting a model that uses MPI to communicate between nodes and a shared memory model on-node is one approach to achieve scalability at full machine capacity on current and emerging large-scale systems. For this approach to be successful, it is necessary to design data-structures that large numbers of cores can simultaneously access without contention. These data structures and algorithms must also be designed to avoid the overhead involved with locks and other synchronization primitives when running on large number of cores per node, as contention for acquiring locks quickly becomes untenable. This scalability challenge is addressed here for Uintah, by the development of new hybrid runtime and scheduling algorithms combined with novel lockfree data structures, making it possible for Uintah to achieve excellent scalability for a challenging fluid-structure problem with mesh refinement on as many as 260K cores.
Keywords: uintah, csafe
Large Scale Parallel Solution of Incompressible Flow Problems using Uintah and hypre|
SCI Technical Report, J. Schmidt, M. Berzins, J. Thornock, T. Saad, J. Sutherland. No. UUSCI-2012-002, SCI Institute, University of Utah, 2012.
The Uintah Software framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids together with a novel asynchronous task-based approach with fully automated load balancing. As Uintah is often used to solve compressible, low-Mach combustion applications, it is important to have a scalable linear solver. While there are many such solvers available, the scalability of those codes varies greatly. The hypre software offers a range of solvers and pre-conditioners for different types of grids. The weak scalability of Uintah and hypre is addressed for particular examples when applied to an incompressible flow problem relevant to combustion applications. After careful software engineering to reduce start-up costs, much better than expected weak scalability is seen for up to 100K cores on NSFs Kraken architecture and up to 200K+ cores, on DOEs new Titan machine.
Keywords: uintah, csafe
Biomedical Visual Computing: Case Studies and Challenges|
C.R. Johnson. In IEEE Computing in Science and Engineering, Vol. 14, No. 1, pp. 12--21. 2012.
PubMed ID: 22545005
PubMed Central ID: PMC3336198
Computer simulation and visualization are having a substantial impact on biomedicine and other areas of science and engineering. Advanced simulation and data acquisition techniques allow biomedical researchers to investigate increasingly sophisticated biological function and structure. A continuing trend in all computational science and engineering applications is the increasing size of resulting datasets. This trend is also evident in data acquisition, especially in image acquisition in biology and medical image databases.
For example, in a collaboration between neuroscientist Robert Marc and our research team at the University of Utah's Scientific Computing and Imaging (SCI) Institute (www.sci.utah.edu), we're creating datasets of brain electron microscopy (EM) mosaics that are 16 terabytes in size. However, while there's no foreseeable end to the increase in our ability to produce simulation data or record observational data, our ability to use this data in meaningful ways is inhibited by current data analysis capabilities, which already lag far behind. Indeed, as the NIH-NSF Visualization Research Challenges report notes, to effectively understand and make use of the vast amounts of data researchers are producing is one of the greatest scientific challenges of the 21st century.
Visual data analysis involves creating images that convey salient information about underlying data and processes, enabling the detection and validation of expected results while leading to unexpected discoveries in science. This allows for the validation of new theoretical models, provides comparison between models and datasets, enables quantitative and qualitative querying, improves interpretation of data, and facilitates decision making. Scientists can use visual data analysis systems to explore \"what if\" scenarios, define hypotheses, and examine data under multiple perspectives and assumptions. In addition, they can identify connections between numerous attributes and quantitatively assess the reliability of hypotheses. In essence, visual data analysis is an integral part of scientific problem solving and discovery.
As applied to biomedical systems, visualization plays a crucial role in our ability to comprehend large and complex data-data that, in two, three, or more dimensions, convey insight into many diverse biomedical applications, including understanding neural connectivity within the brain, interpreting bioelectric currents within the heart, characterizing white-matter tracts by diffusion tensor imaging, and understanding morphology differences among different genetic mice phenotypes.
Status of Release of the Uintah Computational Framework|
SCI Technical Report, M. Berzins. No. UUSCI-2012-001, SCI Institute, University of Utah, 2012.
This report provides a summary of the status of the Uintah Computation Framework (UCF) software. Uintah is uniquely equipped to tackle large-scale multi-physics science and engineering problems on disparate length and time scales. The Uintah framework makes it possible to run adaptive computations on modern HPC architectures with tens and now hundreds of thousands of cores with complex communication/memory hierarchies. Uintah was orignally developed in the University of Utah Center for Simulation of Accidental Fires and Explosions (C-SAFE), a DOE-funded academic alliance project and then extended to the broader NSF snd DOE science and engineering communities. As Uintah is applicable to a wide range of engineering problems that involve fl uid-structure interactions with highly deformable structures it is used for a number of NSF-funded and DOE engineering projects. In this report the Uintah framework software is outlined and typical applications are illustrated. Uintah is open-source software that is available through the MIT open-source license at http://www.uintah.utah.edu/.
Fast, Effective BVH Updates for Animated Scenes|
D. Kopta, T. Ize, J. Spjut, E. Brunvand, A. Davis, A. Kensler. In Proceedings of the Symposium on Interactive 3D Graphics and Games (I3D '12), pp. 197--204. 2012.
Bounding volume hierarchies (BVHs) are a popular acceleration structure choice for animated scenes rendered with ray tracing. This is due to the relative simplicity of refitting bounding volumes around moving geometry. However, the quality of such a refitted tree can degrade rapidly if objects in the scene deform or rearrange significantly as the animation progresses, resulting in dramatic increases in rendering times and a commensurate reduction in the frame rate. The BVH could be rebuilt on every frame, but this could take significant time. We present a method to efficiently extend refitting for animated scenes with tree rotations, a technique previously proposed for off-line improvement of BVH quality for static scenes. Tree rotations are local restructuring operations which can mitigate the effects that moving primitives have on BVH quality by rearranging nodes in the tree during each refit rather than triggering a full rebuild. The result is a fast, lightweight, incremental update algorithm that requires negligible memory, has minor update times, parallelizes easily, avoids significant degradation in tree quality or the need for rebuilding, and maintains fast rendering times. We show that our method approaches or exceeds the frame rates of other techniques and is consistently among the best options regardless of the animated scene.
Understanding Quasi-Periodic Fieldlines and Their Topology in Toroidal Magnetic Fields|
A.R. Sanderson, G. Chen, X. Tricoche, E. Cohen. In Topological Methods in Data Analysis and Visualization II, Edited by R. Peikert and H. Carr and H. Hauser and R. Fuchs, Springer, pp. 125--140. 2012.
Adaptive High-Order Discontinuous Galerkin Solution of Elastohydrodynamic Lubrication Point Contact Problems|
H. Lu, M. Berzins, C.E. Goodyer, P.K. Jimack. In Advances in Engineering Software, Vol. 45, No. 1, pp. 313--324. 2012.
This paper describes an adaptive implementation of a high order Discontinuous Galerkin (DG) method for the solution of elastohydrodynamic lubrication (EHL) point contact problems. These problems arise when modelling the thin lubricating film between contacts which are under sufficiently high pressure that the elastic deformation of the contacting elements cannot be neglected. The governing equations are highly nonlinear and include a second order partial differential equation that is derived via the thin-film approximation. Furthermore, the problem features a free boundary, which models where cavitation occurs, and this is automatically captured as part of the solution process. The need for spatial adaptivity stems from the highly variable length scales that are present in typical solutions. Results are presented which demonstrate both the effectiveness and the limitations of the proposed adaptive algorithm.
Keywords: Elastohydrodynamic lubrication, Discontinuous Galerkin, High polynomial degree, h-adaptivity, Nonlinear systems
DAG-Based Software Frameworks for PDEs|
M. Berzins, Q. Meng, J. Schmidt, J.C. Sutherland. In Proceedings of Euro-Par 2011 Workshops, Part I, Lecture Notes in Computer Science (LNCS) 7155, Springer-Verlag Berlin Heidelberg, pp. 324--333. August, 2012.
The task-based approach to software and parallelism is well-known and has been proposed as a potential candidate, named the silver model, for exascale software. This approach is not yet widely used in the large-scale multi-core parallel computing of complex systems of partial differential equations. After surveying task-based approaches we investigate how well the Uintah software and an extension named Wasatch fit in the task-based paradigm and how well they perform on large scale parallel computers. The conclusion is that these approaches show great promise for petascale but that considerable algorithmic challenges remain.
Keywords: DOD, Uintah, CSAFE
An optimization framework for inversely estimating myocardial transmembrane potentials and localizing ischemia|
D. Wang, R.M. Kirby, R.S. Macleod, C.R. Johnson. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 1680--1683. 2011.
PubMed ID: 22254648
PubMed Central ID: PMC3336368
By combining a static bidomain heart model with a torso conduction model, we studied the inverse electrocardiographic problem of computing the transmembrane potentials (TMPs) throughout the myocardium from a body-surface potential map, and then used the recovered potentials to localize myocardial ischemia. Our main contribution is solving the inverse problem within a constrained optimization framework, which is a generalization of previous methods for calculating transmembrane potentials. The framework offers ample flexibility for users to apply various physiologically-based constraints, and is well supported by mature algorithms and solvers developed by the optimization community. By avoiding the traditional inverse ECG approach of building the lead-field matrix, the framework greatly reduces computation cost and, by setting the associated forward problem as a constraint, the framework enables one to flexibly set individualized resolutions for each physical variable, a desirable feature for balancing model accuracy, ill-conditioning and computation tractability. Although the task of computing myocardial TMPs at an arbitrary time instance remains an open problem, we showed that it is possible to obtain TMPs with moderate accuracy during the ST segment by assuming all cardiac cells are at the plateau phase. Moreover, the calculated TMPs yielded a good estimate of ischemic regions, which was of more clinical interest than the voltage values themselves. We conducted finite element simulations of a phantom experiment over a 2D torso model with synthetic ischemic data. Preliminary results indicated that our approach is feasible and suitably accurate for the common case of transmural myocardial ischemia.
Cyber Science and Engineering: A Report of the National Science Foundation Advisory Committee for Cyberinfrastructure Task Force on Grand Challenges|
J.T. Oden, O. Ghattas, J.L. King, B.I. Schneider, K. Bartschat, F. Darema, J. Drake, T. Dunning, D. Estep, S. Glotzer, M. Gurnis, C.R. Johnson, D.S. Katz, D. Keyes, S. Kiesler, S. Kim, J. Kinter, G. Klimeck, C.W. McCurdy, R. Moser, C. Ott, A. Patra, L. Petzold, T. Schlick, K. Schulten, V. Stodden, J. Tromp, M. Wheeler, S.J. Winter, C. Wu, K. Yelick. Note: NSF Report, 2011.
This document contains the findings and recommendations of the NSF – Advisory Committee for Cyberinfrastructure Task Force on Grand Challenges addressed by advances in Cyber Science and Engineering. The term Cyber Science and Engineering (CS&E) is introduced to describe the intellectual discipline that brings together core areas of science and engineering, computer science, and computational and applied mathematics in a concerted effort to use the cyberinfrastructure (CI) for scientific discovery and engineering innovations; CS&E is computational and data-based science and engineering enabled by CI. The report examines a host of broad issues faced in addressing the Grand Challenges of science and technology and explores how those can be met by advances in CI. Included in the report are recommendations for new programs and initiatives that will expand the portfolio of the Office of Cyberinfrastructure and that will be critical to advances in all areas of science and engineering that rely on the CI.
Advisory Committee for CyberInfrastructure Task Force on Software for Science and Engineering|
D. Keyes, V. Taylor, T. Hey, S. Feldman, G. Allen, P. Colella, P. Cummings, F. Darema, J. Dongarra, T. Dunning, M. Ellisman, I. Foster, W. Gropp, C.R. Johnson, C. Kamath, R. Madduri, M. Mascagni, S.G. Parker, P. Raghavan, A. Trefethen, S. Valcourt, A. Patra, F. Choudhury, C. Cooper, P. McCartney, M. Parashar, T. Russell, B. Schneider, J. Schopf, N. Sharp. Note: NSF Report, 2011.
The Software for Science and Engineering (SSE) Task Force commenced in June 2009 with a charge that consisted of the following three elements:Identify specific needs and opportunities across the spectrum of scientific software infrastructure. Characterize the specific needs and analyze technical gaps and opportunities for NSF to meet those needs through individual and systemic approaches. Design responsive approaches. Develop initiatives and programs led (or co-led) by NSF to grow, develop, and sustain the software infrastructure needed to support NSF’s mission of transformative research and innovation leading to scientific leadership and technological competitiveness. Address issues of institutional barriers. Anticipate, analyze and address both institutional and exogenous barriers to NSF’s promotion of such an infrastructure.
The SSE Task Force members participated in bi-weekly telecons to address the given charge. The telecons often included additional distinguished members of the scientific community beyond the task force membership engaged in software issues, as well as personnel from federal agencies outside of NSF who manage software programs. It was quickly acknowledged that a number of reports loosely and tightly related to SSE existed and should be leveraged. By September 2009, the task formed had formed three subcommittees focused on the following topics: (1) compute-intensive science, (2) data-intensive science, and (3) software evolution.
Sensitivity Analysis for the Optimization of Radiofrequency Ablation in the Presence of Material Parameter Uncertainty|
I. Altrogge, T. Preusser, T. Kroeger, S. Haase, T. Paetz, R.M. Kirby. In International Journal for Uncertainty Quantification, 2011.
We present a sensitivity analysis of the optimization of the probe placement in radiofrequency (RF) ablation which takes the uncertainty associated with bio-physical tissue properties (electrical and thermal conductivity) into account. Our forward simulation of RF ablation is based upon a system of partial differential equations (PDEs) that describe the electric potential of the probe and the steady state of the induced heat. The probe placement is optimized by minimizing a temperature-based objective function such that the volume of destroyed tumor tissue is maximized. The resulting optimality system is solved with a multi-level gradient descent approach. By evaluating the corresponding optimality system for certain realizations of tissue parameters (i.e. at certain, well-chosen points in the stochastic space) the sensitivity of the system can be analyzed with respect to variations in the tissue parameters. For the interpolation in the stochastic space we use a stochastic finite element approach with piecewise multilinear ansatz functions on adaptively refined, hierarchical grids. We underscore the significance of the approach by applying the optimization to CT data obtained from a real RF ablation case.
Keywords: netl, stochastic sensitivity analysis, stochastic partial dierential equations, stochastic nite element method, adaptive sparse grid, heat transfer, multiscale modeling, representation of uncertainty
Implementation and Verification of a Nodally-Integrated Tetrahedral Element in FEBio|
SCI Technical Report, S.A. Maas, B.J. Ellis, D.S. Rawlins, L.T. Edgar, C.R. Henak, J.A. Weiss. No. UUSCI-2011-007, SCI Institute, University of Utah, 2011.
Finite element simulations in computational biomechanics commonly require the discretization of extremely complicated geometries. Creating meshes for these complex geometries can be very difficult and time consuming using hexahedral elements. Automatic meshing algorithms exist for tetrahedral elements, but these elements often have numerical problems that discourage their use in complex finite element models. To overcome these problems we have implemented a stabilized, nodally-integrated tetrahedral element formulation in FEBio, our in-house developed finite element code, allowing researchers to use linear tetrahedral elements in their models and still obtain accurate solutions. In addition to facilitating automatic mesh generation, this also allows researchers to use mesh refinement algorithms which are fairly well developed for tetrahedral elements but not so much for hexahedral elements. In this document, the implementation of the stabilized, nodallyintegrated, tetrahedral element, named the \"UT4 element\", is described. Two slightly different variations of the nodally integrated tetrahedral element are considered. In one variation the entire virtual work is stabilized and in the other one the stabilization is only applied to the isochoric part of the virtual work. The implementation of both formulations has been verified and the convergence behavior illustrated using the patch test and three verification problems. Also, a model from our laboratory with very complex geometry is discretized and analyzed using the UT4 element to show its utility for a problem from the biomechanics literature. The convergence behavior of the UT4 element does vary depending on problem, tetrahedral mesh structure and choice of formulation parameters, but the results from the verification problems should assure analysts that a converged solution using the UT4 element can be obtained that is more accurate than the solution from a classical linear tetrahedral formulation.
Dark Regions of No-Reflow on Late Gadolinium Enhancement Magnetic Resonance Imaging Result in Scar Formation After Atrial Fibrillation Ablation|
C.J. McGann, E.G. Kholmovski, J.J. Blauer, S. Vijayakumar, T.S. Haslam, J.E. Cates, E.V. DiBella, N.S. Burgon, B. Wilson, A.J. Alexander, M.W. Prastawa, M. Daccarett, G. Vergara, N.W. Akoum, D.L. Parker, R.S. MacLeod, N.F. Marrouche. In Journal of the American College of Cardiology, Vol. 58, No. 2, pp. 177--185. 2011.
PubMed ID: 21718914
Objectives: The aim of this study was to assess acute ablation injuries seen on late gadolinium enhancement (LGE) magnetic resonance imaging (MRI) immediately post-ablation (IPA) and the association with permanent scar 3 months post-ablation (3moPA).
Background: Success rates for atrial fibrillation catheter ablation vary significantly, in part because of limited information about the location, extent, and permanence of ablation injury at the time of procedure. Although the amount of scar on LGE MRI months after ablation correlates with procedure outcomes, early imaging predictors of scar remain elusive.
Methods: Thirty-seven patients presenting for atrial fibrillation ablation underwent high-resolution MRI with a 3-dimensional LGE sequence before ablation, IPA, and 3moPA using a 3-T scanner. The acute left atrial wall injuries on IPA scans were categorized as hyperenhancing (HE) or nonenhancing (NE) and compared with scar 3moPA.
Results: Heterogeneous injuries with HE and NE regions were identified in all patients. Dark NE regions in the left atrial wall on LGE MRI demonstrate findings similar to the \"no-reflow\" phenomenon. Although the left atrial wall showed similar amounts of HE, NE, and normal tissue IPA (37.7 ± 13\%, 34.3 ± 14\%, and 28.0 ± 11\%, respectively; p = NS), registration of IPA injuries with 3moPA scarring demonstrated that 59.0 ± 19\% of scar resulted from NE tissue, 30.6 ± 15\% from HE tissue, and 10.4 ± 5\% from tissue identified as normal. Paired t-test comparisons were all statistically significant among NE, HE, and normal tissue types (p less than 0.001). Arrhythmia recurrence at 1-year follow-up correlated with the degree of wall enhancement 3moPA (p = 0.02).
Conclusion: Radiofrequency ablation results in heterogeneous injury on LGE MRI with both HE and NE wall lesions. The NE lesions demonstrate no-reflow characteristics and reveal a better predictor of final scar at 3 months. Scar correlates with procedure outcomes, further highlighting the importance of early scar prediction. (J Am Coll Cardiol 2011;58:177–85) © 2011 by the American College of Cardiology Foundation
Smoothness-Increasing Accuracy-Conserving (SIAC) Postprocessing for Discontinuous Galerkin Solutions Over Structured Triangular Meshes|
H. Mirzaee, Liangyue, J.K. Ryan, R.M. Kirby. In SIAM Journal of Numerical Analysis, Vol. 49, No. 5, pp. 1899--1920. 2011.
Theoretically and computationally, it is possible to demonstrate that the order of accuracy of a discontinuous Galerkin (DG) solution for linear hyperbolic equations can be improved from order k+1 to 2k+1 through the use of smoothness-increasing accuracy-conserving (SIAC) filtering. However, it is a computationally complex task to perform this in an efficient manner, which becomes an even greater issue considering nonquadrilateral mesh structures. In this paper, we present an extension of this SIAC filter to structured triangular meshes. The basic theoretical assumption in the previous implementations of the postprocessor limits the use to numerical solutions solved over a quadrilateral mesh. However, this assumption is restrictive, which in turn complicates the application of this postprocessing technique to general tessellations. Additionally, moving from quadrilateral meshes to triangulated ones introduces more complexity in the calculations as the number of integrations required increases. In this paper, we extend the current theoretical results to variable coefficient hyperbolic equations over structured triangular meshes and demonstrate the effectiveness of the application of this postprocessor to structured triangular meshes as well as exploring the effect of using inexact quadrature. We show that there is a direct theoretical extension to structured triangular meshes for hyperbolic equations with bounded variable coefficients. This is a challenging first step toward implementing SIAC filters for unstructured tessellations. We show that by using the usual B-spline implementation, we are able to improve on the order of accuracy as well as decrease the magnitude of the errors. These results are valid regardless of whether exact or inexact integration is used. The results here demonstrate that it is still possible, both theoretically and computationally, to improve to 2k+1 over the DG solution itself for structured triangular meshes.
Efficient Implementation of Smoothness-Increasing Accuracy-Conserving (SIAC) Filters for Discontinuous Galerkin Solutions|
H. Mirzaee, J.K. Ryan, R.M. Kirby. In Journal of Scientific Computing, pp. (in press). 2011.
The discontinuous Galerkin (DG) methods provide a high-order extension of the finite volume method in much the same way as high-order or spectral/hp elements extend standard finite elements. However, lack of inter-element continuity is often contrary to the smoothness assumptions upon which many post-processing algorithms such as those used in visualization are based. Smoothness-increasing accuracy-conserving (SIAC) filters were proposed as a means of ameliorating the challenges introduced by the lack of regularity at element interfaces by eliminating the discontinuity between elements in a way that is consistent with the DG methodology; in particular, high-order accuracy is preserved and in many cases increased. The goal of this paper is to explicitly define the steps to efficient computation of this filtering technique as applied to both structured triangular and quadrilateral meshes. Furthermore, as the SIAC filter is a good candidate for parallelization, we provide, for the first time, results that confirm anticipated performance scaling when parallelized on a shared-memory multi-processor machine.
Numerical Solution of Linear Volterra Integral Equations of the Second Kind with Sharp Gradients|
S.A. Isaacson, R.M. Kirby. In Journal of Computational and Applied Mathematics, Vol. 235, No. 14, pp. 4283--4301. 2011.
Collocation methods are a well-developed approach for the numerical solution of smooth and weakly singular Volterra integral equations. In this paper, we extend these methods through the use of partitioned quadrature based on the qualocation framework, to allow the efficient numerical solution of linear, scalar Volterra integral equations of the second kind with smooth kernels containing sharp gradients. In this case, the standard collocation methods may lose computational efficiency despite the smoothness of the kernel. We illustrate how the qualocation framework can allow one to focus computational effort where necessary through improved quadrature approximations, while keeping the solution approximation fixed. The computational performance improvement introduced by our new method is examined through several test examples. The final example we consider is the original problem that motivated this work: the problem of calculating the probability density associated with a continuous-time random walk in three dimensions that may be killed at a fixed lattice site. To demonstrate how separating the solution approximation from quadrature approximation may improve computational performance, we also compare our new method to several existing Gregory, Sinc, and global spectral methods, where quadrature approximation and solution approximation are coupled.
Towards the Development on an h-p-Refinement Strategy Based Upon Error Estimate Sensitivity|
P.K. Jimack, R.M. Kirby. In Computers and Fluids, Vol. 46, No. 1, pp. 277--281. 2011.
The use of (a posteriori) error estimates is a fundamental tool in the application of adaptive numerical methods across a range of fluid flow problems. Such estimates are incomplete however, in that they do not necessarily indicate where to refine in order to achieve the most impact on the error, nor what type of refinement (for example h-refinement or p-refinement) will be best. This paper extends preliminary work of the authors (Comm Comp Phys, 2010;7:631–8), which uses adjoint-based sensitivity estimates in order to address these questions, to include application with p-refinement to arbitrary order and the use of practical a posteriori estimates. Results are presented which demonstrate that the proposed approach can guide both the h-refinement and the p-refinement processes, to yield improvements in the adaptive strategy compared to the use of more orthodox criteria.
To CG or to HDG: A Comparative Study|
R.M. Kirby, B. Cockburn, S.J. Sherwin. In Journal of Scientific Computing, Note: published online, 2011.
Hybridization through the border of the elements (hybrid unknowns) combined with a Schur complement procedure (often called static condensation in the context of continuous Galerkin linear elasticity computations) has in various forms been advocated in the mathematical and engineering literature as a means of accomplishing domain decomposition, of obtaining increased accuracy and convergence results, and of algorithm optimization. Recent work on the hybridization of mixed methods, and in particular of the discontinuous Galerkin (DG) method, holds the promise of capitalizing on the three aforementioned properties; in particular, of generating a numerical scheme that is discontinuous in both the primary and flux variables, is locally conservative, and is computationally competitive with traditional continuous Galerkin (CG) approaches. In this paper we present both implementation and optimization strategies for the Hybridizable Discontinuous Galerkin (HDG) method applied to two dimensional elliptic operators. We implement our HDG approach within a spectral/hp element framework so that comparisons can be done between HDG and the traditional CG approach.
We demonstrate that the HDG approach generates a global trace space system for the unknown that although larger in rank than the traditional static condensation system in CG, has significantly smaller bandwidth at moderate polynomial orders. We show that if one ignores set-up costs, above approximately fourth-degree polynomial expansions on triangles and quadrilaterals the HDG method can be made to be as efficient as the CG approach, making it competitive for time-dependent problems even before taking into consideration other properties of DG schemes such as their superconvergence properties and their ability to handle hp-adaptivity.
Formal Specification of MPI 2.0: Case Study in Specifying a Practical Concurrent Programming API|
G. Li, R. Palmer, M. DeLisi, G. Gopalakrishnan, R.M. Kirby. In Science of Computer Programming, Vol. 76, pp. 65--81. 2011.
We describe the first formal specification of a non-trivial subset of MPI, the dominant communication API in high performance computing. Engineering a formal specification for a non-trivial concurrency API requires the right combination of rigor, executability, and traceability, while also serving as a smooth elaboration of a pre-existing informal specification. It also requires the modularization of reusable specification components to keep the length of the specification in check. Long-lived APIs such as MPI are not usually 'textbook minimalistic' because they support a diverse array of applications, a diverse community of users, and have efficient implementations over decades of computing hardware. We choose the TLA+ notation to write our specifications, and describe how we organized the specification of around 200 of the 300 MPI 2.0 functions. We detail a handful of these functions in this paper, and assess our specification with respect to the aforementioned requirements. We close with a description of possible approaches that may help render the act of writing, understanding, and validating the specifications of concurrency APIs much more productive.