
2023 | Book

Parallel Processing and Applied Mathematics

14th International Conference, PPAM 2022, Gdansk, Poland, September 11–14, 2022, Revised Selected Papers, Part II

About this book

This two-volume set, LNCS 13826 and LNCS 13827, constitutes the proceedings of the 14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, held in Gdansk, Poland, in September 2022.

The 77 papers presented in these volumes were selected from 132 submissions; for the regular tracks of the conference, 33 papers were selected from 62 submissions.

The papers were organized in topical sections named as follows:

Part I: numerical algorithms and parallel scientific computing; parallel non-numerical algorithms; GPU computing; performance analysis and prediction in HPC systems; scheduling for parallel computing; environments and frameworks for parallel/cloud computing; applications of parallel and distributed computing; soft computing with applications and special session on parallel EVD/SVD and its application in matrix computations.

Part II: 9th Workshop on Language-Based Parallel Programming (WLPP 2022); 6th Workshop on Models, Algorithms and Methodologies for Hybrid Parallelism in New HPC Systems (MAMHYP 2022); First Workshop on Quantum Computing and Communication; First Workshop on Applications of Machine Learning and Artificial Intelligence in High Performance Computing (WAML 2022); 4th Workshop on Applied High Performance Numerical Algorithms for PDEs; 5th Minisymposium on HPC Applications in Physical Sciences; 8th Minisymposium on High Performance Computing Interval Methods; 7th Workshop on Complex Collective Systems.

Table of Contents

Frontmatter

9th Workshop on Language-Based Parallel Programming (WLPP 2022)

Frontmatter
Kokkos-Based Implementation of MPCD on Heterogeneous Nodes

The Kokkos-based library Cabana, which has been developed in the Co-design Center for Particle Applications (CoPA), is used for the implementation of Multi-Particle Collision Dynamics (MPCD), a particle-based description of hydrodynamic interactions. It allows a performance-portable implementation, which has been used to study the interplay between CPU and GPU usage on a multi-node system. As a result, we see most advantages in homogeneous GPU usage, but we also discuss the extension to heterogeneous usage, with CPU and GPU employed concurrently.

Rene Halver, Christoph Junghans, Godehard Sutmann
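
Neither the Cabana-based MPCD code nor its kernels appear in the abstract. As a rough, hedged illustration of what a performance-portable Kokkos kernel looks like (all names below are illustrative and not taken from the paper), a minimal sketch:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

// Minimal Kokkos sketch: the same source targets whichever CPU or GPU backend
// Kokkos was configured with, which is the sense of "performance portable"
// used above. Illustrative only; this is not the Cabana/MPCD code.
int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    // Views are allocated in the default execution space's memory (e.g. GPU).
    Kokkos::View<double*> x("x", n), y("y", n);

    Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0;
      y(i) = 2.0;
    });

    double dot = 0.0;
    Kokkos::parallel_reduce("dot", n, KOKKOS_LAMBDA(const int i, double& sum) {
      sum += x(i) * y(i);
    }, dot);

    std::printf("dot = %g\n", dot);
  }
  Kokkos::finalize();
  return 0;
}
```
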
Comparison of Load Balancing Schemes for Asynchronous Many-Task Runtimes

A popular approach to programming scalable irregular applications is Asynchronous Many-Task (AMT) programming. Here, programs define tasks according to task models such as dynamic independent tasks (DIT) or nested fork-join (NFJ). We consider cluster AMTs, in which a runtime system maps the tasks to worker threads in multiple processes. Thereby, dynamic load balancing can be achieved via work-stealing or work-sharing. A well-performing work-stealing variant is the lifeline scheme. While previous implementations are restricted to single-worker processes, a recent hybrid extension combines the scheme with intra-process work-sharing between multiple workers. The hybrid scheme comes at the price of higher complexity. This paper investigates whether this complexity is indispensable by contrasting the scheme with a pure work-stealing extension of the lifeline scheme introduced in the paper. In an experimental comparison based on independent DIT and NFJ implementations and three benchmarks, the pure work-stealing scheme is on a par with or even outperforms the hybrid one by up to 3.8%.

Lukas Reitz, Kai Hardenbicker, Claudia Fohry
New Insights on the Revised Definition of the Performance Portability Metric

The rise in demand for new performance portability frameworks for heterogeneous computing systems has brought with it a number of proposed metrics for evaluating the performance portability of applications. This article compares the revised definition and criteria of the performance portability metric with the metric that was derived from and improves upon it. The comparison is based on a detailed analysis of nine properties and on recent studies of the performance portability of various applications.

Ami Marowka
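
The abstract does not reproduce the metric itself. For orientation, a widely cited baseline in this line of work is the metric of Pennycook, Sewall, and Lee, a harmonic mean of per-platform efficiencies; the revised definition and criteria discussed in the paper build on formulations of this kind:

```latex
\text{PP}(a, p, H) =
\begin{cases}
  \dfrac{|H|}{\sum_{i \in H} \dfrac{1}{e_i(a, p)}} & \text{if application } a \text{ runs on every platform } i \in H,\\[1.5ex]
  0 & \text{otherwise,}
\end{cases}
```

where e_i(a, p) is the (architectural or application) efficiency of application a solving problem p on platform i.
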
Inferential Statistical Analysis of Performance Portability

The assessment of the performance portability of hybrid programming models is based on many unverifiable observations. Even when drawn by knowledgeable analysts, subjective conclusions from unverifiable data are incomplete without descriptive and inferential statistical analysis. In this article, a knowledgeable analyst's assessment of the performance portability of OpenACC, OpenMP, Kokkos and Raja on CPU and GPU architectures is confronted with an inferential statistical analysis using two types of hypothesis tests, while carefully examining the effect of outliers.

Ami Marowka
NPDP Benchmark Suite for Loop Tiling Effectiveness Evaluation

The paper introduces ten non-serial polyadic dynamic programming (NPDP) kernels as a benchmark suite dedicated to evaluating the effectiveness of tiled code generated by polyhedral optimization compilers. Most of the applications implement bioinformatics algorithms, which remain challenging targets for automatic loop nest tiling transformations. The paper describes the mathematically examined kernels and presents them in a uniform way as loop nests in the C language. In an experimental study, we applied two automatic source-to-source compilers, TRACO and PluTo, to generate cache-efficient codes and analysed their performance on three multi-core machines. We discuss the limitations of well-known tiling approaches and outline future tiling strategies for the introduced NPDP benchmark suite.

Marek Palkowski, Wlodzimierz Bielecki
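
The abstract does not list the ten kernels. As a hypothetical example of the kind of non-serial polyadic DP loop nest involved, the classic Nussinov RNA-folding recurrence can be written as below (whether it is among the suite's kernels is not stated; this sketch is only an illustration of the dependence pattern that makes tiling hard):

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Nussinov RNA folding: a classic NPDP kernel with non-uniform dependences.
int nussinov(const std::string& seq) {
  const int n = static_cast<int>(seq.size());
  std::vector<std::vector<int>> S(n, std::vector<int>(n, 0));
  auto pairs = [&](int i, int j) {               // 1 if bases i and j can pair
    char a = seq[i], b = seq[j];
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
  };
  for (int i = n - 2; i >= 0; --i) {
    for (int j = i + 1; j < n; ++j) {
      int best = std::max(S[i + 1][j], S[i][j - 1]);
      best = std::max(best, S[i + 1][j - 1] + (pairs(i, j) ? 1 : 0));
      for (int k = i + 1; k < j; ++k)            // polyadic term: couples distant cells
        best = std::max(best, S[i][k] + S[k + 1][j]);
      S[i][j] = best;
    }
  }
  return S[0][n - 1];
}

int main() {
  std::printf("max pairs = %d\n", nussinov("GGGAAAUCC"));
}
```
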
Parallel Vectorized Implementations of Compensated Summation Algorithms

The aim of this paper is to show that Kahan's and the Gill-Møller compensated summation algorithms, which make it possible to achieve high accuracy when summing long sequences of floating-point numbers, can be efficiently vectorized and parallelized. The new implementation uses Intel AVX-512 intrinsics together with OpenMP constructs in order to utilize the SIMD extensions of modern multicore processors. We describe the vectorization technique in detail and show how to define custom reduction operators in OpenMP. Numerical experiments performed on a server with Intel Xeon Gold 6342 processors show that the new implementations of the compensated summation algorithms achieve much better accuracy than ordinary summation, and their performance is comparable with that of the automatically optimized ordinary summation algorithm. Moreover, the experiments show that the vectorized implementation of the Gill-Møller algorithm is faster than the vectorized implementation of Kahan's algorithm.

Beata Dmitruk, Przemysław Stpiczyński
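
A minimal sketch of the idea, assuming a plain OpenMP user-defined reduction rather than the paper's AVX-512 intrinsics implementation (all names and structure here are illustrative, not taken from the paper):

```cpp
#include <cstdio>
#include <vector>

// One Kahan accumulator: 'sum' plus a running compensation 'c' for lost low-order bits.
struct Kahan { double sum = 0.0, c = 0.0; };

// Add one value with Kahan's compensation.
inline void kahan_add(Kahan& a, double x) {
  double y = x - a.c;
  double t = a.sum + y;
  a.c = (t - a.sum) - y;
  a.sum = t;
}

// Merge two partial accumulators: feed both parts of 'b' into 'a'.
inline Kahan kahan_merge(Kahan a, const Kahan& b) {
  kahan_add(a, b.sum);
  kahan_add(a, -b.c);
  return a;
}

// Custom OpenMP reduction so each thread keeps its own compensated partial sum.
#pragma omp declare reduction(kahan : Kahan : omp_out = kahan_merge(omp_out, omp_in)) initializer(omp_priv = Kahan{})

int main() {
  const int n = 10000000;
  std::vector<double> x(n, 0.1);          // values whose naive sum loses accuracy
  Kahan acc;
#pragma omp parallel for reduction(kahan : acc)
  for (int i = 0; i < n; ++i)
    kahan_add(acc, x[i]);
  std::printf("compensated sum = %.17g\n", acc.sum);
}
```

The compensation term c carries the low-order bits lost by each addition, so partial sums computed by different threads can be merged without discarding them.
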

6th Workshop on Models, Algorithms and Methodologies for Hybrid Parallelism in New HPC Systems (MAMHYP 2022)

Frontmatter
Malleability Techniques for HPC Systems

The current static usage model of HPC systems is becoming increasingly inefficient due to the continuously growing complexity of system architectures, combined with the increased usage of coupled applications, the need for strong scaling with extreme-scale parallelism, and the increasing reliance on complex and dynamic workflows. Malleability techniques adjust resource usage dynamically for HPC systems and applications to extract maximum efficiency. In this paper, we present FlexMPI, a tool being developed in the ADMIRE project that provides intelligent global coordination of resource usage at the application level. FlexMPI considers runtime scheduling of computation, network usage, and I/O across all system architecture components. It can optimize the exploitation of HPC and I/O resources while, in many cases, minimizing the makespan of applications. Furthermore, FlexMPI provides facilities such as application world recomposition to generate a new consistent state when processes are added to or removed from the application, data redistribution to the new application world, and I/O interference detection to migrate congesting processes. We also present an environmental use case co-designed using FlexMPI. The evaluation shows its adaptability and scalability.

Jesus Carretero, David Exposito, Alberto Cascajo, Raffaele Montella
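
FlexMPI's own interfaces are not shown in the abstract; at the MPI level, "application world recomposition" builds on dynamic process management. A minimal sketch using only standard MPI calls (not FlexMPI's API):

```cpp
#include <mpi.h>
#include <cstdio>

// Illustrative only: grow the application's "world" at runtime and merge old
// and new processes into one consistent intra-communicator.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  MPI_Comm parent;
  MPI_Comm_get_parent(&parent);

  MPI_Comm world;                        // the application's current world
  if (parent == MPI_COMM_NULL) {
    // Original processes: spawn two extra workers running the same binary.
    MPI_Comm intercomm;
    MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Intercomm_merge(intercomm, /*high=*/0, &world);
  } else {
    // Spawned processes: join the recomposed world from the other side.
    MPI_Intercomm_merge(parent, /*high=*/1, &world);
  }

  int rank, size;
  MPI_Comm_rank(world, &rank);
  MPI_Comm_size(world, &size);
  std::printf("rank %d of %d in the recomposed world\n", rank, size);
  // ...data redistribution to the new world would happen here...

  MPI_Comm_free(&world);
  MPI_Finalize();
}
```
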
Algorithm and Software Overhead: A Theoretical Approach to Performance Portability

In recent years, the term portability has acquired new meanings: research communities are discussing how to measure the degree to which an application (or library, programming model, algorithm implementation, etc.) has become “performance portable”. The term “performance portability” has been informally used in computing communities to refer to: (1) the ability to run one application across multiple hardware platforms; and (2) achieving some decent level of performance on these platforms [1, 2]. Among the efforts related to the “performance portability” issue, we note the annual performance portability workshops organized by the US Department of Energy [3]. This article adds a new point of view to the performance portability issue, starting from a more theoretical standpoint that shows the convenience of separating the proper algorithm from the overhead, and exploring the different factors that introduce different kinds of overhead. The paper develops a theoretical framework leading to a definition of the execution time of a program, but that definition is not the point. The aim is to show and understand the link between that execution time and the beginning of the design, in order to identify which part of a program is really environment-sensitive and to exclude from performance portability formulas everything that, as shown theoretically, is not going to change.

Valeria Mele, Giuliano Laccetti
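
As a schematic illustration of the splitting advocated above (the paper's own notation is not given in the abstract), one may write the measured execution time as

```latex
T_{\text{exec}} \;=\; T_{\text{alg}} \;+\; \sum_{k} T_{\text{ovh},k},
```

where T_alg is the cost of the proper algorithm, fixed at design time, and the overhead terms T_ovh,k (memory traffic, communication, synchronization, runtime management, and so on) are the environment-sensitive parts on which performance portability formulas should focus.
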
Benchmarking a High Performance Computing Heterogeneous Cluster

The paper describes the results of benchmarking tests aimed at verifying and validating all the solutions implemented during the deployment of an HPC heterogeneous resource acquired by the data center of the University of Naples “Federico II” thanks to the funds of the IBiSCo (Infrastructure for Big data and Scientific COmputing) Italian National Project. The first set of benchmarks evaluates how the network interconnection technologies affect the inter- and intra-node communications of GP-GPU workloads. The second set evaluates the performance of the Lustre parallel file system to ensure an efficient environment for data-intensive applications. The tests, especially those that analyze the lower level of the middleware (micro-benchmarks), seem to confirm the ability of the resource to guarantee the expected performance.

Luisa Carracciuolo, Davide Bottalico, Davide Michelino, Gianluca Sabella, Bernardino Spisso
A Generative Adversarial Network Approach for Noise and Artifacts Reduction in MRI Head and Neck Imaging

As the volume of data available to healthcare and life sciences specialists proliferates, so do the opportunities for life-saving breakthroughs. But time is a key factor. High-Performance Computing (HPC) can help practitioners accurately analyze data and improve patient outcomes, from drug discovery to finding the best-tailored therapy options. In this paper, we present and discuss an Artificial Intelligence methodology based on a Generative Adversarial Network to improve the perceived visual quality of MRI images of the head and neck region. The experimental results demonstrate that, once trained and validated, our model performs better than state-of-the-art methods, and testing it on unseen, real corrupted data improved the quality of the images in most cases.

Salvatore Cuomo, Francesco Fato, Lorenzo Ugga, Gaia Spadarella, Renato Cuocolo, Fabio Giampaolo, Francesco Piccialli
A GPU Accelerated Hyperspectral 3D Convolutional Neural Network Classification at the Edge with Principal Component Analysis Preprocessing

The Edge Computing paradigm promises to transfer decision-making processes based on artificial intelligence algorithms to the edge of the network, without the need to query servers far from the data collection point. Hyperspectral image classification is one of the application fields that can benefit most from the close relationship between Edge Computing and Artificial Intelligence. It consists of a framework of techniques and methodologies for collecting and processing images related to objects or scenes on the Earth's surface, employing cameras or other sensors mounted on Unmanned Aerial Vehicles. However, the computing performance of edge devices is not comparable with that of high-end servers, so specific approaches are required to consider the influence of the computing environment on the algorithm development methodology. In the present work, we propose a hybrid technique to make hyperspectral image classification with a Convolutional Neural Network affordable on low-power, high-performance sensor devices. We first use Principal Component Analysis to filter out insignificant wavelengths and reduce the dataset dimension; then, we use a process acceleration strategy to improve performance by introducing a GPU-based form of parallelism.

Gianluca De Lucia, Marco Lapegna, Diego Romano
Parallel gEUD Models for Accelerated IMRT Planning on Modern HPC Platforms

Radiotherapy treatments apply high doses of radiation to tumorous cells to break the structure of cancer DNA, while trying to minimize the radiation doses absorbed by healthy cells. The personalized design of radiotherapy plans has been a relevant challenge since the beginning of these therapies. A wide set of models has been defined to translate complex clinical prescriptions into optimization problems. The model based on the generalized equivalent uniform dose, gEUD, is very relevant for IMRT radiotherapy planning in clinical practice. This way, expert physicists can tune plans near the prescriptions, solving the optimization problem based on gEUD in a trial-and-error process. Gradient descent methods can be applied to solve these models personalized for every patient. However, their computational requirements are huge, so to facilitate their use in clinical practice it is necessary to apply HPC techniques to implement such models. In this work, we have developed two parallel implementations of a gEUD model for IMRT planning on multi-core and GPU architectures, as they are increasingly available in clinical settings. Both implementations are evaluated with two Head & Neck clinical tumor cases on modern GPU and multi-core CPU platforms. Our implementations are very useful since they help expert physicists obtain fast plans that can satisfy all the prescriptions.

Juan José Moreno, Janusz Miroforidis, Ignacy Kaliszewski, Gracia Ester Martín Garzón
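
For reference, the generalized equivalent uniform dose on which such models are built is usually defined, for a structure with N voxels receiving doses d_i, as

```latex
\mathrm{gEUD} \;=\; \left( \frac{1}{N} \sum_{i=1}^{N} d_i^{\,a} \right)^{1/a},
```

where the tissue-specific parameter a emphasizes cold spots in targets (a < 0) or hot spots in organs at risk (large a > 0). The exact objective and constraints used in the paper are not reproduced here.
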

First Workshop on Quantum Computing and Communication

Frontmatter
On Quantum-Assisted LDPC Decoding Augmented with Classical Post-processing

Utilizing present and future quantum computers to solve difficult problems in different domains has become one of the main endeavors at this moment. Of course, in arriving at the requisite solution, quantum and classical computers work in conjunction. With the continued popularity of Low Density Parity Check (LDPC) codes and hence their decoding, this paper formulates the latter as a Quadratic Unconstrained Binary Optimization (QUBO) problem and utilizes the D-Wave 2000Q Quantum Annealer to solve it. The outputs from the Annealer are classically post-processed using simple minimum distance decoding to further improve the performance. We evaluated and compared this implementation against the decoding performance obtained using Simulated Annealing (SA) and belief propagation (BP) decoding on classical computers. The results show that the annealing implementations (both simulated and quantum) are superior to BP decoding and suggest that the advantage becomes more prominent as block lengths increase. Reduced Bit Error Rate (BER) and Frame Error Rate (FER) are observed for simulated annealing and quantum annealing in the useful SNR range, a trend that persists for various codeword lengths.

Aditya Das Sarma, Utso Majumder, Vishnu Vaidya, M Girish Chandra, A Anil Kumar, Sayantan Pramanik
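
The concrete QUBO construction for LDPC decoding is not reproduced in the abstract; for orientation, a QUBO instance is the minimization of a quadratic form over binary variables,

```latex
\min_{x \in \{0,1\}^n} \; x^{\mathsf T} Q\, x \;=\; \min_{x \in \{0,1\}^n} \sum_{i \le j} Q_{ij}\, x_i x_j,
```

with constraints such as parity checks typically folded into Q as penalty terms; the annealer then searches for low-energy configurations x.
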
Quantum Annealing to Solve the Unrelated Parallel Machine Scheduling Problem

Quantum computing has emerged in recent years as an alternative to classical computing, which could improve the latter in solving some types of problems. One of the quantum programming models, Adiabatic Quantum Computing, has been successfully used to solve problems such as graph partitioning, traffic routing and task scheduling. Specifically, in this paper we focus on the scheduling on unrelated parallel machines problem. It is a workload-balancing problem where the processing time of any procedure executed on any of the available processing elements is known. Here, the problem is expressed as Quadratic Unconstrained Binary Optimisation, which can be subsequently solved using quantum annealers. The quantum nonlinear programming framework discussed in this work consists of three steps: quadratic approximation of cost function, binary representation of parameter space, and solving the resulting Quadratic Unconstrained Binary Optimisation. One of the novelties in tackling this problem has been to compact the model bearing in mind the repetitions of each task, to make it possible to solve larger scheduling problems.

Francisco Orts, Antonio M. Puertas, Ester M. Garzón, Gloria Ortega
Early Experiences with a Photonic Quantum Simulator for Solving Job Shop Scheduling Problem

Quantum computing is a rapidly developing technology that, in theory, can solve complex computational problems practically intractable for classical computers. Although the technology offers promising breakthroughs, it is only in the early stages of development, and various quantum computer architectures are emerging. One such new development is the photonic quantum computer. Since the work on discrete optimization using different quantum computer architectures is well studied, in this paper, we experiment with solving a toy instance of the Job-Shop Scheduling problem using a hybrid learning algorithm on a photonic quantum computer simulator. The promising results, combined with some highly desirable properties of photonic quantum computers, show that this new architecture is worth considering for further development and investment in the quantum technology landscape.

Mateusz Slysz, Krzysztof Kurowski, Jan Węglarz
Some Remarks on Super-Gram Operators for General Bipartite Quantum States

The Gramian matrices approach to studying certain aspects of quantum entanglement contained in bipartite pure quantum states is extended to the level of general bipartite quantum states. The corresponding Gram matrices, called here super-Gram matrices, are constructed over the Hilbert-Schmidt structure built on the Hilbert space of pure states. The main result is the extension of the widely known realignment criterion to the level of super-operators.

Roman Gielerak, Marek Sawerwain
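
For reference, the standard realignment (CCNR) criterion mentioned above states that, for a bipartite state ρ on H_A ⊗ H_B,

```latex
\rho \ \text{separable} \;\Longrightarrow\; \bigl\| R(\rho) \bigr\|_1 \le 1,
\qquad
\bigl(R(\rho)\bigr)_{ij,kl} = \rho_{ik,jl},
```

where ‖·‖₁ is the trace norm (sum of singular values), so a value above 1 certifies entanglement. The super-operator extension developed in the paper is not reproduced here.
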
Solving the Traveling Salesman Problem with a Hybrid Quantum-Classical Feedforward Neural Network

Solving combinatorial optimization problems with the Quantum Approximate Optimization Algorithm (QAOA) is becoming more and more popular. The performance of the QAOA strongly depends on the initial parameters and the optimization procedure. This work presents a benchmark for solving the Traveling Salesman Problem (TSP) that introduces a hybrid feedforward neural network as the QAOA’s optimization routine. The strength of this method lies in training the optimization procedure on many instances of the problem and using mini-batch updates of the parameters. Although the learning process is costly, the advantage of this method is that after the neural network is trained, it immediately returns optimized parameters for new problem instances. We present the advantage of our method by evaluating it on two sets of initial parameters. The experiments demonstrated that the proposed hybrid quantum-classical feedforward neural network can be successfully used to solve the TSP.

Justyna Zawalska, Katarzyna Rycerz
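
The QAOA ansatz whose parameters the hybrid network proposes is, in the standard formulation (the TSP-specific cost Hamiltonian is not reproduced here),

```latex
|\psi_p(\boldsymbol{\gamma}, \boldsymbol{\beta})\rangle
  \;=\; \prod_{l=1}^{p} e^{-i \beta_l H_M}\, e^{-i \gamma_l H_C}\, |+\rangle^{\otimes n},
\qquad H_M = \sum_{j=1}^{n} X_j,
```

and the 2p parameters (γ, β) are tuned to minimize the expected cost ⟨ψ_p|H_C|ψ_p⟩.
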
Software Aided Analysis of EWL Based Quantum Games

In this paper, we present a library supporting the analysis of the Eisert–Wilkens–Lewenstein (EWL) scheme [3], proposed as a quantum extension of 2 × 2 bimatrix games, on the example of the Prisoner's Dilemma. Such schemes are often used as a basis for quantization in the game theory field [8]. The proposed solution is based on a modern approach combining symbolic and numerical calculations with actual access to quantum simulators and real devices provided by IBM-Q. In particular, the library provides high-level functions for searching for Nash equilibria in pure strategies, as well as for finding best-response cycles, which can lead to the existence of Nash equilibria in mixed states [16].

Piotr Kotara, Tomasz Zawadzki, Katarzyna Rycerz

First Workshop on Applications of Machine Learning and Artificial Intelligence in High Performance Computing (WAML 2022)

Frontmatter
Adaptation of AI-Accelerated CFD Simulations to the IPU Platform

Intelligence Processing Units (IPUs) have proven useful for many AI applications. In this paper, we evaluate them within the emerging field of AI for simulation, where traditional numerical simulations are supported by artificial intelligence approaches. We focus specifically on a program for training machine learning models that support a computational fluid dynamics application. We use the custom TensorFlow provided by the Poplar SDK to adapt the program to the IPU-POD16 platform and investigate its ease of use and performance scalability. Training a model on data from OpenFOAM simulations allows us to obtain accurate simulation state predictions at test time. We show how to utilize the popdist library to overcome a performance bottleneck in feeding training data to the IPU on the host side, achieving up to 34% speedup. Due to communication overheads, using data parallelism to utilize two IPUs instead of one does not improve the throughput. However, once the intra-IPU costs have been paid, the hardware capabilities for inter-IPU communication allow for good scalability: increasing the number of IPUs from 2 to 16 improves the throughput from 560.8 to 2805.8 samples/s.

Paweł Rościszewski, Adam Krzywaniak, Sergio Iserte, Krzysztof Rojek, Paweł Gepner
Performance Analysis of Convolution Algorithms for Deep Learning on Edge Processors

We provide a complete performance comparison of two realizations of the convolution, based on the lowering approach and a blocked variant of the direct convolution algorithm. The theoretical analysis focuses on the conventional, high performance implementation of the general matrix multiplication (gemm), which is the key computational kernel underneath these two algorithms. The study leverages a simulator calibrated for the GAP8 edge processor and exploits the determinism of the memory system in this type of architectures to deliver accurate predictions of the arithmetic and data transfer costs.

Pedro Alonso-Jordá, Héctor Martínez, Enrique S. Quintana-Ortí, Cristian Ramírez
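
For orientation, in the lowering (im2col) approach described above, a convolution with C_i input channels, C_o output channels, a K_h × K_w kernel, and H_o × W_o output pixels is cast as a single gemm,

```latex
Y \;=\; F \cdot \hat{X},
\qquad
F \in \mathbb{R}^{C_o \times (C_i K_h K_w)},
\quad
\hat{X} = \mathrm{im2col}(X) \in \mathbb{R}^{(C_i K_h K_w) \times (H_o W_o)},
```

so the cost model of the underlying gemm kernel directly determines the arithmetic and data-transfer costs analysed in the paper; the blocked direct variant avoids materializing the patch matrix.
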
Machine Learning-Based Online Scheduling in Distributed Computing

In this work, we propose and evaluate an online scheduler prototype based on machine learning algorithms. An online job-flow scheduler must make scheduling and resource allocation decisions for individual jobs without any prior knowledge of the subsequent job queue (i.e., online). We simulate and generalize this task as a more formal 0–1 Knapsack problem with unknown utility functions of the knapsack items. In this way, we evaluate the implemented machine learning-based solution against classical combinatorial optimization algorithms. A hybrid machine learning and dynamic programming based approach is proposed to consider and strictly satisfy the knapsack constraint on the total weight. As a main result, the proposed hybrid solution showed efficiency comparable to that of the greedy knapsack approximation.

Victor Toporkov, Dmitry Yemelyanov, Artem Bulkhak
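
A minimal sketch of the exact dynamic-programming baseline for the 0–1 knapsack formulation mentioned above (names are illustrative; the paper's hybrid combines such a DP pass over the weight constraint with learned utility estimates):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Classical 0/1 knapsack by dynamic programming over total weight.
int knapsack(const std::vector<int>& w, const std::vector<int>& v, int capacity) {
  std::vector<int> best(capacity + 1, 0);        // best[c] = max value with weight <= c
  for (std::size_t i = 0; i < w.size(); ++i)
    for (int c = capacity; c >= w[i]; --c)       // reverse order: each item used once
      best[c] = std::max(best[c], best[c - w[i]] + v[i]);
  return best[capacity];
}

int main() {
  std::vector<int> weight{3, 4, 5, 9};
  std::vector<int> value{2, 3, 4, 10};
  std::printf("max value = %d\n", knapsack(weight, value, 12));   // prints 12
}
```
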
High Performance Computing Queue Time Prediction Using Clustering and Regression

High Performance Computing (HPC) users are often provided little or no information at job submission time regarding how long their job will be queued until it begins execution. Foreknowledge of a long queue time can inform an HPC user's decision to migrate their jobs to commercial cloud infrastructure to receive their results sooner. Various researchers have used different machine learning techniques to build queue time estimators. This research applies the proven technique of K-Means clustering followed by Gradient Boosted Tree regression on over 700,000 jobs actually submitted to an HPC system to predict a submitted job's queue time from HPC system characteristics and user-provided job requirements. This method applied to HPC queue time prediction achieves better than 96% accuracy at classifying whether a job will start prior to an assigned deadline. Additionally, this research shows that historic HPC CPU allocation data can be used to predict future increases or decreases in job queue time with accuracy exceeding 96%.

Scott Hutchison, Daniel Andresen, Mitchell Neilsen, William Hsu, Benjamin Parsons
Acceptance Rates of Invertible Neural Networks on Electron Spectra from Near-Critical Laser-Plasmas: A Comparison

While the interaction of ultra-intense, ultra-short laser pulses with near- and overcritical plasmas cannot be observed directly, experimentally accessible quantities (observables) often only indirectly give information about the underlying plasma dynamics. Furthermore, the information provided by observables is incomplete, making the inverse problem highly ambiguous. Therefore, in order to infer plasma dynamics as well as experimental parameters, the full distribution over parameters given an observation needs to be considered, requiring that models are flexible and account for the information lost in the forward process. Invertible Neural Networks (INNs) have been designed to efficiently model both the forward and inverse processes, providing the full conditional posterior given a specific measurement. In this work, we benchmark INNs and standard statistical methods on synthetic electron spectra. First, we provide experimental results with respect to the acceptance rate, where our results show increases in acceptance rates of up to a factor of 10. Additionally, we show that this increased acceptance rate also results in a correspondingly increased speed-up for INNs. Lastly, we propose a composite algorithm that utilizes INNs and promises low runtimes while preserving high accuracy.

Thomas Miethlinger, Nico Hoffmann, Thomas Kluge

4th Workshop on Applied High Performance Numerical Algorithms for PDEs

Frontmatter
MATLAB Implementation of Hp Finite Elements on Rectangles Using Hierarchical Basis Functions

A MATLAB implementation of hierarchical shape functions on 2D rectangles is explained and available for download. Global shape functions are ordered for a given polynomial degree according to the indices of the nodes, edges, or elements to which they belong. For a uniform p-refinement, the hierarchical structure enables an effective assembly of mass and stiffness matrices. A solution to a boundary value problem is approximated for various levels of uniform h and p refinements.

Alexej Moskovka, Jan Valdman
Adaptive Parallel Average Schwarz Preconditioner for Crouzeix-Raviart Finite Volume Method

In this paper, we describe and analyze an Average Schwarz Method with a spectrally enriched coarse space for a Crouzeix-Raviart finite volume element discretization of a multiscale problem. The derived preconditioner is symmetric, and we apply the GMRES iterative method to the preconditioned problem, obtaining a convergence rate of GMRES that depends only weakly on the ratio of the coarse to fine mesh sizes h/H, provided the enrichment of the coarse space contains sufficiently many specially constructed eigenfunctions.

Leszek Marcinkowski, Talal Rahman
Parareal Method for Anisotropic Diffusion Denoising

This paper studies time-domain parallelisation using Parareal to speed up the computations of anisotropic diffusion filtering. We consider both explicit and implicit Euler-based methods for the propagation in time in Parareal. The Preconditioned Conjugate Gradient (PCG) method is used to solve the systems that arise in the implicit method. An estimate of the number of PCG iterations allows us to predict the running time of the Parareal calculation, which further guides us in the experimental stage. Parallelisation of the method is implemented using Coarray Fortran. We illustrate the experimental results on 3D low-field MRI images using up to 960 cores. A computational improvement in time is achieved.

Xiujie Shan, Martin B. van Gijzen
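
For reference, the Parareal correction iteration combining a coarse propagator G and a fine propagator F (here built from the explicit or implicit Euler schemes considered in the paper) is

```latex
U^{\,n+1}_{k+1} \;=\; \mathcal{G}\!\left(t_n, t_{n+1}, U^{\,n}_{k+1}\right)
                 \;+\; \mathcal{F}\!\left(t_n, t_{n+1}, U^{\,n}_{k}\right)
                 \;-\; \mathcal{G}\!\left(t_n, t_{n+1}, U^{\,n}_{k}\right),
```

where the fine propagations over all time slices can be executed in parallel and only the coarse sweep remains sequential.
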
Comparison of Block Preconditioners for the Stokes Problem with Discontinuous Viscosity and Friction

Several block preconditioning strategies for the Stokes problem with piecewise discontinuous viscosity and friction are investigated for their efficiency and independence of the contrast in both viscosity and friction. The constituting blocks correspond to inexact solvers, based on algebraic multigrid. It follows that the block triangular preconditioner is the most robust choice.

Piotr Krzyżanowski
On Minimization of Nonlinear Energies Using FEM in MATLAB

Two minimization problems are added to the Moskovka and Valdman MATLAB package (2022): a Ginzburg-Landau (scalar) problem and a topology optimization (both scalar and vector) problem in linear elasticity. Both problems are described as nonlinear energy minimizations that contain the first gradient of the unknown field. Their energy functionals are discretized by finite elements, and the corresponding minima are searched using the trust-region method with a known Hessian sparsity or the Quasi-Newton method.

Alexej Moskovka, Jan Valdman, Marta Vohnoutová
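
As an illustration of the class of energies meant above, the scalar Ginzburg-Landau problem minimizes, in one common normalization (the paper's exact functional and parameters may differ),

```latex
J(u) \;=\; \int_{\Omega} \left( \frac{1}{2}\,|\nabla u|^{2}
          \;+\; \frac{1}{4\varepsilon^{2}}\,\bigl(1 - u^{2}\bigr)^{2} \right) \mathrm{d}x,
```

a nonlinear functional of the first gradient of the unknown field, which after finite element discretization is handed to the trust-region or Quasi-Newton minimizer.
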
A Model for Crowd Evacuation Dynamics: 2D Numerical Simulations

In [5] we have proposed a numerical scheme for solving a macroscopic model of crowd dynamics. We apply it here to simulate a room evacuation, for velocity fields derived from the p–Poisson equation. We analyze the stability parameters and the influence of p on the dynamics.

Maria Gokieli
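
For orientation, the p-Poisson problem from which the velocity fields are derived reads, in its basic form (the boundary conditions shown here are only indicative of a room-evacuation setting),

```latex
-\,\nabla \cdot \bigl( |\nabla u|^{\,p-2}\, \nabla u \bigr) = 1 \quad \text{in } \Omega,
\qquad u = 0 \ \text{on the exit},
```

with the desired direction of motion taken along −∇u; the parameter p controls how closely the resulting field approximates the distance to the exit.
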

5th Minisymposium on HPC Applications in Physical Sciences

Frontmatter
Parallel Identification of Unique Sequences in Nuclear Structure Calculations

Reducing a set of sequences to the subset of sequences that are unique can save a lot of memory in computer programs. We study this problem in the context of symmetry-adapted no-core shell model (SA-NCSM) nuclear structure calculations, where duplicated sequences of different kinds naturally emerge in the data describing the basis of the Hilbert space physically relevant to a given nucleus. For a fast solution of this problem on multicore architectures, we propose and present a multithreaded algorithm suitable for high performance computing (HPC) environments. Furthermore, we provide an experimental evaluation of this algorithm and show that, in practice, it can significantly reduce the time required to identify unique sequences in a real-world application.

Daniel Langr, Tomáš Dytrych
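
The paper's multithreaded algorithm is not reproduced in the abstract; as a baseline, the sequential form of the problem is plain deduplication, e.g. (illustrative only):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Sequential baseline: sort the sequences, then keep one representative per run.
int main() {
  std::vector<std::vector<int>> seqs = {
    {3, 1, 4}, {2, 7}, {3, 1, 4}, {2, 7}, {5}
  };
  std::sort(seqs.begin(), seqs.end());                       // lexicographic order
  seqs.erase(std::unique(seqs.begin(), seqs.end()), seqs.end());
  std::printf("unique sequences: %zu\n", seqs.size());       // prints 3
}
```
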
Experimental and Computer Study of Molecular Dynamics of a New Pyridazine Derivative

The paper presents experimental and computer simulation studies of the molecular dynamics of new pharmacological substances that are pyridazine derivatives. To obtain information about molecular dynamics and cross-relaxation, a new pyridazine derivative was studied by solid-state NMR spectroscopy using a homemade pulse spectrometer operating at a frequency of 30.2 MHz for protons and 28.411 MHz for fluorine nuclei, with complete absence of interference between them. The Fourier Transform Infrared Spectroscopy (FTIR) data for triazolopyridazine derivatives were analyzed and compared with the normal vibration frequencies calculated by quantum chemical methods. Standard Molecular Dynamics (MD) simulations were performed using the GROMACS package for the new pyridazine derivative. The simulation data were confronted with the experimental results.

Sebastian Wołoszczuk, Aneta Woźniak-Braszak, Andrzej Olejniczak, Michał Banaszak
Description of Magnetic Nanomolecules by the Extended Multi-orbital Hubbard Model: Perturbative vs Numerical Approach

We present a microscopic description of magnetic molecules by the extended multi-orbital Hubbard model. In the limit of large on-site Coulomb interaction, we derived the spin Hamiltonian using perturbation theory. We determined the magnetic coupling constant between two ions in two different ways: (a) from the expression obtained in the perturbation calculus, and (b) from the analysis of distances between the lowest levels of the energy spectrum obtained by diagonalization of the Hamiltonian of the extended multi-orbital Hubbard model. In order to speed up the very long and memory-intensive process of constructing the Hamiltonian matrix, whose size was 14400 × 14400, we implemented a procedure for locating the positions of non-zero elements. This significantly reduced the time of matrix creation and made it possible to perform calculations for more model parameters. The procedure we use can be applied to various nanomagnets, but the final calculations were performed for the molecular ring Cr₈. We showed that the intersite repulsion between electrons located on neighboring ions increases the antiferromagnetic exchange coupling between the magnetic moments of these ions, but this increase can be compensated for by the effect of correlated hopping of electrons.

Romuald Lemański, Michał Antkowiak
Structural and Electronic Properties of Small-Diameter Carbon NanoTubes: A DFT Study

One of the crucial properties of Carbon NanoTubes (CNTs) is their conductivity. They can be metallic, semiconducting or insulating in nature [6]. Therefore, their conducting properties are closely related to the existence and width of the CNT energy band gap, a quantity which is (relatively) easily calculable. From a theoretical point of view, CNTs have been studied by various methods and many results have been obtained; however, their status is quite diverse. The widespread rule claims that an (n,m) CNT is metallic if n − m = 0 mod 3 [2, 6]. This rule was based on ‘gluing’ graphene sheets into tubes (the ‘zone folding’ method). Moreover, the geometry of all hexagons was assumed to be identical, so no structure optimization was performed. Such an approach can be reliable for large-diameter CNTs, where curvature effects are small, but its applicability to small-diameter CNTs is at least disputable. For these reasons, we undertook a systematic exploration of small-diameter CNTs to examine the significance of ‘deviation’ effects (i.e. the deviation from planar, regular hexagon geometry) on the properties of CNTs. In particular, we wanted to check explicitly the validity of the claim that ‘CNTs (n,m), where n − m = 0 mod 3, possess zero energy gap’. In our paper, we present the results of calculations for the (2, m) and (3, m) series of CNTs: optimized geometries, densities of states, energy gaps, and electronic band structures. The general conclusion is that the ‘zone-folding’ rule predicting metallicity for those CNTs where n − m = 0 mod 3 is fulfilled, despite the finding that the hexagons forming CNTs are not planar and have unequal bond lengths. So this ‘zone-folding’ law describes the conductivity of CNTs amazingly well.

Bartosz Brzostowski, Artur P. Durajski, Konrad M. Gruszka, Jacek Wojtkiewicz

8th Minisymposium on High Performance Computing Interval Methods

Frontmatter
Need for Techniques Intermediate Between Interval and Probabilistic Ones

In high performance computing, when we process a large amount of data, we do not have much information about the dependence between measurement errors corresponding to different inputs. To gauge the uncertainty of the result of data processing, the two usual approaches are: the interval approach, when we consider the worst-case scenario, and the probabilistic approach, when we assume that all these errors are independent. The problem is that the interval approach usually leads to too pessimistic, too large uncertainty estimates, while the probabilistic approach – which assumes independence of measurement errors – sometimes underestimates the resulting uncertainty. To get realistic estimates, it is therefore desirable to have techniques intermediate between interval and probabilistic ones. In this paper, we propose such techniques based on the assumption that, in each practical situation, there is an upper bound b ∈ [0, 1] on the absolute value of all correlations between measurement errors – a bound that needs to be determined experimentally. The assumption that measurement errors are independent corresponds to b = 0; for b = 1, we get interval estimates; and for intermediate values of b, we get the desired intermediate techniques. We also provide efficient algorithms for implementing the new techniques.

Olga Kosheleva, Vladik Kreinovich
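
The abstract does not give the estimation formulas; one way such an interpolation can be made concrete (shown here only as an illustration consistent with the description above, not as the paper's own result) is through the variance of a sum of measurement errors whose pairwise correlations are bounded by b:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
 \;\le\; \sum_{i=1}^{n} \sigma_i^{2} \;+\; b \sum_{i \ne j} \sigma_i \sigma_j
 \;=\; (1 - b) \sum_{i=1}^{n} \sigma_i^{2} \;+\; b \left( \sum_{i=1}^{n} \sigma_i \right)^{2},
```

which reduces to the independent-errors estimate for b = 0 and to the worst-case, interval-like bound (Σ σ_i)² for b = 1.
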
A Cross-Platform Benchmark for Interval Computation Libraries

Interval computation is widely used in Computer Aided Design to certify computations that use floating point operations, avoiding pitfalls related to the rounding error introduced by inaccurate operations. Despite its popularity and practical benefits, support for interval arithmetic is neither standardized nor available in mainstream programming languages. We propose the first benchmark for interval computations, coupled with reference solutions computed with exact arithmetic, and compare popular C and C++ libraries over different architectures, operating systems, and compilers. The benchmark allows identifying limitations in existing implementations and provides a reliable guide on which library to use on each system for different CAD applications. We believe that our benchmark will be useful for developers of future interval libraries, as a way to test the correctness and performance of their algorithms.

Xuan Tang, Zachary Ferguson, Teseo Schneider, Denis Zorin, Shoaib Kamil, Daniele Panozzo
Testing Interval Arithmetic Libraries, Including Their IEEE-1788 Compliance

As developers of libraries implementing interval arithmetic, we faced the same difficulties when it came to testing our libraries. What must be tested? How can we devise relevant test cases for unit testing? How can we ensure a high (and possibly 100%) test coverage? In this paper, we list the different aspects that, in our opinion, must be tested, giving indications on the choice of test cases. Then we examine how several interval arithmetic libraries actually perform tests. Next, we present two existing frameworks developed specifically to gather test cases and to incorporate new libraries easily in order to test them, namely JInterval and ITF1788. Not every important aspect of our libraries fits into these frameworks, and we list extra tests that we deem important, but not easy, to perform.

Nathalie Revol, Luis Benet, Luca Ferranti, Sergei Zhilin
A Survey of Interval Algorithms for Solving Multicriteria Analysis Problems

This paper surveys the research efforts of several authors to solve various multicriteria problems using interval methods. These efforts fall naturally into two categories: approximating whole Pareto sets, and seeking a single solution point that is optimal with respect to the preferences of a specific decision maker. In both kinds of problems, interval calculus turns out to be useful. For several of the outlined approaches, their assumptions and practical importance are discussed. Parallelization (potential or actual) of the methods is also investigated. In particular, the discussion of the parallelization of the algorithm of Fernandez and Toth seems to be an original contribution of this paper, as well as the analysis of the applicability of various interval goal programming and TOPSIS approaches.

Bartłomiej Jacek Kubica

7th Workshop on Complex Collective Systems

Frontmatter
Social Fragmentation Transitions in Large-Scale Parameter Sweep Simulations of Adaptive Social Networks

Social fragmentation transition is a transition of social states between many disconnected communities with distinct opinions and a well-connected single network with homogeneous opinions. This is a timely research topic with high relevance to various current societal issues. We had previously studied this problem using numerical simulations of adaptive social network models and found that two individual behavioral traits, homophily and attention to novelty, had the most statistically significant impact on the outcomes of social network evolution. However, our previous study was limited in terms of the range of parameter values examined, and possible interactions between multiple behavioral traits were largely ignored. In this study, we conducted a substantially larger-scale parameter sweep numerical experiment of the same model with expanded parameter ranges by an order of magnitude in each parameter dimension, resulting in a total of 116,640 simulation runs. To capture nontrivial interactions among behavioral parameters, we modeled and visualized the dependence of outcome measures on the model parameters using artificial neural networks. Results show that, while the competition between homophily and attention to novelty is still the primary determinant of social fragmentation, another transition plane emerges when individuals have strong social conformity behavior, which was not previously known. This implies that social fragmentation transition can also occur in the homophily-social conformity trade-off, the two behavioral traits that have very similar microscopic individual-level effects but produce very different macroscopic collective-level outcomes, illustrating the nontrivial macroscopic dynamics of complex collective systems.

Hiroki Sayama
Parking Search in Urban Street Networks: Taming Down the Complexity of the Search-Time Problem via a Coarse-Graining Approach

The parking issue is central in transport policies and drivers’ concerns, but the determinants of the parking search time remain relatively poorly understood. The question is often handled in a fairly ad hoc way, or by resorting to crude approximations. Very recently, we proposed a more general agent-based approach, which notably takes due account of the role of the street network and the unequal attractiveness of parking spaces, and showed that it can be solved analytically by leveraging the machinery of Statistical Physics and Graph Theory, in the steady-state mean-field regime. Although the analytical formula is computationally more efficient than direct agent-based simulations, it involves cumbersome matrices, with linear size proportional to the number of parking spaces. Here, we extend the theoretical approach and demonstrate that it can be further simplified, by coarse-graining the parking spot occupancy at the street level. This results in even more efficient analytical formulae for the parking search time, which could be used efficiently by transport engineers.

Léo Bulckaen, Nilankur Dutta, Alexandre Nicolas
A Multi-agent Cellular Automata Model of Lane Changing Behaviour Considering the Aggressiveness and the Autonomy

Various macroscopic and microscopic road traffic models allow traffic flow analysis. However, it should be emphasised that standard traffic flow models do not include drivers' behaviour. Thus, we propose a multi-agent microscopic model for analysing traffic flow that considers various types of agents. Agents represent both autonomous cars and various drivers' behaviours (standard and aggressive drivers). Additionally, the presented model is based on accurate data, because it considers the actual dimensions of the vehicles. To accurately reflect the acquired dimensions of the cars, a small-cell cellular automaton was used, in which a set of cells represents one car. The obtained numerical results allowed us to draw both trivial and non-trivial conclusions.

Krzysztof Małecki, Piotr Wróbel, Patryk Górka
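
The multi-agent, multi-cell-per-vehicle model itself is not given in the abstract. As a minimal baseline illustrating the cellular-automaton approach it extends, a single-lane Nagel-Schreckenberg update (one cell per car, no driver types, all parameters illustrative) might look like this:

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Minimal single-lane Nagel-Schreckenberg traffic CA on a ring road.
int main() {
  const int length = 100, vmax = 5, steps = 50;
  const double p_slow = 0.2, density = 0.3;
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> uni(0.0, 1.0);

  std::vector<int> speed(length, -1);                 // -1 = empty cell
  for (int i = 0; i < length; ++i)
    if (uni(rng) < density) speed[i] = 0;

  for (int t = 0; t < steps; ++t) {
    std::vector<int> next(length, -1);
    for (int i = 0; i < length; ++i) {
      if (speed[i] < 0) continue;
      int gap = 1;                                    // distance to the car ahead
      while (gap <= vmax && speed[(i + gap) % length] < 0) ++gap;
      int v = std::min(speed[i] + 1, vmax);           // 1. accelerate
      v = std::min(v, gap - 1);                       // 2. brake to avoid collision
      if (v > 0 && uni(rng) < p_slow) --v;            // 3. random slowdown
      next[(i + v) % length] = v;                     // 4. move
    }
    speed = next;
  }
  int cars = 0;
  for (int v : speed) if (v >= 0) ++cars;
  std::printf("cars on ring after %d steps: %d\n", steps, cars);
}
```
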
Comparison of the Use of UWB and BLE as Positioning Methods in Data-Driven Modeling of Pedestrian Dynamics

We conducted experiments on measuring the positioning of people using standard (custom) BLE and UWB positioning systems in an indoor test environment. Then, we analyzed and compared the results in terms of using the data obtained from the sensors for data-driven crowd simulations based on Cellular Automata grids. Our research confirmed the usability of both technologies as positioning methods in data-driven modeling of pedestrian dynamics. Positioning using the standard UWB configuration turned out to be much more accurate than the standard BLE configuration. In our experiments, based on the standard configuration of both devices, the accuracy of the UWB results, expressed as a standard deviation, did not exceed 0.1 m, which is an order of magnitude (10 times) smaller than in the case of BLE.

Dariusz Pałka, Robert Lubaś, Giuseppe Vizzari, Jarosław Wąs
An Insight into the State-of-the-Art Vehicular Fog Computing with an Opportunistic Flavour

Vehicular fog computing constitutes an environment for the execution of demanding computation and storage tasks. The hierarchical, decentralised, and distributed architecture formed there supports resource-constrained devices in completing assignments that are too complex for them and too latency-sensitive to be delegated to a cloud. This paper evaluates the recent advances in research on vehicular fog computing focused on the exploitation of the resources of moving vehicles that operate in opportunistic device-to-device dynamic networking environments. The proposed evaluation criteria consider structural, behavioural, and functional safety aspects of such systems. A set of research directions concludes this article.

Krzysztof Ostrowski, Krzysztof Małecki
Backmatter
Metadata
Title
Parallel Processing and Applied Mathematics
Editors
Roman Wyrzykowski
Jack Dongarra
Ewa Deelman
Konrad Karczewski
Copyright Year
2023
Electronic ISBN
978-3-031-30445-3
Print ISBN
978-3-031-30444-6
DOI
https://doi.org/10.1007/978-3-031-30445-3
