Shlomo Yona's Blog -- יומן הרשת של שלמה יונה: Dec 1, 2010

Wednesday, December 1, 2010

קישורים לסיכומים ולמצגות בנושאים של מבני נתונים

מצאתי עמוד שמקשר לחומר רב מסיכומים, מצגות והדגמות של מבני נתונים רבים. החומרים נאספו מאוניברסיטאות ומהטכניון בארץ ובעולם. יש גם כמה קישורים לויקיפדיה. קישורים לבחינות מועילים מאוד לדעתי למי שמעוניין לתרגל את החומר כדי להתרשם האם אכן הבין את אשר קרא. קריאה טובה לממפתחים שבינינו -- לא יזיק לרענן את החומר ולהזכר ואולי אפילו להבין מחדש (ואולי בפעם הראשונה?) את הנושאים -- זה בוודאי לא יזיק וכנראה יעשה רק טוב לאיכות העבודה.

הקישור לאוסף החומר במבני נתונים: http://www.dsdb.co.nr/

התאמת אלגוריתם חיפוש לרוחב בארכיטקטורה של מעברים מרובי ליבה

בפרסומי הכנס, סופר-קומפיוטינג 10 (מחשוב על), SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , שהתקיים השנה בניו אורלינס שבלואיזיאנה, בארה"ב, מצאתי נושא נוסף שסקרן אותי והוא יישום מקבילי ומבוזר של אחד האלגוריתמים הידועים והשימושיים בגרפים, אלגוריתם BFS. במאמרם החוקרים מציגים את הבעייתיות בחקירה של גרפים עצומיי מימדים ובודקים את השימוש של אלגוריתם חיפוש לרוחב (BFS) וניצול יעיל שלו עבור מעבדים מרובי ליבה מתקדמים. החוקרים מציגים שלוש גרסאות של אלגוריתם BFS לקונפיגורציה של ארכיטקטורה של מעבדים מרובי ליבה.

הנה הקישור למאמר בפורטל של ACM.

הנה התקציר של המאמר:

Scalable Graph Exploration on Multicore Processors
Authors: Virat Agarwal * Fabrizio Petrini * Davide Pasetto * David A. Bader

Abstract

Many important problems in computational sciences, social network analysis, security, and business analytics, are data-intensive and lend themselves to graph-theoretical analyses. In this paper we investigate the challenges involved in exploring very large graphs by designing a breadth-first search (BFS) algorithm for advanced multi-core processors that are likely to become the building blocks of future exascale systems. Our new methodology for large-scale graph analytics combines a highlevel algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processorspecific optimizations. We present an experimental study that uses state-of-the-art Intel Nehalem EP and EX processors and up to 64 threads in a single system. Our performance on several benchmark problems representative of the power-law graphs found in real-world problems reaches processing rates that are competitive with supercomputing results in the recent literature. In the experimental evaluation we prove that our graph exploration algorithm running on a 4-socket Nehalem EX is (1) 2.4 times faster than a Cray XMT with 128 processors when exploring a random graph with 64 million vertices and 512 millions edges, (2) capable of processing 550 million edges per second with an R-MAT graph with 200 million vertices and 1 billion edges, comparable to the performance of a similar graph on a Cray MTA-2 with 40 processors and (3) 5 times faster than 256 BlueGene/L processors on a graph with average degree 50.

התמרת גאוס מהירה ומקבילית

בפרסומי הכנס, סופר-קומפיוטינג 10 (מחשוב על), SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , שהתקיים השנה בניו אורלינס שבלואיזיאנה, בארה"ב, לכד את עיניי פרסום על התמרת גאוס מהירה ומקבילית. החוקרים מציגים אלגוריתם מקבילי לחישוב הסכום של N גאוסיאנים ב-N נקודות. בעוד שחישוב סדור של סכום זה דורש סיבוכיות זמן מסדר גודל ריבועי ב-N, האלגוריתם שהחוקרים מציגים דורש סיבוכיות זמן מקבילית שמוערכת בסדר גודל של- N/Np להתפלגות נקודות אחידה ולסיבוכיות זמן מקבילית שמוערכת בסדר גודל של N/np log N/np + nplognp עבור התפלגות נקודות שאינה אחידה. הקבוע Np מייצג את מספר המעבדים שבשימוש האלגוריתם.

הנה קישור למאמר והנה התקציר של המאמר:

Parallel Fast Gauss Transform
Authors: Rahul S. Sampath * Hari Sundar * Shravan K. Veerapaneni

Abstract
We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take $O(N^2)$ time. The parallel time complexity estimates for our algorithms are $O(N/np)$ for uniform point distributions and $O(N/np log N/np + nplognp)$ for nonuniform distributions using np CPUs. We incorporate a planewave representation of the Gaussian kernel which permits “diagonal translation”. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle nonuniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer at the Oak Ridge National Laboratory. Our implementation is kernel-independent and can handle other “Gaussian-type” kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.