[DynInst_API:] Two new Paradyn project tech reports available


Date: Tue, 17 Mar 2015 11:55:47 -0500
From: Barton Miller <bart@xxxxxxxxxxx>
Subject: [DynInst_API:] Two new Paradyn project tech reports available
The Paradyn project has two new technical reports, one in the area of binary
code analysis and the other MRNet scalability.

Our full list of project publications always can be found at:
 http://www.paradyn.org/html/publications-by-year.html

Comments and feedback on these papers is always welcome!

------------------------------------------------------------------------------
"Binary Code Is Not Easy", Xiaozhu Meng and Barton P. Miller
Submitted for publication, March 2015.
ftp://ftp.cs.wisc.edu/paradyn/papers/Meng15Parsing.pdf
Abstract"
  Binary code analysis is an enabling technique for many applications. Binary
  code analysis tool kits provide several capabilities to automatically
  analyze binary code, including decoding machine instructions, building
  control flow graphs of the program, and determining function boundaries.
  Modern compilers and run-time libraries have introduced significant
  complexities to binary code. These complexities negatively affect the
  capabilities of binary analysis tool kits, which may cause tools to report
  inaccurate information about binary code. Analysts may hence be confused
  and applications based on binary analysis tool kits may have degrading
  quality. We examine the problem of constructing control flow graphs from
  binary code and labeling the graphs with accurate function boundary
  annotations. We identify eight challenging code constructs that represent
  hard-to-analyze aspects of the code from modern compilers, and show code
  examples that illustrate each code construct. As part of this discussion,
  we discuss how the new code parsing algorithms in the open source Dyninst
  tool kit support these constructs. In particular, we present a new model
  for describing jump tables that improves our ability to precisely determine
  the control ow targets, a new interprocedural analysis to determine when a
  function is non-returning, and techniques for handling tail calls, code
  overlapping between functions, and code overlapping within instructions.
  We created test binaries patterned after each challenging code construct
  and evaluated how tool kits, including Dyninst, fare when parsing these
  binaries.
------------------------------------------------------------------------------
"Mr. Scan: A Hybrid/Hybrid Extreme Scale Density Based Clustering Algorithm",
Benjamin Welton and Barton Miller, submitted for publication, March 2015.
ftp://ftp.cs.wisc.edu/paradyn/papers/Welton15MrScan.pdf
Abstract:
  Density-based clustering algorithms are a widely-used class of data mining
  techniques that can find irregularly shaped clusters and cluster data
  without prior knowledge of the number of clusters the data contains. DBSCAN
  is the most well-known density-based clustering algorithm. We introduce our
  extension of DBSCAN, called Mr. Scan, which uses a hybrid/hybrid parallel
  implementation that combines the MRNet tree-based distribution network with
  GPU-equipped nodes. Mr. Scan avoids the problems encountered in other
  parallel versions of DBSCAN, such as scalability limits, reduction in output
  quality at large scales, and the inability to effectively process dense
  regions of data. Mr. Scan uses effective data partitioning and a new merging
  technique to allow data sets to be broken into independently processable
  partitions without the reduction in quality or large amount of node-to-node
  communication seen in other parallel versions of DBSCAN. The dense box
  algorithm designed as part of Mr. Scan allows for dense regions to be
  detected and clustered without the need to individually compare all points
  in these regions to one another. Mr. Scan was tested on both a geolocated
  Twitter dataset and image data obtained from the Sloan Digital Sky Survey.
  In testing Mr. Scan we performed end-to-end benchmarks measuring complete
  application run time from reading raw unordered input point data from the
  file system to writing the final clustered output to the file system. The
  use of end-to-end benchmarking gives a clear picture of the performance that
  can be expected from Mr. Scan in real world use cases. At its largest scale,
  Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU
  nodes on Cray Titan in 7.5 minutes
------------------------------------------------------------------------------
[← Prev in Thread] Current Thread [Next in Thread→]
  • [DynInst_API:] Two new Paradyn project tech reports available, Barton Miller <=