[madPL] CS703 Course presentations


Date: Tue, 01 Dec 2020 17:10:47 -0500
From: "Loris D'Antoni" <loris@xxxxxxxxxxx>
Subject: [madPL] CS703 Course presentations
Students in myÂ703Âclass are giving their projectÂpresentationsÂthis (W,F) and next week (M,W), 11-12.15. Feel free to pass by.

You can watch the talks (abstracts below) at this linkÂhttps://us.bbcollab.com/guest/6dc1eee7ae3745b4a3090856212892de

12/2/2020


Detecting Insecure Implementation of Vulnerable Code Snippets found on Stackoverflow (Islam)

Online programming discussion platforms such as Stack Overflow have a rich source of ready to use code snippets for software developers. It is the de-facto place where developers go to find solutions from the coding snippets given as answers to posted problems by the online developer community. However, previous research work has shown that developers have a tendency to directly copy-paste insecure code snippets from Stack Overflow into their production level code. As a result without any countermeasures Stack Overflow is becoming one of major sources of vulnerability of production level code. To address this problem, in this project, we tackle the problem of analyzing code snippets found on Stack overflow. This is challenging since code snippets from Stack overflow are often erroneous, incomplete, and lack dependencies which makes it harder to analyze them by state-of-the-art static tools. Our goal is to build a static analysis tool that can identify common insecure patterns found on a dataset containing a collection of 1.6K code snippets.ÂÂÂ


Towards reparation of eBPF programs (Ji, Chang)

eBPF programs facilitate the ease of observation and modification of Linux kernel functionality. Since eBPF programs can access and alter kernel behaviors, they have to be proved safe to run. Currently, it is the in-kernel verifier that performs such safety checks. However, the violations that the verifier reports are known to contain false positives that are hard to work around. Therefore, we aim to 1) detect the aforementioned violations in eBPF programs by symbolic analysis on the eBPF source code and 2) automatically propose reparations to these violations. In this project, we build a proof-of-concept of 1) and find that 2) is achievable.



Synthesizing Human Interpretable Playing Strategies for Sudoku (Baklund, Zhao)

In contrast to the way that computers use to solve puzzles (exponential search, backtracking), humans employ a rich structure of domain-specific knowledge and strategies that are not explicitly stated in the game's rules. However, current human-interpretable strategies are handcrafted heuristics that lack automation. This project bridges the gap by implementing a framework that not only can automatically generate strategies for Sudoku, but for any logic puzzles as well.

This project extends the previous work on synthesizing strategies for Nonogram by focusing on a more complex problem domain, namely, Sudoku. The Sudoku board has 9 * 9 grids, with each grid can contain number 1-9, whereas Nonogram may have a smaller board and each grid can only contain 2 possible inputs (filled vs not filled). Moreover, the previous work only considers generating strategies for a single row. However, most logic puzzles have strategies about a row and a column. Sudoku in addition have rules and strategies about a 3-by-3 subgrid. Our goal is to use an SMT-Solver as the mechanism to generate programs in a Domain Specific Language which can be interpreted by humans who can in turn employ these strategies when solving the puzzle.






12/4/2020


Synthesize SQL queries from natural language (Ye, Zhong, Chen)

SQL synthesis from natural language (NL2SQL) has the potential to bridge humans without data science expertise to a vast amount of data. Natural language-based synthesis largely reduces the effort of its users but can be challenging for the synthesizer to understand user intent and thus compose valid queries.Â

SQLizer is a synthesis tool based on the natural language, but its source code is not publicly available. In this project, we reimplement SQLizer and apply some optimized heuristic methods in model searching. More specifically, we use a semantics parser to transform natural language into a SQL sketch and use type-directed synthesis to complete the sketch. We then evaluate the confidence of this sketch completion; if the confidence is below the threshold, we then apply a repairment mechanism to modify the sketch and repeat the sketch completion process until we finally get a complete and functional SQL query.



Learning User Preferences for Reading Companion Robots (White)

Currently, many social robots are deployed with predetermined code that may offer adjustment capabilities to the different tasks it performs, but falls short of adapting to the user's they are working with. This model, especially in social robotics, gives rise to discontinued system usage, as user's realize the system may not be capable of interacting the way the user envisioned. By allowing robots to adapt to the end user, roboticists expect to see more sustained system usage as well as an increase in positive perception of the system. As an initial introduction to this issue, reactive synthesis is used to adjust a simplistic interaction model of a robotic reading companion. Through repeated interaction and user feedback, the robot adjusts the type of support commentary and the amount of questions it gives in order to accommodate what the reader prefers.


From Natural Language to SQL Queries (Mohr)

There is often a disconnect in the business world between wanting to access data in a database and being able to access it. Thatâs where SQLizer, talked about by Yaghmazadeh et al. in SQLizer: Query Synthesis from Natural Language, comes in. Their tool is able to convert a natural language statement into a SQL query using various techniques, such as semantic parsing and automated program repair. It performs very well compared to previous methods for natural language to SQL. However, the authors have not released their implementation. The goal of this project is to try to reimplement their technique and to further test it.



Programmatically Interpretable Reinforcement Learning in a Robotics Domain

Abstract (Skare)

In 2019 Verma et al published a paper titled, Programmatically Interpretable Reinforcement Learning, where they presented a program synthesis technique for solving reinforcement learning (RL) problems. A small functional language is introduced. Using deep reinforcement learning (DRL), solutions to a RL problem in the form of a program are generated. In the paper, this method of solving RL problems was applied to several test problems, including driving a virtual car through a track. This research investigates a real-world robotics application of this RL method. Learning from real data can often have different challenges than learning in a simulated environment. To test the efficacy of PIRL in a real-world robotics context, an experiment was devised. The experiment is based on an early DRL paper published in 2003 from researchers at Oita University, Japan. In their experiment, they built a small robot with a CCD monochrome camera and four infrared distance sensors providing inputs to an actor-critic DRL model. The robot operated in a 70cm square area, and itâs goal was to push a 7cm by 3 cm box.



12/7/2020


Planning to Plan: Faster Enumerative Search in Constrained Program Spaces (Stegner, Welsh)

When a robot is performing a task in an environment such as manufacturing, any number of deviations from the expected environment can occur. Traditionally, when the robot encounters such an unexpected situation, it must re-plan the entire task. This constant re-planning is costly and often seems unnecessary to a human observer. Therefore, this work seeks to address the issue of constant re-planning by creating a more efficient planning approach. In this approach, we represent the state space as a directed hypergraph. By performing an enumerative program search ahead of time, we prune search branches that will never yield optimal programs. Then, at run-time, we perform an enumerative search over the now reduced program space to find the correct program. This allows the robot to act on arbitrary scenarios more quickly than when using an uninformed enumerative search. We evaluated this system and found that it offers more rapid planning but has non-trivial up front computation time.


SQL Query Synthesis from Natural Language: A Domain-Specific Application to Assist Content Creators Designing Speech Content for a Reading Companion Robot (Cagiltay)

When designing the speech content of a reading companion robot, content creators may need to access information stored in a relational database with many tables. Learning to write complex queries to retrieve information can be cumbersome for these users, especially if they are not expected to have the domain knowledge for SQL. The motivation behind this proposed tool is to assist content creators to easily query a relational database without having to learn SQL. As a solution, this tool aims to demonstrate a proof-of-concept for sketch-based SQL query synthesis from natural language. With minimal training, the target audience can use the tool to extract SQL queries based on their intent and use them to view the corresponding tables from the database. The current version of the tool is limited to non-nested select and join queries and requires semi-structured natural language inputs from its users. This tool can be extended in future work to support a wider range of SQL functions by implementing an improved semantic parser and program repair capabilities. Furthermore, although the current tool is limited to a domain specific application, it can be extended to other domains with similar database schemas.


How to Travel between Languages (Grover)

We consider the problem of automatically traveling from one regular language to another through the synthesis of a transducer. In particular, we are interested in the case of synthesizing streaming transducers, where input strings are âeditedâ as the input characters are read. This has potential applications in the repair of programs that can be represented as transducers, for e.g., string sanitizers. The paper âHow to Travel between Languagesâ by Chatterjee et al. presents an algorithm for the partial-repair problem. In this setting, we are allowed a specified number of edits on an input string, and we want to synthesize a transducer that can edit the maximum number of input strings into some output language. We present the first implementation, as far as we know, of this algorithm and try to use it on some repair examples. We find that the algorithm has limited applicability because it cannot repair strings of unbounded length, and this is not satisfactory for certain repair examples.


Macro Synthesis by Example in Common Lisp (Johnson)

Program synthesis has seen recent success in learning data transformations by example, starting with FlashFill and ranging through applications such as data set cleanup and automated XML transformations. However, these applications focus on program data; instead, here we look at transformations to the program text through the lens of Common Lisp macros. We focus on syntactic sugar macros, which are purely syntactic transformations without side effects.

The Prose SDK from Microsoft is used to synthesize a macro-expander function from a DSL of Common Lisp operations. We show that a traditional recursive list grammar requires minor structural changes in order to scale and remain performant. These changes are sufficient to generate expanders for many common syntactic sugar macro patterns, generally with two or fewer given examples. The overall usability of this application is also evaluated, showing that language environment integration is straightforward and generally transparent to the programmer. We conclude that macro synthesis by example can thus be a feasible addition to the Common Lisp programmerâs toolkit.

ÎGuS: Dependent(Î)-Type-(Gu)ided (S)ynthesis (Samuelson)
Dependent type theories are extremely general and powerful logical frameworks. They subsume higher-order logic and constructive set theory, allowing them to be used as a language for general theorem proving. For this reason being able to synthesize programs in these frameworks is very important, as this framework would subsume all other synthesis methods in terms of behavioral and structural constraints, and can lead to improved proof-search techniques. However because of their generality, synthesizing terms in dependent type theories is semi-decidable in the general case, and impractical even for many relatively simple constraints. For this reason most general synthesis frameworks for dependent type theories either focus on solving a severely restricted subset of problems, or employ a heuristic learning-based method.


In this project, we introduce a hybrid type-directed and enumerative general synthesis framework for a dependent type theory. Our framework focuses on finding only proofs that have a simple structure without arbitrarily limiting user constraints, and doing so in a deterministic and explainable manner. Specifically, we propose a formal synthesis framework for a multi-goal "sketching-like" synthesis problem, implement this framework in Haskell, and make comparisons to existing general synthesis and proof-search methods. Finally, we argue how this framework can benefit both interactive and automated theorem provers by allowing for human proof authors or learning-based methods to propose conjectures and generalizations, rather than focusing on minor and tedious details of proofs.



12/9/2020


Snoot: Sniffing out Program Repairs as Feedback in Automated Grading Systems (Nelson)

The need for a timely response is critical to the continuous feedback loop within active learning. Automated grading systems help address this need by giving students quick feedback on their programming assignment submissions. As students write and submit their programs, they receive feedback from the automated grading system so that they can improve their next submission.

As students converge to a solution, it becomes easier to synthesize repairs that will result in a passing solution. Rather than offering these repairs to students as a gimme, Snoot provides feedback based on these repairs. Operating over a general-purpose programming language, Colang, Snoot tests the boundary cases of a student submission to sniff out program repairs and offer appropriate feedback. From this, students will be able to make use of the feedback before making a final submission.


Application of Biochemical Abstract Machine in System Biology (Li)

There are many newly developed tools to facilitate model checking process in genetic and biochemistry studies. The Biochemical Abstract Machine (BIOCHAM) was firstly introduced as an environment to model and analyze biochemical networks using model checking (1). BIOCHAM provide a precise semantics to biomolecular interaction maps. Based on this formal semantics, the BIOCHAM system offers automated reasoning tools for querying the temporal properties of the system under all its possible behaviors. Besides, BIOCHAM allows us to verify that whenever an interaction or a molecule is added to the network, the expressed temporal logic is conserved. However, there are still a handful of successful cases of using BIOCHAM in System Biology.Â

In our previous study, we have successfully identified several macromolecules (proteins and RNA) and confirmed their interactions in the cholesterol metabolism pathway, but it is hard to build a precise model due to the complexity of molecular interactions. Therefore, in this study, we try to implement and apply BIOCHAM to build a biological network based on our experimental identified protein-protein interactions and CTL model checking.


Instrumentation of BTRFS crash recovery by model checking (Chai)

Unintended power losses happen from time to time, and when the OS comes back online, it is critical that the file systems can recover from a potentially inconsistent state left off when I/O operations were interrupted by power loss. Various techniques have been employed in the past, including but not limited to journaling and check pointing, but the complex nature of filesystem makes them hard for developers to spot mistakes. In this study we would like to utilize existing tools \textsc{eXplode}, a file system instrumentation framework, to verify crash guarantees of the BTRFS file system driver and examine how well the file system recovers itself trying to recover inodes and data block from the crash scene.


Synthesizing the Mechanical Structure of Articulated 3D Models (Corning)

Constructive solid geometry (CSG) is a technique for representing 3D models by combining geometric primitives through the boolean operations of union, intersection and subtraction. As a formal grammar, CSG provides fertile ground for applying PL methods to spatial problems. This project builds on the prior work of InverseCSG, a synthesis pipeline that can produce close CSG approximations of standard mesh-based 3D models. We aim to augment the grammar of CSG with relations that describe the mechanical properties of objects in a 3D scene: in particular, to distinguish between rigid components that are attached to each other by rotational joints, as well as those that are physically independent of one another.


This project will extend InverseCSG to include these new relations. In order to recognize this behavior, the pipeline will consume multiple 3D "frames" representing the same scene at different time points. It will then synthesize a common CSG representation of the scene, parametrized by different positional and rotational offsets for rigid components across the various frames. By minimizing the independence of the rigid components, we should be able to learn the structure of complex articulated models, including humanoid figures, from sparse data. In addition to presenting the work, we will also discuss some of its potential applications, such as using photogrammetry scans to capture the mechanical structure of real-world objects

[← Prev in Thread] Current Thread [Next in Thread→]
  • [madPL] CS703 Course presentations, Loris D'Antoni <=