LangSec Workshop

at IEEE Security & Privacy, Thursday May 25, 2017



Opening Keynote: Perry Metzger, "Slow But Steady: Achieving Real Security Within Two Decades"

It is impossible to fix security for everyone in all cases. Nothing can be made foolproof, because fools are too ingenious.

However, currently, even users who "do everything right" are remarkably unsafe, because important software fails to be secure even when properly configured.

I present the case that, surprisingly, enough is finally known to dramatically improve this problem.

I will argue that, if appropriate and (entirely affordable) methods are adopted by the authors of important systems, slow incremental improvements could at last fix the infrastructure problem, and possibly even before computer security issues destroy our civilization.

I will discuss the necessary techniques, and present a plan for their adoption.



Stefan Lucks, Norina Grosch and Joshua Koenig, "Taming the Length Field in Binary Data: Calc-Regular Languages"

When binary data are sent over a byte stream, the binary format sender and receiver are using is a "data serialization language", either explicitly specified, or implied by the implementations. Security is at risk when sender and receiver disagree on details of this language. If, e.g., the receiver fails to reject invalid messages, an adversary may assemble such invalid messages to compromise the receiver's security.

Many data serialization languages are length-prefix languages. When sending/storing some F of flexible size, F is encoded at the binary level as a pair (|F|,F), with |F| representing the length of F (typically in bytes).

This paper's main contributions and results are as follows.

(1) Length-prefix languages are not context-free. This might seem to justify the conjecture that parsing those languages is difficult and not efficient.

(2) The class of "calc-regular languages" is proposed, a minimalistic extension of regular languages with the additional property of handling length-fields. Calc-regular languages can be specified via "calc-regular expressions", a natural extension of regular expressions.

(3) Calc-regular languages are almost as easy to parse as regular languages, using finite-state machines with additional accumulators. This disproves the conjecture from (1).

paper video

Pierre Chifflier and Geoffroy Couprie, "Writing parsers like it is 2017"

Despite being known for a long time, memory violations are still a very important cause of security problems in low-level programming languages containing data parsers. We address this problem by proposing a pragmatic solution to fix not only bugs, but class of bugs. First, using a fast and safe language such as Rust, and then using a parser combinator. We discuss the advantages and difficulties of this solution, and we present two cases of how to implement safe parsers and insert them in large C projects. The implementation is provided as a set of parsers and projects in the Rust language.

paper video

Tobias Bieschke, Lars Hermerschmidt, Bernhard Rumpe and Peter Stanchev, "Eliminating Input-Based Attacks by Deriving Automated Encoders and Decoders from Context-free Grammars"

Software systems nowadays communicate via a number of complex languages. This is often the cause of security vulnerabilities like arbitrary code execution, or injections. Whereby injections such as cross-site scripting are widely known from textual languages such as HTML and JSON that constantly gain more popularity. These systems use parsers to read input and unparsers write output, where these security vulnerabilities arise. Therefore correct parsing and unparsing of messages is of the utmost importance when developing secure and reliable systems. Part of the challenge developers face is to correctly encode data during unparsing and decode it during parsing.

This paper presents McHammerCoder, an (un)parser and encoding generator supporting textual and binary languages. Those (un)parsers automatically apply the generated encoding, that is derived from the language's grammar. Therefore manually defining and applying encoding is not required to effectively prevent injections when using McHammerCoder. By specifying the communication language within a grammar, McHammerCoder provides developers with correct input and output handling code for their custom language.

paper video

Yi Lu, Sora Bae, Paddy Krishnan, and Raghavendra K.R., "Inference of Security-Sensitive Entities in Libraries"

Programming languages such as Java and C# execute code with different levels of trust in the same process, and rely on an access control model with fine-grained permissions to protect program code. Permissions are checked programmatically, and rely on programmer discipline. This can lead to subtle errors. To enable automatic security analysis about unauthorised access or information flow, it is necessary to reason about security-sensitive entities in libraries that must be protected by appropriate sanitisation/declassification via permission checks. Unfortunately, security-sensitive entities are not clearly identified.

In this paper, we investigate security-sensitive entities used in Java-like languages, and develop a static program analysis technique to identify them in large code-bases by analysing the patterns of permission checks. Although the technique is generic, our focus is on Java where checkPermission calls are used to guard potential security-sensitive entities. Our inference analysis uses two parameters called proximity and coverage to reduce false-positive and false-negative reports. The usefulness of the analysis is illustrated by the results obtained while checking the OpenJDK7-b147 for conformance to Java Secure Coding Guidelines that relate to the confidentiality and integrity requirements


Research Reports

Kit S Tse and Peter C Johnson, "A Framework for Validating Session Protocols"

Communication protocols are complex; their implementations are difficult, causing many unintended (and severe) vulnerabilities in protocol parsing. While the problem of packet parsing is solved, session parsing remains challenging. Building on existing systems that reliably parse individual messages, we present our four-component framework for implementing protocol session parsers with the goal to improve security of protocol parsing: specification of a protocol message, description of a protocol state machine, testing routines to validate implementations against fake and real data, and graph generation to visualize implementation. This framework enables the creation of a session parser, which validates individual protocol messages in the context of other messages in the same conversation. This is helpful because more secure parsers lead to more secure communication.


Prashant Anantharaman, Michael Locasto, Gabriela F. Ciocarlie and Ulf Lindqvist, "Building Hardened Internet-of-Things Clients with Language-theoretic Security"

Unprincipled input handling has caused many of the most prevalent and severe vulnerabilities in the Internet era, and this trend appears to continue in the emerging Internet of Things (IoT). In this paper, we present a methodology to build secure input-handling functionality for application-layer IoT protocols by applying the Language-theoretic Security (LangSec) philosophy. We have built working implementations for the XMPP and MQTT protocols and demonstrated that our clients, which consist of less than a hundred lines of code, correctly recognize all valid messages in our tests. With respect to CPU time, our clients compare well against the most widely deployed implementations of these two protocols.

paper video

Jacob Torrey, Mark Bridgman and Tomasz Tuzel, "Hardware-Enforcement of Walther-Recursive Program Functions"

Preliminary experiment design and research goals are presented to measure the security implications of using just-in-time (JIT) compilation in conjunction with hardware-enforced restrictions on the computational complexity of general purpose applications. LLVM passes are used during compilation to mark sub-components of an application if it is provably halting (Walter recursive). A modified JIT engine is under development to unroll loops based on run-time semantics, while Intel's Processor Tracing technology enforces those run-time bounds. This paper describes an on-going effort with the goals of improving security through the enforcement of restricted computational environments; an analysis of the impact on the attacker when return-oriented programming is limited to unidirectional execution highlights the impact of the Ristretto system.

paper video

Andrei Costin, "Lua code: security overview and practical approaches to static analysis"

Lua is an interpreted, cross-platform, embeddable, performant and low-footprint language. Lua's popularity is on the rise in the last couple of years. Simple design and efficient usage of resources combined with its performance make it attractive for production web applications even to big organizations such as Wikipedia, CloudFlare and GitHub. In addition to this, Lua is one of the preferred choices for programming embedded and IoT devices. This context suggests a large and growing Lua codebase yet to be assessed. This growing Lua codebase could be potentially driving production servers and extremely large number of devices, some perhaps with mission-critical function for example in automotive or home-automation domains.

However, there is a substantial and obvious lack of static analysis tools and vulnerable code corpora for Lua as compared to other increasingly popular languages, such as PHP, Python and JavaScript. Even the state-of-the-art commercial tools that support dozens of languages and technologies actually do not support Lua static code analysis.

In this paper we present the first public Static Analysis for Security Testing (SAST) tool for Lua code that is currently focused on web vulnerabilities. We show its potential with good and promising preliminary results that we obtained on simple and intentionally vulnerable Lua code samples that we synthesized for our experiments. We also present and release our synthesized corpus of intentionally vulnerable Lua code, as well as the testing setups used in our experiments in form of virtual and completely reproducible environments. We hope our work can spark additional and renewed interest in this apparently overlooked area of language security and static analysis, as well as motivate community’s contribution to these open-source projects. The tool, the samples, and the testing VM setups will be released and updated at and

paper video

Aniqua Z. Baset, Tamara Denning, "IDE Plugins for Detecting Input-Validation Vulnerabilities"

Many vulnerabilities in products and systems could be avoided if better secure coding practices were in place. There exist a number of Integrated Development Environment (IDE) plugins that help developers check for security flaws while they code. In this work, we present a review of these plugins. We specifically focus on the plugins that detect input validation-related vulnerabilities. We list salient features such as their supported IDEs, applicable languages and specific types of vulnerability checks. We believe this work synthesizes information useful for future research on IDE plugins for detecting input validation-related vulnerabilities.

paper video

Invited Talks & Industry Case Studies

Lee Pike, "Programming Languages for High-Assurance Vehicles" (Galois, Inc.)

We report on our experiences in synthesizing a fully-featured autopilot from embedded domain-specific languages (EDSLs) hosted in Haskell. The autopilot was built for the DARPA High-Assurance Cyber-Military Systems (HACMS) program. After an independent red-team assessment, one government official remarked that it was likely "the most secure UAV in the world". We will also describe our experiences in transitioning the EDSLs to Boeing for use in large-scale platforms. The EDSLs and autopilot are both open-source.


Richo Healey, Dominic Spill, "Embedded Nom: a case study of memory safe parsing in resource constrained environments" (Great Scott Gadgets)

Modern embedded devices have a near unilateral need to parse complex structured data in order to operate. Historically, the parsers for these formats have been defect prone, difficult to work on, and often inefficient. We present a working example of parsers constructed using a modified version of the Nom parser combinator framework operating on an embedded platform, parsing complex input formats and safely and efficiently converting this into structured data.

Further, we demonstrate that these parsers can safely be integrated with an existing C codebase, clearing the way for incremental migrations to memory safe languages in environments where greenfield rewrites are economically infeasible.


Alex Bazhaniuk, "How the BIOS assures its handling of inputs" (Intel)

In this presentation I will explain different types of input to firmware interfaces, and different approaches for BIOS analysis in the perspective of formal validation as well as finding vulnerabilities. I will discuss examples of static analysis, dynamic analysis, and symbolic execution analysis of BIOS components.


LangSec Roundtable

2016: The Year of LangSec Bugs?

Every year since 2014 most bugs that topped LangSec headlines were "LangSec" bugs, findable and preventable using the LangSec approach. Did 2016 have the worst LangSec bugs yet?