SecCloud

Cloud programming is one of the major factors that shape how large-scale distributed applications will be structured and executed on the future Internet. This paradigm shift in the development of Internet-based applications is readily apparent by the server-side changes that it entails; unsurprisingly, there is numerous corresponding research work. The changes to the client-side programming and usage of such applications, however, is also fundamental and profound but have been the subject of much less attention.

Client-side applications are notably characterized by their highly dynamic nature that makes extensive use of dynamically-loaded code. Furthermore, many of these (browser-based or stand-alone) applications are implemented using JavaScript or other scripting language whose semantic foundations and pragmatic but effective usage are not yet well explored or even defined—this is even more surprising because Javascript is by now the fifth to tenth most frequently used general-purpose programming language on the major sites that offer statistics on the use of languages.

The present research project, SecCloud, will provide a comprehensive language-based approach to the definition, analysis and implementation of secure applications developed using Javascript and similar languages. Our high level objectives is to enhance the security of devices (PCs, smartphones, etc.) on which Javascript applications can be downloaded, hence on client-side security in the context of the Cloud. We will achieve this by focusing on three related issues:

1. Formal definition of the semantics of JavaScript;

2. Static and dynamic analysis of web scripting programming languages;

3. Preventive Information Flow Control through a Mechanism of Split Addresses.

1. Formal definition of the semantics of JavaScript

To establish the correctness of tools manipulating JavaScript programs, we first need a mechanized semantics of JavaScript, i.e., a description of the evaluation rules of the language expressed in a formal language such as that of the Coq proof assistant. While prior work has identified several interesting subsets of JavaScript, none have been related to the full-blown semantics of JavaScript through mechanized proofs. We plan to build on prior work to identify a useful yet simple subset of JavaScript, and give a mechanized semantics for it. We will then develop a proof that the semantics of the sublanguage is adequate with respect to that of the full-blown language.

This diagram presents how the JSCert JavaScript formal specification is compared to the actual semantics. This specification can be used to build any JavaScript related work in Coq.

This work is composed into two parts: JSCert, which has been written in an eye-verifiable correspondence to be closed to the EcmaScript 5 specification; and JSRef, a Coq interpreter which has been proved correct with respect to JSCert.

JSRef can be extracted into OCaml to be run against any JavaScript test suite. Its parser is however not yet formalised. The infrastructure of this project has been made such that every line of JSRef corresponds to a JSCert rule, which also corresponds to a line in the EcmaScript 5 specification: we can use any code coverage tool like Bisect in the OCaml interpreter and link it directly to the EcmaScript English specification. This work has thus also been able to confront the specification against the test suites, showing which are the tested and untested parts of it.

More details can be found in the paper “A Trusted Mechanised JavaScript Specification” (see the publications).

We now aim to build certified tools from JSCert. As JSCert itself is already huge, we do not plan to only use it to prove our analyser, but also to build it. To this end, we are adapting abstract interpretation to pretty-big-step semantics, which is the basis of JSCert. The final goal is to automatically get a correct-by-construction abstract interpreter from some manually defined lattices and abstractions.

For now, we are working on toy examples, but we hope to soon be able to adapt existing JavaScript abstractions to the full JSCert semantics, thus getting an abstract JavaScript interpreter fully proven in Coq.

2. Static and dynamic analysis of web scripting programming languages

Scripting languages intensively use runtime transformations, like method addition in objects or string evaluation. Thus, the static analysis of this kind of languages can benefit considerably from a combination of static and dynamic analysis techniques to guarentee security properties of scripts.Part of this challenge consists in the integration of of static and dynamic verification into a formal semantic framework, that can help deciding which trade-off is best. The implementation of dynamic analyses is a second issue that has to be resolved. Dynamic analysis can be done either by external monitoring or by weaving of security aspects into JavaScript code.

In connection with integrating static and dynamic analysis, a crucial question concerns the flow of information between the applications and the system. System-level monitoring is currently the only existing technique for detecting a violation of the security policy based on several file read and write operations and the experience of the CIDre team has shown the viability of monitoring of applications at system level in classical operating systems. The main challenge will be to combine static analysis and dynamic analysis at the application and system level in the setting of novel, web-oriented execution platforms, taking into account the results obtained in the third activity on multi-level information flow monitoring. These results notably concern how static and dynamic security verification can be architectured from applications via the browser down to the OS level.

3. Preventive Information Flow Control through a Mechanism of Split Addresses

The current state of the Internet and web technologies makes it extremely appealing to the vast majority of the computerized world to depend on purely on-line services. The advent of HTML5 has triggered an array of approaches increasing the feature set of web applications. Some of these novel technologies such as CORS, cross-document messaging, WebRTC or web sockets, allow for communication on web pages on levels that were not feasible earlier. Others such as IndexedDB and local storage increase client side storage capabilities. This has led to an increasing need to improve the security of the web-browsers. These techniques stress on the need to monitor these new possibilities of information leakages since the web pages are converging to the functionalities traditionally reserved for desktop applications.

Over the years, there have been a lot of improvements including the key conceptualization of same-origin policy. Despite all these current safeguards, constant threats to web-services prevail, thereby, increasing the desire for security enhancements to help in the progression towards a better Internet. Some of the most prominent vulnerabilities such as cross-site scripting (XSS) and cross-site request forgery (CSRF) feature in the OWASP Top Ten. These attacks can be characterized by malicious information flows and hence information flow control (IFC) is an effective methodology to counter them.

Our approach focuses on typical web-browser which generally consists of the JavaScript engine, Storage mechanisms, DOM engine and extensions. The JavaScript engine communicates with all the other components of the browser as shown above. We add a policy manager to the default mechanism and modify the JS engine to reflect the same. The content from a website is downloaded following a browser request. This contains headers, JavaScript code and rendering information. We transmit the policy specification as part of the headers. These policies are used by the policy manager to influence the execution by the JS Engine.

We believe that a greater amount of IFC could be achieved by directly modifying the interpreter. Hence, we focus on modifying the core JavaScript engine of the browser which implies using the full-set of JavaScript implementation including the modern HTML5 functionalities such as web sockets and cross origin resource sharing. We aim to develop an effective and practical security model based on IFC and access control mechanisms. In this model we propose to split the address spaces managed by the Chromium (V8) engine into public and protected spaces based on the policy. The various effects of such a mechanisms with respect to effective IFC are being analysed as well.

Our model for preventive enforcement is called the “Address Split Design”. The idea is to have two address spaces for each sensitive variable. The private address space contains the secret values and the public address space contains a dummy value. The private address is provided to the functions with the necessary privileges. A small illustration of our model is provided below.

We perceive that the loss of functionality must be minimized despite the need for stronger security mechanisms. Our model is in line with this philosophy and the results of our analysis will follow.

Keep following us on this page and kindly check our publications & results for more information.

Presentation