API Misuse

Ensuring that library APIs are correctly used

When developers use Application Programming Interfaces (APIs), they often make mistakes that can lead to bugs, system crashes, or security vulnerabilities. We refer to such mistakes as misuses. One example of a misuse is forgetting to call close() after opening a FileInputStream and reading from it.
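As an illustration of the close() example, the following is a minimal sketch that contrasts a leaky use of FileInputStream with a version that uses try-with-resources. The class and method names (StreamUsage, firstByteLeaky, firstByteSafe) are hypothetical and chosen only for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamUsage {

    // Misuse: the stream is never closed, so the file handle leaks
    // whether read() succeeds or throws.
    public static int firstByteLeaky(Path file) throws IOException {
        FileInputStream in = new FileInputStream(file.toFile());
        return in.read(); // missing in.close()
    }

    // Correct use: try-with-resources closes the stream on every path,
    // including when read() throws.
    public static int firstByteSafe(Path file) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("misuse-demo", ".txt");
        Files.write(tmp, "A".getBytes(StandardCharsets.US_ASCII));
        System.out.println(firstByteSafe(tmp)); // prints 65, the byte value of 'A'
        Files.delete(tmp);
    }
}
```

Both methods return the same value on the happy path, which is part of why this kind of misuse is easy to overlook: the leak only shows up as exhausted file handles or locked files later on.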

We study various types of API misuse.

API Misuse of Data-centric Python Libraries

Example of a data-centric misuse
Data-centric Python libraries, such as pandas and matplotlib, often deal with diverse data structures, intricate processing workflows, and a multitude of parameters, which makes them inherently harder to use correctly. Detecting problems in the usage of these libraries is challenging, not only because of the dynamic nature of Python but also because some misuses depend on the data being processed. In this line of work, we investigate how API misuse manifests in these data-centric libraries and how we can design effective detection strategies to help developers use them correctly.

General Java API Misuse

MuBench
MuDetect

We created MUBench, a benchmark of existing Java API misuses against which we can evaluate misuse detectors. We systematically compared existing Java API-misuse detectors and identified their weaknesses. This allowed us to design a new API-misuse detector, MuDetect, that can achieve higher recall and precision. MuDetect mines API usage rules that involve method calls and preconditions. These usage rules are then used to find misuses in target projects. MuDetect uses a graph representation called an API Usage Graph (AUG) to capture different aspects of a method call: the parameters that a method requires, the types of those parameters, the order in which method calls are invoked, the exceptions that method calls throw, and the objects that method calls return.
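To give a rough intuition for graph-based usage rules, the sketch below encodes a usage as a set of labeled edges and reports the rule edges that a target usage lacks. It is a deliberately simplified illustration, not MuDetect's implementation: the Edge class, the edge labels (receiver, order, condition), and the plain set difference are all assumptions made for this example, whereas a real detector has to match subgraphs rather than compare edge sets.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class UsageGraphSketch {

    /** A directed, labeled edge between two usage elements (hypothetical simplification). */
    static final class Edge {
        final String from, label, to;

        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge e = (Edge) o;
            return from.equals(e.from) && label.equals(e.label) && to.equals(e.to);
        }

        @Override public int hashCode() { return Objects.hash(from, label, to); }

        @Override public String toString() { return from + " -" + label + "-> " + to; }
    }

    /** Returns the rule edges that do not occur in the target usage. */
    static Set<Edge> missingEdges(Set<Edge> rule, Set<Edge> usage) {
        Set<Edge> missing = new LinkedHashSet<>(rule);
        missing.removeAll(usage);
        return missing;
    }

    /** Example: a rule saying next() should be guarded by hasNext(), and a usage that skips the guard. */
    static Set<Edge> demoMissing() {
        Set<Edge> rule = new LinkedHashSet<>(Arrays.asList(
                new Edge("Collection.iterator()", "receiver", "Iterator.next()"),
                new Edge("Iterator.hasNext()", "order", "Iterator.next()"),
                new Edge("Iterator.hasNext()", "condition", "Iterator.next()")));
        Set<Edge> usage = new LinkedHashSet<>(Arrays.asList(
                new Edge("Collection.iterator()", "receiver", "Iterator.next()")));
        return missingEdges(rule, usage);
    }

    public static void main(String[] args) {
        // Reports the two edges that involve the missing hasNext() check.
        for (Edge e : demoMissing()) {
            System.out.println("missing: " + e);
        }
    }
}
```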

Annotation Misuse in Java

Rule Validation Tool

While MuDetect focuses on method calls, there are other categories of API misuse as well, such as misuses that involve annotations. We built a human-in-the-loop approach that focuses on producing accurate Java annotation usage rules. For ease of use, these usage rules are packaged into a Maven plugin that can be used to catch bugs (similar to SpotBugs). Our tool is a complete pipeline that provides an easy way to mine and validate usage rules and to generate a misuse detector from the confirmed rules.
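To make the notion of an annotation usage rule concrete, here is a minimal sketch of checking one such rule. Everything in it is hypothetical: the @RunsPeriodically annotation, the rule that annotated methods must take no parameters, and the reflection-based check are assumptions for illustration only and do not reflect how the Maven plugin or its mined rules work.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class AnnotationRuleSketch {

    // Hypothetical annotation, defined here only for illustration.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RunsPeriodically {}

    // Hypothetical usage rule: a method annotated with @RunsPeriodically
    // must not declare parameters. Returns the names of violating methods.
    static List<String> violations(Class<?> type) {
        List<String> found = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(RunsPeriodically.class) && m.getParameterCount() > 0) {
                found.add(m.getName());
            }
        }
        return found;
    }

    static class Jobs {
        @RunsPeriodically void cleanUp() {}                  // follows the rule
        @RunsPeriodically void report(String recipient) {}   // violates the rule
        void helper(int x) {}                                 // not annotated, so not checked
    }

    public static void main(String[] args) {
        System.out.println(violations(Jobs.class)); // prints [report]
    }
}
```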

Java Cryptography Misuse

By analyzing StackOverflow posts and GitHub repositories and by conducting two surveys with a total of 48 application developers, we collected the problems developers face with current cryptography APIs and their suggestions for improvement. Among our findings, developers have trouble choosing the correct algorithm to use and want higher-level abstractions, such as tasks.

To address these issues, we looked closer at the cryptography domain and realized that there is a wide variety of cryptographic components and algorithms (e.g., ciphers, digests, signatures) and that each of these components comes with its own variability. For example, a cipher can be symmetric or asymmetric. If it is symmetric, it can operate on blocks or streams. Additionally, there are different modes of operation (e.g., ECB vs. CBC) as well as different padding schemes. To deal with this huge variability space, we model cryptographic components using concepts from feature modeling. However, such components have many attributes, and some cryptography solutions use multiple components at the same time. We therefore need modeling notations beyond those offered by basic feature modeling.
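The following minimal sketch shows why the choice of mode matters, using the standard javax.crypto API. It encrypts two identical plaintext blocks with AES in ECB mode and in CBC mode and compares the first two ciphertext blocks. The class and helper names (CipherModeSketch, encrypt, firstTwoBlocksEqual) are hypothetical, and the random IV is discarded here only because the sketch never decrypts; real code has to keep the IV alongside the ciphertext.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class CipherModeSketch {

    // True if the first two 16-byte ciphertext blocks are identical.
    static boolean firstTwoBlocksEqual(byte[] ciphertext) {
        return Arrays.equals(Arrays.copyOfRange(ciphertext, 0, 16),
                             Arrays.copyOfRange(ciphertext, 16, 32));
    }

    // Encrypts with the given transformation; CBC gets a fresh random IV.
    static byte[] encrypt(String transformation, SecretKey key, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance(transformation);
        if (transformation.contains("/CBC/")) {
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        } else {
            cipher.init(Cipher.ENCRYPT_MODE, key);
        }
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        byte[] plaintext = new byte[32]; // two identical all-zero blocks

        // ECB encrypts equal plaintext blocks to equal ciphertext blocks: prints true.
        System.out.println(firstTwoBlocksEqual(encrypt("AES/ECB/PKCS5Padding", key, plaintext)));
        // CBC chains blocks, so they differ except with negligible probability: prints false.
        System.out.println(firstTwoBlocksEqual(encrypt("AES/CBC/PKCS5Padding", key, plaintext)));
    }
}
```

Both transformations are accepted by the API without complaint, which illustrates the finding above: the API leaves the choice of algorithm, mode, and padding to the developer.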

CogniCrypt was built on the insights derived from these studies.


Related Resources

Related Publications