1. Overview

CodableLLM is built to be easily extensible, allowing users to add support for new programming languages and custom decompilers. Although the goal is to ship built-in support for as many languages and decompilers as possible, that support may not be feasible, or may not yet be available, for every use case.

In these cases, extending CodableLLM allows you to integrate your own logic for extracting source functions or decompiling binaries. This flexibility ensures that you can adapt the framework for proprietary languages, specialized binary formats, or experimental tooling.

Why Would You Want to Extend CodableLLM?

  • Add support for new programming languages that are not yet supported by the framework
  • Use custom decompilers for unique binary formats or research projects
  • Implement proprietary or domain-specific extractors for closed-source environments or legacy systems
  • Experiment with specialized heuristics or intermediate representations tailored to your dataset generation needs

How to Extend CodableLLM

Extending CodableLLM is straightforward. All you need to do is create subclasses of the following classes:

Class        Description
Extractor    Defines how to parse and extract functions from source code for a new language.
Decompiler   Defines how to decompile binaries and return decompiled functions.

Both of these classes expose clear interfaces that make it simple to plug in your own logic and start generating datasets.
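To make the subclassing pattern concrete, here is a minimal, self-contained sketch. It does not import CodableLLM: the `Extractor` base class, its `extract` method, and the `SourceFunction` record below are hypothetical stand-ins, since this overview does not spell out the real interfaces. `ToyLangExtractor` and the one-line `fn name(args) = body` language it parses are likewise invented for illustration. A custom decompiler would presumably follow the same shape, subclassing `Decompiler` instead.

```python
from __future__ import annotations

import re
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceFunction:
    """Hypothetical record for one extracted function (not the framework's type)."""
    name: str
    definition: str
    path: Path


class Extractor(ABC):
    """Stand-in for the framework's Extractor base class; the interface is assumed."""

    @abstractmethod
    def extract(self, path: Path) -> list[SourceFunction]:
        """Return every function found in the source file at `path`."""


class ToyLangExtractor(Extractor):
    """Extractor for an invented language where each function is a single line,
    for example: fn add(a, b) = a + b
    """

    # Matches a whole line that starts with `fn <name>(`; group 1 is the name.
    _FUNCTION_LINE = re.compile(r"^fn\s+(\w+)\s*\(.*$", re.MULTILINE)

    def extract(self, path: Path) -> list[SourceFunction]:
        text = path.read_text(encoding="utf-8")
        return [
            SourceFunction(name=m.group(1), definition=m.group(0).strip(), path=path)
            for m in self._FUNCTION_LINE.finditer(text)
        ]


def extract_from_text(source: str) -> list[SourceFunction]:
    """Write `source` to a temporary file and run the toy extractor on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.toy"
        path.write_text(source, encoding="utf-8")
        return ToyLangExtractor().extract(path)


if __name__ == "__main__":
    sample = "fn add(a, b) = a + b\n# a comment line\nfn neg(x) = 0 - x\n"
    for fn in extract_from_text(sample):
        print(fn.name, "->", fn.definition)
```

The design point is that all language-specific logic lives in the subclass, behind one method. When adapting this to the real framework, check the actual base-class method names and return types rather than relying on the ones assumed here.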

In the following subsections, we'll dive into how to create custom extractors and implement your own decompilers.