2. Creating a Custom Extractor
If you need to support a programming language not yet implemented by CodableLLM, you can easily extend the framework by creating your own custom extractor.
All custom extractors must subclass Extractor, an abstract base class that defined the interface for parsing source files and returning extracted functions.
Methods to Implement
When extending Extractor, you must implement two methods:
extract: Parses a given source code file and returns a sequence ofSourceFunctioninstances.get_extractable_files: Given a directory or file path, returns all files that the extractor can process. This typically involves filtering by file extensions.
Example: Creating a Custom Language Extractor
Below is a template for defining your own extractor. The implementation of each method will depend on how you choose to parse and extract functions for your language (e.g. using Tree-sitter, regex, or another parser).
Creating Your Custom Extractor Class
from pathlib import Path
from typing import Sequence
from codablellm.core.extractor import Extractor
from codablellm.core.function import SourceFunction
class CustomLanguageExtractor(Extractor):
'''
Extractor for CustomLanguage source code files.
'''
def extract(self, file_path: Path | str, repo_path: Path | str | None = None) -> Sequence[SourceFunction]:
# Implementation goes here: parse the file and return SourceFunction objects
pass
def get_extractable_files(self, path: Path | str) -> Sequence[Path]:
# Implementation goes here: return a list of extractable source files (e.g., based on file extensions)
pass
Registering Your Custom Extractor
Once your extractor is implemented, you must register it with CodableLLM's extractor: