extractor
Module containing functions for managing and registering source code extractors.
Source code extractors are responsible for parsing and extracting function definitions from different programming languages.
Transform = Callable[[SourceFunction], SourceFunction]
module-attribute
A callable object that transforms a source code function into another source code function.
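As an illustration of the `Transform` alias, here is a minimal sketch of a transform that tidies a function's text. The `SourceFunction` class below is a hypothetical stand-in with assumed fields (`name`, `definition`); the library's real class is not shown in this page and may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class SourceFunction:
    """Hypothetical stand-in; field names are assumptions for this sketch."""
    name: str
    definition: str


Transform = Callable[[SourceFunction], SourceFunction]


def strip_trailing_whitespace(function: SourceFunction) -> SourceFunction:
    """Return a copy of the function with trailing whitespace removed from each line."""
    cleaned = "\n".join(line.rstrip() for line in function.definition.splitlines())
    return replace(function, definition=cleaned)


# Any callable with this shape satisfies the alias.
transform: Transform = strip_trailing_whitespace
result = transform(
    SourceFunction(name="add", definition="int add(int a, int b) {   \n  return a + b;  \n}")
)
```

Returning a new object rather than mutating the input keeps the transform safe to reuse across files.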
ExtractConfig
dataclass
Configuration for extracting source code functions.
Source code in src/codablellm/core/extractor.py
accurate_progress = True
class-attribute
instance-attribute
Whether to accurately track progress by counting extractable files in advance. This may take longer to start but provides more accurate progress tracking.
checkpoint = 10
class-attribute
instance-attribute
The number of steps between saving checkpoints. Set to 0 to disable checkpoints.
exclude_subpaths = field(default_factory=set)
class-attribute
instance-attribute
A set of subpaths to exclude from extraction. If specified, these subpaths will be ignored.
exclusive_subpaths = field(default_factory=set)
class-attribute
instance-attribute
A set of subpaths to exclusively extract functions from. If specified, only these subpaths will be extracted.
extract_as_repo = True
class-attribute
instance-attribute
True if the path should be treated as a repository root for calculating relative function scopes.
extractor_args = field(default_factory=dict)
class-attribute
instance-attribute
Positional arguments to pass to the extractor's __init__ method. The keys are language
names. The values are sequences of arguments. For example, {'C': [arg1, arg2]}.
extractor_kwargs = field(default_factory=dict)
class-attribute
instance-attribute
Keyword arguments to pass to the extractor's __init__ method. The keys are language names.
The values are dictionaries of keyword arguments. For example, {'C': {'kwarg1': value1}}.
max_workers = None
class-attribute
instance-attribute
Maximum number of files to extract functions from in parallel.
transform = None
class-attribute
instance-attribute
An optional transformation to apply to each source code function.
use_checkpoint = True
class-attribute
instance-attribute
True if a checkpoint file should be loaded and used to resume extraction.
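The two subpath options can be pictured with a small sketch. It assumes that exclusions take precedence and that subpaths are relative to the extraction root; the library's actual matching rules are not described here, and `SubpathFilter` and `is_selected` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Set


@dataclass
class SubpathFilter:
    """Hypothetical stand-in holding only the two subpath options."""
    # default_factory gives every instance its own set instead of a shared one.
    exclude_subpaths: Set[Path] = field(default_factory=set)
    exclusive_subpaths: Set[Path] = field(default_factory=set)


def is_selected(file: Path, root: Path, options: SubpathFilter) -> bool:
    """Assumed semantics: exclusions win; exclusive subpaths, if any, must match."""
    relative = file.relative_to(root)

    def under(subpath: Path) -> bool:
        # A file is under a subpath if it equals it or has it as a parent.
        return relative == subpath or subpath in relative.parents

    if any(under(p) for p in options.exclude_subpaths):
        return False
    if options.exclusive_subpaths:
        return any(under(p) for p in options.exclusive_subpaths)
    return True


root = Path("repo")
options = SubpathFilter(exclude_subpaths={Path("tests")}, exclusive_subpaths={Path("src")})
```

With these options, `repo/src/main.c` is selected, while files under `repo/tests` or outside `repo/src` are skipped.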
Extractor
Bases: ABC
Abstract base class for source code extractors.
Extractors are responsible for parsing source code files and returning extracted function
definitions as SourceFunction instances.
extract(file_path, repo_path=None)
abstractmethod
Extracts functions from the given source code file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `PathLike` | The path to the source file. | required |
| `repo_path` | `Optional[PathLike]` | Optional repository root path to calculate relative function scopes. | `None` |
Returns:
| Type | Description |
|---|---|
| `Sequence[SourceFunction]` | A sequence of `SourceFunction` instances. |
get_extractable_files(path)
abstractmethod
Retrieves all files that can be processed by the extractor from the given path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | A file or directory path to search for extractable files. | required |
Returns:
| Type | Description |
|---|---|
| `Set[Path]` | A set of `Path` objects. |
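To make the two abstract methods concrete, the following sketch implements a toy extractor for Python files. `Extractor` and `SourceFunction` here are local stand-ins that mirror only what this page documents; the fields of `SourceFunction` and the class name `PythonExtractor` are assumptions, not the library's definitions.

```python
import ast
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from os import PathLike
from pathlib import Path
from typing import Optional, Sequence, Set, Union

PathArg = Union[str, PathLike]


@dataclass(frozen=True)
class SourceFunction:
    """Hypothetical stand-in; the library's SourceFunction has its own fields."""
    name: str
    definition: str
    path: Path


class Extractor(ABC):
    """Stand-in exposing only the two abstract methods documented above."""

    @abstractmethod
    def extract(self, file_path: PathArg,
                repo_path: Optional[PathArg] = None) -> Sequence[SourceFunction]: ...

    @abstractmethod
    def get_extractable_files(self, path: PathArg) -> Set[Path]: ...


class PythonExtractor(Extractor):
    """Toy extractor that collects every `def` in a Python file."""

    def extract(self, file_path, repo_path=None):
        file_path = Path(file_path)
        source = file_path.read_text()
        # Report the path relative to the repository root when one is given.
        shown = file_path.relative_to(repo_path) if repo_path is not None else file_path
        return [
            SourceFunction(node.name, ast.get_source_segment(source, node) or "", shown)
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.FunctionDef)
        ]

    def get_extractable_files(self, path):
        path = Path(path)
        # A single file qualifies by suffix; a directory is searched recursively.
        if path.is_file():
            return {path} if path.suffix == ".py" else set()
        return set(path.rglob("*.py"))


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "pkg").mkdir()
    target = repo / "pkg" / "mod.py"
    target.write_text("def f():\n    return 1\n")
    (repo / "notes.txt").write_text("not code")

    extractor = PythonExtractor()
    found_is_target = extractor.get_extractable_files(repo) == {target}
    functions = extractor.extract(target, repo_path=repo)
```

Passing `repo_path` lets the extractor record paths relative to the repository root, which is one plausible reading of "relative function scopes".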
create_extractor(language, *args, **kwargs)
Creates an instance of the extractor registered for the specified language.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `language` | `str` | The name of the language for which to retrieve an extractor. | required |
| `*args` | `Any` | Positional arguments passed to the extractor's constructor. | `()` |
| `**kwargs` | `Any` | Keyword arguments passed to the extractor's constructor. | `{}` |
Returns:
| Type | Description |
|---|---|
| `Extractor` | An instance of the extractor class for the given language. |
Raises:
| Type | Description |
|---|---|
| `ExtractorNotFound` | If no extractor is registered for the specified language. |
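The lookup-then-instantiate behaviour described above can be sketched with a plain dictionary registry. This is an assumption-laden illustration: `_REGISTRY`, `CExtractor`, and the local `ExtractorNotFound` are stand-ins, and the library's real registry stores symbols rather than classes.

```python
from typing import Any, Dict, Type


class ExtractorNotFound(Exception):
    """Stand-in for the exception named in the table above."""


class CExtractor:
    """Hypothetical extractor that simply records its constructor arguments."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args = args
        self.kwargs = kwargs


# Assumption: a plain mapping from language name to extractor class.
_REGISTRY: Dict[str, Type[Any]] = {"C": CExtractor}


def create_extractor(language: str, *args: Any, **kwargs: Any) -> Any:
    """Look up the class for `language` and instantiate it with the given arguments."""
    try:
        extractor_class = _REGISTRY[language]
    except KeyError:
        raise ExtractorNotFound(f"no extractor registered for {language!r}") from None
    # Constructor arguments are forwarded unchanged.
    return extractor_class(*args, **kwargs)


extractor = create_extractor("C", "arg1", kwarg1="value1")
```

This mirrors how `extractor_args` and `extractor_kwargs` in `ExtractConfig` are keyed by language name and handed to the constructor.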
extract_directory_task(path, config=ExtractConfig())
Extracts source functions from the given path using the specified configuration.
If as_callable_pool is True, returns a deferred callable extractor that can be executed later,
typically used for progress bar display or asynchronous processing.
Parameters:
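| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | The file or directory path from which to extract functions. | required |
| `config` | `ExtractConfig` | Extraction configuration options. | `ExtractConfig()` |
| `as_callable_pool` | | If `True`, a deferred callable extractor is returned instead of the extracted functions. This parameter does not appear in the signature above. | required |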
Returns:
| Type | Description |
|---|---|
| `List[SourceFunction]` | Either a list of extracted `SourceFunction` instances or a deferred callable extractor. |
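How `max_workers` and `transform` might interact during extraction can be sketched as below. This is not the library's implementation: it assumes a thread pool, represents extracted functions as plain strings, and uses the hypothetical helpers `extract_all` and `fake_extract`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Sequence


def extract_all(
    files: Iterable[str],
    extract_file: Callable[[str], Sequence[str]],
    max_workers: Optional[int] = None,
    transform: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Extract from each file in a worker pool, then apply the optional transform."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields per-file results in the same order as `files`.
        for extracted in pool.map(extract_file, files):
            for function in extracted:
                results.append(transform(function) if transform is not None else function)
    return results


def fake_extract(file_name: str) -> List[str]:
    # Placeholder for a real extractor: pretends each file defines one function.
    return [f"{file_name}:main"]


functions = extract_all(["a.c", "b.c"], fake_extract, max_workers=2, transform=str.upper)
```

Leaving `max_workers` as `None` lets the executor pick its own default pool size.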
register(language, symbol, order=None)
Registers a new source code extractor for a given language.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `language` | `str` | The name of the language (e.g., "C", "Python") to associate with the extractor. | required |
| `class_path` | | The full import path to the extractor class. Named `symbol` in the signature above. | required |
| `order` | `Optional[Literal['first', 'last']]` | Optional order for insertion. If 'first', prepends the extractor; if 'last', appends it. | `None` |
set_registered(extractors)
Replaces all existing source code extractors with a new set.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `extractors` | `Mapping[str, DynamicSymbol]` | A mapping from language names to extractor class paths. | required |