This library contains utilities for parsing GitHub repositories into pairs of function definitions and docstrings. It uses tree-sitter to parse code into ASTs and applies heuristics to extract more detailed metadata. It currently supports six languages: Python, Java, Go, PHP, Ruby, and JavaScript. For Python, it also parses function calls and links them to their definitions.
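To make the idea of "function definition and docstring pairs" concrete, here is a minimal sketch for Python source only. It deliberately uses the standard-library `ast` module rather than tree-sitter, so it is not how this library works internally; the helper name `extract_function_docstring_pairs` is hypothetical and exists only for illustration.

```python
import ast


def extract_function_docstring_pairs(source):
    """Return (function_name, docstring) pairs for each function in `source`.

    Illustration only: uses Python's built-in `ast` module, not tree-sitter,
    and is not part of the function_parser API.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.get_docstring returns None when no docstring is present.
            pairs.append((node.name, ast.get_docstring(node) or ""))
    return pairs


example = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented():
    pass
'''

pairs = extract_function_docstring_pairs(example)
print(pairs)
```

The library applies the same idea across all supported languages by working on tree-sitter ASTs instead of a language-specific module.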
pip install function-parser
To use the library, you must first download and build the tree-sitter language grammars
used to parse source code. The library includes a CLI tool for setting this up.
To download and build grammars: build_grammars
This command downloads and builds the grammars in the directory where pip installed this Python library.
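If you want to confirm that the grammars were built, a small check along the following lines may help. It is a sketch under assumptions: the file name `tree-sitter-languages.so` is taken from the usage example in this README, and the helper `grammar_library_path` is hypothetical, not part of the library.

```python
import importlib.util
import os


def grammar_library_path(package_name="function_parser",
                         filename="tree-sitter-languages.so"):
    """Return the path to the built grammar library, or None if it is missing.

    Hypothetical helper: it assumes the grammars are compiled into `filename`
    inside the installed package directory.
    """
    spec = importlib.util.find_spec(package_name)
    # A missing package, or a plain module without a directory, has no location.
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    path = os.path.join(package_dir, filename)
    return path if os.path.isfile(path) else None


path = grammar_library_path()
if path is None:
    print("Grammar library not found; try running build_grammars.")
else:
    print("Grammar library found at", path)
```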
import function_parser
import os
import pandas as pd
from function_parser.language_data import LANGUAGE_METADATA
from function_parser.process import DataProcessor
from tree_sitter import Language
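language = "python"

# Point the shared tree-sitter parser at the grammar library built by build_grammars.
DataProcessor.PARSER.set_language(
    Language(os.path.join(function_parser.__path__[0], "tree-sitter-languages.so"), language)
)

# Create a processor that uses the parser registered for this language.
processor = DataProcessor(
    language=language, language_parser=LANGUAGE_METADATA[language]["language_parser"]
)

# Extract function definitions from a GitHub repository given as "owner/name".
dependee = "keras-team/keras"
definitions = processor.process_dee(dependee, ext=LANGUAGE_METADATA[language]["ext"])

# Preview the first few extracted definitions as a table.
pd.DataFrame(definitions).head()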