import ast
ast.parse("foo(x, y)")<ast.Module at 0x1063c3950>
I’m on the lookout for full-time Data Scientist (or similar) positions. If you have any leads or just want to chat, ! Here’s my resume if you’re interested in checking it out. Thanks a bunch! 😊
Arumoy Shome
March 23, 2024
ast modulePython has a built-in ast module which provides detailed documentation on the various nodes used to represent different elements of Python source code.
We can use the ast.parse method to create a AST from a given Python source code. Here is an example of how a function call is represented in a AST.
The ast.parse method returns a ast.Module object which by itself is not very helpful. To view the internal structure of the tree, we can use the ast.dump method.
"Module(body=[Expr(value=Call(func=Name(id='foo', ctx=Load()), args=[Name(id='x', ctx=Load()), Name(id='y', ctx=Load())]))])"
We can pass the indent argument to ast.dump along with a print statement to make the output more readable.
Module(
body=[
Expr(
value=Call(
func=Name(id='foo', ctx=Load()),
args=[
Name(id='x', ctx=Load()),
Name(id='y', ctx=Load())]))])
Note that the function call is represented by a ast.Call node which contains a func and args attribute. The function name (in our case, foo) is represented by a ast.Name node, with the actual name under the id attribute.
And here is the AST when we want to use a function defined in a different module.
Module(
body=[
Expr(
value=Call(
func=Attribute(
value=Name(id='bar', ctx=Load()),
attr='foo',
ctx=Load()),
args=[
Name(id='x', ctx=Load()),
Name(id='y', ctx=Load())]))])
Things are a bit different now. We see that the Call.func is no longer a ast.Name node, but instead an ast.Attribute node. Attribute.value is now a ast.Name node with the name of the module (in our case bar) on the id attribute and the name of the function on the attr attribute.
Lets examine something a bit more complicated: What if the function is in a submodule?
Module(
body=[
Expr(
value=Call(
func=Attribute(
value=Attribute(
value=Name(id='baz', ctx=Load()),
attr='bar',
ctx=Load()),
attr='foo',
ctx=Load()),
args=[
Name(id='x', ctx=Load()),
Name(id='y', ctx=Load())]))])
And even more nested?
Module(
body=[
Expr(
value=Call(
func=Attribute(
value=Attribute(
value=Attribute(
value=Name(id='quack', ctx=Load()),
attr='baz',
ctx=Load()),
attr='bar',
ctx=Load()),
attr='foo',
ctx=Load()),
args=[
Name(id='x', ctx=Load()),
Name(id='y', ctx=Load())]))])
It seems that nested function calls are represented using nested ast.Attribute nodes. The top level module name is always a ast.Name node under the deepest ast.Attribute.value node. And the function name is always under the first ast.Attribute.attr node
Lets start with the simplest case, where we are only interested in the function names (ie. we only want to extract foo from all scenarios presented above). There are two cases to consider here:
Call.func node will contain a Name node.Call.func will contain nested Attribute nodes. The name of the function will be under the first Attribute.attr.We can do this using the ast.NodeVisitor class. Lets create a FunctionNameCollector class which inherits from ast.NodeVisitor. In the class, we define a visit_Name and visit_Attribute methods which are called every time we visit a Name or Attribute method respectively (more on this later).
class FunctionNameCollector(ast.NodeVisitor):
def __init__(self):
self.names = []
def visit_Name(self, node: ast.Name) -> None:
self.names.append(node.id)
def visit_Attribute(self, node: ast.Attribute) -> None:
self.names.append(node.attr)
tests = ["foo(x, y)", "bar.foo(x, y)", "baz.bar.foo(x, y)", "quack.baz.bar.foo(x, y)"]
trees = [ast.parse(test) for test in tests]
call_nodes = [
node for tree in trees for node in ast.walk(tree) if isinstance(node, ast.Call)
]
collector = FunctionNameCollector()
for node in call_nodes:
collector.visit(node.func)
collector.names['foo', 'foo', 'foo', 'foo']
I collect all the ast.Call nodes in our test cases using the ast.walk function which returns a generator that yields every child node under the given AST. Then I call the visit method provided by ast.NodeVisitor which visits only the direct child nodes of all Call.func nodes in our test cases.
Here is where things get a bit more interesting. Here are the cases to consider:
Name.id under Call.func (same as before).Attribute.attr andAttribute.value.id.So the visit_Name method remains the same however, we do need to modify the visit_Attribute method such that it traverses all child nodes under Attribute.value until we hit the base case. Here is the modified code.
class NameCollector(ast.NodeVisitor):
def __init__(self):
self.names = []
self.stack = []
def visit_Name(self, node: ast.Name) -> None:
if self.stack:
self.names.append((node.id, self.stack[0].attr))
else:
self.names.append((None, node.id))
def visit_Attribute(self, node: ast.Attribute) -> None:
self.stack.append(node)
self.visit(node.value)
collector = NameCollector()
for node in call_nodes:
collector.visit(node.func)
collector.names[(None, 'foo'), ('bar', 'foo'), ('baz', 'foo'), ('quack', 'foo')]
The code is similar to FunctionNameCollector defined above, with a few key changes. The collector.names now returns a list of tuples containing the module, function names.
I use a stack to keep track of the Attribute nodes we visit. Whenever we visit an Attribute node, I append it to the stack and then call the NodeVisitor.visit method on the Node under its value attribute.
When we visit a Name node, it can either be because we are at the base case (a direct function call) or because we have reached the last Attribute.value node. If the stack is not empty, then its the latter which means the Name node is the module name and the function name is under the first Attribute.attr in the stack. Otherwise, its a direct function call so the Name node is the function name. Since we don’t have a module in this case, we return None in the tuple.