Monday, July 2, 2012

Disassembling Python One Line at a Time

Recently, I wanted to inspect Python bytecodes during the course of execution. My (admittedly non-exhaustive) search yielded no good pointers, so I figured I could do this using some crazy combination of Python tracing, inspection, and disassembly. This turned out to be true, though I had to futz around with some of the code in the standard library's dis module.

If you take a look at Python's dis module, you'll find that it does lots of really neat things. However, if you just want to disassemble a single line of code (for example, the line about to be executed) and analyze it, then you're a little SOL. Looking at the module's source, though, it's actually pretty simple to do yourself. Below, I've got a modified version of disassemble that takes a line number as a parameter and prints only the bytecodes associated with that line. Of course, this is heavily cribbed from the CPython source code, so nearly all of the credit rests with the hackers over there.

import inspect, dis, opcode

def foo(arg):
    result = arg + 1
    return result

def __find(l, func):
    for ix, v in enumerate(l):
        if func(v):
            return ix
    return -1

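def disassemble_line(co, lineno):
    """Print the bytecodes of code object co that belong to source line lineno."""
    code = co.co_code
    labels = dis.findlabels(code)
    # (bytecode offset, line number) pairs, in offset order
    linestarts = list(dis.findlinestarts(co))
    line_offset_ix = __find(linestarts, lambda val : val[1] == lineno)
    if line_offset_ix < 0:
        # Without this check, -1 would silently select the last line.
        raise ValueError("no bytecode starts on line %d" % lineno)
    line_offset = linestarts[line_offset_ix][0]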
    n = len(code)

    if line_offset_ix + 1 < len(linestarts):
        next_offset = linestarts[line_offset_ix + 1][0]
    else:
        next_offset = n

    i = line_offset
    extended_arg = 0
    free = None
    while i < next_offset:
        c = code[i]
        op = ord(c)
        if i in labels: print '>>>',
        else: print '   ',
        print repr(i).rjust(4),
        print opcode.opname[op].ljust(20),
        
        i = i + 1
        if op >= opcode.HAVE_ARGUMENT:
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i + 2
            if op == opcode.EXTENDED_ARG:
                extended_arg = oparg*65536L
            print repr(oparg).rjust(5),
            if op in opcode.hasconst:
                print '(' + repr(co.co_consts[oparg]) + ')',
            elif op in opcode.hasname:
                print '(' + co.co_names[oparg] + ')',
            elif op in opcode.hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in opcode.haslocal:
                print '(' + co.co_varnames[oparg] + ')',
            elif op in opcode.hascompare:
                print '(' + opcode.cmp_op[oparg] + ')',
            elif op in opcode.hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                print '(' + free[oparg] + ')',
        print

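if __name__ == "__main__":
    # Disassemble the first body line of foo ("result = arg + 1"). That is
    # line 4 when this listing is saved as-is; computing it from the code
    # object keeps the call correct if the file layout changes.
    disassemble_line(foo.func_code, foo.func_code.co_firstlineno + 1)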
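
The listing above is Python 2 (print statements, func_code, ord on a byte string). For comparison, here is a rough sketch of the same idea on Python 3, hooked up to tracing so that each line's bytecodes are collected just before the line runs. It assumes Python 3.4 or later, where dis.get_instructions is available. The names instructions_for_line and trace_bytecode are made up for this example and are not part of the dis module, and this is one possible way to wire it up rather than the approach the original code takes.

```python
import dis
import sys


def foo(arg):
    result = arg + 1
    return result


def instructions_for_line(co, lineno):
    """Return the dis.Instruction objects attributed to source line lineno."""
    starts = list(dis.findlinestarts(co))  # (bytecode offset, line number) pairs
    end = len(co.co_code)
    # Collect every [start, stop) offset range that belongs to this line.
    # Newer interpreters may attribute more than one range to a single line.
    ranges = []
    for ix, (offset, line) in enumerate(starts):
        if line == lineno:
            stop = starts[ix + 1][0] if ix + 1 < len(starts) else end
            ranges.append((offset, stop))
    return [ins for ins in dis.get_instructions(co)
            if any(lo <= ins.offset < hi for lo, hi in ranges)]


def trace_bytecode(func, *args):
    """Call func(*args); return (result, [(lineno, [opname, ...]), ...])."""
    seen = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # leave frames of other functions untraced
        if event == "line":
            ops = [ins.opname
                   for ins in instructions_for_line(frame.f_code, frame.f_lineno)]
            seen.append((frame.f_lineno, ops))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was installed before
    return result, seen


if __name__ == "__main__":
    value, trace = trace_bytecode(foo, 41)
    for lineno, ops in trace:
        print(lineno, ops)
```

The exact opcode names printed vary between CPython releases, so treat the output as something to explore rather than a fixed format. Asking for a line that has no bytecode returns an empty list here instead of raising an error.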