Bytecode API

Constants

__version__

Module version string (ex: '0.1').

UNSET

Singleton used to mark the lack of value. It is different than None.

Functions

format_bytecode(bytecode, \*, lineno: bool = False) str:

Format a bytecode to a str representation. ConcreteBytecode, Bytecode and ControlFlowGraph are accepted for bytecode.

If lineno is true, show also line numbers and instruction index/offset.

This function is written for debug purpose.

dump_bytecode(bytecode, \*, lineno=False)

Dump a bytecode to the standard output. ConcreteBytecode, Bytecode and ControlFlowGraph are accepted for bytecode.

If lineno is true, show also line numbers and instruction index/offset.

This function is written for debug purpose.

Instruction classes

Instr

class Instr(name: str, arg=UNSET, \*, lineno: Union[int, None, UNSET] = UNSET, location: Optional[InstrLocation] = None)

Abstract instruction.

The type of the arg parameter (and the arg attribute) depends on the operation:

  • If the operation has a jump argument (has_jump(), ex: JUMP_ABSOLUTE): arg must be a Label (if the instruction is used in Bytecode) or a BasicBlock (used in ControlFlowGraph).

  • If the operation has a cell or free argument (ex: LOAD_DEREF): arg must be a CellVar or FreeVar instance.

  • If the operation has a local variable (ex: LOAD_FAST): arg must be a variable name, type str.

  • If the operation has a constant argument (LOAD_CONST): arg must not be a Label or BasicBlock instance.

  • If the operation has a compare argument (COMPARE_OP): arg must be a Compare enum.

  • If the operation has no argument (ex: DUP_TOP), arg must not be set.

  • Otherwise (the operation has an argument, ex: CALL_FUNCTION), arg must be an integer (int) in the range 0..2,147,483,647.

To replace the operation name and the argument, the set() method must be used instead of modifying the name attribute and then the arg attribute. Otherwise, an exception is raised if the previous operation requires an argument and the new operation has no argument (or the opposite).

Attributes:

arg

Argument value.

It can be UNSET if the instruction has no argument.

lineno

Line number (int >= 1), or None.

name

Operation name (str). Setting the name updates the opcode attribute.

opcode

Operation code (int). Setting the operation code updates the name attribute.

Changed in version 0.3: The op attribute was renamed to opcode.

location

Detailed location (InstrLocation)

Methods:

require_arg() bool

Does the instruction require an argument?

copy()

Create a copy of the instruction.

is_final() bool

Is the operation a final operation?

Final operations:

  • RETURN_VALUE

  • RAISE_VARARGS

  • BREAK_LOOP

  • CONTINUE_LOOP

  • unconditional jumps: is_uncond_jump()

has_jump() bool

Does the operation have a jump argument?

More general than is_cond_jump() and is_uncond_jump(), it includes other operations. Examples:

  • FOR_ITER

  • SETUP_EXCEPT

  • CONTINUE_LOOP

is_cond_jump() bool

Is the operation a conditional jump?

Conditional jumps:

  • JUMP_IF_FALSE_OR_POP

  • JUMP_IF_TRUE_OR_POP

  • JUMP_FORWARD_IF_FALSE_OR_POP

  • JUMP_BACKWARD_IF_FALSE_OR_POP

  • JUMP_FORWARD_IF_TRUE_OR_POP

  • JUMP_BACKWARD_IF_TRUE_OR_POP

  • POP_JUMP_IF_FALSE

  • POP_JUMP_IF_TRUE

is_uncond_jump() bool

Is the operation an unconditional jump?

Unconditional jumps:

  • JUMP_FORWARD

  • JUMP_ABSOLUTE

  • JUMP_BACKWARD

  • JUMP_BACKWARD_NO_INTERRUPT

is_abs_jump() bool

Is the operation an absolute jump?

is_forward_rel_jump() bool

Is the operation a forward relative jump?

is_backward_rel_jump() bool

Is the operation a backward relative jump?

set(name: str, arg=UNSET)

Modify the instruction in-place: replace name and arg attributes, and update the opcode attribute.

Changed in version 0.3: The lineno parameter has been removed.

stack_effect(jump: bool = None) int

Operation effect on the stack size as computed by dis.stack_effect().

The jump argument takes one of three values. None (the default) requests the largest stack effect. This works fine with most instructions. True returns the stack effect for taken branches. False returns the stack effect for non-taken branches.

Changed in version 0.8: stack_effect was changed from a property to a method in order to add the keyword argument jump.

pre_and_post_stack_effect(jump: bool | None = None) Tuple[int, int]

Effect of the instruction on the stack before and after its execution.

The impact on the stack before the instruction reflects how many values from the stacks are used/popped. The impact on the stack after the instruction execution reflects how many values are pushed back on the stack. Those are deduced from dis.stack_effect() and manual analysis.

The jump argument has the same meaning as in Instr.stack_effect().

New in version 0.12.

ConcreteInstr

class ConcreteInstr(name: str, arg=UNSET, \*, lineno: int=None)

Concrete instruction Inherit from Instr.

If the operation requires an argument, arg must be an integer (int) in the range 0..2,147,483,647. Otherwise, arg must not by set.

Concrete instructions should only be used in ConcreteBytecode.

Attributes:

arg

Argument value: an integer (int) in the range 0..2,147,483,647, or UNSET. Setting the argument value can change the instruction size (size).

size

Read-only size of the instruction in bytes (int): between 1 byte (no argument) and 6 bytes (extended argument).

Static method:

static disassemble(code: bytes, offset: int) ConcreteInstr

Create a concrete instruction from a bytecode string.

Methods:

get_jump_target(instr_offset: int) int or None

Get the absolute target offset of a jump. Return None if the instruction is not a jump.

The instr_offset parameter is the offset of the instruction. It is required by relative jumps.

Note

Starting with Python 3.10, this quantity is expressed in term of instruction offset rather than byte offset, and is hence twice smaller than in 3.9 for identical code.

assemble() bytes

Assemble the instruction to a bytecode string.

use_cache_opcodes() int

Number of cache opcodes that should follow the instruction.

Compare

class Compare

Enum for the argument of the COMPARE_OP instruction.

Equality test:

  • Compare.EQ (2): x == y

  • Compare.NE (3): x != y

  • Compare.IS (8): x is y removed in Python 3.9+

  • Compare.IS_NOT (9): x is not y removed in Python 3.9+

Inequality test:

  • Compare.LT (0): x < y

  • Compare.LE (1): x <= y

  • Compare.GT (4): x > y

  • Compare.GE (5): x >= y

Other tests:

  • Compare.IN (6): x in y removed in Python 3.9+

  • Compare.NOT_IN (7): x not in y removed in Python 3.9+

  • Compare.EXC_MATCH (10): used to compare exceptions in except: blocks. Removed in Python 3.9+

Binary operation

class BinaryOp

Enum for the argument of the BINARY_OP instruction (3.11+).

Arithmetic operations

BinaryOp.ADD (0): x + y BinaryOp.SUBTRACT (10): x - y BinaryOp.MULTIPLY (5): x * y BinaryOp.TRUE_DIVIDE (11): x / y BinaryOp.FLOOR_DIVIDE (2): x // y BinaryOp.REMAINDER (6): x % y BinaryOp.MATRIX_MULTIPLY (4): x @ y BinaryOp.POWER (8): x ** y

Logical and binary operations

BinaryOp.LSHIFT (3): x << y BinaryOp.RSHIFT (9): x >> y BinaryOp.AND (1): x & y BinaryOp.OR (7): x | y BinaryOp.XOR (12): x ^ y

Inplace operations:

BinaryOp.INPLACE_ADD (13): x += y BinaryOp.INPLACE_SUBTRACT (23): x -= y BinaryOp.INPLACE_MULTIPLY (18): x *= y BinaryOp.INPLACE_TRUE_DIVIDE (24): x /= y BinaryOp.INPLACE_FLOOR_DIVIDE (15): x //= y BinaryOp.INPLACE_REMAINDER (19): x %= y BinaryOp.INPLACE_MATRIX_MULTIPLY (17): x @= y BinaryOp.INPLACE_POWER (21): x **= y BinaryOp.INPLACE_LSHIFT (16): x <<= y BinaryOp.INPLACE_RSHIFT (22): x >>= y BinaryOp.INPLACE_AND (14): x &= y BinaryOp.INPLACE_OR (20): x |= y BinaryOp.INPLACE_XOR (25): x ^= y

Intrinsic operations

class Intrinsic1Op

Enum for the argument of the CALL_INTRINSIC_1 instruction (3.12+).

INTRINSIC_1_INVALID INTRINSIC_PRINT INTRINSIC_IMPORT_STAR INTRINSIC_STOPITERATION_ERROR INTRINSIC_ASYNC_GEN_WRAP INTRINSIC_UNARY_POSITIVE INTRINSIC_LIST_TO_TUPLE INTRINSIC_TYPEVAR INTRINSIC_PARAMSPEC INTRINSIC_TYPEVARTUPLE INTRINSIC_SUBSCRIPT_GENERIC INTRINSIC_TYPEALIAS

class Intrinsic2Op

Enum for the argument of the CALL_INTRINSIC_2 instruction (3.12+).

INTRINSIC_2_INVALID INTRINSIC_PREP_RERAISE_STAR INTRINSIC_TYPEVAR_WITH_BOUND INTRINSIC_TYPEVAR_WITH_CONSTRAINTS INTRINSIC_SET_FUNCTION_TYPE_PARAMS

CellVar and FreeVar

The following classes are used to represent the argument of opcode listed in opcode.hasfree which includes:

  • MAKE_CELL

  • LOAD_CLOSURE

  • LOAD_DEREF

  • STORE_DEREF

  • DELETE_DEREF

  • LOAD_CLASSDEREF

  • LOAD_FROM_DICT_OR_DEREF

class CellVar

Argument of an opcode referring to a variable held in a cell.

Cell variables cannot always be inferred only from the instructions (__class__ used by super() is implicit) and as a consequence cellvars are explicitly listed on all bytecode objects.

Attributes:

name

Name of the cell variable (str).

class FreeVar

Argument of opcode referring to a free variable.

Free variables cannot always be inferred only from the instructions (__class__ used by super() is implicit) and as a consequence freevars are explicitly listed on all bytecode objects.

Attributes:

name

Name of the free variable (str).

Label

class Label

Pseudo-instruction used as targets of jump instructions.

Label targets are “resolved” by Bytecode.to_concrete_bytecode.

Labels must only be used in Bytecode.

SetLineno

class SetLineno(lineno: int)

Pseudo-instruction to set the line number of following instructions.

lineno must be greater or equal than 1.

lineno

Line number (int), read-only attribute.

InstrLocation

class InstrLocation(lineno: int | None, end_lineno: int | None, col_offset: int | None, end_col_offset: int | None)

Detailed location for an instruction.

lineno

Line number on which the instruction starts.

end_lineno

Line number on which the instruction ends.

col_offset

Column offset within the start line at which the instruction starts.

end_col_offset

Column offset within the end line at which the instruction starts.

classmethod from_positions(cls, position: dis.Positions) InstrLocation

Build an InstrLocation from a dis.Position object.

TryBegin

class TryBegin(target: Label | BasicBlock, push_lasti: bool, stack_depth: int | UNSET = UNSET)

Pseudo instruction marking the beginning of an exception table entry.

TryBegin can never be nested.

Used in Python 3.11+ in Bytecode and BasicBlock.

target

Target Label or BasicBlock to which to jump to if an exception occurs on an instruction sitting between this TryBegin and the matching TryEnd.

push_lasti

Is the instruction offset at which an exception occurred pushed on the stack before the exception itself when handling an exception.

stack_depth

Stack depth that will be restored by the interpreter by popping from the stack when handling an exception, before pushing the exception possibly preceded by the instruction offset depending on TryBegin.push_lasti.

copy() TryBegin

Create a copy of the TryBegin.

TryEnd

class TryEnd(entry: TryBegin)

Pseudo instruction marking the end of an exception table entry.

Note

In a BasicBlock, one may find a TryEnd instance after a final instruction. This results from the exception enclosing the final instruction. Since TryEnd is only a pseudo-instruction this does not violate the guarantee made by a BasicBlock which only applies to instructions.

Note

A jump may cause to exit an exception table entry. If the jump is unconditional the instruction is final and the above applies. For conditional jumps, within a ControlFlowGraph, we insert a TryEnd at the beginning of the target block to explicitly signal that we left the exception table entry region. As a consequence, multiple TryExit corresponding to a single TryBegin can exist. TryEnd corresponding to exiting an exception table entry through a conditional jump always appear before the first instruction of the target block. However, care needs to be taken since that block may be reached through a different path in which no TryBegin was encountered. In such cases, the TryEnd should be ignored.

Bytecode classes

BaseBytecode

class BaseBytecode

Base class of bytecode classes.

Attributes:

argcount

Argument count (int), default: 0.

cellvars

Names of the cell variables (list of str), default: empty list.

docstring

Documentation string aka “docstring” (str), None, or UNSET. Default: UNSET.

If set, it is used by ConcreteBytecode.to_code() as the first constant of the created Python code object.

filename

Code filename (str), default: '<string>'.

first_lineno

First line number (int), default: 1.

flags

Flags (int).

freevars

List of free variable names (list of str), default: empty list.

posonlyargcount

Positional-only argument count (int), default: 0.

New in Python 3.8

kwonlyargcount

Keyword-only argument count (int), default: 0.

name

Code name (str), default: '<module>'.

qualname

Qualified code name (str).

New in Python 3.11

Changed in version 0.3: Attribute kw_only_argcount renamed to kwonlyargcount.

Bytecode

class Bytecode

Abstract bytecode: list of abstract instructions (Instr). Inherit from BaseBytecode and list.

A bytecode must only contain objects of the 4 following types:

Changed in version 0.14.0: It is not possible anymore to use concrete instructions (ConcreteInstr) in Bytecode.

Attributes:

argnames

List of the argument names (list of str), default: empty list.

Static methods:

static from_code(code) Bytecode

Create an abstract bytecode from a Python code object.

Methods:

legalize()

Check the validity of all the instruction and remove the SetLineno instances after updating the instructions.

to_concrete_bytecode(compute_jumps_passes: int = None, compute_exception_stack_depths: bool = True) ConcreteBytecode

Convert to concrete bytecode with concrete instructions.

Resolve jump targets: replace abstract labels (Label) with concrete instruction offsets (relative or absolute, depending on the jump operation). It will also add EXTENDED_ARG prefixes to jump instructions to ensure that the target instructions can be reached.

If compute_jumps_passes is not None, it sets the upper limit for the number of passes that can be made to generate EXTENDED_ARG prefixes for jump instructions. If None then an internal default is used. The number of passes is, in theory, limited only by the number of input instructions, however a much smaller default is used because the algorithm converges quickly on most code. For example, running CPython 3.6.5 unittests on OS X 11.13 results in 264996 compiled methods, only one of which requires 5 passes, and none requiring more.

If compute_exception_stack_depths is True, the stack depth for each exception table entry will be computed (which requires to convert the the bytecode to a ControlFlowGraph)

to_code(compute_jumps_passes: int = None, stacksize: int = None, *, check_pre_and_post: bool = True, compute_exception_stack_depths: bool = True) types.CodeType

Convert to a Python code object.

It is based on to_concrete_bytecode() and so resolve jump targets.

compute_jumps_passes: see to_concrete_bytecode()

stacksize: see ConcreteBytecode.to_code()

check_pre_and_post: see ConcreteBytecode.to_code()

compute_exception_stack_depths: see to_concrete_bytecode()

compute_stacksize(*, check_pre_and_post: bool = True) int

Compute the stacksize needed to execute the code. Will raise an exception if the bytecode is invalid.

This computation requires to build the control flow graph associated with the code.

check_pre_and_post Allows caller to disable checking for stack underflow

update_flags(is_async: bool = None) None

Update the object flags by calling :py:func:infer_flags on itself.

ConcreteBytecode

class ExceptionTableEntry

Entry for a given line in the exception table.

All offsets are expressed in instructions not in bytes.

Attributes:

start_offset

Offset (int) in instruction between the beginning of the bytecode and the beginning of this entry.

stop_offset

Offset (int) in instruction between the beginning of the bytecode and the end of this entry. This offset is inclusive meaning that the instruction it points to is included in the try/except handling.

target

Offset (int) in instruction to the first instruction of the exception handling block.

stack_depth

Minimal stack depth (int) in the block delineated by start and stop offset of the exception table entry. Used to restore the stack (by popping items) when entering the exception handling block.

push_lasti

bool indicating if the offset, at which an exception was raised, should be pushed on the stack before the exception itself (which is pushed as a single value).

class ConcreteBytecode

List of concrete instructions (ConcreteInstr). Inherit from BaseBytecode.

A concrete bytecode must only contain objects of the 2 following types:

Label, TryBegin, TryEnd and Instr must not be used in concrete bytecode.

Attributes:

consts

List of constants (list), default: empty list.

names

List of names (list of str), default: empty list.

varnames

List of variable names (list of str), default: empty list.

exception_table

List of ExceptionTableEntry describing portion of the bytecode in which exceptions are caught and where there are handled. Used only in Python 3.11+

Static methods:

static from_code(code, \*, extended_arg=false) ConcreteBytecode

Create a concrete bytecode from a Python code object.

If extended_arg is true, create EXTENDED_ARG instructions. Otherwise, concrete instruction use extended argument (size of 6 bytes rather than 3 bytes).

Methods:

legalize()

Check the validity of all the instruction and remove the SetLineno instances after updating the instructions.

to_code(stacksize: int = None, *, check_pre_and_post: bool = True, compute_exception_stack_depths: bool = True) types.CodeType

Convert to a Python code object.

stacksize Allows caller to explicitly specify a stacksize. If not specified a ControlFlowGraph is created internally in order to call ControlFlowGraph.compute_stacksize(). It’s cheaper to pass a value if the value is known.

check_pre_and_post Allows caller to disable checking for stack underflow

If compute_exception_stack_depths is True, the stack depth for each exception table entry will be computed (which requires to convert the the bytecode to a ControlFlowGraph)

to_bytecode() Bytecode

Convert to abstract bytecode with abstract instructions.

compute_stacksize(*, check_pre_and_post: bool = True) int

Compute the stacksize needed to execute the code. Will raise an exception if the bytecode is invalid.

This computation requires to build the control flow graph associated with the code.

check_pre_and_post Allows caller to disable checking for stack underflow

update_flags(is_async: bool = None)

Update the object flags by calling :py:func:infer_flags on itself.

BasicBlock

class BasicBlock

Basic block. Inherit from list.

A basic block is a straight-line code sequence of abstract instructions (Instr) with no branches in except to the entry and no branches out except at the exit.

A block must only contain objects of the 4 following types:

Changed in version 0.14.0: It is not possible anymore to use concrete instructions (ConcreteInstr) in BasicBlock.

Only the last instruction can have a jump argument, and the jump argument must be a basic block (BasicBlock).

Labels (Label) must not be used in blocks.

Attributes:

next_block

Next basic block (BasicBlock), or None.

Methods:

legalize(first_lineno: int) None

Check the validity of all the instruction and remove the SetLineno instances after updating the instructions. first_lineno specifies the line number to use for instruction without a set line number encountered before the first SetLineno instance.

get_jump() --> BasicBlock | None

Get the target block (BasicBlock) of the jump if the basic block ends with an instruction with a jump argument. Otherwise, return None.

get_trailing_end(index: int) TryEnd | None

Get the first TryEnd found after the position index in the block if any.

ControlFlowGraph

class ControlFlowGraph

Control flow graph (CFG): list of basic blocks (BasicBlock). A basic block is a straight-line code sequence of abstract instructions (Instr) with no branches in except to the entry and no branches out except at the exit. Inherit from BaseBytecode.

Labels (Label) must not be used in blocks.

This class is not designed to emit code, but to analyze and modify existing code. Use Bytecode to emit code.

Attributes:

argnames

List of the argument names (list of str), default: empty list.

Methods:

static from_bytecode(bytecode: Bytecode) ControlFlowGraph

Convert a Bytecode object to a ControlFlowGraph object: convert labels to blocks.

Splits blocks after final instructions (Instr.is_final()) and after conditional jumps (Instr.is_cond_jump()).

legalize(first_lineno: int)

Legalize all the blocks of the CFG.

add_block(instructions=None) BasicBlock

Add a new basic block. Return the newly created basic block.

get_block_index(block: BasicBlock) int

Get the index of a block in the bytecode.

Raise a ValueError if the block is not part of the bytecode.

New in version 0.3.

split_block(block: BasicBlock, index: int) BasicBlock

Split a block into two blocks at the specific instruction. Return the newly created block, or block if index equals 0.

get_dead_blocks() List[BasicBlock]

Retrieve all the blocks of the CFG that are unreachable.

to_bytecode() Bytecode

Convert to a bytecode object using labels.

compute_stacksize(*, check_pre_and_post: bool = True, compute_exception_stack_depths: bool = True) int

Compute the stack size required by a bytecode object. Will raise an exception if the bytecode is invalid.

check_pre_and_post Allows caller to disable checking for stack underflow

compute_exception_stack_depths Allows caller to disable the computation of the stack depth required by exception table entries.

NOTE:

The computation will only consider block that can be reached from the entry block. In particular, stack size for TryBegin/TryEnd in dead blocks is not updated.

In some cases, stack usage may be slightly overestimated compared to CPython. This occurs when CPython duplicated the code for a finally clause but computed stack size before the duplication in which case one could infer a smaller stack usage for a TryBegin/TryEnd pair than can be done with the final bytecode form.

update_flags(is_async: bool = None)

Update the object flags by calling :py:func:infer_flags on itself.

to_code(stacksize: int = None, *, check_pre_and_post: bool = True, compute_exception_stack_depths: bool = True)

Convert to a Python code object. Refer to descriptions of Bytecode.to_code() and ConcreteBytecode.to_code().

check_pre_and_post Allows caller to disable checking for stack underflow

compute_exception_stack_depths Allows caller to disable the computation of the stack depth required by exception table entries.

Line Numbers

The line number can set directly on an instruction using the lineno parameter of the constructor. Otherwise, the line number if inherited from the previous instruction, starting at first_lineno of the bytecode.

SetLineno pseudo-instruction can be used to set the line number of following instructions.

Starting with Python 3.11, instructions now have a starting lineno, and end lineno along with a starting column offset and an end column offset. InstrLocation is used to store these new detailed information.

Compiler Flags

class CompilerFlags
OPTIMIZED

Set if a code object only uses fast locals

NEWLOCALS

Set if the code execution should be done with a new local scope

VARARGS

Set if a code object expects variable number of positional arguments

VARKEYWORDS

Set if a code object expects variable number of keyword arguments

NESTED

Set if a code object correspond to function defined in another function

GENERATOR

Set if a code object is a generator (contains yield instructions)

NOFREE

Set if a code object does not use free variables

COROUTINE

Set if a code object is a coroutine. New in Python 3.5

ITERABLE_COROUTINE

Set if a code object is an iterable coroutine. New in Python 3.5

ASYNC_GENERATOR

Set if a code object is an asynchronous generator. New in Python 3.6

FUTURE_GENERATOR_STOP

Set if a code object is defined in a context in which generator_stop has been imported from __future__

infer_flags(bytecode, async: bool = None) CompilerFlags

Infer the correct values for the compiler flags for a given bytecode based on the instructions. The flags that can be inferred are :

  • OPTIMIZED

  • GENERATOR

  • NOFREE

  • COROUTINE

  • ASYNC_GENERATOR

Force the code to be marked as asynchronous if True, prevent it from being marked as asynchronous if False and simply infer the best solution based on the opcode and the existing flag if None.