Bytecode API

Constants

__version__

Module version string (ex: '0.1').

UNSET

Singleton used to mark the absence of a value. It is different from None.

Functions

dump_bytecode(bytecode, *, lineno=False)

Dump a bytecode to the standard output. ConcreteBytecode, Bytecode and ControlFlowGraph are accepted for bytecode.

If lineno is true, also show line numbers and instruction indexes/offsets.

This function is intended for debugging purposes.

Instruction classes

Instr

class Instr(name: str, arg=UNSET, *, lineno: int=None)

Abstract instruction.

The type of the arg parameter (and the arg attribute) depends on the operation:

  • If the operation has a jump argument (has_jump(), ex: JUMP_ABSOLUTE): arg must be a Label (if the instruction is used in Bytecode) or a BasicBlock (used in ControlFlowGraph).
  • If the operation has a cell or free argument (ex: LOAD_DEREF): arg must be a CellVar or FreeVar instance.
  • If the operation has a local variable (ex: LOAD_FAST): arg must be a variable name, type str.
  • If the operation has a constant argument (LOAD_CONST): arg must not be a Label or BasicBlock instance.
  • If the operation has a compare argument (COMPARE_OP): arg must be a Compare enum.
  • If the operation has no argument (ex: DUP_TOP), arg must not be set.
  • Otherwise (the operation has an argument, ex: CALL_FUNCTION), arg must be an integer (int) in the range 0..2,147,483,647.
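
The argument kinds above can be approximated with the opcode tables of the standard dis module. This is a sketch, not the library's implementation: arg_kind() is a hypothetical helper, and the usage sticks to opcodes present across CPython 3 versions, because some of the examples above (JUMP_ABSOLUTE, DUP_TOP, CALL_FUNCTION) were removed in CPython 3.11.

```python
import dis


def _ops(table_name):
    """Opcodes of a dis table, or an empty set if this CPython lacks it."""
    return set(getattr(dis, table_name, ()))


JUMP_OPS = _ops("hasjrel") | _ops("hasjabs") | _ops("hasjump")


def _takes_arg(op):
    # dis.hasarg exists on Python 3.12+; older versions use HAVE_ARGUMENT.
    if hasattr(dis, "hasarg"):
        return op in dis.hasarg
    return op >= dis.HAVE_ARGUMENT


def arg_kind(name):
    """Map an operation name to the kind of arg an Instr would expect."""
    op = dis.opmap[name]
    if op in JUMP_OPS:
        return "jump"       # Label (Bytecode) or BasicBlock (ControlFlowGraph)
    if op in _ops("hasfree"):
        return "cell/free"  # CellVar or FreeVar
    if op in _ops("haslocal"):
        return "local"      # variable name (str)
    if op in _ops("hasconst"):
        return "const"      # any object except Label or BasicBlock
    if op in _ops("hascompare"):
        return "compare"    # Compare enum member
    if not _takes_arg(op):
        return "none"       # arg must stay UNSET
    return "int"            # integer in 0..2,147,483,647


for name in ("JUMP_FORWARD", "LOAD_FAST", "LOAD_CONST",
             "COMPARE_OP", "POP_TOP", "BUILD_TUPLE"):
    print(name, arg_kind(name))
```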

To replace both the operation name and the argument, use the set() method rather than modifying the name attribute and then the arg attribute. Otherwise, an exception is raised if the previous operation requires an argument and the new operation has none (or vice versa).
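
The hypothetical MiniInstr class below sketches why set() is needed. Its name and arg setters validate the new value against the current value of the other attribute, so turning LOAD_CONST 3 into POP_TOP one attribute at a time fails, while set() validates the new pair together. The ValueError type and the validation rules are assumptions for illustration, not the library's code.

```python
import dis

UNSET = object()  # hypothetical stand-in for bytecode.UNSET


def _takes_arg(name):
    op = dis.opmap[name]
    if hasattr(dis, "hasarg"):       # Python 3.12+
        return op in dis.hasarg
    return op >= dis.HAVE_ARGUMENT


class MiniInstr:
    """Hypothetical, simplified Instr that validates name/arg pairs."""

    def __init__(self, name, arg=UNSET):
        self._check(name, arg)
        self._name, self._arg = name, arg

    @staticmethod
    def _check(name, arg):
        if _takes_arg(name) and arg is UNSET:
            raise ValueError(f"{name} requires an argument")
        if not _takes_arg(name) and arg is not UNSET:
            raise ValueError(f"{name} has no argument")

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        self._check(name, self._arg)  # checked against the *old* arg
        self._name = name

    @property
    def arg(self):
        return self._arg

    @arg.setter
    def arg(self, arg):
        self._check(self._name, arg)  # checked against the *old* name
        self._arg = arg

    def set(self, name, arg=UNSET):
        self._check(name, arg)        # the new pair is validated together
        self._name, self._arg = name, arg


instr = MiniInstr("LOAD_CONST", 3)
try:
    instr.name = "POP_TOP"            # POP_TOP takes no argument, arg is still 3
except ValueError as exc:
    print(exc)                        # POP_TOP has no argument
instr.set("POP_TOP")                  # replaces name and arg in one step
print(instr.name, instr.arg is UNSET) # POP_TOP True
```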

Attributes:

arg

Argument value.

It can be UNSET if the instruction has no argument.

lineno

Line number (int >= 1), or None.

name

Operation name (str). Setting the name updates the opcode attribute.

opcode

Operation code (int). Setting the operation code updates the name attribute.

stack_effect

Operation effect on the stack size as computed by dis.stack_effect().
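
For reference, the same values can be obtained directly from the standard library. A short sketch calling dis.stack_effect() with opcodes looked up in dis.opmap; the effect is the net change, pushes minus pops:

```python
import dis

print(dis.stack_effect(dis.opmap["LOAD_CONST"], 0))   # 1: pushes one constant
print(dis.stack_effect(dis.opmap["POP_TOP"]))         # -1: discards the top item
print(dis.stack_effect(dis.opmap["BUILD_TUPLE"], 3))  # -2: pops 3 items, pushes 1 tuple
```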

Changed in version 0.3: The op attribute was renamed to opcode.

Methods:

copy()

Create a copy of the instruction.

is_final() → bool

Is the operation a final operation?

Final operations:

  • RETURN_VALUE
  • RAISE_VARARGS
  • BREAK_LOOP
  • CONTINUE_LOOP
  • unconditional jumps: is_uncond_jump()

has_jump() → bool

Does the operation have a jump argument?

More general than is_cond_jump() and is_uncond_jump(), it includes other operations. Examples:

  • FOR_ITER
  • SETUP_EXCEPT
  • CONTINUE_LOOP

is_cond_jump() → bool

Is the operation a conditional jump?

Conditional jumps:

  • JUMP_IF_FALSE_OR_POP
  • JUMP_IF_TRUE_OR_POP
  • POP_JUMP_IF_FALSE
  • POP_JUMP_IF_TRUE

is_uncond_jump() → bool

Is the operation an unconditional jump?

Unconditional jumps:

  • JUMP_FORWARD
  • JUMP_ABSOLUTE

set(name: str, arg=UNSET)

Modify the instruction in-place: replace name and arg attributes, and update the opcode attribute.

Changed in version 0.3: The lineno parameter has been removed.
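
The classification methods of Instr can be mirrored with plain name sets. A minimal sketch, assuming string operation names and using only the opcode lists given in this section; the helper functions are hypothetical, and several of these opcodes no longer exist in recent CPython versions.

```python
# Name sets copied from the lists above (hypothetical helpers, not the library's code).
UNCOND_JUMPS = {"JUMP_FORWARD", "JUMP_ABSOLUTE"}
COND_JUMPS = {"JUMP_IF_FALSE_OR_POP", "JUMP_IF_TRUE_OR_POP",
              "POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE"}
FINAL_OPS = {"RETURN_VALUE", "RAISE_VARARGS", "BREAK_LOOP", "CONTINUE_LOOP"}


def is_uncond_jump(name: str) -> bool:
    return name in UNCOND_JUMPS


def is_cond_jump(name: str) -> bool:
    return name in COND_JUMPS


def is_final(name: str) -> bool:
    # Execution never falls through to the next instruction after these.
    return name in FINAL_OPS or is_uncond_jump(name)


for name in ("JUMP_ABSOLUTE", "POP_JUMP_IF_TRUE", "RETURN_VALUE", "LOAD_CONST"):
    print(name, is_final(name), is_cond_jump(name))
```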

ConcreteInstr

class ConcreteInstr(name: str, arg=UNSET, *, lineno: int=None)

Concrete instruction. Inherits from Instr.

If the operation requires an argument, arg must be an integer (int) in the range 0..2,147,483,647. Otherwise, arg must not be set.

Concrete instructions should only be used in ConcreteBytecode.

Attributes:

arg

Argument value: an integer (int) in the range 0..2,147,483,647, or UNSET. Setting the argument value can change the instruction size (size).

size

Read-only size of the instruction in bytes (int): between 1 byte (no argument) and 6 bytes (extended argument).
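
A minimal sketch of how size depends on arg, assuming the pre-3.6 CPython layout implied by the 1-byte and 6-byte bounds: an opcode byte, an optional 16-bit argument, and an EXTENDED_ARG prefix for larger values. CPython 3.6 and later use 2-byte instructions instead, so the numbers differ there. concrete_size() is a hypothetical helper.

```python
def concrete_size(takes_arg: bool, arg: int = 0) -> int:
    """Instruction size in bytes, assuming the pre-3.6 1/3/6-byte layout."""
    if not takes_arg:
        return 1    # opcode byte only
    if arg <= 0xFFFF:
        return 3    # opcode byte + 16-bit argument
    return 6        # EXTENDED_ARG (3 bytes) carrying the high 16 bits + 3 bytes


print(concrete_size(False))          # 1
print(concrete_size(True, 0xFFFF))   # 3
print(concrete_size(True, 0x10000))  # 6: one more than 0xFFFF doubles the size
```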

Static method:

static disassemble(code: bytes, offset: int) → ConcreteInstr

Create a concrete instruction from a bytecode string.

Methods:

get_jump_target(instr_offset: int) → int or None

Get the absolute target offset of a jump. Return None if the instruction is not a jump.

The instr_offset parameter is the offset of the instruction. It is required by relative jumps.
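
A sketch of the two cases, assuming byte offsets and relative jumps counted from the end of the jump instruction (as for JUMP_FORWARD in the pre-3.6 format). jump_target() and its kind parameter are hypothetical; they only illustrate why the offset of the instruction is needed for relative jumps.

```python
from typing import Optional


def jump_target(kind: Optional[str], arg: int, instr_offset: int, size: int) -> Optional[int]:
    """kind is "absolute", "relative", or None for a non-jump (hypothetical helper)."""
    if kind == "absolute":
        return arg                        # the argument already is the target offset
    if kind == "relative":
        return instr_offset + size + arg  # counted from the following instruction
    return None


# A 3-byte relative jump at offset 10 with arg 6 lands at 10 + 3 + 6 = 19.
print(jump_target("relative", 6, 10, 3))  # 19
print(jump_target("absolute", 6, 10, 3))  # 6
```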

assemble() → bytes

Assemble the instruction to a bytecode string.
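
A round-trip sketch of assembling and disassembling one instruction, assuming the pre-3.6 CPython encoding: one opcode byte, then a little-endian 16-bit argument for opcodes at or above HAVE_ARGUMENT, with an EXTENDED_ARG instruction carrying the high 16 bits of larger arguments. The opcode numbers (HAVE_ARGUMENT = 90, EXTENDED_ARG = 144, LOAD_CONST = 100, POP_TOP = 1) are those of CPython 3.5 and are assumptions here; assemble_instr() and disassemble_instr() are hypothetical names, not the library's methods.

```python
import struct

HAVE_ARGUMENT = 90   # CPython 3.5 value (assumption)
EXTENDED_ARG = 144   # CPython 3.5 value (assumption)


def assemble_instr(opcode, arg=None):
    """Encode one instruction; arg is ignored for opcodes without argument."""
    if opcode < HAVE_ARGUMENT:
        return bytes([opcode])
    if arg > 0xFFFF:
        # EXTENDED_ARG carries the high 16 bits, the instruction the low 16 bits.
        return (struct.pack("<BH", EXTENDED_ARG, arg >> 16)
                + struct.pack("<BH", opcode, arg & 0xFFFF))
    return struct.pack("<BH", opcode, arg)


def disassemble_instr(code, offset):
    """Decode the instruction at offset; return (opcode, arg, size)."""
    opcode = code[offset]
    if opcode < HAVE_ARGUMENT:
        return opcode, None, 1
    (arg,) = struct.unpack_from("<H", code, offset + 1)
    if opcode == EXTENDED_ARG:
        real_opcode, low, _ = disassemble_instr(code, offset + 3)
        return real_opcode, (arg << 16) | low, 6
    return opcode, arg, 3


LOAD_CONST = 100
print(assemble_instr(LOAD_CONST, 1))                              # b'd\x01\x00'
print(disassemble_instr(assemble_instr(LOAD_CONST, 0x12345), 0))  # (100, 74565, 6)
```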

Compare

class Compare

Enum for the argument of the COMPARE_OP instruction.

Equality test:

  • Compare.EQ (2): x == y
  • Compare.NE (3): x != y
  • Compare.IS (8): x is y
  • Compare.IS_NOT (9): x is not y

Inequality test:

  • Compare.LT (0): x < y
  • Compare.LE (1): x <= y
  • Compare.GT (4): x > y
  • Compare.GE (5): x >= y

Other tests:

  • Compare.IN (6): x in y
  • Compare.NOT_IN (7): x not in y
  • Compare.EXC_MATCH (10): used to compare exceptions in except: blocks
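
A sketch of an equivalent enum using the values listed above (the class is a stand-in, not the library's definition), checked against the standard dis.cmp_op table. Only the first six values are compared, because since CPython 3.9 dis.cmp_op lists just the rich comparisons: membership, identity and exception-match tests moved to dedicated opcodes.

```python
import dis
import enum


class Compare(enum.IntEnum):
    """Stand-in enum with the values documented above."""
    LT = 0
    LE = 1
    EQ = 2
    NE = 3
    GT = 4
    GE = 5
    IN = 6
    NOT_IN = 7
    IS = 8
    IS_NOT = 9
    EXC_MATCH = 10


# The six rich comparisons line up with the stdlib table dis.cmp_op.
for member in list(Compare)[:6]:
    print(member.name, dis.cmp_op[member])
```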

Label

class Label

Pseudo-instruction used as targets of jump instructions.

Label targets are “resolved” by Bytecode.to_concrete_bytecode.

Labels must only be used in Bytecode.
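
A simplified sketch of label resolution, under stated assumptions: instructions are (name, arg) tuples with None standing in for a missing argument, Label is a hypothetical stand-in class, every instruction has the same size, and all jumps are absolute. A real implementation must also handle relative jumps and repeat the computation when a larger offset makes an instruction grow (for example by needing EXTENDED_ARG).

```python
class Label:
    """Hypothetical stand-in for bytecode.Label; only its identity matters."""


def resolve_labels(items, instr_size=2):
    """Replace Label arguments with absolute offsets and drop the Labels."""
    # Pass 1: record the offset of the instruction that follows each label.
    offsets, offset = {}, 0
    for item in items:
        if isinstance(item, Label):
            offsets[item] = offset
        else:
            offset += instr_size
    # Pass 2: rewrite jump arguments, keeping only real instructions.
    resolved = []
    for item in items:
        if isinstance(item, Label):
            continue
        name, arg = item
        if isinstance(arg, Label):
            arg = offsets[arg]
        resolved.append((name, arg))
    return resolved


loop, end = Label(), Label()
code = [loop, ("LOAD_NAME", "x"), ("POP_JUMP_IF_FALSE", end),
        ("JUMP_ABSOLUTE", loop), end,
        ("LOAD_CONST", "done"), ("RETURN_VALUE", None)]
print(resolve_labels(code))
```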

SetLineno

class SetLineno(lineno: int)

Pseudo-instruction to set the line number of following instructions.

lineno must be greater than or equal to 1.

lineno

Line number (int), read-only attribute.

Bytecode classes

BaseBytecode

class BaseBytecode

Base class of bytecode classes.

Attributes:

argcount

Argument count (int), default: 0.

cellvars

Names of the cell variables (list of str), default: empty list.

docstring

Documentation string aka “docstring” (str), None, or UNSET. Default: UNSET.

If set, it is used by ConcreteBytecode.to_code() as the first constant of the created Python code object.

filename

Code filename (str), default: '<string>'.

first_lineno

First line number (int), default: 1.

flags

Flags (int).

freevars

List of free variable names (list of str), default: empty list.

kwonlyargcount

Keyword-only argument count (int), default: 0.

name

Code name (str), default: '<module>'.

Changed in version 0.3: Attribute kw_only_argcount renamed to kwonlyargcount.
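
The defaults listed above (filename '<string>', name '<module>', argcount 0, first_lineno 1) match what compile() produces for module-level source compiled under the filename '<string>'. A sketch comparing a module code object and a function code object through the standard co_* fields; the one-to-one correspondence between these attributes and co_* fields is an assumption based on their names.

```python
# Module-level code compiled under the filename "<string>".
module_code = compile("x = 1", "<string>", "exec")
print(module_code.co_name, module_code.co_filename)         # <module> <string>
print(module_code.co_argcount, module_code.co_firstlineno)  # 0 1


def example(a, b, *, c=1):
    return a + b + c


code = example.__code__
print(code.co_name)            # example
print(code.co_argcount)        # 2: a and b
print(code.co_kwonlyargcount)  # 1: c
```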

Bytecode

class Bytecode

Abstract bytecode: list of abstract instructions (Instr). Inherits from BaseBytecode and list.

A bytecode must only contain objects of the 4 following types:

  • Label
  • SetLineno
  • Instr
  • ConcreteInstr

It is possible to use concrete instructions (ConcreteInstr), but abstract instructions are preferred.

Attributes:

argnames

List of the argument names (list of str), default: empty list.

Static methods:

static from_code(code) → Bytecode

Create an abstract bytecode from a Python code object.

Methods:

to_concrete_bytecode() → ConcreteBytecode

Convert to concrete bytecode with concrete instructions.

Resolve jump targets: replace abstract labels (Label) with concrete instruction offsets (relative or absolute, depending on the jump operation).

to_code() → types.CodeType

Convert to a Python code object.

It is based on to_concrete_bytecode() and so also resolves jump targets.

compute_stacksize() → int

Compute the stack size needed to execute the code. Raise an exception if the bytecode is invalid.

This computation requires building the control flow graph associated with the code.
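
A simplified sketch of such a computation over a control flow graph. The block representation (dicts of per-instruction stack effects, an optional jump target index and a fall-through flag) and compute_stacksize() itself are hypothetical; the sketch also assumes a jump has the same stack effect whether or not it is taken, which real bytecode does not always guarantee.

```python
def compute_stacksize(blocks):
    """Maximum stack depth over all paths from block 0 (hypothetical helper).

    Each block is a dict with "effects" (net stack effect per instruction),
    "jump" (index of the jump target block, or None) and "falls_through"
    (whether execution continues into the next block).
    """
    entry_depth = {}          # block index -> stack depth on entry
    todo = [(0, 0)]           # (block index, depth) pairs still to visit
    max_depth = 0
    while todo:
        index, depth = todo.pop()
        if index in entry_depth:
            if entry_depth[index] != depth:
                raise ValueError(f"inconsistent stack depth entering block {index}")
            continue
        entry_depth[index] = depth
        block = blocks[index]
        for effect in block["effects"]:
            depth += effect
            if depth < 0:
                raise ValueError(f"stack underflow in block {index}")
            max_depth = max(max_depth, depth)
        if block["jump"] is not None:
            todo.append((block["jump"], depth))
        if block["falls_through"] and index + 1 < len(blocks):
            todo.append((index + 1, depth))
    return max_depth


blocks = [
    # LOAD_NAME, POP_JUMP_IF_FALSE -> block 2
    {"effects": [1, -1], "jump": 2, "falls_through": True},
    # LOAD_CONST, LOAD_CONST, BINARY_ADD, RETURN_VALUE
    {"effects": [1, 1, -1, -1], "jump": None, "falls_through": False},
    # LOAD_CONST, RETURN_VALUE
    {"effects": [1, -1], "jump": None, "falls_through": False},
]
print(compute_stacksize(blocks))  # 2
```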

ConcreteBytecode

class ConcreteBytecode

List of concrete instructions (ConcreteInstr). Inherits from BaseBytecode.

A concrete bytecode must only contain objects of the 2 following types:

  • ConcreteInstr
  • SetLineno

Label and Instr must not be used in concrete bytecode.

Attributes:

consts

List of constants (list), default: empty list.

names

List of names (list of str), default: empty list.

varnames

List of variable names (list of str), default: empty list.
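
Presumably, as in CPython code objects, the integer arguments of concrete instructions are indexes into these lists (LOAD_CONST indexes consts), whereas an abstract Instr holds the value itself. A sketch of both views using the standard dis module rather than this API: Instruction.arg is the raw index and Instruction.argval the resolved value.

```python
import dis


def greet(name):
    return name + "!"


code = greet.__code__
for ins in dis.get_instructions(code):
    if ins.opname == "LOAD_CONST":
        # Concrete view: ins.arg is an index; abstract view: ins.argval is the value.
        print(ins.arg, repr(ins.argval), code.co_consts[ins.arg] == ins.argval)
```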

Static methods:

static from_code(code, *, extended_arg=False) → ConcreteBytecode

Create a concrete bytecode from a Python code object.

If extended_arg is true, create EXTENDED_ARG instructions. Otherwise, concrete instructions use extended arguments (a size of 6 bytes rather than 3 bytes).

Methods:

to_code() → types.CodeType

Convert to a Python code object.

On Python older than 3.6, raise an exception on a negative line number delta.

to_bytecode() → Bytecode

Convert to abstract bytecode with abstract instructions.

compute_stacksize() → int

Compute the stack size needed to execute the code. Raise an exception if the bytecode is invalid.

This computation requires building the control flow graph associated with the code.

BasicBlock

class BasicBlock

Basic block. Inherits from list.

A basic block is a straight-line code sequence of abstract instructions (Instr) with no branches in except to the entry and no branches out except at the exit.

A block must only contain objects of the 3 following types:

  • SetLineno
  • Instr
  • ConcreteInstr

It is possible to use concrete instructions (ConcreteInstr) in blocks, but abstract instructions (Instr) are preferred.

Only the last instruction can have a jump argument, and the jump argument must be a basic block (BasicBlock).

Labels (Label) must not be used in blocks.

Attributes:

next_block

Next basic block (BasicBlock), or None.

Methods:

get_jump()

Get the target block (BasicBlock) of the jump if the basic block ends with an instruction with a jump argument. Otherwise, return None.

ControlFlowGraph

class ControlFlowGraph

Control flow graph (CFG): list of basic blocks (BasicBlock). A basic block is a straight-line code sequence of abstract instructions (Instr) with no branches in except to the entry and no branches out except at the exit. Inherits from BaseBytecode.

Labels (Label) must not be used in blocks.

This class is not designed to emit code, but to analyze and modify existing code. Use Bytecode to emit code.

Attributes:

argnames

List of the argument names (list of str), default: empty list.

Methods:

static from_bytecode(bytecode: Bytecode) → ControlFlowGraph

Convert a Bytecode object to a ControlFlowGraph object: convert labels to blocks.

Splits blocks after final instructions (Instr.is_final()) and after conditional jumps (Instr.is_cond_jump()).
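
A simplified sketch of this conversion, assuming instructions are (name, arg) tuples (None standing in for a missing argument), a hypothetical Label stand-in, and block indexes in place of BasicBlock objects. A label always starts a new block, and a block ends after a final instruction or a conditional jump; the name sets come from the Instr documentation above.

```python
class Label:
    """Hypothetical stand-in for bytecode.Label."""


FINAL_OPS = {"RETURN_VALUE", "RAISE_VARARGS", "BREAK_LOOP", "CONTINUE_LOOP",
             "JUMP_FORWARD", "JUMP_ABSOLUTE"}
COND_JUMPS = {"JUMP_IF_FALSE_OR_POP", "JUMP_IF_TRUE_OR_POP",
              "POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE"}


def split_into_blocks(items):
    """Split (name, arg) tuples and Labels into blocks; jump args become block indexes."""
    blocks = [[]]
    label_block = {}                      # Label -> index of the block it starts
    for item in items:
        if isinstance(item, Label):
            if blocks[-1]:                # a label always starts a new block
                blocks.append([])
            label_block[item] = len(blocks) - 1
            continue
        blocks[-1].append(item)
        if item[0] in FINAL_OPS or item[0] in COND_JUMPS:
            blocks.append([])             # split after final ops and conditional jumps
    last = len(blocks) - 1
    if last > 0 and not blocks[last] and last not in label_block.values():
        blocks.pop()                      # drop a trailing empty block nothing jumps to
    return [[(name, label_block[arg] if isinstance(arg, Label) else arg)
             for name, arg in block]
            for block in blocks]


end = Label()
items = [("LOAD_NAME", "x"), ("POP_JUMP_IF_FALSE", end),
         ("LOAD_CONST", 1), ("RETURN_VALUE", None),
         end, ("LOAD_CONST", 2), ("RETURN_VALUE", None)]
for block in split_into_blocks(items):
    print(block)
```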

add_block(instructions=None) → BasicBlock

Add a new basic block. Return the newly created basic block.

get_block_index(block: BasicBlock) → int

Get the index of a block in the bytecode.

Raise a ValueError if the block is not part of the bytecode.

New in version 0.3.

split_block(block: BasicBlock, index: int) → BasicBlock

Split a block into two blocks at the specified instruction. Return the newly created block, or block itself if index equals 0.
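
A sketch of get_block_index() and split_block() over a plain list of blocks. Block is a hypothetical stand-in for BasicBlock, and the way next_block is rewired (the new tail inherits the old successor) is an assumption. Blocks are compared by identity, not with list equality, since two distinct blocks may hold equal instructions.

```python
class Block(list):
    """Hypothetical stand-in for BasicBlock: a list with a next_block link."""

    def __init__(self, instrs=()):
        super().__init__(instrs)
        self.next_block = None


def get_block_index(cfg, block):
    # Compare by identity: two distinct blocks may hold equal instructions.
    for index, candidate in enumerate(cfg):
        if candidate is block:
            return index
    raise ValueError("the block is not part of the bytecode")


def split_block(cfg, block, index):
    """Move block[index:] into a new block inserted right after block."""
    if index == 0:
        return block
    new_block = Block(block[index:])
    del block[index:]
    new_block.next_block = block.next_block  # the tail keeps the old successor
    block.next_block = new_block             # the head now falls into the tail
    cfg.insert(get_block_index(cfg, block) + 1, new_block)
    return new_block


head = Block([("LOAD_CONST", 1), ("POP_TOP", None), ("RETURN_VALUE", None)])
cfg = [head]
tail = split_block(cfg, head, 1)
print(head, tail, get_block_index(cfg, tail))
```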

to_bytecode() → Bytecode

Convert to a bytecode object using labels.

compute_stacksize() → int

Compute the stack size required by a bytecode object. Raise an exception if the bytecode is invalid.

Cell and Free Variables

CellVar

class CellVar(name: str)

Cell variable used as an instruction argument by operations taking a cell or free variable name.

Attributes:

name

Name of the cell variable (str).

FreeVar

class FreeVar(name: str)

Free variable used as an instruction argument by operations taking a cell or free variable name.

Attributes:

name

Name of the free variable (str).
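
In CPython terms, a cell variable is a local of an outer function that an inner function reads, and a free variable is the inner function's view of it. A short sketch with the standard library (not this API) showing where each name appears; presumably the library wraps co_cellvars names in CellVar and co_freevars names in FreeVar.

```python
import dis


def outer():
    counter = 0          # read by inner(): a cell variable of outer()

    def inner():
        return counter   # comes from the enclosing scope: a free variable of inner()

    return inner


inner = outer()
print(outer.__code__.co_cellvars)  # ('counter',)
print(inner.__code__.co_freevars)  # ('counter',)
print([i.argval for i in dis.get_instructions(inner) if i.opname == "LOAD_DEREF"])  # ['counter']
```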

Line Numbers

The line number can be set directly on an instruction using the lineno parameter of the constructor. Otherwise, the line number is inherited from the previous instruction, starting at first_lineno of the bytecode.

The SetLineno pseudo-instruction can be used to set the line number of the following instructions.
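
A sketch of these rules with hypothetical stand-in classes (not the library's Instr and SetLineno). It assumes that an explicit lineno on an instruction also carries over to the instructions that follow it, which is how the inheritance rule above reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetLineno:
    """Hypothetical stand-in for the SetLineno pseudo-instruction."""
    lineno: int

    def __post_init__(self):
        if self.lineno < 1:
            raise ValueError("lineno must be greater than or equal to 1")


@dataclass
class Instr:
    """Hypothetical stand-in for Instr: only the name and lineno matter here."""
    name: str
    lineno: Optional[int] = None


def effective_linenos(items, first_lineno=1):
    """Return (name, line) pairs following the inheritance rules described above."""
    current = first_lineno
    result = []
    for item in items:
        if isinstance(item, SetLineno):
            current = item.lineno   # applies to the following instructions
            continue
        if item.lineno is not None:
            current = item.lineno   # an explicit line number wins
        result.append((item.name, current))
    return result


code = [Instr("LOAD_CONST"), SetLineno(3), Instr("STORE_NAME"),
        Instr("LOAD_NAME", lineno=5), Instr("RETURN_VALUE")]
print(effective_linenos(code))
# [('LOAD_CONST', 1), ('STORE_NAME', 3), ('LOAD_NAME', 5), ('RETURN_VALUE', 5)]
```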