Series: Writing an IronPython Debugger

18 posts

Writing an IronPython Debugger: Dynamic Stack Trace

Now that I can interact with my debugger, it’s time to add a command. I decided to start with something simple – or at least something I thought would be simple – printing a stack trace.

In the unmanaged debugger API, threads have the concept of both stack chains and stack frames. A stack chain represents a segment of the physical stack. In a typical managed app, you’ll have at least two stack chains: the unmanaged stack chain and the managed stack chain. You can interate through the stack chains for a given thread via the Chains property. However, ipydbg is a managed only debugger, so I can ignore the unmanaged stack chain. Instead, I just retrieve the current (managed) chain via the thread’s ActiveChain property.

Within a managed stack chain, there is a collection of stack frames. This is the call stack that managed developers are typically used to working with. It turns out that printing a raw stack trace is very easy to do. Here was my first stab at it:

elif k.Key == ConsoleKey.T:
  print "nManaged Stack Trace"
  for f in active_thread.ActiveChain.Frames:
    offset, sp = get_location(f)
    metadata_import = CorMetadataImport(f.Function.Module)
    method_info = metadata_import.GetMethodInfo(f.FunctionToken)
    print "  ",
      "%s::%s --" % (method_info.DeclaringType.Name, method_info.Name),
      sp if sp != None else "(offset %d)" % offset

This elif block is part of the input method I showed last time. It loops thru the frames in the Active Chain of the active thread and prints some data to the console. As I said, pretty easy. Of course, the devil is in the details.

First detail I should call out is that active_thread variable. As per Mike Stall, “there is no notion of “active thread” in the underlying debug APIs. It’s purely a construct in a debugger UI to make it easier for end-users.” My console based UI may be rudimentary, but it’s still a UI. Events like OnBreakpoint include the active thread as a event argument, so I stash that away in a variable so it’ll be available to the input loop.

Second detail is the call to get_location. When we last saw get_location, it was returning a formatted string. Since my last post, I’ve refactored the code so it returns the raw location data – a tuple of the raw IP offset and the associated sequence point, if available. I’ve also added a __str__ method to my sequence point object, so when I print it to the console, I get the filename and line nicely formatted.

Finally, there’s all CorMetadataImport code. In addition to wrapping the unmanaged debugger API, CorDebug also wraps the unmanaged metadata API. This code lets me get MethodInfo compatible view of the function metadata for a given stack frame. I use it here to get the type and function name for each frame on the stack.

The end result looks something like this. Note, I’ve replaced “Microsoft.Scripting” with “MS.Scripting” to avoid word wrapping.

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Managed Stack Trace
   S$2::Initialize simpletest.py:1 (offset: 84)
   MS.Scripting.Runtime.OptimizedScriptCode::InvokeTarget (offset 72)
   MS.Scripting.ScriptCode::Run (offset 0)
   IronPython.Hosting.PythonCommandLine::RunFileWorker (offset 77)
   IronPython.Hosting.PythonCommandLine::RunFile (offset 15)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 46)
   IronPython.Hosting.PythonCommandLine::Run (offset 240)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 74)
   MS.Scripting.Hosting.Shell.ConsoleHost::RunCommandLine (offset 158)
   MS.Scripting.Hosting.Shell.ConsoleHost::ExecuteInternal (offset 32)
   MS.Scripting.Hosting.Shell.ConsoleHost::Execute (offset 63)
   MS.Scripting.Hosting.Shell.ConsoleHost::Run (offset 390)
   PythonConsoleHost::Main — (offset 125)

As we can see, we may be on the first line of the python script, but we’ve got a pretty deep stack trace already. Everything but the top-most frame are from the underlying IronPython implementation. Those extra frames obscure the stack frames I actually care about, so it would be nice to hide any stack frames from IronPython or the DLR. It’s easy enough to write a python generator function that filters out frames that from the DLR or IronPython namespaces. In order to get the type name, we need the method_info like we did above. I’ve factored that code into a separate function in order to avoid code duplication.

def get_method_info_for_frame(frame):
    if frame.FrameType != CorFrameType.ILFrame:
      return None
    metadata_import = CorMetadataImport(frame.Function.Module)
    return metadata_import.GetMethodInfo(frame.FunctionToken)

def get_dynamic_frames(chain):
  for f in chain.Frames:
    method_info = get_method_info_for_frame(f)
    if method_info == None:
      continue
    typename = method_info.DeclaringType.Name
    if typename.startswith("Microsoft.Scripting.")
      or typename.startswith("IronPython.")
      or typename == "PythonConsoleHost":
        continue
    yield f

You’ll notice I’ve added a guard to get_method_info_for_frame in order to ensure that the frame argument is an IL Frame. There are three types of stack frames in the debugger API: IL, native and internal. Most of the frames we’re dealing with are IL frames, but you do run into the occasional lightweight function (i.e. DynamicMethod) frame when debugging IronPython code. Typically, IronPython generates DynamicMethods for all python code except for a few cases related to .NET interop. However, you can’t debug DynamicMethods, so when you run with –D, we generate normal non-dynamic methods instead. However, even when running with –D, we still use DynamicMethods for call site dispatch. Since they’re an implementation detail, we want to filter those out in get_dynamic_frames too.

This gives us a much more manageable stack trace:

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Stack Trace
   S$2::Initialize -- simpletest.py:1 (offset: 84)

As usual, the latest ipydbg source is up on GitHub.

Writing an IronPython Debugger: Refactoring

When we last left ipydbg, it was up to about 200 lines of code. Not bad in terms of overall length, but I started to detect some code smell. I was relying pretty heavily on global variables and the structure of my code made it difficult to control how the debugger was run. I wanted to change ipydbg so it would automatically spin up an MTA thread if I forgot to add the –X:MTA command line parameter. But since by debugger and process objects were global, they’d get created on the main thread of ipydbg, regardless if it was STA or MTA. So for this “release” (I’d say I’m almost to version 0.0.0.1), I decided on focusing on enginering and refactoring rather than new features.

The big new addition is the IPyDebugProcess class, which is clearly the workhorse of the application. All of the previously global variables are now class instance variables on IPyDebugProcess. Input and run along with all the event handlers as well as do_break_event and get_location are now class methods, as they need to access instance variables (setting the break event, accessing the symbol reader dictionary, etc.). Functions that didn’t need to access instance variables (get_sequence_points, create_breakpoint, get_dynamic_frames and get_method_info_for_frame) I left as top-level functions. If they get more complex, I may break them out into their own modules, but for now I left them in ipydbg.py.

The conversion process was fairly trivial. I had to add “self.” lots of places and change the indention level all over but that was pretty much it. Once I finished the conversion, I was able to add the run_debugger function to handle the thread creation, if necessary.

def run_debugger(py_file):
    if Thread.CurrentThread.GetApartmentState() == ApartmentState.STA:
        t = Thread(ParameterizedThreadStart(run_debugger))
        t.SetApartmentState(ApartmentState.MTA)
        t.Start(py_file)
        t.Join()
    else:
        p = IPyDebugProcess()
        p.run(py_file)

if __name__ == "__main__":
    run_debugger(sys.argv[1])

Originally, I tried to put this logic in IPyDebugProcess.run. However, since I’m creating the debugger object in the __init__ function, that meant it would be created on the wrong thread. I could have moved the debugger creation to the run method or move the thread management code to __init__, but I decided to factor that logic into a separate function completely. Felt cleaner that way.

Writing an IronPython Debugger: Stepping Thru Code

So far, I’ve written seven posts about my IronPython debugger, but frankly it isn’t very functional yet. It runs, breaks on the first line and can show a stack trace. Not exactly Jolt award material. In this post, I’m going to add one of the core functions of any debugger: stepping. Where previously I’ve written a bunch of code but had little to show in terms of features, now I’m getting three new features (basic step, step in and step out) at once!

def _input(self):
  #remaining _input code omitted for clarity
  elif k.Key == ConsoleKey.S:
      print "nStepping"
      self._do_step(False)
      return
  elif k.Key == ConsoleKey.I:
      print "nStepping In"
      self._do_step(True)
      return
  elif k.Key == ConsoleKey.O:
      print "nStepping Out"
      stepper = create_stepper(self.active_thread)
      stepper.StepOut()

def _do_step(self, step_in):
  stepper = create_stepper(self.active_thread)
  mod = self.active_thread.ActiveFrame.Function.Module
  if mod not in self.symbol_readers:
      stepper.Step(step_in)
  else:
    range = get_step_ranges(self.active_thread, self.symbol_readers[mod])
    stepper.StepRange(step_in, range)

Here you can see the _input clauses for step, step in and step out. Of the three, step out is the simplest to implement: create the stepper object and call StepOut. For step and step in, I could simply call Step (the boolean argument indicates if you want to step into or over functions) but that only steps a single IL statement. The vast majority of the time there are multiple IL instructions for every line of source code, so IL statement stepping is very tedious. As we learned when setting a breakpoint, debug symbols contain sequence points that map between source and IL locations. If they’re available, I use the sequence points to determine the range of IL statements to step over so that I can step single source statements instead.

The stepping code above depends on three helper functions defined at global scope.

def create_stepper(thread):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  return stepper  

def create_step_range(start, end):
  range = Array.CreateInstance(COR_DEBUG_STEP_RANGE, 1)
  range[0] = COR_DEBUG_STEP_RANGE(startOffset = UInt32(start),
                                  endOffset = UInt32(end))
  return range

def get_step_ranges(thread, reader):
    frame = thread.ActiveFrame
    offset, mapResult = frame.GetIP()
    method = reader.GetMethod(SymbolToken(frame.FunctionToken))
    for sp in get_sequence_points(method):
        if sp.offset > offset:
            return create_step_range(offset, sp.offset)
    return create_step_range(offset, frame.Function.ILCode.Size)

The first function, create_stepper, simply constructs and configures the stepper object. The call to SetUnmappedStopMask tells the debugger not to stop if it encounters code that can’t be mapped to IL. If you need to debug at that level, ipydbg is *not* for you.

Next is create_step_range, which exists purely for .NET interop purposes. There are three interop warts hidden in this function. First is creating a .NET array of COR_DEBUG_STEP_RANGE structs. Every time I write Array code like this, I wish for a CreateFromCollection static method on Array. However, in this case it isn’t that big a deal since it’s a one element array. Second wart is having to set the values of COR_DEBUG_STEP_RANGE via constructor keyword arguments. It turns out that IronPython disallows direct updates to value type fields (read this for the reason why). Instead, I pass in the field values into the constructor as keyword arguments. Finally, you have to explicitly convert the start and end offsets to a unsigned int in order to set the offset fields in the COR_DEBUG_STEP_RANGE struct constructor.

Finally is get_step_ranges, which iterates thru the list of sequence points in the current method looking for the one with the smallest offset that is larger than the current offset position. If it can’t find a matching sequence point, it sets the range to the end of the current function. The start range offset is always the current offset. I did make a significant change to get_sequence_points – it no longer yields sequence points that have a start line of 0xfeefee. By convention, that indicates a sequence point to be skipped. Originally, the logic to ignore 0xfeefee sequence points was in get_location. But when I originally wrote get_step_ranges, it had essentially the same sequence point skipping logic, so I moved it to get_location instead.

Technically, I’ve built three new features but the reality is that if you end up in IronPython infrastructure code it’s really hard to find your way back to python code. Step in is particularly useless right now. Luckily, the .NET debugger API supports a feature called “Just My Code” that will make stepping much more useful. In the meantime, the latest version of ipydbg is up on GitHub as usual.

Writing an IronPython Debugger: Debugging Just My Code

As I wrote last time, in order to make debug stepping actually useful in ipydbg I need to avoid stepping into frames that are part of the IronPython infrastructure. I did something similar when I hide infrastructure frames in the stack trace. Originally, I had planned to automatically stepping again if we ended up on a frame that didn’t correspond to a python file. However, Mike Stall showed me a much cleaner and better performing solution: Just My Code. As I mentioned at the start of this series, support for JMC is one of the main reasons I wanted to build my own debugger rather than use MDbg.

Enabling JMC in the stepper object is trivial:

def create_stepper(thread, JMC = True):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  stepper.SetJmcStatus(JMC)  
  return stepper

If I make that single change and run ipydbg, any step effectively turns into a full continue since none of the code has been marked as “My Code” yet. As you see, the tricky part of JMC isn’t enabling it on the stepper, it’s “painting” the parts of the code where you want JMC stepping to work. You can set JMC status at the module, class or the method level. In the case of ipdbg, it’s easiest to work at the class level:

infrastructure_methods =  ['TryGetExtraValue',
    'TrySetExtraValue',
    '.cctor',
    '.ctor',
    'CustomSymbolDictionary.GetExtraKeys',
    'IModuleDictionaryInitialization.InitializeModuleDictionary']

def OnClassLoad(self, sender, e):
    cmi = CorMetadataImport(e.Class.Module)
    mt = cmi.GetType(e.Class.Token)
    print "OnClassLoad", mt.Name

    if not e.Class.Module.IsDynamic:
      e.Class.JMCStatus = False
    elif mt.Name.startswith('IronPython.NewTypes'):
      e.Class.JMCStatus = False
    else:
      e.Class.JMCStatus = True
      for mmi in mt.GetMethods():
        if mmi.Name in infrastructure_methods:
          f = e.Class.Module.GetFunctionFromToken(mmi.MetadataToken)
          f.JMCStatus = False

OnClassLoad is where the action is. This event handler is responsible for enabling JMC for all class methods that map to python code. To understand how the logic in OnClassLoad works, you need to understand a little about the .NET types and code that IronPython generates. Note, the following description is for the IronPython 2.0 branch. Code generation evolves from release to release and I know for a fact there are changes in the upcoming 2.6 version. I assume that I’ll eventually have to sniff the IronPython version in order to set JMC correctly.

Today, IronPython generates all code into dynamic modules and methods. Since I want to limit stepping to python code only, I automatically disable JMC for non-dynamic modules. I can imagine a scenario where I want to step into non-dynamically generated code, but I think the best way to handle that would be to disable JMC at the stepper rather than widening the amount of code marked as JMC enabled.

For every module that gets loaded, IronPython generates a type. At a minimum you’re going to load two modules: site.py and whatever python script you ran. If you have the python standard library installed, site.py loads a bunch of other modules as well. Each of these module types have a bunch of standard methods that always get generated. For example, the global scope code in the module is placed in a static method on the module type called Initialize. Any python functions you define get generated static methods with mangled names on the module type [1]. All these methods have corresponding python code and should be JMC enabled. The other standard methods on a module type should not be JMC enabled. So in my debugger, I mark the class as JMC enabled but then iterate over the list of methods and mark any in the list of standard methods (except for Initialize) as JMC disabled.

Of course, you can also create classes in Python. As you might expect, classes in Python are generated as .NET types. However, the semantics of Python classes are very different than .NET types. For example, you can change the inheritance hierarchy of python classes at runtime. That’s obviously not allowed for .NET types. So the .NET types we generate have all the logic to implement Python class semantics. As it turns out, these .NET types only have the logic to implement Python class semantics, which is to say they have none of Python class methods code. This makes sense when you think about it – since Python can add and remove methods from a class at runtime, IronPython can’t put the method code in the .NET type itself. Instead, Python class methods are generated as static methods on the module type, just like top-level functions are. Since Python class types only contain Python class semantics logic, we never want to enable JMC for Python class types. Python class types get generated in the IronPython.NewTypes namespace, so it’s fairly easy to check the class name in OnClassLoad and automatically disable JMC for classes any in that namespace.

Adding JMC support makes ipydbg significantly more usable. It’s almost like a real tool now, isn’t it? Latest bits are up on GitHub.


  1. FYI, IronPython generates python functions as dynamic methods in release mode and static module class methods in debug mode since you can’t step into dynamic methods. The description above is specific to debug mode since ipydbg exclusively runs in debug mode. ↩︎

Writing an IronPython Debugger: Showing Source Code

It’s been almost a week since my last ipydbg post. I’m not done, I just needed to catch my breath for a few days and get some other work done. Contrary to popular believe, my day job revolves around more than just ipydbg! 😄

Actually, I’ve made ten commit since my last post, but it’s been a mostly minor changes. For example, I was hacking around with breakpoints and restored a bunch of commented out code in BreakpointEnumerator. Since I was changing the original C# CorDebug wrapper source, I decided to add a few helper functions to return metadata for functions and classes as well as cleaning up some C# filenames. On the Python side, I added an active_appdomain field to IPyDebugProcess to go along with active_thread.

Today, I added what started as a fairly minor feature – showing the current line of source code at the start of the input loop. The initial code for this was cake, simply getting the sequence point for the current location and mapping that to a source file. In order to avoid hitting the file system over and over, I cache source files the first time they are accessed.

def _get_file(self,filename):
    filename = Path.GetFileName(filename)
    if not filename in self.source_files:
      self.source_files[filename] = File.ReadAllLines(filename)
    return self.source_files[filename]  

def _input(self):
    offset, sp = self._get_location(self.active_thread.ActiveFrame)
    lines = self._get_file(sp.doc.URL)
    print "%d:" % sp.start_line, lines[sp.start_line-1]
    #input loop ommited for clarity

However, when I did this, I discovered a slight issue. When you step into a Python function, the CLR debugger breaks at the very beginning of the function being stepped into. In C#, the function start is mapped to the opening curly brace of the function. IronPython, on the other hand, doesn’t map the start of the function to anything since there’s a bunch of infrastructure code at the start of every function that has no correlation to the python source. This means _get_location return a null sequence point when I first step into a function and thus I wouldn’t be able to show any source code.

I could make the argument that start of the function should be mapped to the colon that starts the function block. However, I’m not in a position to make changes to how the shipping version of IronPython emits debug symbols. So instead, I decided to insert an automatic step whenever I step into a function by modifying OnStepComplete:

def OnStepComplete(self, sender,e):
    offset, sp = self._get_location(e.Thread.ActiveFrame)
    print "OnStepComplete Reason:", e.StepReason,
           "Location:", sp if sp != None else "offset %d" % offset
    if e.StepReason == CorDebugStepReason.STEP_CALL:
      self._do_step(e.Thread, False)
    else:
      self._do_break_event(e)

I have this nagging feeling that a simple step won’t suffice and I’ll need to add logic to ensure that I’m only auto-stepping when the start of the function doesn’t have a matching sequence point. But I have tested this with a few different python scripts and it appears to work fine. If I need something more sophisticated, I can always add it later. BTW, notice I modified the signature of _do_step so that it takes the thread as an argument rather than picking it up as an IPyDebugProcess field.

As usual, latest ipydbg (including new compiled version of CorDebug.dll) is available at GitHub.

← All series