Articles Fixing FireMonkey Heisenbugs by Dalija Prasnikar

Fixing FireMonkey Heisenbugs
Dalija Prasnikar - 31/Jan/2019
[SHOWTOGROUPS=4,20]
Every once in a while, every developer encounters random bugs that happen only in production and cannot be reproduced at will. If you cannot reproduce it, you can hardly fix it. In such situations, recording exceptions with various error loggers can help us find the culprit and fix the error. However, sometimes the information collected simply does not contain enough data to do so.

This post is inspired by a Stack Overflow question in which the logger recorded an exception and its call stack.

Code:
Argument out of range
At address: $002CDD4B (Generics.Collections.TListHelper.CheckItemRange(Integer) + 62)

Call stack:
MyApp $00BB153D Grijjy.Errorreporting.backtrace(Pointer*, Integer) + 8
MyApp $00BB1427 Grijjy.Errorreporting.TgoExceptionReporter.GlobalGetExceptionStackInfo(TExceptionRecord*) + 74
MyApp $001C4D83 Sysutils.Exception.RaisingException(TExceptionRecord*) + 38
MyApp $001E903D Sysutils.RaiseExceptObject(TExceptionRecord*) + 44
MyApp $001B0D9D _RaiseAtExcept(TObject*, Pointer) + 164
MyApp $001B1007 _RaiseExcept(TObject*) + 14
MyApp $002CDD4B Generics.Collections.TListHelper.CheckItemRange(Integer) + 62
MyApp $0059D4B3 Fmx.Controls.TControl.PaintChildren() + 222
MyApp $005BB987 Fmx.Controls.TControl.PaintInternal().DoPaintInternal(Pointer) + 1162
MyApp $005BC165 Fmx.Controls.TControl.PaintInternal().PaintAndClipChild(Pointer) + 500
MyApp $005B8F09 Fmx.Controls.TControl.PaintInternal() + 376
MyApp $007569D5 Fmx.Forms.TCustomForm.PaintRects(Types.TRectF const*, Integer) + 1008
MyApp $0074A001 __stub_in660v62__ZN3Fmx5Forms17TCommonCustomForm10PaintRectsEPKN6System5Types6TRectFEi + 24
MyApp $0068257D Fmx.Platform.Ios.TFMXView3D.drawRect(Iosapi.Foundation.NSRect) + 204
MyApp $00C2BA57 DispatchToDelphi + 82
MyApp $00C2B927 dispatch_first_stage_intercept + 18
QuartzCore $246A9F63 <redacted> + 106
QuartzCore $2468E551 <redacted> + 204
QuartzCore $2468E211 <redacted> + 24
QuartzCore $2468D6D1 <redacted> + 368
QuartzCore $2468D3A5 <redacted> + 520
QuartzCore $24686B2B <redacted> + 138
CoreFoundation $220456C9 <redacted> + 20
CoreFoundation $220439CD <redacted> + 280
CoreFoundation $22043DFF <redacted> + 958
CoreFoundation $21F93229 CFRunLoopRunSpecific + 520
CoreFoundation $21F93015 CFRunLoopRunInMode + 108
GraphicsServices $23583AC9 GSEventRunModal + 160
UIKit $26667189 UIApplicationMain + 144
MyApp $003CBF15 Iosapi.Uikit.UIApplicationMain(Integer, Byte**, Pointer, Pointer) + 8
MyApp $00676843 Fmx.Platform.Ios.TPlatformCocoaTouch.Run() + 70
MyApp $006767FB __stub_in92s__ZN3Fmx8Platform3Ios19TPlatformCocoaTouch3RunEv + 10
MyApp $0074628F Fmx.Forms.TApplication.Run() + 182
MyApp $00C2B893 main + 246
$1FE2EF0F

Asking the right question
So, the question asked is how to find the exact line of code where the exception happened. That is a valid question on its own. However, in this particular case, knowing the answer to that question will not provide a solution to the real problem - preventing the application crash.

The real question that should have been asked is "How to prevent or stop an application from crashing?"

Finding the answer to the wrong question
So, let's walk down the call stack and see what happened:

The actual place where the exception was raised is Generics.Collections.TListHelper.CheckItemRange:
Code:
procedure TListHelper.CheckItemRange(AIndex: Integer);
begin
  if Cardinal(AIndex) >= Cardinal(FCount) then
    ErrorArgumentOutOfRange;
end;

Here it is fairly obvious where the exception happened and why: the list was accessed at an index outside its valid range (negative, or greater than or equal to the number of items), hence Argument out of range. But that method is called very often, so it is not specific enough to locate the real source of trouble.
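
The single comparison in CheckItemRange rejects both kinds of bad index because a negative Integer reinterpreted as Cardinal becomes a very large number. The following console sketch illustrates that; IsIndexInRange is a hypothetical helper written for this example and is not part of the RTL.

```pascal
program CheckRangeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Generics.Collections;

// Hypothetical helper that mirrors the single-comparison range test.
// A negative index cast to Cardinal turns into a huge unsigned value,
// so one comparison rejects negative and too-large indexes alike.
function IsIndexInRange(AIndex, ACount: Integer): Boolean;
begin
  Result := Cardinal(AIndex) < Cardinal(ACount);
end;

var
  List: TList<Integer>;
begin
  List := TList<Integer>.Create;
  try
    List.Add(10);
    List.Add(20);

    Writeln(IsIndexInRange(1, List.Count));  // last valid index
    Writeln(IsIndexInRange(2, List.Count));  // index equals Count
    Writeln(IsIndexInRange(-1, List.Count)); // negative index

    try
      // Index 2 is out of range for a two-item list, so the list raises.
      Writeln(List[2]);
    except
      on E: EArgumentOutOfRangeException do
        Writeln(E.ClassName, ': ', E.Message);
    end;
  finally
    List.Free;
  end;
end.
```

The exception class and message are the same ones the error logger recorded, which is why the top of the call stack says so little about the actual cause.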

The next frame is Fmx.Controls.TControl.PaintChildren:
Code:
procedure TControl.PaintChildren;
var
  I, J      : Integer;
  R         : TRectF;
  AllowPaint: Boolean;
  Control   : TControl;
begin
  if (FScene <> nil) and (ControlsCount > 0) then
    for I := GetFirstVisibleObjectIndex to GetLastVisibleObjectIndex - 1 do
      if FControls[I].Visible then
      begin
        Control := FControls[I];
        if Control.FScene = nil then
          Continue;
        if not Control.FInPaintTo and Control.UpdateRect.IsEmpty then
          Continue;
        if (ClipChildren or SmallSizeControl) and not IntersectRect(Self.UpdateRect, Control.UpdateRect) then
          Continue;
        //
        AllowPaint := False;
        if Control.FInPaintTo then
          AllowPaint := True;
        if not AllowPaint then
        begin
          if Assigned(Control.CustomSceneAddRect) then
            AllowPaint := True
          else
          begin
            R     := UnionRect(Control.GetChildrenRect, Control.UpdateRect);
            for J := 0 to FScene.GetUpdateRectsCount - 1 do
              if IntersectRect(FScene.GetUpdateRect(J), R) then
              begin
                AllowPaint := True;
                Break;
              end;
          end;
        end;

        if AllowPaint then
          Control.PaintInternal;
      end;
end;

A bit better, but still very vague. And this is the method that prompted the question - how to find the exact line where an exception was raised in the above code.

In this case, there is only one TList<T> access that could directly call the TListHelper.CheckItemRange method - the third line of the method body:
Code:
if FControls[I].Visible then

So, the answer to the original question - which line triggered the exception - is right here. But are we any closer to solving the real problem?
-- No. Not even close.

Why?
Just like CheckItemRange, the PaintChildren method is also called often and is not specific enough.

No problem... there are still many lines in the call stack... but... if we take a look at the call's origin, it came from the message loop handler while processing a paint request, and we have no clue where that request originated.

Finding the answer to the right question
If we had additional logs that recorded users' activity, and from which we could tell exactly what the user did just before the paint request was triggered, maybe we could locate the piece of code that caused the issue. But even with that, it may be hard to reproduce and fix the issue.
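
One possible shape for such an activity log is a small buffer of the most recent user actions that an exception reporter could attach to its report. This is only a sketch: LeaveBreadcrumb and MaxBreadcrumbs are hypothetical names, the buffer is not thread-safe, and wiring it into a real error reporter is left out.

```pascal
program BreadcrumbLog;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Generics.Collections;

const
  MaxBreadcrumbs = 3; // hypothetical limit, kept tiny for the demo

var
  Breadcrumbs: TQueue<string>;

// Hypothetical helper: remember only the most recent user actions.
// Not thread-safe; intended to be called from the main thread only.
procedure LeaveBreadcrumb(const Action: string);
begin
  if Breadcrumbs.Count = MaxBreadcrumbs then
    Breadcrumbs.Dequeue; // drop the oldest entry
  Breadcrumbs.Enqueue(FormatDateTime('hh:nn:ss.zzz', Now) + ' ' + Action);
end;

var
  Entry: string;
begin
  Breadcrumbs := TQueue<string>.Create;
  try
    LeaveBreadcrumb('Opened settings screen');
    LeaveBreadcrumb('Tapped Refresh');
    LeaveBreadcrumb('Removed item from list');
    LeaveBreadcrumb('Closed settings screen');

    // Only the last MaxBreadcrumbs actions remain, oldest first.
    for Entry in Breadcrumbs do
      Writeln(Entry);
  finally
    Breadcrumbs.Free;
  end;
end.
```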

Let's go back to the PaintChildren method and how iterating through the controls ended up accessing an out-of-range index. This is a UI operation and, as we all know, UI operations must run in the context of the main UI thread because they are not thread-safe. (Well, there are some bits and pieces of UI code here and there that are thread-safe, but this is not one of them.)

So we have several options that could mess up the indexes:
  1. Touching the UI from a background thread - particularly removing some of the controls from the list
  2. Errors in GetFirstVisibleObjectIndex or GetLastVisibleObjectIndex, as they are virtual and their implementations can potentially return the wrong index
  3. Changing the list of controls within any code called during the iteration - for instance Control.PaintInternal

Now, if you have the previously mentioned activity log then maybe, just maybe, you could inspect the code involved and spot any of the mentioned errors. If you find them, great - problem solved - but what if you cannot? You are still stuck with a crashing application and no solution in sight.
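
The third option is easy to reproduce in isolation, because a Delphi for loop evaluates its bounds only once, before the first iteration. The sketch below uses a plain TList<string> as a stand-in for the list of controls; the item names and the Delete call are illustrative assumptions, not FMX code.

```pascal
program MutateDuringLoop;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Generics.Collections;

var
  Items: TList<string>;
  I: Integer;
begin
  Items := TList<string>.Create;
  try
    Items.Add('Header');
    Items.Add('Body');
    Items.Add('Footer');
    try
      // The upper bound (Count - 1 = 2) is computed once, before the loop runs.
      for I := 0 to Items.Count - 1 do
      begin
        Writeln('Visiting ', Items[I]);
        // Stand-in for code called during the iteration that changes the list.
        if I = 0 then
          Items.Delete(2);
      end;
    except
      // The loop still reaches I = 2, but the list now holds only two items.
      on E: EArgumentOutOfRangeException do
        Writeln('Loop failed: ', E.Message);
    end;
  finally
    Items.Free;
  end;
end.
```

A removal performed from another thread has the same effect on the indexes, except that it can strike at any moment rather than at a predictable point in the loop.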

Desperate times call for desperate measures and a bit of creative thinking
While finding the real issue and fixing it is always the preferable solution, when you run out of options there is always another thing you can do.

The ultimate goal of this bug chasing endeavour is preventing application crashes.

If you cannot locate the piece of code where the issue originates, maybe you can change the piece of code where you know the exception occurs.

Of course, in this case that means making changes in the FMX framework, but since it is not an interface-breaking change, we can just put a modified copy of the FMX.Controls unit in our project and it will be picked up and used instead of the original one.

However, this will not work if your application uses the FMX framework through runtime packages.

The original code accesses the list twice. The first thing to do is to limit that to a single access point.
Code:
if FControls[I].Visible then
begin
  Control := FControls[I];
  ...

can be replaced with
Code:
Control := FControls[I];
//
if Control.Visible then
begin
  ...

The above change does not solve the problem, but it is a step closer.

The original exception is caused by accessing an out-of-range index. What would happen if we added an index check before accessing the list and, in the event of an invalid index, did nothing?

Well, this is painting code. The worst thing that could happen is that some control wouldn't get painted. Since that control is no longer in the controls list, and therefore no longer visible, nothing bad would happen. If, by any remote chance, there is a more serious painting problem behind this, we would get a visual cue of where the error lies - some part of the user interface would not be painted correctly - which is still better than crashing.
Code:
// Skip an index that is no longer valid instead of raising an exception
if I < FControls.Count then
begin
  Control := FControls[I];
  //
  if Control.Visible then
  begin
    ...

Problem solved.

Well, not really.


Background thread touching UI
If the real culprit is code executing in a background thread, then you are out of luck. Protecting the UI from background threads can only be done in the code that runs on those threads, either by synchronizing the parts that access the UI with the main thread or by changing the logic completely so that no UI interaction happens there in the first place.
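
As a rough illustration of synchronizing UI access, a background thread can hand its UI work over to the main thread with TThread.Queue (or TThread.Synchronize if it has to wait for the result). The console sketch below only demonstrates the hand-over itself; the Done flag and the manual CheckSynchronize loop are stand-ins for what an FMX application does as part of its normal message processing.

```pascal
program QueueToMainThread;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

var
  Done: Boolean = False; // only read and written on the main thread
  Worker: TThread;
begin
  Worker := TThread.CreateAnonymousThread(
    procedure
    begin
      // Background work happens here; the UI must not be touched.
      Sleep(100);

      // Hand the UI update over to the main thread.
      TThread.Queue(nil,
        procedure
        begin
          // In an FMX application this is where a control would be
          // added, removed, or updated.
          Writeln('On main thread: ',
            TThread.CurrentThread.ThreadID = MainThreadID);
          Done := True;
        end);
    end);
  Worker.Start;

  // A console program has no message loop, so queued calls are
  // processed manually until the queued procedure has run.
  while not Done do
    CheckSynchronize(10);
end.
```

Passing nil as the first argument keeps the queued call alive even if the worker thread finishes before the main thread gets around to running it.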

Even if a background thread is the cause, the thing with threading issues is that slight variations in code, like changing the original FMX code to prevent Argument out of range, can have an impact on how often threads collide.

You can make things worse, but you can also make them better, reducing the number of crashes - even to the point that you don't experience them at all. That does not mean that the threading issue is fixed, but it is the next best thing you can get - it will be less prominent.

Really desperate measures?
If you are seriously out of options, you can always just wrap the entire PaintChildren method in a try..except block. But, seriously... don't do that.

At some point, you just have to give up.


[/SHOWTOGROUPS]