Friday, October 31, 2014

Comparing code using WPF Perforator and Visual Profiler


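To compare standard INPC (INotifyPropertyChanged) code with non-standard code, there are some steps we can follow. The two property styles being compared are:
        // Standard INPC: the setter raises a change notification.
        private double? _q;
        public double? Qty
        {
          ... NotifyPropertyChanged(this,"Qty");
        }

        // Non-standard: the value goes to intermediate storage and no notification is raised.
        private double? _q;
        public double? Qty
        {
            get { return GetQty(0); }
            set { _bidQtys[0] = value; }  // adding NotifyPropertyChanged(this,"Qty") here brings the speed up to that of standard INPC
        }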
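
As a minimal sketch of the difference between the two property styles (not a frame-rate measurement), the classes below count how many change notifications a listener receives. The names NotifyingRow, SilentRow and InpcDemo are hypothetical, and the event handler merely stands in for a WPF binding.

```csharp
using System;
using System.ComponentModel;

// Standard INPC: the setter stores the value and raises PropertyChanged.
public class NotifyingRow : INotifyPropertyChanged
{
    private double? _qty;
    public event PropertyChangedEventHandler PropertyChanged;

    public double? Qty
    {
        get { return _qty; }
        set
        {
            _qty = value;
            var handler = PropertyChanged;
            if (handler != null) handler(this, new PropertyChangedEventArgs("Qty"));
        }
    }
}

// Non-standard: the value goes to intermediate storage and nobody is told.
public class SilentRow : INotifyPropertyChanged
{
    private readonly double?[] _bidQtys = new double?[1];
    public event PropertyChangedEventHandler PropertyChanged; // never raised

    public double? Qty
    {
        get { return _bidQtys[0]; }
        set { _bidQtys[0] = value; }
    }
}

public static class InpcDemo
{
    // Runs 'update' the given number of times and returns how many
    // PropertyChanged notifications a listener on 'row' received.
    public static int CountNotifications(INotifyPropertyChanged row, Action update, int times)
    {
        int count = 0;
        PropertyChangedEventHandler listener = (sender, e) => count++;
        row.PropertyChanged += listener;
        for (int i = 0; i < times; i++) update();
        row.PropertyChanged -= listener;
        return count;
    }
}
```

Calling InpcDemo.CountNotifications with a NotifyingRow and five updates yields five notifications, while the same call with a SilentRow yields none, so anything relying on PropertyChanged never hears about the new value.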

(1) Generate stressful or human-paced data (every 1ms, 10ms or 250ms) and bind it to the WPF XAML UI:
            // Tick every 1ms; Delay shifts the whole stream by 20 seconds.
            d = Observable.Interval(TimeSpan.FromMilliseconds(1)).Delay(TimeSpan.FromSeconds(20)).Subscribe((ms) =>
            {
                Random r = new Random();
                for (int i = 0; i < FakeData.Count; i++)
                {
                    FakeData[i].Qty = -1000 * r.NextDouble();
                }
            });

(2) Perforator: a higher FRPS (frame rate) / DRAR (Dirty Rect Addition Rate) => data updates reach the screen faster. Key observation: INPC code shows 9x the FRPS of non-standard code that uses intermediate storage.
(3) Visual Profiler CPU %
 DispatcherInvoke -- Dispatcher operations. A high and increasing % => buggy code, e.g. too many timers pushing updates to the UI.

 Rendering Thread -- the unmanaged render pass (brushes, tessellation, calls into DirectX). It finds only the visual elements and draws the whole window at 60 FPS by default. It is popularly called the Composition Thread; graphics acceleration uses the Channel Protocol to communicate with the other thread. A high and increasing % => profile further and feed the data to XPerf, GPUView, WPA, etc. in the Windows Performance Toolkit.

 Layout -- measure/arrange passes. A higher % => variable or computed control sizes, fast-changing text, or a bad GPU/box.

(4) 1ms -- most stressful; 10ms -- near the physical limit (16ms per frame = 60 FRPS); 250ms -- human eyeball; Win8 app "fast" = 100-200ms

Sunday, October 19, 2014

Rx ObserveOn SubscribeOn

In a WPF Window, code that updates the UI looks like the following:

    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            Observable.Interval(TimeSpan.FromSeconds(1)).ObserveOn(Dispatcher).Subscribe((i) => this.Title = i.ToString());
        }
    }
or

 Observable.Interval(TimeSpan.FromSeconds(1)).ObserveOn(this).Subscribe((i) => this.Title = i.ToString());

ObserveOn = deliver notifications to the observer on the given Dispatcher (or scheduler).
SubscribeOn = run the subscribe/unsubscribe logic on the given scheduler, e.g. a background thread or the task pool.
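
A minimal console sketch of the two operators, assuming the System.Reactive NuGet package is referenced. SchedulerDemo is a hypothetical name, and an EventLoopScheduler stands in for the WPF Dispatcher so the sample runs without a window. It records which thread ran the subscribe logic and which thread received the notification.

```csharp
using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading;

public static class SchedulerDemo
{
    // Returns managed thread ids: { caller, subscribe logic, observer callback }.
    public static int[] Run()
    {
        int subscribeThread = -1, observeThread = -1;
        var done = new ManualResetEventSlim(false);

        // The delegate passed to Observable.Create is the "subscribe" logic.
        var source = Observable.Create<int>(observer =>
        {
            subscribeThread = Thread.CurrentThread.ManagedThreadId;
            observer.OnNext(42);
            observer.OnCompleted();
            return () => { }; // nothing to clean up on unsubscribe
        });

        // Assumption: a dedicated event-loop thread plays the role of the UI thread.
        using (var uiLike = new EventLoopScheduler())
        {
            source
                .SubscribeOn(TaskPoolScheduler.Default) // where the subscribe logic runs
                .ObserveOn(uiLike)                      // where OnNext/OnCompleted are delivered
                .Subscribe(
                    value => observeThread = Thread.CurrentThread.ManagedThreadId,
                    () => done.Set());
            done.Wait();
        }

        return new[] { Thread.CurrentThread.ManagedThreadId, subscribeThread, observeThread };
    }
}
```

In a real WPF window the ObserveOn argument would be the Dispatcher, as in the MainWindow code above; SubscribeOn only matters when the subscribe logic itself is expensive or blocking.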

Friday, October 10, 2014

Useful Tools, scripts and Concept


PowerShell set path
===================
# Show the current machine-wide PATH
(Get-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH).Path

# Append a folder to it
$oldPath=(Get-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH).Path

$newPath=$oldPath+';C:\tools\snoop\'

Set-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH -Value $newPath

PowerShell:
===========
get-childitem | select-string "double"    # find a string in the files of the current folder
get-ChildItem -Path c:\log | select-string "error"

Add a user to the local Administrators group:
net localgroup Administrators /Add Domain\UserId

Color Hex converter
===================
http://www.colorschemer.com/online.html

Performance Tools and Concept
==============================

VSync --- the GPU syncs its buffers to the display refresh rate so there is no tearing (one frame overwriting the previous one). If the GPU does not finish rendering before the next VSync, the cause could be the CPU taking too long.
XPerf/WPA --- ETW (CLR GC/ThreadPool/JIT events, context switching, CPU, page faults, disk IO, registry access); the best OS-level logging, with some managed-code coverage.

PerfView --- managed code: Stacks (CPU, Disk IO, GC Heap Alloc, Image Load, Managed Load), Stats (GC, JIT, Event).

Some details on PerfView:
=========================
(1) Memory -> Take Heap Snapshot -> filter to your process -> Force GC -> dump your GC heap => you can compare the dumps before and after closing some part of the UI to see whether memory gets reclaimed.

(2) CPU Stacks: drilling down the path with high CPU % can identify hot code.

(3) Run Command "your.exe" will still collect all ETW data, so you have to drill down to "your.exe". But this will bracket the time you are interested in. The other option is "Collect".

(4) PerfView is for managed code. So before any analysis, use VMMap to check the working-set breakdown: Heap vs. Managed Heap size. Also, the Task Manager private memory column can tell you whether memory stays high and keeps increasing even after some views are closed or after running for a long time.

(5) Diff requires opening two stack viewers, one per dump. If the comparison to baseline shows positive MB => memory increased, or the baseline reclaimed GC heap, between the baseline GC dump and your app's GC dump.
(6) Click on a row in the diff stack => trace back to the root, where you can start analyzing the source code to find who is holding the heap memory.
(7) Wall Clock Analysis: Collect -> check Thread Time -> get 3 Thread Time stack views in the collected data.
(8) CPU_TIME; BLOCK_TIME = waiting for e.g. disk access to come back; PAGE_FAULT = virtual RAM. The context menu's Include Item and the Back button can focus on / exit from CPU_TIME.
(9) Zoom in = select two cells -> Set Time Range
(10) Thread Time (With Tasks) --- charges child-task time to the parent task, so the time shows up against the child's real work, not in the task. Next, Include Item can zoom in to the code in the thread that uses wall-clock time.

Tuesday, September 23, 2014

Using DevExpress Layout Manager to access a View from Shell

When a View (UserControl) cannot be accessed from the Presenter and is buried inside a shared DLL, you can use the Shell Window
 to walk up/down the visual tree. But some docking situations require using DockLayoutManager to track it down.


            var shellWindow = _mySvc.GetApplicationShell() as XpfRibbonShellView;
            if (shellWindow == null) return;

            shellWindow.Dispatcher.Invoke(new Action(() =>
            {
                try
                {
                    // when TheView is docked
                    var wrkSpace = VisualTreeHelpers.FindChild<ContentControl>(shellWindow, "wrkSpace");
                    var gContent = VisualTreeHelpers.FindChild<GroupPaneContentPresenter>(wrkSpace, "PART_Content");
                    var lpItemsCtl = VisualTreeHelpers.FindChild<LayoutItemsControl>(gContent);

                    foreach (var i in lpItemsCtl.Items)
                    {
                        // TheView docked top layer
                        var lp = i as LayoutPanel;
                        if (lp != null && lp.Content is TheView)
                            lp.ShowCaption = true;


                        // TheView Tabbed inside another docked
                        var lg = i as LayoutGroup;
                        if (lg == null) continue;
                        foreach (var i2 in lg.Items)
                        {
                            var tg = i2 as TabbedGroup;
                            if (tg != null)
                            {
                                var layoutItems = tg.GetItems();
                                foreach (var lp2 in layoutItems.Cast<LayoutPanel>().Where(lp2 => lp2.Content is TheView))
                                {
                                    lp2.ShowCaption = true;
                                }
                            }
                            // TheView could end up here if it was closed and then added back while in Tab mode
                            var lp3 = i2 as LayoutPanel;
                            if (lp3 != null && lp3.Content is TheView)
                                lp3.ShowCaption = true;
                        }
                    }

                    // When TheView is floating
                    foreach (var lp in shellWindow.DockLayoutManager.FloatGroups.SelectMany(fg => fg.Items.OfType<LayoutPanel>().Where(lp => lp.Content is TheView)))
                    {
                        lp.ShowCaption = true;
                    }

                    // All auto hide layout panel need to be handled here , not just TheView.
                    foreach (var lp in shellWindow.DockLayoutManager.AutoHideGroups.SelectMany(ahGroup => ahGroup.Items).OfType<LayoutPanel>())
                    {
                        lp.ShowCaption = true;
                    }
                }
                catch (Exception ex)
                {
                    _log.Error(ex);
                }
            }));

Sunday, September 21, 2014

Low Latency Programming


It looks like their thinking is influenced by Peter Lowery's talk on Friday and possibly by
 Martin Fowler's and Martin Thompson's writing about low latency.

I think there are two things highly relevant to Insight-Desktop (Carol, Hai, Daniel and Alec were also at Peter's talk, so please comment):

(1) Journaling, aka EventSourcing
   The idea is to log every input event and replay the events to help debug PROD issues in DEV. Obviously, we need a low-latency logger.

(2) Profile Market Watch.
   I think this is where market data comes in. We should profile for GC pauses, lock contention and caching.
   This helps us understand the MVVM + Presenter pattern better in terms of low latency.

Here are the key points from their talk/writing:

Set up a correct performance test --- theories are most likely wrong.
GC-Free --- the biggest performance cost is the GC pause.
Lock-Free --- locks cause context switching and clear cache lines.
Cache Friendly --- L3 cache is memory shared across cores.
EventSourcing --- replay input events to debug PROD in DEV, instead of analysing log files.
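
The EventSourcing point can be illustrated with a small sketch. It rests on several assumptions: QtyEvent, Book and Journal are hypothetical names, and an in-memory list stands in for the sequential on-disk journal. The key property is that state changes only through Apply, so replaying the recorded events rebuilds the same state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input event: a quantity update for a symbol.
public struct QtyEvent
{
    public readonly string Symbol;
    public readonly double Qty;
    public QtyEvent(string symbol, double qty) { Symbol = symbol; Qty = qty; }
}

// State is mutated only by Apply, which makes replay deterministic.
public class Book
{
    private readonly Dictionary<string, double> _qty = new Dictionary<string, double>();

    public void Apply(QtyEvent e) { _qty[e.Symbol] = e.Qty; }

    public double Get(string symbol)
    {
        double q;
        return _qty.TryGetValue(symbol, out q) ? q : 0.0;
    }
}

public class Journal
{
    // Assumption: a real journal would append to a sequential file instead.
    private readonly List<QtyEvent> _events = new List<QtyEvent>();

    // Record the input event first, then let it change the live state.
    public void Handle(Book book, QtyEvent e)
    {
        _events.Add(e);
        book.Apply(e);
    }

    // Rebuild state from scratch, e.g. in DEV from a journal captured in PROD.
    public Book Replay()
    {
        var rebuilt = new Book();
        foreach (var e in _events) rebuilt.Apply(e);
        return rebuilt;
    }
}
```

Stepping through Replay in a debugger shows the exact sequence of inputs that led to a bad state, which is the advantage over reading log files after the fact.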

Actionable Items:

(1) Performance Benchmark Test App:
    A WPF unit-testing simulator can be built for Logging, Journaling, Rx vs. .NET events, and Market Data < 100 microseconds.

(2) Logging Improvement:
    A separate email thread has already started on writing a logger using a RingBuffer.

    



Shared Memory (L3 cache), Cache-Friendly Collection
Thread Affinity and Isolation
Queue (LinkedList, Array), Ring Buffer, In Memory

Concurrency
Single Threaded, Fx 4.5 Async-Await
Journaler, Sequential Disk

Array is cache friendly.

64-bit cache key, Concurrent Map/Segments, 1000 segments. The producer never blocks waiting for the consumer.

 Loop unrolling, lock coarsening, inlining

Queues have fundamental issues; a Ring Buffer is better, but on the desktop it cannot have 10M
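
For reference, here is a minimal sketch of an array-backed ring buffer for exactly one producer thread and one consumer thread. SpscRingBuffer is a hypothetical name; this is not the LMAX Disruptor, only the basic idea of a pre-allocated array with two sequence counters, and it assumes Fx 4.5 for the Volatile class.

```csharp
using System;
using System.Threading;

// Fixed-size queue over a pre-allocated array: no per-item allocation, no locks.
// Safe only with a single producer thread and a single consumer thread.
public sealed class SpscRingBuffer<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _head; // next slot to read; written only by the consumer
    private long _tail; // next slot to write; written only by the producer

    public SpscRingBuffer(int capacityPowerOfTwo)
    {
        if (capacityPowerOfTwo < 1 || (capacityPowerOfTwo & (capacityPowerOfTwo - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two.");
        _slots = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1; // index = sequence & mask, cheaper than %
    }

    // Producer side. Returns false instead of blocking when the buffer is full.
    public bool TryEnqueue(T item)
    {
        long tail = _tail;
        if (tail - Volatile.Read(ref _head) == _slots.Length) return false;
        _slots[(int)(tail & _mask)] = item;
        Volatile.Write(ref _tail, tail + 1); // publish only after the slot is written
        return true;
    }

    // Consumer side. Returns false when the buffer is empty.
    public bool TryDequeue(out T item)
    {
        long head = _head;
        if (head == Volatile.Read(ref _tail))
        {
            item = default(T);
            return false;
        }
        int index = (int)(head & _mask);
        item = _slots[index];
        _slots[index] = default(T); // drop the reference so the GC can collect it
        Volatile.Write(ref _head, head + 1); // hand the slot back to the producer
        return true;
    }
}
```

Because the array is allocated once and reused, the buffer is cache friendly and adds no GC pressure per item; the trade-off is the fixed capacity, which is where the desktop memory limit comes in.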

Network: 10 microsecond local hop, 10GigE, FPGA market data


http://mechanical-sympathy.blogspot.hk/

http://martinfowler.com/articles/lmax.html - even Martin Fowler has done a review of it

http://lmax-exchange.github.io/disruptor/ - all the source code is on github

http://www.infoq.com/presentations/LMAX - presentation on the architecture


Tuesday, September 16, 2014

How to visualize UI hanging using Concurrency Visualizer in VS 2013

Fx 4.5 async-await lets the UI thread spend 80%+ of its time on UI processing, while the Fx 4.0 BlockingCollection approach allows only 20% or less.

The Ring Buffer looks better, with UI processing at more like 40%.

Sunday, September 14, 2014

Classical Async Pattern and "AsyncRollingFileAppender" using BlockingCollection in Log4Net

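        Action a = () => { /* do some work */ };
        a.BeginInvoke(CB, a);   // runs the work on a thread-pool thread; CB is called when it completes

        void CB(IAsyncResult ar)
        {
            Action a = ar.AsyncState as Action;
            a.EndInvoke(ar);    // pairs with BeginInvoke; rethrows any exception thrown by the work
        }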


(1) It seems UI async cannot be implemented in Fx 4.0.
    We have to wait for Fx 4.5 async-await to have a non-blocking UI.
(2) AsyncRollingFile is "async-like": it still blocks the UI slightly. In my test it
    blocked the UI for 9 seconds, and then writing the file took 40 seconds to complete.
    It is definitely faster than the classic Log4net RollingFile, which blocks the UI for the entire 40 seconds.
(3) Log4Net achieves async by using a Task to offload logging from a
    queue-like buffer. I also tried TaskCompletionSource, and it has similar "short-blocking" async behavior.
(4) All these async-like approaches can lose 20 seconds of data during the delayed write to files, since the app can crash.
    That is exactly when we really need the log, to see why the app crashed.
(5) Fx 4.0 already has a solution for buffer overflow in Log4NetAsync --- instead of throwing away
    log entries, we can block logging. So we slow down but do not lose data.
    Specifically, I think we can implement IProducerConsumerCollection using a RingBuffer (so 2x faster writing to files)
    and feed it to a BlockingCollection with capacity 1000.
    I tried BlockingCollection with capacity 10 or 1000 and the default ConcurrentQueue; it ended up blocking the UI for 20 seconds with
    70 seconds to write the files, so much slower.
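
The "block instead of drop" idea in (5) can be sketched with a bounded BlockingCollection on its own, without log4net. BoundedLogQueue and its sink delegate are hypothetical names; the sink stands in for the actual file write.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class BoundedLogQueue : IDisposable
{
    private readonly BlockingCollection<string> _pending;
    private readonly Action<string> _sink;
    private readonly Task _writer;

    public BoundedLogQueue(int capacity, Action<string> sink)
    {
        _pending = new BlockingCollection<string>(capacity); // ConcurrentQueue by default
        _sink = sink;
        // LongRunning hints that this loop should get its own thread.
        _writer = Task.Factory.StartNew(Drain, TaskCreationOptions.LongRunning);
    }

    // Blocks the caller while the queue is full, so no message is thrown away.
    public void Log(string message)
    {
        _pending.Add(message);
    }

    private void Drain()
    {
        // Waits without spinning until a message arrives; the loop ends once
        // CompleteAdding has been called and the queue is empty.
        foreach (var message in _pending.GetConsumingEnumerable())
            _sink(message);
    }

    // Stops accepting messages and waits until everything queued is written.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _writer.Wait();
        _pending.Dispose();
    }
}
```

A smaller capacity makes the producer (the UI thread in the tests above) block sooner, while a larger one trades memory for fewer stalls. Disposing on shutdown flushes what is queued, though a hard crash can still lose the unwritten tail.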

            // Stopwatch lives in System.Diagnostics.
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int i = 0; i < Int32.MaxValue / 1000; i++)
            {
                // your Log.Info call goes here
            }
            sw.Stop();
            MessageBox.Show(sw.ElapsedMilliseconds.ToString());


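using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using log4net.Appender;
using log4net.Core;

namespace Log4Net.Async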
{
    public class AsyncRollingFileAppender : RollingFileAppender
    {
        BlockingCollection<LoggingEvent> pendingAppends = new BlockingCollection<LoggingEvent>(10);

        Task t ;
        public override void ActivateOptions()
        {
            base.ActivateOptions();
             t = new Task(AppendLoggingEvents,TaskCreationOptions.PreferFairness);
             t.Start();
        }

        protected override void Append(LoggingEvent[] loggingEvents)
        {
            Array.ForEach(loggingEvents, Append);
        }
       
        protected override void Append(LoggingEvent loggingEvent)
        {

            Task.Factory.StartNew(() =>
            {
                
                if (FilterEvent(loggingEvent))
                {
                    pendingAppends.Add(loggingEvent);
                }
            });
        }

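        // Runs on the background task: writes queued events to the file one at a time.
        private void AppendLoggingEvents()
        {
            // GetConsumingEnumerable blocks until an event is available instead of
            // spinning on TryTake. Nothing calls CompleteAdding, so this loop runs
            // for the life of the process.
            foreach (LoggingEvent loggingEventToAppend in pendingAppends.GetConsumingEnumerable())
            {
                if (loggingEventToAppend == null)
                {
                    continue;
                }

                try
                {
                    base.Append(loggingEventToAppend);
                }
                catch
                {
                    // Swallowed on purpose: a failed write must not kill the logging task.
                }
            }
        }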
    }
}