To compare standard INPC code against non-standard code (which stores values in intermediate storage instead of raising PropertyChanged), there are some steps we can follow.

The two property implementations being compared (a fuller sketch of the standard INPC pattern appears at the end of this post):

// Standard INPC property
private double? _q;
public double? Qty
{
    ...
    NotifyPropertyChanged(this, "Qty");
}

// Non-standard property backed by intermediate storage
private double? _q;
public double? Qty
{
    get { return GetQty(0); }
    set { _bidQtys[0] = value; } // adding NotifyPropertyChanged(this, "Qty") here brings speed back up to standard INPC
}

(1) Generate stressful/human-rate data (1ms, 10ms, 250ms) and bind it to the WPF XAML UI:

d = Observable.Interval(TimeSpan.FromMilliseconds(1))
    .Delay(TimeSpan.FromSeconds(20))
    .Subscribe((ms) =>
    {
        Random r = new Random();
        for (int i = 0; i < FakeData.Count; i++)
        {
            FakeData[i].Qty = -1000 * r.NextDouble();
        }
    });

(2) Perforator: higher FRPS/DRAR => data updates faster. Key observation: INPC gets roughly 9x the FRPS of the non-standard code that uses intermediate storage.

(3) Visual Profiler CPU %:
- DispatcherInvoke: dispatcher operations. High and increasing => buggy code, e.g. too many timers pushing the UI.
- Rendering thread: the unmanaged render pass (brushes, tessellation, DirectX calls). It finds only the dirty visual elements but draws the whole window, at 60 FPS by default; it is popularly called the composition thread, and graphics acceleration uses a channel protocol to communicate with the other threads. High and increasing % => profile further with XPerf, GPUView, WPA, etc. in the Windows Performance Toolkit.
- Layout: measure/arrange passes. Higher => variable/computed control sizes, fast-changing text, or a weak GPU/box.

(4) Update intervals: 1ms is the most stressful; 10ms is near the physical limit (16ms per frame = 60 FPS); 250ms matches the human eyeball; Win8 apps treat "fast" as 100-200ms.
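For reference, here is a minimal sketch of the standard INPC property the comparison assumes. QuoteViewModel is an illustrative name, and NotifyPropertyChanged is modeled here as a small helper that raises PropertyChanged directly; the post's actual helper may differ.

// Minimal sketch of the standard INPC pattern (illustrative, not the post's actual types).
using System.ComponentModel;

public class QuoteViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private double? _q;
    public double? Qty
    {
        get { return _q; }
        set
        {
            if (_q == value) return;
            _q = value;
            NotifyPropertyChanged(this, "Qty"); // the binding re-reads Qty and invalidates only this element
        }
    }

    private void NotifyPropertyChanged(object sender, string propertyName)
    {
        var handler = PropertyChanged;
        if (handler != null) handler(sender, new PropertyChangedEventArgs(propertyName));
    }
}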
Friday, October 31, 2014
Comparing code using WPF Perforator and Visual Profiler
Sunday, October 19, 2014
Rx ObserveOn SubscribeOn
In a WPF Window, code that updates the UI looks like the following:

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
        Observable.Interval(TimeSpan.FromSeconds(1))
            .ObserveOn(Dispatcher)
            .Subscribe((i) => this.Title = i.ToString());
    }
}

or

Observable.Interval(TimeSpan.FromSeconds(1)).ObserveOn(this).Subscribe((i) => this.Title = i.ToString());

ObserveOn = notify observers on a dispatcher (or scheduler).
SubscribeOn = subscribe/unsubscribe observers on a scheduler, i.e. where the background/task pool work will run.
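A sketch that uses both, assuming the System.Reactive.Windows.Threading assembly is referenced (it supplies the ObserveOn(this) overload used above): subscription/disposal work is scheduled on the task pool, while notifications are marshalled back to the window's Dispatcher.

// Sketch: SubscribeOn controls where Subscribe/Dispose run; ObserveOn controls where OnNext is delivered.
using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Windows;

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();

        Observable.Interval(TimeSpan.FromSeconds(1))
            .SubscribeOn(TaskPoolScheduler.Default) // subscribe/unsubscribe on the task pool
            .ObserveOn(this)                        // deliver OnNext on this window's Dispatcher
            .Subscribe(i => this.Title = i.ToString());
    }
}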
Friday, October 10, 2014
Useful Tools, Scripts and Concepts
PowerShell: set PATH
====================
(Get-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH).Path
$oldPath = (Get-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH).Path
$newPath = $oldPath + ';C:\tools\snoop\'
Set-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Environment' -Name PATH -Value $newPath

PowerShell: misc
================
get-childitem | select-string "double"          (find a string)
get-ChildItem -Path c:\log | select-string "error"
Add a user to the Administrators group:
net localgroup Administrators Domain\UserId /add

Color hex converter
===================
http://www.colorschemer.com/online.html

Performance tools and concepts
==============================
VSync --- The GPU syncs buffer swaps with the refresh rate so there is no tearing (one frame overwriting the previous one). If the GPU does not finish rendering before the next VSync, the CPU could be taking too long.
XPerf/WPA --- ETW (CLR GC/ThreadPool/JIT events, context switching, CPU, page faults, disk IO, registry access); the best OS-level logging, with some managed-code coverage.
PerfView --- Managed code: Stacks (CPU, Disk IO, GC Heap Alloc, Image Load, Managed Load) and Stats (GC, JIT, Events).

Some details on PerfView
========================
(1) Memory -> Take Snapshot of Heap -> filter to your process -> Force GC -> dump your GC heap. You can compare dumps taken before and after closing some part of the UI to see if memory gets reclaimed (a small in-code check that complements this is sketched at the end of this post).
(2) CPU stacks: drilling down the high CPU % paths identifies hot code.
(3) "Run" with a command of "your.exe" still collects all ETW data, so you have to drill down to "your.exe", but it brackets the time you are interested in. The alternative is "Collect".
(4) PerfView is for managed code, so before any analysis use VMMap to check the working-set breakdown: native heap vs. managed heap size. Also, the Task Manager private memory column can tell you whether memory stays high and keeps increasing even after some views are closed or after running for a long time.
(5) Diffing requires opening two stack viewers, one for each dump: a baseline GC heap dump taken after a forced GC, and your app's later GC dump. If the comparison to the baseline shows a positive MB figure, memory has increased.
(6) Clicking a row in the diff stack traces back to the root, where you can start analyzing source code for who is holding heap memory.
(7) Wall-clock analysis: Collect -> check Thread Time -> you get three Thread Time view stacks in the collected data.
(8) CPU_TIME, BLOCK_TIME (waiting, e.g. for disk access to come back), PAGE_FAULT (virtual RAM). The context menu's Include Item and the Back button let you focus on and exit from CPU_TIME.
(9) Zoom in = select two cells -> Set Time Range.
(10) Thread Time (with Tasks) charges child-task time to the parent task, so the time shown is the child's real work rather than time spent inside the task itself. Then Include Item can zoom in to the code in the thread that uses the wall-clock time.
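The in-code check mentioned in point (1), as a sketch (not part of the PerfView workflow itself): force a full collection and read the managed heap size before and after closing a view, to get a quick signal before taking full heap dumps. CloseSomeView is a hypothetical placeholder.

// Sketch: quick managed-heap check around closing a view.
using System;

public static class MemoryCheck
{
    // Forces a full collection (including finalizers) and returns the managed heap size in bytes.
    public static long CollectAndMeasure()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(forceFullCollection: true);
    }
}

// Usage (hypothetical):
// long before = MemoryCheck.CollectAndMeasure();
// CloseSomeView();
// long after = MemoryCheck.CollectAndMeasure();
// Console.WriteLine("Reclaimed: {0:N0} bytes", before - after);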
Tuesday, September 23, 2014
Using the DevExpress Layout Manager to access a View from the Shell
When a View (UserControl) cannot be accessed from its Presenter and is buried inside a shared DLL, you can use the Shell Window to walk up/down the visual tree. Some docking situations, however, require going through the DockLayoutManager to track it down. (VisualTreeHelpers.FindChild below is a project helper; a sketch of what it might look like appears after the code.)

var shellWindow = _mySvc.GetApplicationShell() as XpfRibbonShellView;
if (shellWindow == null) return;
shellWindow.Dispatcher.Invoke(new Action(() =>
{
    try
    {
        // When TheView is docked
        var wrkSpace = VisualTreeHelpers.FindChild<ContentControl>(shellWindow, "wrkSpace");
        var gContent = VisualTreeHelpers.FindChild<GroupPaneContentPresenter>(wrkSpace, "PART_Content");
        var lpItemsCtl = VisualTreeHelpers.FindChild<LayoutItemsControl>(gContent);
        foreach (var i in lpItemsCtl.Items)
        {
            // TheView docked at the top layer
            var lp = i as LayoutPanel;
            if (lp != null && lp.Content is TheView) lp.ShowCaption = true;

            // TheView tabbed inside another docked group
            var lg = i as LayoutGroup;
            if (lg == null) continue;
            foreach (var i2 in lg.Items)
            {
                var tg = i2 as TabbedGroup;
                if (tg != null)
                {
                    var layoutItems = tg.GetItems();
                    foreach (var lp2 in layoutItems.Cast<LayoutPanel>().Where(lp2 => lp2.Content is TheView))
                    {
                        lp2.ShowCaption = true;
                    }
                }

                // TheView can end up here if it is closed and then added back while in tab mode
                var lp3 = i2 as LayoutPanel;
                if (lp3 != null && lp3.Content is TheView) lp3.ShowCaption = true;
            }
        }

        // When TheView is floating
        foreach (var lp in shellWindow.DockLayoutManager.FloatGroups.SelectMany(fg => fg.Items.OfType<LayoutPanel>().Where(lp => lp.Content is TheView)))
        {
            lp.ShowCaption = true;
        }

        // All auto-hide layout panels need to be handled here, not just TheView.
        foreach (var lp in shellWindow.DockLayoutManager.AutoHideGroups.SelectMany(ahGroup => ahGroup.Items).OfType<LayoutPanel>())
        {
            lp.ShowCaption = true;
        }
    }
    catch (Exception ex)
    {
        _log.Error(ex);
    }
}));
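VisualTreeHelpers.FindChild is project-specific, not a WPF or DevExpress API. A minimal sketch of what such a visual-tree walker might look like, assuming it matches by type and optionally by FrameworkElement.Name (which fits the two call shapes used above):

// Sketch of a FindChild-style visual-tree walker (assumed shape, not the actual implementation).
using System.Windows;
using System.Windows.Media;

public static class VisualTreeHelpers
{
    public static T FindChild<T>(DependencyObject parent, string childName = null) where T : DependencyObject
    {
        if (parent == null) return null;

        int count = VisualTreeHelper.GetChildrenCount(parent);
        for (int i = 0; i < count; i++)
        {
            var child = VisualTreeHelper.GetChild(parent, i);

            var typed = child as T;
            if (typed != null)
            {
                // Match by name only when a name was requested.
                var fe = child as FrameworkElement;
                if (childName == null || (fe != null && fe.Name == childName))
                    return typed;
            }

            // Recurse into this child's subtree.
            var result = FindChild<T>(child, childName);
            if (result != null) return result;
        }
        return null;
    }
}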
Sunday, September 21, 2014
Low Latency Programming
It looks like their thinking is influenced by Peter Lowery's talk on Friday and possibly by Martin Fowler's and Martin Thompson's writing about low latency. I think there are two things highly relevant to Insight-Desktop (Carol, Hai, Daniel and Alec were also in Peter's talk, so please comment):

(1) Journaling, aka EventSourcing. The idea is to log every input event and replay them to help debug PROD issues in DEV. Obviously, we need a low-latency logger.
(2) Profile Market Watch. I think market data comes in here. We should profile for GC pauses, lock contention and caching. This helps us understand the MVVM + Presenter pattern better in terms of low latency.

Here are the key points from their talk/writing:
Set up a correct performance test --- theories are most likely wrong.
GC-free --- the biggest performance cost is GC pause.
Lock-free --- locks cause context switching and clear cache lines.
Cache-friendly --- the L3 cache is memory shared across cores.
EventSourcing --- replay input events to debug PROD in DEV, instead of analysing log files.

Actionable items:
(1) Performance benchmark test app: a WPF unit-testing simulator can be built for logging, journaling, Rx vs. .NET events, market data < 100 microseconds.
(2) Logging improvement: a separate email thread has already started on writing a logger using a RingBuffer (a minimal single-producer/single-consumer ring buffer is sketched at the end of this post).

More notes from the talk/writing:
Shared memory (L3 cache), cache-friendly collections
Thread affinity and isolation
Queue (LinkedList, Array), ring buffer, in-memory concurrency
Single-threaded, Fx 4.5 async-await journaler, sequential disk
Arrays are cache friendly; 64-bit cache key, concurrent map/segments, 1000 segments
Producer never blocks to wait for the consumer
Loop unrolling, lock-coarsening, inlining
Queues have fundamental issues; a ring buffer is better, but on the desktop we cannot have 10M
Network: 10 microsecond local hop, 10GigE, FPGA market data

http://mechanical-sympathy.blogspot.hk/
http://martinfowler.com/articles/lmax.html - even Martin Fowler has done a review of it
http://lmax-exchange.github.io/disruptor/ - all the source code is on GitHub
http://www.infoq.com/presentations/LMAX - presentation on the architecture
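As promised under the logging improvement item, a minimal sketch of a bounded single-producer/single-consumer ring buffer. This is illustrative only, not the Disruptor and not the team's actual RingBuffer logger; it assumes exactly one producer thread and one consumer thread.

// Minimal SPSC ring buffer sketch. Capacity must be a power of two so wrapping is a cheap bitmask.
using System;
using System.Threading;

public sealed class SpscRingBuffer<T>
{
    private readonly T[] _items;
    private readonly int _mask;
    private long _head; // next slot to read  (consumer-owned)
    private long _tail; // next slot to write (producer-owned)

    public SpscRingBuffer(int capacityPowerOfTwo)
    {
        if (capacityPowerOfTwo <= 0 || (capacityPowerOfTwo & (capacityPowerOfTwo - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two.");
        _items = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    // Producer thread only. Returns false when full, so the caller can spin, block or drop.
    public bool TryEnqueue(T item)
    {
        long tail = _tail;
        if (tail - Volatile.Read(ref _head) >= _items.Length) return false; // full
        _items[tail & _mask] = item;
        Volatile.Write(ref _tail, tail + 1); // publish only after the slot is written
        return true;
    }

    // Consumer thread only. Returns false when empty.
    public bool TryDequeue(out T item)
    {
        long head = _head;
        if (head >= Volatile.Read(ref _tail)) { item = default(T); return false; } // empty
        item = _items[head & _mask];
        Volatile.Write(ref _head, head + 1); // free the slot for the producer
        return true;
    }
}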
Tuesday, September 16, 2014
How to visualize UI hangs using the Concurrency Visualizer in VS 2013
Sunday, September 14, 2014
Classical Async Pattern and "AsyncRollingFileAppender" using BlockingCollection in Log4Net
The classical async pattern:

Action a = () => { /* do some work */ };
a.BeginInvoke(CB, a);

void CB(IAsyncResult ar)
{
    Action a = ar.AsyncState as Action;
    a.EndInvoke(ar);
}

(1) It seems a truly non-blocking UI cannot be implemented with async in Fx 4.0; we have to wait for Fx 4.5 async-await.
(2) AsyncRollingFileAppender is "async-like" and still blocks the UI slightly. In my test it blocked the UI for 9 seconds, then took 40 seconds to finish writing the file. It is definitely faster than the classic Log4net RollingFileAppender, which blocks the UI for the entire 40 seconds.
(3) Log4Net achieves the async behavior here by using a Task to offload logging from a queue-like buffer. I also tried TaskCompletionSource and it has similar "short-blocking" async behavior.
(4) All of these async-alike approaches can lose up to 20 seconds of data during the delayed write to files, because the app can crash first, and that is exactly when we really need the log to tell us why the app crashed.
(5) Fx 4.0 already has a way to deal with buffer overflow in Log4Net.Async: instead of throwing logging away, we can block logging, so we slow down but do not lose data. Specifically, I think we can implement IProducerConsumerCollection using a RingBuffer (roughly 2x faster file writing) and feed it to a BlockingCollection with capacity 1000. I tried BlockingCollection with capacity 10 and 1000 over the default ConcurrentQueue, and it ended up blocking the UI for 20 seconds with a 70-second write to files, so it is much slower. (A small sketch of the blocking-instead-of-dropping idea follows the appender code below.)

The timing harness:

Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < Int32.MaxValue / 1000; i++)
{
    // your Log.Info call here
}
sw.Stop();
MessageBox.Show(sw.ElapsedMilliseconds.ToString());

namespace Log4Net.Async
{
    public class AsyncRollingFileAppender : RollingFileAppender
    {
        BlockingCollection<LoggingEvent> pendingAppends = new BlockingCollection<LoggingEvent>(10);
        Task t;

        public override void ActivateOptions()
        {
            base.ActivateOptions();
            t = new Task(AppendLoggingEvents, TaskCreationOptions.PreferFairness);
            t.Start();
        }

        protected override void Append(LoggingEvent[] loggingEvents)
        {
            Array.ForEach(loggingEvents, Append);
        }

        protected override void Append(LoggingEvent loggingEvent)
        {
            // Hand the event off to the background buffer instead of writing inline.
            Task.Factory.StartNew(() =>
            {
                if (FilterEvent(loggingEvent))
                {
                    pendingAppends.Add(loggingEvent);
                }
            });
        }

        private void AppendLoggingEvents()
        {
            LoggingEvent loggingEventToAppend;
            while (true)
            {
                // Wait until an event is available, then write it with the base appender.
                while (!pendingAppends.TryTake(out loggingEventToAppend)) { }
                if (loggingEventToAppend == null) { continue; }
                try { base.Append(loggingEventToAppend); } catch { }
            }

            // Drain any remaining events.
            while (pendingAppends.TryTake(out loggingEventToAppend))
            {
                try { base.Append(loggingEventToAppend); } catch { }
            }
        }
    }
}
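A minimal sketch of the blocking-instead-of-dropping idea from point (5): a bounded BlockingCollection whose Add blocks the producer when the buffer is full, so log data is never discarded. Illustrative only, not the Log4Net.Async implementation; BlockingLogBuffer and writeToFile are hypothetical names.

// Sketch: bounded producer/consumer where a full buffer blocks the producer instead of dropping messages.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class BlockingLogBuffer
{
    // Capacity 1000: Add() blocks when 1000 messages are pending.
    private readonly BlockingCollection<string> _pending = new BlockingCollection<string>(1000);
    private readonly Task _writer;

    public BlockingLogBuffer(Action<string> writeToFile)
    {
        // A single long-running consumer drains the buffer and writes to disk.
        _writer = Task.Factory.StartNew(() =>
        {
            foreach (var message in _pending.GetConsumingEnumerable())
            {
                writeToFile(message);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called by application threads; blocks (slows the app) when the buffer is full.
    public void Log(string message)
    {
        _pending.Add(message);
    }

    // Stops accepting new messages and lets the writer finish the backlog.
    public void Shutdown()
    {
        _pending.CompleteAdding();
        _writer.Wait();
    }
}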