Code for Parallelism Features Tour

Fri, February 5, 2010, 10:54 PM under ParallelComputing
Last year I linked to a screencast that shows off many VS2010 features delivered by the Parallel Computing team.

There have been requests for the code used to demonstrate the features. As with all my screencasts, you can see all the code in action, so you could simply type it in. To save you the typing, though, you may download the two files with the demo code here: MM.cs and Program.cs. HTH.

Managed code and the Shell – Do?

Sun, January 31, 2010, 06:24 PM under dotNET
Back in 2006 I wrote a blog post titled: Managed code and the Shell – Don't!. Please visit that post to see why that advice was given.

The crux of the issue has been addressed in the latest CLR via In-Process Side-by-Side Execution. In addition to the MSDN documentation I just linked, there is also an MSDN article on the topic: In-Process Side-by-Side.

Now, even though the major technical impediment seems to be removed, I don’t know if Microsoft is now officially supporting managed extensions to the shell. Either way, I noticed a CodePlex project that is marching ahead to enable exactly that: Managed Mini Shell Extension Framework. Not much activity there, but maybe it will grow once .NET 4 is released...

Dev Lead Job opening on my team

Sun, January 3, 2010, 11:00 PM under ParallelComputing
My product unit (Parallel Developer Tools) is hiring a developer lead here in Redmond. This position is specifically on the debugger feature team that I "Program Manage".

So, if you have what it takes and don't mind working with me every single day, click on the link below to read more and apply. You can also send me your resume and I'll make sure it gets to the right place and that you get a prompt response.

There is a very long job description on the Microsoft careers site under job id 707388.

Here is an excerpt from the middle (emphasis mine):
"...
We are in search of a talented and innovative senior lead software design engineer to own development of the debugging tools for data parallelism (including GP-GPU) and HPC Clusters being built by our team.

To be successful, you need to be able to guide careers, design and architect well, communicate and share the best development practices, collaborate with your peers, contribute to the vision, and code significant portions of the solution. We want to hear from you if you're passionate about making your mark in the parallel development space, improving people, and building world-class tools."

Responsibilities include:
Managing a team of senior and junior developers
Design and coding high-quality software
..."

For the full background story, requirements, qualifications and responsibilities please visit the official page.

Best of "The Moth" 2009

Fri, January 1, 2010, 01:01 AM under Personal
Not wanting to break the tradition (2004, 2005, 2006, 2007, 2008), below are some blog posts I picked from my blogging last year. As you can see by comparing with the links above, 2009 marks my lowest output yet with only 64 posts, but hopefully the quality has not been lowered ;-)
1. Parallel Computing was a strong focus of course. You can find links to most of that content aggregated in the post where I shared my entire parallelism session. Related to that was the link to the screencast I shared of the Parallel Computing Features Tour.

2. Parallel Debugging is obviously part of the parallel computing links above, but I created more in depth content around that area of Visual Studio 2010 since it is the one I directly own. I aggregated all the links to that content in my post: Parallel Debugging.

3. High Performance Computing through clusters is an area I'll be focusing on more next year (besides parallelism on a single node on the client captured above) and I started introducing the topic on my blog this year. Read the (currently) 6 posts bottom up from my category on HPC.

4. Windows 7 Task Manager. In April I shared a screenshot which was the most "borrowed" item from my blog (I should have watermarked it ;-)

5. Windows Phone non-support in VS2010. Did my bit to spread clarification of the story.

6. Window positions in Visual Studio is a long post, but one that I strongly advise all VS users to read and benefit from.

7. Bug Triage gives you a glimpse into one thing all (Microsoft) product teams do.


If you haven't yet, you can subscribe via one of the options on the left. Either way, thank you for staying tuned… Happy New Year!

Bug Triage

Sun, December 6, 2009, 07:56 PM under SoftwareProcess
In this blog post brain dump, I'll attempt to describe the process my team tries to follow when dealing with new bug reports (specifically, code defect reports). This is not official Microsoft policy, just the way we do things… if you do things differently and want to share, you can do so at the bottom in the comments (or on your blog).

Feature Triage Team
A subset of the feature crew, the triage team (which has representation from the PM, Dev and QA disciplines), looks at all unassigned bugs at regular intervals. This can be weekly or daily (or some other frequency), depending on which part of the product cycle we are in and what the untriaged bug load looks like. They discuss each bug, considering the evidence, and decide whether the bug goes from Not Yet Assigned to Assigned (plus the name of the DEV who will fix it) or from Not Yet Assigned to Resolved (which means it gets assigned back to the requestor for closure, or for further debate if they were not present at the triage meeting). Close to critical milestones, the feature triage team needs to further justify the bugs they take to additional higher-level triage teams.

Bug Opened = Not Yet Assigned
Someone (typically an SDET from the QA team) creates the bug item (e.g. in TFS), ensuring they populate all the relevant fields including: Title, Description, Repro Steps (including the Actual Result at the end of the steps), attachments of code and/or screenshots, Build number that they observed the issue in, regression details if applicable, how it was found, if a test case exists or needs to be created etc. They also indicate their opinion on the Priority and Severity. The bug status is left as Not Yet Assigned.
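To make the field list above concrete, here is a minimal Python sketch of a bug record that reports which fields are still empty. The class and field names are hypothetical and do not mirror the real TFS work item schema; which fields count as required is also an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """Hypothetical bug record; field names are illustrative, not the TFS schema."""
    title: str = ""
    description: str = ""
    repro_steps: list = field(default_factory=list)
    actual_result: str = ""
    build_number: str = ""
    priority: int = -1   # opener's opinion, 0..4; -1 means "not set"
    severity: int = -1   # opener's opinion, 0..4; -1 means "not set"
    attachments: list = field(default_factory=list)  # optional code/screenshots
    regression_details: str = ""                     # only if applicable
    how_found: str = ""
    status: str = "Not Yet Assigned"                 # new bugs start untriaged

    def missing_fields(self):
        """Return the names of fields (assumed required) that are still empty."""
        required = {
            "title": self.title,
            "description": self.description,
            "repro_steps": self.repro_steps,
            "actual_result": self.actual_result,
            "build_number": self.build_number,
            "how_found": self.how_found,
        }
        missing = [name for name, value in required.items() if not value]
        if self.priority not in range(5):
            missing.append("priority")
        if self.severity not in range(5):
            missing.append("severity")
        return missing
```

A check like this could run before the bug is saved, so that triage does not have to bounce it back for missing information.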

"Issue" versus "Fix for issue"
The solution to some bugs is easy to determine, e.g. "bug: the column name is misspelled". Obviously the fix is to correct the spelling – still, the triage team should be explicit and enter the correct spelling in the bug's Description. Note that a bad bug title here would be "bug: fix the spelling of the column" (it describes the solution, rather than the problem).

Other solutions are trickier to establish, e.g. "bug: the column header is not accessible (can only be clicked on with the mouse, not reached via keyboard)". What is the correct solution here? The worst thing to do is to leave this undetermined and just assign it to a developer. The solution has to be entered in the description. Behind this type of bug usually hides a spec defect or a new feature request.

The person opening the bug should focus on describing the issue, rather than the solution. The person indicates what the fix is in their opinion by stating the Expected Result (immediately after stating the Actual Result). If they have a complex suggested solution, that should be split out in a separate part, but the triage team has the final say before assigning it. If the solution is lengthy/complicated to describe, the bug can be assigned to the PM. Note: the strict interpretation suggests that any bug with no clear, obvious solution is always a hole in the spec and should always go to the PM. This also ensures the spec gets updated.

Not Yet Assigned -> Not Yet Assigned (on someone else's plate)
If the bug is observed in our feature, but the cause actually lies with another team, we change the Area Path (which is the way we identify teams in TFS) and leave it as Not Yet Assigned. The triage team may add more comments as appropriate, including potentially changing the repro steps. In some cases, we may even resolve the bug in our area path and open a new bug in the area path of the other team.

Even though there is no action for a dev on our team, the bug still needs to be tracked. One way of doing this is to implement some notification system that informs the team when the tracked bug changes status; another way is to occasionally run a global query (against all area paths) for bugs that have been opened by a member of the team and follow up with the current owners of stale bugs.
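The second tracking approach (a periodic global query) can be sketched in a few lines of Python. This is only an illustration over an in-memory list, not a real TFS query; the `TrackedBug` type, its field names, and the 14-day staleness threshold are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrackedBug:
    """Hypothetical, minimal view of a bug for tracking purposes."""
    bug_id: int
    opened_by: str
    area_path: str
    status: str
    last_changed: datetime


def stale_bugs_for_follow_up(bugs, team_members, our_area_path, now,
                             stale_after=timedelta(days=14)):
    """Bugs our team opened that live in another area path and have gone quiet.

    The 14-day default is an arbitrary illustration, not a recommended value.
    """
    return [
        bug for bug in bugs
        if bug.opened_by in team_members        # opened by one of us
        and bug.area_path != our_area_path      # but owned by another team
        and bug.status != "Closed"              # and still open
        and now - bug.last_changed > stale_after
    ]
```

Each bug returned is a candidate for a follow-up email to its current owner.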

Not Yet Assigned -> Resolved
This state transition can only be made by the Feature Triage Team.

0. Sometimes the bug description is not clear and in that case it gets Resolved as More Information Needed, so the original requestor can provide it.
After understanding what the bug item is about, the first decision is to determine whether it needs to go to a dev.

1. If it is a known bug, it gets resolved as "Duplicate" and linked to the existing bug.

2. If it is "By Design" it gets resolved as such, indicating that the triage team does not think this is a bug.

3. If the bug does not repro on latest bits, it is resolved as "No Repro".

4. The most painful: if it is decided that we cannot fix it for this release, it gets resolved as "Postponed" or "Won't Fix". The former is typically due to resource and time constraints, while the latter is due to deciding that it is not important enough to consume our resources in any release (yes, not all bugs must be fixed!). In both cases, other factors contribute to the decision, such as: existence of a reasonable workaround, how frequently we expect users to encounter the issue, dependencies on other teams to offer a solution, whether it breaks a core scenario, whether it prohibits customer feedback on a major feature, whether it is a regression from a previous release, impact of the fix on other partner teams (e.g. User Education, User Experience, Localization/Globalization), whether this is the right fix, whether the fix impacts performance goals, and last but not least, severity of the bug (e.g. loss of customer data, security threat, crash, hang). The bar for fixing a bug goes up as the release date approaches. The triage team becomes hard-nosed about which bugs to take while the developers are busy resolving assigned bugs, so everyone drives for Zero Bug Bounce (ZBB). ZBB is when you have 0 active bugs older than 48 hours.
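The ZBB definition above is easy to express as a check. The following Python sketch assumes you already have the opened timestamps of all currently active bugs; the function name and the way "age" is measured (from when the bug was opened) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_zero_bug_bounce(active_bug_opened_times, now, max_age=timedelta(hours=48)):
    """True when no active bug is older than max_age (48 hours by default).

    Assumption: a bug's age is measured from the time it was opened.
    """
    return all(now - opened <= max_age for opened in active_bug_opened_times)
```

Note that a team with active bugs can still be at ZBB, as long as every one of them is younger than 48 hours.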

Not Yet Assigned -> Assigned
If the bug is something we decide to fix in this release and the solution is known, then it is assigned to a DEV. This is either the developer that will do the work, or a Lead that can further assign it to one of the developers on their team based on a load balancing algorithm of their choosing.

Sometimes, the triage team needs the dev to do some investigation work before deciding whether to take the fix; similarly, the checkin for the fix may be gated on code review by the triage team. In these cases, these instructions are provided in the comments section of the bug and when the developer is done they notify the triage team for final decision.

Additionally, a Priority and a Severity (each from 0 to 4) have to be entered, e.g. a P0 means "drop anything you are doing and fix this now" whereas a P4 is something you get to after all P0,1,2,3 bugs are fixed.
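As a small illustration of how priority drives the order of work, here is a Python sketch that sorts a developer's bugs by priority and, within the same priority, from older to newer (the order this post suggests developers follow when checking their plate). The `Bug` type and its field names are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class Bug(NamedTuple):
    """Hypothetical minimal bug record for ordering purposes."""
    bug_id: int
    priority: int    # 0 = drop everything .. 4 = get to it last
    opened_on: date


def work_order(bugs):
    """Lowest priority number first; ties broken by age, oldest first."""
    return sorted(bugs, key=lambda bug: (bug.priority, bug.opened_on))
```

Severity is deliberately left out of the sort key here; whether and how it should influence the order is a team decision.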

From a testing perspective, if the bug was found through ad-hoc testing or an external team, the decision is made whether test cases should be added to avoid future regressions. This is communicated to the QA team.

Assigned -> Resolved
When the developer receives the bug (they should be checking daily for new bugs on their plate looking at bugs in order of priority and from older to newer) they can send it back to triage if the information is not clear. Otherwise, they investigate the bug, setting the Sub Status to "Investigating"; if they cannot make progress, they set the Sub Status to "Blocked" and discuss this with triage or whoever else can help them get unblocked. Once they are unblocked, they set the Sub Status to "Working on Solution"; once they are code complete they send a code review request, setting the Sub Status to "Fix Available". After the iterative code review process is over and everyone is happy with the fix, the developer checks it in and changes the state of the bug from Active (and Assigned to them) to Resolved (and Assigned to someone else).

The developer needs to ensure that when the status is changed to Resolved, the bug is assigned to a QA person. For example, maybe the PM opened the bug, but it should be a QA person that will verify the fix - the developer needs to manually change the assignee in that case. Typically the QA person will send an email to the original requestor notifying them that the fix is verified.

Resolved -> ??
In all cases above, note that the final state was Resolved. What happens after that? The final step should be Closed. The bug is closed once the QA person verifying the fix is happy with it. If the person is not happy, then they change the state from Resolved to Active, thus sending it back to the developer. If the developer and QA person cannot reach agreement, then triage can be brought into it. An easy way to do that is to change the status back to Not Yet Assigned with appropriate comments so the triage team can re-review.

It is important to note that only QA can close a bug. That means that if the opener of the bug was a PM, when the bug gets resolved by the dev it may land on the PM's plate and after a quick review, the PM would re-assign to an SDET, which is the only role that can close bugs. One exception to this is if the person that filed the bug is external: in that case, we leave it Resolved and assigned to them and also send them a notification that they need to verify the fix. Another exception is if specialized developer knowledge is needed for verifying the bug fix (e.g. it was a refactoring suggestion bug typically not observable by the user) in which case it is fine to have a developer verify the fix, and ideally a different developer to the one that opened the bug.
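The whole lifecycle described in this post can be summarized as a small state machine. The Python sketch below is one reading of the transitions and of who may make them, not an official workflow definition: the role names are hypothetical, the "sent back to the developer" case is modeled as a return to Assigned, who may escalate a Resolved bug back to triage is an assumption, and the two exceptions to "only QA can close" (external filers and developer-verified fixes) are deliberately not modeled.

```python
# (from_state, to_state) -> roles allowed to make that move.
# One interpretation of the process described above, not an official workflow.
ALLOWED_TRANSITIONS = {
    ("Not Yet Assigned", "Assigned"): {"triage"},
    ("Not Yet Assigned", "Resolved"): {"triage"},
    ("Assigned", "Not Yet Assigned"): {"dev"},        # information not clear
    ("Assigned", "Resolved"): {"dev"},                # fix checked in
    ("Resolved", "Closed"): {"qa"},                   # fix verified
    ("Resolved", "Assigned"): {"qa"},                 # fix rejected, back to dev
    ("Resolved", "Not Yet Assigned"): {"dev", "qa"},  # assumption: either may escalate
}


def move(state, new_state, role):
    """Return new_state if role may make this transition, else raise ValueError."""
    roles = ALLOWED_TRANSITIONS.get((state, new_state))
    if roles is None:
        raise ValueError(f"no transition from {state!r} to {new_state!r}")
    if role not in roles:
        raise ValueError(f"{role!r} may not move a bug from {state!r} to {new_state!r}")
    return new_state
```

Keeping the rules in one table makes it easy to see at a glance that, in this reading, every path to Closed goes through Resolved and through QA.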

Other links on bug triage
A quick search reveals that others have talked about this subject, e.g. here, here, here, here and here.

Your take?
If you have other best practices your team uses to deal with incoming bug reports, feel free to share in the comments below or on your blog.

Parallel Computing Features Tour in VS2010

Tue, November 17, 2009, 03:26 PM under ParallelComputing
Just realized that I have not linked from here to a screencast I recorded a couple of weeks ago that shows the API, parallel debugger and concurrency visualizer in VS2010. Take a few minutes to watch the VS2010 Parallel Computing Features Tour.

MPI Cluster Debugger launch integration in VS2010

Sat, November 14, 2009, 11:55 PM under ParallelComputing | HPC
Let's assume that you have all the HPC bits installed and that you have existing MPI code (or you created a "Hello World" project using the MPI project template). Of course, you create a single MPI application and at runtime it will correspond to multiple processes (of the same app) launched on multiple nodes (i.e. machines) on the cluster. So how do you debug such a situation by simply hitting the familiar "F5" keystroke (i.e. Debug -> Start Debugging)?
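For readers new to the "one application, many processes" idea, the following is a plain-Python simulation of it. It is not MPI and uses no MPI library: threads stand in for processes, a queue stands in for message passing, and the names `worker`, `run`, `rank` and `size` are chosen for illustration only. What it shows is that every copy runs the same code and only its rank differs.

```python
import queue
import threading


def worker(rank, size, inbox_of_rank0, results):
    """The single 'program' that every simulated process runs."""
    if rank == 0:
        # Rank 0 receives one greeting from each of the other ranks.
        greetings = [inbox_of_rank0.get(timeout=5) for _ in range(size - 1)]
        results.extend(sorted(greetings))
    else:
        # Every other rank sends a message to rank 0.
        inbox_of_rank0.put(f"hello from rank {rank} of {size}")


def run(size):
    """Launch `size` copies of worker (threads here, processes on a real cluster)."""
    inbox_of_rank0 = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(rank, size, inbox_of_rank0, results))
        for rank in range(size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for line in run(4):
        print(line)
```

On a real cluster each rank would be a separate process, possibly on a different machine, which is exactly why attaching a debugger to all of them by hand is tedious.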

WATCH IT INSTEAD OF READING ABOUT IT
If you can't bear to read through all the details below, just watch this 19-minute screencast explaining this VS2010 feature. Alternatively, or even additionally, keep on reading.

REQUIREMENT
When you debug an MPI application, you would want the copying of resources from your client machine (where Visual Studio is installed) to each compute node (where Windows HPC Server is installed) to take place automatically for you. 'Resources' in the previous sentence includes your application binary, plus any binary or data dependencies it may have, plus PDBs if needed, plus the debug CRT of the correct bitness, plus msvsmon for remote debugging to work. You would also want, after copying is complete, to have your app and msvsmon launched and attached so that you can hit breakpoints back in Visual Studio on your client machine. All these things that you would want are delivered in VS2010.

STEPS TO F5
1. In your MPI project where you have placed a breakpoint go to Project Properties -> Configuration Properties -> Debugging. Ensure the "Debugger to launch" combo box value is set to MPI Cluster Debugger.

2. There are a whole bunch of properties here and typically you can ignore all of them except one: Run Environment. By default it is set to run 1 process on your local machine and if you change the number after that to, for example, 4 it will launch 4 processes of your app on your local machine.

You want this to run on your cluster though, so go to the dropdown arrow at the end of the Run Environment cell and open it to expose the "Edit Hpc node" menu, which opens the Node Selector dialog.

In this dialog you can enter (or pick from a list) the cluster head node name and then the number of processes you want to execute on the cluster and then hit OK and… you are done.

3. Press F5 and watch your breakpoint get hit (after giving it some time for copying, remote execution, attachment and symbol resolution to take place).

GOING DEEPER
In the MPI Cluster Debugger project properties above, you can see many properties in addition to Run Environment. They are all optional, but you may want to understand them in order to fine tune your cluster debugging. Read all about each one of these on the MSDN page Configuration Properties for the MPI Cluster Debugger.

In the Node Selector dialog above you can see more options than just the Head Node name and the number of processes to run. They should be self-explanatory, but I also cover them in depth in my screencast, showing you an example of why you would choose to schedule processes per core versus per node. You can also read about these options on MSDN as part of the page How to: Configure and Launch the MPI Cluster Debugger.
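The difference between scheduling per core and per node comes down to simple arithmetic, sketched below in Python. This is not how the HPC scheduler is implemented and the function name is hypothetical; it only counts how many processes a set of nodes would host under each choice, given an assumed list of core counts.

```python
def process_slots(cores_per_node, per="core"):
    """Count the processes hosted by the given nodes under each policy.

    cores_per_node: list with one entry per node, e.g. [8, 8, 4].
    per: "core" for one process per core, "node" for one process per node.
    """
    if per == "core":
        return sum(cores_per_node)
    if per == "node":
        return len(cores_per_node)
    raise ValueError(f"unknown policy: {per!r}")


if __name__ == "__main__":
    nodes = [8, 8, 4]  # hypothetical cluster: three nodes
    print(process_slots(nodes, "core"), process_slots(nodes, "node"))
```

One process per node tends to suit hybrid applications, for example MPI combined with OpenMP or PPL, where each process spreads its own threads across the cores of its node.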

To read through an example that touches on MPI project creation, project properties, node selector, and also usage of MPI with OpenMP plus MPI with PPL, read the MSDN page Walkthrough: Launching the MPI Cluster Debugger in Visual Studio 2010.

Happy MPI debugging!

Parallel Debugging

Thu, November 12, 2009, 09:06 PM under ParallelComputing
Using Visual Studio 2010 parallel debugging is easy. Two new debugging windows provide a total view of the internals of your PPL and TPL applications with hints on where to start investigations. These are not mere extensions to VS, but tightly integrated with the rest of the debugger experience, so you don't need to learn many new techniques. Use them in your program to eclipse bugs from existence!

One of the most frequent requests I receive is for links to VS2010 parallel debugging content, and rather than keep sending many, I decided to gather them all under one permalink, hence this multi-link blog post.

- MSDN Magazine article on Parallel Debugging.
- Screencast of sample code from the article.

- MSDN Walkthrough: Debugging a Parallel Application (VB, C++, C#).
- Screencast of walkthrough for Parallel Stacks.
- Screencast of walkthrough for Parallel Tasks.

- MSDN "How To" on Parallel Tasks.
- MSDN "How To" on Parallel Stacks.

- Detailed blog post on Parallel Tasks.
- Detailed blog post on Parallel Stacks.
- Detailed blog post on Parallel Stacks - Tasks View.
- Detailed blog post on Parallel Stacks - Method View.

- Download slides on Parallel Tasks and Parallel Stacks (pptx).

If you have questions on these, please post to any of the parallel computing forums or the debugging forum (your question will be routed to me if nobody else can answer it).

"Parallel Programming Talk" show

Wed, November 11, 2009, 08:09 PM under Links
Over at the Intel Software Network Aaron Tersteeg runs a "Parallel Programming Talk" audio show on which I was invited as a guest (for the 55th episode) to talk about Microsoft's parallelism offerings in Visual Studio 2010. The call started at 7:45AM, so if my voice sounds croaky to you, now you know why ;)

Check out the 20-minute chat (and related hyperlinks) on Aaron's blog.

Message Passing Interface (MPI)

Wed, November 11, 2009, 04:01 PM under ParallelComputing | HPC
So you have installed your cluster and you are done with introductory material on Windows HPC. Now you want to develop an application with the most common programming model: Message Passing Interface.

The MPI programming model is a standard with implementations from many vendors. For newbies (like myself!), I have aggregated below links for getting started.

Non-Microsoft MPI resources (useful even if you are not on the Windows platform)
1. Message Passing Interface on Wikipedia.

2. The MPI standard.
3. MPICH2 - an MPI implementation.
4. Tutorial on MPI by William Gropp.
5. MPI patterns presented as a tutorial with sample code.

6. THE official MPI Forum (maintains the standard) including the wiki discussing the MPI future.

7. Great MPI tutorial including at the end the MPI Exercise.

8. C++ MPI Exercises by John Burkardt.

9. Book online: MPI The Complete Reference.


MS-MPI
10. Windows HPC Server 2008 - Using MS-MPI whitepaper (15 page doc).

11. Tracing MPI applications (27 page doc).

12. Using Microsoft MPI (TechNet section).

13. Windows HPC Server MPI forum (for posting questions).


MPI.NET
14. MPI.NET Home Page (not owned by Microsoft).
15. MPI.NET Tutorial.

16. HPC Development using F# using MPI.NET (38 page doc).


Next time I'll post resources for the Microsoft Cluster SOA programming model - happy coding...