Why Care About Parallelism aka The Inevitable Shift

Sun, February 15, 2009, 09:56 PM under ParallelComputing
That was the title of a 45-minute (inc. questions) presentation I gave last December. It was a basic introduction to the manycore shift, including what parallelism is and why software developers should care. The session ended by touching, at a very high level, on what Microsoft is doing in this space.

It was slides only (well, there were 3 demo/sample apps shown but no code) and you can download the deck in a ZIP file (the slides are a montage from many other decks of other Microsoft employees).

The basic flow is as follows:
slide 3: Understanding Moore's Law
slides 4-7: Moore's law is still alive, but it no longer translates into higher frequencies. That is mainly due to the power wall, which at the end of the day means more heat than the CPU manufacturers can deal with
slide 8: So instead the manycore shift enters with CPU manufacturers adding more cores to the die rather than making a single one go faster. Predictions are for 100-core machines within 5 years (for the record, these are not my predictions)
slide 9: For us software developers, to take advantage of the increased total horsepower of manycore machines (on the client/desktop) you must employ parallelism. No other magic bullet or automatic gain. It is naïve to think that we will not need the increased speed or that we don’t know what to do with it:
a. We have been taking advantage (implicitly or explicitly) of increased CPU speeds over our entire existence. Why do we think we'll stop now?
b. Every shift in the software industry (whether it is the move from console to GUI apps, or desktop to mobile apps or even the recent client side web programming advancements) has been partly possible due to being able to take advantage of higher processor speeds. Why will the next shift in computing (whatever that is) be different?
slide 10: DEMO the morphing application (same one I showed at Tech Ed EMEA)
slide 11: Important to note that not all of the additional cores will be as fast as today's CPUs – they will more likely be of lower frequency; so to get even the same output that we get from one core today, we'll have to use parallelism to leverage more than one core.
slide 12: Also important to note that it isn’t just Microsoft telling this story. Virtually every industry player is predicting the same situation.
slide 13: So the question is: what do I do with all those cores? Besides the same goals that good multithreading has (responsiveness, scalability and latency-awareness) parallelism takes it to the next level.
slides 14-15: Obey Amdahl's Law: do the same thing, but genuinely faster
slide 16: Obey Gustafson's law: do more stuff in the same time!
slide 17: Use speculative execution.
slide 18: DEMO the RayTracer application (same one I showed at PDC)
slide 19: "OK, I am sold. I must use parallelism. Show me how"… "Well, actually it is darn hard today if you try and use traditional multithreading to achieve parallelism"
slide 20: Microsoft established the Parallel Computing Initiative to address the goals/symptoms above
slide 21: Not the only team in Microsoft thinking about this problem. Attacking it from many angles.
slide 22: DEMO Baby Names application
slide 23: I bet you want to see some code… Read the Summary slide, and let's move on to the next session.

Give a session on Parallel Programming (or just learn from it)

Thu, February 5, 2009, 02:15 PM under ParallelComputing
Last year I gave the same session on Parallel Programming twice: at PDC2008 and Tech Ed EMEA 2008 (identical content). The fact that those sessions ended up in the #3 and #2 spots of the session rankings speaks to how interested in and accepting of this topic people really are. It is also a testament that the technology Microsoft is releasing with Visual Studio 2010 is very compelling. So I invite you to take my content and reuse it in your local regions!

The recordings (and slides) of the two identical sessions are available so you can learn by watching them: links from here and here.

I have also captured the session content on this blog:

1. Briefly introduce the manycore shift and clarify the release vehicle for Parallel Extensions.
2. Run one of the samples that ship with Parallel Extensions to demonstrate the end user benefit (no code shown at this point).
3. Clarify the potential difference between parallelism and multi-threading.
4. DEMOnstrate Fine Grained Parallelism via the Task-based Programming model built on the new ThreadPool.
5. DEMOnstrate Debugging Parallel Applications.
6. DEMOnstrate Structured Parallelism via the static Parallel class, e.g. Imperative Data Parallelism.
7. DEMOnstrate Declarative Data Parallelism: PLINQ.

Many conferences/user groups are interested in technical sessions on Parallel Programming in .NET 4.0 and Visual Studio 2010 so use the links above to learn and share.

PLINQ

Sun, January 25, 2009, 06:59 PM under ParallelComputing | LINQ
With VS2008 (more specifically .NET Framework 3.5) a wonderful thing was introduced: LINQ. Given the declarative nature of Language Integrated Query, it was a prime candidate for trying to inject automatic parallelization into it (i.e. making it run faster by seamlessly taking advantage of multiple cores). The result of those efforts is what I mentioned 18 months ago (Parallel LINQ) and followed with a screencast 4 months later: see the 2nd link in the list here. In this post I'll do a written overview based on the latest bits. Before we continue, you should understand that PLINQ applies only to LINQ to Objects (i.e. IEnumerable-based sources where lambdas are bound to delegates, not IQueryable-based sources where the lambdas are bound to expressions). It also does not interfere with the deferred execution principles of LINQ, of course.
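To make the deferred execution point concrete, here is a minimal sketch (source is any IEnumerable<T> as in the examples below, and the Expensive method is a hypothetical costly predicate):
var q = source.AsParallel().Where(x => Expensive(x)); // nothing executes here; the query is merely defined
var results = q.ToList(); // here the query actually executes (in parallel)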

PLINQ as a black box
PLINQ is really simple if you want to treat it as a black box; all you do as a user is add the .AsParallel extension method to the source of your LINQ query and you are done! The following query
var result =
    from x in source
    where [some condition]
    select [something]
...can be parallelized as follows:
var result =
    from x in source.AsParallel()
    where [some condition]
    select [something]
Notice that the only difference is the AsParallel method call appended to the source and we can of course use this pattern with more complex queries.

Why Does It Work
To understand why the above compiles we have to remind ourselves of how LINQ works and that the first version of the code above is really equivalent to:
var result = source.Where(x => [some condition]).Select(x => [something]);
...so when we parallelize it we are simply changing it to be the following:
var result = source.AsParallel().Where(x => [some condition]).Select(x => [something]);
In other words the call to AsParallel returns something that also has the typical extension methods of LINQ (e.g. Where, Select and the other 100+ methods). However, with LINQ these methods live in the static System.Linq.Enumerable class whereas with PLINQ they live in the System.Linq.ParallelEnumerable class. How did we transition from one to the other? Well, AsParallel is itself an extension method on IEnumerable types and all it does is a "smart" cast of the source (the IEnumerable) to a new type which means the extension methods of this new type are picked up (instead of the ones directly on IEnumerable). In other words, by inserting the AsParallel method call, we are swapping out one implementation (Enumerable) for another (ParallelEnumerable). And that is why the code compiles fine when we insert the AsParallel method. For a more precise understanding, in the VS editor simply right click on AsParallel, choose Go To Definition and follow your nose from there…
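If following your nose is not an option right now, here is a rough sketch of the shapes involved (simplified declarations from memory, bodies omitted – not the actual source):
public static class ParallelEnumerable
{
    // the "smart" cast: wraps the source in a ParallelQuery<T>
    public static ParallelQuery<TSource> AsParallel<TSource>(this IEnumerable<TSource> source);
    // these overloads bind to ParallelQuery<T>, so they win over Enumerable's for a parallelised source
    public static ParallelQuery<TSource> Where<TSource>(this ParallelQuery<TSource> source, Func<TSource, bool> predicate);
    public static ParallelQuery<TResult> Select<TSource, TResult>(this ParallelQuery<TSource> source, Func<TSource, TResult> selector);
    // ...and the other 100+ query operators
}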

How Does It Work
OK, so we can see why the above compiles when we replace the original sequential query with our parallelised one, which we now understand is based on the introduction of new .NET 4 types such as ParallelQuery and ParallelEnumerable – all in System.Core.dll in the System.Linq namespace. But how does the new implementation take advantage (by default, when it is worth it) of all the cores on your machine? Remember our friendly task-based programming model? The implementation of the methods of the static ParallelEnumerable class uses Tasks ;-). Given that the implementation is subject to change and, more importantly, given that we have not shipped .NET 4 yet, I will not go into exactly how it uses the Tasks, but I leave that to your imagination (or to your decompiler-assisted exploration ;)).

Simple Demo Example
Imagine a .NET 4 Console project with a single file and 3 methods, 2 of which are:
static void Main()
{
    Stopwatch sw = Stopwatch.StartNew();
    DoIt();
    Console.WriteLine("Elapsed = " + sw.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}
static bool IsPrime(int p)
{
    int upperBound = (int)Math.Sqrt(p);
    for (int i = 2; i <= upperBound; i++)
    {
        if (p % i == 0) return false;
    }
    return true;
}
…without worrying too much about the implementation details of IsPrime (I stole this method from the walkthrough you get in the VS2010 CTP). So the only question is: where is the 3rd method, which clearly must be named DoIt? Here you go:
static void DoIt()
{
    IEnumerable<int> arr = Enumerable.Range(2, 4000000);
    var q =
        from n in arr
        where IsPrime(n)
        select n.ToString();
    List<string> list = q.ToList();
    Console.WriteLine(list.Count.ToString());
}
Now if you run this you will notice that on your multi-core machine only 1 core gets used (e.g. 25% CPU utilization on my quad core). You'll also notice in the console the number of milliseconds it took to execute. How can you make this execute much faster (~4 times faster on my machine) by utilizing 100% of your total CPU power? Simply change one line of code in the DoIt method:
from n in arr.AsParallel()
How cool is that?

Can It Do More
What the PLINQ implementation does is partition your source container into multiple chunks in order to operate on them in parallel. You can configure things such as the degree of parallelism, ordering, buffering options, whether to run parts of the query sequentially, etc. To experiment with all that, just explore the other new extension methods (e.g. AsOrdered, AsUnordered) and, finally, the new enumerations (e.g. ParallelQueryMergeOptions). I leave that experimentation to you, dear reader ;)
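As a hedged teaser for that exploration (I am using the method names as I expect them to appear in .NET 4; some names may differ in the current CTP bits), a query combining a few of these options might look like this:
var q = source.AsParallel()
    .AsOrdered()                  // preserve the ordering of the source in the results
    .WithDegreeOfParallelism(2)   // cap the query at 2 cores
    .Where(x => [some condition])
    .Select(x => [something]);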

Insert and Format Images in Your pptx

Sun, January 18, 2009, 11:54 AM under AboutPresenting
I have given quite a few technical presentations in my time and anyone who has attended one will tell you that I believe in demo-driven sessions (I have *never* given a session that had less than 50% demo time and most of them met my goal of being close to 75%+). Having said that, the few slides that a session has are important and what is equally important in my opinion is to strive for an image per slide!

If you can't find an image that conveys the message of the slide, then maybe your slide is trying to convey too much; if the image does not fit on your slide, then maybe your slide is too busy; if you can't tie an image to the message, then maybe you can insert some humorous image. So, I think of it as a quality gate for my slide: if I can’t insert an image, there is something wrong with the slide. If you don’t agree with that, then still insert an image in order to please the people that think more visually than others and also to add some color to your deck ;-)

After you have inserted an image, please use the tools offered by PowerPoint to make it aesthetically pleasing. When you select the image, a new tab appears in the PowerPoint 2007 ribbon with tons of options - explore them:


It is surprising how many times people ask me how I created a glow effect or a reflection (aka mirroring) effect etc. Depending on your personal preferences and the theme of your deck, some options work better than others, but by far my favorite and the one I start with as a default is preset 5:

Please try it now on a slide: insert an image twice, apply the preset to one and leave the other "plain/default". Can you see the difference in quality? Try it projected on a huge screen and you'll never go back…

There you have it! I shared the secret to the images in my decks ("big deal" I know, but oddly it took me some time to be comfortable sharing this nonetheless ;-)

Moth Calendar 2009

Sat, January 17, 2009, 02:57 AM under Random
When I lived in the UK I was always part of the developer community: in the early days of my career as an attendee, later as an MVP and, finally, as a Microsoft person when I joined the company.

It sounds like the community people in the UK miss my interactions as much as I do, because the other day my approval was sought for a 2009 calendar of community events where in each month there is a picture of me (sounds weird I know!). I gave my permission and Craig posted the result on his blog.

Besides 12 photos of my ugly mug accompanied by (what they think are) funny captions, each page has details of the UK community events taking place that month (I suspect that is the main purpose of the 2009 Community Calendar ;-)

Windows 7 and Server 2008 R2

Fri, January 9, 2009, 12:01 AM under Windows
The Betas of Windows 7 and Windows Server 2008 R2 are available to download. This is the release that supports up to 256 cores, and you can see a screenshot of a machine running the OS with 96 cores (!) on Mike's blog.

Parallelising Loops in .NET 4

Wed, January 7, 2009, 05:58 AM under ParallelComputing
Often the source of performance issues in our code is loops, e.g. while, for, foreach. With .NET 4 it becomes easy to make such code perform better by taking advantage of multiple cores.

Parallel.ForEach
For example, given:
IEnumerable<string> arr = ...
foreach (string item in arr) {
    // Do something
}
, we can parallelise it as follows:
Parallel.ForEach<string>(arr, delegate(string item) {
    // Do something
});
, or the tidier directly equivalent version (dropping the superfluous generic which can be inferred and turning the anonymous method syntax into a lambda statement)
Parallel.ForEach(arr, (string item) => {
    // Do something
});

Visual Distinctions
Notice the obvious visual similarities that make it almost automatic to parallelise a loop: the only difference in the parallel version is the modification in the first line (rearranging the "arr" and "string item", which are the real pieces of information) and the fact that after the closing brace at the end there is a closing parenthesis and semicolon. The crucial visual observation here is that the body of the loop remains intact.

Why Does It Work
Let's drill into why the modified code compiles and why it is equivalent in intent (even if it is obvious to some). We turned a block of code into a library method call. The .NET 4 (mscorlib) library offers the static class Parallel that (among others) offers the ForEach method. One of its overloads (the simplest) accepts 2 parameters: a source IEnumerable of TSource, and a body of code (in the form of an Action of TSource delegate, of course) that accepts a single parameter which is also of TSource. The method takes the body and calls it once for each element in the source. If you reread the last 2 sentences you'll find that is exactly what the original loop construct does as well. The real difference here is that the original runs serially (using only a single core) while the modified version runs in parallel (using, by default, all cores).
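Put differently, the simplest overload discussed has roughly this shape (a simplified declaration from memory; the return type is part of the richness that the "Can It Do More" section below alludes to):
public static ParallelLoopResult ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body);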

How Does It Work
Those of you that don't like black magic boxes will ask: what does that method actually do inside in order to run things in parallel? My answer: what do you think it needs to do? Remember our friendly task-based programming model? The implementation of the methods of the static Parallel class uses Tasks (and specifically SelfReplicating tasks). Given that the implementation is subject to change and more importantly given that we have not shipped .NET 4 yet, I will not go into exactly how it uses the Tasks, but I leave that to your imagination (or to your decompiler-assisted exploration ;)).

Trivial Demo Example
In a .NET 4 Console project paste the following in the Program.cs file:
static string[] arr = Directory.GetFiles(@"C:\Users\Public\Pictures\Sample Pictures", "*.jpg");
static void SimulateProcessing() {
    Thread.SpinWait(100000000);
}
static string TID {
    get {
        return " TID = " + Thread.CurrentThread.ManagedThreadId.ToString();
    }
}
Now in the empty Main function paste the following:
foreach (string ip in arr) {
    Program.SimulateProcessing();
    Console.WriteLine(ip + TID);
}
Console.ReadLine();
Run it and notice how long it takes, that (of course) only one thread gets used, and the CPU usage in Task Manager. Now change the loop construct so it is as follows:
Parallel.ForEach(arr, (string ip) => {
    Program.SimulateProcessing();
    Console.WriteLine(ip + TID);
});
Re-run it and notice how much faster it runs, that the number of threads now equals the number of cores on your machine, and that the CPU usage in Task Manager is at 100%.

Why Not Do It Automatically
Many who see this technology ask: "Why not automatically change all loops to run parallelised?". The answer is that you cannot blindly apply a Parallel.ForEach wherever you have a foreach loop. If the body of the loop depends on some shared state, or if each loop iteration is not independent of every other iteration, then race conditions may arise from blindly parallelising. Ultimately, it is multiple threads that execute the body in parallel, so there is no room for shared state etc. The static methods of the Parallel class have no magic to deal with that – it is still down to you. If you find yourself needing synchronization in the loop body, be sure to measure the performance, because locks and such in a parallelisation scenario potentially negate (or severely limit) the benefits of parallelisation. It is for these reasons that parallelising a loop is an opt-in decision today that only you can make for your code.
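To make that concrete, here is a hypothetical example of a loop body that must not be blindly parallelised, because every iteration mutates the same shared variable:
int total = 0;
Parallel.ForEach(arr, (string ip) => {
    total += ip.Length; // race condition: many threads read-modify-write 'total' concurrently
});
// 'total' is now likely wrong; guarding it with a lock would serialize the body and negate the benefit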

A related question is why we do not embed this functionality in the language (the obvious suggestion being to introduce a pfor loop construct). The answer is that having it as a library offering instead of built-in to the language allows it to be used by *all* .NET languages instead of restricting it to a few. Also, once something is embedded into the language it typically stays there forever, so we take great care with such decisions, e.g. it is too early to tie C# or VB to the System.Threading.Tasks namespace.

For the distant imaginary future, we are thinking about automatically parallelising bits of code if we can (with hints from the application developer with regards to purity and side-effect-free regions of code) and also embedding parallel constructs deeper into the language. Too early to know if anything will come of that...

Can It Do More
Yes! Above we only saw one of the overloads of one of the methods. Parallel.ForEach has ~20 other overloads, some of them taking up to 5 arguments and all of them having a return type too; what I am trying to say is that there is much more flexibility and richness even in this simple API. I encourage you to explore the other overloads and also the other two methods on the Parallel class: For and Invoke.
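To whet your appetite, here is a hedged sketch of those two siblings in their simplest forms (DoLeft and DoRight are hypothetical placeholder methods):
// Parallel.For: the parallel counterpart of a for loop over an integer range
Parallel.For(0, 100, (int i) => {
    // Do something with i
});
// Parallel.Invoke: run a fixed set of independent statements in parallel
Parallel.Invoke(
    () => DoLeft(),
    () => DoRight());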

Best of "The Moth" 2008

Thu, January 1, 2009, 01:25 AM under Personal
Happy New Year! Regular readers know that on this day I gather links to my own favorite blog posts of the past year (like I did in 2004, 2005, 2006 and 2007). Enjoy the 18 links below (out of the 122 blog entries I made in 2008)!

01. Visual Studio 2008
At the start of the year I completed my multi-month series on VS2008 and .NET 3.5 topics by writing a short article for TechNet and a longer one for QBS. I also recorded more screencasts on this topic including about Client App Services, Sync Services and the MAF. I linked to those 3 from the resources post of the session I performed/delivered most in 2008: Five VS2008 Smart Client Features.

02. Silverlight 2 Beta 1
After putting VS2008 behind me, I spent a lot of my time getting up to speed on Silverlight 2 and creating a (what turned out to be a very popular and highly ranked :-) session in the Beta 1 timeframe. I blogged a lot about the technology and most of my posts are linked to from this single Silverlight post.

03. Presentation Tips
Early in the year I wrote 2 posts to help you with the basics of setting up your machine for the most important part of a presentation (the demos): Setting Up the Laptop and Setting Up Visual Studio.

04. Other non-technical
This was the year I transitioned from Europe to the US, the side effects of which included a blog post with a list for settling in that others found useful: Getting a USA life. The transition was also to a new role joining the hordes of Microsoft people that spend a lot of time in Outlook – this inspired me to come up with some Email Rules.

05. Debugging
After settling in, I found myself living in the Visual Studio debugger quite a bit and sharing (via the blog) findings, advice and tips. For example: name your threads, 2 cool tips, make object id, debuggerdisplayattribute and, my favorite, understanding the terminology behind active and current stack frame (and current thread).

06. Parallelism
No surprise that parallelism is featured on this blog this year (as it was last year) and it should be no surprise that it will continue to be prominent here next year. My goal is to deliver shorter posts in the future, but for now you can use a cup of your favorite beverage while consuming my thoughts on: Threading vs Parallelism, Fine Grained Parallelism, Not Explicitly Using threads for Parallelism, the CLR 4 ThreadPool engine and the new Task type.

Thank you for reading, make sure you don't miss a post in 2009 by subscribing to this blog – click on the link on the left.

Introducing the new Task type

Tue, December 30, 2008, 06:46 AM under ParallelComputing
In a previous post I made the point about the need to finely partition our compute-bound operations and enumerated the benefits of fine grained parallelism. In another post I showed how it is a mistake to use Threads directly to achieve fine grained parallelism. The problem was that the unit of partitioning in our user mode app was also the unit of scheduling of the OS.

System.Threading.Tasks.Task
We are introducing in mscorlib of .NET 4 the System.Threading.Tasks.Task type that represents a lightweight unit of work. The code from my previous post would look like this with Tasks (and it does not suffer from any of the 3 problems that the original code suffers from):
static void WalkTree(Tree tree)
{
    if (tree == null) return;
    Task left = new Task((o) => WalkTree(tree.Left));
    left.Start();
    Task righ = new Task((o) => WalkTree(tree.Righ));
    righ.Start();
    left.Wait();
    righ.Wait();
    ProcessItem(tree.Data);
}
Tasks run on the new improved CLR 4 ThreadPool engine – I will not repeat the performance and load-balancing benefits here, but will instead focus on the rich API itself.

Creation and Scheduling
An example of the API is what we saw above, where we used the Task with the same pattern that we use with threads (create and then later start). You can see another example of the creation API if we modify the original Main method to look like this:
static void Main()
{
    Tree tr = Tree.CreateSomeTree(9, 1);
    Stopwatch sw = Stopwatch.StartNew();
    Task t = Task.StartNew(delegate { WalkTree(tr); });
    t.Wait();
    Console.WriteLine("Elapsed = " + sw.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}
Notice how we can create Tasks and start them with a single statement (StartNew), which is similar to how we use ThreadPool.QueueUserWorkItem, with the added benefit of having a reference to the work in the form of the variable 't'.
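For contrast, the closest you could get with the existing ThreadPool API is the following, which queues the work but hands you back nothing to Wait on or continue from:
ThreadPool.QueueUserWorkItem(o => WalkTree(tr)); // fire and forget; no reference to the work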

Waiting
Also notice above how we preserve the semantics of the code prior to the change by waiting for the work to complete before the Console.WriteLine statement. We saw this method further above in the method WalkTree. In fact, in WalkTree we can replace the two calls (left.Wait and righ.Wait) with the more flexible Task.WaitAll(left, righ), and there are other options such as a WaitAny method that blocks only until any one of the tasks you pass into it completes.
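Concretely, that change in WalkTree would look like this (WaitAny returns the index of the task that completed first):
// instead of: left.Wait(); righ.Wait();
Task.WaitAll(left, righ);
// or, to unblock as soon as either child completes:
int first = Task.WaitAny(left, righ);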

Continuations
We can further change the body of the Main method as follows:
Tree tr = Tree.CreateSomeTree(9, 1);
Stopwatch sw = Stopwatch.StartNew();
Task t = Task.StartNew(delegate { WalkTree(tr); });
t.ContinueWith(tt => Console.WriteLine("Done"), TaskContinuationKind.OnAny);
t.Wait(2500);
Console.WriteLine("Elapsed = " + sw.ElapsedMilliseconds.ToString());
Notice how we are waiting with a timeout this time, which means that after 2.5 seconds we will see "Elapsed..." on the console (given that our WalkTree work takes longer than that to complete). However, at that point the CPU usage will remain at 100% as our work is still being executed. When it completes, as the CPU usage drops down again, we will also see "Done" in the console. This should verify your expectation of the self-explanatory ContinueWith method. It is a very powerful method (more here) that enables patterns such as pipelining. You can have many continuations off the same task and you can configure the circumstances under which to continue via the TaskContinuationKind that I encourage you to explore along with the various overloads.
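As a quick hedged sketch of that pipelining pattern (Produce, Transform and Publish are hypothetical methods, and I am using the same CTP-era API shown above):
Task stage1 = Task.StartNew(() => Produce());
Task stage2 = stage1.ContinueWith(t => Transform()); // starts only when stage1 completes
Task stage3 = stage2.ContinueWith(t => Publish());   // starts only when stage2 completes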

Cancellation
Cancellation is well integrated in the API. Cancelling a task that is scheduled in some queue and has not executed yet means that it will not be executed at all. For a task that is already running, cooperation is needed which means that the task can check a boolean property (IsCancellationRequested) to see if cancellation was requested and act accordingly. Finally, you can see if a task is actually cancelled via another boolean property (IsCanceled) on the Task type. If we modify the 2 lines of code above as follows:
t.ContinueWith(tt => Console.WriteLine("Done"));
t.Wait(2500);
t.Cancel();
...we will see the "Elapsed" message followed immediately by a drop in CPU utilization and the "Done" message.
Note that for the cancellation above to behave as expected, we are assuming that when we cancel a Task, all tasks created in that scope also get cancelled, i.e. when we cancel 't' all the tasks created in WalkTree also get cancelled. This is not the default, but we can easily configure it as such by changing the ctor call in WalkTree for both left and righ to be as follows:
...= new Task((o) => WalkTree(tree.Left), TaskCreationOptions.RespectParentCancellation);

Parent Child Relationships
The above correctly implies that there is a parent child relationship between tasks that are created in the scope of an executing task. It is worth noting that parent tasks implicitly wait for their children to complete which is why the waiting worked as expected further above. If we wanted to opt out of that we can create detached children via the TaskCreationOptions.Detached option. I encourage you to experiment with the other TaskCreationOptions...
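Concretely (mirroring the ctor change shown earlier for cancellation), the calls in WalkTree would become:
...= new Task((o) => WalkTree(tree.Left), TaskCreationOptions.Detached);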

Task with Result
Let's go way back and peek at the original serial implementation of WalkTree and let's modify it so it actually returns a result:
static int WalkTree(Tree tree)
{
    if (tree == null) return 0;
    int left = WalkTree(tree.Left);
    int righ = WalkTree(tree.Righ);
    return ProcessItem(tree.Data) + left + righ;
}
...as we ponder the question of "How do we parallelize that?", take a look again at the code at the top of this post that parallelized the version that did not return results.
We can change it to return 0 when there are no more leaf nodes and change it to return the results of ProcessItem, but we have an issue with how to obtain the results of WalkTree(left) and WalkTree(righ) and add them to our return results. In other words: we are passing a delegate to the Task ctor that returns a result, and we need a way to store that result somewhere. The obvious place to store it is the Task itself! However, we want this strongly typed, so we use generics and we have a type that inherits from Task, which is Task<T> (in the CTP bits it is called a Future<T>). This new type has a property for returning the Value; the call will block if the task is still executing, or return immediately if the task has executed and the value is already stored. So the code can be modified as follows:
static int WalkTree(Tree tree)
{
    if (tree == null) return 0;
    Task<int> left = new Task<int>((o) => WalkTree(tree.Left), TaskCreationOptions.RespectParentCancellation);
    left.Start();
    Task<int> righ = new Task<int>((o) => WalkTree(tree.Righ), TaskCreationOptions.RespectParentCancellation);
    righ.Start();
    return ProcessItem(tree.Data) + left.Value + righ.Value;
}
Note that if we did not want to block on Value then we could have queried the IsCompleted property of the Task.
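For example, a non-blocking check might look like this sketch:
if (left.IsCompleted)
{
    int leftResult = left.Value; // returns immediately; the result is already stored
}
else
{
    // do other useful work and come back for left.Value later
}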

In Summary
Above I have given you a brief glimpse of the rich API that Task exposes (and there is a lot more, such as a nice exception handling model that aggregates exceptions thrown in parallel into a single AggregateException). Combined with my other posts referenced above, you should feel comfortable (if not compelled) using this new Task API in all parallelism scenarios where previously you would have considered using Threads or the ThreadPool directly. Furthermore, the rich API has hopefully enticed you to use it even if you had not considered the ThreadPool or threads before.

OOF

Sat, December 13, 2008, 12:32 PM under Communication
It is the season when many people are Out of Office (which is not acronymed as OOO, but as OOF – read here for the explanation).

Our team has a shared calendar on our SharePoint site where everyone adds their holidays/vacations so the PM can take action accordingly. In addition, it is customary for individuals to send a Meeting Request (called an S+ in homage to the Schedule+ product) to the team's distribution list (DL) alias describing their OOF schedule. It is here where you need to be careful.

1. Do not request responses
2. Show the time as Free with no Reminder (remember, your OOF ends up in my calendar so I don’t need a reminder or my calendar to show as anything other than FREE for your vacation)
3. Create a separate appointment for your own calendar that shows the time as "Out of Office" (so people can see that when trying to schedule a meeting with you)
4. Set up your Out of Office Assistant (from the Tools menu) with an appropriate message (so people understand you will not be responding promptly – also touched on in #33 of my email rules)

There is a more detailed blog post (and the comments section is useful too) here.

Speaking of OOF, I will be out of the physical office starting now and I'll be working from home in Greece until mid-January (with a few holiday and vacation days thrown in for good measure). So, unless you work with me on daily basis, you'll see no change… if you were planning on visiting me in-person, use email instead ;-)