Fri, April 9, 2010, 07:33 AM under
Blogging
Due to blogger.com dropping FTP support, I've had to move my blog. If you are in a similar situation, this post will help you by showing you the necessary steps to take.
Goals
No loss of blog posts or comments, AND all existing permalinks continue to work (redirect to the correct place).
Steps
1. Download the XML files corresponding to your blogger.com content and store them in a folder.
2. Install and configure dasBlog on your local machine.
3. Configure your web.config file (will need updating once you run step 4).
4. Use the tool I describe further down to generate the content and place it at the right place.
5. Test your site locally. Once you are happy, repeat step 2 on your hosting provider of choice. Remember to copy up your dasBlog theme folder if you created one.
6. Copy up the local web.config file and the XML dasBlog content files generated by the tool of step 4.
7. Test your site on the server. Once you are happy, go live (following instructions from your hoster). In my case, I gave the nameservers from my new hoster to my existing domain registrar and they made the switch.
Tool (code)
At step 4 above I referred to a tool. That is an overstatement: it is simply one 450-line C# code file that you can download here: BloggerToDasBlog.cs. I used this from a .NET 2.0 console app (which I ran under the Visual Studio debugger, i.e. F5) like this: Program.cs. The console app referenced the dasBlog 2.3 ASP.NET Blogging Engine, i.e. the newtelligence.DasBlog.Runtime.dll assembly.
Let me describe what the code does:
Input:
- A path to a folder where the XML files from the old blogger.com blog reside. It can deal with both types of XML file.
- A full file path to a file where it creates XML redirect input (as required by the rewriteMap mentioned here).
- The blog URL. The author's email. The blog author name.
- A path to an empty folder where the new XML dasBlog content files will get created.
- The subfolder name used after the domain name in the URL.
- The 3 regex patterns to use. You can use the same ones as mine, but you will need to tweak the monthly_archive rule.
Again, to see what values I passed for all the above, see my Program.cs file.
Output:
- It creates dasBlog XML files in the folder specified. It creates those by parsing the old blogger.com XML files that reside in the folder specified. After that is generated, copy it to the "Content" folder under your dasBlog installation.
- It creates an XML file with a single ignorable root element and a bunch of inner XML elements. You can copy paste these in the web.config file as discussed in this post.
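To give a feel for the second output, here is a minimal sketch of how such a file could be written. It is not the code from BloggerToDasBlog.cs: the class and method names are hypothetical, the root element name is arbitrary, and it uses System.Xml.Linq (so .NET 3.5 or later, unlike the .NET 2.0 app mentioned above). The only fixed part is the add element with key and value attributes, which is the shape a rewriteMap entry takes in web.config.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

static class RedirectMapWriter
{
    // Builds <root><add key="old" value="new" />...</root>.
    // The root element is a throwaway wrapper (hypothetical name); only the
    // <add> children are meant to be pasted into the rewriteMap in web.config.
    public static XElement Build(IEnumerable<KeyValuePair<string, string>> redirects)
    {
        var root = new XElement("root");
        foreach (var pair in redirects)
        {
            root.Add(new XElement("add",
                new XAttribute("key", pair.Key),
                new XAttribute("value", pair.Value)));
        }
        return root;
    }
}
```

Calling RedirectMapWriter.Build(map).Save(path) with a Dictionary of old-to-new URLs writes the file to disk.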
Other notes:
- For each blog post, it detects outgoing links to itself (i.e. to the same blog), and rewrites those to point to the new URLs. So internal links do not rely on the web.config redirects.
- It deals with duplicate post titles; it does not deal with triplicates and higher.
- Removes all references to blogger.com (e.g. references to noreply@blogger.com, the injected hidden footer for statistics that each blog post has and others – see the code).
- It creates a lot of diagnostic output (in the Output window) and indeed the documentation for the code is in the Debug.WriteLine statements ;)
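The first note (rewriting links that point back to the same blog) can be sketched roughly as follows. This is an illustration under assumptions, not an excerpt from the tool: the class name is hypothetical, only the generic post rule is handled, and posts that need an explicit hard-coded mapping are ignored.

```csharp
using System.Text.RegularExpressions;

static class InternalLinkRewriter
{
    // Matches old Blogger post URLs under the blog root, e.g.
    // http://www.danielmoth.com/Blog/2004/07/hello-world.html
    static readonly Regex OldPostLink = new Regex(
        @"(?<root>http://www\.danielmoth\.com/Blog/)[0-9]{4}/[0-9]{2}/(?<slug>[0-9a-z-]+)\.html",
        RegexOptions.IgnoreCase);

    // Replaces every old-style post link in the post body with the
    // date-less .aspx form; all other text is left untouched.
    public static string Rewrite(string postHtml)
    {
        return OldPostLink.Replace(postHtml, "${root}${slug}.aspx");
    }
}
```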
This is not code I will maintain or support – it was a throwaway one-use project that I am sharing here as a starting point for anyone finding themselves in the same boat that I was. Enjoy "as is".
Fri, April 9, 2010, 07:22 AM under
Blogging
One of the things that gets me on a rant is websites that break permalinks. If you have posted something somewhere and there is a public URL pointing to it, that URL should never ever return a 404. You are breaking all websites that ever linked to you and you are breaking all search engine links to your content (that others will try and follow). It is a pet peeve of mine.
So when I had to move my blog, obviously I would preserve the root URL (www.danielmoth.com/Blog/), but I also wanted to preserve every URL my blog has generated over the years. To be clear, our focus here is on the URL formatting, not the content migration which I'll talk about in my next post. In this post, I'll describe my solution first and then what it solves.
1. The IIS7 Rewrite Module and web.config
There are a few ways you can map an old URL to a new one (so when requests to the old URL come in, they get redirected to the new one). The new blog engine I use (dasBlog) has built-in functionality to do that (Scott refers to it here). Instead, the way I chose to address the issue was to use the IIS7 rewrite module.
The IIS7 rewrite module allows redirecting URLs based on pattern matching, regular expressions and, of course, hardcoded full URLs for things that don't fall into any pattern. You can configure it visually from IIS Manager using a handy dialog that allows testing patterns against input URLs. Here is what mine looked like after configuring a few rules:
To learn more about this technology check out this video, the reference page and this overview blog post; all 3 pages have a collection of related resources at the bottom worth checking out too.
All the visual configuration ends up in a web.config file at the root folder of your website. If you are on a shared hosting service, probably the only way you can use the Rewrite Module is by directly editing the web.config file. Next, I'll describe the URLs I had to map and how that manifested itself in the web.config file. What I did was create the rules locally using the GUI, and then took the generated web.config file and uploaded it to my live site. You can view my web.config here.
2. Monthly Archives
Observe the difference between the way the two blog engines generate this type of URL
- Blogger: /Blog/2004_07_01_mothblog_archive.html
- dasBlog: /Blog/default,month,2004-07.aspx
In my web.config file, the rule that deals with this is the one named "monthlyarchive_redirect".
3. Categories
Observe the difference between the way the two blog engines generate this type of URL
- Blogger: /Blog/labels/Personal.html
- dasBlog: /Blog/CategoryView,category,Personal.aspx
In my web.config file the rule that deals with this is the one named "category_redirect".
4. Posts
Observe the difference between the way the two blog engines generate this type of URL
- Blogger: /Blog/2004/07/hello-world.html
- dasBlog: /Blog/Hello-World.aspx
In my web.config file the rule that deals with this is the one named "post_redirect".
Note: I decided to use dasBlog URLs that do not include the date info (see the description of my Appearance settings). If we included the date info, it would have to include the day part, which blogger did not generate; that would make it impossible to redirect correctly and to have a single permalink for blog posts moving forward. An implication of this decision is that no two blog posts can have the same title. The tool I will describe in my next post (inelegantly) deals with duplicates, but not with triplicates or higher.
5. Unhandled by a generic rule
Unfortunately, the two blog engines use different rules for generating URLs for blog posts. Most of the time the conversion is as simple as the example of the previous section, where a post titled "Hello World" generates a URL with the words separated by a hyphen. Sometimes that is not the case, for example:
- Blogger: /Blog/2006/05/medc-wrap-up.html
- dasBlog: /Blog/MEDC-Wrapup.aspx
or
- Blogger: /Blog/2005/01/best-of-moth-2004.html
- dasBlog: /Blog/Best-Of-The-Moth-2004.aspx
or
- Blogger: /Blog/2004/11/more-windows-mobile-2005-details.html
- dasBlog: /Blog/More-Windows-Mobile-2005-Details-Emerge.aspx
In short, blogger does not add words from the title to the URL beyond ~39 characters, it drops some words when generating the URL (e.g. a, an, on, the), and it preserves hyphens that appear in the title. For this reason, we need to detect these and explicitly list them for redirects (no regular expression can help here because the full set of rules is not listed anywhere).
In my web.config file the rule that deals with this is the one named "Redirect rule1 for FullRedirects" combined with the rewriteMap named "StaticRedirects".
Note: The tool I describe in my next post will detect all the URLs that need to be explicitly redirected and will list them in a file ready for you to copy them to your web.config rewriteMap.
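One way to detect these cases without knowing blogger's rules is to compare, for each post, what the generic post rule would produce against the URL dasBlog will actually use. The sketch below is an assumption-laden illustration, not the tool's code: the class is hypothetical, and DasBlogSlug only approximates a title-based permalink (keep letters and digits, turn spaces into hyphens), which happens to agree with the examples above if, say, the first post was titled "MEDC Wrap-up".

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class ExplicitRedirectDetector
{
    static readonly Regex PostUrl = new Regex(
        @"[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html$", RegexOptions.IgnoreCase);

    // ASSUMPTION: approximates the title-based dasBlog permalink by keeping
    // letters and digits, replacing spaces with hyphens, dropping the rest.
    public static string DasBlogSlug(string title)
    {
        var sb = new StringBuilder();
        foreach (char c in title)
        {
            if (char.IsLetterOrDigit(c)) sb.Append(c);
            else if (c == ' ') sb.Append('-');
        }
        return sb.ToString();
    }

    // True when the generic rule would send the old URL to the wrong page,
    // so the pair must be listed explicitly in the rewriteMap.
    public static bool NeedsExplicitRedirect(string bloggerUrl, string title)
    {
        Match m = PostUrl.Match(bloggerUrl);
        if (!m.Success) return true; // not a URL the generic post rule handles
        return !string.Equals(m.Groups[1].Value, DasBlogSlug(title),
                              StringComparison.OrdinalIgnoreCase);
    }
}
```

The comparison ignores case because URLs on IIS are matched case-insensitively, so "hello-world" and "Hello-World" count as the same page.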
6. C# code doing the same as the web.config
I wrote some naive code that does the same thing as the web.config: given a string it will return a new string converted according to the 3 rules above. It does not take into account the 4th case where an explicit hard-coded conversion is needed (the tool I present in the next post does take that into account).
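    using System.Text.RegularExpressions;

    // Patterns for the three generic rules (these members live inside a class).
    // The dot before "html" is escaped, and the hyphen sits last in each
    // character class so that it is always treated as a literal.
    static string REGEX_post_redirect = @"[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html";
    static string REGEX_category_redirect = @"labels/([_0-9a-z% -]+)\.html";
    static string REGEX_monthlyarchive_redirect = @"([0-9]{4})_([0-9]{2})_[0-9]{2}_mothblog_archive\.html";

    // Returns the dasBlog page name for an old blogger URL,
    // or an empty string if none of the three rules matches.
    static string Redirect(string oldUrl)
    {
        GroupCollection g;
        if (RunRegExOnIt(oldUrl, REGEX_post_redirect, 2, out g))
            return string.Concat(g[1].Value, ".aspx");
        if (RunRegExOnIt(oldUrl, REGEX_category_redirect, 2, out g))
            return string.Concat("CategoryView,category,", g[1].Value, ".aspx");
        if (RunRegExOnIt(oldUrl, REGEX_monthlyarchive_redirect, 3, out g))
            return string.Concat("default,month,", g[1].Value, "-", g[2].Value, ".aspx");
        return string.Empty;
    }

    // Runs the pattern against the input; succeeds only if it matched and
    // produced the expected number of groups (group 0 is the whole match).
    static bool RunRegExOnIt(string toRegEx, string pattern, int groupCount, out GroupCollection g)
    {
        if (pattern.Length == 0)
        {
            g = null;
            return false;
        }
        Match m = Regex.Match(toRegEx, pattern, RegexOptions.IgnoreCase);
        g = m.Groups;
        return m.Success && g.Count == groupCount;
    }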
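For completeness, here is a rough sketch of how the 4th case could be layered on top: consult a hard-coded map first (the role the "StaticRedirects" rewriteMap plays in web.config) and only then fall back to a pattern. The class name is hypothetical, only the post pattern is included to keep it short, and the two map entries are simply the examples from section 5.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RedirectWithStaticMap
{
    // Explicit old-to-new pairs that no generic rule can derive.
    static readonly Dictionary<string, string> StaticRedirects =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "2006/05/medc-wrap-up.html", "MEDC-Wrapup.aspx" },
            { "2005/01/best-of-moth-2004.html", "Best-Of-The-Moth-2004.aspx" },
        };

    static readonly Regex PostPattern = new Regex(
        @"^[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html$", RegexOptions.IgnoreCase);

    // oldPath is relative to the blog root, e.g. "2004/07/hello-world.html".
    public static string Redirect(string oldPath)
    {
        string explicitTarget;
        if (StaticRedirects.TryGetValue(oldPath, out explicitTarget))
            return explicitTarget;          // explicit entry wins

        Match m = PostPattern.Match(oldPath);
        return m.Success ? m.Groups[1].Value + ".aspx" : string.Empty;
    }
}
```

The order matters: checking the map first means an explicit entry always overrides whatever the generic pattern would have produced.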
Fri, April 9, 2010, 07:08 AM under
Blogging
Some people like blogging on a site that is completely managed by someone else (e.g. http://wordpress.com/) and others, like me, prefer hosting their own blog at their own domain. In the latter case you need to decide what blog engine to install on your web space to power your blog. There are many free blog engines to choose from (e.g. the one from http://wordpress.org/). If, like me, you want to use a blog engine that is based on the .NET platform you have many choices including BlogEngine.NET, Subtext and the one I picked: dasBlog.
In this post I'll describe the steps I took to get going with the open source dasBlog (home page, source page).
A. Installing
First I installed dasBlog on my local Windows 7 machine where I have IIS7 installed. To install dasBlog, I started by clicking the "Install" button on its web gallery page. After that I went through configuration, theming and adding content as described below.
Once I was happy that everything was working correctly on the local machine, I set this up on a hosting service. I went for a Windows IIS7 shared hosting 3 month Economy plan from GoDaddy. The dasBlog site lists a bunch of other hosts. You can read the installation instructions for dasBlog, and with GoDaddy I just had to click one button since it is available as part of their quick-install apps. With GoDaddy I had a previewdns option that allowed me to play around and preview my site before going live.
B. Configuring
After it was installed (on local machine and/or hosting provider), I followed the obvious steps to create an admin user and logged in. This displays an admin navigation bar with the following options:
1. Navigator Links: I decided I was not going to use this feature. I manage links on the side of my blog manually elsewhere as part of the theme. So, I deleted every entry on this page and ignored it thereafter.
2. Blogroll: Ditto - same comment as for Navigator Links.
3. Content Filters: I did not delete (or add) these, but I did ensure both checkboxes are not checked. I.e. I am not using this feature now, but I may return to it in the future.
4. Activity: This is a read-only view of various statistics. So nothing to configure here, but useful to come back to for complementary statistics to whatever other statistical package you use (e.g. free stats as part of the hosting and I also use feedburner for syndication stats).
5. Cross-posting: I did not need that, so I turned it off via the Configuration Settings discussed next.
6. Configuration Settings: This is where the bulk of the configuration for the blog takes place; the settings are stored in a single XML file: Site.Config. There are truly self-explanatory options to pick for Basic Settings, Services Settings and Services to Ping, Syndication Settings (this is where you link to your feedburner name if you have one) and Mail to Weblog Settings (I keep this turned off). There are also "Xml Storage System Settings" (I keep this turned off), "OpenId Settings" (I allow OpenID commenters), "Spammer Settings" (Enable captcha, never show email addresses) and "Comment settings" (Enable comments, don't allow on older posts, don't allow html). There are also Appearance Settings (I checked the "Use Post Title for Permalink", replaced spaces with hyphen and unchecked the "Use Unique Title"). Finally, there are also Notification Settings, but they are a bit hit and miss in my case, in that I don’t always get the emails (still investigating this).
C. Adding Content
You can add content via the "Add Entry" link on the admin navigation bar or by configuring the "Mail to Weblog" settings and sending email or, do what I've started doing, use Live Writer (also the team has a blog).
Another way to add content is programmatically if, for example, you are migrating content from another blog (I'll cover that in a separate post sharing the code). What you should know is that all blog content (posts and comments) lives in XML files in a folder called "content" under your dasBlog installation.
D. Theming
There is a very good guide about themes for dasBlog, there is also a similar guide with screenshots (scroll down to "So how do I create a theme") and the dasBlog macro reference.
When you install dasBlog, there are many themes available; each theme is in its own folder (named after the theme) under the themes folder. You may have noticed that you can switch between these via the "Appearance Settings" described above (look for the combobox after the Default Theme label).
I created my own theme by copy-pasting an existing theme folder, renaming it and then switching to it as the default. I then opened the folder in Visual Studio and hacked around the HTML in the 3 files (itemTemplate, homeTemplate and dayTemplate). These files have a blogtemplate file extension, which I temporarily renamed to HTML while I was editing them. There is no more advice I can offer here as this is a matter of taste, and the aforementioned links are all I used. Personally, I had salvaged the CSS (and structure) from my previous blog and wanted to make this one match it as closely as possible - I think I have succeeded.
E. If you run into any issue with dasBlog...
...use your favorite search engine to find answers. Many bloggers have been using this engine for a while and have documented issues and workarounds over time. One such example is ScottHa's dasBlog category; another example is therightstuff where I "borrowed" the idea/macro for the outlook-style on-page navigation. If you don't find what you want through searching, try posting a question to the forums.
Fri, April 9, 2010, 06:58 AM under
Blogging
Due to blogger.com dropping FTP support, I've decided to move my blog.
When I think of the content of a blog, 4 items come to mind: blog posts, comments, binary files that the blog posts linked to (e.g. images, ZIP files) and the CSS+structure of the blog.
1. Binaries
The binary files you used in your blog posts are sitting on your own web space, so really blogger.com is not involved with that. Nothing for you to do at this stage, I'll come back to these in another post.
2. CSS and structure
In the best case this exists as a separate CSS file on your web space (so no action for now) or, in the worst case, like me, your CSS is embedded in the HTML. In the latter case, simply navigate from your dashboard to "Template" then "Edit HTML" and copy paste the contents of the box. Save that locally in a txt file and we'll come back to that in another post.
3. Blog posts and Comments
The blog posts and comments exist in all the HTML files on your own web space. Parsing HTML files to extract that can be painful, so it is easier to download the XML files from blogger's servers that contain all your blog posts and comments.
3.1 Single XML file, but incomplete
The obvious thing to do is go into your dashboard "Settings" and under the "Basic" tab look at the top next to "Blog Tools". There is a link there to "Export blog" which downloads an XML file with both comments and posts. The problem with that is that it only contains 200 comments - if you have more than that, you will lose the surplus. Also, this XML file has a lot of noise compared to the better solution described next. Note that a tool I will refer to in a future post deals with either kind of XML file.
3.2 Multiple XML files
First you need to find your blog ID. In case you don't know what that is, navigate to the "Template" as described in section 2 above. You will find references to the blog id in the HTML there, but you can also see it as part of the URL in your browser: blogger.com/template-edit.g?blogID=YOUR_NUMERIC_ID. Mine is 7 digits.
You can now navigate to these URLs to download the XML for your posts and comments respectively:
blogger.com/feeds/YOUR_NUMERIC_ID/posts/default?max-results=500&start-index=1
blogger.com/feeds/YOUR_NUMERIC_ID/comments/default?max-results=200&start-index=1
Note that you can only get 500 posts at a time and only 200 comments at a time. To get more than that you have to change the start-index in the URL and download the next batch. For example, to get the XML for the next 500 posts and next 200 comments respectively, you’d use these URLs:
blogger.com/feeds/YOUR_NUMERIC_ID/posts/default?max-results=500&start-index=501
blogger.com/feeds/YOUR_NUMERIC_ID/comments/default?max-results=200&start-index=201
...and so on and so forth. Keep all the XML files in the same folder on your local machine (with nothing else in there).
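If you have many batches, generating the list of URLs is easy to script. The following is a small sketch under assumptions: the class is hypothetical, you have to supply the total number of items yourself, and the URLs are emitted in the same scheme-less form shown above (prefix them with whatever scheme and host your browser used).

```csharp
using System.Collections.Generic;

static class BloggerFeedUrls
{
    // kind is "posts" or "comments"; pageSize is the per-request limit
    // described above (500 for posts, 200 for comments).
    public static List<string> Build(string blogId, string kind, int pageSize, int totalItems)
    {
        var urls = new List<string>();
        // start-index is 1-based, so the batches begin at 1, 1 + pageSize, ...
        for (int start = 1; start <= totalItems; start += pageSize)
        {
            urls.Add(string.Format(
                "blogger.com/feeds/{0}/{1}/default?max-results={2}&start-index={3}",
                blogId, kind, pageSize, start));
        }
        return urls;
    }
}
```

For instance, Build("YOUR_NUMERIC_ID", "comments", 200, 450) yields three URLs with start-index 1, 201 and 401.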
4. Validating the XML aka editing older blog posts
The XML files you just downloaded really contain HTML fragments inside for all your blog posts. If you are like me, your blog posts did not conform to XHTML so passing them to an XML parser (which is what we will want to do) will result in the XML parser choking. So the next step is to fix that. This can be no work at all for you, or a huge time sink or just a couple hours of pain (which was my case).
The process I followed was to attempt to load the XML files using XmlDocument.Load and wait for the exception to be thrown from my code. The exception would point to the exact offending line and column which would help me fix the issue. Rather than fix it in the XML itself, I would go back and edit the offending blog post and fix it there - recommended! Then I'd repeat the cycle until the XML could be loaded in the XmlDocument.
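That load-and-fix cycle looks roughly like the sketch below. The class and method names are hypothetical; XmlDocument.Load and the LineNumber and LinePosition properties of XmlException are the standard .NET members that give you the offending position.

```csharp
using System;
using System.IO;
using System.Xml;

static class FeedValidator
{
    // Returns null when the file parses cleanly; otherwise a message
    // naming the file plus the line and column of the first problem.
    public static string FindFirstError(string xmlFilePath)
    {
        try
        {
            new XmlDocument().Load(xmlFilePath);
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("{0}: line {1}, column {2}: {3}",
                Path.GetFileName(xmlFilePath), ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

Run it over each downloaded file, fix the blog post the message points to, download that batch again and repeat until every file returns null. Only the first error per file is reported, which is why the process is iterative.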
To give you an idea, some of the issues I encountered are: extra or missing quotes in img tags and href attributes, direct usage of chevrons instead of encoding them as &lt; and &gt;, missing closing tags, mismatched nested pairs of elements and capitalization of html elements. For a full list of things that may go wrong see this.
5. Opportunity for other changes
I also found a few posts that did not have a category assigned so I fixed those too. I took the further opportunity to create new categories and tag some of my blog posts with those. Note that I did not remove/change categories of existing posts, but only added.
In another post we'll see how to use the XML files you stored in the local folder…
Fri, April 9, 2010, 06:41 AM under
Blogging
History (you can safely ignore)
Back in 2002 I came across some (almost) free Linux/Apache space and set up my first manually-created HTML-based home page, which still exists: http://www.danielmoth.com/. In 2004 I wanted to have a blog that would be hosted on a sub-folder of my domain, and at the same time I did not want to mess with setting up a blog engine myself. I found the perfect solution in blogger.com, which offered a web interface for creating blog posts (and managing the pages' template) and it would then use FTP to upload HTML pages to my space (no server-side programming/installation required at all)!
FTP feature dropped by blogger.com
Unfortunately, along the way Google purchased blogger.com and a couple of months ago they announced that they decided to kill the FTP feature, and they are forcing customers using that feature to have their content hosted (in an opaque way) on Google's servers.
Even though I prefer having my content on my own space, I would have considered moving it to Google's servers if I could host my blog in a sub-folder and preserve my full blog URL: http://www.danielmoth.com/Blog/ (including my home pages being hosted at the root of the domain). Sadly, that is not possible.
What now
So I decided to move my blog somewhere else. I'll document in the next few posts how I did that (inc. a tool I wrote) in case it helps someone else in the same situation and also as a reminder to me if I need to do something like this again in the future.
Fri, February 19, 2010, 04:00 PM under
ParallelComputing | GPGPU
In my previous blog post I introduced the concept of GPGPU ending with:
On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as "kernel".
In this post I want to share some links to get you started with DirectCompute and HLSL.
1. Watch the recording of the PDC 09 session: DirectX11 DirectCompute.
2. If session recordings are your thing, there are two more on DirectCompute from NVIDIA's GTC09 conference: 1015 (pdf, mp4) and 1411 (mp4 plus the presenter's written version of the session).
3. Over at gamedev there is an old Compute Shader tutorial. At the same site, there is a 3-part blog post on Compute Shader: Introduction, Resources and Addressing.
4. From PDC, you can also download the DirectCompute Hands On Lab.
5. When you are ready to get your hands even dirtier, download the latest Windows DirectX SDK (at the time of writing the latest is dated Feb 2010).
6. Within the SDK you'll find a Compute Shader Overview and samples such as: Basic, Sort, OIT, NBodyGravity, HDR Tone Mapping.
7. Talking of DX11/DirectCompute samples, there are also a couple of good ones on this URL.
8. The documentation of the various APIs is available online. Here are just some good (but far from complete) taster entry points into that: numthreads, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID, SV_GroupIndex, D3D11CreateDevice, D3DX11CompileFromFile, CreateComputeShader, Dispatch, D3D11_BIND_FLAG, GSSetShader.
Fri, February 19, 2010, 03:58 PM under
ParallelComputing | GPGPU
What
GPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.
So, altogether GPGPU refers to computing we can perform on a GPU for purposes beyond just drawing on the screen. In effect, we can use a GPGPU a bit like we already use a CPU: to perform some calculation (that doesn’t have to have any visual element to it). The attraction is that a GPGPU can be orders of magnitude faster than a CPU.
Why
When I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share 3 here: pcper (ignoring all pages except the first, it is a good consumer perspective), gizmodo (nice take using mostly layman terms) and vizworld (answering the question on "what's the big deal").
The GPGPU programming paradigm (from a high level) is simple: in your CPU program you define functions (aka kernels) that take some input, can perform the costly operation and return the output. The kernels are the things that execute on the GPGPU leveraging its power (and hence execute faster than what they could on the CPU) while the host CPU program waits for the results or asynchronously performs other tasks.
However, GPGPUs have different characteristics to CPUs which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with branching or recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and vice versa the results back to the CPU), so the computation itself has to be long enough to justify the overhead transfer costs. If your problem space fits the criteria then you probably want to check out this technology.
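To make the shape of a data-parallel kernel concrete, here is a CPU-only analogy in C#. It is not GPU code and no data transfer takes place; it merely assumes .NET 4's Parallel.For to stand in for "launch one logical thread per element", and the class and method names are made up for illustration.

```csharp
using System.Threading.Tasks;

static class KernelShape
{
    // The "kernel": computes one output element from its index alone,
    // with no dependency on any other element and no complex flow control.
    static float Saxpy(int i, float a, float[] x, float[] y)
    {
        return a * x[i] + y[i];
    }

    // The "host" side: prepare the input, launch the kernel once per
    // element, then collect the output.
    public static float[] Run(float a, float[] x, float[] y)
    {
        var result = new float[x.Length];
        Parallel.For(0, x.Length, i => { result[i] = Saxpy(i, a, x, y); });
        return result;
    }
}
```

Each iteration writes only its own slot of the result array, which is what makes the work safe to spread across many threads. On a real GPU the input and output arrays would additionally have to be copied to and from the card, which is the transfer overhead mentioned above.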
How
So where can you get a graphics card to start playing with all this? At the time of writing, the two main vendors ATI (owned by AMD) and NVIDIA are the obvious players in this industry. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.
If you followed the links above, then you've already come across some of the choices of programming models that are available today. Essentially, AMD is offering their ATI Stream technology accessible via a language they call Brook+; NVIDIA offers their CUDA platform which is accessible from CUDA C. Choosing either of those locks you into the GPU vendor and hence your code cannot run on systems with cards from the other vendor (e.g. imagine if your CPU code would run on Intel chips but not AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which theoretically means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat about that site is that (currently) it completely ignores the Microsoft approach, which I touch on next.
On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.
Note that there are many efforts to build even higher level languages on top of DirectX that aim to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention here just two of those efforts: Accelerator from MSR and Brahma by Ananth.