Fri, April 9, 2010, 07:33 AM under Blogging
Due to blogger.com dropping FTP support, I've had to move my blog. If you are in a similar situation, this post will help you by showing you the necessary steps to take.
Goals
No loss of blog posts or comments, AND all existing permalinks continue to work (redirecting to the correct place).
Steps
1. Download the XML files corresponding to your blogger.com content and store them in a folder.
2. Install and configure dasBlog on your local machine.
3. Configure your web.config file (it will need updating once you run step 4).
4. Use the tool I describe further down to generate the content and place it in the right place.
5. Test your site locally. Once you are happy, repeat step 2 on your hosting provider of choice. Remember to copy up your dasBlog theme folder if you created one.
6. Copy up the local web.config file and the XML dasBlog content files generated by the tool in step 4.
7. Test your site on the server. Once you are happy, go live (following the instructions from your hoster). In my case, I gave the nameservers of my new hoster to my existing domain registrar and they made the switch.
Tool (code)
At step 4 above I referred to a tool. That is an overstatement: it is simply one 450-line C# code file that you can download here: BloggerToDasBlog.cs. I used this from a .NET 2.0 console app (which I ran under the Visual Studio debugger, i.e. F5) like this: Program.cs. The console app referenced the dasBlog 2.3 ASP.NET Blogging Engine, i.e. the newtelligence.DasBlog.Runtime.dll assembly.
Let me describe what the code does:
Input:
- A path to a folder where the XML files from the old blogger.com blog reside. It can deal with both types of XML file (the single "Export blog" file and the multiple feed files).
- A full file path to a file where it writes the XML redirect input (as required by the rewriteMap used by the IIS7 Rewrite Module).
- The blog URL. The author's email. The blog author name.
- A path to an empty folder where the new XML dasBlog content files will get created.
- The subfolder name used after the domain name in the URL.
- The 3 regular expression patterns to use. You can use the same ones as mine, but you will need to tweak the monthly archive rule, since its pattern contains my blog's name (mothblog).
Again, to see what values I passed for all the above, see my Program.cs file.
Output:
- It creates dasBlog XML files in the output folder by parsing the old blogger.com XML files in the input folder. Once they are generated, copy them to the "Content" folder under your dasBlog installation.
- It creates an XML file with a single ignorable root element and a bunch of inner XML elements. You can copy-paste these into the web.config file as discussed in my post on preserving permalinks.
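The redirect file is essentially a list of rewriteMap entries under a throwaway root. Here is a minimal sketch of how such a file could be produced with LINQ to XML. Note that LINQ to XML needs .NET 3.5, which is newer than the .NET 2.0 console app used for the real tool. The class name `RedirectMapWriter`, the root element name `redirects` and the sample URL pair are illustrative assumptions, not the tool's actual code:

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of BloggerToDasBlog.cs): turns old->new URL pairs
// into <add key="..." value="..." /> elements, the entry format a rewriteMap uses.
static class RedirectMapWriter
{
    public static XElement Build(IDictionary<string, string> oldToNew)
    {
        // The root element is only a container; its children are what you
        // copy into the rewriteMap section of web.config.
        var root = new XElement("redirects");
        foreach (KeyValuePair<string, string> pair in oldToNew)
        {
            root.Add(new XElement("add",
                new XAttribute("key", pair.Key),
                new XAttribute("value", pair.Value)));
        }
        return root;
    }
}
```

Calling `RedirectMapWriter.Build(map).Save(path)` writes the file to disk.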
Other notes:
- For each blog post, it detects outgoing links to itself (i.e. to the same blog), and rewrites those to point to the new URLs. So internal links do not rely on the web.config redirects.
- It deals with duplicate post titles; it does not deal with triplicates and higher.
- Removes all references to blogger.com (e.g. references to noreply@blogger.com, the injected hidden footer for statistics that each blog post has and others – see the code).
- It creates a lot of diagnostic output (in the Output window) and indeed the documentation for the code is in the Debug.WriteLine statements ;)
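The self-link rewriting in the first note can be sketched with a regular expression over each post's HTML. This is an illustrative approach, not the tool's code. The class `InternalLinkRewriter`, the restriction to double-quoted `href` values and the dictionary of old-to-new relative paths are all assumptions:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: rewrites href values that point at the old blog so they
// use the new URLs directly instead of relying on the web.config redirects.
static class InternalLinkRewriter
{
    public static string Rewrite(string postHtml, string blogUrl, IDictionary<string, string> oldToNew)
    {
        // Only double-quoted hrefs that start with the blog URL are considered;
        // group 1 captures the path after the blog URL.
        string pattern = "href=\"" + Regex.Escape(blogUrl) + "([^\"]*)\"";
        return Regex.Replace(postHtml, pattern, delegate (Match m)
        {
            string newPath;
            // Leave the link untouched if there is no mapping for it.
            if (!oldToNew.TryGetValue(m.Groups[1].Value, out newPath))
                return m.Value;
            return "href=\"" + blogUrl + newPath + "\"";
        }, RegexOptions.IgnoreCase);
    }
}
```

Links to other sites are left alone, and so are links to the old blog that have no entry in the map.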
This is not code I will maintain or support – it was a throwaway one-use project that I am sharing here as a starting point for anyone finding themselves in the same boat that I was. Enjoy "as is".
Fri, April 9, 2010, 07:22 AM under Blogging
One of the things that gets me on a rant is websites that break permalinks. If you have posted something somewhere and there is a public URL pointing to it, that URL should never ever return a 404. You are breaking all websites that ever linked to you and you are breaking all search engine links to your content (that others will try and follow). It is a pet peeve of mine.
So when I had to move my blog, I obviously wanted to preserve the root URL (www.danielmoth.com/Blog/), but I also wanted to preserve every URL my blog has generated over the years. To be clear, our focus here is on the URL formatting, not the content migration, which I'll talk about in my next post. In this post, I'll describe my solution first and then what it solves.
1. The IIS7 Rewrite Module and web.config
There are a few ways you can map an old URL to a new one (so when requests to the old URL come in, they get redirected to the new one). The new blog engine I use (dasBlog) has built-in functionality to do that (Scott refers to it here). Instead, the way I chose to address the issue was to use the IIS7 rewrite module.
The IIS7 rewrite module allows redirecting URLs based on pattern matching, regular expressions and, of course, hardcoded full URLs for things that don't fall into any pattern. You can configure it visually from IIS Manager using a handy dialog that allows testing patterns against input URLs. Here is what mine looked like after configuring a few rules:
To learn more about this technology check out this video, the reference page and this overview blog post; all 3 pages have a collection of related resources at the bottom worth checking out too.
All the visual configuration ends up in a web.config file at the root folder of your website. If you are on a shared hosting service, probably the only way you can use the Rewrite Module is by directly editing the web.config file. Next, I'll describe the URLs I had to map and how that manifested itself in the web.config file. What I did was create the rules locally using the GUI and then upload the generated web.config file to my live site. You can view my web.config here.
2. Monthly Archives
Observe the difference between the way the two blog engines generate this type of URL:
- Blogger: /Blog/2004_07_01_mothblog_archive.html
- dasBlog: /Blog/default,month,2004-07.aspx
In my web.config file, the rule that deals with this is the one named "monthlyarchive_redirect".
3. Categories
Observe the difference between the way the two blog engines generate this type of URL:
- Blogger: /Blog/labels/Personal.html
- dasBlog: /Blog/CategoryView,category,Personal.aspx
In my web.config file the rule that deals with this is the one named "category_redirect".
4. Posts
Observe the difference between the way the two blog engines generate this type of URL:
- Blogger: /Blog/2004/07/hello-world.html
- dasBlog: /Blog/Hello-World.aspx
In my web.config file the rule that deals with this is the one named "post_redirect".
Note: I decided to use dasBlog URLs that do not include the date info (see the description of my Appearance settings). If the URLs included the date info, they would have to include the day part, which blogger did not generate. That would make it impossible to redirect correctly and to have a single permalink for each blog post moving forward. An implication of this decision is that no two blog posts can have the same title. The tool I will describe in my next post (inelegantly) deals with duplicates, but not with triplicates or higher.
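Because permalinks are derived from titles alone, repeated titles need distinct names. One general way to handle any number of repeats is to append a counter. This is a hypothetical sketch, not what the tool does (the tool stops at duplicates). The class name `UniqueSlugAllocator` and the `-2`, `-3` suffix scheme are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: hands out unique permalink names, appending -2, -3, ...
// to repeats so that any number of identical titles is handled.
class UniqueSlugAllocator
{
    // Every name handed out so far, compared case-insensitively so that
    // "Hello-World" and "hello-world" count as the same permalink.
    private readonly HashSet<string> used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public string Allocate(string slug)
    {
        string candidate = slug;
        int n = 1;
        // Keep bumping the suffix until the name is free; this also avoids
        // clashing with a real title that happens to end in "-2".
        while (used.Contains(candidate))
        {
            n++;
            candidate = slug + "-" + n;
        }
        used.Add(candidate);
        return candidate;
    }
}
```

Feeding posts to `Allocate` in date order gives the oldest post the unsuffixed name, which is usually the one other sites have linked to.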
5. Unhandled by a generic rule
Unfortunately, the two blog engines use different rules for generating URLs for blog posts. Most of the time the conversion is as simple as the example in the previous section, where a post titled "Hello World" generates a URL with the words separated by a hyphen. Sometimes that is not the case, for example:
- /Blog/2006/05/medc-wrap-up.html
- /Blog/MEDC-Wrapup.aspx
or
- /Blog/2005/01/best-of-moth-2004.html
- /Blog/Best-Of-The-Moth-2004.aspx
or
- /Blog/2004/11/more-windows-mobile-2005-details.html
- /Blog/More-Windows-Mobile-2005-Details-Emerge.aspx
In short, blogger stops adding title words to the URL after ~39 characters, it drops some words from the title (e.g. a, an, on, the), and it preserves hyphens that appear in the title. For this reason, we need to detect these posts and list them explicitly for redirects (no regular expression can help here because the full set of blogger's rules is not documented anywhere).
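One way to detect these posts is to predict the new permalink from the post title and compare it with the slug blogger generated. If they differ, the generic rule is not enough. This is a sketch under stated assumptions. `ExplicitRedirectDetector` is a hypothetical name. `DasBlogSlug` only approximates dasBlog's title-to-permalink conversion with the "replace spaces with hyphen" setting: it keeps letters and digits and drops other punctuation. That matches the examples above (MEDC-Wrapup), but it is not taken from the dasBlog source.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: decides whether the generic post_redirect rule is enough
// for a post, or whether it needs an explicit rewriteMap entry.
static class ExplicitRedirectDetector
{
    static readonly Regex BloggerPost =
        new Regex(@"[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html", RegexOptions.IgnoreCase);

    // ASSUMPTION: approximates dasBlog's title-based permalink by keeping letters
    // and digits, dropping other punctuation (including hyphens inside words)
    // and joining the remaining words with hyphens.
    public static string DasBlogSlug(string title)
    {
        var words = new List<string>();
        foreach (string raw in title.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var sb = new StringBuilder();
            foreach (char c in raw)
                if (char.IsLetterOrDigit(c)) sb.Append(c);
            if (sb.Length > 0) words.Add(sb.ToString());
        }
        return string.Join("-", words.ToArray());
    }

    // True when "old slug + .aspx" would not produce the new permalink.
    public static bool NeedsExplicitRedirect(string oldUrl, string title)
    {
        Match m = BloggerPost.Match(oldUrl);
        if (!m.Success)
            throw new ArgumentException("Not a blogger post URL: " + oldUrl);
        return !string.Equals(m.Groups[1].Value, DasBlogSlug(title), StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, assuming the post title was "MEDC Wrap-up", `NeedsExplicitRedirect("/Blog/2006/05/medc-wrap-up.html", "MEDC Wrap-up")` returns true. For "Hello World" at `/Blog/2004/07/hello-world.html` it returns false.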
In my web.config file the rule that deals with this is the one named "Redirect rule1 for FullRedirects" combined with the rewriteMap named "StaticRedirects".
Note: The tool I describe in my next post will detect all the URLs that need to be explicitly redirected and will list them in a file ready for you to copy them to your web.config rewriteMap.
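Before uploading, it can help to check the StaticRedirects entries locally. The lookup below mimics what the FullRedirects rule relies on: finding the old URL among the map's keys. It is a hypothetical helper (`StaticRedirectLookup`), not part of the tool. It assumes the web.config elements carry no XML namespace, and it chooses case-insensitive key matching:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch: looks up an old URL in a named rewriteMap inside web.config.
static class StaticRedirectLookup
{
    // Returns the mapped new URL, or null if the map or the key is missing.
    public static string Lookup(XDocument webConfig, string mapName, string oldUrl)
    {
        // rewriteMap elements sit under configuration/system.webServer/rewrite/rewriteMaps;
        // ASSUMPTION: no default XML namespace on the config elements.
        XElement map = webConfig.Descendants("rewriteMap")
            .FirstOrDefault(e => (string)e.Attribute("name") == mapName);
        if (map == null)
            return null;
        // Case-insensitive key comparison (a choice made for this sketch).
        XElement entry = map.Elements("add").FirstOrDefault(e =>
            string.Equals((string)e.Attribute("key"), oldUrl, StringComparison.OrdinalIgnoreCase));
        return entry == null ? null : (string)entry.Attribute("value");
    }
}
```

Load the file with `XDocument.Load(path)` and run every old URL from the tool's output through `Lookup` to spot typos before going live.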
6. C# code doing the same as the web.config
I wrote some naive code that does the same thing as the web.config: given a string it will return a new string converted according to the 3 rules above. It does not take into account the 4th case where an explicit hard-coded conversion is needed (the tool I present in the next post does take that into account).
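// Requires: using System.Text.RegularExpressions;
// Each pattern captures the parts of the old blogger.com URL that the new dasBlog URL needs.
// Dots are escaped so that "\." matches only a literal dot.
static string REGEX_post_redirect = @"[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html";
static string REGEX_category_redirect = @"labels/([_0-9a-z-% ]+)\.html";
static string REGEX_monthlyarchive_redirect = @"([0-9]{4})_([0-9]{2})_[0-9]{2}_mothblog_archive\.html";

// Returns the new dasBlog-relative URL for an old blogger.com URL,
// or string.Empty if none of the 3 rules matches.
static string Redirect(string oldUrl)
{
    GroupCollection g;
    // 2004/07/hello-world.html -> hello-world.aspx
    if (RunRegExOnIt(oldUrl, REGEX_post_redirect, 2, out g))
        return string.Concat(g[1].Value, ".aspx");
    // labels/Personal.html -> CategoryView,category,Personal.aspx
    if (RunRegExOnIt(oldUrl, REGEX_category_redirect, 2, out g))
        return string.Concat("CategoryView,category,", g[1].Value, ".aspx");
    // 2004_07_01_mothblog_archive.html -> default,month,2004-07.aspx
    if (RunRegExOnIt(oldUrl, REGEX_monthlyarchive_redirect, 3, out g))
        return string.Concat("default,month,", g[1].Value, "-", g[2].Value, ".aspx");
    return string.Empty;
}

// Succeeds only if the pattern matched and produced the expected number
// of groups (group 0 for the whole match plus the captures).
static bool RunRegExOnIt(string toRegEx, string pattern, int groupCount, out GroupCollection g)
{
    g = null;
    if (string.IsNullOrEmpty(pattern))
        return false;
    // The static Regex.Match caches parsed patterns, so there is no need to
    // construct (and compile) a new Regex on every call.
    Match m = Regex.Match(toRegEx, pattern, RegexOptions.IgnoreCase);
    if (!m.Success)
        return false;
    g = m.Groups;
    return g.Count == groupCount;
}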
Fri, April 9, 2010, 07:08 AM under Blogging
Some people like blogging on a site that is completely managed by someone else (e.g. http://wordpress.com/) and others, like me, prefer hosting their own blog at their own domain. In the latter case you need to decide what blog engine to install on your web space to power your blog. There are many free blog engines to choose from (e.g. the one from http://wordpress.org/). If, like me, you want to use a blog engine that is based on the .NET platform, you have many choices including BlogEngine.NET, Subtext and the one I picked: dasBlog.
In this post I'll describe the steps I took to get going with the open source dasBlog (home page, source page).
A. Installing
First I installed dasBlog on my local Windows 7 machine where I have IIS7 installed. To install dasBlog, I started by clicking the "Install" button on its web gallery page. After that I went through configuration, theming and adding content as described below.
Once I was happy that everything was working correctly on the local machine, I set this up on a hosting service. I went for a Windows IIS7 shared hosting 3-month Economy plan from GoDaddy. The dasBlog site lists a bunch of other hosts. You can read the installation instructions for dasBlog; with GoDaddy I just had to click one button, since it is available as part of their quick-install apps. With GoDaddy I had a previewdns option that allowed me to play around and preview my site before going live.
B. Configuring
After it was installed (on local machine and/or hosting provider), I followed the obvious steps to create an admin user and logged in. This displays an admin navigation bar with the following options:
1. Navigator Links: I decided I was not going to use this feature. I manage links on the side of my blog manually elsewhere as part of the theme. So, I deleted every entry on this page and ignored it thereafter.
2. Blogroll: Ditto - same comment as for Navigator Links.
3. Content Filters: I did not delete (or add) these, but I did ensure both checkboxes are not checked. I.e. I am not using this feature now, but I may return to it in the future.
4. Activity: This is a read-only view of various statistics. So nothing to configure here, but useful to come back to for complementary statistics to whatever other statistical package you use (e.g. free stats as part of the hosting and I also use feedburner for syndication stats).
5. Cross-posting: I did not need that, so I turned it off via the Configuration Settings discussed next.
6. Configuration Settings: This is where the bulk of the blog configuration takes place; the settings are stored in a single XML file, Site.Config. There are truly self-explanatory options to pick for Basic Settings, Services Settings and Services to Ping, Syndication Settings (this is where you link to your feedburner name if you have one) and Mail to Weblog Settings (I keep this turned off). There are also "Xml Storage System Settings" (I keep this turned off), "OpenId Settings" (I allow OpenID commenters), "Spammer Settings" (enable captcha, never show email addresses) and "Comment settings" (enable comments, don't allow on older posts, don't allow HTML). There are also Appearance Settings (I checked "Use Post Title for Permalink", chose to replace spaces with hyphens and unchecked "Use Unique Title"). Finally, there are Notification Settings, but they are a bit hit and miss in my case, in that I don't always get the emails (still investigating this).
C. Adding Content
You can add content via the "Add Entry" link on the admin navigation bar, by configuring the "Mail to Weblog" settings and sending email or, as I've started doing, by using Live Writer (the Live Writer team also has a blog).
Another way to add content is programmatically if, for example, you are migrating content from another blog (I'll cover that in a separate post sharing the code). What you should know is that all blog content (posts and comments) lives in XML files in a folder called "content" under your dasBlog installation.
D. Theming
There is a very good guide about themes for dasBlog, there is also a similar guide with screenshots (scroll down to "So how do I create a theme") and the dasBlog macro reference.
When you install dasBlog, there are many themes available; each theme lives in its own folder (named after the theme) under the themes folder. You may have noticed that you can switch between them via the "Appearance Settings" described above (look for the combobox after the Default Theme label).
I created my own theme by copy-pasting an existing theme folder, renaming it and then switching to it as the default. I then opened the folder in Visual Studio and hacked around the HTML in the 3 files (itemTemplate, homeTemplate and dayTemplate). These files have a blogtemplate file extension, which I temporarily renamed to html while editing them. There is no more advice I can offer here, as this is a matter of taste and the aforementioned links are all I used. Personally, I had salvaged the CSS (and structure) from my previous blog and wanted to make this one match it as closely as possible - I think I have succeeded.
E. If you run into any issue with dasBlog...
...use your favorite search engine to find answers. Many bloggers have been using this engine for a while and have documented issues and workarounds over time. One such example is ScottHa's dasBlog category; another example is therightstuff where I "borrowed" the idea/macro for the outlook-style on-page navigation. If you don't find what you want through searching, try posting a question to the forums.
Fri, April 9, 2010, 06:58 AM under Blogging
Due to blogger.com dropping FTP support for its users, I've decided to move my blog.
When I think of the content of a blog, 4 items come to mind: blog posts, comments, binary files that the blog posts linked to (e.g. images, ZIP files) and the CSS+structure of the blog.
1. Binaries
The binary files you used in your blog posts are sitting on your own web space, so really blogger.com is not involved with that. Nothing for you to do at this stage, I'll come back to these in another post.
2. CSS and structure
In the best case this exists as a separate CSS file on your web space (so no action for now); in the worst case, like mine, your CSS is embedded in the HTML. In the latter case, simply navigate from your dashboard to "Template" then "Edit HTML" and copy-paste the contents of the box. Save that locally in a txt file and we'll come back to it in another post.
3. Blog posts and Comments
The blog posts and comments exist in all the HTML files on your own web space. Parsing HTML files to extract that can be painful, so it is easier to download the XML files from blogger's servers that contain all your blog posts and comments.
3.1 Single XML file, but incomplete
The obvious thing to do is go into your dashboard "Settings" and under the "Basic" tab look at the top next to "Blog Tools". There is a link there to "Export blog" which downloads an XML file with both comments and posts. The problem with that is that it only contains 200 comments - if you have more than that, you will lose the surplus. Also, this XML file has a lot of noise, compared to the better solution described next. (note that a tool I will refer to in a future post deals with either kind of XML file)
3.2 Multiple XML files
First you need to find your blog ID. In case you don't know what that is, navigate to the "Template" as described in section 2 above. You will find references to the blog id in the HTML there, but you can also see it as part of the URL in your browser: blogger.com/template-edit.g?blogID=YOUR_NUMERIC_ID. Mine is 7 digits.
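If you would rather not copy the ID by eye, a tiny sketch can pull it out of the template-edit URL shown above. The helper name `BlogIdFinder` is hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the numeric blog ID from a URL such as
// blogger.com/template-edit.g?blogID=YOUR_NUMERIC_ID
static class BlogIdFinder
{
    // Returns the digits after blogID=, or null if the parameter is absent.
    public static string Find(string url)
    {
        Match m = Regex.Match(url, @"[?&]blogID=([0-9]+)", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For a made-up ID, `BlogIdFinder.Find("blogger.com/template-edit.g?blogID=1234567")` returns "1234567".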
You can now navigate to these URLs to download the XML for your posts and comments respectively:
blogger.com/feeds/YOUR_NUMERIC_ID/posts/default?max-results=500&start-index=1
blogger.com/feeds/YOUR_NUMERIC_ID/comments/default?max-results=200&start-index=1
Note that you can only get 500 posts at a time and only 200 comments at a time. To get more than that you have to change the URL and download the next batch. To get you started, to get the XML for the next 500 posts and next 200 comments respectively you’d have to use these URLs:
blogger.com/feeds/YOUR_NUMERIC_ID/posts/default?max-results=500&start-index=501
blogger.com/feeds/YOUR_NUMERIC_ID/comments/default?max-results=200&start-index=201
...and so on and so forth. Keep all the XML files in the same folder on your local machine (with nothing else in there).
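Rather than editing the URLs by hand, you can generate the full list of batch URLs. This is a minimal sketch that follows the max-results/start-index pattern above. It assumes you know (or can over-estimate) how many posts and comments you have. The helper `FeedPager` is hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical helper: lists the feed URLs needed to page through every post or
// comment, following the max-results/start-index pattern shown above.
static class FeedPager
{
    // kind is "posts" or "comments"; pageSize is 500 for posts and 200 for comments.
    public static List<string> Urls(string blogId, string kind, int pageSize, int totalItems)
    {
        var urls = new List<string>();
        // start-index is 1-based: 1, 1 + pageSize, 1 + 2 * pageSize, ...
        for (int start = 1; start <= totalItems; start += pageSize)
        {
            urls.Add(string.Format(
                "blogger.com/feeds/{0}/{1}/default?max-results={2}&start-index={3}",
                blogId, kind, pageSize, start));
        }
        return urls;
    }
}
```

With a made-up ID, `FeedPager.Urls("1234567", "posts", 500, 1100)` returns three URLs with start-index 1, 501 and 1001.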
4. Validating the XML aka editing older blog posts
The XML files you just downloaded really contain HTML fragments inside for all your blog posts. If you are like me, your blog posts did not conform to XHTML, so passing them to an XML parser (which is what we will want to do) will result in the XML parser choking. So the next step is to fix that. This can be no work at all for you, a couple of hours of pain (which was my case) or a huge time sink.
The process I followed was to attempt to load the XML files using XmlDocument.Load and wait for the exception to be thrown from my code. The exception would point to the exact offending line and column which would help me fix the issue. Rather than fix it in the XML itself, I would go back and edit the offending blog post and fix it there - recommended! Then I'd repeat the cycle until the XML could be loaded in the XmlDocument.
To give you an idea, some of the issues I encountered are: extra or missing quotes in img tags and href attributes, direct usage of chevrons instead of encoding them as &lt;, missing closing tags, mismatched nested pairs of elements and inconsistent capitalization of HTML elements. For a full list of things that may go wrong see this.
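The load, fix and reload cycle described above depends on one piece of information: where the parser gave up. Here is a minimal sketch of that check. It catches `XmlException` and reports its `LineNumber` and `LinePosition`. The class name `FeedValidator` and the message format are assumptions:

```csharp
using System.Xml;

// Sketch of the load-fix-reload loop: tries to parse the XML and, on failure,
// reports the line and column the parser stopped at so the post can be fixed.
static class FeedValidator
{
    // Returns null if the XML loads, otherwise "line L, column C: message".
    public static string Check(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("line {0}, column {1}: {2}", ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

For the downloaded files, the same try/catch works with `XmlDocument.Load(path)` in a loop over `Directory.GetFiles(folder, "*.xml")`.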
5. Opportunity for other changes
I also found a few posts that did not have a category assigned, so I fixed those too. I took the further opportunity to create new categories and tag some of my blog posts with them. Note that I did not remove or change the categories of existing posts; I only added new ones.
In another post we'll see how to use the XML files you stored in the local folder…
Fri, April 9, 2010, 06:41 AM under Blogging
History (you can safely ignore)
Back in 2002 I came across some (almost) free Linux/Apache space and set up my first manually-created HTML-based home page, which still exists: http://www.danielmoth.com/. In 2004 I wanted to have a blog that would be hosted on a sub-folder of my domain, and at the same time I did not want to mess with setting up a blog engine myself. I found the perfect solution in blogger.com, which offered a web interface for creating blog posts (and managing the pages' template) and it would then use FTP to upload HTML pages to my space (no server-side programming/installation required at all)!
FTP feature dropped by blogger.com
Unfortunately, along the way Google purchased blogger.com, and a couple of months ago they announced that they would kill the FTP feature, forcing customers using it to have their content hosted (in an opaque way) on Google's servers.
Even though I prefer having my content on my own space, I would have considered moving it to Google's servers if I could host my blog in a sub-folder and preserve my full blog URL: http://www.danielmoth.com/Blog/ (including my home pages being hosted at the root of the domain). Sadly, that is not possible.
What now
So I decided to move my blog somewhere else. I'll document in the next few posts how I did that (including a tool I wrote), in case it helps someone else in the same situation and also as a reminder to myself if I need to do something like this again in the future.
Sun, January 27, 2008, 02:06 PM under Blogging
Last Sunday I posted a survey for my blog (followed by two others, identical but hosted on different sites). A week later I thought you might want to know the results from the 300 responses (which, as I hinted, weren't straightforward to consolidate, hence the delay). Below are the questions, a summary of the responses and some interspersed commentary on some of my plans.
Q1. Which version(s) of Visual Studio do you mostly use? – Multiple answers allowed. The vast majority of votes went to VS2005 (67%) and VS2008 (62%). The percentages for the other answers are VS6orOlder (7%), VS.NET2002 (0%, 1 out of 300), VS.NET 2003 (10%) and some people stated: "Eclipse for Linux, MonoDevelop, Rhapsody, Vim, Macromedia". Note that most responses selected VS2005 in addition to anything else, which suggests they are using the older IDEs for older projects rather than using them exclusively. FYI I do not plan on focusing on IDEs other than VS2008 (inc. any service packs) and of course Visual Studio vNext as soon as a public CTP VPC becomes available.
Q2. Which language(s) do you predominantly program in? – Multiple answers allowed. The most popular language by far was C# (84%). Even people checking other languages would do it in addition to checking C# as their answer! The results for the other languages are: C++ (15%), VB6 (6%), VB.NET (22%) and some people additionally entered: "Powershell, COBOL, Java, Ruby, ColdFusion, Classic ASP, PHP, Perl, Fortran, javascript". FYI my samples are mostly in C#, but I do throw in VB occasionally and in fact have covered many VB-specific features that would not have been of interest to C# devs. I will continue to blog for both managed developer types and in the future may expand on more dynamic and functional languages supported on the .NET platform.
Q3. What type of .NET applications do you primarily focus on? – Multiple answers allowed. Looking at the results, it is hard to deduce any info because almost everybody checked more than two answers and many areas score well. I think the conclusion is that few people build just one type of .NET solution so there is no point narrowing down the focus – and I don't plan to. FYI, here are the percentages: Client (64%), Web (50%), Server (24%), Mobile (17%), Rich Web (10%), Office (6%), Embedded (6%) and additional entries were: "SharePoint, libraries, prototypes not systems, not .NET, classic ASP, client-side SDK, my own n-tier environment, Extension to Visual Studio, Smart client, Plugins, Microsoft CRM, straight forward Windows cross platform apps, .NET 2 WebServices".
Q4. What OS do you run on your development machine? – Multiple answers allowed. Unsurprisingly Windows XP (62%) and Windows Vista (53%) came out top, followed by Windows Server 2003 (11%) and then Windows Server 2008 (2%). Additionally some of you wrote: "Linux, Mac OS X, Windows 2000 server".
Q5. Do you have an active blog (more than 5 posts per month)? 78% of my readers do not have a blog. From the 22% that do, not everybody left their URL, but I have visited the ones that did. It was interesting that I did not know about some of these at all, which means that they never linked to my blog (because I know who links here and always check out a blog that does). Interesting fact (to me)...
Q6. Do you currently live in the UK most of your time? I did open the 3 separate surveys at different timezones and over multiple days to give everyone a chance, and the result is that under half of the respondents live in the UK (39%). I will continue to talk about UK-specific news (e.g. events) and I will continue to make that clear in the title of the blog posts so the other 61% can easily ignore them.
Q7. Besides reading my blog, do you also watch my screencasts? This was a big surprise to me. Only 50% watch the screencasts I produce (I was expecting it to be closer to 100%). I will be producing many more of these and will make sure people reading the blog are aware by pointing to them. Screencasts are a quality medium and I have tons of positive feedback about them in my inbox. I can only deduce (wish I had a specific question on the survey) that people who said "No" do not watch screencasts in general – you guys are MISSING OUT. More on this topic in a future blog post; in the meantime the screencasts link is always on the left.
Q8. What would you like my blog to focus on? This was the question where you could enter whatever you wanted in 4 optional textboxes: Continue to do (137 suggestions), Stop doing (30 suggestions), Start doing (54 suggestions) and Other feedback (30 suggestions). The previous hyperlinks take you to a text file for each that includes ALL the verbatim responses (stripping out anything that could identify individuals). There are definitely some action items I have taken from your feedback (e.g. this) and others that I will, but the grand theme here (explicitly and implicitly) is "keep doing what you are doing" – I love it, thank you, stay tuned!