Note: This is a post transferred from Laurii for historical and consolidation purposes.
A problem I have to deal with quite often is removing all HTML tags from a document. While this is easy for XML (well-formed, etc.) and you could do it by hand with a