I've been interested in the semantic and accessible web for over five years now, and the benefits on the various web sites I have developed are clear to see. Semantic content is where web pages are 'marked up' according to their structural significance, as opposed to their presentational significance. So, a heading is marked up as a heading, not as a bit of text with bold and a larger font. The formatting of the heading is achieved using Cascading Style Sheets. Of course, I'm sure I'm preaching to the converted. Semantic content not only keeps code clean, but also increases the accessibility of a page to disabled users, the availability of the page to users of alternative platforms such as mobile phones and, most importantly, its 'readability' by search engines.
Search Engine Optimisation is currently the sexy business of the web, and companies are making thousands of pounds by claiming to understand how the major search engines aggregate content for users' searches. How they can claim this, when most of the internal procedures and practices of the big search engines are closely guarded secrets, is a mystery to me. I tell people who ask that, unless they have money to burn, it really isn't worth the investment. It is better to make sure you have a well structured, semantic and accessible web site from the outset than to waste money on a service whose results, ultimately, cannot be guaranteed.
In my experience of developing well structured, semantic sites, it can take a little longer to achieve "Page 1" or even "Top 5" search result status, but this is organic growth, which proves the quality of the content and therefore endorses the work and practices put into the site development. Arriving at that prestigious position on the search result page in an organic manner can also help cement these development practices in business leaders' minds as being worth the extra effort or thought, as opposed to settling for randomly pushing the latest marketing drive or topic they feel users would or should be interested in. Thought at every point in the process of developing web content is essential.
Whatever "Web 2.0" actually means, some of it certainly involves user-generated content. User-generated content, despite what the seminars will badge as new "Web 2.0", is not a new development, though the accessible, semantic web is relatively new. More and more sites are being redeveloped to be platform portable and highly accessible, which is often an expensive project involving work from the ground up. As soon as you open up a site to user-generated content, however, you lose control of the high-quality markup that makes your web site so accessible and understandable to search engines and other consumers. Fair enough, comments on blogs are quite simple and are well aggregated by search engines, but this is because there is often very little control handed over to the user in adding their content.
In my experience of working with sites that involve user-generated content, it is very difficult to hand over the power of the web - including the ability to format text, insert images and even objects such as Flash or video content - without compromising the quality of mark-up in a page. Two reasons for this are clear: the user is not aware (and should not need to be) of the need to structure their content in the correct and optimised manner, and the tools available for creating user-generated content are often poor.
A case in point is a major site I have recently been involved in. The site, based on the Isle of Man but clearly with a global reach, allows users to post their own content in the form of advertisements or requests for services, along with the ability to create their own profile. In order to implement this rich editing capability for users, I needed to find a good editor that works like a word processor, but creates very high quality mark-up for the web. Unfortunately, the options are very limited. Most rich-text editors for the web rely on the rich text controls built into each browser. Therefore, a rich text editor running in Internet Explorer exposes parts of the rich text functionality of the browser itself. As you can imagine, this doesn't produce XHTML, but a bizarre hybrid of XHTML+HTML+MS-HTML. This results in very poor markup. Paste your work from a word processor that claims to support web content, such as Microsoft Word, and you get even worse content. (Indeed, many editors have special parsers dedicated to stripping out the rubbish these packages insert into the content.)
For example, take the following content of an advertisement:
<p style="TEXT-ALIGN: justify">
<b>
<span style="FONT-SIZE: 11pt">
<span style="TEXT-DECORATION: none">
<u>
</u>
</span>
....
<p style="TEXT-ALIGN: justify">
<span style="FONT-SIZE: 11pt">
</span>
</p>
....
<p style="TEXT-ALIGN: justify">
<u>
<span style="FONT-SIZE: 11pt">
<span style="FONT-FAMILY: Times New Roman">The Role</span>
</span>
</u>
</p>
....
<p style="TEXT-ALIGN: justify">
<span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
<span>· </span>
<span style="FONT-SIZE: 11pt">
<span style="FONT-FAMILY: Times New Roman">…have at least 6 months to 1 year plus proven sales experience (ideally within recruitment, yet field sales, business development and account management are also desirable). </span>
</span>
<p style="TEXT-ALIGN: justify">
<span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
<span>· </span>
<span style="FONT-SIZE: 11pt">
<span style="FONT-FAMILY: Times New Roman">…will be professional and organised to manage the workload and the needs and expectations of the clients. </span>
</span>
<p style="TEXT-ALIGN: justify">
<span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
<span>· </span>
<span style="FONT-SIZE: 11pt">
<span style="FONT-FAMILY: Times New Roman">…will have excellent communication skills and show proven negotiation experience and be enthusiastic to drive their team.</span>
</span>
</span>
</p>
</span>
</p>
</span>
</p>
</span>
</b>
</p>
Anybody with a basic knowledge of XHTML and semantic content can see that the mark-up there is quite poor. This is a real shame, as the content was added in a logical manner, as bullet points. The example shows a number of violations of semantic XHTML: presentational tags that carry no meaning (STRONG should be used instead of B, and U is deprecated), nested P tags (disallowed), and manual bullet points that are special characters of a particular font rather than the XHTML UL/LI tags that should be used.
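To make those three problems concrete, here is a minimal sketch in Python that scans a fragment for them using the standard library's `html.parser`. The class name `MarkupAudit`, the `audit` helper and the list of presentational tags are my own illustrative choices, not part of any editor, and the checks are deliberately crude.

```python
from html.parser import HTMLParser

# Presentational elements that carry no structural meaning (illustrative list).
PRESENTATIONAL = {"b", "i", "u", "font"}


class MarkupAudit(HTMLParser):
    """Collects a few semantic problems of the kind seen in the advertisement."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of elements opened but not yet closed
        self.problems = []   # findings, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "p" and "p" in self.open_tags:
            self.problems.append("nested <p>")
        if tag in PRESENTATIONAL:
            self.problems.append(f"presentational <{tag}>")
        style = dict(attrs).get("style") or ""
        if "font-family: symbol" in style.lower():
            self.problems.append("manual bullet (Symbol font)")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def audit(markup):
    """Return the list of problems found in an XHTML fragment."""
    parser = MarkupAudit()
    parser.feed(markup)
    parser.close()
    return parser.problems


sample = (
    '<p style="TEXT-ALIGN: justify"><b>'
    '<span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol"><span>· </span></span>'
    "<p>text</p></b></p>"
)
print(audit(sample))
```

A check like this only flags problems; it says nothing about how to repair them, which is the harder part.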
The fault doesn't lie with the user, or the site. The fault lies chiefly in the editor used to create this content. The editor is the radEditor from Telerik, in itself a very functional and advanced editor. It claims XHTML support (though I dispute this) and is one of the better editors on the market. It has an attractive licensing package and, other than the poor XHTML output, I have no complaints. This editor is one of the series of editors that rely on the built-in rich text capabilities of the browser, and any cleansing of the markup is performed in a mixture of server-side and JavaScript code. To be fair, it is very difficult to apply automatic cleansing to human-generated content. Therefore, with the best will in the world, these editors set themselves up to fail to achieve the most logical and semantic content structure. Another editor does exist, XStandard, which provides much higher quality content as it is an ActiveX control written from the ground up and deals in pure XML. This editor, however, is expensive to license in some scenarios and doesn't offer the cross-browser support that other editors provide.
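To show what server-side cleansing can and cannot do, here is a minimal sketch in Python using the standard library's `html.parser`. It is not how radEditor or any other product works; the `Cleanser` class, the `cleanse` function and the three rule tables are assumptions of mine for illustration only.

```python
from html import escape
from html.parser import HTMLParser

RENAME = {"b": "strong", "i": "em"}  # presentational -> semantic equivalents
UNWRAP = {"span", "font", "u"}       # keep the text, drop the wrapper
DROP_ATTRS = {"style", "class"}      # inline presentation is left to CSS


class Cleanser(HTMLParser):
    """Rewrites a fragment tag by tag, applying the rule tables above."""

    def __init__(self):
        super().__init__()
        self.out = []

    def _start(self, tag, attrs, end=">"):
        if tag in UNWRAP:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{RENAME.get(tag, tag)}{kept}{end}")

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br /> is written back as-is.
        self._start(tag, attrs, end=" />")

    def handle_endtag(self, tag):
        if tag not in UNWRAP:
            self.out.append(f"</{RENAME.get(tag, tag)}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def cleanse(markup):
    """Return the fragment with presentational wrappers and styles removed."""
    cleanser = Cleanser()
    cleanser.feed(markup)
    cleanser.close()
    return "".join(cleanser.out)


dirty = '<p style="TEXT-ALIGN: justify"><b><span style="FONT-SIZE: 11pt">The Role</span></b></p>'
print(cleanse(dirty))  # <p><strong>The Role</strong></p>
```

Rules like these tidy the easy cases, but they leave nested P tags in place and cannot know that a paragraph starting with a Symbol-font dot was meant to be a list item. That gap is exactly why automatic cleansing is so difficult.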
So, here, the user has created their own content and posted it to the site, believing that their content will be effectively handled by search engines. While the site itself enjoys good search rankings and employs W3C compliant code in the surrounding mark-up, the weak nature of the generated markup lets down the content. It can be very frustrating as a developer to build a site with high quality W3C compliant code, and then have dirty code inserted by users via the available editors. So, it was with a heavy heart that I recommended that the owner of the site should withdraw any claims that the site is W3C compliant. This weakens its marketing position, particularly when sold to a technical audience.
We have discovered a number of problems in user generated content:
- Availability of quality tools is poor and unsupported
- Knowledge of how to effectively present web content is poor within the user-base
- Effectively merging static surrounding web site code with user generated code can be difficult
- Controlling user content is difficult in order to minimise errors, accessibility concerns and quality
In order to effectively publish user-generated content in a Web 2.0 environment, and have it effectively aggregated and understood by various platforms and search engines, doesn't it mean that we have to exert a greater degree of control over the content, thereby detracting from the point of user-generated content in the first place? If submissions are moderated and then cleaned up, users will feel that their contributions are being edited, which is clearly not an impression a site wants to give.
Maybe a way of structuring users' submissions is needed: provide the user with a template, into which they can add images, paragraphs, headings and other content in a semantic and controlled manner. The user can therefore create their content how they want, but the content will be inserted in a well managed manner by virtue of the web site editing software. Maybe an idea for my next project ....
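As a rough illustration of that template idea, here is a minimal sketch in Python. The block types (`Heading`, `Paragraph`, `BulletList`, `Image`) and the `render` function are hypothetical names of my own. The point is only that the user chooses what kind of block they are adding, and the site alone decides which tags are emitted.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Heading:
    text: str
    level: int = 2


@dataclass
class Paragraph:
    text: str


@dataclass
class BulletList:
    items: List[str]


@dataclass
class Image:
    src: str
    alt: str  # required, so every image has a text alternative


Block = Union[Heading, Paragraph, BulletList, Image]


def render(blocks: List[Block]) -> str:
    """Turn a user's list of blocks into semantic XHTML, one block per line."""
    parts = []
    for block in blocks:
        if isinstance(block, Heading):
            level = min(max(block.level, 1), 6)  # clamp to h1..h6
            parts.append(f"<h{level}>{escape(block.text)}</h{level}>")
        elif isinstance(block, Paragraph):
            parts.append(f"<p>{escape(block.text)}</p>")
        elif isinstance(block, BulletList):
            if not block.items:
                continue  # an empty <ul> is not valid XHTML, so skip it
            items = "".join(f"<li>{escape(item)}</li>" for item in block.items)
            parts.append(f"<ul>{items}</ul>")
        elif isinstance(block, Image):
            parts.append(f'<img src="{escape(block.src)}" alt="{escape(block.alt)}" />')
        else:
            raise TypeError(f"unsupported block: {block!r}")
    return "\n".join(parts)


advert = [
    Heading("The Role"),
    BulletList(["proven sales experience", "excellent communication skills"]),
]
print(render(advert))
```

Because all user text is escaped and only the renderer writes tags, the user cannot introduce nested paragraphs or font-based bullets. The cost is that they lose free-form formatting, which is the trade-off raised above.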
Read the complete post at http://bloggingabout.net/blogs/program.x/archive/2008/03/23/semantic-web-2-0-user-generated-content-is-difficult-to-achieve.aspx
Posted 03-23-2008 11:01 by Nathan J Pledger