Joe Brinkman posted a technique to pass parameters to JavaScript script files. The approach is simple: append a standard querystring to the script URL, locate the script element using its ID attribute, and parse the key-value pairs from the “src” attribute. A sample usage looks like this:

<script id="MyScript" src="http://www.foo.com/myscript.js?obj1=ABC&obj2=XYZ" type="text/javascript"></script>
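
For reference, the parsing side of the querystring variant could look roughly like the following. This is a minimal sketch and not Joe Brinkman's actual code; the function name parseScriptQuery is made up for illustration.

```javascript
// Hypothetical helper (not from the original post): parse "key=value"
// pairs from the querystring of a script URL.
function parseScriptQuery(src) {
  var params = {};
  var queryStart = src.indexOf("?");
  if (queryStart === -1) return params;

  // Drop any "#fragment" before splitting into pairs.
  var query = src.substring(queryStart + 1).split("#")[0];
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    if (pairs[i] === "") continue;
    var eq = pairs[i].indexOf("=");
    var key = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    var value = eq === -1 ? "" : pairs[i].substring(eq + 1);
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return params;
}
```

In a page, the input would be document.getElementById("MyScript").src, which the browser has already converted from “&amp;” back to “&”.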

This is a great approach and works well for most scenarios, except in those situations where HTML 4.01 Strict markup is desired. In this case there are two problems that would cause validation to fail:

1) The HTML “script” element does not support an ID attribute. Yes, this is true. Verify it for yourself here.

2) The ampersand (“&”) character is reserved for entity references and must itself be expressed as the entity reference “&amp;”.

So, is there a simple way to work around this issue? Of course; otherwise the title of this post would be totally false advertising.

Here is the revised code snippet I propose in lieu of the one above:

<script src="http://www.foo.com/myscript.js#obj1/ABC/obj2/XYZ" type="text/javascript"></script>

Let’s break it down.

1) The ID attribute has been removed. Even without the ID attribute there is still a simple way to reference the script element. The trick is to take advantage of the fact that, as the browser renders the page, it executes JS code immediately when it is encountered. As a result, if the code in your JS file queries the DOM while it is loading, the last script element in the DOM is a self-reference (i.e. a reference to the script element that loaded the code being executed). Thus, calling document.getElementsByTagName("script") and taking the last element of the returned collection provides an easy way to get the current script reference.

2) The standard querystring separator “?” has been replaced by “#”. This is not necessary, but something I did for semantic reasons. To overcome the “&” entity issue, I used a slash (“/”) character to separate not only the key-value pairs, but also the keys from their values. The reason for this will become evident in a bit, but since I changed the querystring into something non-standard, I felt it was important to also remove the “?” separator, which is the standard prefix for a querystring. Instead, I used a hash (“#”) character, which points to a document fragment and has no semantic value in the context of a JS script URL. (In fact the JS URL hash technique is a common way to pass data in cross-domain XMLHttp requests for this reason and also because it does not cause a page re-load.)

3) Key-value pairs. The last change I made is to use a path-based approach to key-value pairs instead of the Key=Value format. I did this because it is a common technique in RESTful URLs, is easier to read and much simpler to parse. In fact, using a single “for…” loop, it becomes a trivial task to create an associative array of all the parameters for ready reference.
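
The self-reference trick from point 1 can be isolated into a few lines. This is only a sketch: the name getCurrentScript is made up, and the document is passed in as an argument purely so the function can be tried outside a browser.

```javascript
// Hypothetical helper: return the last <script> element currently in the
// document, or null if there is none. When called synchronously while an
// external script is being loaded during page rendering (not deferred,
// async, or injected later), this is the script that is running.
function getCurrentScript(doc) {
  var scripts = doc.getElementsByTagName("script");
  return scripts.length > 0 ? scripts[scripts.length - 1] : null;
}
```

The call must happen while the file is first being executed, for example var me = getCurrentScript(document); at the top of the file. Inside an event handler that runs later, the last script element may be a different one.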

Here’s the code wrapped into a function with a sample call:


var myParams = getScriptUrlParams();
alert(myParams["obj1"]);
alert(myParams["obj2"]);

function getScriptUrlParams()
{
	// While this file is being executed during page load, the last script
	// element in the DOM is the one that loaded it
	var scriptTags = document.getElementsByTagName("script");

	// This code is assumed to be in a file so the "src" attribute
	// is guaranteed to be present...no error-checking is needed
	var urlFrags = scriptTags[scriptTags.length-1].src.split("#");

	// Use a plain object (not an array) as the associative array
	var urlParams = {};
	var urlParamRaw = [];
	if (urlFrags.length > 1)
	{
		urlParamRaw = urlFrags[1].split("/");
		if (urlParamRaw.length >= 2)
		{
			// Keys are at even positions, values at odd positions;
			// a trailing key without a value maps to null
			for (var param = 0; param < urlParamRaw.length; param += 2)
				urlParams[urlParamRaw[param]] = (urlParamRaw.length > param + 1 ? decodeURIComponent(urlParamRaw[param+1]) : null);
		}
	}

	return urlParams;
}
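
The function above decodes each value, so whatever generates the script tag should encode the values first; otherwise a value containing “/” or “#” would be split in the wrong place. Here is a minimal sketch of the encoding side, assuming a hypothetical helper named buildScriptUrl:

```javascript
// Hypothetical helper: append parameters to a script URL in the
// "#key/value/key/value" format described above.
function buildScriptUrl(baseUrl, params) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      // encodeURIComponent escapes "/" and "#", so a value cannot be
      // mistaken for a separator. Keys are encoded the same way, but the
      // parser only decodes values, so keep keys to plain identifiers.
      parts.push(encodeURIComponent(key), encodeURIComponent(params[key]));
    }
  }
  return parts.length > 0 ? baseUrl + "#" + parts.join("/") : baseUrl;
}
```

For example, buildScriptUrl("http://www.foo.com/myscript.js", { obj1: "ABC", obj2: "XYZ" }) produces the URL used in the script tag earlier.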

What do you think? Is this a simpler approach or is it more complicated? Are there other techniques for working around the XHTML validation issue?

On the LinkedInBloggers group, there is an interesting discussion on how to prevent blog harvesting. Turning off RSS feeds and subscription feeds seemed to be the suggested solution. I think this is an impractical solution and makes your blog harder to find and harder to consume with RSS readers.

I wonder if disabling RSS/subscription widgets is the only way? What if there was a simple way to ensure that your content only displays in a browser if it is being served from your site and when displayed on a harvester’s site, it simply redirects the browser back to your site?

I came up with a solution that might work. My solution is based on two assumptions:

1) RSS readers ignore JavaScript

2) Most blog engines have a templating feature that allows the URL of the blog post to be injected anywhere on the page containing the post

The solution is pretty simple:

Embed a simple script in your blog post that checks to see if the location where your blog content is being displayed is valid (i.e. your blog) or invalid (i.e. harvester site). If it is invalid, then redirect the browser to your blog.

Not only does this approach thwart harvesters (at least until they filter out the script), but it has the added benefit of getting the search traffic from the harvester’s site back to your blog.

Let’s walk through the changes you would make to your blog’s template in order to enable this capability:

Original blog HTML:

This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.

Steps for modifying blog HTML:

Step 1: Add DIV element wrapper for content

<div id="BlogContent" title="http://www.yourblogsite.com/URL-of-your-blog-post.htm">This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.</div>

I used an id of “BlogContent” but you can use anything you want. If your blog displays the entire contents of more than one blog post on a page, you will want this id to be different for each post. In that case, try using “BlogContent{{ ID }}” where {{ ID }} is your blog engine’s token for some unique identifier associated with the post. If you take this approach, be sure to modify the “BlogContent” string in Step 2 also.

Also, note the URL in the value of the “title” attribute of the <div> containing the blog content. You should not actually type in a URL there, but instead use the token feature of your blog engine that will inject the URL of the blog post page. Something like:

<div id="BlogContent" title="{{ PostURL }}">

({{ ID }} and {{ PostURL }} are not actual tokens; I just made them up. You will need to look at your blog engine’s documentation to figure out the tokens you should use.)

This URL serves two purposes:
- It provides a standards-compliant way to include the original URL of your blog in the blog content so that no matter where the content is posted, the original URL is always in the HTML source code, and
- It gives the script in Step 2 a known place to find the original URL.

Step 2: Embed script to foil harvesters

The script to embed is:

<script type="text/javascript">// <![CDATA[

var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
// ]]></script>

Here’s what the script is doing:

a) Find the HTML element containing the blog content

var blogContent = document.getElementById("BlogContent");

b) Test whether the content is running on the original site; if not, redirect to the original site

if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
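
The test in step b) can be pulled out into a small function to see how it behaves for different URLs. This is only a sketch: the name isOriginalLocation and the harvester.example domain are made up for illustration.

```javascript
// Hypothetical helper: true when currentUrl begins with postUrl,
// compared case-insensitively (the same test as the inline script).
function isOriginalLocation(currentUrl, postUrl) {
  return currentUrl.toLowerCase().indexOf(postUrl.toLowerCase()) === 0;
}

var postUrl = "http://www.yourblogsite.com/URL-of-your-blog-post.htm";

// On the original site, even with a fragment appended: no redirect.
var onOwnSite = isOriginalLocation(postUrl + "#comments", postUrl);

// On a copy hosted elsewhere: the inline script would redirect.
var onHarvester = isOriginalLocation("http://harvester.example/copy.htm", postUrl);
```

Because the comparison is a prefix match, anything appended to the post URL (a fragment or a querystring) still counts as the original location. Note also that an empty “title” attribute matches every URL, so a missing token value never triggers a redirect.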

Here’s what the final content might look like:

<div id="BlogContent" title="{{PostURL}}">This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.
<script type="text/javascript">// <![CDATA[

var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
// ]]></script></div>

Step 3: (Optional) Putting the script in a separate file

Instead of placing the script in each blog post as described above, you can also put the script into a separate file such as harvestblock.js. This will reduce the page size, as the entire script will not be repeated for each blog post. You only need to include this part of the script in the file:

var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;

If you do this, the revised content might look like:

<div id="BlogContent" title="{{PostURL}}">This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.
<script src="http://www.yourblog.com/harvestblock.js" type="text/javascript"></script></div>

Note: The URL used for the script must be a fully-qualified URL because it must work no matter whether the content is running on your site or on the harvester’s site.


Let’s look at what happens when you make this change:

1) User looking at content on your site

The script will detect a match between the URL being displayed in the browser and the URL of the blog post. As a result, it will do nothing and there will be no change in behavior from what your users are already seeing.

2) User looking at content in their RSS reader

The script will not run and as a result there will be no change in behavior from what your users are already seeing.

3) User looking at content on harvester site

The script will detect a mismatch between the URL being displayed in the browser and the URL of the blog post. As a result, it will redirect the user to the original blog post.

This solution is not fool-proof. If a harvester is stripping script embedded in a blog post then it will not work. I highly doubt this will happen very often because most harvested content is simply the content from the RSS feed as-is.

If you employ this solution please provide information on the specific token you use with your blogging engine in the comments.

© 2012 TechBubble Suffusion theme by Sayontan Sinha