The inner page

This article is not intended as an example of good programming or prudent life decisions.

The other day I needed to pull a full HTML page into the middle of another page at set intervals. The inner page had JavaScript that needed to execute on load, and I didn’t want to use an iframe or object because of the gross scrollbars. I more-or-less succeeded through a couple of filthy hacks in which I take inordinate pride.

It was one of those situations where you think it’s going to be an easy thing to do, but are soon proven very wrong by all the complexities you failed to consider in the beginning. My solution started out innocently enough.

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = this.responseText;
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.send();
}

Yes, this is pretty much the w3schools XHR example. It didn’t quite work, so I looked into the complexities of handling responses with a big chunk of HTML rather than a little bit of text and made the following modifications:

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = this.responseXML;
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.responseType = "document";
	xmlhttp.send();
}

Following the creation of this version, I realised I didn’t actually want to pull in the whole document, just the body. The only bits I needed outside the body were a couple of stylesheet and script imports in the head, and I knew those would remain static between the different pages I was pulling, so I just dumped them into the parent page. A more general solution would probably want to extract and include the head separately.

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = this.responseXML.body.innerHTML;
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.responseType = "document";
	xmlhttp.send();
}
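
For what it's worth, here is a rough sketch of how the more general approach might start: working out which stylesheet and script imports the inner page's head has that the outer page lacks. The helper names `headResourceUrls` and `missingHeadResources` are made up for illustration, and this is not something I actually ran in the project.

```javascript
// Hypothetical helper: list the stylesheet and script URLs in a document's head.
function headResourceUrls(doc) {
	var nodes = doc.head.querySelectorAll('link[rel="stylesheet"][href], script[src]');
	var urls = [];
	for (var i = 0; i < nodes.length; i++) {
		// Stylesheet links carry href; external scripts carry src.
		urls.push(nodes[i].getAttribute("href") || nodes[i].getAttribute("src"));
	}
	return urls;
}

// Hypothetical helper: URLs the inner page imports that the outer page does not.
function missingHeadResources(innerDoc, outerDoc) {
	var present = headResourceUrls(outerDoc);
	return headResourceUrls(innerDoc).filter(function (url) {
		return present.indexOf(url) === -1;
	});
}
```

Inside the onload handler, `missingHeadResources(this.responseXML, document)` would give the list of imports still to be added to the outer page. Note that getAttribute() returns the attribute exactly as written, so "/css/a.css" and "css/a.css" count as different URLs here.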

At this point, I was getting all the HTML I wanted, and I could verify through my browser’s devtools that it was appearing in the page, but nothing was actually displaying. After spinning my wheels a bit, I realised the page I was pulling in was extremely JS-heavy, and none of the JS was firing. I theorised this was because everything in the inner page was supposed to execute at page load, which didn’t really happen with the way I was doing things.

Luckily I soon found this helpful article from 2005 about executing JavaScript you’ve dropped into some element’s innerHTML. The idea is this: you tack a one-pixel image onto the end of your blob of HTML and put some code in its onload handler (which should fire off pretty much as soon as the HTML is inserted). This code does what you want to do and then removes the image. I used this trick to run fn(), the descriptively named function that was supposed to execute right after my inner page loaded.

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = this.responseXML.body.innerHTML+"<img src='/img/tracker.png' onload='fn();this.parentNode.removeChild(this);'></img>";
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.responseType = "document";
	xmlhttp.send();
}

It didn’t work. fn() was undefined. After looking at the HTML for my inner page, I thought this might be because fn() was defined inside an anonymous function I’d overlooked earlier and was therefore out of my scope. Changing the HTML of this page itself was not an option, so I used a couple of string replacements to strip the anonymous wrapper.

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = "";

			var page = this.responseXML.body.innerHTML;
			page = page.replace("(function() {", "");
			page = page.replace("})();", "");

			pagecontainer.innerHTML = page+"<img src='/img/tracker.png' onload='fn();this.parentNode.removeChild(this);'></img>";
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.responseType = "document";
	xmlhttp.send();
}

Same error. After some searching, I discovered that scope wasn’t the real problem: fn() was never evaluated in the first place. As per the spec, scripts in a document parsed by XMLHttpRequest don’t get executed, and neither do script elements inserted through innerHTML.

Luckily, there’s a workaround for that as well: the magical and highly dangerous eval(). I just had to run the page’s JS through that before calling my function. So I extracted the two scripts I knew would be in every inner page and ran them through window.eval() (so that any functions they defined would land in the global scope). A for loop would have been more proper, but I saw no need to put lipstick on this kludge.

function getPage()
{
	var xmlhttp = new XMLHttpRequest();
	xmlhttp.onload = function() {
		if (this.readyState == 4 && this.status == 200) {
			var pagecontainer = document.getElementById("page-container");
			pagecontainer.innerHTML = "";

			var page = this.responseXML.body.innerHTML;
			page = page.replace("(function() {", "");
			page = page.replace("})();", "");

			pagecontainer.innerHTML = page;
			window.eval(pagecontainer.getElementsByTagName("script")[0].innerHTML);
			window.eval(pagecontainer.getElementsByTagName("script")[1].innerHTML);
			pagecontainer.innerHTML = pagecontainer.innerHTML+"<img src='/img/tracker.png' onload='fn();this.parentNode.removeChild(this);'></img>";
		}
	};
	xmlhttp.open("GET", "/page.html");
	xmlhttp.responseType = "document";
	xmlhttp.send();
}

That did it. My script worked. And by that point, this was enough.
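
For the record, the more proper for loop might look something like the sketch below. `runInlineScripts` is a name I've made up here, and the check that skips external scripts is an assumption on my part, since the two scripts in my inner pages were both inline.

```javascript
// Hypothetical helper: evaluate every inline script inside a container element.
function runInlineScripts(container) {
	var scripts = container.getElementsByTagName("script");
	// Calling eval through another name makes it an indirect eval, which runs
	// in the global scope, the same effect as window.eval() in a browser.
	var globalEval = eval;
	var executed = 0;
	for (var i = 0; i < scripts.length; i++) {
		// Skip external scripts: they have a src and no inline code to evaluate.
		if (scripts[i].src) {
			continue;
		}
		globalEval(scripts[i].textContent);
		executed++;
	}
	// Report how many scripts were actually evaluated.
	return executed;
}
```

With this in place, the two window.eval() lines would collapse into a single `runInlineScripts(pagecontainer);` call right after the innerHTML assignment.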

Later on, I realised I could do everything I needed to with the eval()s and didn’t really need the string replacements or the tracker image, so I removed those and made the hack slightly less majestically awful.

In the end, though, we threw out the code, partly because of the unpleasant way the inner page flashed every time it was pulled, and partly because we decided to go back and rearchitect properly: put all the content and logic into the outer page and pull a JSON object instead of an inner page, like God intended.
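
I don't have the rewritten code to hand, but the JSON-pulling side of that design could be sketched roughly like this. The function name `getData`, the callback, and the optional `Xhr` parameter are all invented for illustration; the last one is only there so the function can be exercised outside a browser.

```javascript
// Hypothetical sketch: fetch a JSON object and hand it to a callback.
function getData(url, onData, Xhr) {
	// Fall back to the browser's XMLHttpRequest when no constructor is passed in.
	var xmlhttp = new (Xhr || XMLHttpRequest)();
	xmlhttp.onload = function() {
		// With responseType "json" the browser parses the body itself;
		// response is null if the body was not valid JSON.
		if (this.status == 200 && this.response !== null) {
			onData(this.response);
		}
	};
	xmlhttp.open("GET", url);
	xmlhttp.responseType = "json";
	xmlhttp.send();
}
```

Pulling at set intervals would then be a matter of something like `setInterval(function() { getData("/data.json", render); }, 5000);`, where the URL, the five-second interval, and the `render` function that updates the outer page are assumptions rather than details from the real project.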