Writing a URL Shortener with WebDev

unclepetecorner

At TeamAlogy.com we do data analytics for customer and team member engagement. As part of that process we send out survey requests. Initially all of our requests went out by email, so it wasn’t a big deal to embed this link http://www.teamalogy.com/Survey.awp?P1=TEST0F2D-DD0E-43ED-96F6-7914F15C1558 in the email. However, now that we are starting to offer survey requests as SMS messages, using up nearly 80 of the 160 allowed characters just for the link seemed a bit excessive!

So I started researching URL shortener services, like Bitly or Google URL Shortener. However, once I started researching them, I found out just how simple it was to build my own (got to love Google and Wikipedia). Writing our own not only allowed us to retain our branding, but also let us avoid some of the issues of the other shortener services that I will cover shortly.

So now our survey request links look like this: http://TeamAlogy.com/ABO. Much better branding, and 24 characters versus 75 is one heck of a reduction.

So read on to see how this was accomplished with a total of 50 lines of WX code and 3 lines of configuration in my Apache setup file!!!

If you were expecting this article to be a demonstration of elaborate and highly technical coding, you are going to be disappointed. Once you do a little research into URL shorteners, you find out the concepts are very simple, and with WX the coding is a breeze. Once you see just how simple it is, you might be in awe of the fact that companies like TinyURL and later Bitly created not only companies but an entire industry based on such a simple concept.

The “secret” behind URL shorteners is converting a number into a base other than base 10. What URL shortener services do is take the long URL you give them and create a database record with that URL and a unique, auto-incrementing key.
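To make the “database record with an auto-incrementing key” step concrete, here is a minimal sketch in Python using an in-memory SQLite table. The table name `short_links` and the helper `store_long_url` are hypothetical; any database that hands back a unique, increasing id works the same way.

```python
import sqlite3

# Hypothetical table: one row per long URL; SQLite assigns the unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE short_links ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "long_url TEXT NOT NULL)"
)


def store_long_url(long_url: str) -> int:
    """Insert a long URL and return the auto-incremented id for it."""
    cur = conn.execute("INSERT INTO short_links (long_url) VALUES (?)", (long_url,))
    conn.commit()
    return cur.lastrowid


first_id = store_long_url(
    "http://www.thenextage.com/wordpress/uncle-petes-corner-writing-a-url-shortener-with-webdev"
)
second_id = store_long_url("http://example.com/some/other/long/page")
print(first_id, second_id)  # 1 2
```

The id is the only thing the short code has to carry; the long URL stays in the database.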

For instance, let’s pretend that we are starting our own URL shortener service. Let’s call it UnclePetesSuperCoolURLShortenerService.com. (Hey, who said I was a marketing genius!) So we have just launched the service and it’s already gotten popular, and somebody comes along and wants a short URL for a link to their blog post. So they come to the site and enter their long URL “http://www.thenextage.com/wordpress/uncle-petes-corner-writing-a-url-shortener-with-webdev”. We add that to the database, getting a unique id of 17230 (told you it was popular already). In Base 10 it requires 5 characters to represent 17230, but if we use a different base we can decrease that.

Most URL shortener services use Base62 to convert the record ID into a shortcode. Base62 is all 10 digits, all lower case, and all upper case letters, so 17230 becomes 4Tu.

And that was my first issue with most 3rd party URL shorteners: they use Base62, meaning that the short code is case sensitive, so 4Tu and 4TU are 2 very different codes. The services do this to get the maximum number of codes they can pack into a short code. That is fine if you are sending someone a link they can click on, but if they are going to have to type it in, say from an SMS into their computer, making it case sensitive can be a big issue.

A little more research and you will find out that there are customizable services that offer Base36 instead of Base62. That is all 10 digits plus all uppercase letters, which avoids the case sensitivity issue. As you will see when we examine the code, I ended up creating my own Base33, but more on that later.

So just how many numbers can be represented in a short code when you use a different base? It grows exponentially. Take Base62. A five-character Base62 short code can represent 62x62x62x62x62 values, which, in case you can’t do that in your head, is 916,132,832. A five-character Base36 short code would allow 60,466,176. As mentioned, I ended up going with a Base33 short code, and I am allowing up to 10 characters for a short code, so once we send 1,531,578,985,264,449 survey requests I will need to make a change, but by then I should be able to pay someone else to deal with it! We will be at 39,135,393 surveys before we ever get to 6-character codes.
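The capacity figures above are simply the base raised to the code length. A quick Python check (the `capacity` helper is just for illustration):

```python
def capacity(base: int, length: int) -> int:
    """Number of distinct codes of the given length for an alphabet of size base."""
    return base ** length


print(f"{capacity(62, 5):,}")   # 916,132,832
print(f"{capacity(36, 5):,}")   # 60,466,176
print(f"{capacity(33, 5):,}")   # 39,135,393
print(f"{capacity(33, 10):,}")  # 1,531,578,985,264,449
```

Because each extra character multiplies the space by the base, going from 5 to 10 Base33 characters squares the capacity (33^10 = (33^5)^2).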

So now that you have a little background on how short codes work, you have probably guessed the bulk of the coding involves converting a number from Base10 to whatever Base you decide to use.

So how is that done?

You first establish your Base. Base 62 is 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.

To convert our 17230 example to Base 62 we simply:

Take 17230 / 62 = 277 with a remainder of 56.

We use the remainder to get the 57th character from the base. We added 1 to the remainder because the first position in the base represents 0. That gives us the lower case “u”.

Next we take 277 / 62 = 4 with a remainder of 29.

The 30th character (again adding 1 to allow for 0) is the upper case “T”.

Our integer part, 4, is now smaller than our base, so we just convert it using the 5th position of our base to get “4”.

And the number is put together as 4Tu.
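The same repeated division can be sketched in Python. This is an illustrative translation, not the WX code shown later: Python strings are 0-indexed, so the “+1” adjustment disappears, and the final “4” falls out of the same loop as one last division that leaves a quotient of 0.

```python
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_base62(number: int) -> str:
    """Convert a non-negative integer to Base62, most significant digit first."""
    if number == 0:
        return BASE62[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])  # 0-based index, no +1 needed
    # Remainders come out least significant first; reverse for the usual order.
    return "".join(reversed(digits))


print(encode_base62(17230))  # 4Tu
```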

To translate it back to a number is even easier.

We take the first character and find its position in the base, so “4” = 5, then subtract one (again to allow for 0) to give us 4.

Next we take the second character and find its position 30 – 1 = 29.

We take our result from the last pass (4) times our base (62) plus our new remainder (29): (4*62) + 29 = 277

Then we take the 3rd character and find its position 57 – 1 = 56.

We take our result from the last pass (277) times our base (62) plus our new remainder (56): (277*62) + 56 = 17230
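The decoding passes above amount to “multiply the running total by the base, then add the next digit’s value”. A minimal Python sketch (again 0-indexed, so no “-1” step); it also shows why case sensitivity matters, since 4TU decodes to a completely different number than 4Tu:

```python
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_base62(code: str) -> int:
    """Convert a Base62 string (most significant digit first) back to an integer.

    Raises ValueError if the code contains a character outside the alphabet.
    """
    number = 0
    for char in code:
        value = BASE62.index(char)  # 0-based position in the alphabet
        number = number * 62 + value
    return number


print(decode_base62("4Tu"))  # 17230
print(decode_base62("4TU"))  # 17204
```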

So now that you know all the math behind the conversion, it’s almost time to look at some code. But there is one more concept to cover first.

I mentioned that I ended up creating my own Base33. Why, you ask? Well, as discussed, I didn’t want to deal with case sensitivity issues, so that took me to Base36. But have you ever gotten a serial number from someone and asked: is that a zero or an “O”? Is that a one, an “L” or an “I”? For us that would be just as bad as the case sensitivity issues, so I removed 0, 1 and “I” from my base, giving me Base33 instead of Base36. Next I scrambled up my alphabet so that the codes generated weren’t quite so sequential.

So my Base33 alphabet is “K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY”.
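If you build your own scrambled alphabet, it is worth checking that it really is Base36 minus the look-alikes, with nothing missing or duplicated. A small Python sketch; `is_valid_base33` and `make_scrambled_alphabet` are hypothetical helpers, not part of the article’s WX code:

```python
import random
import string

BASE33 = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"

# Base36 is the 10 digits plus the 26 uppercase letters; drop 0, 1 and I.
BASE36 = string.digits + string.ascii_uppercase
ALLOWED = sorted(set(BASE36) - set("01I"))


def is_valid_base33(alphabet: str) -> bool:
    """True if alphabet is a permutation of Base36 minus 0, 1 and I."""
    return len(alphabet) == len(ALLOWED) and sorted(alphabet) == ALLOWED


def make_scrambled_alphabet(seed=None) -> str:
    """Shuffle the allowed characters into a new alphabet."""
    rng = random.Random(seed)
    return "".join(rng.sample(ALLOWED, len(ALLOWED)))


print(len(ALLOWED))             # 33
print(is_valid_base33(BASE33))  # True
```

Generate the scrambled alphabet once and hard-code it: changing it later would make every short code already sent decode to a different id.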

So, finally, on to the code!

The first procedure is used to encode the survey request id when we send out a survey.

PROCEDURE Encode33(LOCAL inNumber is int)

// Using Base 33, Using All Caps and Removing I, 0, 1 so there are no similar characters

BaseAlphabet is string = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
Base is int = 33

ShortURL is string

WHILE inNumber > 0 
 Remainder is int = modulo(inNumber,Base)
 ShortURL += BaseAlphabet[[Remainder +1]] // [[ ]] is 1-based, so shift the 0-based remainder
 inNumber = IntegerPart(inNumber/Base)
END

RESULT ShortURL
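For readers outside WX, here is a rough Python translation of Encode33, assuming the same alphabet. Like the WX procedure, it appends each character, so the least significant digit comes first, and an input of 0 produces an empty string.

```python
BASE33 = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
BASE = 33


def encode33(number: int) -> str:
    """Python port of Encode33 (least significant digit first)."""
    short_url = ""
    while number > 0:
        remainder = number % BASE
        # 0-based indexing, so no "+1" as in BaseAlphabet[[Remainder + 1]].
        short_url += BASE33[remainder]
        number //= BASE
    return short_url


print(encode33(17229))  # ABO
print(encode33(17230))  # FBO
```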

If you look closely, you will notice that there is one other difference between my routine and the traditional base conversion math. For the Base62 example above, my logic would generate uT4, not 4Tu. To generate the code in the expected order, you should change the ShortURL += line to read:

ShortURL = BaseAlphabet[[Remainder +1]] + ShortURL

In my case it is fine, as I am not trying to match up to anyone else’s base system. In fact, you can use whatever logic you like, as long as your Decode procedure can correctly decode the value.

The second procedure decodes the value.

PROCEDURE Decode33(LOCAL ShortUrl is string)

// Using Base 33, Using All Caps and Removing I, 0, 1 so there are no similar characters

BaseAlphabet is string = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
Base is int = 33

ShortUrl = Upper(ShortUrl) // Force UpperCase
ShortUrl = Replace(ShortUrl,"0","O") // 0 Must have meant O
ShortUrl = Replace(ShortUrl,"1","L") // 1 Must have meant L
ShortUrl = Replace(ShortUrl,"I","L") // I Must have meant L

outNumber is int
Remainder is int

FOR x = Length(ShortUrl) _TO_ 1 STEP -1
 Remainder = Position(BaseAlphabet,ShortUrl[[x]]) - 1 
 IF Remainder = -1 THEN BREAK // Bad Short URL
 outNumber = (outNumber * Base) + Remainder 
END
IF Remainder = -1 THEN
 RESULT -1 // Signal Bad Short URL
ELSE
 RESULT outNumber 
END
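A rough Python translation of Decode33, assuming the same alphabet and the same look-alike substitutions. It walks the code from the last character to the first, mirroring the STEP -1 loop, and returns -1 for a bad short code just as the WX procedure does.

```python
BASE33 = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
BASE = 33


def decode33(short_url: str) -> int:
    """Python port of Decode33: returns the id, or -1 for a bad short code."""
    short_url = short_url.upper()            # force upper case
    short_url = short_url.replace("0", "O")  # 0 must have meant O
    short_url = short_url.replace("1", "L")  # 1 must have meant L
    short_url = short_url.replace("I", "L")  # I must have meant L
    number = 0
    for char in reversed(short_url):         # last character first, like STEP -1
        remainder = BASE33.find(char)        # -1 if not in the alphabet
        if remainder == -1:
            return -1                        # bad short URL
        number = number * BASE + remainder
    return number


print(decode33("ABO"))  # 17229
print(decode33("abo"))  # 17229 (case-insensitive)
print(decode33("AB0"))  # 17229 (zero read as the letter O)
print(decode33("AB?"))  # -1
```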

Since I didn’t “correctly” flip the characters in the Encode routine, I have to read the code from the last character to the first, using STEP -1 in this loop. If you made the change I mentioned above, loop forward instead: FOR x = 1 _TO_ Length(ShortUrl).
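If you prefer the conventional digit order, encoder and decoder change together: the encoder prepends each character and the decoder reads left to right. A Python sketch of that matching pair (the `_msd` names are hypothetical), with a round-trip check:

```python
BASE33 = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
BASE = 33


def encode33_msd(number: int) -> str:
    """Prepend each character, giving most-significant-digit-first codes."""
    short_url = ""
    while number > 0:
        short_url = BASE33[number % BASE] + short_url
        number //= BASE
    return short_url


def decode33_msd(short_url: str) -> int:
    """Matching decoder: reads left to right, so no reversed loop is needed."""
    short_url = short_url.upper().replace("0", "O").replace("1", "L").replace("I", "L")
    number = 0
    for char in short_url:
        remainder = BASE33.find(char)
        if remainder == -1:
            return -1  # bad short URL
        number = number * BASE + remainder
    return number


code = encode33_msd(17229)
print(code)                # OBA
print(decode33_msd(code))  # 17229
```

Either order works; what matters is that both procedures agree.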

So now you see the code to encode and decode the short code. How is it actually used to provide a short code link?

This portion of the site is AWP. It is very common for me to use a hybrid approach, with the front end of my site in AWP and the back end in WebDev Dynamic.

My Index page accepts the shortcode as a parameter.

This would require the site to be called like this: http://TeamAlogy.com/index.awp?P1=ABO. To make the Apache configuration a little easier, I used another WebDev feature: URL rewriting.

That allows the call to be http://teamalogy.com/index-ABO.awp.

And then the final bit of WX code is to perform a PageDisplay to call the actual survey page with the “real” survey request id.

This simply takes the SurveyCode if it was passed in, uses our Decode function to translate that to the Survey Request Id, looks up that record and uses the information to pass to the survey page. If you were writing a traditional URL shortener this would be where you would use the ID to get the long URL out of the database and then redirect to that URL.
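For the traditional URL shortener case, the lookup-and-redirect step might look like this Python sketch. It is not WebDev’s PageDisplay: the in-memory `LINKS` dict stands in for the database table, and `redirect_response` is a hypothetical handler returning a status code and headers.

```python
from typing import Dict, Optional, Tuple

BASE33 = "K4RAF2ZMD7HQV6EOX5CTW9GPU3LBN8SJY"
BASE = 33

# Hypothetical stand-in for the database table: record id -> long URL.
LINKS: Dict[int, str] = {
    17229: "http://www.thenextage.com/wordpress/uncle-petes-corner-writing-a-url-shortener-with-webdev",
}


def decode33(short_url: str) -> int:
    """Same decoding as the Decode33 procedure (least significant digit first)."""
    short_url = short_url.upper().replace("0", "O").replace("1", "L").replace("I", "L")
    number = 0
    for char in reversed(short_url):
        remainder = BASE33.find(char)
        if remainder == -1:
            return -1
        number = number * BASE + remainder
    return number


def redirect_response(short_code: str) -> Tuple[int, Dict[str, str]]:
    """Decode the code, look up the long URL, then redirect or return 404."""
    record_id = decode33(short_code)
    long_url: Optional[str] = LINKS.get(record_id) if record_id > 0 else None
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}


print(redirect_response("ABO")[0])  # 302
print(redirect_response("ZZZ"))     # (404, {})
```

A 302 (temporary) redirect is used here so browsers keep asking the server; a 301 would let them cache the mapping.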

That is it for the WX side of things. It took me way longer to explain it than it would take you to just look at the code!

At this point we could send out links like this, http://teamalogy.com/index-ABO.awp, but remember we want to send this: http://TeamAlogy.com/ABO. The last piece of the puzzle is a few lines in my Apache configuration file to rewrite the URL.

 RewriteEngine On
 RewriteCond %{REQUEST_URI} !^/Teamalogy [NC]
 RewriteRule ^/(\w+)$ http://%{HTTP_HOST}/index-$1.awp [L]

The first line turns on the rewrite engine.

The second line skips any request whose path starts with /Teamalogy (case-insensitively, thanks to [NC]), which is the link that launches the WebDev dynamic side of the website.

And the third line is the actual rule. Hope you speak regular expression, because that is the key to this. Basically it takes any request for the main domain that is just a short code (no specific page request), captures that code (the ABO in our example), and rewrites http://TeamAlogy.com/ABO into the call we want, http://TeamAlogy.com/index-ABO.awp.
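If regular expressions are not your first language either, you can check which paths the two conditions would rewrite by mimicking them with Python’s `re` module. This sketch assumes the rule’s pattern is meant to match a run of word characters (`\w`) after the slash; `rewrite` is a hypothetical helper that returns only the path portion, not Apache’s full behavior.

```python
import re
from typing import Optional

EXCLUDE = re.compile(r"^/Teamalogy", re.IGNORECASE)  # RewriteCond ... !^/Teamalogy [NC]
SHORT_CODE = re.compile(r"^/(\w+)$")                  # pattern of the RewriteRule


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten path, or None if the request is left alone."""
    if EXCLUDE.match(path):
        return None
    match = SHORT_CODE.match(path)
    if match is None:
        return None
    return f"/index-{match.group(1)}.awp"


print(rewrite("/ABO"))            # /index-ABO.awp
print(rewrite("/Teamalogy"))      # None
print(rewrite("/index-ABO.awp"))  # None
```

The exclusion matters because “/Teamalogy” is itself a run of word characters and would otherwise be treated as a short code.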

Now I have a confession to make. I speak regular expression only slightly better than I speak French! I am good with Apache configuration but struggle to get these rewrite rules correct. But no problem: a quick post on ODesk, $75, and less than an hour later, a freelancer named Ahmed had given me the exact lines to put in my config file for the rewrite rules. I have mentioned ODesk before and strongly encourage you to check it out if you ever need quick server admin, graphic work, or other types of general short-term technical help.

Now PLEASE PLEASE PLEASE do not ask me how to do the rewrite rules in IIS. I would rather take a sharp stick in the eye than work on an IIS machine! I am sure it can be done, and if you can’t figure out how, I refer you back to ODesk!

So there you have it. When you take this article and create your own URL shortener that knocks Bitly off its perch, and have an IPO that rivals Facebook’s, just remember your good old Uncle Pete gave you the seeds <G>

Be sure to go over to wxLive.us and watch the Uncle Pete’s corner webinar that is the companion to this article. 

Uncle Pete’s Corner is a webinar series on all things WX. To watch the recorded version of this webinar, many other WX-related webinars, or future ones live, go to wxLive.us.

Pete Halsted