In 1990, Sir Tim Berners-Lee invented what would forever change the face of our world: the World Wide Web. Twenty-two years later, here I am, using this extraordinary tool to spread knowledge instantaneously throughout the world. Yet, I still believe that the full potential of the Internet is far from being reached. Only twenty-two years after the initial launch, our source of information is no longer called the television. It’s called Google. Our address book is no longer a scrappy notebook full of deletions and overwriting. It’s known as Facebook. Our encyclopedia isn’t a dusty old large book forgotten in our libraries anymore. It is named Wikipedia. Hopefully, in a few years, your source of scientific information will no longer be some monthly magazine. It will be Science4All. After all, the initial purpose of the Web was to facilitate knowledge sharing for scientists.
In this article, I will explain to you how the languages of this amazing tool were created, and how they work. I hope you’ll take advantage of that to create our next indispensable tool! But I’m not going to give you a tutorial on what code to write to make your website cool. One reason for that is that I’m still learning. What I’ll do is give you a global understanding of how the languages of the Web work, so that you can understand this amazing new world, talk to specialists using their concepts and exploit the infinite possibilities of the Web for your own purposes.
Just like I’m relying on the World Wide Web to write and publish this article, Sir Tim Berners-Lee didn’t start from scratch to design the World Wide Web. Indeed, his work built on the structure of the Internet.
I used to confuse those two concepts too! To understand the Internet, let’s go back in time. As computers started to spread in the 1960s, scientists started to design interactions between them. This was a first conceptual step. Quickly enough, computers started to interact, but this was usually done locally, that is, among computers in the same room or in the same company, hence forming isolated networks, aka intranets. But in the 1970s, larger networks started to be built by merging intranets, slowly transforming them into one single net linking most intranets. This single net is what we call the Internet, which emerged in the 1980s.
Connecting people is not enough to make them interact. Try putting farmers from different countries in the same room. They’ll have trouble interacting, and I’ll let you guess why…
HTML invents Hyperlinks
Exactly! Just like humans, computers need to agree on a language to understand one another. That’s what Tim Berners-Lee defined in 1991. A language. This language is called HyperText Markup Language, better known as HTML. It’s a little bit hard for humans to read (especially if you’ve never learnt or practiced it, just like any other language), but you can see it in the browser you are currently using to read this article, by right-clicking on the page and choosing “view page source” (or something similar, depending on your browser). Fortunately, the computer (well, the browser) translates the message of a web page from the HTML language into the visual page you are seeing.
But not just any language. In particular, Tim Berners-Lee built HTML around a concept that would revolutionize the structure of information: hyperlinks. The idea of hypertext had been imagined before, but HTML is what brought it to everyone. Before HTML, we could only navigate through folders, just like on your computer, whether you’re on Windows, Mac OS or Linux. But now, thanks to HTML, navigation is made much simpler, as hyperlinks naturally appear in the content.
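To make hyperlinks more concrete, here is a minimal sketch in Python of what a program (such as a browser or a search engine robot) might do to find the links hidden in an HTML page. The tiny page and the /about.html address are made up for illustration, and the class name LinkCollector is my own choice, not a standard one.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # In HTML, a hyperlink is an <a> tag whose href attribute holds the target.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# A made-up page, written in HTML.
page = """<html><body>
<h1>My article</h1>
<p>Read more on <a href="http://www.science4all.org">Science4All</a>
or on the <a href="/about.html">about page</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
found_links = collector.links
print(found_links)
```

The links sit right inside the sentence they belong to, which is exactly what makes navigation so natural compared with browsing folders.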
This very concept is actually the reason why the Web is called “web”. As a matter of fact, think of every web page as a dot, and draw an arrow from a page A to a page B whenever page A contains a link towards B. Then, you’ll have drawn a graph, aka a network. This graph is the Web. Note that this graph is different from the Internet, which links computers. Understanding this network and helping users find their way around the Web then becomes crucial. That’s what Google does. That’s what made this company one of the most influential in the world, and what will keep it there.
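The graph of pages described above can be sketched in a few lines of Python. The page names and links below are invented, and counting incoming links is only a very crude way of guessing which pages matter; real search engines use far more elaborate methods.

```python
# Hypothetical mini-Web: each page is mapped to the pages it links to.
links = {
    "home": ["articles", "about"],
    "articles": ["home", "html-article"],
    "html-article": ["articles"],
    "about": [],
}


def reachable_from(start, links):
    """Return the set of pages one can reach from `start` by following links."""
    seen = {start}
    to_visit = [start]
    while to_visit:
        page = to_visit.pop()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                to_visit.append(target)
    return seen


def incoming_links(links):
    """Count, for each page, how many links point to it."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


print(reachable_from("home", links))
print(incoming_links(links))
```

Notice that the arrows have a direction: from "about" you cannot reach any other page, even though "home" links to it.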
Let’s add the fact that each web page can be found at an address called a URL (Uniform Resource Locator), like for instance http://www.Science4All.org. Well, saying that each URL gives one page is not entirely true. For instance, if you go to the URL http://www.facebook.com, you won’t get the same page as me, as I’ll be logged in. But that’s due to advanced languages of the Web, which were introduced later, and which we’ll discuss later as well. At the beginning of the Web, each page corresponded to one URL. Besides, even now, hyperlinks can only lead you to a URL. In particular, fortunately, they cannot lead you to my Facebook page logged in as myself!
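For the curious, a URL is itself structured text that a program can split into pieces. Here is a small sketch using Python’s standard urllib.parse module. The path /articles/web.html is an assumption made up for the example, not an actual address on the site.

```python
from urllib.parse import urlparse

# The path after the site name is hypothetical, for illustration only.
parts = urlparse("http://www.Science4All.org/articles/web.html")

print(parts.scheme)    # how to talk to the server
print(parts.hostname)  # which server to contact (case does not matter here)
print(parts.path)      # which page to ask that server for
```

Nothing in those pieces says who is asking, which is why a hyperlink on its own cannot carry someone else’s logged-in session.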
But HTML alone wouldn’t have brought us to the Web we know today! In fact, developing websites with the first version of HTML today would be deeply inefficient. Indeed, all the information of a web page had to be inserted in a single file.
Mainly, the problem with that is that each web page’s code needs to be written separately. Yet, if you compare the HTML code of two Science4All articles, you’ll see that they are very similar. Thus, somehow, we would have had to copy and paste to generate similar pages. And computer scientists (should) hate copy-pasting their code.
Sure. Except, imagine I want to change the colors of the “Science 4 All”, the sizes of the top-right buttons or the alignment of the text in the sidebar. I would have to modify every HTML file I have made. Well, so far, I only have two dozen important pages… so it would be long and boring but I guess I could do it. But imagine if Google found out that people didn’t like the color of their links… There would be absolutely no way every HTML file could be changed. That’s how the idea emerged, in 1996, of introducing a second, complementary language called CSS.
CSS creates separate formatting
Precisely! The idea of CSS is to separate the formatting from the content. This way, marketers can make wonderfully beautiful templates of pages, while journalists can focus on feeding the website with quality content. And the reason why this idea has revolutionized the Web is that we can define one CSS file that will be used for several HTML contents. To do so, each HTML file simply needs one line saying which CSS file will handle its formatting.
This way, the look of a whole website can be changed by simply modifying one CSS file! That’s exactly the reason why CSS is very efficient. Just like HTML, CSS has improved a lot since 1996. The latest versions are HTML5 (which was highly anticipated because it enables the inclusion of multimedia content!) and CSS3.
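The separation between content and formatting can be sketched as follows. This is only an illustration in Python: the file name style.css, the article titles and the color are all made up, and a real website would of course contain much more.

```python
STYLESHEET = "style.css"  # hypothetical name of the shared CSS file

# The formatting lives in one place only (this would be the content of style.css).
css = "h1 { color: darkblue; }\n"


def render_page(title, body):
    """Build the HTML of one page; the <link> line points to the shared CSS file."""
    return (
        "<html>\n"
        "<head>\n"
        f"  <title>{title}</title>\n"
        f'  <link rel="stylesheet" href="{STYLESHEET}">\n'
        "</head>\n"
        f"<body><h1>{title}</h1><p>{body}</p></body>\n"
        "</html>"
    )


pages = [
    render_page("First article", "Some content."),
    render_page("Second article", "Other content."),
]
print(pages[0])
```

None of the pages mentions a color. Changing "darkblue" in the CSS would restyle every title at once, without touching a single HTML file.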
Yes it is. But Tim Berners-Lee is not doing it all by himself. He actually founded the World Wide Web Consortium, aka W3C, which he heads. The W3C’s role is to define new versions of HTML and CSS, which are then implemented by browsers to render the web pages we all enjoy.
However, we wouldn’t get far if we had to write an HTML file by hand whenever we wanted to generate a new page. In particular, Google does not have people writing HTML pages to display the results of your searches. That’s where more computer science comes in, to make better use of the Internet.
PHP provides dynamic pages
Google uses source code that takes into account parameters such as the keywords you used for your search, but also information about websites that fit your search (and much more information!) to smartly and automatically generate a web page that corresponds to what you were looking for. This generation is done by a program. And this program is itself written by the web master.
Yes. And as you may have guessed, this program is not running on your computer. As you send your request to your router, the router will transfer it, and the request will finally arrive at a computer in charge of managing the website you requested. This computer is called a web server. In theory, it could be any computer, but since it needs to be always up and running, and since it might have to deal with numerous requests, web masters usually use the computers of companies specialized in providing web servers, called web hosting companies. Science4All’s web host is OVH.
The web server then receives the request, and a program runs to generate the HTML page corresponding to your request. This HTML page is then sent back to you, via your router. Your browser finally receives it, and displays the HTML page nicely to you.
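Here is a minimal sketch of what such a program on the web server might look like. It is written in Python to keep it short, and everything in it is an assumption made for illustration: the tiny INDEX of pages, the parameter name q and the function name handle_request. A real search engine obviously does much more.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical index: which pages talk about which keyword.
INDEX = {
    "html": ["/articles/html.html"],
    "css": ["/articles/css.html"],
}


def handle_request(query_string):
    """Generate an HTML results page from the parameters sent by the user."""
    params = parse_qs(query_string)
    keyword = params.get("q", [""])[0].lower()
    results = INDEX.get(keyword, [])
    # escape() keeps text sent by the user from being read as HTML tags.
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>' for url in results
    )
    return (
        f"<html><body><h1>Results for {escape(keyword)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


print(handle_request("q=CSS"))
```

The page that is sent back never existed as a file: it is built on the spot, which is why two different searches give two different pages.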
What’s more, because of the information you send to the web server, the returned page can be personalized for you. That’s why you’ll get a different page when visiting www.facebook.com, depending on the session you have open. Similarly, on Science4All, you can log in, and the returned page will show you that you have access to the “article editing” page.
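Personalization can be sketched along the same lines. In this hypothetical Python example, the server keeps a small table of open sessions and builds a different page depending on who is asking. The session identifiers, user names and the /edit address are all invented.

```python
from html import escape

# Hypothetical table of open sessions, kept on the web server.
SESSIONS = {
    "abc123": {"name": "Alice", "can_edit": True},
    "def456": {"name": "Bob", "can_edit": False},
}


def render_home(session_id=None):
    """Return a home page that depends on the session sent with the request."""
    user = SESSIONS.get(session_id)
    if user is None:
        return "<p>Welcome, guest. Please log in.</p>"
    page = f"<p>Welcome back, {escape(user['name'])}.</p>"
    if user["can_edit"]:
        page += '<a href="/edit">Article editing</a>'
    return page


print(render_home())
print(render_home("abc123"))
```

The URL is the same in both calls. Only the session information sent along with the request changes the result.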
Now, the web master needs to talk some language to the web server to design the program.
The HTML language is very useful to say what a page will look like, but explaining how to customize each page can’t really be done in HTML. That’s why today’s web masters all use another language to generate HTML pages.
There are several languages which can be used to generate HTML pages. In fact, since an HTML page is simply a text message, any programming language can be used, including the most fundamental languages like C or C++. However, I would strongly recommend not using those languages, as others are much better suited to creating websites, because of the frameworks built just for that.
A framework is a set of predefined useful functions that can easily be reused in any other program. Frameworks facilitate developers’ work, as they no longer have to redefine basic functions such as displaying a title, drawing a table, including sound, etc. Microsoft has developed the ASP.NET framework, commonly used with C#. Java introduced Java Server Pages (JSP), part of what’s known as JEE (which is very common for large institutional websites developed by professional web masters). Ruby has Ruby on Rails. Python can be used with Django. Basically, for most major languages, an extension of functions has been created for simpler website development. And there’s also PHP…
Because it’s the one Science4All uses! There are also other major websites using PHP, including Facebook and Wikipedia. The main advantage of PHP is that it was made for websites. It’s also very easy to pick up. Finally, and it’s very very important, it has a large community with plenty of resources, so that you can easily find help whenever you need some!
PHP initially stood for Personal Home Page, since it was created in 1994 by Rasmus Lerdorf for his own website. However, for a long time, even though it was more popular because it was easier to use, it didn’t have the main advanced features of the other languages. In particular, it was only in 2004 that PHP 5 brought full-fledged Object-Oriented Programming (which is too vast a topic for me to talk about in this article). This has led to the creation of plenty of great frameworks such as Symfony or CakePHP.
If I had read this article before starting Science4All, I might have used one of those frameworks… But since I was a complete beginner on the Web, I started with a Content Management System (CMS).
A CMS is a ready-made website, to which you simply have to add your content. A Facebook fan page could almost be thought of as a CMS, although not a really customizable one. The most famous CMSs are probably Joomla and WordPress. I use WordPress, which is meant for blogging, but can be adapted to most purposes. Indeed, plenty of plugins have been developed for WordPress and can easily be added, such as the Facebook box you see on the right, the Google Translate widget just above and BuddyPress, which enables interactions between users through groups and forums. Without any previous knowledge, it took me a few days to get the website online thanks to WordPress.
WordPress is actually a set of files that you simply need to download onto your computer or your web server to get it working. Then, you can manage your website directly by using the generated pages. But, in fact, those files are mainly PHP files. Thus, you can modify them yourself to customize your website even more and get it to do what you want! That’s what I’m doing for Science4All. WordPress gives a great base to build the website, although it’s not as well structured as what a professional developer would do with a framework.
I’d rather not! PHP code has to be kept secret, as it controls every functionality of the website. Protecting it is important for security reasons. But I’ll get back to that in the next section.
What needs to be remembered from this section is that web server programs enable dynamic websites, where the generated HTML page depends on information sent by the user. But that’s not all! These programs can use plenty of other information, including information stored on the servers.
MySQL allows web storage
To go further in web development, including adding forums, social networks and personal information, we need to store information on the Internet.
Yes, we can simply store information in files on the server. And that’s what’s being done for certain files such as pictures. But there are two problems with that. First, the structure of the information needs to be dealt with by hand, and this can get very complicated. Second, if badly stored, the information will take a lot of space. That’s why databases were introduced, first for companies storing information on their own computers, then for websites.
There are several software packages specialized in database management. With them, the information is well structured. Its size is also highly optimized, which saves space on the servers. The best-known database systems are Oracle and Microsoft SQL Server, which existed even before the Web. Both are efficient, but both are also expensive (although training versions are free). However, there are also PostgreSQL and, of course, MySQL, invented in 1995, which Science4All uses.
Well, in fact, it came with WordPress. So I didn’t really choose. But I have to say that the combination “PHP + MySQL” is very common. Thus, once again, a large community on the Internet will be able to provide help. And, this combination has proved to work efficiently.
Yes! In practice, the database is often placed on a different server than the web server. This enables web servers to really serve their purpose of quickly running web programs, while database servers can focus more on having large storage and on quickly answering database requests. Now, to access the database, the web server programs use special instructions. These instructions depend on the combination of your programming language and the database software. That’s why this combination is important.
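To give an idea of what these database requests look like, here is a small sketch. Be careful: it does not use PHP and MySQL. It uses Python with SQLite, a lightweight database included with Python, as a stand-in, because it runs without any server. The comments table and its content are invented, but the requests are written in SQL, the same family of query language MySQL uses.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    [
        ("Alice", "Great article!"),
        ("Bob", "Thanks."),
        ("Alice", "One more question."),
    ],
)
conn.commit()


def comments_by(author):
    """Ask the database for all the comments written by one author."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE author = ? ORDER BY id", (author,)
    )
    return [body for (body,) in rows]


print(comments_by("Alice"))
```

The program never deals with how the comments are laid out on disk. It only states what it wants, and the database software takes care of the rest.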
In WordPress, certain functions have been created to easily use the database without requiring the knowledge of actual database functions. That’s why I barely know how databases actually work. Because I don’t really need to.
Well, we are far from being done! One major problem of web pages is that they require a connection to the web server (and most of the time to the database server) to be generated. This can take a while, not because the web program is slow, nor because answering database queries takes long, but mainly because connecting to the web servers and transferring information can take a few seconds. A few seconds is not that much, but as speed is more and more important in today’s world, we’d like quicker interaction with web pages…
Because none of the other languages is really suitable for data transfer. XML, which uses tags, is the most common language specially designed for data transfer, because of the simplicity and efficiency of its structure. XML files are used elsewhere too, including for storing company data. They are also a great way to store the parameters of a program. Computer scientists (should) hate inserting parameters in their code, because they then have to get back into the code to modify them. Putting parameters in a separate, well-structured XML file makes things so much simpler. XML has plenty of derived languages based on the same idea, including XHTML (an XML-based variant of HTML), as well as your Microsoft DOCX, XLSX or PPTX files, which are compressed archives of XML files.
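Storing parameters in XML can be sketched as follows. The tag names (config, site, color, articles_per_page) are made up for this example, and the reading is done with Python’s standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A hypothetical parameter file. In practice this text would live in its own
# file, separate from the program.
CONFIG_XML = """<config>
  <site name="Example site">
    <color>darkblue</color>
    <articles_per_page>10</articles_per_page>
  </site>
</config>"""

root = ET.fromstring(CONFIG_XML)
site = root.find("site")

settings = {
    "name": site.get("name"),          # read from an attribute of the tag
    "color": site.findtext("color"),   # read from the text inside a tag
    "articles_per_page": int(site.findtext("articles_per_page")),
}
print(settings)
```

To display 20 articles per page instead of 10, only the XML text has to change. The program itself stays untouched.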
Here is a figure that recapitulates all the languages we have seen.
Let’s sum up
The Web is an exciting new world with impressive potential. We’re still at the beginning of it. Thus, it’s crucial to know how it works to make good use of it. I hope this article has given you an overview of what’s being done and a hint at what can still be done. Unfortunately, things on the Web are very complex and it’s a difficult new world to grasp. Only talking about its different languages is hard. Yet, this is just the tip of the iceberg, as many more important matters have not been discussed here, such as protocols of communication, routers, data centers, cloud computing, domain name servers, firewalls, censorship… My knowledge in these other fields is quite limited and I’d love it if someone could explain each of these simply to us all!