Skip to content
Home ยป Misc ยป Aspiring programmers, learn SQL

Aspiring programmers, learn SQL

    Learn SQL

    Sit back for a second, and think about how many websites, software or mobile applications you use everyday that are storing some data somewhere, one way or another.

    Yep, that’s right, probably all of them. There are multiple ways to store data, and those programs could store information in files on their server or directly on your computer or phone. However, in most of the cases, they will use a database.

    What are databases?

    Databases are a way to store data in a structured way and then retrieve it in a different form. To use an example, if you have a multiplayer game, you probably have to store the players, with their username, their password and the date they registered on your game. You could totally store this information in a file, but you would encounter many issues:

    • If you want to verify that a user entered the right password, you have to read the whole file until you find the corresponding username, and then verify the password. It’s not efficient at all, especially if you grow to millions of users.
    • If you want to store some related data, for example the score of each session of each user, how do you proceed? Do you create a new file for each user, or a big file with all the scores of all the users? How do you read all of this data?
    • What if you want to display in a page the list of users with their highest score? You have to read all files completely and then map the users to their scores manually in your code.
    • And many more issues!

    Database management systems (DBMS), the software that runs your database, are made to handle all this types of use cases and even much more, including very complex professional problems, like handling trillions of records, or serving data to millions of customers are the same time.

    There are various types of databases, but the two most popular are “Relational databases” and “Document-based databases”.

    Relational databases

    Relational databases are by far the most used amongst the different types of databases. The most well known relational DBMS are MySQL, SQL Server, PostgreSQL, etc. which you have maybe already heard of. All those databases will be queried using a variant of the SQL language.

    Those databases store data in the form of tables, that have columns, representing the different attributes that you are storing, and rows, which are each representing one record. It’s pretty much like an Excel sheet.

    Using again our example of a multiplayer game where you store the players and the scores, you would have something looking like this:

    usernameemailpasswordregistration_date
    Joejoe@example.comxxxxxx2020-04-18
    Antoineantoine@mindflash.orgyyyyyy2020-04-31
    Jeffjeff@gmail.comzzzzzzz2020-04-31
    The table storing the users
    usernamescoregame_date
    Joe1278643432020-04-18
    Joe43646342020-04-19
    Jeff3766436342020-05-06
    Antoine36624535792020-05-12
    The table storing the scores

    This is a very easy to consult and intuitive way to present the different records, as it isn’t different than what you are used to seeing when it comes to tables of data.

    Document-based and other NoSQL databases

    NoSQL databases are referring to all the databases that are not relational, and they can be “document based” (what I will present here), or use even other ways of storing and accessing the data. Document based means that you are not storing the data in various tables with one record per entity, but you are rather storing a full “object” that has all the data you need.

    The most famous document-based DBMS for NoSQL databases is by far MongoDB. Originally all NoSQL databases were designed to solve specific data management problems, such as high volume of data to store, high availability, etc. However, nowadays MongoDB is really marketing itself as an “all purpose” database that you can use in any situation.

    As I said, data is stored differently, in the form of documents (JSON documents in the case of MongoDB), which will represent your full object:

    {
        _id: "f75e426491fc0561b51cd01",
        username: "Joe",
        email: "joe@example.com"
        password: "xxxxxx",
        registration_date: "2020-04-18",
        scores: [
            {
                score: 127864343,
                game_date: "2020-04-18"
            },
            {
                score: 4364634,
                game_date: "2020-04-19"
            }
        ]
    }
    

    The way to store the data is completely different than with relational databases, and so is the way to access it. The DBMS developers will usually provide you with a library that you can use to query the documents.

    Which of the two to choose?

    Each of the approaches have their advantages and inconveniences, but as I said earlier, traditionally, NoSQL databases have been made for specific use cases, like handling large amounts of data that is unpredictably structured. Indeed, with a relational database, you need to know beforehand what your data will look like when designing the structure of each of your tables.

    The catch is that in the vast majority of cases, you do know how your data will be structured. To take the example of our game, we already know beforehand that we want to store players, scores, etc. and that they will have some specific fields, since we also write the code for our game and we know which data we care about and want to store.

    Unless you have a specific usecase which doesn’t fit in a relational “datamodel”, I would recommend you to go with a relational database, and the most popular choice would be MySQL. In 99% of your use cases as a new programmer, it will be the right choice. Despite what the “cool kids” will tell you, NoSQL is still a small market.

    What is SQL and why learn it?

    SQL stands for “Structured Query Language”, and it is a programming language that allows you to query the database, to store, update or retrieve the information.

    It is very different from other programming languages that you might have used (Python, Java, C, etc.), because it will usually be in the form a single big “query” (a description of what you want to retrieve or insert) instead of a succession of operations like you would have in a function.

    SELECT
        users.username,
        email,
        MAX(score)
    FROM
        users
    LEFT JOIN
        scores
        ON (users.username = scores.username)
    GROUP BY
        users.username
    ORDER BY
        score DESC;
    
    -- The same query on a single line:
    SELECT users.username, email, MAX(score) FROM users LEFT JOIN scores ON (users.username = scores.username) GROUP BY users.username ORDER BY score DESC;
    

    In this example, I am querying the username, email and highest score of all the users in my database, and I am ordering them from the highest score to the lowest.

    As you can see, beside a few instructions, the code is very easy to read, and you don’t have to know SQL to somewhat understand what this query is doing. It’s not a very complex query (we write some that are 1000 lines long at my work!), but it is already very powerful and provides us with a non-trivial set of information in a single line of code.

    Now, you might be wondering why the title of this article, “Aspiring programmers, learn SQL” is so direct, and if the introduction hasn’t made you think about it already, here are a few reasons:

    • Databases are used everywhere, and SQL databases are the vast majority of the market. It is used in websites, in games, in mobile applications (usually embedded databases like SQLite), etc. Even your car is probably using SQLite!
    • It is in very high demand, which is easily explained by the previous point, and it will be a big plus to your resume. It’s simple, I always ask SQL questions when I interview people. I guarantee you that in all of the programming, data science and data analysis jobs that you will apply to, you will have to use SQL very frequently, if not daily.
    • It’s not going anywhere, and relational databases are definitely one of those tools that are never going to disappear.
    • It’s easy to learn. Basic queries are already very powerful, and you can learn them in a matter of hours (then, don’t forget to practice), and you will be able to handle 80% of your use cases with very short queries. I don’t know if it’s really an “argument” to learn SQL, but it’s definitely an argument against not learning it.

    How to learn SQL?

    Now that we know why you should learn SQL, let’s talk about how. As I said, SQL is just another programming language. It might look different, and the logic to employ isn’t exactly the same, but the way to learn it is the same as just any other language. I would recommend the following steps:

    1. Get a tutorial online, which has lessons and interactive exercices where you type directly on the website (it might be cumbersome at the start to install MySQL and load data on your computer).
    2. Do all the exercises until you are sure that you are familiar with all the concepts.
    3. Jump into a real project where you can leverage your newly acquired knowledge. Coincidentally, I have a written an article about personal projects ideas to improve in programming, that are also useful ๐Ÿ™‚

    Here are a few course recommendations that you can follow:

    1. FreeCodeCamp offers a free course with a good syllabus and will teach you how to install MySQL locally, which will be good for the future when you want to actually use it for a project. However the organization is a bit strange because it teaches first how to insert and update data, and only later how to query it, while in practice the insert part is relatively secondary. I think it’s still a great resource.
    2. CodeAcademy offers a free course where you learn by doing with an interactive console where you will type the queries and validate them against examples. A friend of mine with no previous programming used it to learn, but the syllabus is a bit light and it might lack theoretical explanations. Still a good option, because you can do everything through the website.
    3. Udemy offers a paid course, which seems to have the best syllabus of all (I haven’t followed it), putting emphasis on the most important parts. It also has very good reviews by more than 30,000 students. One thing to note is that the tutorial uses PostgreSQL instead of MySQL, which is a bit less popular but a very good option (probably a better option!), and in any case the vast majority of what you will learn will be usable both for Postgre and MySQL.

    To conclude I want to stress again the fact that SQL will be required in most applications, and will definitely be an appreciated skill (if not required) in all programming, data science and data analysis jobs.

    I highly encourage everyone to learn it, it will never be wasted time.