Course Abstract

Training duration : 4 hours

SQL (structured query language) is used to retrieve, shape, and transform data stored in relational databases. It’s the most commonly used tool among data scientists today, and is an invaluable tool in your journey to become a data scientist. In this live session, you will learn how data in a database is stored, retrieved, and transformed for evaluation. By completing this workshop, you will develop a working knowledge of how to explore a relational database, how to use SQL to retrieve data from that database, and how to transform it to answer your data science questions.

What other students say about this session?

Great presenter and material.

I had a great time learning from Mona. She has an excellent style of instruction. I was able to grasp the concepts because of how well she paced through them.

It is very great practice to review some SQL Concept.

Learning Objectives

  • Identify the contents of and relationships between tables in a database

  • Write queries to aggregate, filter and sort data in a table

  • Join multiple tables in a relational data

  • Perform mathematical operations between columns

  • Identify the need for a subquery in order to properly transform data

  • Write subqueries in the FROM and WHERE clauses

  • Write common table expressions

  • Answer business questions using a combination of methods listed

DIFFICULTY LEVEL: BEGINNER

Instructor

Instructor Bio:

Senior Data Scientist | Greenhouse Software

Mona Khalil

Mona is a Senior Data Scientist at Greenhouse Software in New York City, where they contribute to data-informed decision making across the company and machine learning solutions to improve the hiring process for Greenhouse customers. They’ve previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and continue to be involved in civic projects through a number of volunteer and non-profit organizations. They’ve also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology, and are passionate about contributing to the ethical use of data science methodology in the public and private sector.

Course Outline

During the 4-hours of lectures and exercises, we will cover the following topics:


Lesson 1: Relational Databases and Foundational SQL (45 minutes)

Familiarize yourself with relational databases and the SQL syntax necessary to retrieve information from tables in a database. At the end of this lesson, you will be able to comfortably explore a database and retrieve filter, and sort information from a table.


Lesson 2: Combining Data From Multiple Tables and Columns (1 hour)

Practice joining tables in a database and aggregating that information to answer simple questions about the data in your database. You will be able to identify columns used for joining/combining tables, choose the correct method for joining tables, and perform simple mathematical calculations on your data.


Lesson 3: Layering your Transformation with Subqueries (1 hour)

Often, your data needs to be transformed in multiple steps to get it in the shape necessary for a specific task. By the end of this lesson, you’ll learn how to write subqueries, common table expressions, and simple window functions necessary to shape data in multiple steps.


Lesson 4: Transforming your Data for Analysis (45 minutes)

Put your skills to the test by answering some common business questions using a relational database. You will be able to leverage existing information in a database to create new columns, conditional statements, complex filters, and prepare a dataset for evaluation by stakeholders or more complex statistical analysis.

Background knowledge

  • Being a core skill necessary to work with data effectively, the target audience is fairly wide and includes aspiring data analysts, business analysts, data scientists, and data engineers beginning to learn the core skills in their field.

  • Attendees should be familiar with the structure and manipulation of data in Excel/CSV files, but proficiency in Excel is not required.