Sessions Available

  • 01

    Pre-Bootcamp On-Demand Training

    • Statistics for Data Science by Andrew Zirm, PhD

    • SQL for Data Science by Mona Khalil

    • Programming with Data: Python and Pandas by Daniel Gerlanc

Statistics for Data Science by Andrew Zirm, PhD

Session Overview

The emergence of data science as a discipline has affected businesses in many ways. A primary impact has been to elevate the role of data in decision-making, using statistical methods to assess the ever-growing datasets that companies collect. This workshop will review familiar statistical techniques, introduce new ones, and touch on more advanced methods for handling noisy data and applying real-world constraints to analyses. It assumes a working knowledge of standard statistical methods and aims to connect theory to practice through real-world examples.

Lesson 1: Descriptive statistics and exploring data statistically

- (Re)familiarize yourself with basic descriptive statistics
- Use simple data exploration techniques to identify problems and limitations of a new dataset
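As a minimal sketch of the kind of first pass Lesson 1 describes (not material taken from the course), the Python below summarizes a numeric column with the standard-library `statistics` module and flags two problems a new dataset often has: missing entries and extreme values. The `describe` helper, the days-to-fill values, and the use of Tukey's 1.5 * IQR fences as the outlier rule are all illustrative assumptions.

```python
import statistics

def describe(values):
    """Summarize a numeric column, counting missing (None) entries separately."""
    present = [v for v in values if v is not None]
    # Quartiles via the standard library (default "exclusive" method)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Tukey's fences: values more than 1.5 * IQR outside the quartiles are flagged
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical data: days to fill each of ten job openings (None = not recorded)
days_to_fill = [21, 30, 25, None, 28, 19, 240, 33, None, 27]
summary = describe(days_to_fill)
print(summary)
```

Here the single extreme value (240) pulls the mean up to 52.875 while the median stays at 27.5; a gap like that is a cue to inspect the raw records before modeling.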

Lesson 2: Statistical analyses

- Review statistical tests for comparing datasets, and groups within those datasets
- Assess correlations and other properties of the data with an eye toward modeling
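The syllabus does not say which tests the lesson covers, so the sketch below simply illustrates the two ideas with the standard library: a two-sided permutation test for a difference in group means (chosen here because it needs no distribution tables; a t-test is the more common textbook choice) and a Pearson correlation coefficient. The function names, group values, and resample count are hypothetical.

```python
import random
import statistics

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Re-split the shuffled values into groups of the original sizes
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed split itself and keep p above zero
    return (extreme + 1) / (n_resamples + 1)

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical measurements for two groups
group_a = [12.1, 11.8, 12.5, 13.0, 12.7, 12.2]
group_b = [10.9, 11.2, 10.5, 11.0, 11.4, 10.8]
p = permutation_test(group_a, group_b)
r = pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
print(p, r)
```

Every value in `group_a` exceeds every value in `group_b`, so only the original split (and its mirror image) is as extreme as the observed difference, and the estimated p-value lands far below 0.05.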

Lesson 3: More advanced analyses and methods

- Linear modeling and interpreting its statistical outputs
- Stats -> ML: connections and methodologies
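To make "statistical outputs" concrete, here is a minimal sketch (an assumption about what such outputs typically include, not the course's own code) of simple linear regression by ordinary least squares, returning the slope, intercept, the slope's standard error and t-statistic, and R². The `simple_ols` name and the data are hypothetical.

```python
import statistics

def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(res ** 2 for res in residuals)   # unexplained variation
    sst = sum((yi - my) ** 2 for yi in y)      # total variation
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    sigma2 = sse / (n - 2)
    se_slope = (sigma2 / sxx) ** 0.5
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "r_squared": 1 - sse / sst,
    }

fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
print(fit)
```

On the Stats -> ML connection: the slope and intercept above are the same parameters a machine learning regressor would learn by minimizing squared error; the standard error, t-statistic, and R² are the inferential layer statistics adds on top. For slope and intercept alone, Python 3.10+ also offers `statistics.linear_regression`.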

SQL for Data Science by Mona Khalil

Session Overview

Structured Query Language (SQL) is used to retrieve, shape, and transform data stored in relational databases. It is also one of the most commonly used tools among data scientists today. SQL is invaluable on the path to becoming a data scientist, and it is an excellent starting point for understanding how data is stored, transformed, and evaluated.

By completing this workshop, you will develop an understanding of the relational model of data, how SQL is used to retrieve that data, and how to join tables, aggregate information, and answer data science questions. You will also become familiar with many common types of SQL databases, how to query a database from the command line, and how to access a database from within Python.

Lesson 1: Relational Databases and Foundational SQL
Familiarize yourself with relational databases and the SQL syntax necessary to retrieve information from tables in a database. At the end of this lesson, you will be able to comfortably explore a database and retrieve, filter, and sort information from a table.
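A minimal sketch of Lesson 1's retrieve, filter, and sort steps, run from Python through the standard-library `sqlite3` module against an in-memory SQLite database. The `jobs` table and its rows are hypothetical, and other database systems use slightly different syntax for listing columns.

```python
import sqlite3

# In-memory database with a small hypothetical table
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title TEXT, department TEXT, salary INTEGER);
INSERT INTO jobs VALUES
  (1, 'Data Analyst', 'Analytics', 85000),
  (2, 'Data Scientist', 'Analytics', 120000),
  (3, 'Recruiter', 'People', 70000),
  (4, 'ML Engineer', 'Engineering', 140000);
""")

# Explore: PRAGMA table_info returns one row per column; index 1 is the name
columns = [row[1] for row in conn.execute("PRAGMA table_info(jobs)")]

# Retrieve, filter, and sort; '?' is a bound parameter rather than pasted-in text
rows = conn.execute("""
    SELECT title, salary
    FROM jobs
    WHERE salary >= ?
    ORDER BY salary DESC
""", (80000,)).fetchall()
print(columns, rows)
```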

Lesson 2: Combining Data From Multiple Tables
Practice joining tables in a database and aggregating that information to answer simple questions about the data in your database. You will be able to identify columns used for joining/combining tables, choose the correct method for joining tables, and perform simple mathematical calculations on your data.
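The sketch below illustrates joining and aggregating with two hypothetical tables, `jobs` and `applications`, linked by `job_id`, again using Python's `sqlite3` module. It uses a `LEFT JOIN` so that jobs with no applications still appear with a count of zero; an `INNER JOIN` would silently drop them, which is one reason the choice of join type matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE applications (
  app_id INTEGER PRIMARY KEY,
  job_id INTEGER REFERENCES jobs(job_id),
  score REAL
);
INSERT INTO jobs VALUES (1, 'Data Analyst'), (2, 'Data Scientist'), (3, 'Recruiter');
INSERT INTO applications VALUES
  (1, 1, 3.5), (2, 1, 4.0), (3, 2, 4.5), (4, 2, 3.0), (5, 2, 4.5);
""")

# job_id is the joining column; COUNT(a.app_id) ignores the NULLs a LEFT JOIN produces
rows = conn.execute("""
    SELECT j.title,
           COUNT(a.app_id) AS n_applications,
           ROUND(AVG(a.score), 2) AS avg_score
    FROM jobs AS j
    LEFT JOIN applications AS a ON a.job_id = j.job_id
    GROUP BY j.job_id, j.title
    ORDER BY n_applications DESC, j.title
""").fetchall()
print(rows)
```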

Lesson 3: Transform Your Data for Analysis
Let’s answer some common analytics and business questions using our database! We’ll put our skills to the test, leveraging existing information in a database to create new columns, filter based on complex conditions, and prepare a dataset for data visualization or more complex statistical analyses.
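As a hedged illustration of deriving new columns and filtering on compound conditions, this sketch adds a `CASE` expression and a boolean flag to a hypothetical `applications` table. The score bands and the 30-day threshold are arbitrary choices for the example, not rules from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applications (app_id INTEGER PRIMARY KEY, source TEXT, score REAL, days_open INTEGER);
INSERT INTO applications VALUES
  (1, 'referral', 4.5, 10),
  (2, 'job_board', 2.5, 45),
  (3, 'referral', 3.0, 60),
  (4, 'agency', 4.0, 20),
  (5, 'job_board', NULL, 5);
""")

rows = conn.execute("""
    SELECT app_id,
           source,
           -- New categorical column derived from a numeric one
           CASE
               WHEN score >= 4 THEN 'strong'
               WHEN score >= 3 THEN 'moderate'
               ELSE 'weak'
           END AS score_band,
           -- SQLite evaluates comparisons to 1 (true) or 0 (false)
           days_open > 30 AS is_stale
    FROM applications
    -- Parentheses make the AND/OR precedence explicit
    WHERE score IS NOT NULL
      AND (source = 'referral' OR days_open <= 30)
    ORDER BY app_id
""").fetchall()
print(rows)
```

The resulting rows are already in a tidy, one-row-per-record shape that a plotting or statistics library can consume directly.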

Programming with Data: Python and Pandas by Daniel Gerlanc

Session Overview

For many researchers, modern data analysis requires some kind of programming, whether in R, MATLAB, Stata, or Python. The proliferation of specialized tools and languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.

Lesson 1: Introduction to Python and Pandas DataFrames
In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library designed specifically for interactive data analysis.

Lesson 2: Core Functionalities of Pandas
Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
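A minimal sketch of the four core operations named above (loading, filtering, grouping, and transforming), assuming a recent pandas release. The CSV contents and column names are hypothetical, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

csv_text = """department,title,salary
Analytics,Data Analyst,85000
Analytics,Data Scientist,120000
People,Recruiter,70000
Engineering,ML Engineer,140000
Engineering,Data Engineer,125000
"""

# Load: read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))

# Filter: boolean indexing keeps rows where the condition is True;
# .copy() gives an independent DataFrame that is safe to modify later
high_paid = df[df["salary"] >= 100_000].copy()

# Group: mean salary per department (a Series indexed by department)
by_dept = df.groupby("department")["salary"].mean()

# Transform: groupby(...).transform returns values aligned with the original
# rows, so each salary can be compared with its own department's mean
df["vs_dept_mean"] = df["salary"] - df.groupby("department")["salary"].transform("mean")
print(high_paid, by_dept, df, sep="\n\n")
```

The contrast between the last two steps is a common source of confusion: `mean()` after a `groupby` collapses each group to one row, while `transform("mean")` broadcasts the group result back to every original row.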


Instructor Bio:

Mona is a Data Scientist at Greenhouse Software in New York City, where they contribute to data-informed decision-making across the company and to machine learning solutions that improve the hiring process for Greenhouse customers. They previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and continue to be involved in civic projects through a number of volunteer and nonprofit organizations. They have also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology and are passionate about contributing to the ethical use of data science methodology in the public and private sectors.

Mona Khalil

Data Scientist | Greenhouse Software


Instructor Bio:

Andrew is a Ph.D. astrophysicist who made the switch from academia to data science (via the Insight Data Science program) in 2014. He was the first data scientist hired at Greenhouse Software, where he has worked on many internal data science projects and a few customer-facing, data-powered product features. Andrew lives in New Jersey with his wife and son.

Andrew Zirm, PhD

Senior Data Scientist | Greenhouse Software


Instructor Bio:

Daniel Gerlanc is a data scientist, software engineer, and technology instructor. After starting his career as a hedge fund quant, he has spent the past decade bootstrapping data science and engineering teams for organizations of all sizes. He has co-authored several open-source R packages, published in peer-reviewed journals, and been an invited speaker at conferences including ODSC and PGConf. He is the author of Programming with Data: Python and Pandas and teaches regularly on […]. He has a B.A. from Williams College.

Daniel Gerlanc

President & Founder | Enplus Advisors, Inc.