Creating and disseminating educational resources for SSPCloud users

Startup guide

Author

Lino Galiana and Olivier Meslin

This tutorial aims to be a practical guide for people developing educational resources in R or Python and wishing to disseminate them easily through the SSPCloud.

To make this possible, a few technical prerequisites are needed - these are outlined in Important 1.

The aim of this tutorial is to get you started quickly on building educational resources with a state-of-the-art level of reproducibility. The next sections explain how to develop educational resources, and how to disseminate them on the SSPCloud. Keep in mind that this tutorial is intended for people developing resources, not for the users of these resources.

Important 1: Technical requirements to deploy educational resources
  • A minimum level of proficiency in Git is required to develop training resources and make them available online. However, accessing these resources on the SSPCloud does not require any familiarity with Git.
  • Familiarity with Quarto, the automated report and website builder inherited from R Markdown.
  • An understanding of the fundamental difference between making resources available for execution on a local computer or on a server like SSPCloud. See the SSPCloud documentation.
  • Some knowledge of resource deployment with GitHub Actions is useful. This tutorial provides a few templates, but explaining what happens behind the scenes is outside its scope.

1 Why share educational resources on SSPCloud?

Sharing educational resources on SSPCloud offers several advantages, particularly for educators and learners working with computational tools like R or Python. Here are the key benefits:

  1. Instant access via a simple link: SSPCloud allows users to launch pre-configured workspaces through a simple HTTPS link. As a result, there is no need for any local installation or complex setup: learners can start working immediately, directly in their browser.

  2. Standardized and reproducible environments: everyone accesses the same cloud-based environment, including all necessary packages, tools, and datasets. This ensures consistency across learners and removes issues related to differing operating systems, versions, or missing dependencies.

  3. Scalable computing power: SSPCloud offers access to computational resources that can scale with your needs. From beginner tutorials in Python or R to advanced workflows involving large datasets or machine learning models, the platform can support a wide range of educational use cases.

As outlined in the SSPCloud documentation, it’s important to understand that the platform separates code (Git), data (S3), and environment configuration. These components are dynamically combined when launching a compute session.

For training designers, this architecture ensures high reproducibility and fine-grained control over the environment. For users, it eliminates the need to install software, manage system permissions, or configure their local machine—making it easy to jump straight into the tutorial.

An illustration of the SSPCloud functioning

2 Which materials should be used for trainings?

When designing effective training materials, it’s essential to choose formats that actively engage learners and support comprehension. While PDFs, slide decks, and videos are all valuable for delivering content, this guide focuses on interactive environments, which are particularly well-suited for hands-on learning. These environments enable learners to apply concepts immediately, test code in real time, and actively engage with the material—leading to deeper understanding and better long-term retention.

In the sections that follow, we’ll begin by comparing various interactive environments based on the programming language they support. Then, we’ll walk through how to build and structure them using Quarto.

2.1 Which interactive environment should you use?

Jupyter notebooks (see footnote 1) offer an interactive interface that allows you to write Python code, test it, and see the result directly below the instruction rather than in a separate console. Jupyter notebooks are widely used in data science, education, and research because they greatly simplify exploration and experimentation.

They allow you to combine text in Markdown format (a lighter markup text format than HTML or LaTeX), Python code, and HTML code for visualizations and animations in a single document.

Initially, Jupyter was the only software offering these interactive features. Now, there are other ways to benefit from notebook advantages while having an IDE with more comprehensive features than Jupyter. For this reason, as of 2025, we recommend developing resources with VSCode, a general-purpose code editor with excellent Python support, rather than with Jupyter. For more information on using notebooks in VSCode, refer to the official documentation.

Note

Although we recommend developing resources with VSCode rather than with Jupyter, end users will be able to open your educational resources with Jupyter if they want to.

Example with Jupyter

Example with VSCode

In R, notebooks are not commonly used, even though they offer several features that are particularly valuable for educational purposes.

When designing R-based training materials, two main options are available, each with its own pros and cons:

  1. Providing access to an HTML website
    • Advantages: Solutions to exercises can be hidden, interactive widgets can be used to offer hints or guide learners step by step, and the layout is optimized for reading.
    • Drawbacks: This is not an interactive environment—learners must open a separate RStudio session to try out the code, which can disrupt the flow of learning.
  2. Providing access to a Quarto Markdown (.qmd) file
    • Advantages: Fully interactive—learners can run code directly within the environment, benefiting from live feedback and hands-on practice.
    • Drawbacks: All answers and code are visible by default, which may reduce engagement and exploratory effort from the learners.

Since the first option can become cumbersome - especially when learners need to copy and paste large blocks of code - it is generally recommended to use the second option in most cases. Providing direct access to a Quarto Markdown file helps minimize the risk of hard-to-reproduce errors that often arise when learners switch between an HTML tutorial and their R session.

Example with RStudio

Why learnr Is Not Recommended

While learnr allows for the creation of interactive and advanced elements such as quizzes—making it well-suited for beginner tutorials—it has some important limitations.

First, it requires deployment on a Shiny server, which can be costly and complex to maintain. Additionally, in learnr, code cells do not share a global environment. This means variables and objects created in one chunk are not accessible in others, making it difficult to manage state or build on previous steps.

These restrictions limit the usefulness of learnr for more complex tutorials, where maintaining continuity and evolving context across the tutorial is essential.

2.2 Introducing Quarto to create training resources

Quarto is an open-source program for creating reproducible R and Python tutorials. It makes it possible to seamlessly mix code and text in the same document and can handle many output formats, including HTML, PDF or notebook (.ipynb extension). Quarto is strongly recommended for the development of educational resources.

This tutorial assumes that educational resources and training material will be made available in two forms:

  • Quarto websites, mixing text and code chunks;
  • Interactive environments (see section 2.1).

3 A step-by-step tutorial

3.1 Step 1: create a Github repository from a template

3.1.1 What are templates and why use them?

The very first step towards developing educational resources consists in creating a GitHub repository that will contain them all. We recommend that you use the templates developed specifically for the AIML4OS project. There are two different templates, one for R and one for Python; choose the one that matches the language you want to use.

These templates contain everything you need to produce resources that can be easily made available as websites or as interactive environments on the SSPCloud. More precisely, these templates contain:

  • a minimal Quarto website with graphical elements reflecting AIML4OS aesthetic;
  • a minimal example of a Quarto document producing a Jupyter Notebook (Python only);
  • a minimal example of a Quarto document mixing Markdown text and R code (R only);
  • what is needed to manage dependencies (which packages are needed to run the code, and in which versions);
  • scripts for a GitHub Actions workflow for automated deployment (don’t be afraid, see below!).

If you are not an AIML4OS member and follow this guide anyway

You can still use the templates: just remove the style components that rely on the AIML4OS aesthetic.

3.1.2 How to use a template?

Here is what to do to re-use a template:

  • Go to the Github page of the chosen template;
  • Click on the “Use this template” button and then on “Create a new repository” (see screenshot);

  • Choose carefully the owner and the name of the new repository:
    • Owner: by default the owner is the user creating the new repository, but it may be preferable to choose a GitHub organization (for instance the AIML4OS organization);
    • Name: give the repository a meaningful name, for instance “Intro_To_Deep_Learning” or “Intro_To_Linear_Regression”.

3.2 Step 2: define your development configuration on the SSPCloud

SSPCloud is not only useful to disseminate educational resources; it is also the right place to develop them. Doing so will facilitate resource dissemination as the environment used for training will be equivalent to the one used for development. In other words, we strongly recommend that you develop on the SSPCloud because this will help a lot to make your educational resources reproducible.

3.2.1 What is a configuration and why is it useful?

The best way to develop resources on the SSPCloud is to define your own development configuration. In technical terms, a configuration is just a service available on the SSPCloud (e.g. RStudio or VSCode) with additional user-defined settings such as: the GitHub repository you want to work on, your GitHub credentials, the amount of memory and number of CPUs you want to use, the initialization script you want to run… Defining a configuration has two advantages:

  • it lets you define the technical environment you want to use for a specific project;
  • you can resume working on your project at any time in only one click, and be sure that the technical environment remains exactly the same.

Your configurations are listed on the right side of the “My services” tab. To use a configuration, you just have to click on Launch (red rectangle). You can modify or delete an existing configuration by clicking on the contextual menu (green rectangle).

Warning

Defining a configuration may seem complicated the first time you do it, but you will get used to it in no time.

3.2.2 How to define a configuration

Here is how to define a development configuration on the SSPCloud.

With RStudio (for R resources):

  • Go to the “My Services” tab and click on “New Service”;
  • Choose RStudio and click Launch;
  • Customize the configuration by changing four settings:
    • In the “Friendly Name” field, choose a meaningful name (for instance dev_Intro_To_Linear_Regression);
    • In the “Repository” field of the “Git” tab, paste the URL of the repository you created in step 1 (for instance: https://www.github.com/AIML4OS/Intro_To_Linear_Regression);
    • In the “Network Access” tab, enable access to your service through port 5000;
    • (optional) In the “Initialization scripts” tab, paste the URL of your init script (for instance: https://www.github.com/AIML4OS/Intro_To_Linear_Regression/init.sh);
  • Click on Save configuration;
  • Click on Launch;
  • Open the RStudio service;
  • Rename the Rproj file with a meaningful name (for instance Intro_To_Linear_Regression.Rproj);
  • Click on the Rproj file to open the RStudio project;
  • Run renv::restore() (this may take a while);
  • You’re all set!

With VSCode (for Python resources):

  • Go to the “My Services” tab and click on “New Service”;
  • Choose Vscode-r-python-julia and click Launch;
  • Customize the configuration by changing four settings:
    • In the “Friendly Name” field, choose a meaningful name (for instance dev_Intro_To_Linear_Regression);
    • In the “Repository” field of the “Git” tab, paste the URL of the repository you created in step 1 (for instance: https://www.github.com/AIML4OS/Intro_To_Linear_Regression);
    • In the “Network Access” tab, enable access to your service through port 5000;
    • In the “Initialization scripts” tab, paste this URL: https://raw.githubusercontent.com/InseeFrLab/AIML4OS-template-quarto-python/refs/heads/main/init.sh;
  • Click on Save configuration;
  • Click on Launch;
  • Open the VSCode service;
  • You’re all set!

3.2.3 How to use an existing configuration

Once a configuration is defined, using it is very easy and very fast:

With RStudio (for R resources):

  • Go to the “My Services” tab;
  • On the right side of the screen, find the configuration of your project and click on Launch;
  • Open the RStudio service;
  • Click on the Rproj file to open the RStudio project;
  • Run renv::restore() (this may take a while);
  • You’re all set!

With VSCode (for Python resources):

  • Go to the “My Services” tab;
  • On the right side of the screen, find the configuration of your project and click on Launch;
  • Open the VSCode service;
  • You’re all set!

3.3 Step 3: develop resources with Quarto

If you have created a repository from a template and defined a development configuration, developing resources basically means modifying and extending the minimal examples available in your repository.

Git, git, git, git, git

It is absolutely essential that you commit and push your changes on a regular basis (every 30 minutes or so), because your service (RStudio or VSCode) is not persistent, meaning that any changes that were not pushed before closing the service are permanently lost.
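As an illustration of the warning above, here is a minimal Python sketch that reports what would be lost if the service were closed right now. It only relies on two standard Git commands (git status --porcelain and git rev-list --count); the function names git and pending_work are hypothetical and are not part of the templates. It also assumes that the current branch tracks a remote branch.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command in cwd and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def pending_work(cwd=".", run=git):
    """Count uncommitted files and commits that were not pushed yet.

    Assumption: the current branch has an upstream branch ("@{u}");
    otherwise the second git command fails with an error.
    """
    # One line per modified, staged or untracked file.
    dirty = run("status", "--porcelain", cwd=cwd).splitlines()
    # Number of local commits that are absent from the upstream branch.
    ahead = int(run("rev-list", "--count", "@{u}..HEAD", cwd=cwd) or 0)
    return {"uncommitted_files": len(dirty), "unpushed_commits": ahead}
```

Calling pending_work() from the root of your repository before closing a service should return zeros for both counters; anything else means some work exists only inside the service.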

3.3.1 Download data (if needed)

Your scripts may require some data to run (for instance: training data for an algorithm). Here is a way to download data automatically:

  • Upload your data files to the S3 storage of the SSPCloud, and make sure that the files are public;
  • Adapt the download_data.sh file so that the data is downloaded to your folder of choice;
  • Add this script as an initialization script in your configuration (see step 2).
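The content of download_data.sh is not reproduced here. For resources that prefer to fetch data from Python, the same idea can be sketched as follows, assuming the file is public and served over HTTPS by the S3 endpoint. The endpoint value, the bucket name and the object key used below are assumptions to adapt to your own storage, and public_url and download are hypothetical helper names.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Assumption: HTTPS endpoint of the S3-compatible storage; check the value
# that applies to your platform before using it.
S3_ENDPOINT = "https://minio.lab.sspcloud.fr"


def public_url(bucket: str, key: str, endpoint: str = S3_ENDPOINT) -> str:
    """Build the HTTPS URL of a public object (path-style addressing)."""
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key.lstrip('/'))}"


def download(bucket: str, key: str, dest_dir: str = "data") -> Path:
    """Download a public object into dest_dir and return the local path."""
    target = Path(dest_dir) / Path(key).name
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(public_url(bucket, key), target)
    return target
```

For instance, download("my-bucket", "diffusion/train.csv") (a hypothetical bucket and key) would write the file to data/train.csv. The download only works if the file was made public, as required in the first step above.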

3.3.2 Developing educational content in Quarto format

Most of the resources you will develop will take the form of Quarto documents, which you will either provide to end users or compile to final outputs (for instance, websites or Jupyter notebooks). As a consequence, you must learn how to use Quarto. Fortunately, Quarto is easy to use, has excellent documentation and there are plenty of resources online to help you.

To get started with Quarto documents, there are three basic elements you should know about: the header, raw text with Markdown formatting, and code chunks. We strongly recommend that you refer to the official Quarto documentation and in particular to the beginner Quarto tutorial.

3.3.2.1 The header

Your Quarto document typically starts with a YAML header to define metadata such as author, title, and so on. Here is a simple example. You can also have a look at the headers of the Quarto files available in the templates.

---
title: "My beautiful Quarto Report"
author: "Mickey Mouse"
---

3.3.2.2 Raw text with Markdown formatting

A Quarto document contains blocks of text with Markdown formatting. See here for a detailed presentation of Markdown. Here is a short example:

This is text with *italics* and **bold**.

We can define lists and sublists:

- first element;
- second element;
    - first sub-element;
    - second sub-element.

This code will result in the following formatted text:

This is text with italics and bold.

We can define lists and sublists:

  • first element;
  • second element;
    • first sub-element;
    • second sub-element.

3.3.2.3 Code chunks

A Quarto document can also contain blocks of code inside code chunks, denoted with triple backticks and the language you use. Adding a name to each chunk is a good practice (nice_code in the example below). You can use chunk options to change the behavior of the chunk (for instance if you want to show some code without executing it).

```{r nice_code}
# Load packages
library(ggplot2)

# Plot mpg vs hp
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  theme_minimal()
```

```{python nice_code}
import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```

3.3.3 Developing and previewing a website

The first kind of output you may want to produce is a static website, consisting of a series of HTML documents. Developing a website with the AIML4OS templates is very easy: you just have to write text and code chunks in the existing qmd files (index.qmd and chapter1.qmd). You can also extend the structure of the website by adding new Quarto documents (preferably in the chapters subdirectory). These new chapters must then be added to the structure of the website by modifying the _quarto.yml file in two places: in the render argument, and in the contents of the sidebar argument. Importantly, there is no difference between Python and R when developing a Quarto website.
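Forgetting one of the two places in _quarto.yml is an easy mistake to make. The sketch below is a rough plain-text check, not a YAML parser: it assumes that each chapter is referenced in _quarto.yml by its path relative to the project root (for instance chapters/chapter2.qmd) and that no glob pattern is used. The function name unregistered_chapters is hypothetical.

```python
from pathlib import Path


def unregistered_chapters(project_dir: str, chapters_subdir: str = "chapters",
                          expected_mentions: int = 2) -> list[str]:
    """List chapter files mentioned fewer than expected_mentions times.

    Assumption: a chapter is registered when its relative path appears
    twice in _quarto.yml (once under render, once in the sidebar contents).
    """
    root = Path(project_dir)
    config = (root / "_quarto.yml").read_text(encoding="utf-8")
    missing = []
    for qmd in sorted((root / chapters_subdir).glob("*.qmd")):
        rel = qmd.relative_to(root).as_posix()
        if config.count(rel) < expected_mentions:
            missing.append(rel)
    return missing
```

Running unregistered_chapters(".") from the root of the repository returns the chapters that still need to be added; an empty list means every chapter is referenced at least twice.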

It is often convenient to have a look at what this website looks like while developing it. You can preview your website from the command line by executing:

quarto preview --port 5000 --host 0.0.0.0

Then go to https://datalab.sspcloud.fr/my-services, open the README of the service you are using and click on the link to the external port.

Figure 1: Accessing website preview

What to do if you can’t access the website preview?

If you can’t access the website preview because there is no link in the README, it is likely that you forgot to open port 5000 in your configuration. This is easily solved:

  • Close your RStudio/VSCode service (after committing and pushing all changes!);
  • Modify your configuration: in the “Network Access” tab, enable access to your service through port 5000 and save this new configuration;
  • Launch your RStudio/VSCode service again.

3.3.4 Developing interactive scripts

Depending on the resources you develop, it may make more sense to provide your learners with interactive environments rather than a static website, particularly if your resources contain exercises. Although the final outputs are quite different for R and Python (Jupyter notebooks for Python, Quarto documents with R chunks for R), these resources can nonetheless be developed using Quarto.

Warning

In R resources, the function install.packages() must not be used anywhere because it is not the proper way to manage package requirements (see below for recommendations on dependency management).

For Python, the good news is that you can easily generate Jupyter notebooks from your Quarto source files: you just need to execute the following command in the command line, and the notebooks will be written to the _site folder.

quarto render --to ipynb
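Once the command has run, it can be useful to check that the notebooks were actually produced and are not empty. The following sketch relies only on the fact that an .ipynb file is a JSON document with a list of cells, each having a cell_type; the function names cell_counts and summarize are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def cell_counts(notebook_path) -> Counter:
    """Count the cells of a notebook by type ('code', 'markdown', ...)."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    return Counter(cell["cell_type"] for cell in nb.get("cells", []))


def summarize(output_dir="_site"):
    """Yield (path, number of code cells, number of markdown cells)."""
    for path in sorted(Path(output_dir).rglob("*.ipynb")):
        counts = cell_counts(path)
        yield path.as_posix(), counts["code"], counts["markdown"]
```

Iterating over summarize() after rendering gives one line per notebook; a notebook with zero code cells usually deserves a second look at its Quarto source.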

3.3.5 Managing dependencies when developing resources

A major challenge in any data science project is to make sure that its code can be re-run without error by someone working in a different environment (this is called portability). Various technical requirements must be met to ensure portability; one of them is to keep track of all packages needed to run the code (and of the exact version of each package!), so that a new user can reinstall them easily. This section explains how to manage these dependencies using the right tools.

The dependency management tool depends on the language you use:

  • If you use R, we recommend that you use renv;
  • If you use Python, we recommend that you use uv.

renv is an R package that helps you manage the dependencies of each of your projects. It means that renv lets you keep track of what packages are needed to run your code, and in what exact version. More precisely, using renv adds two specific files to your project:

  • The lockfile renv.lock records information about every package used in the project, so that these packages can be re-installed on a new machine;
  • The .Rprofile project file. This file is run automatically every time you start R so that renv is used properly.
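To make the role of the lockfile more concrete: renv.lock is a JSON file whose "Packages" entry maps each package name to a record that includes its "Version". The short sketch below simply lists what a lockfile pins; it is an inspection helper with a hypothetical name (locked_packages), not a replacement for the renv commands.

```python
import json
from pathlib import Path


def locked_packages(lockfile="renv.lock") -> dict:
    """Return {package name: pinned version} from an renv lockfile."""
    lock = json.loads(Path(lockfile).read_text(encoding="utf-8"))
    packages = lock.get("Packages", {})
    # Sort by package name so that the output is stable across runs.
    return {name: info.get("Version") for name, info in sorted(packages.items())}
```

Comparing the output before and after a change to your scripts is a quick way to see whether the lockfile was actually updated.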

Here is how to use renv:

  • The first step in using renv is to initiate the tracking of dependencies using renv::init(). You do not need to do it if you use the R template because this was already done.
  • When developing resources, you should regularly run renv::snapshot(). This command will analyze your R scripts, detect what packages are used, and update the renv.lock file. Do not forget to commit and push the changes to the renv.lock file!
  • When you start working on your project with a new service, you should run renv::restore(). This command will reinstall all the packages listed in the renv.lock file.

Warning

If you forget to update the renv.lock file, your GitHub Actions workflows are likely to fail, and you probably won’t be able to execute your R script in a new RStudio service. Keep calm, here is the solution: just run renv::snapshot(), then commit and push the changes to the renv.lock file. This should solve the problem.

For more information, see the official documentation of renv.

To be completed by Lino.

3.4 Step 4: set up a Github pages website

This step is optional, depending on whether your resources include a website. You can skip this step if your resources consist only of interactive environments (Jupyter notebooks for Python, interactive Quarto documents with R chunks for R).

3.4.1 What are Github Actions and Github pages and why are they useful?

If you are developing a website for educational purposes, you probably want to publish it online. If you perform the publication manually (building the website, then publishing it), you’ll have to re-do this series of tasks every time you change your website. This is time-consuming, repetitive and boring. But fortunately, you can automate this process thanks to GitHub Actions and Github Pages:

  • GitHub Actions is a tool built into GitHub that runs pre-defined tasks like testing, building, and deploying code when specific events occur in a repository. For instance, GitHub Actions may perform a series of tasks every time you push to the main branch of your repository, without you doing anything manually.
  • GitHub Pages is another tool from GitHub that lets you publish static websites directly from a GitHub repository; it is often used for project documentation or personal portfolios.

The main message is: by combining GitHub Actions with GitHub Pages, you can set things up so that every time you make changes to your repository, GitHub Actions automatically updates your website and publishes it on GitHub Pages. This will save you a lot of time!

3.4.2 Create the gh-pages branch

When developing resources, you will most likely use the main branch. However, for the deployment to work, you need an additional branch, gh-pages, which GitHub uses to deploy websites. This branch is rewritten automatically after every GitHub Actions workflow run, but you need to create it first; this is done only once. The Quarto documentation gives the commands below to create that branch. Be careful: do not run them before you have made a first push to GitHub, and make sure all your changes are committed.

git checkout --orphan gh-pages
git reset --hard # make sure all changes are committed before running this!
git commit --allow-empty -m "Initialising gh-pages branch"
git push origin gh-pages

3.4.3 Define a Github Actions workflow

You need to define a GitHub Actions (GHA) workflow to automate output construction and deployment. A GHA workflow is a list of instructions (for instance: install R, install packages…) that are executed automatically every time a certain event happens (for instance every time you push to the main branch). A GHA workflow is defined using specific YAML scripts, located in the .github/workflows directory of your repository. The templates already contain standard workflows that you can use as a starting point. We recommend that you have a careful look at these workflows; comments were added so that they are easy to follow.

A few important remarks on automated deployment:

  • The website is deployed at a URL that depends on the owner and the name of the repository: https://{REPO_OWNER}.github.io/{REPO_NAME}/.
  • You can monitor Github Actions jobs in the “Actions” tab of the Github repository of your project. Do not be afraid if your jobs fail at first; getting Github Actions to work often requires some debugging.
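The URL pattern given above can be applied mechanically. As a small sketch, the hypothetical helper pages_url derives the expected address of the website from the URL of the repository; it assumes the default GitHub Pages setup for a project repository, without a custom domain.

```python
from urllib.parse import urlparse


def pages_url(repo_url: str) -> str:
    """Apply the pattern https://{REPO_OWNER}.github.io/{REPO_NAME}/."""
    parts = [part for part in urlparse(repo_url).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner and repository in {repo_url!r}")
    owner, name = parts[0], parts[1]
    # Clone URLs may end with ".git", which is not part of the name.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"https://{owner}.github.io/{name}/"
```

With the example repository used earlier in this tutorial, pages_url("https://www.github.com/AIML4OS/Intro_To_Linear_Regression") gives https://AIML4OS.github.io/Intro_To_Linear_Regression/.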

3.5 Step 5: make your interactive environments easily available on the SSP Cloud

This step is optional, depending on whether your resources include interactive environments (Jupyter notebooks for Python, interactive Quarto documents with R chunks for R). You can skip this step if your resources consist only of a Quarto website.

To be completed with Inès.

Footnotes

  1. Jupyter originated from the IPython project, an interactive environment for Python developed by Fernando Pérez in 2001. In 2014, the project evolved to support other programming languages in addition to Python, leading to the creation of the Jupyter project. The name “Jupyter” is an acronym referring to the three main languages it supports: Julia, Python, and R.