DataTalksClub/datatalksclub.github.io

DataTalks.Club Website

This repository contains the source code and content for datatalks.club, a Jekyll-based community website for data science, machine learning, AI, and data engineering practitioners.

What this repository is

  • Static website built with Jekyll
  • Content-first structure: markdown, data files, and reusable templates
  • Main entities are modeled as Jekyll collections (_posts, _podcast, _books, _people, etc.)
  • Navigation, events, announcements, and sponsors are managed via YAML files in _data

Main pages on the website

| URL | Source file | What it means | How it works |
| --- | --- | --- | --- |
| / | index.md | Main landing page for the community | Uses Liquid loops to aggregate data from multiple sources: upcoming events (_data/events.yaml), latest podcast episodes (_podcast), latest posts (_posts), sponsors (_data/sponsors.yaml), and active books (_books). |
| /articles.html | articles.md | Full article index | Iterates over site.posts and links to each article with author references from _people. |
| /podcast.html | podcast.md | Podcast hub page | Lists all episodes by season from _podcast; each episode gets its own detail page via collection permalink rules. |
| /books.html | books.md | "Book of the Week" program | Splits books into upcoming vs archive using date filters (book.end > site.time and book.end < site.time). |
| /events.html | events.md | Public events calendar page | Reads _data/events.yaml and divides events into upcoming and past based on event timestamp relative to site.time. |
| /people.html | people.md | Community people directory | Renders all person profiles from _people, each with an auto-generated profile URL. |
| /slack.html | slack.md | Slack onboarding page | Uses subscribe.html include for invite flow and documents key channels and participation guidelines. |
| /support.html | support.md | Community support and sponsorship page | Static content page for funding model, sponsor principles, and contact details. |
| /tools.html | tools.md | Open-source spotlight page | Iterates through _tools collection entries (tool links, demos, maintainers). |
| /blog/guide-to-free-online-courses-at-datatalks-club.html | Post in _posts | Primary courses landing page in navigation | The top nav "Courses" item points here; individual Zoomcamp pages live mostly in _posts plus legacy _courses docs. |
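
The date-based split described for /books.html and /events.html can be sketched in Python. This only illustrates the comparison logic, not the site's actual Liquid templates: the `end` field mirrors the `book.end` filter mentioned above, while `split_by_end_date` and the sample records are hypothetical.

```python
from datetime import datetime, timezone


def split_by_end_date(items, now):
    """Partition records into (upcoming, archive) by comparing `end` to `now`.

    Mirrors the two strict comparisons from the table:
    end > now -> upcoming, end < now -> archive.
    """
    upcoming = [item for item in items if item["end"] > now]
    archive = [item for item in items if item["end"] < now]
    return upcoming, archive


# Hypothetical records; `now` stands in for Jekyll's site.time (the build time).
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
books = [
    {"title": "Past Book", "end": datetime(2024, 5, 10, tzinfo=timezone.utc)},
    {"title": "Future Book", "end": datetime(2024, 6, 14, tzinfo=timezone.utc)},
]

upcoming, archive = split_by_end_date(books, now)
print([book["title"] for book in upcoming])
print([book["title"] for book in archive])
```

Because both comparisons are strict, a record whose `end` equals the build time exactly would land in neither list. Since `site.time` is fixed at build time, the split only changes when the site is rebuilt.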

Website architecture (at a glance)

| Layer | Folder/files | Responsibility |
| --- | --- | --- |
| Content pages | *.md in repo root | Entry pages and hubs (index.md, events.md, podcast.md, etc.). |
| Blog posts | _posts/*.md | Long-form articles, course landing pages, and announcements; rendered under /blog/:title.html. |
| Domain collections | _podcast, _books, _people, _courses, _tools, _conferences | Structured content types with dedicated layouts and permalinks. |
| Data sources | _data/*.yaml | Site-wide data for menus, events, sponsors, and header announcements. |
| Layouts | _layouts/*.html | High-level page skeletons (home, page, post, podcast, book, author). |
| Reusable components | _includes/*.html | Shared snippets (header/footer, authors, event cards, subscribe blocks, etc.). |
| Assets | images, assets | Static media, styles, and supporting files. |
| Generated output | _site | Local build output generated by Jekyll. |

How it works

Content model

| Type | Location | URL shape | Typical usage |
| --- | --- | --- | --- |
| Posts | _posts/*.md | /blog/:title.html | Articles, guides, Zoomcamp pages, editorial content. |
| Podcast episodes | _podcast/*.md | /podcast/:title.html | Episode pages linked from /podcast.html and homepage. |
| Books | _books/*.md | /books/:title.html | Book of the Week detail pages and archive entries. |
| People | _people/*.md | /people/:title.html | Author/speaker profiles used across posts, episodes, and events. |
| Courses | _courses/*.md | /courses/:title.html | Legacy standalone course pages; many newer course pages are posts. |
| Tools | _tools/*.md | /tools/:title.html | Open-source tool spotlights. |
| Conferences | _conferences/*.md | /conferences/:title.html | Conference-specific pages. |

| Global data file | Purpose | Used by |
| --- | --- | --- |
| _data/navigation.yaml | Top and bottom navigation links | header.html, footer.html includes |
| _data/events.yaml | Event records and metadata | index.md, events.md, event include |
| _data/header.yaml | Optional announcement bar | header.html include |
| _data/sponsors.yaml | Sponsor names/logos/links | Homepage sponsors section |

Templating and layouts

  • Shared page layouts live in _layouts (home, page, post, podcast, book, author)
  • Reusable fragments live in _includes (header, footer, authors, event, subscribe forms, etc.)
  • Pages and collection documents combine front matter + markdown/html + Liquid loops/filters

Routing and permalink rules

  • The global permalink rule in _config.yml is /blog/:title.html for posts.
  • Collections define their own permalinks in _config.yml (/:collection/:title.html).
  • This means each content type can have both:
    • a hub/list page (e.g. podcast.md -> /podcast.html)
    • item detail pages (e.g. _podcast/*.md -> /podcast/<slug>.html)
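
As a rough illustration of the rules above, the mapping from a source file to its public URL can be sketched in Python. The function `url_for` is hypothetical and not part of the repository. It assumes post filenames follow Jekyll's `YYYY-MM-DD-title.md` convention, that collection documents use their filename as the slug, and that no document overrides its URL through `permalink` or `slug` front matter.

```python
import re
from pathlib import PurePosixPath


def url_for(source_path):
    """Approximate the URL produced for a source file (hypothetical helper)."""
    path = PurePosixPath(source_path)
    parts = path.parts
    if len(parts) == 1:
        # Root-level hub page: index.md -> /, podcast.md -> /podcast.html
        return "/" if path.stem == "index" else f"/{path.stem}.html"
    folder, slug = parts[0], path.stem
    if folder == "_posts":
        # The date prefix of a post filename is not part of :title.
        slug = re.sub(r"^\d{4}-\d{2}-\d{2}-", "", slug)
        return f"/blog/{slug}.html"
    # Collections follow /:collection/:title.html, e.g. _podcast -> /podcast/
    return f"/{folder.lstrip('_')}/{slug}.html"


print(url_for("podcast.md"))
print(url_for("_podcast/some-episode.md"))
print(url_for("_posts/2024-01-15-my-guide.md"))
```

The file names in the usage lines are made up; the real site may contain documents whose front matter changes the resulting URL.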

Local development

Prerequisites

  • Ruby 2.7.0
  • Bundler
  • Python environment manager (uv) for helper scripts

Run Jekyll locally

rvm use ruby-2.7.0
gem install bundler
bundle install
bundle exec jekyll serve

Open http://localhost:4000.

Common contributor workflows

| Task | Edit this | Notes |
| --- | --- | --- |
| Publish a new article | _posts | Include front matter (title, description, authors, tags, layout, date). |
| Publish a new podcast episode | _podcast | Make sure season and episode are set for correct grouping on /podcast.html. |
| Add/update event | _data/events.yaml | Event type controls styling (webinar, podcast, workshop, conference). |
| Add/update person profile | _people | Required for author/speaker linking across pages and includes. |
| Add/update a book | _books | start/end dates determine upcoming vs archived display. |
| Update top menu links | _data/navigation.yaml | Header links are rendered from top entries. |
| Update homepage blocks | index.md | Homepage sections are manually structured and data-driven via Liquid. |
| Update announcement bar | _data/header.yaml | Shown in header only when announcement data exists. |
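
Before publishing an article, it can help to confirm that the front matter keys listed in the table are all present. The following is a minimal sketch using only the standard library; `missing_front_matter_keys` is a hypothetical helper, not one of the repository's scripts, and it only detects top-level keys rather than validating their values.

```python
import re

# Keys named in the "Publish a new article" row above.
REQUIRED_POST_KEYS = ("title", "description", "authors", "tags", "layout", "date")


def missing_front_matter_keys(text, required=REQUIRED_POST_KEYS):
    """Return the required keys that are absent from a document's front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No front matter block at all: every key is missing.
        return list(required)
    present = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Only unindented "key:" lines count as top-level keys.
        match = re.match(r"([A-Za-z_][\w-]*):", line)
        if match:
            present.add(match.group(1))
    return [key for key in required if key not in present]


# Made-up post content for illustration.
sample = """---
title: "Example post"
layout: post
date: 2024-01-15
tags:
  - mlops
---
Body text.
"""

print(missing_front_matter_keys(sample))
```

A full YAML parser would be more robust for nested or quoted values; the line-based scan is used here only to keep the sketch dependency-free.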

Content and maintenance scripts

Install script dependencies:

uv sync
cd previews
npm install
cd ..

Run the helper creator script:

uv run python scripts/create.py

This script helps create/update content entities such as people, books, and events from templates.

Script quick reference

| Script/command | Purpose |
| --- | --- |
| uv run python scripts/create.py | Interactive helper to create people, books, and events using templates. |
| uv run python scripts/pandoc_full.py ... | Generate post draft content from a DOCX source. |
| scripts/generate-book-preview.sh (called internally) | Creates book preview assets for newly added books. |

Generate a post from DOCX

uv run python scripts/pandoc_full.py \
  --input ~/Downloads/template.docx \
  --author angelicaloduca \
  --tags "mlops,devops,process"

Where to edit common things

  • Add/edit article: _posts
  • Add/edit podcast episode: _podcast
  • Add/edit person profile: _people
  • Add/edit book discussion: _books
  • Add/edit event: _data/events.yaml
  • Edit top menu/footer links: _data/navigation.yaml
  • Edit homepage content blocks: index.md
  • Edit global page structure/header/footer: _layouts and _includes

Deployment notes

  • Site URL is configured in _config.yml as https://datatalks.club
  • GitHub-specific files (like .github) are excluded from Jekyll output
  • Generated site output is in _site during local builds

Important implementation details

  • The repository includes many pages written in markdown with embedded HTML and Liquid; this is expected and used heavily for SEO and rich formatting.
  • Author references across posts/podcast/books depend on _people records; missing person entries usually cause broken attribution links.
  • Event rendering logic is date-driven (site.time comparisons), so event timestamp format consistency in _data/events.yaml is important.
  • Navigation is fully data-driven from _data/navigation.yaml, which keeps menu edits separate from template code.
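
Since missing _people records usually cause broken attribution links, a small check can flag unresolved author references before a build. This sketch is not part of the repository: `known_people` and `missing_people` are hypothetical helpers, and it assumes an author is referenced by the filename (without extension) of their record in _people, as with `angelicaloduca` in the DOCX example above.

```python
from pathlib import Path


def known_people(people_dir):
    """Collect person IDs, assumed here to be the .md filename stems."""
    return {path.stem for path in Path(people_dir).glob("*.md")}


def missing_people(referenced_ids, people_dir):
    """Return referenced IDs that have no matching record, sorted for stable output."""
    return sorted(set(referenced_ids) - known_people(people_dir))
```

Called from the repository root as `missing_people(["angelicaloduca"], "_people")`, it would return an empty list when every referenced person has a profile file.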
