Merged
26 commits
f53cfb5
init tutorial
Feb 10, 2026
01a19e3
switch to gpt-2
Feb 10, 2026
a20b522
another pass
Feb 10, 2026
4e134f8
another pass
Feb 10, 2026
00ab23a
another pass
Feb 10, 2026
dadec17
rename tutorial and add to index
Feb 10, 2026
8e198bc
another pass
Feb 10, 2026
7026cab
Merge branch 'main' into ray-data-ray-train
svekars Feb 10, 2026
a5a7e8b
ignore localhost url for link checker
Feb 10, 2026
5654b90
add url to distributed toctree
Feb 10, 2026
fc03113
add ray[train] and tiktoken to reqs
Feb 10, 2026
a88a635
add distributed training tutorial to ecosystem.rst
Feb 10, 2026
928421d
Apply suggestion from @justinvyu
crypdick Feb 11, 2026
1df47da
Apply suggestion from @justinvyu
crypdick Feb 11, 2026
2f58a6d
rewrote checkpointing and fault tolerance sections; style edits
Feb 11, 2026
77e89db
fix double-shift labels bug
Feb 11, 2026
8d45236
enable checkpointing in the tutorial
Feb 12, 2026
fcb56a1
Merge branch 'main' into ray-data-ray-train
svekars Feb 12, 2026
98d9357
reduced gpus required to 4; updated CI hardware
Feb 12, 2026
d8b40fb
revert log to driver
Feb 12, 2026
7e08a3a
Merge branch 'main' into ray-data-ray-train
svekars Feb 19, 2026
6b83385
make linter happy
Feb 23, 2026
66b176d
Merge branch 'main' into ray-data-ray-train
crypdick Feb 24, 2026
9192be5
fix machine type, auto-detect num GPU, add cpu fallback
Feb 24, 2026
5be972f
Update beginner_source/distributed_training_with_ray_tutorial.py
crypdick Feb 25, 2026
d20817f
Update beginner_source/distributed_training_with_ray_tutorial.py
crypdick Feb 25, 2026
3 changes: 2 additions & 1 deletion .ci/docker/requirements.txt
@@ -34,7 +34,8 @@ bs4
awscliv2==2.1.1
flask
spacy==3.7.1 # Keep this version consistent with the model version in .jenkins/build.sh
-ray[serve,tune]==2.52.1
+ray[serve,train,tune]==2.52.1
+tiktoken
tensorboard
jinja2==3.1.3
pytorch-lightning
3 changes: 2 additions & 1 deletion .devcontainer/requirements.txt
@@ -14,7 +14,7 @@ bs4
awscli==1.16.35
flask
spacy
-ray[tune]
+ray[train,tune]

# PyTorch Theme
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
@@ -26,6 +26,7 @@ pandas
scikit-image
pillow==10.3.0
wget
+tiktoken

# for codespaces env
pylint
3 changes: 3 additions & 0 deletions .jenkins/metadata.json
@@ -73,5 +73,8 @@
},
"prototype_source/gpu_quantization_torchao_tutorial.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu"
+},
+"beginner_source/distributed_training_with_ray_tutorial.py": {
+"needs": "linux.16xlarge.nvidia.gpu"
}
}
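
Note: the hunk above maps a tutorial path to a "needs" value naming the CI machine type. The CI code that consumes this file is not part of this diff, so the following is only a minimal sketch of how such a mapping could be read. The helper `runner_for` and the inline `METADATA_JSON` string are hypothetical names for illustration; only the two entries visible in the hunk are reproduced.

```python
import json

# Assumption: an inline copy of the two entries visible in the hunk above.
# The real .jenkins/metadata.json contains more entries than shown here.
METADATA_JSON = """
{
  "prototype_source/gpu_quantization_torchao_tutorial.py": {
    "needs": "linux.g5.4xlarge.nvidia.gpu"
  },
  "beginner_source/distributed_training_with_ray_tutorial.py": {
    "needs": "linux.16xlarge.nvidia.gpu"
  }
}
"""


def runner_for(tutorial_path, metadata, default=None):
    """Return the "needs" value for a tutorial, or `default` if unlisted.

    Hypothetical helper; the actual CI lookup logic is not shown in this PR.
    """
    entry = metadata.get(tutorial_path, {})
    return entry.get("needs", default)


metadata = json.loads(METADATA_JSON)
ray_runner = runner_for(
    "beginner_source/distributed_training_with_ray_tutorial.py", metadata
)
```

A tutorial without an entry falls back to `default`, which is one plausible way to model "no special hardware requested".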
3 changes: 3 additions & 0 deletions .lycheeignore
@@ -13,6 +13,9 @@ https://docs.pytorch.org/tutorials/beginner/colab*
# Ignore local host link from intermediate_source/tensorboard_tutorial.rst
http://localhost:6006

# Ignore local host link for Ray Dashboard
http://localhost:8265

# Ignore local host link from advanced_source/cpp_frontend.rst
https://www.uber.com/blog/deep-neuroevolution/
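
Note: the `.lycheeignore` hunk above adds the Ray Dashboard URL so the link checker skips it. As a rough illustration of the effect, the sketch below assumes each non-comment line is treated as a regular expression searched against a URL, and approximates that with Python's `re` module. The helpers `load_patterns` and `is_ignored` are hypothetical, and Python's regex dialect may differ from the one the link checker uses.

```python
import re

# Assumption: an inline excerpt of the ignore file, limited to lines
# visible in the hunk above.
IGNORE_TEXT = """\
# Ignore local host link from intermediate_source/tensorboard_tutorial.rst
http://localhost:6006

# Ignore local host link for Ray Dashboard
http://localhost:8265
"""


def load_patterns(text):
    """Compile one regex per line, skipping blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def is_ignored(url, patterns):
    """Return True if any pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)


PATTERNS = load_patterns(IGNORE_TEXT)
dashboard_ignored = is_ignored("http://localhost:8265", PATTERNS)
```

Under these assumptions, the new entry also covers deeper dashboard paths, since the pattern only has to match a prefix of the URL.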
