Clouderizer, by default, creates a folder named clouderizer under the user home directory on every machine where it runs. This folder contains a sub-folder for each project that is run on that machine. Inside each project folder are three important folders: data, code and out.
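A minimal sketch of the local layout described above, written in Python. It assumes each project sub-folder is named after the project; the project name `my-project` and the helper `project_paths` are hypothetical. It only computes paths and does not talk to Clouderizer.

```python
from pathlib import Path
from typing import Dict, Optional

# Sub-folders created inside every project folder.
PROJECT_SUBFOLDERS = ("data", "code", "out")


def project_paths(project_name: str, home: Optional[Path] = None) -> Dict[str, Path]:
    """Return the assumed local paths of a project's data, code and out folders.

    Assumed layout: <home>/clouderizer/<project_name>/<sub-folder>.
    """
    base = (home or Path.home()) / "clouderizer" / project_name
    return {name: base / name for name in PROJECT_SUBFOLDERS}


if __name__ == "__main__":
    # Hypothetical project name, for illustration only.
    for name, path in project_paths("my-project").items():
        print(f"{name:>4}: {path}")
```

A training script can use a helper like this instead of hard-coding absolute paths, so the same code works on any machine the project runs on.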
We shall now go through each of these folders one by one.
Data is the directory that houses the data used in a project. This can be training, test or validation data, or any other static files our project needs. By default, this path follows this schema:
Datasets specified under URL Datasets and Kaggle Datasets are downloaded to the data folder the first time the project runs on a machine. After the first run, the data folder is backed up and synced to your Google Drive, so any changes you make while your project is running are persisted and ready for the next run.
Code is the directory that houses our project files. These can be source code, scripts, or practically any other files you have created or assembled to make the project work. By default, this path follows this schema:
The CODE field lets us specify any Git URL used to initialise the code folder whenever the project runs on a machine. If the Git URL requires authentication, we can press the Auth button inside the input box to provide those credentials. Code is downloaded from Git on the first run. After the first run, this folder is backed up and synced to your Google Drive, so any changes you make while your project is running are persisted and ready for the next run.
Out is the directory where our project's output should be saved, such as model weights and intermediate checkpoints. This directory is also synced to Google Drive, so the models you save during your experiments are preserved.
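As an illustration, a training script might write each checkpoint into the out folder so the regular sync picks it up. This sketch serializes a plain dict to JSON as a stand-in for real model weights (with a framework you would use its own saver, e.g. PyTorch's `torch.save`); the function name `save_checkpoint` and the file-naming pattern are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(state: dict, step: int, out_dir: Path) -> Path:
    """Write `state` as a JSON checkpoint into `out_dir` and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded step numbers keep checkpoints sorted by name.
    path = out_dir / f"checkpoint_step{step:06d}.json"
    path.write_text(json.dumps(state))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for ~/clouderizer/<project>/out.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_checkpoint({"epoch": 3, "loss": 0.42}, step=1500, out_dir=Path(tmp) / "out")
        print(saved.name)  # checkpoint_step001500.json
```

Giving each checkpoint its own file name means earlier checkpoints are not overwritten, so every saved step can end up backed up on Google Drive.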
Clouderizer Drive is cloud storage used to back up project data, code and output. Users enable it by linking their Google Drive with Clouderizer. For every project we create, code, data and out directories are created in Clouderizer Drive automatically.
While a Clouderizer project is running on a machine, its code, data and out folders are synced up to Clouderizer Drive every minute. This sync does not delete any files on Clouderizer Drive; it only copies new and modified files from the local folders to Clouderizer Drive.
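The Sync Up rule (copy new and modified files, never delete) can be sketched as a one-way copy between two directories. This is an illustration only: it assumes a file counts as modified when its size or whole-second modification time differs, which may not match how Clouderizer actually detects changes, and plain local directories stand in for the local project folder and Clouderizer Drive. The function name `sync_one_way` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def sync_one_way(src: Path, dst: Path) -> List[Path]:
    """Copy new or modified files from src to dst; never delete anything in dst.

    Returns the copied files as paths relative to src.
    """
    copied = []
    for src_file in src.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        src_stat = src_file.stat()
        if dst_file.is_file():
            dst_stat = dst_file.stat()
            # Assumed change test: same size and same whole-second mtime means unchanged.
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the mtime
        copied.append(rel)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as drive:
        local_dir, drive_dir = Path(local), Path(drive)
        (local_dir / "weights.bin").write_bytes(b"\x00\x01")
        (drive_dir / "old_run.log").write_text("kept")  # exists only on the "drive"
        print(sync_one_way(local_dir, drive_dir))  # first pass copies weights.bin
        print(sync_one_way(local_dir, drive_dir))  # second pass copies nothing
        print((drive_dir / "old_run.log").exists())  # True: nothing is deleted
```

Sync Down of the data folder is the same operation with the arguments reversed: Clouderizer Drive as `src` and the local data folder as `dst`.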
This Sync Up backs up our work, including any local changes to code files, datasets, model weights and checkpoints.
While a project is running on a machine, its data folder is synced down from Clouderizer Drive every two minutes. This sync does not delete any local files on the machine; it only copies new and modified files from Clouderizer Drive to the local machine.
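The two schedules can be summarized in a small Python sketch: Sync Up covers code, data and out every 60 seconds, while Sync Down covers only data every 120 seconds. The function `due_syncs` and the job labels are hypothetical, and the sketch assumes both timers start together when the project starts; it only decides which jobs are due and does not perform any copying.

```python
from typing import List

SYNC_UP_INTERVAL_S = 60     # local code, data, out -> Clouderizer Drive
SYNC_DOWN_INTERVAL_S = 120  # Clouderizer Drive -> local data only
SYNC_UP_FOLDERS = ("code", "data", "out")


def due_syncs(elapsed_s: int) -> List[str]:
    """Return the sync jobs due after `elapsed_s` seconds of running (elapsed_s > 0)."""
    jobs = []
    if elapsed_s % SYNC_UP_INTERVAL_S == 0:
        jobs.extend(f"up:{folder}" for folder in SYNC_UP_FOLDERS)
    if elapsed_s % SYNC_DOWN_INTERVAL_S == 0:
        jobs.append("down:data")
    return jobs


if __name__ == "__main__":
    # Simulate the first five minutes, one tick per second.
    for t in range(1, 301):
        jobs = due_syncs(t)
        if jobs:
            print(f"t={t:>3}s: {', '.join(jobs)}")
```

The data folder is the only one synced in both directions, which is why datasets uploaded to Google Drive reach the running machine, while code and output changes flow only upward.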
This Sync Down helps us transfer datasets (or any other kind of data) to the machine running the project. At any time, we just need to upload our datasets to the data folder of our project on Google Drive, and within two minutes they will be downloaded into the data folder of our project on the machine where it is running.
Project First start
Whenever a Clouderizer project is started for the first time after creation, its code and data folders are downloaded from the sources specified in the project settings (Git URL / Kaggle Datasets / URL Datasets). These downloaded files are then fully backed up to the project folder on Google Drive, and any changes the user makes to these folders are also synced up to Google Drive.
Project Subsequent starts
On every subsequent run of a Clouderizer project, the code, data and out folders are downloaded from Google Drive (instead of the original sources) with your latest changes. Any changes the user makes to these folders are synced up to Google Drive for persistence.
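The difference between the first and subsequent starts comes down to where each folder is restored from. A minimal sketch, assuming a project counts as already run once its folder on Clouderizer Drive contains at least one backed-up file (Clouderizer creates the code, data and out directories there automatically, so their mere existence is not enough). The function `plan_start`, the source labels, and the assumption that out starts empty on the first run are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def plan_start(drive_project_dir: Path) -> Dict[str, str]:
    """Decide where each project folder is restored from when the project starts."""
    has_backup = drive_project_dir.is_dir() and any(
        p.is_file() for p in drive_project_dir.rglob("*")
    )
    if not has_backup:
        # First start: code and data come from the sources in the project settings;
        # out is assumed to start empty.
        return {"code": "git_url", "data": "kaggle_and_url_datasets", "out": "empty"}
    # Subsequent starts: everything comes from Google Drive with the latest changes.
    return {"code": "drive", "data": "drive", "out": "drive"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my-project"
        for folder in ("code", "data", "out"):
            (project / folder).mkdir(parents=True)
        print(plan_start(project))  # first start
        (project / "code" / "train.py").write_text("print('training')\n")
        print(plan_start(project))  # subsequent start
```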