

Note that run_batch_general.py is not required; the Distributed-CellProfiler handbook lays out a number of different ways of creating jobs. However, we find it the most efficient way to run numerous pipelines on the same data. If you do not wish to use it, you can adjust steps 3 and 4 in "Run each CellProfiler step" (section 4.5) to "Create a job file" and "Execute python3 run.py submitJob jobFileName.json".

4.2. Configure run_batch_general.py

run_batch_general.py can be configured once at the beginning of the run of a batch of data, and then run for each step simply by uncommenting the name of the step to run.

The following variables in the "project specific stuff" section of the script should be configured (see the sketch after this list):

- topdirname and batchsuffix should match your PROJECT_NAME and BATCH_ID, respectively
- appname is typically the same as topdirname, but if that name is long and cumbersome you can create an abbreviated version here (ie 2015_10_05_DrugRepurposing rather than 2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad)
- rows, columns, and sites should reflect the imaging conditions used
- platelist should contain a list of plates, comma separated
- If you are using pipeline files with the LoadData module and CSVs, make sure the pipeline names in the script reflect your pipeline names (or adjust them if not); otherwise, make sure the batch file names reflect your batch file names

If following the recommended structures and procedures, none of the non-project-specific section of the script should need to be adapted, but if you are making changes you may need to adapt it as well.
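For orientation, here is a minimal sketch of what the "project specific stuff" section might look like once filled in. The variable names follow the descriptions above, but the batch suffix, plate format, and plate IDs are hypothetical; match everything to your own copy of the script.

```python
# "Project specific stuff" section of run_batch_general.py -- an
# illustrative sketch; all values below are placeholders.
topdirname = "2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad"  # matches PROJECT_NAME
batchsuffix = "2016_04_01_batch1"        # matches BATCH_ID (hypothetical)
appname = "2015_10_05_DrugRepurposing"   # abbreviated form of topdirname

rows = list("ABCDEFGHIJKLMNOP")          # 16 rows, ie a 384-well plate
columns = list(range(1, 25))             # 24 columns
sites = list(range(1, 10))               # 9 imaging sites per well

platelist = ["SQ00015096", "SQ00015097"] # hypothetical plate IDs
```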
4.3. Configure Distributed-CellProfiler's fleet file

If running in a fresh clone of Distributed-CellProfiler, you will need to configure a single fleet file, which will be used in all subsequent steps.

4.4. Change required parameters in Distributed-CellProfiler's config file

If running in a fresh clone of Distributed-CellProfiler, you will need to set the AWS_REGION, SSH_KEY_NAME, AWS_BUCKET, and SQS_DEAD_LETTER_QUEUE settings to values appropriate for your account. You only absolutely need to change the variables stated above and below for Distributed-CellProfiler to function, but other variables may be useful, such as using a non-default profile, pre-downloading files, using plugins, or restarting only parts of a batch of data. Information on all of these options is available in the Distributed-CellProfiler wiki.
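As a rough sketch, these fresh-clone settings live in Distributed-CellProfiler's config.py and might look like the following; the region, key name, bucket, and queue name are placeholders for your own account's values.

```python
# Account-specific settings in Distributed-CellProfiler's config.py --
# illustrative placeholder values only.
AWS_REGION = "us-east-1"                          # region where your bucket and fleet live
SSH_KEY_NAME = "your-key.pem"                     # EC2 key pair used by the fleet
AWS_BUCKET = "your-bucket-name"                   # bucket holding your images and output
SQS_DEAD_LETTER_QUEUE = "your-dead-letter-queue"  # queue that collects repeatedly failed jobs
```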
4.5. Run each CellProfiler step

You may have as many as 5 or as few as 2 CellProfiler steps:

- (optional) Z projection
- illumination correction
- (optional) QC
- (optional) assay development - see also section 4.6
- analysis

For each step, the procedure you will run is identical:

1. Set the step-specific variables (APP_NAME, SQS_MESSAGE_VISIBILITY, and CLUSTER_MACHINES; see the guidance below) in the Distributed-CellProfiler config file
2. Execute python3 run.py setup
3. Uncomment the correct step name in your run_batch_general.py file (and ensure all other steps are commented out)
4. Execute python3 run_batch_general.py
5. Execute python3 run.py startCluster files/yourFleetFileName.json, using the name of the fleet file you previously created or located
6. Execute python3 run.py monitor files/APP_NAMESpotFleetRequestId.json, where APP_NAME matches the APP_NAME variable set in step 1

In general, as long as you are running inside a tmux session and it isn't killed, the monitor should destroy any and all infrastructure created on AWS as part of running Distributed-CellProfiler, but it is the user's responsibility to check that this has completed appropriately; failure to do so may lead to spot fleets generating charges after all useful work has completed.
The step-specific settings are as follows (see the sketch after this list):

Z projection
- Your APP_NAME variable should be set to the appname set in run_batch_general.py plus _Zproj, ie 2015_10_05_DrugRepurposing_Zproj
- Your SQS_MESSAGE_VISIBILITY should be short, such as 5*60 (5 minutes)
- Your number of CLUSTER_MACHINES should be medium-large, ie a hundred or a few hundred

Illumination correction
- Your APP_NAME variable should be set to the appname set in run_batch_general.py plus _Illum, ie 2015_10_05_DrugRepurposing_Illum
- Your SQS_MESSAGE_VISIBILITY should be 12 hours (720*60)
- Your number of CLUSTER_MACHINES should be set to the number of plates you have divided by 4 and then rounded up, ie 6 for 22 plates

QC
- Your APP_NAME variable should be set to the appname set in run_batch_general.py plus _QC, ie 2015_10_05_DrugRepurposing_QC

Assay development
- Your APP_NAME variable should be set to the appname set in run_batch_general.py plus _AssayDev, ie 2015_10_05_DrugRepurposing_AssayDev

Analysis
- Your APP_NAME variable should be set to the appname set in run_batch_general.py plus _Analysis, ie 2015_10_05_DrugRepurposing_Analysis
- Your number of CLUSTER_MACHINES should be as many as possible per your account limits, ideally at least a few hundred
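To make the per-step edit concrete, here is a minimal sketch of the three config.py lines you would change before the illumination-correction step of a hypothetical 22-plate batch; the suffix, visibility, and machine count follow the guidance above, and math.ceil is used only to illustrate the "divided by 4 then rounded up" rule.

```python
# config.py settings for the illumination-correction step of a 22-plate
# batch -- values follow the step-specific guidance above.
import math

APP_NAME = "2015_10_05_DrugRepurposing_Illum"  # appname + _Illum
SQS_MESSAGE_VISIBILITY = 720 * 60              # 12 hours
CLUSTER_MACHINES = math.ceil(22 / 4)           # plates / 4, rounded up -> 6
```

Repeat the same three edits, with the appropriate suffix and sizing, before each of the other steps.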
