some reports on unsuccessful notebook repo reproduction attempts

Commit 058ac28b authored 1 year ago by Sebastian Pohl
--- a/analysis/full_text/results/notebooks/notebook_reproductions/BodenmillerGroup_ImcSegmentationSnakemake.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/BodenmillerGroup_ImcSegmentationSnakemake.txt
 1. The repo is deprecated and no longer maintained. It has been replaced by a dockerized version.

-2. It says that the installation instructions are outdated and prone to errors. Maybe later.
-    Also needs an installation of many dependencies (themselves not without trouble) such as singularity ...
\ No newline at end of file
+2. After installing a number of dependencies, the workflow can be launched, but the dry run (-n) fails with an error:
+(latest_snakemake) seb@Ras:~/Fonda/notebook_reproduction/ImcSegmentationSnakemake$ snakemake --use-conda -n --use-singularity
+Building DAG of jobs...
+MissingInputException in rule download_example_data in line 605 of /home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile:
+Missing input files for rule download_example_data:
+    output: resources/example_data
+    affected files:
+        polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2/download
+
+3. The link https://polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2 is dead, so the example data cannot be downloaded.
+
+4. I tried simply deleting the dead link from the config file, which led to this error:
+ValueErrorin line 602 of /home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile:
+not enough values to unpack (expected 2, got 1)
+  File "/home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile", line 602, in <module>
+  File "/home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile", line 602, in <dictcomp>
+
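+4.1 The entry is not two separate resources but a resource name together with the URL it is meant to be downloaded from,
+    so the dict comprehension in line 602 expects two values and fails once the URL is removed.
+    With the link in place, I run into the missing input error again, which I don't know how to fix.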
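The ValueError in step 4 is the standard Python error for unpacking a one-element sequence into two names. A minimal sketch, assuming (hypothetically) that the Snakefile builds its resource dict from whitespace-separated "name url" entries; the actual config format in the repo and the function name `build_resources` are illustrations, not taken from the repo:

```python
def build_resources(entries):
    """Map each resource name to its download URL.

    Hypothetical format: every entry is a "name url" string.
    """
    # Unpacking into (name, url) requires exactly two fields per entry.
    return {name: url for name, url in (entry.split() for entry in entries)}


# URL still present: the entry unpacks into two fields.
ok = build_resources(
    ["example_data https://polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2/download"]
)
print(ok)

# URL deleted: only the name is left, so unpacking fails.
try:
    build_resources(["example_data"])
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```

Under this assumption, deleting only the URL cannot work: the name alone no longer has the shape the comprehension expects.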
--- a/analysis/full_text/results/notebooks/notebook_reproductions/CarolinaPB_single-cell-data-processing.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/CarolinaPB_single-cell-data-processing.txt
@@ -9,5 +9,5 @@
        https://github.com/CarolinaPB/single-cell-data-processing
    Well, not that much detail. The interaction described seems to be on a run level:
        run the entire workflow once,
-        look st aht plots (on filesystem or using the notebooks)
+        look at the plots (on filesystem or using the notebooks)
        depending on those results change workflow thresholds in config file
\ No newline at end of file
--- a/analysis/full_text/results/notebooks/notebook_reproductions/Summary.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/Summary.txt
+repos attempted to reproduce: 10
+missing input data: 4
+suspended reproduction attempt, no data: 1
+suspended reproduction attempt, lacking infrastructure: 1
+suspended reproduction attempt, software broken/unavailable: 1
+suspended: 1
+workflow runs, but hasn't been finished: 2
+
+workflow description in repo: 3
+CarolinaPB/single-cell-data-processing
+jwdebelius/avengers-assemble
+Noble-Lab/ppx-workflow
\ No newline at end of file
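The outcome categories in the summary appear to partition the 10 attempted repos (an assumption: each repo is counted in exactly one category). A quick consistency check of the tallies, with the counts copied from the lines above:

```python
# Outcome counts copied from Summary.txt.
outcomes = {
    "missing input data": 4,
    "suspended reproduction attempt, no data": 1,
    "suspended reproduction attempt, lacking infrastructure": 1,
    "suspended reproduction attempt, software broken/unavailable": 1,
    "suspended": 1,
    "workflow runs, but hasn't been finished": 2,
}
attempted = 10

# If every repo falls into exactly one category, the counts must add up.
total = sum(outcomes.values())
assert total == attempted, f"categories sum to {total}, expected {attempted}"
print(f"{total} of {attempted} attempts categorised")  # 10 of 10 attempts categorised
```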
--- a/analysis/full_text/results/notebooks/notebook_reproductions/bayraktar1_RPCA.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/bayraktar1_RPCA.txt
+1. repository cloned
+
+2. I am meant to edit the config file, but I have no idea what the paths should be;
+    it seems I am missing input data again.
+
+    It is also unclear what parameters I should pass to Singularity. The README says:
+How to run
+Set parameters in config.yaml
+run: snakemake -p --use-singularity --singularity-prefix "resources"  --singularity-args "--bind *" --use-conda -j ** all --configfile "config/config.yaml"
+Note * : You should provide your own directory for the --bind command so that the data is accesible from the singularity containers.
+Note ** : Specify number of available threads here.
+
+3. I tried to run the workflow without setting any parameters, but I don't have any input data.
+
+4. The GitHub repository also warns about extremely long running times of the workflow
+    (8 hours for COVID DNA, multiple days for human DNA).
\ No newline at end of file
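The README's run line contains two placeholders: `*` for the directory to bind into the containers and `**` for the number of threads. A minimal sketch of filling them in from Python; the flags are copied verbatim from the README, while the function name `build_rpca_command`, the data directory `/data/rpca` and the thread count are hypothetical, and whether the bind directory must match the paths in config.yaml is not confirmed by the repo:

```python
import shlex


def build_rpca_command(bind_dir, threads, configfile="config/config.yaml"):
    """Return the README's snakemake call with * and ** filled in."""
    return [
        "snakemake", "-p",
        "--use-singularity",
        "--singularity-prefix", "resources",
        # * : directory that the Singularity containers need to see
        "--singularity-args", f"--bind {bind_dir}",
        "--use-conda",
        # ** : number of available threads
        "-j", str(threads),
        "all",
        "--configfile", configfile,
    ]


cmd = build_rpca_command("/data/rpca", 8)  # hypothetical path and thread count
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Building the argument list instead of a single string keeps `--bind /data/rpca` together as one value of `--singularity-args` without manual quoting.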
--- a/analysis/full_text/results/notebooks/notebook_reproductions/jdossgollin_2021-TXtreme.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/jdossgollin_2021-TXtreme.txt
+1. The repo states that 60 GB of disk space is needed to run this workflow.
+    I will try it on gruenau first and look at the notebooks later.
+
+2. Repo cloned; creating the conda environment takes a long time, so I am also trying to install mamba on gruenau.
+
+3. In parallel, I downloaded the GPWV4 data from the NASA source, for which I created an account
+    (account name: sebastian_huberlin).
+
+4. I registered at https://cds.climate.copernicus.eu/#!/home and installed the API key and the API client.
+    Still waiting for the conda environment.
+
+5. I am still stuck waiting for the conda environment to be created.
+    In the meantime, I have installed micromamba and am now trying to create the environment with it.
+
+6. I was able to create the environment using micromamba,
+    then ran, as instructed: pip install -e .
+
+7. Off to try the workflow with snakemake --n 3; that option seems to be deprecated,
+    so I tried snakemake --cores 3 instead.
+
+8. I'm running into errors about missing output files;
+    this could be because the workflow needs more storage than I have available on gruenau.
+
+9. I will retry by cloning everything into the larger storage location on gruenau.
+    Here the workflow seems to be able to run;
+    I had to accept the terms at: https://cds.climate.copernicus.eu/cdsapp/#!/terms/licence-to-use-copernicus-products
+    The workflow is currently running,
+    but if it really requires 60 GB of data, this can take a while.
+
+    It is taking a while.
+
+10. Is it really responsible to do all this computation and incur all of this web traffic?
\ No newline at end of file
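Steps 8 and 9 suggest the first failure was a storage problem. A minimal sketch of checking free space against the README's 60 GB before starting, using Python's `shutil.disk_usage`; the function name `has_enough_space` and treating GB as 10**9 bytes are assumptions (the README does not specify). Note that `disk_usage` reports free space on the filesystem, not any per-user quota, which on a shared machine like gruenau may be the tighter limit:

```python
import shutil

REQUIRED_GB = 60  # disk space figure from the 2021-TXtreme README
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    # Hypothetical usage: check the directory you plan to clone into.
    free_gb = shutil.disk_usage(".").free / BYTES_PER_GB
    print(f"{free_gb:.1f} GB free, enough: {has_enough_space('.')}")
```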
--- a/analysis/full_text/results/notebooks/notebook_reproductions/jwdebelius_avengers-assemble.txt
+++ b/analysis/full_text/results/notebooks/notebook_reproductions/jwdebelius_avengers-assemble.txt
+1. For now, I have only read the repository information:
+    many dependencies have to be installed manually
+    there is a fairly detailed description of the workflow and of the notebook step, which comes at the end
+        reproducing this may not be worth the effort
+        suspended until I have looked at all remaining workflows
\ No newline at end of file
--- a/analysis/full_text/results/notebooks/types_of_positions.txt
+++ b/analysis/full_text/results/notebooks/types_of_positions.txt
@@ -12,7 +12,7 @@ notebook rule only supplies output to all rule: 22
    jdossgollin/2021-TXtreme: hdd_idf, historic_extremes, local_return_period, HDD_pop_weighted_ts
    jwdebelius/avengers-assemble: alpha_table, beta_table, compare_taxonomy, real_data_figure
    CarolinaPB/single-cell-data-processing: remove_doublets
-    nesi/papermill_dem0: model_fitting
+    nesi/papermill_demo: model_fitting
    bayraktar1/RPCA: venn_diagrams, alignment_stats, count_stats

 notebook rule within workflow, unclear interaction: 2