Commit 058ac28b authored by Sebastian Pohl

some reports on unsuccessful notebook repo reproduction attempts

parent b1d9bd70
1. The repo is deprecated and no longer maintained. It has been replaced by a dockerized version.
2. The repo says that the installation instructions are outdated and prone to errors. Maybe later.
It also requires installing many dependencies, such as singularity, which are themselves not trouble-free ...
\ No newline at end of file
2. After installing a number of dependencies the workflow executes, but throws an error:
(latest_snakemake) seb@Ras:~/Fonda/notebook_reproduction/ImcSegmentationSnakemake$ snakemake --use-conda -n --use-singularity
Building DAG of jobs...
MissingInputException in rule download_example_data in line 605 of /home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile:
Missing input files for rule download_example_data:
output: resources/example_data
affected files:
polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2/download
3. The link https://polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2 is dead
4. I tried simply deleting the dead link from the config file, leading to error:
ValueError in line 602 of /home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile:
not enough values to unpack (expected 2, got 1)
File "/home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile", line 602, in <module>
File "/home/seb/Fonda/notebook_reproduction/ImcSegmentationSnakemake/workflow/Snakefile", line 602, in <dictcomp>
4.1 The entry is not two resources, but a single resource name that is meant to be taken from that URL
I run into a missing input error that I don't know how to fix
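The ValueError in step 4 is the kind of failure that a two-value unpacking inside a dict comprehension produces. A minimal sketch, assuming (this is not confirmed by the Snakefile) that each config entry is split into a resource name and a URL; `parse_resources` and the entry format are hypothetical:

```python
def parse_resources(entries):
    """Map resource name -> URL from 'name url' strings (hypothetical format)."""
    # Each entry must split into exactly two parts; otherwise unpacking fails.
    return {name: url for name, url in (entry.split() for entry in entries)}


# A complete entry: a name followed by the URL from the error report.
complete = ["example_data polybox.ethz.ch/index.php/s/mxuWXq98MbYHgq2/download"]
print(parse_resources(complete))

# Deleting the URL leaves a single value, so the unpacking raises ValueError.
try:
    parse_resources(["example_data"])
except ValueError as err:
    print(f"ValueError: {err}")
```

If the real code follows this pattern, removing the link cannot work; the entry would need a replacement source for the data instead.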
......@@ -9,5 +9,5 @@
https://github.com/CarolinaPB/single-cell-data-processing
Well, not that much detail. The interaction described seems to be on a run level:
run the entire workflow once,
look at the plots (on filesystem or using the notebooks)
depending on those results change workflow thresholds in config file
\ No newline at end of file
repos attempted to reproduce: 10
missing input data: 4
suspended reproduction attempt, no data: 1
suspended reproduction attempt, lacking infrastructure: 1
suspended reproduction attempt, software broken/unavailable: 1
suspended: 1
workflow runs, but hasn't been finished: 2
workflow description in repo: 3
CarolinaPB/single-cell-data-processing
jwdebelius/avengers-assemble
Noble-Lab/ppx-workflow
\ No newline at end of file
1. repository cloned
2. I am meant to edit the config file, but have no idea what the paths should be
I seem to be missing input again
also unclear what parameters I should pass to singularity:
How to run
Set parameters in config.yaml
run: snakemake -p --use-singularity --singularity-prefix "resources" --singularity-args "--bind *" --use-conda -j ** all --configfile "config/config.yaml"
Note * : You should provide your own directory for the --bind command so that the data is accessible from the singularity containers.
Note ** : Specify number of available threads here.
3. I tried to run the workflow without specific parameters, but I have no input data at all
4. The github repository also warns of extremely long running times for the workflow
(8 hours for Covid DNA, multiple days for human DNA)
\ No newline at end of file
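The two placeholders in the quoted command (`*` for the bind directory, `**` for the thread count) can be filled in programmatically. A minimal sketch that only reuses the flags from the README excerpt above; the function name `build_snakemake_command` and the example values are hypothetical:

```python
import shlex


def build_snakemake_command(bind_dir, threads, configfile="config/config.yaml"):
    """Return the README command as an argument list with * and ** filled in."""
    return [
        "snakemake", "-p",
        "--use-singularity",
        "--singularity-prefix", "resources",
        "--singularity-args", f"--bind {bind_dir}",  # Note *
        "--use-conda",
        "-j", str(threads),  # Note **
        "all",
        "--configfile", configfile,
    ]


# Hypothetical values: replace with a real data directory and thread count.
command = build_snakemake_command("/path/to/data", 4)
print(shlex.join(command))
```

This only assembles the command line; it does not resolve the open question of which paths belong in the config file.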
1. The repo mentions that 60GB disk space is needed to run this workflow.
I will try it on gruenau first. We'll see about the notebooks later.
2. repo cloned, creating conda environment, which takes a long time; trying to also install mamba on gruenau
3. In parallel I downloaded GPWV4 data from the NASA source and created an account for it
account name sebastian_huberlin
4. I registered with https://cds.climate.copernicus.eu/#!/home, installed the key and API
still waiting for the conda environment
5. I continue to be stuck at waiting for the creation of the conda environment
I have installed micromamba and am now trying to create the environment with micromamba in parallel
6. I was able to create the environment using micromamba
then as instructed: pip install -e .
7. Off to try the workflow with snakemake --n 3; that seems deprecated
instead tried snakemake --cores 3
8. I'm running into errors of missing output files;
this could be because the whole workflow requires more storage than I have available on gruenau
9. I will retry by cloning all of this into the larger gruenau storage location
Here the workflow seems to be able to run
I had to accept the terms at: https://cds.climate.copernicus.eu/cdsapp/#!/terms/licence-to-use-copernicus-products
currently the workflow is running
but if it really requires 60GB of data this can take a while
it is taking a while
10. Is it really responsible to do all this computation and incur all of this web traffic?
\ No newline at end of file
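The missing-output errors in step 8 were suspected to come from insufficient storage. A minimal sketch of a check that could be run before starting the workflow, assuming the 60GB figure from the repo is in decimal gigabytes; the function names and the working directory are hypothetical:

```python
import shutil

REQUIRED_GB = 60  # disk space the repo says the workflow needs


def free_gigabytes(path):
    """Free space on the filesystem containing `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path, required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gigabytes(path) >= required_gb


workdir = "."  # hypothetical: the directory the repo was cloned into
print(f"{free_gigabytes(workdir):.1f} GB free; enough: {has_enough_space(workdir)}")
```

Running this in both gruenau locations would show whether storage explains the difference between steps 8 and 9.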
1. For now I only read the repository info:
a lot of dependencies have to be manually installed
there is a fairly detailed description of the workflow and the notebook step which comes at the end
reproducing this may not be worth it
suspended until I have looked at all remaining workflows
\ No newline at end of file
......@@ -12,7 +12,7 @@ notebook rule only supplies output to all rule: 22
jdossgollin/2021-TXtreme: hdd_idf, historic_extremes, local_return_period, HDD_pop_weighted_ts
jwdebelius/avengers-assemble: alpha_table, beta_table, compare_taxonomy, real_data_figure
CarolinaPB/single-cell-data-processing: remove_doublets
nesi/papermill_demo: model_fitting
bayraktar1/RPCA: venn_diagrams, alignment_stats, count_stats
notebook rule within workflow, unclear interaction: 2
......