<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Home on Living Systems_</title>
<link>https://livesys.se/</link>
<description>Recent content in Home on Living Systems_</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate>Tue, 12 Nov 2024 10:28:04 +0100</lastBuildDate>
<atom:link href="https://livesys.se/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Learning Genomics and Bioinformatics in 2025</title>
<link>https://livesys.se/posts/learning-genomics-bioinformatics-in-2025/</link>
<pubDate>Tue, 12 Nov 2024 10:28:04 +0100</pubDate>
<guid>https://livesys.se/posts/learning-genomics-bioinformatics-in-2025/</guid>
<description><p><p class="image">
 <img src="ngsintro.png" alt="From the NGS introductory course at SciLifeLab Uppsala in 2015. Photo by Samuel Lampa." />
</p>
</p>
<p>Bioinformatics is growing in the clinical field, and in my job in a clinical
microbiology lab, I&rsquo;m increasingly asked for tips about how to get into
bioinformatics or genomic data science.</p>
<p>As I recently took the plunge into genomics myself, from my PhD field of
small-molecule structures and machine learning, as part of getting into my
current position, I thought I would write down what I learned in the process.</p></description>
</item>
<item>
<title>Rewrite of Scicommander in Go with much improved algorithm</title>
<link>https://livesys.se/posts/rewrite-of-scicommander-in-go/</link>
<pubDate>Tue, 03 Sep 2024 11:15:02 +0200</pubDate>
<guid>https://livesys.se/posts/rewrite-of-scicommander-in-go/</guid>
<description><p>When I presented a poster about
<a href="https://github.com/samuell/scicommander" target="_blank" rel="noopener">SciCommander</a>
 at the <a href="https://livesys.se/posts/scicommander-0.3/" target="_blank" rel="noopener">Swedish Bioinformatics Workshop</a>
 last year,
I got a lot of awesome feedback from some great people including Fredrik
Boulund, Johannes Alneberg and others whose names I unfortunately lost
(please shout out if you read this!).</p>
<p>(For those new to SciCommander: it is my attempt at creating a tool that can
track complete provenance reports also for ad-hoc shell commands, not just
those included in a pipeline. The grand plan is also to integrate this
provenance tracking with that of popular pipeline tools, to enable seamless
provenance report generation across pipelines and ad-hoc commands.)</p></description>
</item>
<item>
<title>About</title>
<link>https://livesys.se/about/</link>
<pubDate>Fri, 09 Aug 2024 04:18:22 +0200</pubDate>
<guid>https://livesys.se/about/</guid>
<description><p>Right now, this serves as a technical and research blog for me, Samuel Lampa, a
bioinformatician and data engineer in Stockholm, Sweden.</p>
<p>If you want to connect, you can find me on:</p>
<ul>
<li><a href="https://twitter.com/smllmp" target="_blank" rel="noopener">Twitter</a>
</li>
<li><a href="https://www.linkedin.com/in/smllmp" target="_blank" rel="noopener">LinkedIn</a>
</li>
</ul>
<p>By the way, do you need help with a 3D CAD project and/or parametric design in
Rhino 3D and Grasshopper?<br>
If so, visit <a href="https://rillabs.com/" target="_blank" rel="noopener">RIL Labs</a>
</p></description>
</item>
<item>
<title>A few notes from the Applied Hologenomics Conference 2024</title>
<link>https://livesys.se/posts/ahc2024/</link>
<pubDate>Fri, 05 Jul 2024 16:23:00 +0200</pubDate>
<guid>https://livesys.se/posts/ahc2024/</guid>
<description><p><p class="image">
 <img src="dsc_4779_2.jpg?height=320px" alt="Projector screen at the conference" />
</p>
</p>
<p>I&rsquo;m just back from the <a href="https://www.appliedhologenomicsconference.eu/" target="_blank" rel="noopener">Applied Hologenomics Conference 2024</a>

(See also <a href="https://x.com/search?q=%23AHC2024&amp;src=typed_query" target="_blank" rel="noopener">#AHC2024 on Twitter</a>
) in Copenhagen and
thought to reflect a little on the conference and highlight the bits that
particularly stuck with me.</p>
<p>The first thing I want to say is that a paradigm shift is happening
here.</p>
<p>I think what is happening is a step away from the reductionist view
of the past, one that goes beyond the systems biology approach that has
been establishing itself over the last 10-20 years.</p></description>
</item>
<item>
<title>We need recipes for common bioinformatics tasks</title>
<link>https://livesys.se/posts/bioinformatics-recipes/</link>
<pubDate>Mon, 27 May 2024 12:44:00 +0200</pubDate>
<guid>https://livesys.se/posts/bioinformatics-recipes/</guid>
<description><p>Ad-hoc work in bioinformatics can involve an immense number of operations
that need to be performed to achieve a certain goal. Often these are
all individually regarded as rather &ldquo;standard&rdquo; or &ldquo;routine&rdquo;. Despite this,
it is quite hard to find an authoritative set of &ldquo;recipes&rdquo; for how to do such
tasks.</p>
<p>Thus I have started to think that there needs to be a collection of
bioinformatics &ldquo;recipes&rdquo;; a sort of &ldquo;cookbook&rdquo; for common
bioinformatics tasks.</p></description>
</item>
<item>
<title>Why didn't Go get a breakthrough in bioinformatics (yet)?</title>
<link>https://livesys.se/posts/golang-for-bioinformatics/</link>
<pubDate>Mon, 13 May 2024 17:05:00 +0200</pubDate>
<guid>https://livesys.se/posts/golang-for-bioinformatics/</guid>
<description><p><p class="image">
 <img src="gopherbinfie.jpg" alt="A gopher doing
bioinformatics" />
</p>
</p>
<p>As we are - <a href="https://a16z.com/the-century-of-biology/" target="_blank" rel="noopener">according to some expert
opinions</a>
 - living in the
Century of Biology, I found it interesting to reflect on Go&rsquo;s usage
within the field.</p>
<p>Go has some great features that make it really well suited for biology,
such as:</p>
<ul>
<li>A relatively simple language that can be learned in a short time,
even by people without a CS background. This is a super important
aspect for biologists.</li>
<li>Fantastic support for cross-compilation to all major computer
architectures and operating systems, as static, self-sufficient
executables, making it extremely simple to deploy tools; something
that can&rsquo;t be said about the currently most popular bio language,
Python.</li>
<li>Fantastic support for concurrency and for writing code as a set of
parallel operations that stream data between them. Again, as
opposed to Python. More on that later.</li>
<li>A large standard library that contains a lot of common needs, even
for writing user interfaces and web servers.</li>
</ul>
<p>Go has in fact garnered some use for bioinformatics tools over the years, with
some indications that its use is increasing. Examples of popular tools and
toolkits are <a href="https://github.com/shenwei356/seqkit" target="_blank" rel="noopener">SeqKit</a>
 (a veritable <em>swiss
army knife</em> for bioinformatics), the <a href="https://github.com/biogo/biogo" target="_blank" rel="noopener">BioGo
toolkit</a>
, the <a href="https://github.com/pbenner/gonetics" target="_blank" rel="noopener">Gonetics
package</a>
 and lately the <a href="https://github.com/vertgenlab/gonomics" target="_blank" rel="noopener">Gonomics
package</a>
 and finally the
<a href="https://github.com/bebop/poly" target="_blank" rel="noopener">Poly</a>
 package for synthetic biology. And this
is besides heavy use in infrastructure-oriented projects like the
<a href="https://www.benthos.dev" target="_blank" rel="noopener">Benthos</a>
 stream processing tool, the <a href="https://github.com/grailbio/reflow" target="_blank" rel="noopener">Reflow pipeline
tool</a>
 and the <a href="https://pachyderm.io/" target="_blank" rel="noopener">Pachyderm orchestration
suite</a>
.</p></description>
</item>
<item>
<title>SciPipe used at NASA Glenn Research Center</title>
<link>https://livesys.se/posts/scipipe-at-nasa/</link>
<pubDate>Sat, 13 Apr 2024 12:00:00 +0200</pubDate>
<guid>https://livesys.se/posts/scipipe-at-nasa/</guid>
<description><p><p class="image">
 <img src="nasa-paper.png" alt="Nasa paper screenshot" class="align_right" />
</p>
I was happy to see the
<a href="https://www.nature.com/articles/s41526-024-00385-5" target="_blank" rel="noopener">publication finally going
online</a>
, of work done at
<a href="https://www.nasa.gov/glenn/" target="_blank" rel="noopener">NASA Glenn Research Center</a>
, where
<a href="https://scipipe.org" target="_blank" rel="noopener">SciPipe</a>
 has been used to process and track provenance of
the analyses, &ldquo;Modeling the impact of thoracic pressure on intracranial
pressure&rdquo;. I&rsquo;ve known the work existed for a couple of years, after getting
some <a href="https://github.com/scipipe/scipipe/commits?author=dwmunster" target="_blank" rel="noopener">extraordinarily useful contributions from
Drayton</a>
 fixing
some bugs I&rsquo;m not sure I&rsquo;d ever have found otherwise, but cool to now also see it
published! Also a big kudos for acknowledging the tool in the paper. That is not
all that common, but a gesture that is deeply appreciated.</p></description>
</item>
<item>
<title>Debugging inside Jinja templates using pdb/ipdb</title>
<link>https://livesys.se/posts/debugging-inside-jinja-templates-using-pdb-ipdb/</link>
<pubDate>Mon, 04 Mar 2024 14:42:00 +0100</pubDate>
<guid>https://livesys.se/posts/debugging-inside-jinja-templates-using-pdb-ipdb/</guid>
<description><p>I&rsquo;m working on a static reporting tool using the Jinja2 templating
engine for Python.</p>
<p>I was trying to figure out a way to step into the Jinja templating code
with the pdb/ipdb command-line debugger.</p>
<p>I tried creating an <code>.ipdbrc</code> file in my local directory with the line:</p>
<pre><code>path/to/template.html:&lt;lineno&gt;
</code></pre>
<p>&hellip; but that didn&rsquo;t work.</p>
<p>What worked was to figure out <a href="https://github.com/pallets/jinja/blob/main/src/jinja2/environment.py#L1301" target="_blank" rel="noopener">the line that
says</a>
:</p>
<pre><code>return self.environment.concat(self.root_render_func(ctx))
</code></pre>
<p>&hellip; inside the Jinja codebase, and put a breakpoint on that line (which
for me was line 1299, but this might vary depending on version):</p></description>
</item>
<item>
<title>SciCommander - track provenance of any shell command</title>
<link>https://livesys.se/posts/scicommander-0.3/</link>
<pubDate>Thu, 09 Nov 2023 18:38:00 +0100</pubDate>
<guid>https://livesys.se/posts/scicommander-0.3/</guid>
<description><p>I haven&rsquo;t written much about a new tool I&rsquo;ve been working on in some
extra time: <a href="https://github.com/samuell/scicommander" target="_blank" rel="noopener">SciCommander</a>
.</p>
<p>I just presented a poster about it at the <a href="https://sbw2023.nu" target="_blank" rel="noopener">Swedish Bioinformatics
Workshop 2023</a>
 , so perhaps let me first show you
the poster instead of reiterating what it is (click to view the large
version):</p>
<p><a href="scicmd-poster-export-007.png"><p class="image">
 <img src="scicmd-poster-export-007.png" alt="" />
</p>
</a>
</p>
<h3 id="new-version-not-requiring-running-the-scicmd-command">New version not requiring running the scicmd command</h3>
<p>I got a lot of great feedback from numerous people at the conference,
many of whom pointed out that it would be great if one could start
SciCommander as a kind of subshell, inside which one can run commands as
usual, instead of running them via the <code>scicmd -c</code> command.</p></description>
</item>
<item>
<title>Troubleshooting Nextflow pipelines</title>
<link>https://livesys.se/posts/troubleshooting-nextflow-pipelines/</link>
<pubDate>Wed, 01 Nov 2023 11:47:00 +0100</pubDate>
<guid>https://livesys.se/posts/troubleshooting-nextflow-pipelines/</guid>
<description><!-- raw HTML omitted -->
<p>We have evaluated Nextflow before in my work at
<a href="https://pharmb.io" target="_blank" rel="noopener">pharmb.io</a>
, but that was before
<a href="https://www.nextflow.io/docs/latest/dsl1.html" target="_blank" rel="noopener">DSL2</a>
 and the support
for re-usable modules (which was one reason we needed to develop our own
tools to support our challenges, as explained <a href="https://doi.org/10.1093/gigascience/giz044" target="_blank" rel="noopener">in the
paper</a>
). Thus, there&rsquo;s
definitely some stuff to get into.</p>
<p>Based on my years in bioinformatics and data science, I&rsquo;ve seen that
the number one skill you need to develop is being able to
effectively troubleshoot things, because things will invariably fail in
all kinds of ways. And in the process, you will probably learn a lot
about the technology stack you are using.</p></description>
</item>
<item>
<title>Random notes from installing Debian 11 with separate accounts for work and private</title>
<link>https://livesys.se/posts/installing-and-configuring-debian-11-into-a-great-experience/</link>
<pubDate>Thu, 24 Mar 2022 04:50:00 +0100</pubDate>
<guid>https://livesys.se/posts/installing-and-configuring-debian-11-into-a-great-experience/</guid>
<description><!-- raw HTML omitted -->
<p>See especially the end for info about how to set up a nice integration
between the work and private accounts, such that one can e.g.
occasionally start the mail client or web browser of the private
account from within the work one.</p>
<h3 id="caveats-when-installing-debian-11">Caveats when installing Debian 11</h3>
<ul>
<li>Make sure that an EFI partition is created (when I manually modified
the partition table I accidentally deleted it, and had to reinstall
to get it created properly again).</li>
<li>Had to turn off safe boot in BIOS</li>
<li>Had to set SATA settings to AHCI</li>
<li>Had to manually create a boot option with the path to
<pre tabindex="0"><code>\EFI\debian\grubx64.efi
</code></pre>in BIOS settings</li>
</ul>
<h3 id="instructions-for-configuring-debian-to-my-liking">Instructions for configuring Debian to my liking</h3>
<ul>
<li>Add current user to sudo
<ul>
<li>Switch user to root:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>usermod -aG sudo &lt;username&gt;
</span></span></code></pre></div></li>
<li>Log out and log back in with the user</li>
</ul>
</li>
<li>Uncomment CDROM remotes in /etc/apt/sources.list</li>
<li>Change default search engine in firefox to duckduckgo</li>
<li>Replace applications menu with whiskers menu</li>
<li>Install some packages:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo apt install vim git tig tmux curl tree rsync gparted
</span></span><span style="display:flex;"><span> ecryptfs-utils redshift bash-completion bluez blueman
</span></span></code></pre></div></li>
<li>Clone rc folder:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://github.com/samuell/rc.git
</span></span></code></pre></div></li>
<li>Link rc files:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ln -s rc/.<span style="color:#000">{</span>b,v,t<span style="color:#000">}</span>* .
</span></span></code></pre></div></li>
<li>Create empty <code>.bash_aliases_local</code>:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>touch ~/.bash_aliases_local
</span></span></code></pre></div></li>
<li>Activate in bashrc:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>echo <span style="color:#c41a16">&#39;if [ -f ~/.bashrc_mods ]; then . ~/.bashrc_mods; fi&#39;</span> &gt;&gt; ~/.bashrc <span style="color:#000">&amp;&amp;</span> <span style="color:#a90d91">source</span> ~/.bashrc
</span></span></code></pre></div></li>
<li>Install vim-plug:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl -fLo ~/.vim/autoload/plug.vim --create-dirs \
</span></span><span style="display:flex;"><span> https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
</span></span></code></pre></div></li>
<li>Run <code>:PlugInstall</code> in vim:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>vim -c <span style="color:#c41a16">&#34;:PlugInstall&#34;</span> -c <span style="color:#c41a16">&#34;qa!&#34;</span>
</span></span></code></pre></div></li>
<li>In &ldquo;Keyboard settings &gt; Application shortcuts&rdquo; add two
shortcuts:
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>xfce4-popup-whiskermenu <span style="color:#177500"># Ctrl+Esc</span>
</span></span><span style="display:flex;"><span>xfce4-terminal --drop-down <span style="color:#177500"># Super+Space</span>
</span></span></code></pre></div></li>
<li>Open the terminal with Win+Space, and configure:
<ul>
<li>Set width to 100%</li>
<li>Uncheck always show tabs</li>
<li>Uncheck show menubar in new windows</li>
<li>Configure colors to more pastel-like colors according to my new method</li>
</ul>
</li>
<li>In Appearance&hellip;
<ul>
<li>Set style to &ldquo;Adwaita-dark&rdquo;</li>
<li>Set fonts to &ldquo;Adwaita&rdquo;</li>
<li>Set main font to &ldquo;Sans Regular 8&rdquo;</li>
<li>Let main monospace font be &ldquo;Monospace Regular 10&rdquo;</li>
</ul>
</li>
<li>In terminal preferences, set font to &ldquo;Monospace Regular 10&rdquo;</li>
<li>In Window manager, set shortcuts:
<ul>
<li>Tile window up/down/left/right: Super up/down/left/right</li>
<li>Move window Ctrl+Alt+F</li>
<li>Resize window Ctrl+Alt+R</li>
</ul>
</li>
<li>Install and configure Brave browser</li>
<li>Encrypt home folder</li>
</ul>
<h3 id="set-up-separate-work-and-private-accounts">Set up separate work and private accounts</h3>
<ul>
<li>Create the separate accounts</li>
<li>Encrypt folder of both</li>
</ul>
<p>In order to run private commands from the work account and vice versa:</p></description>
</item>
<item>
<title>Installing Qubes OS</title>
<link>https://livesys.se/posts/installing-qubes-os/</link>
<pubDate>Wed, 29 Dec 2021 06:59:00 +0100</pubDate>
<guid>https://livesys.se/posts/installing-qubes-os/</guid>
<description><p>I just switched to <a href="https://www.qubes-os.org/" target="_blank" rel="noopener">Qubes OS</a>
 as the operating
system on my main work laptop (a Dell Latitude). In fact, one of the
reasons was to be able to combine work and private hobby coding
projects, which have increasingly been happening on the same machine.
Anyway, these are my experiences and notes, as a way to document
caveats and quirks in case I need to do this again, while possibly also
being of use to others.</p></description>
</item>
<item>
<title>Composability in functional and flow-based programming</title>
<link>https://livesys.se/posts/composability-in-functional-and-flow-based-programming/</link>
<pubDate>Fri, 12 Feb 2021 16:24:00 +0100</pubDate>
<guid>https://livesys.se/posts/composability-in-functional-and-flow-based-programming/</guid>
<description><!-- raw HTML omitted -->
<p>An area where I&rsquo;m not so happy with some things I&rsquo;ve seen in FP is
composability.</p>
<p>In my view, a well designed system or language should make functions (or
other smallest units of computation) more easily composable, not less.</p>
<p>What strikes me as one of the biggest elephants in the room regarding
FP is that typical functions compose fantastically as long as you are
working with a single input argument and a single output for each
function application. As soon as you start taking multiple input
arguments and returning multiple outputs though, you tend to end up with very
messy trees of function application. Even handy techniques such as
currying tend to get overly complex if you want to handle all the
possible downstream dataflow paths in a structured way.</p></description>
</item>
<item>
<title>Crystal: Go-like concurrency with easier syntax</title>
<link>https://livesys.se/posts/crystal-concurrency-easier-syntax-than-golang/</link>
<pubDate>Sat, 05 Sep 2020 15:36:00 +0200</pubDate>
<guid>https://livesys.se/posts/crystal-concurrency-easier-syntax-than-golang/</guid>
<description><p>I have been playing around a lot with concurrency in Go over the years,
resulting in libraries such as <a href="https://scipipe.org/" target="_blank" rel="noopener">SciPipe</a>
,
<a href="http://flowbase.org/" target="_blank" rel="noopener">FlowBase</a>
 and
<a href="https://github.com/rdfio/rdf2smw/" target="_blank" rel="noopener">rdf2smw</a>
. My main motivation for
looking into Go has been the possibility of using it as a more performant,
scalable and type-safe alternative to Python for data-heavy scripting
tasks in bioinformatics and other fields I&rsquo;ve been dabbling in,
especially as it makes it so easy to write concurrent and parallel code.
Be warned that this context is surely giving me some biases.</p></description>
</item>
<item>
<title>Viewing Go test coverage in the browser with one command</title>
<link>https://livesys.se/posts/go-test-coverage-in-browser/</link>
<pubDate>Thu, 20 Aug 2020 23:47:00 +0200</pubDate>
<guid>https://livesys.se/posts/go-test-coverage-in-browser/</guid>
<description><p>Go has some really nice tools for running tests and analyzing code. One
of these is that you can generate coverage information
when running tests, which can later be viewed in a browser using the
<code>go tool cover</code> command. It turns out though, since doing this requires
executing multiple commands one after another, it can be hard to
remember the exact commands.</p>
<p>To this end, I created a bash alias that does everything in one command,
<code>gocov</code>. It looks like this (to be placed in your <code>~/.bash_aliases</code> file
or similar):</p></description>
</item>
<item>
<title>Creating a static copy of a Drupal, Wordpress or other CMS website</title>
<link>https://livesys.se/posts/static-copy-of-cms-website/</link>
<pubDate>Thu, 20 Aug 2020 17:42:00 +0200</pubDate>
<guid>https://livesys.se/posts/static-copy-of-cms-website/</guid>
<description><!-- raw HTML omitted -->
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>wget -P . -mpck --html-extension -e <span style="color:#000">robots</span><span style="color:#000">=</span>off --wait 0.5 &lt;URL&gt;
</span></span></code></pre></div><p>To understand the flags, you can check <code>man wget</code> of course, but some
explanations follow here:</p>
<ul>
<li>-P - Tell where to store the site</li>
<li>-m - Create a mirror</li>
<li>-p - Download all the required files (.css, .js) needed to properly
render the page</li>
<li>-c - Continue getting partially downloaded files</li>
<li>-k - Convert links to enable local viewing</li>
<li>--html-extension - Add the .html extension to file names. This
is important since, when serving the plain files, a web server such
as Nginx needs the .html extension to know that the files should be
sent directly to the user&rsquo;s browser, not offered as a file to
download. See below for how to redirect from old to new links.</li>
<li>-e robots=off - Don&rsquo;t read the robots.txt file. Not sure exactly
how this one works, but I got a lot of errors when not including it.</li>
<li>--wait 0.5 - It is better not to overwhelm the web server where
your site is hosted, so wait a little between each page download.</li>
</ul>
<p>After this command finishes, you will have a folder with static
HTML files and other assets, which you can just upload to your web server
in place of your CMS.</p></description>
</item>
<item>
<title>Basic PUB/SUB connection with ZeroMQ in Python</title>
<link>https://livesys.se/posts/pub-sub-with-zeromq-in-python/</link>
<pubDate>Wed, 13 Nov 2019 11:56:00 +0100</pubDate>
<guid>https://livesys.se/posts/pub-sub-with-zeromq-in-python/</guid>
<description><p><a href="https://zeromq.org" target="_blank" rel="noopener">ZeroMQ</a>
 is a great way to quickly and simply send messages
between multiple programs running on the same or different computers. It is
very simple and robust since it doesn&rsquo;t need any central server. Instead it
communicates directly between the programs through sockets, TCP connections or
similar.</p>
<p>ZeroMQ has client libraries for basically all commonly used programming
languages, but when testing that a connection works between e.g. two
different machines, it might be good to keep things simple and test just
the connection, as simply as possible. For this purpose, I have come to
use the following two Python scripts, where one sets up a &ldquo;publisher&rdquo;
and the other a &ldquo;subscriber&rdquo; process. I&rsquo;m documenting them here since
I tend to forget the syntax from time to time, and also some details,
such as that you have to make sure the subscriber subscribes to one or
all topics.</p></description>
</item>
<item>
<title>Table-driven tests in C#</title>
<link>https://livesys.se/posts/table-driven-tests-in-csharp/</link>
<pubDate>Sat, 02 Nov 2019 21:24:00 +0100</pubDate>
<guid>https://livesys.se/posts/table-driven-tests-in-csharp/</guid>
<description><p>Folks in the Go community have championed so-called table-driven tests
(see e.g. <a href="https://dave.cheney.net/2019/05/07/prefer-table-driven-tests" target="_blank" rel="noopener">this post by Dave
Cheney</a>

and the <a href="https://github.com/golang/go/wiki/TableDrivenTests" target="_blank" rel="noopener">Go wiki</a>
)
as a way to quickly and easily write up a bunch of complete test cases
with inputs and corresponding expected outputs, and loop over them to
execute the function being tested. In short, the idea is to provide a
maximally short and convenient syntax for doing this.</p>
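<p>As a minimal sketch of the pattern (my own illustration, not code from the post; the <code>Add</code> function is hypothetical), written as a plain runnable program rather than with the <code>testing</code> package:</p>

```go
package main

import "fmt"

// Add is a hypothetical function under test.
func Add(a, b int) int {
	return a + b
}

func main() {
	// The test table: each row is one complete test case with
	// inputs and the corresponding expected output.
	tests := []struct {
		name string
		a, b int
		want int
	}{
		{"zeros", 0, 0, 0},
		{"positive", 2, 3, 5},
		{"negative", -1, -2, -3},
	}
	// Loop over the table, executing the function for each case.
	for _, tt := range tests {
		if got := Add(tt.a, tt.b); got != tt.want {
			panic(fmt.Sprintf("%s: Add(%d, %d) = %d, want %d",
				tt.name, tt.a, tt.b, got, tt.want))
		}
	}
	fmt.Println("all cases passed")
}
```

<p>In real Go test code, the same table would live inside a <code>TestAdd(t *testing.T)</code> function, reporting failures with <code>t.Errorf</code> instead of panicking.</p>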
<p>For example, given that we have a function like this in mylibrary.go:</p></description>
</item>
<item>
<title>SciPipe paper published in GigaScience</title>
<link>https://livesys.se/posts/scipipe-paper-published-in-gigascience/</link>
<pubDate>Sat, 27 Apr 2019 14:48:00 +0200</pubDate>
<guid>https://livesys.se/posts/scipipe-paper-published-in-gigascience/</guid>
<description><p>We wanted to share that the paper on our Go-based workflow library,
SciPipe, was just published in GigaScience:</p>
<p><a href="https://doi.org/10.1093/gigascience/giz044" target="_blank" rel="noopener"><p class="image">
 <img src="selection_999_198.png" alt="" />
</p>
</a>
</p>
<h2 id="abstract">Abstract</h2>
<h3 id="background">Background</h3>
<p>The complex nature of biological data has driven the development of
specialized software tools. Scientific workflow management systems
simplify the assembly of such tools into pipelines, assist with job
automation, and aid reproducibility of analyses. Many contemporary
workflow tools are specialized or not designed for highly complex
workflows, such as with nested loops, dynamic scheduling, and
parametrization, which is common in, e.g., machine learning.</p></description>
</item>
<item>
<title>Structured Go-routines or framework-less Flow-Based Programming in Go</title>
<link>https://livesys.se/posts/structured-go-routines-or-framework-less-flow-based-programming-in-go/</link>
<pubDate>Sat, 02 Mar 2019 13:52:00 +0100</pubDate>
<guid>https://livesys.se/posts/structured-go-routines-or-framework-less-flow-based-programming-in-go/</guid>
<description><p>I was so happy the other day to find someone else who has discovered the great
benefits of a little pattern for how to structure pipeline-heavy
programs in Go, which I have described in a few posts before. I have been
surprised not to find more people using this kind of pattern, which has
been so extremely helpful to us, so I thought to take this opportunity
to reiterate it, in the hope that more people might become aware of
it.</p></description>
</item>
<item>
<title>Setting up a reasonable and light-weight Linux-like (non-WSL) terminal environment on Windows</title>
<link>https://livesys.se/posts/linux-like-non-wsl-terminal-env-on-windows/</link>
<pubDate>Thu, 29 Nov 2018 16:36:00 +0100</pubDate>
<guid>https://livesys.se/posts/linux-like-non-wsl-terminal-env-on-windows/</guid>
<description><p>What I was looking for was a no-fuss, lightweight, robust and as simple as
possible solution for running my normal Bash-based workflow inside the
main Windows filesystem, interacting with the Windows world. It turns out
there are some solutions. Read on for more info on that.</p>
<h2 id="windows-subsystem-for-linux-too-heavy">Windows Subsystem for Linux too heavy</h2>
<p>First, I must mention the impressive work by Microsoft on the <a href="https://docs.microsoft.com/en-us/windows/wsl/about" target="_blank" rel="noopener">Windows
Subsystem for Linux (aka.
WSL)</a>
, which more or
less lets you run an almost full-blown distribution of popular Linux
distros like Ubuntu and Fedora. WSL is awesome, but also kind of heavy,
easily taking something closer to an hour to install. It also has some
odd behaviours: opening files in Windows from WSL will give you
trouble with line endings (as you might know, Windows uses <code>\r\n</code> for
line endings while Linux uses just <code>\n</code>). Finally, the default bash
terminal for WSL did not even have zoomable text. All in all, this made
it clear that this is not the light-weight, simple solution I was looking
for. WSL is awesome, but too heavy for me to be comfortable quickly
installing it on any Windows machine I need to spend time working on.</p></description>
</item>
<item>
<title>Linked Data Science - For improved understandability of computer-aided research</title>
<link>https://livesys.se/posts/linked-data-science/</link>
<pubDate>Fri, 21 Sep 2018 01:43:00 +0200</pubDate>
<guid>https://livesys.se/posts/linked-data-science/</guid>
<description><p><em>This is an excerpt from the &ldquo;future outlook&rdquo; section of my thesis
titled <a href="http://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1242336&amp;dswid=2522" target="_blank" rel="noopener">&ldquo;Reproducible Data Analysis in Drug Discovery with Scientific
Workflows and the Semantic
Web&rdquo;</a>

(click for the open access full text), which aims to provide various
putative ways towards improved reproducibility, understandability and
verifiability of computer-aided research.</em></p>
<hr>
<p><a href="linkeddatascience.png"><p class="image">
 <img src="linkeddatascience.png" alt="" />
</p>
</a>
</p>
<p>Historically, something of a divide has developed between the
metadata-rich datasets and approaches of the Semantic
Web/Ontologies/Linked Data world, and the Big Data field,
which, at least initially, has mostly been focused on large unstructured
datasets.</p></description>
</item>
<item>
<title>Preprint on SciPipe - Go-based scientific workflow library</title>
<link>https://livesys.se/posts/scipipe-preprint/</link>
<pubDate>Thu, 02 Aug 2018 01:01:00 +0200</pubDate>
<guid>https://livesys.se/posts/scipipe-preprint/</guid>
<description><p>A pre-print for our Go-based workflow library
<a href="http://scipipe.org" target="_blank" rel="noopener">SciPipe</a>
 is out, with the title <em><a href="https://www.biorxiv.org/content/early/2018/08/01/380808" target="_blank" rel="noopener">SciPipe - A
workflow library for agile development of complex and dynamic
bioinformatics
pipelines</a>
,</em>
co-authored by me and colleagues at <a href="https://pharmb.io/" target="_blank" rel="noopener">pharmb.io</a>
:
<a href="https://pharmb.io/people/dahlo/" target="_blank" rel="noopener">Martin Dahlö</a>
, <a href="https://pharmb.io/people/jonalv/" target="_blank" rel="noopener">Jonathan
Alvarsson</a>
 and <a href="https://pharmb.io/people/olas/" target="_blank" rel="noopener">Ola
Spjuth</a>
. Access it
<a href="https://www.biorxiv.org/content/early/2018/08/01/380808" target="_blank" rel="noopener">here</a>
.</p>
<p><a href="https://www.biorxiv.org/content/early/2018/08/01/380808" target="_blank" rel="noopener"><p class="image">
 <img src="selection_864.png" alt="" />
</p>
</a>
</p>
<p>It has been more than three years since the first commit on the <a href="https://github.com/scipipe/scipipe" target="_blank" rel="noopener">SciPipe
Git repository</a>
 in March, 2015, and
development has been going on with varying degrees of intensity during these
years, often alongside other duties at <a href="https://pharmb.io/" target="_blank" rel="noopener">pharmb.io</a>
 and
<a href="https://nbis.se/" target="_blank" rel="noopener">NBIS</a>
, and often at a slower pace than I might have
wished. On the other hand, this might also have helped design
ideas mature well before implementing them.</p></description>
</item>
<item>
<title>Make your commandline tool workflow friendly</title>
<link>https://livesys.se/posts/make-your-commandline-tool-workflow-friendly/</link>
<pubDate>Fri, 25 May 2018 23:59:00 +0200</pubDate>
<guid>https://livesys.se/posts/make-your-commandline-tool-workflow-friendly/</guid>
<description><p><em>Update (May 2019):</em> A paper incorporating the considerations below has been published:</p>
<p>Björn A Grüning, Samuel Lampa, Marc Vaudel, Daniel Blankenberg, &ldquo;<a href="https://doi.org/10.1093/gigascience/giz054" target="_blank" rel="noopener">Software
engineering for scientific big data
analysis</a>
&rdquo; GigaScience, Volume 8,
Issue 5, May 2019, giz054, <a href="https://doi.org/10.1093/gigascience/giz054" target="_blank" rel="noopener">https://doi.org/10.1093/gigascience/giz054</a>
</p>
<hr>
<p>There are a number of pitfalls that can make a commandline program
really hard to integrate into a workflow (or &ldquo;pipeline&rdquo;) framework.
The reason is that many workflow tools use output file paths to keep
track of the state of the tasks producing these files. This is done for
example to know which tasks are finished and can be skipped upon a
re-run, and which are not.</p></description>
</item>
<item>
<title>To make computational lab note-taking happen, make the journal into a todo-list (a "Todournal")</title>
<link>https://livesys.se/posts/todournal/</link>
<pubDate>Fri, 13 Apr 2018 16:15:00 +0200</pubDate>
<guid>https://livesys.se/posts/todournal/</guid>
<description><h2 id="good-lab-note-taking-is-hard">Good lab note-taking is hard</h2>
<p>Good note-taking is, in my opinion, as important for computational
research as for wet lab research. In computational research, though, it is much
easier to forget to do it, since you might not have a physical
notebook lying on your desk staring at you, but rather might need to
open a specific piece of software or a file to write the notes. I think this is
one reason why lab note-taking seems to happen a lot less among
computational scientists than among their wet lab counterparts.</p></description>
</item>
<item>
<title>Semantic Web ❤ Data Science? My talk at Linked Data Sweden 2018</title>
<link>https://livesys.se/posts/semantic-web-data-science-my-talk-at-linked-data-sweden-2018/</link>
<pubDate>Tue, 10 Apr 2018 11:40:00 +0200</pubDate>
<guid>https://livesys.se/posts/semantic-web-data-science-my-talk-at-linked-data-sweden-2018/</guid>
<description><p>During the last months, I have had the pleasure of working together with
<a href="https://twitter.com/matthiaspalmer" target="_blank" rel="noopener">Matthias Palmér</a>
 (<a href="http://metasolutions.se/" target="_blank" rel="noopener">MetaSolutions
AB</a>
) and <a href="https://twitter.com/DataDrivenDorea" target="_blank" rel="noopener">Fernanda
Dórea</a>
 (<a href="http://www.sva.se/" target="_blank" rel="noopener">National Veterinary
Institute</a>
), to prepare for and organize this
year&rsquo;s version of the annual <a href="http://lankadedata.se/LDSV/2018/" target="_blank" rel="noopener">Linked Data Sweden
event</a>
, which was held in
Uppsala hosted by the <a href="https://www.scilifelab.se/data/" target="_blank" rel="noopener">SciLifeLab Data
Centre</a>
.</p>
<p>Thanks to engaged speakers and attendees, it turned into an interesting
day with great discussions, new contacts, and a lot of new impressions
and insights.</p></description>
</item>
<item>
<title>Parsing DrugBank XML (or any large XML file) in streaming mode in Go</title>
<link>https://livesys.se/posts/parsing-drugbank-xml-or-any-large-xml-file-in-streaming-mode-in-go/</link>
<pubDate>Thu, 15 Mar 2018 15:19:00 +0100</pubDate>
<guid>https://livesys.se/posts/parsing-drugbank-xml-or-any-large-xml-file-in-streaming-mode-in-go/</guid>
<description><p>I had a problem in which I thought I needed to parse the full
<a href="https://www.drugbank.ca" target="_blank" rel="noopener">DrugBank</a>
 dataset, which comes as a <a href="https://www.drugbank.ca/releases/5-0-11/downloads/all-full-database" target="_blank" rel="noopener">(670MB) XML file</a>
</p>
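<p>The core of the streaming approach can be sketched as follows with Go&rsquo;s <code>encoding/xml</code> token API (a minimal sketch; the inline sample data is a tiny hypothetical stand-in for DrugBank&rsquo;s <code>drug</code>/<code>name</code> element structure):</p>
<pre><code class="language-go">package main

import (
	&quot;encoding/xml&quot;
	&quot;fmt&quot;
	&quot;io&quot;
	&quot;strings&quot;
)

// Drug maps only the fields we care about from each &lt;drug&gt; element
type Drug struct {
	Name string `xml:&quot;name&quot;`
}

func main() {
	// Tiny stand-in for the (670MB) DrugBank XML file
	data := `&lt;drugbank&gt;&lt;drug&gt;&lt;name&gt;Aspirin&lt;/name&gt;&lt;/drug&gt;&lt;drug&gt;&lt;name&gt;Ibuprofen&lt;/name&gt;&lt;/drug&gt;&lt;/drugbank&gt;`
	dec := xml.NewDecoder(strings.NewReader(data))
	for {
		tok, err := dec.Token() // read one token at a time
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		// Decode one &lt;drug&gt; element at a time, so only that element is in memory
		if se, ok := tok.(xml.StartElement); ok &amp;&amp; se.Name.Local == &quot;drug&quot; {
			var d Drug
			if err := dec.DecodeElement(&amp;d, &amp;se); err != nil {
				panic(err)
			}
			fmt.Println(d.Name)
		}
	}
}
</code></pre>
<p>Since only one <code>drug</code> element is decoded into memory at a time, this works for arbitrarily large files.</p>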
<p>(For open access papers describing DrugBank, see: <!-- raw HTML omitted -->[1]<!-- raw HTML omitted -->, <!-- raw HTML omitted -->[2]<!-- raw HTML omitted -->, <!-- raw HTML omitted -->[3]<!-- raw HTML omitted --> and <!-- raw HTML omitted -->[4]<!-- raw HTML omitted -->).<!-- raw HTML omitted --> It turned out that what I needed was available as CSV
files under &ldquo;<a href="https://www.drugbank.ca/releases/latest#structures" target="_blank" rel="noopener">Structure External
Links</a>
&rdquo;. There are probably
still other uses for this approach though, as the XML version of DrugBank
seems to contain a lot more information in a single format. In any case,
this forced me to figure out how to parse large XML files in a streaming
fashion in Go, as older tools like
<a href="http://xmlstar.sourceforge.net/" target="_blank" rel="noopener">XMLStarlet</a>
 choke for many minutes on the
DrugBank file (trying to read it all into memory?), killing any attempt at an
iterative development cycle. And it turns out Go&rsquo;s support for streaming XML
parsing is just great!</p></description>
</item>
<item>
<title>Equation-centric dataflow programming in Go</title>
<link>https://livesys.se/posts/equation-centric-dataflow-programming-in-go/</link>
<pubDate>Wed, 27 Dec 2017 14:05:00 +0100</pubDate>
<guid>https://livesys.se/posts/equation-centric-dataflow-programming-in-go/</guid>
<description><h2 id="mathematical-notation-and-dataflow-programming">Mathematical notation and dataflow programming</h2>
<p>Even though computations done on computers are very often based on some
type of math, it is striking that the notation used in math to express
equations and relations is not always very readily converted into
programming code. Outside of purely <a href="https://en.wikipedia.org/wiki/Symbolic_programming" target="_blank" rel="noopener">symbolic
programming</a>

languages like <a href="http://www.sagemath.org/" target="_blank" rel="noopener">SageMath</a>
 or the
(proprietary) <a href="https://www.wolfram.com/language" target="_blank" rel="noopener">Wolfram language</a>
,
there always seems to be quite a divide between the mathematical notation
and the numerical implementation.</p></description>
</item>
<item>
<title>What is a scientific (batch) workflow?</title>
<link>https://livesys.se/posts/what-is-a-scientific-batch-workflow/</link>
<pubDate>Thu, 07 Dec 2017 00:57:00 +0100</pubDate>
<guid>https://livesys.se/posts/what-is-a-scientific-batch-workflow/</guid>
<description><h2 id="dependency-graph-in-luigi---a-dag-representing-tasks-not-processes-or-workflow-stepsdependencygraphnew_without_shadow-1pngdependencygraphnew_without_shadow-1png"><a href="dependencygraphnew_without_shadow-1.png"><p class="image">
 <img src="dependencygraphnew_without_shadow-1.png" alt="Dependency graph in Luigi - A DAG representing tasks (not processes or workflow steps)" />
</p>
</a>
</h2>
<h2 id="workflows-and-dags---confusion-about-the-concepts">Workflows and DAGs - Confusion about the concepts</h2>
<p><a href="https://twitter.com/joergenbr" target="_blank" rel="noopener">Jörgen Brandt</a>
 <a href="https://twitter.com/joergenbr/status/907626987333746688" target="_blank" rel="noopener">tweeted a
comment</a>
 that
got me thinking again on something I&rsquo;ve pondered a lot lately:</p>
<blockquote>
<p>&ldquo;A workflow is a DAG.&rdquo; is really a weak definition. That&rsquo;s like
saying &ldquo;A love letter is a sequence of characters.&rdquo; representation ≠
meaning</p>
<p>&ndash; <a href="https://twitter.com/joergenbr/status/907626987333746688" target="_blank" rel="noopener">@joergenbr</a>
</p>
</blockquote>
<p>Jörgen makes a good point. A <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph" target="_blank" rel="noopener">Directed Acyclic Graph
(DAG)</a>
 does not by
any means capture the full semantic content included in a computational
workflow. I think <a href="http://www.worldscientific.com/doi/pdf/10.1142/9789814508728_0001" target="_blank" rel="noopener">Werner Gitt&rsquo;s <em>universal information</em>
model</a>

is highly relevant here, suggesting that information comes in at least
five abstraction layers: statistics (signals, number of symbols), syntax
(set of symbols, grammar), semantics (meaning), pragmatics (action),
apobetics (purpose, result). A DAG seems to cover the syntax and
semantics layers, leaving out three of the five.</p></description>
</item>
<item>
<title>Go is growing in bioinformatics workflow tools</title>
<link>https://livesys.se/posts/golang-growing-in-bioinformatics-workflows/</link>
<pubDate>Fri, 10 Nov 2017 12:54:00 +0100</pubDate>
<guid>https://livesys.se/posts/golang-growing-in-bioinformatics-workflows/</guid>
<description><p><p class="image">
 <img src="gopher_thinking_workflows.png" alt="Gopher thinking with logos of different workflow tools in the air" />
</p>
</p>
<ul>
<li><strong>TL;DR: We wrote a post on gopherdata.io, about the growing ecosystem of
Go-based workflow tools in bioinformatics. <a href="https://gopherdata.io/post/more_go_based_workflow_tools_in_bioinformatics/" target="_blank" rel="noopener">Go read it
here</a>
</strong></li>
</ul>
<p>It is interesting to note how Google&rsquo;s <a href="https://golang.org/" target="_blank" rel="noopener">Go programming
language</a>
 seems to be increasing in popularity in
bioinformatics.</p>
<p>Just to give a sample of some of the Go based bioinformatics tools I&rsquo;ve
stumbled upon: there is, since a few years back, the <a href="https://github.com/biogo/biogo" target="_blank" rel="noopener">biogo
library</a>
, providing common functionality
for bioinformatics tasks. It was recently reviewed in two great blog
posts (<a href="https://medium.com/@boti_ka/a-gentle-introduction-to-b%C3%ADogo-part-i-65dbd40e31d4" target="_blank" rel="noopener">part
I</a>
,
<a href="https://medium.com/@boti_ka/a-gentle-introduction-to-b%C3%ADogo-part-ii-1f0df1cf72f0" target="_blank" rel="noopener">part
II</a>
).
Further, Brent Pedersen wrote a little collection of Go-based
bioinfo tools, compiled down into a single static binary, called
<a href="https://github.com/brentp/goleft" target="_blank" rel="noopener">goleft</a>
, and finally, there is the
<a href="https://github.com/exascience/elprep" target="_blank" rel="noopener">elprep tool</a>
, used to prepare
.sam/.bam/.cram files for variant calling, which was published in a
<a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132868" target="_blank" rel="noopener">PLOS ONE
paper</a>
.
Just a few examples.</p></description>
</item>
<item>
<title>The frustrating state of note taking tools</title>
<link>https://livesys.se/posts/my-frustration-with-the-state-of-note-taking-tools/</link>
<pubDate>Tue, 07 Nov 2017 18:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/my-frustration-with-the-state-of-note-taking-tools/</guid>
<description><p>One year left until the dissertation (we hope), and I am now turning from mostly
software development to more data analysis, needing to read through quite a
pile of books and papers on my actual topic, pharmaceutical bioinformatics.
With this background, I feel forced to ponder ways of improving my note-taking
workflow.<!-- raw HTML omitted --> I&rsquo;m already quite happy with the way of taking notes
I&rsquo;ve settled on, using a lot of drawings and often iterating over the same
notes multiple times to ask questions, fill in details, and figure out
connections. The main remaining question for me is instead about tools.</p></description>
</item>
<item>
<title>Learning how to learn</title>
<link>https://livesys.se/posts/how-to-learn/</link>
<pubDate>Tue, 31 Oct 2017 10:38:00 +0100</pubDate>
<guid>https://livesys.se/posts/how-to-learn/</guid>
<description><p><a href="https://www.goodreads.com/book/show/18693655-a-mind-for-numbers" target="_blank" rel="noopener"><p class="image">
 <img src="amfn.jpg" alt="" class="align_right" />
</p>
</a>

I&rsquo;m reading <a href="https://www.goodreads.com/book/show/18693655-a-mind-for-numbers" target="_blank" rel="noopener">A mind for
numbers</a>
, by
Barbara Oakley. Firstly, it is a very interesting book, but the main lesson
I&rsquo;ve already learned from this book seems so paramount that I have to write it
down, so I don&rsquo;t forget it (some meta-connotations in that statement ;) ). I
found the book through Barbara&rsquo;s coursera course &ldquo;<a href="https://www.coursera.org/learn/learning-how-to-learn/" target="_blank" rel="noopener">Learning how to
Learn</a>
&rdquo;, and to me it
seems learning in general is the topic of the book too, more than numbers
specifically - but I still have to read it through, so stay tuned. (Went for
the book instead, as I never found time to follow this type of online course).</p></description>
</item>
<item>
<title>On Provenance Reports in Scientific Workflows</title>
<link>https://livesys.se/posts/provenance-reports-in-scientific-workflows/</link>
<pubDate>Thu, 19 Oct 2017 11:44:00 +0200</pubDate>
<guid>https://livesys.se/posts/provenance-reports-in-scientific-workflows/</guid>
<description><!-- raw HTML omitted -->
<p>One of the more important tasks for a scientific workflow is to keep
track of so-called &ldquo;provenance information&rdquo; about its data outputs:
information about how each data file was created. This is important so that
other researchers can easily replicate the study (re-run it with the
same software and tools). It should also help anyone wanting to
reproduce it (re-run the same study design, possibly with other software
and tools).</p></description>
</item>
<item>
<title>(Almost) ranging over multiple Go channels simultaneously</title>
<link>https://livesys.se/posts/range-over-multiple-go-channels/</link>
<pubDate>Thu, 05 Oct 2017 10:23:00 +0200</pubDate>
<guid>https://livesys.se/posts/range-over-multiple-go-channels/</guid>
<description><!-- raw HTML omitted -->
<p>Thus, optimally, one would want to use Go&rsquo;s handy <strong>range</strong> keyword for
looping over multiple channels, since <strong>range</strong> takes care of terminating
the for-loop at the right time (when the inbound channel is closed). So one
would want something like this (<strong>N.B.:</strong> non-working code!):</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#a90d91">for</span> <span style="color:#000">a</span>, <span style="color:#000">b</span>, <span style="color:#000">c</span> <span style="color:#000">:=</span> <span style="color:#a90d91">range</span> <span style="color:#000">chA</span>, <span style="color:#000">chB</span>, <span style="color:#000">chC</span> {
</span></span><span style="display:flex;"><span> <span style="color:#000">doSomething</span>(<span style="color:#000">a</span>, <span style="color:#000">b</span>, <span style="color:#000">c</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Unfortunately this is not possible, and probably for good reason (how
would it know whether to terminate the loop when the first channel is closed,
or only when all of them are closed? etc.).</p></description>
</item>
<item>
<title>First production run with SciPipe - A Go-based scientific workflow tool</title>
<link>https://livesys.se/posts/first-production-workflow-run-with-scipipe/</link>
<pubDate>Thu, 28 Sep 2017 19:32:00 +0200</pubDate>
<guid>https://livesys.se/posts/first-production-workflow-run-with-scipipe/</guid>
<description><p>Today marked the day when we ran the very first production workflow with
<a href="http://scipipe.org" target="_blank" rel="noopener">SciPipe</a>
, the <a href="https://golang.org/" target="_blank" rel="noopener">Go</a>
-based
<a href="https://en.wikipedia.org/wiki/Scientific_workflow_system" target="_blank" rel="noopener">scientific workflow
tool</a>
 we&rsquo;ve
been working on over the last couple of years. Yay! :)</p>
<p>This is how it looked (no fancy GUI or such yet, sorry):</p>
<p><p class="image">
 <img src="terminal_411.png" alt="" />
</p>
</p>
<p>The first result we got from this very first job was a list of counts
of ligands (chemical compounds) in the <a href="https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0203-5" target="_blank" rel="noopener">ExcapeDB
dataset</a>

(<a href="https://zenodo.org/record/173258" target="_blank" rel="noopener">download here</a>
) interacting with the
44 protein/gene targets <a href="http://dx.doi.org/10.1038/nrd3845" target="_blank" rel="noopener">identified by Bowes et
al</a>
 as a good baseline set for
identifying hazardous side-effects in the body (that is, any
chemical compound binding these proteins will never become an approved
drug).</p></description>
</item>
<item>
<title>Compiling RDFHDT C++ tools on UPPMAX (RHEL/CentOS 7)</title>
<link>https://livesys.se/posts/compiling-rdfhdt-c-tools-on-uppmax-rhel-centos-7/</link>
<pubDate>Wed, 13 Sep 2017 18:04:00 +0200</pubDate>
<guid>https://livesys.se/posts/compiling-rdfhdt-c-tools-on-uppmax-rhel-centos-7/</guid>
<description><!-- raw HTML omitted -->
<p>At <a href="http://pharmb.io" target="_blank" rel="noopener">pharmb.io</a>
 we are researching how to use semantic
technologies to push the boundaries for what can be done with
intelligent data processing, often of large datasets (see e.g. our
<a href="http://dx.doi.org/10.1186/2041-1480-2-S1-S6" target="_blank" rel="noopener">paper on linking RDF to cheminformatics and
proteomics</a>
, and <a href="http://dx.doi.org/10.1186/s13326-017-0136-y" target="_blank" rel="noopener">our work
on the RDFIO software
suite</a>
). Thus, for us,
RDFHDT opens new possibilities. We are heavy users of the <a href="http://www.uppmax.uu.se" target="_blank" rel="noopener">UPPMAX HPC
center</a>
 for our computations, and so we need
to have the HDT tools available there. This post will outline the steps
to compile the <a href="https://github.com/rdfhdt/hdt-cpp" target="_blank" rel="noopener">C++ HDT commandline tool
suite</a>
 from source.</p></description>
</item>
<item>
<title>New paper on RDFIO for interoperable biomedical data management in Semantic MediaWiki</title>
<link>https://livesys.se/posts/new-paper-on-rdfio-for-interoperable-biomedical-datamanagement-in-semantic-mediawiki/</link>
<pubDate>Mon, 11 Sep 2017 15:57:00 +0200</pubDate>
<guid>https://livesys.se/posts/new-paper-on-rdfio-for-interoperable-biomedical-datamanagement-in-semantic-mediawiki/</guid>
<description><p>As my collaborator and M.Sc. supervisor <a href="https://twitter.com/egonwillighagen" target="_blank" rel="noopener">Egon
Willighagen</a>
 already
<a href="http://chem-bla-ics.blogspot.nl/2017/09/new-paper-rdfio-extending-semantic.html" target="_blank" rel="noopener">blogged</a>
,
we just released a paper titled: &ldquo;<a href="https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0136-y" target="_blank" rel="noopener">RDFIO: extending Semantic MediaWiki
for interoperable biomedical data
management</a>
&rdquo;,
with use cases from Egon and <a href="https://twitter.com/pkohonen" target="_blank" rel="noopener">Pekka
Kohonen</a>
, coding help from <a href="https://twitter.com/ali_king" target="_blank" rel="noopener">Ali
King</a>
 and project supervision from <a href="https://twitter.com/vrandezo" target="_blank" rel="noopener">Denny
Vrandečić</a>
, <a href="https://www.linkedin.com/in/roland-grafstr%c3%b6m-a86b3b2" target="_blank" rel="noopener">Roland
Grafström</a>
 and
<a href="https://twitter.com/ola_spjuth" target="_blank" rel="noopener">Ola Spjuth</a>
.</p>
<p>See the picture below (from the paper) for an overview of all the newly
developed functionality (drawn in black), as related to the previously
existing functionality (drawn in grey):</p></description>
</item>
<item>
<title>Notes on launching kubernetes jobs from the Go API</title>
<link>https://livesys.se/posts/launching-kubernetes-jobs-from-the-go-api-notes-from-a-beginner/</link>
<pubDate>Wed, 15 Feb 2017 00:01:00 +0100</pubDate>
<guid>https://livesys.se/posts/launching-kubernetes-jobs-from-the-go-api-notes-from-a-beginner/</guid>
<description><p><em>This post is <a href="https://medium.com/@saml/launching-kubernetes-jobs-from-the-go-api-notes-from-a-beginner-2b34fbc502c0" target="_blank" rel="noopener">also published on
medium</a>
</em></p>
<p>My current work at <a href="http://pharmb.io/" target="_blank" rel="noopener">pharmb.io</a>
 entails adding
<a href="http://kubernetes.io/" target="_blank" rel="noopener">kubernetes</a>
 support to my light-weight Go-based
scientific workflow engine,
<a href="https://github.com/scipipe/scipipe" target="_blank" rel="noopener">scipipe</a>
 (kubernetes, or <em>k8s</em> for
short, is Google&rsquo;s open source project for orchestrating container based
compute clusters), which should take scipipe from a simple &ldquo;run it on
your laptop&rdquo; workflow system with HPC support still in the works, to
something that can power scientific workflows on any set of networked
computers that can run kubernetes, which is quite a few (AWS, GCE,
Azure, your Raspberry Pi cluster, etc.).</p></description>
</item>
<item>
<title>SMWCon Fall 2016 - My talk on large RDF imports</title>
<link>https://livesys.se/posts/smwcon-fall-2016/</link>
<pubDate>Fri, 07 Oct 2016 11:50:00 +0200</pubDate>
<guid>https://livesys.se/posts/smwcon-fall-2016/</guid>
<description><p>I was invited to give a talk at the <a href="https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2016/Conference_Days" target="_blank" rel="noopener">Semantic MediaWiki (SMW)
conference</a>

in Frankfurt last week, on our work on enabling import of RDF datasets into
<a href="https://www.semantic-mediawiki.org" target="_blank" rel="noopener">SMW</a>
. I have presented at SMWCon before
as well (2011: <a href="http://saml.rilspace.com/smwcon-fall-2011-impressions" target="_blank" rel="noopener">blog</a>
,
<a href="http://www.slideshare.net/SamuelLampa/hooking-up-semantic-mediawiki-with-external-tools-via-sparql" target="_blank" rel="noopener">slides</a>
,

<a href="https://www.youtube.com/watch?v=3US0G5dDynM" target="_blank" rel="noopener">video</a>
, 2013:
<a href="http://www.slideshare.net/SamuelLampa/20131030-smw-con" target="_blank" rel="noopener">slides</a>
), so it was
nice to re-connect with some old friends, and to get up to date about how SMW
is developing, as well as share about our own contributions.</p>
<ul>
<li><strong>Note:</strong> See also the <a href="https://storify.com/smllmp/semantic-mediawiki-conference-fall-2016" target="_blank" rel="noopener">SMWCon 2016 twitter
storyboard</a>

for a nice overview of the event.</li>
</ul>
<h2 id="the-talk---on-rdf-import-in-semantic-mediawiki">The Talk - on RDF Import in Semantic MediaWiki</h2>
<p>My talk this year was titled &ldquo;Batch import of large RDF datasets using
RDFIO or the new rdf2smw tool&rdquo;. More info can be found on the
<a href="https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2016/Batch_import_of_large_RDF_datasets_using_RDFIO_or_the_new_rdf2smw_tool" target="_blank" rel="noopener">talk page</a>
</p></description>
</item>
<item>
<title>Tutorial: Luigi for Scientific Workflows</title>
<link>https://livesys.se/posts/luigi-tutorial/</link>
<pubDate>Tue, 21 Jun 2016 13:49:00 +0200</pubDate>
<guid>https://livesys.se/posts/luigi-tutorial/</guid>
<description><p><em>This is a Luigi tutorial I held at the <a href="http://uppnex.se/events/eInfraMPS2015/" target="_blank" rel="noopener">e-Infrastructures for Massively
parallel sequencing</a>
 workshop
(<a href="https://www.youtube.com/channel/UCfLDx5VYn25QIZLmtybvdeQ/videos" target="_blank" rel="noopener">Video
archive</a>
)
at <a href="http://scilifelab.se/" target="_blank" rel="noopener">SciLifeLab</a>
 Uppsala in January 2015, moved
here for future reference.</em></p>
<h2 id="luigi_screenshotpng"><p class="image">
 <img src="luigi_screenshot.png" alt="" />
</p>
</h2>
<h2 id="what-is-luigi">What is Luigi?</h2>
<p><a href="https://github.com/spotify/luigi" target="_blank" rel="noopener">Luigi</a>
 is a batch workflow system
written in Python and developed by <a href="https://erikbern.com/" target="_blank" rel="noopener">Erik
Bernhardsson</a>
 and others at
<a href="http://spotify.com/" target="_blank" rel="noopener">Spotify</a>
, where it is used to compute
machine-learning powered music recommendation lists, top lists etc.</p>
<p>Luigi is one of the not-too-many batch workflow systems that support
running both normal command line jobs and
<a href="http://hadoop.apache.org/" target="_blank" rel="noopener">Hadoop</a>
 jobs in the same framework (in this tutorial,
we will focus only on the command line part).</p></description>
</item>
<item>
<title>Combining the best of Go, D and Rust?</title>
<link>https://livesys.se/posts/the-best-of-go-d-and-rust/</link>
<pubDate>Sat, 11 Jun 2016 14:24:00 +0200</pubDate>
<guid>https://livesys.se/posts/the-best-of-go-d-and-rust/</guid>
<description><!-- raw HTML omitted -->
<p>I&rsquo;ve been following the development of <a href="http://dlang.org" target="_blank" rel="noopener">D</a>
,
<a href="http://golang.org" target="_blank" rel="noopener">Go</a>
 and <a href="https://www.rust-lang.org/" target="_blank" rel="noopener">Rust</a>
 (and also
<a href="http://www.freepascal.org/" target="_blank" rel="noopener">FreePascal</a>
 for <a href="https://github.com/NBISweden/mdc-file-export" target="_blank" rel="noopener">some use
cases</a>
) for some years
(<a href="http://saml.rilspace.org/moar-languagez-gc-content-in-python-d-fpc-c-and-c" target="_blank" rel="noopener">been into some benchmarking for bioinfo
tasks</a>
),
and now we finally have three (four, with fpc) stable statically
compiled languages with some momentum behind them, meaning they all are
past 1.0.</p>
<p>While I have gone with Go for <a href="http://scipipe.org" target="_blank" rel="noopener">current projects</a>
, I
still have a hard time &ldquo;totally falling in love&rdquo; with any one of
these languages. They all fulfill different subsets of my wishlist for
an optimal compiled data munging language.</p></description>
</item>
<item>
<title>Time-boxing and a unified trello board = productivity</title>
<link>https://livesys.se/posts/time-boxing-and-unified-trello-board/</link>
<pubDate>Fri, 26 Feb 2016 12:40:00 +0100</pubDate>
<guid>https://livesys.se/posts/time-boxing-and-unified-trello-board/</guid>
<description><p><p class="image">
 <img src="selection_039.png" alt="" />
</p>
</p>
<ul>
<li><strong>Figure:</strong> Sketchy screenshot of how my current board looks. Notice
especially the &ldquo;Now&rdquo; stack, marked in yellow, where you are only
allowed to put <strong>one single</strong> card.</li>
</ul>
<p>I used to have a very hard time getting an overview of my current work,
and prioritizing and concentrating on any single task for very long. I
always felt there might be something else more important
than what I was currently doing. And in fact, how would I know if I
didn&rsquo;t have the overview?</p></description>
</item>
<item>
<title>The unexpected convenience of JSON on the commandline</title>
<link>https://livesys.se/posts/the-unexpected-usefullness-of-json-on-the-commandline/</link>
<pubDate>Tue, 08 Dec 2015 07:10:00 +0100</pubDate>
<guid>https://livesys.se/posts/the-unexpected-usefullness-of-json-on-the-commandline/</guid>
<description><p>I was working with a migration from <a href="http://drupal.org" target="_blank" rel="noopener">drupal</a>
 to
<a href="http://processwire.com" target="_blank" rel="noopener">processwire</a>
 CMSes, where I wanted to be able
to pipe data, including the body field with HTML formatting and all,
through multiple processing steps in a flexible manner. I&rsquo;d start with
an extraction SQL query, go through a few components to replace and massage
the data, and finally hand over to an import command using processwire&rsquo;s
<a href="http://wireshell.pw" target="_blank" rel="noopener">wireshell tool</a>
. So, basically I needed a flexible
format for structured data that could be sent as one &ldquo;data object&rdquo; per
line, to work nicely with linux commandline tools like grep, sed and
awk.</p></description>
</item>
<item>
<title>The matrix transformation as a model for declarative atomic data flow operations</title>
<link>https://livesys.se/posts/matrix-transformation-as-model-for-data-flow-operations/</link>
<pubDate>Mon, 09 Nov 2015 19:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/matrix-transformation-as-model-for-data-flow-operations/</guid>
<description><p><em>After just <a href="https://news.ycombinator.com/item?id=10532957" target="_blank" rel="noopener">reading on Hacker
News</a>
 about Google&rsquo;s
newly released <a href="http://tensorflow.org/" target="_blank" rel="noopener">TensorFlow library</a>
, for deep
learning based on tensors and data flow, I realized I wrote in a draft
post back in 2013 that:</em></p>
<blockquote>
<p><em><strong>&ldquo;What if one could have a fully declarative &ldquo;matrix language&rdquo; in
which all data transformations ever needed could be declaratively
defined in a way that is very easy to comprehend?&rdquo;</strong></em></p>
</blockquote>
<p><em>&hellip; so I thought this was a good time to post this draft, to see
whether it spurs any further ideas. The following is an almost &ldquo;as is&rdquo;
copy-and-paste of that document from December 2013:</em></p></description>
</item>
<item>
<title>Wanted: Dynamic workflow scheduling</title>
<link>https://livesys.se/posts/dynamic-workflow-scheduling/</link>
<pubDate>Mon, 26 Oct 2015 21:23:00 +0100</pubDate>
<guid>https://livesys.se/posts/dynamic-workflow-scheduling/</guid>
<description><p><p class="image">
 <img src="scheduling_unsplash-1.jpg" alt="" />
</p>
</p>
<p><em>Photo credits: <a href="https://unsplash.com/whale" target="_blank" rel="noopener">Matthew Smith</a>
 /
<a href="https://unsplash.com/photos/OiiThC8Wf68" target="_blank" rel="noopener">Unsplash</a>
</em></p>
<p>In <a href="https://jcheminf.springeropen.com/articles/10.1186/s13321-016-0179-6" target="_blank" rel="noopener">our work on automating machine learning computations in
cheminformatics with scientific workflow
tools</a>
,
we have come to realize something: <em>Dynamic scheduling in scientific
workflow tools is very important and sometimes badly needed.</em></p>
<p>What I mean is that <em>it should be possible to schedule new tasks during the
execution of a workflow, not just in its scheduling phase.</em></p>
<p>What is striking is that far from all workflow tools allow this. Many
tools completely separate the execution into two stages:</p></description>
</item>
<item>
<title>How to be productive in vim in 30 minutes</title>
<link>https://livesys.se/posts/how-to-be-productive-in-vim-in-30-minutes/</link>
<pubDate>Tue, 15 Sep 2015 12:21:00 +0200</pubDate>
<guid>https://livesys.se/posts/how-to-be-productive-in-vim-in-30-minutes/</guid>
<description><p><p class="image">
 <img src="selection_333-1.png" alt="" />
</p>
</p>
<p>I had heard a lot of people say vim is very hard to learn, and got the
impression that it would take a great investment to switch to using it.</p>
<p>While I have come to understand that they are right in that there are a
lot of things to invest in to get really great at using vim, an investment
that really pays back, I have also found one thing that I see almost
no-one mentioning:</p></description>
</item>
<item>
<title>How to compile vim for use with pyenv and vim-pyenv</title>
<link>https://livesys.se/posts/how-to-compile-vim-for-use-with-pyenv-and-vim-pyenv/</link>
<pubDate>Thu, 20 Aug 2015 08:05:00 +0200</pubDate>
<guid>https://livesys.se/posts/how-to-compile-vim-for-use-with-pyenv-and-vim-pyenv/</guid>
<description><!-- raw HTML omitted -->
<p>This manifested itself in a bunch of error messages from the python
module in vim, ending with:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#000">AttributeError</span>: <span style="color:#c41a16">&#39;module&#39;</span> <span style="color:#a90d91">object</span> <span style="color:#000">has</span> <span style="color:#000">no</span> <span style="color:#000">attribute</span> <span style="color:#c41a16">&#39;vars&#39;</span>
</span></span></code></pre></div><p>I first thought it was an error in vim-pyenv and <a href="https://github.com/lambdalisue/vim-pyenv/issues/14" target="_blank" rel="noopener">reported
it</a>
 (see that issue
for more in-depth details). In summary, it turns out that older versions
of vim indeed lack some attributes in their python module, so I figured I
had to compile my own version. Below are my notes on how to do
this, for future reference:</p></description>
</item>
<item>
<title>How I would like to write Go programs</title>
<link>https://livesys.se/posts/how-i-would-like-to-write-golang/</link>
<pubDate>Sat, 18 Jul 2015 02:34:00 +0200</pubDate>
<guid>https://livesys.se/posts/how-i-would-like-to-write-golang/</guid>
<description><p><p class="image">
 <img src="selection_301.png" alt="" />
</p>
</p>
<p>Some time ago I got a <a href="http://blog.gopheracademy.com/composable-pipelines-pattern" target="_blank" rel="noopener">post published on
GopherAcademy</a>
,
outlining in detail how I think a <a href="http://www.jpaulmorrison.com/fbp" target="_blank" rel="noopener">flow-based
programming</a>
 inspired syntax can
strongly help to create clearer, easier-to-maintain, and more
declarative Go programs.</p>
<p>These ideas have since become clearer, and we (<a href="http://twitter.com/ola_spjuth" target="_blank" rel="noopener">Ola
Spjuth</a>
&rsquo;s <a href="http://www.farmbio.uu.se/research/researchgroups/pb/" target="_blank" rel="noopener">research group at
pharmbio</a>
) have
successfully used them to make the syntax for
<a href="https://github.com/spotify/luigi" target="_blank" rel="noopener">Luigi</a>
 (Spotify&rsquo;s great workflow
engine by <a href="http://twitter.com/fulhack" target="_blank" rel="noopener">Erik Bernhardsson</a>
 &amp; co)
workflows easier, as implemented in the <a href="https://github.com/samuell/sciluigi#readme" target="_blank" rel="noopener">SciLuigi helper
library</a>
.</p></description>
</item>
<item>
<title>Terminator as a middle-way between floating and tiling window managers</title>
<link>https://livesys.se/posts/terminator-middle-way/</link>
<pubDate>Fri, 17 Jul 2015 19:22:00 +0200</pubDate>
<guid>https://livesys.se/posts/terminator-middle-way/</guid>
<description><p>I have tried hard to improve my linux desktop productivity by learning
to do as much as possible using keyboard shortcuts, aliases for terminal
commands etc etc (I even produced an <a href="https://www.udemy.com/command-line-productivity/?couponCode=BlogR33ders" target="_blank" rel="noopener">online course on linux commandline
productivity</a>
).</p>
<p>In this spirit, I naturally tried out a so called <a href="https://www.udemy.com/command-line-productivity" target="_blank" rel="noopener">tiling window
manager</a>
 (aka tiling
wm). In short, a tiling wm organizes all open windows on the screen (or
on the current desktop) into a &ldquo;tiled&rdquo; grid of frames. You can then
control how these frames are created, resized, as well as switch focus
between the frames, all using keyboard shortcuts. This allows you to
avoid leaving the keyboard for moving windows around or resizing them,
before starting your work in a new program.</p></description>
</item>
<item>
<title>FBP inspired data flow syntax: The missing piece for the success of functional programming?</title>
<link>https://livesys.se/posts/fbp-data-flow-syntax/</link>
<pubDate>Thu, 16 Jul 2015 17:13:00 +0200</pubDate>
<guid>https://livesys.se/posts/fbp-data-flow-syntax/</guid>
<description><p><p class="image">
 <img src="selection_288.png" alt="" />
</p>
</p>
<p>Often when I suggest people have a look at <a href="http://www.jpaulmorrison.com/fbp/" target="_blank" rel="noopener">Flow-based
Programming</a>
 (FBP) or <a href="https://en.wikipedia.org/wiki/Dataflow" target="_blank" rel="noopener">Data
Flow</a>
 for one reason or another,
they are often put off by the strong connection between these concepts
and graphical programming. That is, the idea that programs will be
easier to understand if expressed and developed in a visual notation.</p>
<p>This is unfortunate, since I think graphical programming is in no way the core benefit of
FBP or Data Flow, although it is a nice side-effect for those who prefer
it. For example, I personally mostly prefer working with text over a
graphical notation, for productivity reasons.</p></description>
</item>
<item>
<title>A few thoughts on organizing computational (biology) projects</title>
<link>https://livesys.se/posts/organizing-compbio-projects/</link>
<pubDate>Tue, 23 Jun 2015 20:32:00 +0200</pubDate>
<guid>https://livesys.se/posts/organizing-compbio-projects/</guid>
<description><p><p class="image">
 <img src="-bin-bash_251-1.png" alt="Screenshot of paper and a directory structure in a terminal" />
</p>
</p>
<p>I read this <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424" target="_blank" rel="noopener">excellent
article</a>

with practical recommendations on how to organize a computational project, in
terms of directory structure.</p>
<h2 id="directory-structure-matters">Directory structure matters</h2>
<p>The importance of a good directory structure seems to often be
overlooked in teaching about computational biology, but it can be the
difference between a successful project and one where every change or
re-run of some part of a workflow requires days of manual fiddling
to get hold of the right data, in the right format, in the right place,
with the right version of the workflow, with the right parameters, and
then succeeding to run it without errors.</p></description>
</item>
<item>
<title>Flow-based programming and Erlang style message passing - A Biology-inspired idea of how they fit together</title>
<link>https://livesys.se/posts/flowbased-vs-erlang-message-passing/</link>
<pubDate>Sat, 13 Jun 2015 14:25:00 +0200</pubDate>
<guid>https://livesys.se/posts/flowbased-vs-erlang-message-passing/</guid>
<description><p><strong>I think Erlang/Elixir fits great as control plane or
service-to-service messaging layer for distributing services built with
flow-based programming</strong></p>
<p><p class="image">
 <img src="erlang_logo.png" alt="Erlang logo" class="align_right" />
</p>
Just
back from a one day visit to <a href="http://www.erlang-factory.com/euc2015" target="_blank" rel="noopener">Erlang User
Conference</a>
. I find the
<a href="http://www.erlang.org/" target="_blank" rel="noopener">Erlang</a>
 virtual machine fascinating. And with
the new <a href="http://elixir-lang.org/" target="_blank" rel="noopener">Elixir language</a>
 built on top of it to
fix some of the pain points with <em>Erlang the language</em>, the eco-system
has got even more interesting.</p>
<p>What I find exciting about Erlang/Elixir and its virtual machine is its
ability to utilize multiple CPUs on a computer, and to do this across
multiple computers, in what is commonly referred to as &ldquo;distributed
computing&rdquo;. <p class="image">
 <img src="fbp_general_interactive_application.png" alt="Flow-based programming example. Image from
http://jpaulmorrison.com/fbp/examples.html" class="align_right" />
</p></description>
</item>
<item>
<title>A cheatsheet for the iRODS rule language</title>
<link>https://livesys.se/posts/irods-rulelang-cheatsheet/</link>
<pubDate>Thu, 11 Jun 2015 01:55:00 +0200</pubDate>
<guid>https://livesys.se/posts/irods-rulelang-cheatsheet/</guid>
<description><p><p class="image">
 <img src="irods-logo.png" alt="" class="align_right" />
</p>
<a href="http://irods.org/" target="_blank" rel="noopener">iRODS</a>
,
the &ldquo;integrated rule oriented data system&rdquo; is a super cool system for
managing datasets consisting of files, from smallish ones, to really
large ones counted in petabytes, and possibly spanning multiple
continents.</p>
<p>There&rsquo;s a lot to be said about iRODS (up for another blog post), but the
most interesting feature, in my opinion, is the <em>rule language</em>,
which allows you to define custom rules and policies for how data should be
handled, totally automatically, depending on a lot of factors. For
example &ldquo;if data is untouched for three months, transparently migrate
it from fast disk storage to archival tape storage&rdquo;, etc etc.</p></description>
</item>
<item>
<title>Workflow tool makers: Allow defining data flow, not just task dependencies</title>
<link>https://livesys.se/posts/workflows-dataflow-not-task-deps/</link>
<pubDate>Wed, 10 Jun 2015 12:03:00 +0200</pubDate>
<guid>https://livesys.se/posts/workflows-dataflow-not-task-deps/</guid>
<description><h3 id="upsurge-in-workflow-tools">Upsurge in workflow tools</h3>
<p><p class="image">
 <img src="selection_201.png" alt="Workflow tool
logos" class="align_right" />
</p>
There
seems to have been a little upsurge in light-weight, often python-based,
workflow tools for data pipelines in the last couple of years:
Spotify&rsquo;s <a href="https://github.com/spotify/luigi" target="_blank" rel="noopener">Luigi</a>
, OpenStack&rsquo;s
<a href="https://wiki.openstack.org/wiki/Mistral" target="_blank" rel="noopener">Mistral</a>
, Pinterest&rsquo;s
<a href="https://github.com/pinterest/pinball" target="_blank" rel="noopener">Pinball</a>
, and recently AirBnb&rsquo;s
<a href="https://github.com/airbnb/airflow" target="_blank" rel="noopener">Airflow</a>
, to name a few. These are
all interesting tools, and it is an interesting trend for us at
<a href="http://www.farmbio.uu.se/research/researchgroups/pb/?languageId=1" target="_blank" rel="noopener">pharmbio</a>
,
where we try to see how we can use workflow tools to automate bio- and
cheminformatics tasks on compute clusters.</p></description>
</item>
<item>
<title>Patterns for composable concurrent pipelines in Go</title>
<link>https://livesys.se/posts/patterns-for-composable-concurrent-pipelines-in-go/</link>
<pubDate>Mon, 01 Jun 2015 14:54:00 +0200</pubDate>
<guid>https://livesys.se/posts/patterns-for-composable-concurrent-pipelines-in-go/</guid>
<description><p>I realize I didn&rsquo;t have a link to my blog on <a href="http://gopheracademy.com" target="_blank" rel="noopener">Gopher
Academy</a>
, on patterns for composable
concurrent pipelines in Go(lang), so here goes:</p>
<ul>
<li><a href="http://blog.gopheracademy.com/composable-pipelines-pattern" target="_blank" rel="noopener">blog.gopheracademy.com/composable-pipelines-pattern</a>
</li>
</ul>
<p><p class="image">
 <img src="selection_212.png" alt="Gopher academy screenshot" />
</p>
</p></description>
</item>
<item>
<title>The role of simplicity in testing and automation</title>
<link>https://livesys.se/posts/the-role-of-simplicity-in-testing-and-automation/</link>
<pubDate>Mon, 23 Mar 2015 20:46:00 +0100</pubDate>
<guid>https://livesys.se/posts/the-role-of-simplicity-in-testing-and-automation/</guid>
<description><p>Disclaimer: Don&rsquo;t take this too seriously &hellip; this is
&ldquo;thinking-in-progress&rdquo; :)</p>
<p>It just struck me the other minute how simplicity is the key theme
behind two very important areas in software development that I&rsquo;ve been
dabbling in quite a bit recently: testing and automation.</p>
<p>Have you thought about how testing, in its essence, is: <em>Wrapping
<strong>complex</strong> code, which you can&rsquo;t mentally comprehend completely, in
<strong>simple</strong> code, that you can mentally comprehend, at least one test at
a time.</em> Because, after all, if you can&rsquo;t easily comprehend your test
code, so as to make sure it is correct simply by looking at it, you will
have to <em>create even simpler tests that test your tests</em>!</p></description>
</item>
<item>
<title>The problem with make for scientific workflows</title>
<link>https://livesys.se/posts/the-problem-with-make-for-scientific-workflows/</link>
<pubDate>Sat, 14 Mar 2015 20:46:00 +0100</pubDate>
<guid>https://livesys.se/posts/the-problem-with-make-for-scientific-workflows/</guid>
<description><h2 id="selection_131png"><p class="image">
 <img src="selection_131.png" alt="" />
</p>
</h2>
<h2 id="the-workflow-problem-solved-once-and-for-all-in-1979">The workflow problem solved once and for all in 1979?</h2>
<p>As soon as the topic of scientific workflows is brought up, there are
always a few make fans fervently insisting that the problem of workflows
is solved once and for all with <a href="http://www.gnu.org/software/make/" target="_blank" rel="noopener">GNU
make</a>
, first written in the 70&rsquo;s :)</p>
<p>Personally I haven&rsquo;t been so sure. On the one hand, I know the tool
solves a lot of problems for many people. Also, there is something very
attractive about building on a tool that you can be sure will be
available on more or less every unix-like operating system, for decades
to come.</p></description>
</item>
<item>
<title>Dynamic Navigation for Higher Performance</title>
<link>https://livesys.se/posts/dynamic-navigation-for-higher-performance/</link>
<pubDate>Wed, 11 Mar 2015 20:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/dynamic-navigation-for-higher-performance/</guid>
<description><h3 id="improving-performance-in-delphi-bold-mda-applications-by-replacing-navigation-code-with-derived-links-in-the-model"><strong>Improving performance in Delphi Bold MDA applications by replacing navigation code with derived links in the model</strong></h3>
<p><em>This post on <a href="http://en.wikipedia.org/wiki/Model-driven_architecture" target="_blank" rel="noopener">Model Driven
Architecture</a>
 in
<a href="http://en.wikipedia.org/wiki/Delphi_%28programming_language%29" target="_blank" rel="noopener">Delphi</a>

and <a href="http://en.wikipedia.org/wiki/Bold_for_Delphi" target="_blank" rel="noopener">Bold</a>
, by <a href="https://twitter.com/rolflampa" target="_blank" rel="noopener">Rolf
Lampa</a>
, has been previously <a href="http://www.howtodothings.com/computers/a1043-dynamic-navigation-for-higher-performance.html" target="_blank" rel="noopener">published on
howtodothings.com</a>
.</em></p>
<p>Modeling class structures takes some thinking, and once you are done with
the thinking and the drawing and start using the model,
you&rsquo;ll spend awful lots of code traversing links in order to
retrieve trivial info in a given object structure. Navigating the same,
sometimes complicated, link-paths over and over again consumes CPU power,
and it also causes much redundant code, with expressions accessing the same
navigation paths over and over again. In the Bold For Delphi
Architecture you would also place redundant subscriptions from many
different locations to the same target, subscribing to the same paths
over and over again.</p></description>
</item>
<item>
<title>NGS Bioinformatics Course Day 3: New Luigi helper tool, "real-world" NGS pipelines</title>
<link>https://livesys.se/posts/ngs-bioinformatics-intro-course-day-3/</link>
<pubDate>Tue, 03 Mar 2015 20:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/ngs-bioinformatics-intro-course-day-3/</guid>
<description><p><p class="image">
 <img src="ngsintro-coding.jpg" alt="" />
</p>
</p>
<p>It turned out I didn&rsquo;t have the time and strength to blog every day at
the NGS Bioinformatics Intro course, so here comes a wrap-up with some
random notes and tidbits from the last days, including some concluding
remarks!</p>
<p>These days we started working on a more realistic NGS pipeline, on
analysing re-sequencing samples
(<a href="http://uppnex.se/twiki/pub/Courses/NgsIntro1502/Schedule/NGS_course_AJ_20150211.pdf" target="_blank" rel="noopener">slides</a>
,
<a href="http://uppnex.se/twiki/do/view/Courses/NgsIntro1502/ResequencingAnalysis" target="_blank" rel="noopener">tutorial</a>
).</p>
<h2 id="first-some-outcome-from-this-tutorial">First some outcome from this tutorial</h2>
<p>What do I mean by &ldquo;outcome&rdquo;? Well, as I tried to manually copy and
paste the <a href="http://uppnex.se/twiki/do/view/Courses/NgsIntro1502/ResequencingAnalysis.html" target="_blank" rel="noopener">bag of hairy nasty long bash commandline strings in the
tutorial
pages</a>
,
which all depended upon each other, I got so frustrated that I decided to
try to encode them in a workflow language / tool.</p></description>
</item>
<item>
<title>Random links from the Hadoop NGS Workshop</title>
<link>https://livesys.se/posts/hadoop-ngs-workshop/</link>
<pubDate>Thu, 19 Feb 2015 20:44:00 +0100</pubDate>
<guid>https://livesys.se/posts/hadoop-ngs-workshop/</guid>
<description><p>Some random links from the <a href="https://biobankcloud.com/?q=ngs-workshop" target="_blank" rel="noopener">Hadoop for Next-Gen Sequencing
workshop</a>
 held at KTH in Kista,
Stockholm in February 2015.</p>
<ul>
<li>
<p><strong>UPDATE: <a href="http://www.biobankcloud.eu/?q=ngs-workshop-videos-slides" target="_blank" rel="noopener">Slides and Videos now
available</a>
!</strong></p>
</li>
<li>
<p><a href="https://github.com/andypetrella/spark-notebook" target="_blank" rel="noopener">Spark notebook</a>
</p>
<ul>
<li><a href="https://github.com/Bridgewater/scala-notebook" target="_blank" rel="noopener">Scala notebook</a>
</li>
</ul>
</li>
<li>
<p><a href="https://amplab.cs.berkeley.edu/publication/adam-genomics-formats-and-processing-patterns-for-cloud-scale-computing/" target="_blank" rel="noopener">ADAM</a>
</p>
<ul>
<li>By <a href="http://bdgenomics.org/" target="_blank" rel="noopener">Big Data Genomics</a>
</li>
</ul>
</li>
<li>
<p><a href="https://twitter.com/fnothaft/status/568413764376256512" target="_blank" rel="noopener">Tweet by Frank Nothaft on common workflow
def</a>
</p>
<ul>
<li>Part of <a href="http://genomicsandhealth.org/our-work/working-groups/data-working-group" target="_blank" rel="noopener">Global Alliance for
&hellip;</a>
</li>
<li>Another link is <a href="http://ga4gh.org/" target="_blank" rel="noopener">ga4gh.org</a>
</li>
</ul>
</li>
<li>
<p><a href="http://tachyon-project.org/" target="_blank" rel="noopener">Tachyon</a>
 in-memory file system</p>
</li>
<li>
<p><a href="https://github.com/samuell/cuneiform" target="_blank" rel="noopener">Cuneiform</a>
</p>
<ul>
<li>Does support multiple outputs etc</li>
<li>Black-box vs. White-box</li>
<li>Workflow dependency graph can be dynamically built up while
you&rsquo;re running</li>
<li>Can specify tasks in any scripting language, or in cuneiform
itself</li>
</ul>
</li>
<li>
<p><a href="https://github.com/samuell/Hi-WAY" target="_blank" rel="noopener">Hi-Way</a>
</p></description>
</item>
<item>
<title>Links: Our experiences using Spotify's Luigi for Bioinformatics Workflows</title>
<link>https://livesys.se/posts/our-experiences-using-spotifys-luigi-for-bioinformatics-workflows/</link>
<pubDate>Thu, 12 Feb 2015 20:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/our-experiences-using-spotifys-luigi-for-bioinformatics-workflows/</guid>
<description><p><p class="image">
 <img src="selection_047_luigi.png" alt="Luigi Screenshot" />
</p>
</p>
<p><em>Fig 1: A screenshot of Luigi&rsquo;s web UI, of a real-world (although
rather simple) workflow implemented in Luigi.</em></p>
<p><em><strong>Update May 5, 2016:</strong> Most of the below material is more or less
outdated. Our latest work has resulted in the <a href="https://github.com/pharmbio/sciluigi" target="_blank" rel="noopener">SciLuigi helper
library</a>
, which we have used in
production and will be the focus of further development.</em></p>
<p>In the <a href="http://bioclipse.net/" target="_blank" rel="noopener">Bioclipse</a>
 / <a href="http://www.farmbio.uu.se/forskning/researchgroups/pb/" target="_blank" rel="noopener">Pharmaceutical
Bioinformatics
group</a>
 at Dept of
Pharm. Biosciences at UU, we are quite heavy users of Spotify&rsquo;s <a href="https://github.com/spotify/luigi" target="_blank" rel="noopener">Luigi
workflow library</a>
, to automate
workflows, mainly doing Machine Learning heavy lifting.</p></description>
</item>
<item>
<title>NGS Bioinformatics Intro Course Day 2</title>
<link>https://livesys.se/posts/ngs-bioinformatics-intro-course-day-2/</link>
<pubDate>Tue, 10 Feb 2015 20:44:00 +0100</pubDate>
<guid>https://livesys.se/posts/ngs-bioinformatics-intro-course-day-2/</guid>
<description><p>Today was the second day of the <a href="http://uppnex.se/twiki/do/view/Courses/NgsIntro1502/" target="_blank" rel="noopener">introductory course in NGS
bioinformatics</a>

that I&rsquo;m taking as part of my PhD studies.</p>
<p><p class="image">
 <img src="20150210_132439.jpg" alt="" />
</p>
</p>
<p>For me it started with a substantial oversleep, probably due to a
combination of an annoying cold and the ~2 hour commute from south
Stockholm to Uppsala and <a href="http://www.bmc.uu.se/" target="_blank" rel="noopener">BMC</a>
. Thus I missed
some really interesting
<a href="http://uppnex.se/twiki/pub/Courses/NgsIntro1502/Schedule/dahlo-filetypes.pdf" target="_blank" rel="noopener">material</a>

(and
<a href="http://uppnex.se/twiki/do/view/Courses/NgsIntro1502/FileTypes" target="_blank" rel="noopener">tutorial</a>
)
on file types in NGS analysis, but will make sure to go through that in
my free time during the week.</p></description>
</item>
<item>
<title>NGS Bioinformatics Intro Course Day 1</title>
<link>https://livesys.se/posts/ngs-intro-course-day-1/</link>
<pubDate>Mon, 09 Feb 2015 20:44:00 +0100</pubDate>
<guid>https://livesys.se/posts/ngs-intro-course-day-1/</guid>
<description><p>Just finished day 1 of the <a href="samuel.lampa.co/posts/introductory-course-in-bioinformatics-for-ngs-data/">introductory course on Bioinformatics for
Next generation sequencing
data</a>

at Scilifelab Uppsala. Attaching a photo from one of the hands-on
tutorial sessions, with the tutorial leaders standing to the right.</p>
<p><a href="ngsintro.jpg"><p class="image">
 <img src="ngsintro.jpg" alt="" />
</p>
</a>
</p>
<p>Today&rsquo;s content was mostly introductions to the linux commandline in
general, and the <a href="http://www.uppmax.uu.se" target="_blank" rel="noopener">UPPMAX HPC environment</a>
 in
particular, an area I&rsquo;m already very familiar with, after two years as
a sysadmin at UPPMAX. Thus, today I mostly got to help out the other
students a bit.</p></description>
</item>
<item>
<title>Taking a one week introductory course in Bioinformatics for NGS data</title>
<link>https://livesys.se/posts/introductory-course-in-bioinformatics-for-ngs-data/</link>
<pubDate>Mon, 09 Feb 2015 20:44:00 +0100</pubDate>
<guid>https://livesys.se/posts/introductory-course-in-bioinformatics-for-ngs-data/</guid>
<description><!-- raw HTML omitted -->
<p>Right now I&rsquo;m sitting on the train and trying to get my head around
some of the <a href="http://uppnex.se/twiki/do/view/Courses/NgsIntro1502/PrecourseMaterial" target="_blank" rel="noopener">pre-course
materials</a>
.</p></description>
</item>
<item>
<title>RDFIO VM</title>
<link>https://livesys.se/posts/rdfio-vm/</link>
<pubDate>Tue, 13 Jan 2015 20:43:00 +0100</pubDate>
<guid>https://livesys.se/posts/rdfio-vm/</guid>
<description><!-- raw HTML omitted -->
<h2 id="the-old-virtual-machine-still-available">The old Virtual Machine still available</h2>
<ul>
<li>The old virtual machine from June 25, 2014, based on Ubuntu 14.04,
and RDFIO 2.x can be found
<a href="https://www.dropbox.com/s/0go5395v53gb9gq/rdfio_xubuntu_14.04.ova?dl=0" target="_blank" rel="noopener">here</a>
</li>
</ul></description>
</item>
<item>
<title>The smallest pipeable go program</title>
<link>https://livesys.se/posts/smallest-pipeable-go-program/</link>
<pubDate>Thu, 18 Dec 2014 20:43:00 +0100</pubDate>
<guid>https://livesys.se/posts/smallest-pipeable-go-program/</guid>
<description><p>Edit: My originally suggested way, further down in this post, is by no means the
&ldquo;smallest pipeable&rdquo; program; instead, see this example (Credits: <a href="https://plus.google.com/&#43;AxelWagner_Merovius" target="_blank" rel="noopener">Axel Wagner</a>
):</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#a90d91">package</span> <span style="color:#000">main</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a90d91">import</span> (
</span></span><span style="display:flex;"><span> <span style="color:#c41a16">&#34;io&#34;</span>
</span></span><span style="display:flex;"><span> <span style="color:#c41a16">&#34;os&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a90d91">func</span> <span style="color:#000">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#000">io</span>.<span style="color:#000">Copy</span>(<span style="color:#000">os</span>.<span style="color:#000">Stdout</span>, <span style="color:#000">os</span>.<span style="color:#000">Stdin</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>&hellip; or (credits: <a href="https://twitter.com/rogpeppe" target="_blank" rel="noopener">Roger Peppe</a>
):</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-go" data-lang="go"><span style="display:flex;"><span><span style="color:#a90d91">package</span> <span style="color:#000">main</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a90d91">import</span> (
</span></span><span style="display:flex;"><span> <span style="color:#c41a16">&#34;bufio&#34;</span>
</span></span><span style="display:flex;"><span> <span style="color:#c41a16">&#34;fmt&#34;</span>
</span></span><span style="display:flex;"><span> <span style="color:#c41a16">&#34;os&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a90d91">func</span> <span style="color:#000">main</span>() {
</span></span><span style="display:flex;"><span> <span style="color:#a90d91">for</span> <span style="color:#000">scan</span> <span style="color:#000">:=</span> <span style="color:#000">bufio</span>.<span style="color:#000">NewScanner</span>(<span style="color:#000">os</span>.<span style="color:#000">Stdin</span>); <span style="color:#000">scan</span>.<span style="color:#000">Scan</span>(); {
</span></span><span style="display:flex;"><span> <span style="color:#000">fmt</span>.<span style="color:#000">Printf</span>(<span style="color:#c41a16">&#34;%s\n&#34;</span>, <span style="color:#000">scan</span>.<span style="color:#000">Text</span>())
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Ah, I just realized that the &ldquo;smallest pipeable&rdquo;
<a href="http://golang.org" target="_blank" rel="noopener">Go</a>
(lang) program is rather small, if using my
<a href="https://github.com/samuell/glow" target="_blank" rel="noopener">little library of minimalistic streaming components</a>
.
Nothing more than:</p></description>
</item>
<item>
<title>Profiling and creating call graphs for Go programs</title>
<link>https://livesys.se/posts/profiling-and-call-graphs-for-golang/</link>
<pubDate>Thu, 08 Aug 2013 01:13:00 +0200</pubDate>
<guid>https://livesys.se/posts/profiling-and-call-graphs-for-golang/</guid>
<description><p>In trying to get my head around the code of the very interesting
<a href="https://github.com/trustmaster/goflow" target="_blank" rel="noopener">GoFlow</a>
 library (for flow-based
programming in Go), and the accompanying <a href="https://github.com/samuell/blow" target="_blank" rel="noopener">flow-based bioinformatics
library</a>
 I started hacking on, I needed
to get some kind of visualization (like a call graph) &hellip; something
like this:</p>
<p class="image">
 <img src="basecompl_blow_callgraph_1.png" alt="Call graph" />
</p>
<p>(And in the end, that is what I got &hellip; read on &hellip; ) :)</p>
<p>I then found out about the go tool pprof command, for which the Go team
published a <a href="http://blog.golang.org/profiling-go-programs" target="_blank" rel="noopener">blog post</a>
.</p></description>
</item>
<item>
<title>(E)BNF parser for parts of the Galaxy ToolConfig syntax with ANTLR</title>
<link>https://livesys.se/posts/ebnf-parser-for-galaxy-toolconfig-syntax-with-antlr/</link>
<pubDate>Thu, 28 Jul 2011 09:46:00 +0200</pubDate>
<guid>https://livesys.se/posts/ebnf-parser-for-galaxy-toolconfig-syntax-with-antlr/</guid>
<description><p>As
<a href="http://saml.rilspace.com/fims-project-status-update-thinking-about-cli-wrapper-xml-formats" target="_blank" rel="noopener">blogged</a>

earlier, I&rsquo;m currently into parsing the syntax of some definitions for
the parameters and stuff of command line tools. As said in the linked
blog post, I was pondering whether to use the <a href="https://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax" target="_blank" rel="noopener">Galaxy
Toolconfig</a>

format or the <a href="http://www.docbook.org/tdg/en/html/cmdsynopsis.html" target="_blank" rel="noopener">DocBook CmdSynopsis
format</a>
. It turned
out, though, that cmdsynopsis lacks the option to specify a list of
valid choices for a parameter, as is possible in the Galaxy ToolConfig
format (see
<a href="http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax#A.3Coptions.3E_tag_set" target="_blank" rel="noopener">here</a>
),
where it can be used to generate drop-down lists in wizards etc., which
is basically what I want to do &hellip; so now I&rsquo;m going with the Galaxy
format after all.</p></description>
</item>
<item>
<title>Partial Galaxy ToolConfig to DocBook CmdSynopsis conversion with XSLT RegEx</title>
<link>https://livesys.se/posts/galaxy-toolconfig-to-docbook-cmdsynopsis/</link>
<pubDate>Thu, 21 Jul 2011 01:33:00 +0200</pubDate>
<guid>https://livesys.se/posts/galaxy-toolconfig-to-docbook-cmdsynopsis/</guid>
<description><!-- raw HTML omitted -->
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-xml" data-lang="xml"><span style="display:flex;"><span><span style="color:#000">&lt;tool</span> <span style="color:#836c28">id=</span><span style="color:#c41a16">&#34;sam_to_bam&#34;</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;SAM-to-BAM&#34;</span> <span style="color:#836c28">version=</span><span style="color:#c41a16">&#34;1.1.1&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;description&gt;</span>converts SAM format to BAM format<span style="color:#000">&lt;/description&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;requirements&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;requirement</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;package&#34;</span><span style="color:#000">&gt;</span>samtools<span style="color:#000">&lt;/requirement&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/requirements&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;command</span> <span style="color:#836c28">interpreter=</span><span style="color:#c41a16">&#34;python&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> sam_to_bam.py
</span></span><span style="display:flex;"><span> --input1=$source.input1
</span></span><span style="display:flex;"><span> --dbkey=${input1.metadata.dbkey}
</span></span><span style="display:flex;"><span> #if $source.index_source == &#34;history&#34;:
</span></span><span style="display:flex;"><span> --ref_file=$source.ref_file
</span></span><span style="display:flex;"><span> #else
</span></span><span style="display:flex;"><span> --ref_file=&#34;None&#34;
</span></span><span style="display:flex;"><span> #end if
</span></span><span style="display:flex;"><span> --output1=$output1
</span></span><span style="display:flex;"><span> --index_dir=${GALAXY_DATA_INDEX_DIR}
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/command&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;inputs&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;conditional</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;source&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;param</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;index_source&#34;</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;select&#34;</span> <span style="color:#836c28">label=</span><span style="color:#c41a16">&#34;Choose the source for the reference list&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;option</span> <span style="color:#836c28">value=</span><span style="color:#c41a16">&#34;cached&#34;</span><span style="color:#000">&gt;</span>Locally cached<span style="color:#000">&lt;/option&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;option</span> <span style="color:#836c28">value=</span><span style="color:#c41a16">&#34;history&#34;</span><span style="color:#000">&gt;</span>History<span style="color:#000">&lt;/option&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/param&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;when</span> <span style="color:#836c28">value=</span><span style="color:#c41a16">&#34;cached&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;param</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;input1&#34;</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;data&#34;</span> <span style="color:#836c28">format=</span><span style="color:#c41a16">&#34;sam&#34;</span> <span style="color:#836c28">label=</span><span style="color:#c41a16">&#34;SAM File to Convert&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;validator</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;unspecified_build&#34;</span> <span style="color:#000">/&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;validator</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;dataset_metadata_in_file&#34;</span> <span style="color:#836c28">filename=</span><span style="color:#c41a16">&#34;sam_fa_indices.loc&#34;</span> <span style="color:#836c28">metadata_name=</span><span style="color:#c41a16">&#34;dbkey&#34;</span> <span style="color:#836c28">metadata_column=</span><span style="color:#c41a16">&#34;1&#34;</span> <span style="color:#836c28">message=</span><span style="color:#c41a16">&#34;Sequences are not currently available for the specified build.&#34;</span> <span style="color:#836c28">line_startswith=</span><span style="color:#c41a16">&#34;index&#34;</span> <span style="color:#000">/&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/param&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/when&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;when</span> <span style="color:#836c28">value=</span><span style="color:#c41a16">&#34;history&#34;</span><span style="color:#000">&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;param</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;input1&#34;</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;data&#34;</span> <span style="color:#836c28">format=</span><span style="color:#c41a16">&#34;sam&#34;</span> <span style="color:#836c28">label=</span><span style="color:#c41a16">&#34;Convert SAM file&#34;</span> <span style="color:#000">/&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;param</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;ref_file&#34;</span> <span style="color:#836c28">type=</span><span style="color:#c41a16">&#34;data&#34;</span> <span style="color:#836c28">format=</span><span style="color:#c41a16">&#34;fasta&#34;</span> <span style="color:#836c28">label=</span><span style="color:#c41a16">&#34;Using reference file&#34;</span> <span style="color:#000">/&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/when&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/conditional&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/inputs&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;outputs&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;data</span> <span style="color:#836c28">format=</span><span style="color:#c41a16">&#34;bam&#34;</span> <span style="color:#836c28">name=</span><span style="color:#c41a16">&#34;output1&#34;</span> <span style="color:#836c28">label=</span><span style="color:#c41a16">&#34;${tool.name} on ${on_string}: converted BAM&#34;</span> <span style="color:#000">/&gt;</span>
</span></span><span style="display:flex;"><span> <span style="color:#000">&lt;/outputs&gt;</span>
</span></span><span style="display:flex;"><span><span style="color:#000">&lt;/xml&gt;</span>
</span></span></code></pre></div><p>&hellip; you see that in the <strong>command</strong> tag, the actual syntax of the
command is specified in a kind of &ldquo;free text&rdquo; format &hellip; This might
not be exactly what one would think to use XSLT transformations for, but
with the regex functionality in XSLT 2.0 you definitely have
this option too. Helped by <a href="http://www.xml.com/pub/a/2003/06/04/tr.html" target="_blank" rel="noopener">this
article</a>
 on xml.com, I put
together a little XSLT stylesheet for parsing the free-text
content of that command tag (I haven&rsquo;t got to the more detailed config
inside the inputs tag in the Galaxy format, but I might not need to either,
if I&rsquo;m staying with the Galaxy format anyway):</p></description>
</item>
<item>
<title>Answering questions without answers - by wrapping simulations in semantics</title>
<link>https://livesys.se/posts/answering-questions-by-wrapping-simulations-in-semantics/</link>
<pubDate>Wed, 17 Feb 2010 01:45:00 +0100</pubDate>
<guid>https://livesys.se/posts/answering-questions-by-wrapping-simulations-in-semantics/</guid>
<description><!-- raw HTML omitted -->
<p>There are lots of things that can&rsquo;t be answered by a computer from data
alone. Maybe the majority of what we humans perceive as knowledge is
inferred from a combination of data (simple fact statements about
reality) and rules that tell how facts can be combined, allowing
<em>implicit</em> knowledge (knowledge that is not persisted as facts
anywhere, but has to be <em>inferred</em> from other facts and rules) to be made
<em>explicit</em>.</p></description>
</item>
<item>
<title>A Hello World program in SWI-Prolog</title>
<link>https://livesys.se/posts/a-hello-world-prolog-program/</link>
<pubDate>Tue, 22 Sep 2009 17:04:00 +0200</pubDate>
<guid>https://livesys.se/posts/a-hello-world-prolog-program/</guid>
<description><!-- raw HTML omitted -->
<p>Then you can load the program from inside Prolog after you&rsquo;ve started
it.</p>
<p>So, let&rsquo;s start the Prolog interactive shell:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-prolog" data-lang="prolog"><span style="display:flex;"><span><span style="color:#c41a16">prolog</span>
</span></span></code></pre></div><p>Then, in the Prolog shell, load the file test.pl like so:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-prolog" data-lang="prolog"><span style="display:flex;"><span><span style="color:#c41a16">?-</span> [<span style="color:#c41a16">test</span>].
</span></span></code></pre></div><p>Now, if you have some Prolog clauses in the test.pl file, you will be
able to extract that information by querying.</p>
<p>A very simple test program that you could create is:</p></description>
</item>
</channel>
</rss>