musings

An Hourglass representing Research and Technology

The programming we choose to do in our team is like the neck in the hourglass representing life science research and technology:

There are many grains of sand above us. Those represent all the software tools developed in life science research.
There is a large void below us. This represents the need for widely applicable tools in the life sciences.
At the bottom there are also grains of sand. They are well settled. These represent the current technology: commercially available tools and well-serviced open source packages.

How does the sand get from the top to the bottom? Via the neck of development. It is narrow; only a few academic tools make it to the bottom. The flow through the neck is powered by:

push: a few academic groups that have the capability and capacity to make their tools available
pull: a few companies that look far ahead and are able to see and use the potential of an academic tool

In our hourglass of life science tools, new sand is being added at the top all the time. And most of it overflows the beaker after a while. Some tools never deliver what the author thought they would do. Some are made to solve a single problem and rightfully abandoned when that is done. But many tools are published and left as orphans. Only a selection of tools that promise to be useful for a larger audience ever make it to the neck.

In practice, the neck is too narrow. There are many more valuable tools than are taken up. A team like ours can help to make the neck larger by making existing research tools applicable for wider use as a service to life scientists with a clear need (we call it professionalization). But it is sometimes hard to convince funding parties to pay for this. It is also hard to convince researchers to work on making their software better: professionalization does not generate new high-impact papers. We work on convincing the funding parties that it is better to professionalize existing successes than to reinvent them using research money. And we work on convincing the scientists that professionalization of their output will lead to higher citation scores on their existing publications.

Science wants novelty. And the current Dutch funding climate is directed towards applied science, towards innovation in society. Look at the picture, and you can see that these are hard to combine. Innovation starts where novelty ends. The only way to combine them is to include development.

Photo by graymalkn on flickr

Breaking silence?

If books mention “breaking silence” less frequently, does that mean that we have more silence now because it is no longer broken?

Can a clock with Poisson counting statistics replace your watch?

Thinking about Poisson statistics, I was wondering how useful a clock would be that counts the seconds with counting (Poisson) statistics. That is, it has the irregularity of radioactive decay. Some people set their watch a few minutes ahead of the true time to make sure that they are never late, but the problem is that after a while they start counting on those extra minutes. A truly random clock might be unreliable enough that you would always need to stay carefully ahead...
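To get a rough feel for how unreliable such a clock would be, here is a minimal sketch. It assumes the clock ticks as a Poisson process with an average rate of one tick per second. The function names are made up for this illustration. For Poisson counting, the spread (standard deviation) of the count equals the square root of the expected count.

```python
import math
import random


def poisson_clock_reading(true_seconds: float, rng: random.Random) -> int:
    """Count the ticks of a clock that ticks at random, once per second on average."""
    elapsed = 0.0
    ticks = 0
    while True:
        # Waiting time until the next tick is exponentially distributed (mean 1 s).
        elapsed += rng.expovariate(1.0)
        if elapsed > true_seconds:
            return ticks
        ticks += 1


def expected_spread(true_seconds: float) -> float:
    """Standard deviation of the tick count: the square root of its mean."""
    return math.sqrt(true_seconds)


rng = random.Random(42)  # fixed seed, so the run is repeatable
readings = [poisson_clock_reading(3600, rng) for _ in range(200)]
mean_reading = sum(readings) / len(readings)

print("average reading after one true hour:", mean_reading, "ticks")
print("expected spread after one hour:", expected_spread(3600), "seconds")
print("expected spread after one day:", round(expected_spread(86400)), "seconds")
```

Under these assumptions the clock is typically off by about a minute after an hour and by about five minutes after a day. That is roughly the margin by which people set their watch ahead.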

Commodities are not free

Computer infrastructure used in universities is not part of a market, let alone of a "transparent market" in which everyone has a clear view on what alternatives exist and what their relative merits and costs are.

Nobody in a university research group finds it strange to pay for pens and paper.

Nobody in a research group finds it strange to pay for state-of-the-art lab equipment.

But very often computer services have been offered for free. Like water and electricity, they have been absorbed into the general costs of running the university.

This situation is unsustainable in a world in which life-science research becomes driven by big data. And it also becomes unsustainable in a world where large storage and computer infrastructure suitable for routine jobs can be rented commercially.

The sustainable way to the future is to properly budget for data handling and storage. Budgeting for computing needs means people are required to balance cost and value, like with every other aspect of a research project.

Photo: CC-BY-SA-NC on Flickr by John Flinchbaugh

Decision tree for scientific programmers in bioinformatics

This is one of the syndromes we're trying to fight in BioAssist...

Fight or flight reactions to the cost of computing

Some of the computing services at universities are becoming paid services. The first reaction in the science groups is often a fight, because the realistic costs of operating the existing infrastructure are high. And if the fight does not work, there is a flight towards running decentralized infrastructure. This can look cheaper, but maintenance and incident control are rarely accounted for.

We will need good documentation to convince people of the true costs of the alternatives. It is such a waste if the scarce time of good bioinformatics experts is spent on inefficient server management.

Photo: CC-BY-SA, Hollingsworth on flickr.com

Five star rating your own photos

Have you ever wondered how to use the five stars in your photo catalog? I’ve heard people say: there are only two kinds of pictures: pictures you could show to someone, and pictures you wouldn’t show to anyone. Isn’t choosing between zero and one star enough?

Good experience with tri-lingual education

My native language is Dutch. My wife’s is French. We both speak each other’s language fluently. But we originally met in Germany, and have always spoken English together.

It is dirty work but somebody has to do it

Not everyone likes the same kind of work. Recognizing this is a great opportunity for people involved in a group, as long as communication about what everyone likes and dislikes is open.

I once heard a frightening story about an old couple celebrating their 40th wedding anniversary. As part of the celebration they agreed that, for once, they would each tell the other about something they do not like about them. The man begins: "my dearest, I love almost everything you do for me. But if there is one thing I would like to change, it is the way you cut the bread. You always serve me the heels of the loaf, and I would much rather have a normal slice". At this, the woman nearly fainted. She had given her husband the heels of the loaf for 40 years, only because this is the part she likes best herself.

In every enterprise it is important that all jobs get done, including the dirty ones. Sometimes we simply have to do things we do not like. But in a joint effort it is important to communicate about which parts of the work we like and dislike. Maybe, just maybe, a colleague would love to take over a part of the work that you detest.

Is my software any good?

If you are not getting any user feedback for your software, there are two possible reasons.

It is bad. Nobody uses it.
It is good. Everyone is happy.

If this happens to you, think back. Did you ever get any feedback before? How did you react?

Did you listen to your users and fix their problems?
Did you teach your users your way to use your software?

By answering these two questions you can figure out for yourself why you no longer get feedback. If you listened, and the stream of questions stopped, this probably means the users are now happy. If you attempted to correct their usage, most likely nobody uses it any more.

You did remember to include your contact details, didn't you?

Is there a need for a research data management specialism?

Fire damaged chemical lab Hiroshima; from Wikimedia; public domain

Yes!

There is an interesting difference between how risks are often approached in a research lab where a lot of data is handled and in a chemical lab. Many people working with data regularly encounter problems like not being able to locate data quickly or not being able to reproduce results exactly, but they often think these problems are an integral consequence of working with large amounts of data, and do not recognize that these are problems with the data management practice and preparation. The equivalent in a chemical lab would be researchers thinking that daily fires and explosions naturally belong to working with chemical compounds, rather than recognizing these as a consequence of bad lab practice and bad preparation.

There is also resistance to the uptake of a data management specialism because many researchers think that data management is relatively easy. Everybody has a computer at home, and many maintain photo libraries. However, this experience does not directly translate into work with large amounts of data in the lab:

Data in the lab is often 1-3 orders of magnitude larger than a photo library at home. A maintenance job that takes an hour for a photo library would translate into more than 6 months of work in a large data-intensive project. Because of this, there is really a need for different approaches.
Data in a photo library consists of JPG files and maybe RAW files, and these files have simple 1 to 1 relationships. In the lab there are many more different kinds of data, and the relationships are much more complex.
A photo library is usually maintained by a single person. In the lab, the same data is worked on by different people, and they must each be aware of everything that is done by the others.

And in fact, even in a photo library at home one cannot always quickly find what one is looking for.
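The six-month figure in the first point above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the effort is taken to scale linearly with data volume, the upper end of the range (three orders of magnitude) is used, and a working month is taken to be 160 hours. None of these numbers come from a measurement.

```python
HOME_JOB_HOURS = 1             # a maintenance job on the photo library at home
SCALE_FACTOR = 10 ** 3         # assumption: upper end of "1-3 orders of magnitude"
WORKING_HOURS_PER_MONTH = 160  # assumption: 40 hours a week, 4 weeks a month

# Assumption: effort grows linearly with the amount of data.
lab_job_hours = HOME_JOB_HOURS * SCALE_FACTOR
lab_job_months = lab_job_hours / WORKING_HOURS_PER_MONTH

print(lab_job_hours, "hours, or", lab_job_months, "working months")
```

With these assumptions the one-hour job becomes 1000 hours, or a little over six working months for one person.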

It is a dirty job but somebody has got to do it

Not everybody likes the same kind of work. Recognizing this is a great opportunity for people involved in a group, if communication about what everyone likes and dislikes is open.

I once heard a terrifying story of an old couple that celebrated their 40th wedding anniversary. As part of the celebration they agreed that, for once, they would each tell the other about something they dislike about them. The man starts: "my dearest, I love almost everything you do for me. But if there is one thing that I would like to change, it is the way you cut the bread. You always serve me the heels of the bread, and I much prefer a normal center slice". Upon this, the woman nearly fainted. She had served her husband the heels of the bread for 40 years, only because this is the part she likes best.

In any enterprise it is important that all jobs are done, including the dirty ones. Sometimes we just have to do things we do not like. But in a collaborative effort, it is important to communicate about which parts of the work we like and dislike. Maybe, just maybe, a colleague would love to take over part of the work that you despise.

Less than impressive view from Fyra

Those sound-walls help to keep the neighborhood of the high-speed trains livable, but the view from the new train itself is uninteresting. #trip

Let's go build some obsolete tools... and prevent being blamed.

One of the first stages in the development of a new tool (software or hardware) is a functional specification. The functional specification matures in discussions between the developers (department of R&D) and the customer representatives (often the departments of marketing and sales).

Of course a functional specification is useful: it is very hard to develop something new without an idea of how the new tool will be used and what it will be compared with. However, defining a functional specification can also be taken too far. In some organizations, the functional specifications are spelled out in the tiniest details. At the end of a long formal procedure, the book of specifications is signed like a contract between marketing and development. The development can only be started when the list of signatures is complete and will be performed in splendid isolation from the world of potential users. Why do organizations do specifications this way? It is often an attempt to separate the responsibilities of the departments, so that if anything fails the appropriate party can be blamed. If the final product does not meet the functional specifications, this can be blamed on the developers. And if the product does not succeed even though it meets all the functional specifications, this will be blamed on marketing.

I have a serious problem with this approach: using this procedure, how can one ensure that the product will be useful? After two years without interaction, development may produce exactly what marketing asked for (meeting all deadlines and staying within budget), but the market has changed and no longer needs the designed product. Or technology has changed, and better specifications would have been within reach and are offered by the competition. Or maybe marketing truly made a mistake, and asked for something the world is not waiting for. In these cases, clearly the development department cannot be blamed! But if you develop an obsolete product this way, where does that leave the organization as a whole? And if this is not the best solution for the organization as a whole, will it be good for the development department? Even though everyone did exactly what was expected, people may be laid off because development costs cannot be recovered.

The solution is, as often, to keep to a middle road. Using, for example, Agile or Lean development methods, developers can stay in constant communication with marketing. Iterative and modular design procedures can be used to verify that the new tool does what it should, without relying on the capacity of people to describe specifications in words beforehand. And because the communication with the market is not lost during the development process, the tool will have a significantly higher chance of actually being useful at the moment of introduction.

Image (by QuiteLucid on Flickr): "a camel is a racing horse designed by a committee".

My priority is higher than yours....

When more than one task must be completed, it is always better to do one before the other. I discussed that earlier in the post Parallel processing in a project team.

An individual customer with a new question could think that it is even quicker if you fit their project in between. We have to make them aware that if we allow interruptions of any kind in an agile sprint, nothing will ever be completely done. What can we tell them? There will be other people with new requests while we are working on your task... should we honor their requests to interrupt our work too?

They should understand: agile sprints are not interrupted. They are either completed, or canceled.

Image from Flickr by brittgow

Parallel processing in a project team

How important is it to prioritize?

Let's assume we have 4 ideal projects A, B, C, D, and an ideal project team. Each of these projects takes 3 months to complete. They are all equally important. We work on them in parallel. When will the projects be ready? Project A will be ready after 12 months. Project B will be ready after 12 months. Project C will be ready after 12 months. And project D will be ready after 12 months.

Now, what happens when we prioritize and work on the projects in alphabetical order? When will the projects be ready now? Project A will be ready after 3 months. Project B will be ready after 6 months. Project C will be ready after 9 months. And project D will be ready after 12 months. Everyone except the customer of project D will be better off! Fantastic, isn't it?

Unfortunately, this is not the perception of your customers. Each of them sees a 3-month project and wants it finished in 3 months. The customer in project D does not want to hear "we will start on your project in 9 months". All too often, priorities therefore change. Every month, at a project team review, you will be forced to reprioritize. What is the effect? If the team ends up working on a different project each month, project A will be ready after 9 months. Project B will be ready after 10 months. Project C will be ready after 11 months. Project D will be ready after 12 months.

What a waste. Don't fall into this trap. Prioritize, and stick with the choice.
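The three scenarios can be replayed with a small simulation. This is a minimal sketch, not a planning tool: the function names are invented for this illustration, and the monthly reprioritization is modeled as a simple rotation in which the team switches to the next unfinished project at every review.

```python
from typing import Dict, List

PROJECTS = ["A", "B", "C", "D"]
DURATION = 3  # months of full-team work needed per project


def parallel(projects: List[str], duration: int) -> Dict[str, int]:
    """The team splits its time evenly, so every project runs at 1/n speed."""
    return {p: duration * len(projects) for p in projects}


def sequential(projects: List[str], duration: int) -> Dict[str, int]:
    """Finish one project completely before starting the next."""
    finish = {}
    month = 0
    for p in projects:
        month += duration
        finish[p] = month
    return finish


def monthly_reprioritized(projects: List[str], duration: int) -> Dict[str, int]:
    """Assumption: at each monthly review the team rotates to the next project."""
    remaining = {p: duration for p in projects}
    finish = {}
    month = 0
    while remaining:
        for p in projects:
            if p in remaining:
                month += 1          # one month of full-team work on project p
                remaining[p] -= 1
                if remaining[p] == 0:
                    finish[p] = month
                    del remaining[p]
    return finish


for name, plan in [("parallel", parallel),
                   ("sequential", sequential),
                   ("monthly reprioritized", monthly_reprioritized)]:
    finish = plan(PROJECTS, DURATION)
    average = sum(finish.values()) / len(finish)
    print(f"{name}: {finish}, average {average} months")
```

The total work is 12 months in every case. Only the average waiting time changes: 12 months in parallel, 7.5 months in strict sequence, and 10.5 months with a monthly rotation.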

[Image credit: goldstardeputy]

Pay attention to your customers

At the RAMIRI training (Realising and Managing International Research Infrastructure) I followed in Trieste this week, one sentence particularly resonated with me. It was by Kimmo Koski (CSC Finland), who remarked that it takes effort to keep an organization customer-focused as it grows.

Starting at about 100 people, an organization can keep itself fully busy without ever serving a customer.

While I was working for Bruker AXS, our Sales director Paul Ulrich Pennartz used to say something related sometimes when he was in a cynical mood:

Without those pesky customers we would finally be able to concentrate on our work.

If you've ever wondered why small companies are more responsive than large ones, I think these two people summarize the cause quite well.

(Note: both quotes were reproduced in essence; this is not literally what they said)

Please do ask questions at a lecture, except...

Via Twitter, I saw a very cynical remark about asking questions after a scientific lecture, with a flow diagram discouraging most people from asking anything at all. This does not at all correspond to my experience organizing symposia and conferences. Most of the time, questions are very welcome, and people are way too shy to share their views. I therefore made a rebuttal in the form of the following flow diagram, which I think is a better representation of the line of thought to follow.

Radical change rarely brings immediate improvement

After every radical change in an organization, there is a need for a phase of quiet, thoughtful improvements. Expecting miracles from huge corporate reorganizations is a fallacy that leads to reorganization upon reorganization, potentially resulting in complete destruction of the organization.

Have you ever played a game of Pac-man? It is a simple game where you control a little eater eating dots on the screen, while ghosts are chasing you. The game is an excellent mirror of business life in a changing environment:

In Pac-man, you are trying to improve your health at every step by eating a dot and staying out of the way of the ghosts.
In business life, you are making small changes to your products and procedures to sell more and stay out of the way of your competitors.

There is a further analogy:

In Pac-man, sometimes things get stuck. Ghosts are closing in from all sides, and there is no escape. At such a point, you can use the teleport: a panic key that takes you to a random spot in the scene in an instant.
In business life, sometimes things get stuck. Competitors are closing in and it seems there is no way out. At such a point the CEO will call (often quickly without consulting all those that are involved) for a radical reorganization.

In business there is an important lesson we can learn from the teleport feature in Pac-man: A teleport is far from a guaranteed save! It can bring you into a very dangerous situation. The goal of the teleport is not an immediate improvement in the flow of the game, it is to escape from a hopelessly stuck situation, from impending disaster. Directly after a teleport, you have to act and make steps to regain control. Similarly, in business a radical reorganization will rarely take you to a better situation immediately. A reorganization is meant to shake up the bowl and escape from a hopelessly stuck situation (often invisible to many of the employees). After the relatively thoughtless jump that has to be executed quickly to avoid an immediate game over, the organization will need to go into a thoughtful phase in which small improvements are made to optimize the situation.

If you realize that a reorganization has not brought you immediate gains, try to refrain from making further reorganizations. Instead, look for opportunities for small changes, and give it some time.

Tell me what it is, not how you use it!

Regarding research data management, I have been telling people about the importance of describing a data set as "what it is" as opposed to "how you use it".

An early example that was given to me by a biobanking expert in The Netherlands was in the description of a chest X-ray: most likely such an image can be used both to study the bone in the spine, and to study the state of the major arteries (e.g. the aorta). If such an X-ray is acquired, it is likely that it is for one purpose only, but that does not exclude a re-use in another field. To optimize the re-usability of the data (see the FAIR principles), a chest X-ray should be labeled "chest X-ray" and not "X-ray of the spine" even if it was acquired for that specific goal.

I think this is similar to a "book cupboard". Most "book cupboards" are actually "book-size shelves". Those shelves can be used to store books, but can also be used for other purposes. To optimize the Findability of the right storage solution in the shop, it would be useful if the label did not express a single use.

Recently I heard yet another very good example of the same principle in an episode of the SE-radio podcast: Function names when writing computer code. It really improves the human readability of computer software (and hence the maintainability) if each function is named after what it does, rather than how it is used. The example from the podcast: do not name the function "reformat_email", but name it "remove_double_newlines" if that is what it does.
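As a minimal illustration of that naming advice, here is how such a function could look in Python. The podcast example only gives the two names. The body below is an assumption about what "removing double newlines" means: every run of two or more newlines is collapsed into one.

```python
import re


def remove_double_newlines(text: str) -> str:
    """Collapse every run of two or more newlines into a single newline."""
    return re.sub(r"\n{2,}", "\n", text)


# The name says what the function does, so it can be reused outside e-mail:
email_body = remove_double_newlines("Dear all,\n\nSee attached.\n\n\nBest")
log_text = remove_double_newlines("step 1 done\n\n\nstep 2 done\n")
print(repr(email_body))
print(repr(log_text))
```

Had the function been called "reformat_email", a reader would have to open it to learn what it changes, and nobody would think of using it on a log file.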

I've heard someone say "Researchers are the worst judges of the possibilities for re-use of their own data". It is true: a researcher studying the aorta will not even see the spine on their own X-ray images, let alone think about ways in which the data can be reused by bone researchers. I think labeling a data set with how it is used is a consequence of this. A trained librarian or archivist, with training in classification systems, will quickly see through such a mistake and suggest better naming.