By Rachael Skyner (link)
Topic intro to be written.
I’m lucky enough to often be tasked with strange (for a comp chemist) infrastructure work at Diamond. Although this isn’t my favourite work, it has a real positive impact on the users, and it gets me more praise than any method development work I do, so that’s a bonus.
The most recent request I had was to write an automated pipeline that takes plates (hopefully with crystals in them) imaged by the newly installed Formulatrix imagers in the research complex and runs them through a ranking algorithm, known as ranker, which scores how likely each image is to contain a crystal (a score of 0.5 or more means you’re likely to have a crystal). Once this is done, a GUI (TeXRank) serves the images up to the user in their scored order.
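To make the scoring idea concrete, here is a minimal sketch of ordering images by score and flagging those at or above the 0.5 cut-off. This is not ranker or TeXRank code: the function name, the file names and the scores are all made up for illustration, and I am assuming "scored order" means highest score first.

```python
def rank_images(scores, threshold=0.5):
    """Return (image, score, likely_crystal) tuples, highest score first.

    `scores` maps a (hypothetical) image name to a score between 0 and 1.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score >= threshold) for name, score in ordered]


# Illustrative scores only, not real ranker output.
example_scores = {"A01.jpg": 0.12, "A02.jpg": 0.87, "A03.jpg": 0.50}

for name, score, likely in rank_images(example_scores):
    print(name, score, "likely crystal" if likely else "unlikely")
```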
The aim of the pipeline is to take the images from the imagers (which are recorded in a database in RockMaker, after some very simple user input to run the imaging process), run all of the image transfer and ranking steps behind the scenes, and finally e-mail the XChem team with instructions on how to view their ranked images in TeXRank for further analysis.
The main component of such a pipeline is the scheduling of tasks. There are many options available for task scheduling (google it), but I chose luigi because it is easy to use, open-source and well documented (and I had already used it before, to be honest). In luigi you write simple tasks as Python classes, each with a requirement (another task), an output (e.g. a file on the filesystem) whose existence marks the task as done, and a run method that does the work and produces that output. Using the requirement definitions, it becomes very easy to put together a coherent pipeline. Luigi also comes with a visualiser for free, which lets you see the tasks, their dependencies and their status.
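The requires/output/run idea can be sketched in a few lines of plain Python. To be clear, this is a standard-library miniature of the pattern, not luigi itself and not my actual pipeline: the `Task` base class and `build` function are stand-ins I wrote for illustration, and the task names and marker files are hypothetical. In real luigi the classes would subclass `luigi.Task` and `output()` would typically return a `luigi.LocalTarget`.

```python
import os
import tempfile


class Task:
    """Stand-in for a luigi-style task (not luigi's real base class)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []  # tasks that must be done before this one

    def output(self):
        raise NotImplementedError  # path whose existence marks the task done

    def run(self):
        # Hypothetical work: just write the marker file.
        with open(self.output(), "w") as handle:
            handle.write(type(self).__name__ + " finished\n")

    def complete(self):
        return os.path.exists(self.output())


class TransferImages(Task):
    def output(self):
        return os.path.join(self.workdir, "transfer.done")


class RankImages(Task):
    def requires(self):
        return [TransferImages(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "ranking.done")


class EmailInstructions(Task):
    def requires(self):
        return [RankImages(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "email.done")


def build(task, log):
    """Run requirements first, then the task; skip anything already done."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        log = []
        build(EmailInstructions(workdir), log)
        first_run = list(log)
        build(EmailInstructions(workdir), log)  # outputs exist, nothing reruns
        return first_run, log
```

Asking for the last task is enough: the requirements pull in everything upstream, and because each task is judged done by its output, calling `build` a second time does no extra work.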