Stephan Schiffels

Population Genetics – Computational Methods - Human History

Trident's restaurant - your order please?

Posted on October 4, 2023 by Stephan Schiffels Categories: blog

This post was originally published on the Poseidon Blog

Image credit by Unsplash

Today we’re releasing a new major release for trident, our command-line tool and workhorse for working with Poseidon packages.

If you haven’t checked it out yet, take a look at our documentation. With trident, you can download packages from our web-server, initialise new packages, perform validations, list summaries of packages, and - most importantly - forge new packages out of existing ones. This functionality, forge is more than a merge, and more than a subset tool. It is both. Perhaps a working analogy for what it does is a restaurant. At a restaurant, you order a specific meal, but you don’t care how the ingredients have been packaged, and with which other ingredients they are stored. You just want the chefs to assmble the meal, no matter whether the beans are stored together with the peas or not.

With forge, you just show the program where to find the storage room (that’s what we call a base-directory that contains packages), and you tell it which individuals or groups you would like to include or exclude. Forge will then automatically look through all packages and assemble exactly the set of samples that you need in your new package.

Forge comes with a kind of mini-language to specify the ingredients that you need. This mini-language is something that we have clarified and subtly improved in this new release. There are four ways to ask forge to include entities in forge:

  1. Include an individual by name: You can write <EAS006> (in angular brackets) to ask forge to look for the individual named “EAS006” (I’m glad you ask, it’s a male individual from Southern England who did 1500 years ago and was buried near a town called Eastry today.). If there are multiple versions of this individual, because there are multiple versions of packages, forge looks for this individual only on the latest version.
  2. Include an individual by name, group and package: Sometimes, there are several individuals with that same name in your package collection, perhaps because you have multiple versions of the same package, or because for some reason you have already previously forged packages all in the same collection. In such a case, there is a more specific way to define an individual, via the syntax <2022_Gretzinger_AngloSaxons-1.0.0:England_EMA:EAS006>, which now specifies a very specific individual that is found in the given package and in the given group.
  3. Include entire groups: You can write England_EMA to ask forge to look for all individuals that are associated with the group named “England_EMA” (I’m again glad you ask: EMA stands for “Early Middle Ages”). This again implicitly looks for the group only in the latest versions of packages.
  4. Include entire packages: You can write *2022_Gretzinger_AngloSaxons* to ask forge to select for all individuals in the entire package.

Note that you can combine these requests to merge them into the final forge. So for example you could say <EAS006>,<EAS001>,England_IronAge or so, which would then result in a package consisting of the two specified individuls plus all individuals from group “England_IronAge”. So this provides you with a lot of power already to select large numbers of samples in a succinct way. But sometimes you need even more power, because you want to also specifically exclude some samples. In forge, you can do that by prepending a minus-sign to any entity:

  1. Exclude an individual by name: Write -<EAS006> to ask forge to remove any sample with that name from the final selection, in case it has been selected before.
  2. Exclude an individual by name, group and package: Write -<2022_Gretzinger_AngloSaxons-0.1.0:England_EMA:EAS006> to not remove all individual with the name “EAS006”, but only the one in the specified package.
  3. Exclude an entire group: Write -England_EMA to remove all individuals that belong to that group.
  4. Exclude an entire package: Write -*2022_Gretzinger_AngloSaxons* to remove this package.

So now you can accurately describe what you want to include and exclude. Note that the order matters, as you would expect: *2022_Gretzinger_AngloSaxons*,-England_EMA,<EAS006> will contain a selection that includes <EAS006>, since it was added back at the end. In contrast, *2022_Gretzinger_AngloSaxons*,<EAS006>,-England_EMA will not include the sample, as the entire group was removed after it was added, so it got removed again.

There is one more gist: If your forge-list starts with an exclusion statement, so if you just write -*2022_Gretzinger_AngloSaxons*, then this implicitly means that you include all latest versions of packages in the base directory. So this is a way to forge everything except a given package or group or individual.

The exact semantics of this forge-machinery are quite complex and powerful, and we have clarified how this works in the latest version, and made it specifically more consistent and intuitive. This in particular concerns a feature that we now built in for the first time: You can now append versions of packages explicitly to a package-name, to select a specific version of a package. This works both in package requests, and in individuals by name, group and package.

So *2022_Gretzinger_AngloSaxons-1.2.1* works, and <2022_Gretzinger_AngloSaxons-1.2.1:England_EMA:EAS006> works as well.

There was quite a bunch of refactoring of code that needed to happen under the hood to make this update possible, in particular in relation to supporting multiple versions of packages. This update will bring us a good step closer to making archaeoenetics analyses truly reproducible, as older versions of packages will continue to be served on our servers (which we will post about separately).