Bruford says it is not yet known whether the changes have reduced the error rate in publications, because published data sets often contain outdated gene lists. “It will take years for that to start showing up,” she says. The HUGO Gene Nomenclature Committee (HGNC) therefore recommends that researchers obtain the latest data from public databases. Journals, in addition, should encourage their authors to do so before submitting a paper.
Since the beginning of 2021, Ziemann has published a monthly ranking of the journals in which the errors appear. It often includes well-known titles such as “Nature Communications,” “eLife,” “PLOS Genetics,” and “Scientific Reports.” Ziemann suspects this is because articles in these journals often come with gene lists and larger data sets in their supplementary material.
Spreadsheets: Avoid them altogether or use them carefully
According to Ziemann, one conceivable approach is to do without spreadsheet software entirely when preparing data for publication. In some programs, such as the open-source LibreOffice and Gnumeric, the problem does not even occur. In general, however, it is difficult to verify the output of spreadsheets. And if there is a problem, it is not easy to tell where it arose; after all, the program keeps no record of the steps it performs, he says.
Some computational biologists rely on scripting languages such as Python and R, which do not automatically correct gene names, Ziemann says; an error can then be traced back to its source. However, users need to be proficient in these languages to write the code that analyzes their data. Purdy, for example, says she doesn’t have time for that. She has become used to Excel’s quirks: she routinely inserts apostrophes in front of affected gene names to prevent conversion, or formats tables before importing data. “That’s just one of the things you have to live with.”
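As a rough illustration of the scripting route Ziemann describes, the sketch below uses Python's standard csv module, which returns every cell as plain text, so symbols such as SEPT2 or MARCH1 come through unchanged. The sample data and the function name read_gene_symbols are hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical sample data; a real analysis would open a file instead.
SAMPLE_CSV = "gene,count\nSEPT2,14\nMARCH1,7\nDEC1,3\n"


def read_gene_symbols(text):
    """Return the 'gene' column as plain strings, with no type guessing."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["gene"] for row in reader]


symbols = read_gene_symbols(SAMPLE_CSV)
print(symbols)
```

Because nothing is converted on import, any odd value in the result must already have been present in the input file, which is what makes an error traceable.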
Bruford believes that Excel’s autocorrect problems are unlikely to go away in the near future. “We are a small group of users of the program compared to all users of Excel,” she says. Microsoft has never indicated that it might modify the program to accommodate the genetics research community. For anyone who continues to work with the problematic programs, Ziemann therefore recommends at least a quick check before sharing or publishing data. Sorting the column of gene symbols may be enough to reveal incorrectly converted entries.
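A quick check of this kind can also be scripted. The following is only a sketch under stated assumptions: the two patterns are heuristics for values that look like converted dates (such as "2-Sep") or numbers in scientific notation (such as "2.31E+13"), not an exhaustive list, and the function name find_suspect_symbols is hypothetical.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Heuristic patterns (assumptions, not an exhaustive list):
#   "2-Sep" or "1-Mar": a gene symbol that was turned into a date
#   "2.31E+13": an identifier that was turned into a number
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def find_suspect_symbols(symbols):
    """Return, sorted, the entries that look like dates or numbers."""
    suspects = []
    for symbol in symbols:
        value = symbol.strip()
        if DATE_LIKE.match(value) or SCI_LIKE.match(value):
            suspects.append(value)
    return sorted(suspects)


# Hypothetical gene column as it might look after a spreadsheet export.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "MARCH2"]
suspects = find_suspect_symbols(column)
print(suspects)
```

Any entry the function returns still needs a manual look, since the patterns only flag values that resemble converted data.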