«My fear is that the tasks are underestimated severely by Open Data advocates»
The space scientist Nicolas Thomas on his challenges with Open Data
At both the national and the international level, science is pushing towards Open Data. As noble as the principle is, the challenges for scientists are immense. Nicolas Thomas, space scientist at the University of Bern, will speak at the event «Open Data and Data Management – Issues and Challenges» on 29 October in Bern.
At the conference «Open Data and Data Management», you will talk about issues and challenges in Space Sciences. What are the most pressing ones?
Nicolas Thomas: In Space Sciences, there are problems with having the resources to do the job properly. We are required to adhere to rigorous standards. For example, version 4 of the Planetary Data System standard (PDS4) is just being rolled out. The definition of the standard runs to 250 pages and needs a qualified computer expert just to read it. Packaging our data so that the community can use it for the next 25 years requires a lot of time. The data we deliver to the European Space Agency’s Planetary Science Archive is also reviewed, and changes are sometimes required if the reviewers are unhappy with the content of the package. It is not a trivial task.
In the more general case of Open Data, the key question to ask is «why is the data set being released?» If you want a community to work with that data, then the archiving aspects are time-consuming and therefore cost manpower. Badly documented data is useless, or a further time sink for the producer, because they end up spending half their time answering emails about how to use the data.
My fear is that these tasks are underestimated severely by those people advocating Open Data. In addition, I wonder whether appropriate standards are available in all fields of research to support this. The Space guys have been dealing with this for 30 years, and even they have major problems with long-term support and changes in standards.
And to be honest, I personally don’t know what the goal is. That is possibly my ignorance. But just saying that data has to be open is inadequate in my view. If you say that the data used to make a published plot has to be open as a text file so that someone else can replot it, well, that’s fine, but almost useless. So it has to be clear what the requirement is.
Could you give me an example?
I have data from a flight mission right now. CaSSIS is orbiting Mars and returning data. The raw data will be archived. Calibrated data will also be archived in the Planetary Data System. Even stereo reconstructions will be archived, and all this data is open to the general public. But some of the files used to calibrate the data were derived from on-ground testing of the instrument before flight. Sometimes that data was in a completely different format from the flight data because the instrument was not running with the flight software, which was not ready yet. But it was used to derive something that is needed for the calibration. I have no plans to archive that pre-flight data in such a way that someone else could use it. The amount of work is far outside the scope of the funding that I have received. I estimate that it would take between one and two man-years to achieve that, possibly more.
Where would you set the limits for Open Data?
There are no limits if you are prepared to pay. But you will pay a lot if you want to push it all the way. I think that those people insisting on whatever level of Open Data need to write down EXACTLY what they want, so that this can be properly costed and evaluated through some form of cost/benefit analysis. That is the correct way to do it.