Stephen-Gates/data-package-version.md

Data Package Version

The Data Package version format follows the Semantic Versioning specification format: MAJOR.MINOR.PATCH

Semantic Versioning helps developers manage dependencies between software packages. The version numbers, and the way they change, convey meaning about the underlying code and what has been modified from one version to the next.

~~In Data Packages this concept is applied to data.~~

The version numbers, and the way they change, convey meaning about how the data package has been modified from one version to the next.

Given a Data Package version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible changes, e.g.

Change the ~~data structure~~ table schema
Change field or data package names or data package identifiers
Add, remove or re-order fields

MINOR version when you add data in a backwards-compatible manner, e.g.

Add new data to existing data resource
Add a new data resource

PATCH version when you make backwards-compatible fixes, e.g.

corrections to existing data
changes to metadata
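
The rules above can be sketched in code. This is a minimal illustration, not part of any Data Package tooling: the change labels in `CHANGE_LEVELS` and the function names are hypothetical, chosen only to paraphrase the examples listed above. Resetting the lower components to zero on a MAJOR or MINOR bump follows the Semantic Versioning specification.

```python
from enum import IntEnum


class Bump(IntEnum):
    """Version components, ordered so that max() picks the most significant."""
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical labels paraphrasing the example changes in the rules above.
CHANGE_LEVELS = {
    "change_table_schema": Bump.MAJOR,
    "rename_field_or_package": Bump.MAJOR,
    "add_remove_or_reorder_fields": Bump.MAJOR,
    "add_data_to_existing_resource": Bump.MINOR,
    "add_new_resource": Bump.MINOR,
    "correct_existing_data": Bump.PATCH,
    "change_metadata": Bump.PATCH,
}


def required_bump(changes):
    """Return the most significant bump needed for a non-empty list of change labels."""
    return max(CHANGE_LEVELS[change] for change in changes)


def bump_version(version, level):
    """Increment one component of 'MAJOR.MINOR.PATCH' and reset the lower ones."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == Bump.MAJOR:
        return f"{major + 1}.0.0"
    if level == Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

When a release combines several kinds of change, the most significant one decides the bump: a metadata fix released together with a new data resource is a MINOR release, so `bump_version("1.2.3", required_bump(["change_metadata", "add_new_resource"]))` gives `"1.3.0"`.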

Scenarios

You are developing your data through public consultation ~~before releasing to production~~. Start your initial data release at 0.1.0
You release your data ~~to production~~ for the first time. Use version 1.0.0
You append last month's data to an existing ~~production~~ release. Increment the MINOR version number
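
As a worked example, the three scenarios can be traced through a small helper. This is only a sketch: `next_version` and its event labels are hypothetical names invented for illustration.

```python
def next_version(current, event):
    """Return the version to publish for a scenario event.

    `current` is the existing 'MAJOR.MINOR.PATCH' string, or None if the
    data has not been versioned yet. The event labels are hypothetical.
    """
    if event == "start_consultation":
        # Data still under development: begin at 0.1.0.
        return "0.1.0"
    if event == "first_release":
        # First release of the data.
        return "1.0.0"
    if event == "append_data":
        # New data added to an existing release: MINOR bump, PATCH resets.
        major, minor, _patch = (int(part) for part in current.split("."))
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"unknown event: {event}")


draft = next_version(None, "start_consultation")   # "0.1.0"
released = next_version(draft, "first_release")    # "1.0.0"
updated = next_version(released, "append_data")    # "1.1.0"
```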

ethanwhite · 2017-09-13T19:30:46Z

Overall this looks good, but it feels a little too softwarey to me. If it was me I'd mention the association with software versioning briefly toward the beginning and then try to focus on data specific language.

E.g.:

I wouldn't talk about "production". That won't mean a lot to many data folks (at least in the sciences).
When talking about "names or identifiers" I'd try to be specific (or give specific examples): "Change names or identifiers including file names, column headers, ..."
I'm also torn about "backwards-compatible" but I don't see an obvious improvement to suggest that doesn't get wordy

Stephen-Gates · 2017-09-23T23:47:49Z

Thanks @ethanwhite
I've ~~struck out~~ or included some text based on your suggestions.

henrykironde · 2017-09-24T08:10:10Z

Thanks @Stephen-Gates and everyone for the input. I have gone through the previous issue, and I am glad to say that we are converging on the same point.

To summarize our progress towards the goal of versioning for data: we have stated that a complete data package contains data and its resources. The resources include, but are not limited to, the metadata and the data specification description file, which should contain the version specification.

I think @Stephen-Gates has put together a good categorization of some of these cases. It would help to add a more detailed explanation to each case. For example, "Change the data structure" may sound a bit ambiguous to users who are not very familiar with the terms as applied.

Protocol as written above

Given a Data Package version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible changes, e.g.

Change the data structure
Change field or data package names or data package identifiers
Add or remove fields

MINOR version when you add data in a backwards-compatible manner, e.g.

Add new data to existing data resource
Add a new data resource

PATCH version when you make backwards-compatible fixes, e.g.

corrections to existing data
changes to metadata

Additionally, we should think about users who take the time to package data that is not already packaged. These users are also going to use these protocols. On a good note, some of the described version protocols will hold for both packaged and non-packaged data.

Ethan mentioned,

Location is an interesting case. Certainly moving data breaks code that accesses it directly,
but the data itself doesn't change. 
Based on the software analogy I'm tempted to say that we ignore where the data is for data versioning, 
since I wouldn't bump a version moving software from GitHub to BitBucket.

My concern here is that many software products are managed through package managers like apt-get, pip and Conda, and a good number are standalone applications.

Data products / packages, by contrast, are provided as services and are used or ingested based on their URL. If a URL changes, the user's tool will not be able to get the data. In the case of software, when a project moves from GitHub to Bitbucket, either the download page is updated with a new link or the package manager is updated with the new location of the source. Let me know what you think.

I will compile the questions/concerns that I feel have not been answered and repost them here for more contributions.

Stephen-Gates · 2017-09-26T15:49:15Z

@henrykironde Thanks!

In the Gist:

I replaced data structure with table schema
I added re-order fields (based on a comment by @rufuspollock)

This change means that appending a column, previously suggested as a MINOR change, is now a MAJOR change (as @ethanwhite originally suggested).
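
To make the field rule concrete, here is a rough sketch of how a tool might detect the field changes that count as MAJOR under the revised wording. The function names are hypothetical, and the inputs are assumed to be plain lists of field names in column order rather than full table schema descriptors.

```python
def field_changes(old_fields, new_fields):
    """Describe how an ordered list of field names differs from a previous one."""
    old_set, new_set = set(old_fields), set(new_fields)
    changes = []
    if new_set - old_set:
        changes.append("added")
    if old_set - new_set:
        changes.append("removed")
    # Same names, different order.
    if old_set == new_set and list(old_fields) != list(new_fields):
        changes.append("re-ordered")
    return changes


def is_major_change(old_fields, new_fields):
    """Any added, removed or re-ordered field counts as a MAJOR change."""
    return bool(field_changes(old_fields, new_fields))


appended = field_changes(["id", "name"], ["id", "name", "age"])  # ["added"]
swapped = field_changes(["id", "name"], ["name", "id"])          # ["re-ordered"]
```

A renamed field shows up here as one field removed and another added, so it is also reported as MAJOR.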

Based on previous discussion I thought we were ignoring location changes.

I look forward to your questions/concerns

henrykironde · 2017-09-26T16:50:22Z

@Stephen-Gates, thanks for the updates. I totally agree with you on the URL changes after discussing the same with @zhangcandrew. We should ignore the change in the URL.

About the notifications, I am not getting any notifications from gist, but I get all the notifications from the git issues
