canSAS-XI/DataFormats

From canSAS
Latest revision as of 15:26, 9 July 2019

Presentation

Session Notes

Jan : real-world examples of NXcanSAS files from facilities are needed. Tim suggests everyone puts them on Zenodo to get DOIs.

Pete - post issues on the canSAS examples github

Pete - request for more examples of NXcanSAS files: https://github.com/canSAS-org/NXcanSAS_examples/issues/3#issuecomment-509619529


Reduced data from multiple detectors

  • Is this a general problem?
  • Jeff : looking at how to handle this for VSANS at NIST
  • Issue for both raw and reduced : raw easier
  • How to convert from raw to reduced? Very instrument dependent: how detectors are combined into Q space varies with the instrument, the experiment, and whether the data are 1D or 2D.
    • AJJ : the question is one of multiple 2D Q-spaces rather than detector spaces.
    • Tim : multiple physical detectors vs a moving detector
    • AJJ : multiple SASdata entries in a single SASentry
    • Jan : how to get software to read complex datasets (e.g. many magnetic fields, each measured with many detectors)
    • AJJ : discussion of complexity ... some of the complex analyses will require users to understand their data. Perhaps plugins in e.g. DPDAK?
  • Sven-Jannik : Is there a standard location for motors etc.? Not really ... each instrument/facility does it differently.
    • Can we have a standard set of metadata tags in NXcanSAS to support this?
  • Armin : how about more detailed application definitions?
  • Pete : Application definitions tend to be generic - can use search methods, or specify the NeXus path on data loading as one way, similar to translation dictionaries.
  • Brian : how to include multi-modal data - probably at the same level as a SASentry
  • Detectors : how do we link detector metadata to the Q-spaces derived from them? E.g. multiple detector configurations might be combined to give a single Q-space. This is true for 1D as well.
  • Stitching : discussion of stitching - U-Ser was interested in how we at SANS beamlines manage stitching: automated or manual? We explained that it still requires user input, with expert help, to define the data combination routine and the choices of overlap, data cutoff etc.
  • Adam : SESANS data - what to do? It is not I vs Q, but it is small-angle scattering. We should create an equivalent of SASdata for SESANS data. This would allow combined SANS/SESANS experiments to write one set of sample metadata and two datasets to file, similar to having multiple SASdata entries for e.g. USAXS+SAXS+WAXS.
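The layout discussed above (several SASdata groups in one SASentry, one shared SASsample, a proposed SESANS counterpart) and Pete's point about locating data by searching rather than by a fixed path can be sketched in plain Python. This is a minimal illustration, not a valid NXcanSAS file: HDF5 groups are modelled as nested dicts, and the `detector_link` attribute and the `SESANSdata` class are hypothetical stand-ins for the open proposals. The `canSAS_class`, `NX_class`, `signal` and `I_axes` attribute names follow the NXcanSAS definition.

```python
# Sketch: one SASentry holding several reduced datasets, located by class.
# Assumption: HDF5 groups are modelled as nested dicts, with group attributes
# under the "@attrs" key. "detector_link" and the SESANSdata group are
# hypothetical placeholders for open proposals, not part of NXcanSAS.

DETECTORS = ["front", "middle", "rear"]


def group(canSAS_class, nx_class, **attrs):
    """Return an empty group carrying canSAS_class/NX_class attributes."""
    return {"@attrs": {"canSAS_class": canSAS_class, "NX_class": nx_class, **attrs}}


def sasdata(detector):
    """A SASdata group for the Q-space reduced from one detector (no data)."""
    g = group("SASdata", "NXdata", signal="I", I_axes="Q",
              detector_link=f"/sasentry01/sasinstrument/{detector}")  # hypothetical
    g["Q"] = []  # placeholders for the reduced arrays
    g["I"] = []
    return g


instrument = group("SASinstrument", "NXinstrument")
for d in DETECTORS:
    instrument[d] = group("SASdetector", "NXdetector")

entry = group("SASentry", "NXentry")
entry["sassample"] = group("SASsample", "NXsample")  # one set of sample metadata
entry["sasinstrument"] = instrument
for d in DETECTORS:
    entry[f"sasdata_{d}"] = sasdata(d)
entry["sesansdata01"] = group("SESANSdata", "NXdata")  # hypothetical SESANS group

tree = {"sasentry01": entry}


def find_by_class(node, canSAS_class, path=""):
    """Yield the path of every group below `node` with a matching canSAS_class."""
    for name, child in node.items():
        if name == "@attrs" or not isinstance(child, dict):
            continue
        child_path = f"{path}/{name}"
        if child.get("@attrs", {}).get("canSAS_class") == canSAS_class:
            yield child_path
        yield from find_by_class(child, canSAS_class, child_path)


def resolve(node, path):
    """Follow an absolute path such as '/sasentry01/sasinstrument/rear'."""
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


print(sorted(find_by_class(tree, "SASdata")))
# ['/sasentry01/sasdata_front', '/sasentry01/sasdata_middle', '/sasentry01/sasdata_rear']
```

With h5py the same search could be written with `Group.visititems`, checking each object's `attrs` for `canSAS_class`, so a reader does not depend on facility-specific group names.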

Discussion of dataset representation in Python

We need a dataset representation that keeps track of units and uncertainties and can ideally carry other metadata.

  • xarray - not fully featured.
  • scipp (C++ equivalent of xarray being developed for Mantid) - reinventing numpy! Not what we are looking for here.
  • DAWN has a Java-based dataset representation that supports uncertainties and units.
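To show what keeping track of units and uncertainties means in practice, here is a minimal Python sketch of a scalar that carries both. It is not the API of any of the packages above; it assumes uncorrelated Gaussian uncertainties (combined in quadrature) and treats units as plain labels that must match exactly, with no unit conversion.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value with a one-sigma uncertainty and a units label (illustrative only)."""
    value: float
    sigma: float
    units: str

    def _check_units(self, other):
        # Units are compared as labels only; no conversion is attempted
        if self.units != other.units:
            raise ValueError(f"unit mismatch: {self.units!r} vs {other.units!r}")

    def __add__(self, other):
        self._check_units(other)
        # Uncorrelated uncertainties combine in quadrature
        return Measured(self.value + other.value,
                        math.hypot(self.sigma, other.sigma), self.units)

    def __sub__(self, other):
        self._check_units(other)
        return Measured(self.value - other.value,
                        math.hypot(self.sigma, other.sigma), self.units)

    def scale(self, factor, units=None):
        """Multiply by an exact factor, optionally relabelling the units."""
        return Measured(self.value * factor, abs(factor) * self.sigma,
                        units if units is not None else self.units)


sample = Measured(100.0, 3.0, "counts")
background = Measured(40.0, 4.0, "counts")
print(sample - background)  # Measured(value=60.0, sigma=5.0, units='counts')
```

An array version would apply the same rules element-wise; correlated errors or real unit algebra need a dedicated library.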

Large datasets are a problem: e.g. in tomography/imaging the whole file cannot be read into memory, so lazy data loading is needed.

  • NeXpy supports network streaming, i.e. reading data remotely rather than having to load the whole dataset.
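Lazy loading means reading only the part of a file that is actually requested. A minimal stdlib sketch, assuming a hypothetical headerless file that holds a stack of equally sized frames of native-endian float64 values (as written by `array.tofile`), seeks straight to the requested frame instead of reading the whole stack:

```python
import array
import os


class LazyFrames:
    """On-demand access to frames in a headerless stack of float64 values.

    Hypothetical layout: n_frames * rows * cols native-endian doubles, stored
    frame after frame with no header, as written by array.array("d").tofile().
    """

    def __init__(self, path, rows, cols):
        self.path = path
        self.frame_items = rows * cols
        self.frame_bytes = self.frame_items * array.array("d").itemsize
        # Only the file size is inspected here; no pixel data is read yet
        self.n_frames = os.path.getsize(path) // self.frame_bytes

    def __len__(self):
        return self.n_frames

    def __getitem__(self, index):
        if index < 0:
            index += self.n_frames
        if not 0 <= index < self.n_frames:
            raise IndexError("frame index out of range")
        frame = array.array("d")
        with open(self.path, "rb") as f:
            f.seek(index * self.frame_bytes)  # jump straight to the requested frame
            frame.fromfile(f, self.frame_items)
        return frame


# frames = LazyFrames("stack.bin", rows=1024, cols=1024)  # hypothetical file
# last = frames[-1]  # reads one frame, not the whole stack
```

For HDF5/NeXus files the same effect is available directly: slicing an h5py `Dataset` (e.g. `dset[i]`) reads only the selected region from disk.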

Multiple detectors - multiple kinds of detector (Tim)

Not only do we have multiple detectors, but we might also have multiple types of detector running at different frame rates, e.g. an optical camera and an X-ray detector working at different frequencies, giving more frames from one than the other.

  • SAXS + WAXS at DESY - they just write two NeXus files to avoid the problem. Server issues and I/O.
  • I22 - time-frame generation (one file) + two separate detector files. Acquisition gives you a header NeXus file linked to the others. Currently looking at stacking in the right order in data structures, slowest-varying at the top.
  • ESS and other spallation sources - everything is timestamped, so correlations are done post hoc.
  • Pete - timestamping at synchrotrons will be coming. XPCS can use this.
  • Brian : Visualisation
    • Time slider bars - showing last collected image.
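Post-hoc correlation of timestamped streams, as described for spallation sources, can be as simple as a sorted search. The sketch below assumes both detectors stamp frames against a common clock (integer milliseconds here) and, for each frame of one detector, picks the most recent frame of the other acquired at or before it, which is also what a "last collected image" slider would display. The detector names and rates are hypothetical.

```python
from bisect import bisect_right


def latest_at_or_before(reference_times, other_times):
    """For each reference timestamp, return the index of the most recent frame
    in `other_times` acquired at or before it, or None if none exists yet.
    Both sequences must be sorted in ascending order and share one clock."""
    matches = []
    for t in reference_times:
        i = bisect_right(other_times, t) - 1
        matches.append(i if i >= 0 else None)
    return matches


# Hypothetical example: X-ray detector at 10 Hz, optical camera at 25 Hz (times in ms)
xray_ms = [0, 100, 200, 300]
camera_ms = list(range(0, 320, 40))  # 0, 40, ..., 280
print(latest_at_or_before(xray_ms, camera_ms))  # [0, 2, 5, 7]
```

Each lookup is a binary search, so matching stays cheap even when one stream has many more frames than the other.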

Metadata for Machine Learning

  • Datasets generated by Diamond for a machine learning project should be put online for others to use (3 TB issue!)
  • Sample metadata is required for most machine learning applications that would be useful to users.
  • XPDF users need information about the sample to be able to process data at all (the same holds for liquids neutron diffraction).
  • Need to move on from basics such as simple dimensions etc. (Pete) - however, we don't know what we need to know to feed the machine learning algorithms.
  • What are the users going to give us?
    • Very little ... without encouragement
    • Use a header NeXus file with links to the raw detector data. Basically a digital logbook.
    • Pete - shouldn't limit metadata for edge cases. Need to have the metadata available and turn it off as needed.
    • Pete - automated addition of metadata from the proposal/safety form
    • Brian - adding analyte composition and density etc.
    • Pete - issue with NXsample : requires a chemical formula, not a composition.
    • Some of this is easy at low throughput. Need an efficient way of entering the information.
    • Should look at what the MX (macromolecular crystallography) community does.
    • Needs to be end-to-end. Has to be a benefit to the user.
    • Machine learning : Pete - need some samples with bad metadata to teach algorithm!
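The composition-versus-formula point and the need to catch bad metadata can be illustrated with a small hypothetical sample record. The field names and required-field policy below are invented for illustration and are not NXsample fields; the mixture density uses an ideal-mixing assumption (no volume change on mixing), so it is only an estimate.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    formula: str            # one chemical formula per component, e.g. "D2O"
    volume_fraction: float  # fraction of the sample volume
    density_g_cm3: float    # mass density of the pure component


@dataclass
class SampleRecord:
    name: str = ""
    components: List[Component] = field(default_factory=list)
    thickness_mm: Optional[float] = None


REQUIRED_FIELDS = ("name", "components", "thickness_mm")  # hypothetical policy


def mixture_density(components):
    """Ideal-mixing estimate: sum of volume fraction times component density."""
    total = sum(c.volume_fraction for c in components)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"volume fractions sum to {total}, not 1")
    return sum(c.volume_fraction * c.density_g_cm3 for c in components)


def missing_metadata(sample):
    """List required fields left empty, e.g. to flag records before ML training."""
    return [f for f in REQUIRED_FIELDS if getattr(sample, f) in (None, "", [])]


sample = SampleRecord(
    name="silica in water (illustrative values)",
    components=[Component("H2O", 0.9, 1.0), Component("SiO2", 0.1, 2.2)],
)
print(round(mixture_density(sample.components), 3))  # 1.12
print(missing_metadata(sample))                      # ['thickness_mm']
```

Storing composition as a list of components keeps each part expressible as a single formula, while derived quantities such as density can be computed rather than typed in.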

Actions

  • Implementation of a SESANSdata group equivalent to SASdata : Adam W to develop proposal
  • Upload example datasets to Zenodo and/or github (relevant github issue: https://github.com/canSAS-org/NXcanSAS_examples/issues/3#issuecomment-509619529) : Action for all canSAS who have data in NXcanSAS format
  • Linking of detector metadata to SASdata entries : Jeff K to develop proposal
  • Writing of notes with examples for NXcanSAS usage. Should be included in the NXcanSAS documentation or on cansas.org. Needs to be more than just developer docs : Action for all canSAS who have examples of how they have used NXcanSAS for specific cases.
  • Build a list of suggested metadata with standard names, including how to build sample descriptions : Brian P to write a proposal for sample-relevant metadata and to circulate it to the Data Formats group