canSAS-XI/DataFormats

Revision as of 15:01, 9 July 2019

Presentation

Session Notes

Jan : real-world examples of NXcanSAS files from facilities are needed. Tim suggests everyone puts them on Zenodo to get DOIs. Pete - post issues on the canSAS examples GitHub.

Pete - request for more examples of NXcanSAS files: https://github.com/canSAS-org/NXcanSAS_examples/issues/3#issuecomment-509619529


Reduced data from multiple detectors

  • Is this a general problem?
  • Jeff : looking at how to handle this for VSANS at NIST
  • Issue for both raw and reduced data : raw is easier
  • How to convert from raw to reduced? Very instrument-dependent. The choice of how detectors are combined into Q space varies with the instrument and experiment, and with whether the data are 1D or 2D.
    • AJJ : the question is one of multiple 2D Q-spaces rather than detector spaces.
    • Tim : multiple physical detectors vs moving detector
    • AJJ : multiple SASdata entries in a single SASentry
    • Jan : how to get software to read complex datasets (e.g. many magnetic fields with many detectors)
    • AJJ : discussion of complexity ... some of the complex analyses will require the users to understand their data. Perhaps plugins in e.g. DPDAK?
  • Sven-Jannik : Is there a standard location for motors etc.? Not really ... each instrument/facility does it differently.
    • Can we have a standard set of metadata tags in NXcanSAS to support this?
  • Armin : how about more detailed application definitions?
  • Pete : application definitions tend to be generic. One way around this is to use search methods or to specify the NeXus path when loading the data, similar to translation dictionaries.
  • Brian : how to include multi-modal data? Probably at the same level as a SASentry.
  • Detectors : how do we link detector metadata to the Q-spaces derived from them? E.g. multiple detector configurations might be combined into a single Q space. This is true for 1D as well.
  • Stitching : U-Ser was interested in how SANS beamlines manage stitching, automatically or manually. We explained that it still requires user input, with expert help, to define the data combination routine and choices such as overlap and data cutoff.
  • Adam : SESANS data, what to do? It is not I vs Q, but it is small-angle scattering. We should create an equivalent of SASdata for SESANS data. This would allow combined SANS/SESANS experiments to write one set of sample metadata and two datasets to file, similar to having multiple SASdata entries for e.g. USAXS+SAXS+WAXS.
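AJJ's suggestion of multiple SASdata entries in a single SASentry and Pete's point about using search methods rather than fixed NeXus paths can be illustrated together. The following is a minimal, stdlib-only sketch. It uses a plain Python Group class as a stand-in for HDF5 groups so that it runs without h5py, and it locates data by the canSAS_class attribute instead of by path. The group names (sasentry01, sasdata01, ...) and all numbers are illustrative assumptions. Units and the uncertainties attribute on I are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class Group:
    """Stand-in for an HDF5 group: a dict of attributes plus named children."""
    attrs: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, Union["Group", List[float]]] = field(default_factory=dict)


def sasdata(q: List[float], i: List[float], idev: List[float]) -> Group:
    """Build one reduced I(Q) dataset, tagged the way NXcanSAS tags SASdata groups."""
    return Group(
        attrs={"NX_class": "NXdata", "canSAS_class": "SASdata",
               "signal": "I", "I_axes": "Q"},
        children={"Q": q, "I": i, "Idev": idev},
    )


def find_by_class(group: Group, cls: str, path: str = "") -> Iterator[str]:
    """Yield the path of every group, at any depth, whose canSAS_class equals cls."""
    for name, child in group.children.items():
        if isinstance(child, Group):
            child_path = f"{path}/{name}"
            if child.attrs.get("canSAS_class") == cls:
                yield child_path
            yield from find_by_class(child, cls, child_path)


# One SASentry holding SAXS and WAXS reductions of the same sample (made-up numbers)
root = Group(children={
    "sasentry01": Group(
        attrs={"NX_class": "NXentry", "canSAS_class": "SASentry"},
        children={
            "sasdata01": sasdata([0.01, 0.02], [10.0, 5.0], [0.1, 0.1]),
            "sasdata02": sasdata([0.5, 1.0], [0.2, 0.1], [0.01, 0.01]),
            "sassample": Group(attrs={"NX_class": "NXsample",
                                      "canSAS_class": "SASsample"}),
        },
    ),
})

print(list(find_by_class(root, "SASdata")))
# ['/sasentry01/sasdata01', '/sasentry01/sasdata02']
```

A reader that only looked at a single hard-coded path would miss the second SASdata group. With a real file, the same attribute-based search could be done in h5py by walking the file with Group.visititems.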

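The stitching discussion notes that the user still chooses the overlap region and the data cutoff. As an illustration only, the following sketch shows one simple way those two choices could feed a combination routine. The high-Q set is scaled so that its summed intensity over the chosen overlap range matches the low-Q set, using linear interpolation onto the low-Q points, and the two sets are then joined at the cutoff. This is an assumed scheme, not the procedure used at any particular beamline. Uncertainties are ignored, and real routines may fit the scale factor or weight the overlap differently.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linearly interpolate ys(xs) at x; xs must be ascending and must bracket x."""
    k = bisect_left(xs, x)
    if k < len(xs) and xs[k] == x:
        return ys[k]
    if k == 0 or k == len(xs):
        raise ValueError(f"Q = {x} lies outside the data range")
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def stitch(q_lo, i_lo, q_hi, i_hi, overlap: Tuple[float, float], cutoff: float):
    """Scale the high-Q set onto the low-Q set, then join the two at `cutoff`.

    `overlap` (qmin, qmax) and `cutoff` are the user's choices; returns (q, i, scale).
    """
    qmin, qmax = overlap
    in_overlap = [(q, i) for q, i in zip(q_lo, i_lo) if qmin <= q <= qmax]
    if not in_overlap:
        raise ValueError("no low-Q points inside the overlap range")
    # Ratio of summed intensities over the overlap, evaluated on the low-Q grid
    scale = (sum(i for _, i in in_overlap)
             / sum(interp(q, q_hi, i_hi) for q, _ in in_overlap))
    merged = [(q, i) for q, i in zip(q_lo, i_lo) if q < cutoff]
    merged += [(q, scale * i) for q, i in zip(q_hi, i_hi) if q >= cutoff]
    return [q for q, _ in merged], [i for _, i in merged], scale


q, i, scale = stitch(
    q_lo=[0.01, 0.02, 0.03, 0.04], i_lo=[8.0, 4.0, 2.0, 1.0],
    q_hi=[0.02, 0.03, 0.04, 0.05, 0.06], i_hi=[2.0, 1.0, 0.5, 0.25, 0.125],
    overlap=(0.02, 0.04), cutoff=0.035,
)
print(scale, q, i)
```

Keeping the overlap and cutoff as explicit arguments, rather than guessing them, reflects the point made in the session that these remain user decisions.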
Discussion of dataset representation in Python

Need dataset representation that maintains information about units and uncertainties and can ideally support other metadata.

  • xarray - not fully featured.
  • scipp (C++ version of xarray for Mantid) - reinventing NumPy! Not what we are looking for here.
  • DAWN has a Java-based dataset representation that supports uncertainties and units.
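To make "maintains information about units and uncertainties" concrete, here is a toy container that keeps values, 1-sigma uncertainties and a unit string together. It refuses to add data in different units and propagates uncertainties in quadrature. It is a hypothetical sketch that assumes independent Gaussian errors; it is not the data model of xarray, scipp or DAWN.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UncertainArray:
    """Values with 1-sigma uncertainties and a unit (hypothetical minimal type)."""
    values: Tuple[float, ...]
    sigmas: Tuple[float, ...]
    unit: str

    def __post_init__(self):
        if len(self.values) != len(self.sigmas):
            raise ValueError("values and sigmas must have the same length")

    def __add__(self, other: "UncertainArray") -> "UncertainArray":
        # Refuse to mix units instead of silently adding incompatible numbers
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit!r} vs {other.unit!r}")
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        return UncertainArray(
            tuple(a + b for a, b in zip(self.values, other.values)),
            # Independent uncertainties add in quadrature
            tuple(math.hypot(sa, sb) for sa, sb in zip(self.sigmas, other.sigmas)),
            self.unit,
        )

    def scale(self, factor: float) -> "UncertainArray":
        """Multiply by an exact constant: values and sigmas scale together."""
        return UncertainArray(
            tuple(factor * v for v in self.values),
            tuple(abs(factor) * s for s in self.sigmas),
            self.unit,
        )


a = UncertainArray((30.0, 12.0), (3.0, 1.0), "1/cm")
b = UncertainArray((40.0, 8.0), (4.0, 1.0), "1/cm")
total = a + b
print(total.values, total.sigmas, total.unit)
```

A real library would do the same element-wise over large arrays and would also handle unit conversion and correlated errors, which this sketch deliberately leaves out.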

Large datasets are a problem, e.g. tomography/imaging, where the whole file can't be read into memory. So lazy data loading is needed.

  • NeXpy supports network streaming / reading data remotely rather than having to load the whole dataset
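Lazy loading means reading only the part of the file that is actually requested. The following sketch assumes a hypothetical headerless file of float32 detector frames and seeks directly to one frame instead of loading the whole stack. HDF5 readers such as h5py already behave this way when a dataset is sliced, so in practice this would usually come from the file library rather than from hand-written code.

```python
import os
import tempfile
from array import array


class LazyFrames:
    """Index a headerless file of float32 frames without reading it all into memory."""

    def __init__(self, path: str, frame_size: int):
        self.path = path
        self.frame_size = frame_size                 # pixels per frame
        self.frame_bytes = frame_size * array("f").itemsize
        self.n_frames = os.path.getsize(path) // self.frame_bytes

    def __len__(self) -> int:
        return self.n_frames

    def __getitem__(self, k: int) -> array:
        if not 0 <= k < self.n_frames:
            raise IndexError(k)
        frame = array("f")
        with open(self.path, "rb") as fh:
            fh.seek(k * self.frame_bytes)            # jump straight to frame k
            frame.fromfile(fh, self.frame_size)      # read only that frame
        return frame


# Demo: 1000 frames of 16 pixels, where every pixel of frame k holds the value k
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    for k in range(1000):
        tmp.write(array("f", [float(k)] * 16).tobytes())
frames = LazyFrames(tmp.name, frame_size=16)
n_frames, pixel = len(frames), frames[742][0]
os.remove(tmp.name)
print(n_frames, pixel)  # 1000 742.0
```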

Multiple detectors - multiple kinds of detector (Tim)

Not only do we have multiple detectors, but we might have multiple types of detector running at different frame rates, e.g. an optical camera and an X-ray detector working at different frequencies, so one produces more frames than the other.

  • SAXS + WAXS at DESY - just write two NeXus files to avoid the problem. Server and I/O issues.
  • I22 - timeframe generation (one file) + two separate detector files. Acquisition produces a header NeXus file linked to the others. Currently looking at stacking the data in the right order in the data structures, with the slowest-varying axis at the top.
  • ESS and other spallation sources - all data are timestamped, so correlations are done post hoc.
  • Pete - timestamping at synchrotrons will be coming. XPCS can use this.
  • Brian : Visualisation
    • Time slider bars - showing last collected image.
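When every detector writes timestamps against a common clock, as described for ESS, the post-hoc correlation can be as simple as matching each frame of the slower detector to the nearest frame of the faster one. The sketch below assumes that both timestamp lists are sorted and share a clock. The frame rates are made-up example values.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_frames(ref_times: Sequence[float], other_times: Sequence[float]) -> List[int]:
    """For each reference timestamp, return the index of the closest frame in other_times.

    Both sequences must be sorted ascending and share a common clock.
    """
    if not other_times:
        raise ValueError("other_times is empty")
    matches = []
    for t in ref_times:
        k = bisect_left(other_times, t)
        if k == 0:
            matches.append(0)
        elif k == len(other_times):
            matches.append(len(other_times) - 1)
        else:
            # Choose the closer neighbour; a tie goes to the earlier frame
            matches.append(k - 1 if t - other_times[k - 1] <= other_times[k] - t else k)
    return matches


xray = [0.1, 1.2, 2.1, 3.2]               # 1 Hz detector, seconds
camera = [n * 0.25 for n in range(14)]    # 4 Hz optical camera, seconds
print(nearest_frames(xray, camera))       # [0, 5, 8, 13]
```

A real pipeline would probably also reject matches whose time difference exceeds some tolerance, rather than always pairing a frame with its nearest neighbour.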

Metadata for Machine Learning

  • Datasets generated by Diamond for the machine learning project should be put online for others to use (3 TB issue!)
  • Metadata from sample is required for most machine learning applications that would be useful to users.
  • XPDF users need the information about the sample to be able to process data at all (same for liquids neutron diffraction).
  • Need to move on from basics such as simple dimensions etc. (Pete) - however, we don't yet know what information we need to feed the machine learning algorithms.
  • What are the users going to give us?
    • Very little ... without encouragement
    • Use a header NeXus file with links to the raw detector data, essentially a digital logbook.
    • Pete - shouldn't restrict the metadata because of edge cases. All the metadata should be possible, with fields turned off as needed.
    • Pete - automatically add data from the proposal/safety form.
    • Brian - adding analyte composition and density etc.
    • Pete - issue with NXsample : requires chemical formula, not composition.
    • Some of this is easy for low-throughput experiments. Need an efficient way of entering the information.
    • Should look at what the MX (macromolecular crystallography) community does.
    • Needs to be end-to-end. Has to be of benefit to the user.
    • Machine learning : Pete - need some samples with bad metadata to teach the algorithm!
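Brian's point about composition and density and Pete's note that NXsample expects a chemical formula rather than a composition suggest a sample record that holds a list of components. The following is a hypothetical sketch. The class and field names are not taken from NXsample or NXcanSAS. Optional fields can be left empty, and validate() reports problems instead of rejecting the record, so incomplete or bad metadata is kept and flagged rather than lost.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    formula: str          # chemical formula of one constituent, e.g. "H2O"
    mass_fraction: float  # share of the total sample mass, between 0 and 1


@dataclass
class SampleRecord:
    name: str
    components: List[Component] = field(default_factory=list)
    density_g_cm3: Optional[float] = None  # None means "not supplied"

    def validate(self) -> List[str]:
        """List problems instead of raising, so incomplete records are kept and flagged."""
        problems = []
        if not self.components:
            problems.append("no composition given")
        else:
            total = sum(c.mass_fraction for c in self.components)
            if abs(total - 1.0) > 1e-6:
                problems.append(f"mass fractions sum to {total:.3f}, not 1")
        if self.density_g_cm3 is None:
            problems.append("density missing")
        return problems


brine = SampleRecord("NaCl in H2O", [Component("H2O", 0.9), Component("NaCl", 0.1)])
print(brine.validate())  # ['density missing']
```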

Actions

  • Implementation of a SESANSdata group equivalent to SASdata : Adam W to develop proposal
  • Upload example datasets to Zenodo and/or GitHub (relevant GitHub issue: https://github.com/canSAS-org/NXcanSAS_examples/issues/3#issuecomment-509619529) : Action for all canSAS members who have data in NXcanSAS format
  • Linking of detector metadata to SASdata entries : AJJ to develop proposal in absence of other volunteers!
  • Writing of notes with examples of NXcanSAS usage, to be included in the NXcanSAS documentation or on cansas.org. These need to be more than just developer docs : Action for all canSAS members who have examples of how they have used NXcanSAS for specific cases.
  • Build a list of suggested metadata with standard names, including how to build sample descriptions : Brian P to write a proposal for sample-relevant metadata and to circulate it to the Data Formats group