canSAS-XI/DataFormats

Latest revision as of 15:26, 9 July 2019

Presentation

Session Notes

Jan : real-world examples of NXcanSAS files from facilities are needed. Tim suggests everyone put them on Zenodo to get DOIs. Pete - post issues on the canSAS examples GitHub repository.

Pete - request for more examples of NXcanSAS files: https://github.com/canSAS-org/NXcanSAS_examples/issues/3#issuecomment-509619529


Reduced data from multiple detectors

  • Is this a general problem?
  • Jeff : looking at how to handle this for VSANS at NIST
  • Issue for both raw and reduced data : raw is easier
  • How to convert from raw to reduced? Very instrument-dependent. The choice of how detectors are combined into Q space varies with the instrument, the experiment, and whether the data are 1D or 2D.
    • AJJ : question is of multiple 2D q-spaces rather than detector spaces.
    • Tim : multiple physical detectors vs moving detector
    • AJJ : multiple SASdata entries in a single SASentry
    • Jan : how to get software to read complex datasets (e.g. lots of mag fields with many detectors)
    • AJJ : discussion of complexity ... some of the complex analyses will require the users to understand their data. Perhaps plugins in e.g. DPDAK?
  • Sven-Jannik : Is there a standard location for motors etc. Not really ... each instrument/facility does it differently.
    • Can we have a standard set of metadata tags in NXcanSAS to support this?
  • Armin : how about more detailed application definitions?
  • Pete : Application definitions tend to be generic - can use search methods, e.g. specifying the NeXus path on data loading. Similar to translation dictionaries.
  • Brian : how to include multi-modal data - probably at the same level as a SASentry
  • Detectors : how do we link detector metadata to the Q spaces derived from them? e.g. multiple detector configurations might be used to give a single Q space. This is true for 1D as well.
  • Stitching : discussion of stitching - U-Ser was interested in how we at SANS beamlines manage stitching, automated or manual. Explained that it still requires user input, with expert help, to define the data combination routine and the choices of overlap, data cutoff etc.
  • Adam : SESANS data - what to do? Not I vs Q, but it is still small-angle scattering. We should create an equivalent of SASdata for SESANS data. This would allow combined SANS/SESANS experiments to write one set of sample metadata and two datasets to file, similar to having multiple SASdata entries for e.g. USAXS+SAXS+WAXS.
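The idea of several SASdata groups in one SASentry, with one shared sample record (and Adam's SESANS equivalent alongside), can be sketched as a tree. This is a minimal sketch only: nested Python dicts stand in for HDF5 groups and "@"-prefixed keys stand in for HDF5 attributes. The group names (sasdata01, sesansdata01), the "@detector" link and the whole SESANSdata class are hypothetical, since the SESANS group and the detector linking are still open proposals (see Actions).

```python
# Sketch only: dicts stand in for HDF5 groups, "@"-keys for HDF5 attributes.
# SESANSdata and the "@detector" attribute are hypothetical (open proposals).

def make_group(nx_class, cansas_class, children=None):
    """Return a dict mimicking an NXcanSAS group with its class attributes."""
    group = {"@NX_class": nx_class, "@canSAS_class": cansas_class}
    group.update(children or {})
    return group


def build_entry(sample_name, datasets):
    """Build one SASentry holding a single shared SASsample.

    `datasets` maps a group name to (canSAS_class, contents), so several
    SASdata groups (and, hypothetically, SESANSdata) share one sample record.
    """
    entry = make_group("NXentry", "SASentry",
                       {"sassample": make_group("NXsample", "SASsample",
                                                {"name": sample_name})})
    for group_name, (cansas_class, contents) in datasets.items():
        entry[group_name] = make_group("NXdata", cansas_class, contents)
    return entry


def groups_of_class(entry, cansas_class):
    """Names of the entry's child groups that carry the given canSAS class."""
    return sorted(name for name, child in entry.items()
                  if isinstance(child, dict)
                  and child.get("@canSAS_class") == cansas_class)


entry = build_entry("silica_in_water", {
    "sasdata01": ("SASdata", {"@detector": "saxs", "Q": [0.01, 0.02], "I": [10.0, 8.0]}),
    "sasdata02": ("SASdata", {"@detector": "waxs", "Q": [1.0, 2.0], "I": [0.5, 0.4]}),
    "sesansdata01": ("SESANSdata", {"spin_echo_length": [100.0, 200.0],
                                    "polarisation": [0.9, 0.8]}),
})
print(groups_of_class(entry, "SASdata"))     # ['sasdata01', 'sasdata02']
print(groups_of_class(entry, "SESANSdata"))  # ['sesansdata01']
```

A reader that only understands SASdata can still pick out its groups by class and ignore the rest, which is the main appeal of keeping the different data types side by side under one entry.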

Discussion of dataset representation in python

Need dataset representation that maintains information about units and uncertainties and can ideally support other metadata.

  • xarray - not fully featured.
  • scippy (c++ version of xarray for mantid) - reinventing numpy! not what we are looking for here.
  • DAWN has the java based dataset representation that supports uncertainties and units.
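To make the requirement concrete, a minimal hand-rolled sketch of a dataset that carries units and uncertainties is shown below. It is not the API of xarray, DAWN or any other library mentioned above; the class name Measured is hypothetical, and it assumes uncorrelated Gaussian errors, so variances add under both addition and subtraction.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """Values with per-point variances and a unit label (hypothetical class)."""
    values: tuple
    variances: tuple
    unit: str

    def __post_init__(self):
        if len(self.values) != len(self.variances):
            raise ValueError("values and variances must have the same length")

    @property
    def uncertainties(self):
        """One-sigma uncertainties, i.e. square roots of the variances."""
        return tuple(math.sqrt(v) for v in self.variances)

    def _combine(self, other, sign):
        # Refuse to mix units or lengths rather than guessing a conversion.
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit!r} vs {other.unit!r}")
        if len(self.values) != len(other.values):
            raise ValueError("datasets have different lengths")
        values = tuple(a + sign * b for a, b in zip(self.values, other.values))
        # Uncorrelated errors: variances add for both sum and difference.
        variances = tuple(a + b for a, b in zip(self.variances, other.variances))
        return Measured(values, variances, self.unit)

    def __add__(self, other):
        return self._combine(other, +1)

    def __sub__(self, other):
        return self._combine(other, -1)

    def scale(self, factor):
        """Multiply by an exact (error-free) scalar; variances scale by factor**2."""
        return Measured(tuple(factor * v for v in self.values),
                        tuple(factor ** 2 * v for v in self.variances),
                        self.unit)


sample = Measured((10.0, 8.0), (4.0, 1.0), "1/cm")
background = Measured((1.0, 1.0), (0.25, 0.25), "1/cm")
corrected = sample - background
print(corrected.values)                 # (9.0, 7.0)
print(corrected.variances)              # (4.25, 1.25)
print(sample.scale(2.0).uncertainties)  # (4.0, 2.0)
```

Storing variances rather than standard deviations keeps the propagation rules simple; a real representation would also need correlated errors, unit conversion and arbitrary metadata.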

Large datasets are a problem, e.g. tomography/imaging, where the whole file can't be read into memory. So we need lazy data loading.

  • nexpy supports network streaming / reading online rather than having to load whole dataset
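Lazy loading means opening a handle to the data and reading only the part that is indexed. HDF5 libraries such as h5py already behave this way when a dataset is sliced; the stdlib-only sketch below shows the same idea for a hypothetical headerless raw file of native-endian float32 frames (the file layout and the class name LazyFrameStack are assumptions for illustration).

```python
import array
import os
import tempfile


class LazyFrameStack:
    """Index a raw stack of float32 frames, reading one frame per access.

    Hypothetical layout: frames stored back to back as native-endian float32,
    each rows * cols values, with no header.
    """

    def __init__(self, path, rows, cols):
        self.path = path
        self.pixels = rows * cols
        self.frame_bytes = self.pixels * array.array("f").itemsize
        size = os.path.getsize(path)
        if size % self.frame_bytes:
            raise ValueError("file size is not a whole number of frames")
        self.n_frames = size // self.frame_bytes

    def __len__(self):
        return self.n_frames

    def __getitem__(self, index):
        if not -self.n_frames <= index < self.n_frames:
            raise IndexError(index)
        index %= self.n_frames  # allow negative indices, like a list
        frame = array.array("f")
        with open(self.path, "rb") as fh:
            fh.seek(index * self.frame_bytes)  # jump straight to the frame
            frame.fromfile(fh, self.pixels)    # read only this frame
        return frame


# Usage: write three 2x2 frames to a temporary file, then read one back.
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    array.array("f", range(12)).tofile(tmp)
try:
    stack = LazyFrameStack(tmp.name, rows=2, cols=2)
    n_frames = len(stack)
    second_frame = list(stack[1])
finally:
    os.remove(tmp.name)
print(n_frames, second_frame)  # 3 [4.0, 5.0, 6.0, 7.0]
```

Only the file size is inspected when the stack is opened, so memory use depends on the frame being read, not on the size of the file.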

Multiple detectors - multiple kinds of detector (Tim)

Not only do we have multiple detectors, but we might have multiple types of detector running at different frame rates, e.g. an optical camera and an X-ray detector working at different frequencies, giving more frames from one than the other.

  • SAXS + WAXS at DESY - just write two NeXus files to avoid the problem. Server issues and I/O
  • I22 - timeframe generation (one file) + two separate detector files. Acquisition gives you a header nexus file linked to the others. Currently looking at stacking in the right order in data structures - slowest at top.
  • ESS and other spallation sources - all timestamped so do post-hoc correlations.
  • Pete - timestamping at synchrotrons will be coming. XPCS can use this.
  • Brian : Visualisation
    • Time slider bars - showing last collected image.
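With every frame timestamped, detectors running at different rates can be correlated after the fact. One simple approach, sketched below as an assumption rather than any facility's actual method, is nearest-neighbour matching: for each frame of the slower detector, find the closest frame of the faster one. It assumes both timestamp lists are sorted and come from a shared clock; the function name nearest_frames is hypothetical.

```python
from bisect import bisect_left


def nearest_frames(reference_times, other_times):
    """For each reference timestamp, return the index of the nearest timestamp
    in other_times. Both sequences must be sorted and share one clock;
    ties go to the earlier frame."""
    if not other_times:
        raise ValueError("other_times must not be empty")
    matches = []
    for t in reference_times:
        i = bisect_left(other_times, t)  # first index with other_times[i] >= t
        if i == 0:
            matches.append(0)
        elif i == len(other_times):
            matches.append(i - 1)
        else:
            before, after = other_times[i - 1], other_times[i]
            matches.append(i - 1 if t - before <= after - t else i)
    return matches


# X-ray detector at 1 Hz, optical camera at 4 Hz with a 100 ms offset (times in ms).
xray_ms = [0, 1000, 2000]
camera_ms = [100, 350, 600, 850, 1100, 1350, 1600, 1850, 2100]
print(nearest_frames(xray_ms, camera_ms))  # [0, 4, 8]
```

Binary search keeps each lookup logarithmic in the number of camera frames; averaging or binning the faster stream over each slow exposure window would be an alternative when a single matched frame is not representative.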

Metadata for Machine Learning

  • Datasets generated by Diamond for a machine learning project should be put online for others to use (3 TB issue!)
  • Metadata from sample is required for most machine learning applications that would be useful to users.
  • XPDF users need the information about the sample to be able to process data at all (same for liquids neutron diffraction).
  • Need to move on from basics such as simple dimensions etc (Pete) - however, we don't know what we need to know to feed the machine learning algorithms.
  • What are the users going to give us?
    • Very little ... without encouragement
    • Use a header NeXus file with links to the raw detector data. Basically a digital logbook.
    • Pete - shouldn't limit metadata for edge cases. Need to have the metadata possible and turn off as needed.
    • Pete - automated addition of data from proposal/safety form
    • Brian - adding analyte composition and density etc.
    • Pete - issue with NXsample : requires chemical formula, not composition.
    • Some of this is easy at low throughput. Need an efficient way of entering the information.
    • Should look at what the MX (macromolecular crystallography) community does.
    • Needs to be end-to-end. Has to be a benefit to the user.
    • Machine learning : Pete - need some samples with bad metadata to teach the algorithm!
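As an illustration of the composition-versus-formula point and of catching incomplete records early, here is a minimal sketch of a sample metadata check. The field names (name, composition as mass fractions, density_g_cm3) and the function check_sample_metadata are hypothetical placeholders, not NXsample or NXcanSAS fields; a real list of standard names is what the Actions below propose.

```python
import math

# Hypothetical fields a facility might request for every sample.
REQUIRED_FIELDS = ("name", "composition", "density_g_cm3")


def check_sample_metadata(sample):
    """Return a list of human-readable problems with a sample metadata dict.

    `composition` is assumed to map component names to mass fractions,
    which lets mixtures be described where a single formula would not.
    """
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if sample.get(field) in (None, "", {})]
    composition = sample.get("composition") or {}
    if composition:
        total = sum(composition.values())
        if not math.isclose(total, 1.0, abs_tol=1e-6):
            problems.append(f"mass fractions sum to {total:.3f}, not 1")
    return problems


good = {"name": "protein buffer", "composition": {"H2O": 0.95, "NaCl": 0.05},
        "density_g_cm3": 1.03}
bad = {"name": "mystery gel", "composition": {"PEG": 0.3, "H2O": 0.6}}
print(check_sample_metadata(good))  # []
print(check_sample_metadata(bad))
# ['missing field: density_g_cm3', 'mass fractions sum to 0.900, not 1']
```

Records that fail the check could be flagged for the user at submission time, or kept deliberately as the "bad metadata" examples mentioned above.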

Actions

  • Implementation of a SESANSdata group equivalent to SASdata : Adam W to develop proposal
  • Linking of detector metadata to sasdata entries. Jeff K to develop proposal
  • Writing of notes with examples for NXcanSAS usage. Should be included in the NXcanSAS documentation or on cansas.org. Needs to be more than just developer docs : Action for all canSAS participants who have examples of how they have used NXcanSAS for specific cases.
  • Build a list of suggested metadata with standard names. Including how to build sample descriptions. Brian P to write a proposal for sample relevant metadata and to circulate to Data Formats group