Talk:cansas multid

From canSAS

Note: please sign and date your contributions. Reverse chronological order works for now. Keep this line on the top of the page.

Discussion on the standard for reduced small-angle scattering data with multi-dimensional data

data structures

  • 2008-05-08, Pete Jemian

The main difference between the cansas1d/1.0 and the multi-dimensional formats is the need to handle multi-dimensional data. The table structure used in the 1D format is quite 'tag-heavy' and just will not extend easily. While the table format makes it easy to use XSLT v1 to display the data in a browser, it takes a little bit more work to extract the I(Q) data into vectors for use in processing or analysis software.

We need to define some new data structures. Rather than create anew, I chose to model the IgorPro data structures. Here's a quick sampler:

<!-- examples of the basic data types -->
<text name="shape">round</text>
<scalar name="wavelength" unit="pm">100.00</scalar>
<vector name="Q" unit="1/A" count="5">0.04 0.05 0.06 0.07 0.08</vector>
<array name="myArr" unit="none" rows="2" columns="3">
	11 12 13 
	21 22 23
</array>
<matrix name="myMatrix" unit="none" dimension="3" count="2 4 3">
	111 112 113 
	121 122 123 
	131 132 133 
	141 142 143

	211 212 213 
	221 222 223 
	231 232 233
	241 242 243
</matrix>

There's more possible here but this just gives an idea.
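
As a rough illustration of how a reader might turn these elements into in-memory structures, here is a minimal Python sketch using only the standard library. The function names (`read_values`, `reshape`, `read_vector`, `read_array`, `read_matrix`), the `<sampler>` wrapper element, and the row-major reading of `count="2 4 3"` (last index varies fastest) are assumptions made for the example, not part of any agreed format.

```python
import xml.etree.ElementTree as ET

# The sampler from above, wrapped in a hypothetical <sampler> root element
SAMPLE = """<sampler>
  <vector name="Q" unit="1/A" count="5">0.04 0.05 0.06 0.07 0.08</vector>
  <array name="myArr" unit="none" rows="2" columns="3">
    11 12 13
    21 22 23
  </array>
  <matrix name="myMatrix" unit="none" dimension="3" count="2 4 3">
    111 112 113  121 122 123  131 132 133  141 142 143
    211 212 213  221 222 223  231 232 233  241 242 243
  </matrix>
</sampler>"""


def read_values(elem):
    """Return the whitespace-separated numbers of an element as floats."""
    # itertext() also collects text that follows an embedded XML comment
    return [float(tok) for tok in "".join(elem.itertext()).split()]


def reshape(values, dims):
    """Nest a flat list into len(dims) levels, assuming row-major order."""
    inner = 1
    for d in dims[1:]:
        inner *= d
    if len(values) != dims[0] * inner:
        raise ValueError("expected %d values, found %d"
                         % (dims[0] * inner, len(values)))
    if len(dims) == 1:
        return list(values)
    return [reshape(values[i * inner:(i + 1) * inner], dims[1:])
            for i in range(dims[0])]


def read_vector(elem):
    return reshape(read_values(elem), [int(elem.get("count"))])


def read_array(elem):
    dims = [int(elem.get("rows")), int(elem.get("columns"))]
    return reshape(read_values(elem), dims)


def read_matrix(elem):
    dims = [int(n) for n in elem.get("count").split()]
    if int(elem.get("dimension")) != len(dims):
        raise ValueError("dimension attribute does not match count")
    return reshape(read_values(elem), dims)


root = ET.fromstring(SAMPLE)
q = read_vector(root.find("vector"))
arr = read_array(root.find("array"))
mat = read_matrix(root.find("matrix"))
print(len(q), arr[1][0], mat[1][3][2])
```

The declared sizes double as a consistency check: a file whose value count disagrees with `count`, `rows`/`columns`, or `dimension` raises an error instead of being silently misread.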

Also, the Title, Run, Date, etc. can be rolled up into attributes such as

<SASentry date="2008-05-07" name="name-for-entry">
  <SASdata 
    name="cansas2-example" 
    date="2008-05-04" 
    time="18:00:00"
    instrument="Dell-D830" 
    run="1" 
    source="simulated"
>
<!-- ... -->

Here the instrument="Dell-D830" attribute will associate this SASdata with the similarly named

<SASinstrument name="Dell-D830" />

element. Similar for the source="simulated" attribute: it identifies

<SASsource name="simulated" />

Run is used (in practice) as an attribute of a specific SASdata. (Note that order of attributes is not important to the syntax of a well-formed XML file.)
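
A minimal sketch of how reading software might follow these name references, assuming the referenced elements live somewhere inside the same SASentry (their actual placement has not been decided here). The `resolve` helper and the closed-off sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete fragment; where the referenced elements sit is assumed
SAMPLE = """<SASentry date="2008-05-07" name="name-for-entry">
  <SASdata name="cansas2-example" date="2008-05-04" time="18:00:00"
           instrument="Dell-D830" run="1" source="simulated" />
  <SASinstrument name="Dell-D830" />
  <SASsource name="simulated" />
</SASentry>"""


def resolve(entry, data, attribute, tag):
    """Return the <tag> element in entry whose name equals data's attribute."""
    wanted = data.get(attribute)
    for candidate in entry.iter(tag):
        if candidate.get("name") == wanted:
            return candidate
    return None  # no element with a matching name


entry = ET.fromstring(SAMPLE)
data = entry.find("SASdata")
instrument = resolve(entry, data, "instrument", "SASinstrument")
source = resolve(entry, data, "source", "SASsource")
print(instrument.get("name"), source.get("name"))
```

Because the lookup is by attribute value rather than by position, it does not depend on the order of attributes or of sibling elements.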

brief example

  • 2008-05-06, Pete Jemian

Here is an abbreviated example (lots of data points removed) of some 1-D data in the multi-dimensional XML format:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cansas2.xsl"?>
<!--
file:    bimodal.xml
Author:    Pete Jemian <jemian@anl.gov>
Revision:  $Id$

Test data for small-angle scattering size distribution determination routines
Calculated small-angle scattering from model size distribution composed of
two log-normal size distributions described below.

-->
<SASroot version="2.0a"
  xmlns="http://www.smallangles.net/cansas/2.0a"
  xsi:schemaLocation="http://www.smallangles.net/cansas/2.0a cansas2.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:cansas="http://www.smallangles.net/cansas/2.0a"
>

  <SASentry name="SAS bimodal test1" date="1992-01-31">
    <SASdata name="simulated small-angle scattering data">
      <vector name="Q" unit="1/A" count="4">0.004015714 0.004540865 0.005009597 <!-- ... --> 0.3850296</vector>
      <vector name="I" unit="1/cm" count="4">3497.473 3340.003 3322.474 <!-- ... --> 0.110684</vector>
      <vector name="Idev" unit="1/cm" count="4">90.72816 84.95314 79.63133 <!-- ... --> 0.010393647</vector>
    </SASdata>
    <SASsample name="bimodal-test1">
      <comment> # comments from FORTRAN source code                </comment>
      <comment> # ------------------------------------------------------------------------  </comment>
      <comment> #    Calculated (bimodal) test distribution.              </comment>
      <comment> #    created 31 January 1992 by Pete R. Jemian            </comment>
      <comment> # ------------------------------------------------------------------------  </comment>
      <comment> # Model consists of a bimodal, log-normal volume fraction        </comment>
      <comment> #   size distribution.  Parameters are as follows:          </comment>
      <comment> #    PARAMETER ( contrast = 100.0 )  ! * 10^20, 1/cm^4          </comment>
      <comment> #    PARAMETER ( Background = 0.1 )  ! 1/cm              </comment>
      <comment> #    PARAMETER ( sMult = 1000. )    ! counts per 1/cm (for shot noise)    </comment>
      <comment> #    PARAMETER ( sNoise = 0.025 )    ! minimum level          </comment>
      <comment> # ------------------------------------------------------------------------  </comment>
      <comment> #    !     Vf      rBar(A)   sDev(A)              </comment>
      <comment> #    PARAMETER ( a1 = 0.012, c1 =  75., s1 = 15. )           </comment>
      <comment> #    PARAMETER ( a2 = 0.008, c2 = 180., s2 = 60. )           </comment>
      <comment> #    !    Vf   : volume fraction              </comment>
      <comment> #    !    rBar : peak center (A)              </comment>
      <comment> #    !    sDev : peak half-width (A)              </comment>
      <comment> # ------------------------------------------------------------------------  </comment>
    </SASsample>
    <SASinstrument name="simulated SAS calculation">
      <SASsource name="simulated monochromatic source">
        <radiation>artificial</radiation>
        <scalar name="wavelength" unit="A">1.00</scalar>
      </SASsource>
      <SAScollimation />
      <SASdetector name="calculation" />
    </SASinstrument>
    <SASprocess name="create the SAS data" date="1992-01-31">
      <text name="shape">spheres</text>
      <scalar name="Vf1" unit="dimensionless">0.012</scalar>
      <scalar name="rBar1" unit="A">75</scalar>
      <scalar name="sDev1" unit="A">15</scalar>
      <scalar name="Vf2" unit="dimensionless">0.008</scalar>
      <scalar name="rBar2" unit="A">180</scalar>
      <scalar name="sDev2" unit="A">60</scalar>
      <scalar name="contrast" unit="1/cm^4">100E20</scalar>
      <scalar name="Background" unit="1/cm">0.1</scalar>
      <scalar name="sMult" unit="cts/cm">1000.0</scalar>
      <scalar name="sNoise" unit="fraction">0.025</scalar>
      <comment>construct distribution from two log-normal volume-fraction size distributions</comment>
      <comment>Vf : volume fraction</comment>
      <comment>rBar : peak radius</comment>
      <comment>sDev : peak half-width</comment>
      <dataset name="initial size distribution">
        <vector name="D" unit="A" count="4">25 61.5 73.345 <!-- ... --> 880.6937</vector>
        <vector name="f" unit="1/A" count="4">9.07952E-11 7.73813E-09 2.65831E-07 <!-- ... --> 7.25662E-07</vector>
      </dataset>
    </SASprocess>
  </SASentry>
</SASroot>
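
For what it is worth, here is a short Python sketch of pulling the I(Q) columns out of a file like the one above with the standard library. It is only an illustration: the `read_columns` helper and the `c` prefix are made up for the example, and the sample is cut down to the SASdata block. The namespace URI is the one declared on SASroot.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <SASroot>; the "c" prefix is local to this script
NS = {"c": "http://www.smallangles.net/cansas/2.0a"}

SAMPLE = """<SASroot version="2.0a" xmlns="http://www.smallangles.net/cansas/2.0a">
  <SASentry name="SAS bimodal test1" date="1992-01-31">
    <SASdata name="simulated small-angle scattering data">
      <vector name="Q" unit="1/A" count="4">0.004015714 0.004540865 0.005009597 <!-- ... --> 0.3850296</vector>
      <vector name="I" unit="1/cm" count="4">3497.473 3340.003 3322.474 <!-- ... --> 0.110684</vector>
      <vector name="Idev" unit="1/cm" count="4">90.72816 84.95314 79.63133 <!-- ... --> 0.010393647</vector>
    </SASdata>
  </SASentry>
</SASroot>"""


def read_columns(sasdata):
    """Map each vector's name to a (unit, list of floats) pair."""
    columns = {}
    for vec in sasdata.findall("c:vector", NS):
        # join all text pieces so values after an embedded comment are kept
        values = [float(tok) for tok in "".join(vec.itertext()).split()]
        if len(values) != int(vec.get("count")):
            raise ValueError("count mismatch in vector %r" % vec.get("name"))
        columns[vec.get("name")] = (vec.get("unit"), values)
    return columns


root = ET.fromstring(SAMPLE)
sasdata = root.find("c:SASentry/c:SASdata", NS)
columns = read_columns(sasdata)
for name, (unit, values) in columns.items():
    print(name, unit, len(values))
```

Note that every element lookup has to be namespace-qualified; a plain `find("SASentry")` would return nothing for this file.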

comments below were culled from email discussion

  • 2008-05-06, Ron Ghosh

I think this is where I came in, or when the ILL appreciated that other sites were competitive with ILL-SANS.

NeXus should have started with these voluminous problems of TREATED data (RG at IPNS, Argonne 1995 or was it '96....), and the commensurate simplification of minimal instrument requirements. We have benefited a little from their instrument dictionary. The time is now ripe to show the use of HDF for storing multi-dimensional data, using only pure HDF5 and able to ignore the need for compatibility with the multifarious raw data variants wrapped around with NeXus.

The arguments for tools and visualisers have been well ventilated, and there really are advantages in the compression and fast indexing offered by HDF5, and in its long-term support by NCSA.

  • 2008-05-06, Stephen King

[Quick reply for now]

Agree we need to make a start on more complicated data. On this I would like to draw attention to Pete's term 'multi-dimensional' data rather than Andrew's '2D' data. We must address data beyond simple 2D detector patterns; so kinetic/time-resolved stuff (frames, periods, whatever you call them), and other experiments where data is being collected as a function of some variable (temperature, pH, or whatever) - what NeXus calls 'scanned' variables.

And before anyone asks, for 'multi-dimensional' data I am quite happy to abandon human-readability! I'm even prepared to consider binary storage!


  • 2008-05-04, Pete Jemian

Thanks for kicking this off. I have a rough outline of a multi-dimensional canSAS format. Multi-dimensional data must provide for any reasonable number of dimensions. Definitely, it is a vector format if XML is used.

The multi-dimensional format should reduce some inconsistencies discovered while working with the 1D format. These are little things: for example, SASsample has an element called ID, but other elements such as SASentry use an element called Title or even an attribute called name. For consistency, these should all be changed to an attribute called name. It is these little differences that make coding an interface program more difficult.

But we should also consider how different we are from NeXus, and why. NeXus has two underlying formats, HDF (v4 and v5) and also XML. An XSLT transformation from cansas1d/1.0 to NeXus XML can be built if desired. Also, Freddie Akeroyd has added some capabilities to NeXus XML to accept our table format. The rough outline of a multi-dimensional canSAS format I've developed is so close to NeXus that we need to consider strongly why we would develop our own and not use what many other facilities have adopted.

Our cansas1d/1.0 standard is defined by an XML Schema (XSD) and this is a major strength. NeXus uses meta-DTDs. How easy is it to try to define NeXus using XSDs? Doesn't look that easy. I've found a great reason this last week to have a standard that is defined by an XSD: autogeneration of a data structure in a particular programming language by software that reads the XSD directly. For Java, it is JAXB. By a Google search, there are such things for Python as well. I've been testing the Java one and it makes Java coding very easy. Non-compliant files will be identified (with rather cryptic diagnostics - maybe that could be improved).

By the way, the JAXB technology makes it clear why a vector format would make input/output much simpler. In our table format, each Q[i], I[i], etc. is a separate leaf in the data structure tree. The "leaves" of the SASdata "branch" need to be collated into structures of double[] Q, double[] I before they can actually be used in reasonable work. The IgorPro reader has a similar problem. It takes 5 minutes to parse the many thousand elements in the 1998spheres.xml example. A vector format of that data loads in IgorPro within a quite reasonable time (forgot the numbers) using Andrew Nelson's XMLutils XOP.
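
To make the collation point concrete, here is a small Python sketch that reads the same three points from both layouts. The table layout (one Idata element per point, with Q, I and Idev leaves) follows my understanding of cansas1d/1.0; the helper names are hypothetical, and the numbers are the first three points of the bimodal example. It says nothing about timing, it only shows the extra bookkeeping.

```python
import xml.etree.ElementTree as ET

# Table layout in the style of cansas1d/1.0: one <Idata> row per data point
TABLE = """<SASdata>
  <Idata><Q unit="1/A">0.004015714</Q><I unit="1/cm">3497.473</I><Idev unit="1/cm">90.72816</Idev></Idata>
  <Idata><Q unit="1/A">0.004540865</Q><I unit="1/cm">3340.003</I><Idev unit="1/cm">84.95314</Idev></Idata>
  <Idata><Q unit="1/A">0.005009597</Q><I unit="1/cm">3322.474</I><Idev unit="1/cm">79.63133</Idev></Idata>
</SASdata>"""

# Proposed vector layout: one element per quantity
VECTOR = """<SASdata>
  <vector name="Q" unit="1/A" count="3">0.004015714 0.004540865 0.005009597</vector>
  <vector name="I" unit="1/cm" count="3">3497.473 3340.003 3322.474</vector>
  <vector name="Idev" unit="1/cm" count="3">90.72816 84.95314 79.63133</vector>
</SASdata>"""


def columns_from_table(sasdata):
    """Collate the per-point leaves into one list per quantity."""
    columns = {}
    for row in sasdata.findall("Idata"):
        for leaf in row:  # one element object per number
            columns.setdefault(leaf.tag, []).append(float(leaf.text))
    return columns


def columns_from_vectors(sasdata):
    """Each quantity is already one element; just split its text."""
    return {vec.get("name"): [float(tok) for tok in vec.text.split()]
            for vec in sasdata.findall("vector")}


table_columns = columns_from_table(ET.fromstring(TABLE))
vector_columns = columns_from_vectors(ET.fromstring(VECTOR))
print(table_columns == vector_columns)
```

In the table layout the parser creates one element per number (plus one per row), so the element count grows with the number of points; in the vector layout it stays fixed at one element per quantity.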


  • 2008-05-04, Andrew Jackson

The SNS software group is working with the DANSE project on getting their reduced data into the DANSE software. Paul Butler has suggested that making use of the canSAS format would be a good idea. However 2D data needs to be represented.

Thus I suggest two things. Firstly, with the 1D format agreed, we should look at 2D data. AFAICS we can do this by introducing the vector format proposed by Andrew N and Pete with little other pain - am I right, Pete? Secondly, how do you all feel about adding Michael Reuter to this group to get his input on the 2D format, since he will be implementing it in short order?

To head off the argument, I realize that this 2D implementation won't be easily readable without using an XML reader, nor will it be "human readable". However, we only aimed to make a 1D format that was those two things and I think they are unreasonable for a 2D format.