INGOR
ytData Class Reference

General data container.

#include <util/ytData.h>

Public Member Functions

#define ytData_TYPE_REAL
 variable type representing real values.
 
#define ytData_TYPE_ORDINAL
 variable type representing integer ordinal values.
 
#define ytData_TYPE_CATEGORICAL
 variable type representing categorical discrete values.
 
#define ytData_TYPE_DISCRETE
 variable type representing integer discrete values.
 
ytData * ytData_new ()
 Generates an empty ytData instance.
 
void ytData_delete (ytData *this)
 Deletes the ytData instance.
 
int ytData_numSamples (const ytData *this)
 Returns the number of samples (ytData::n).
 
int ytData_numVariables (const ytData *this)
 Returns the number of variables (ytData::p).
 
const char * ytData_typeName (int type)
 Returns the string expression of the type value.
 
void ytData_stat (const ytData *this, FILE *fp, int level)
 Prints or checks data statistics.
 
void ytData_print (ytData *this, FILE *fp)
 Prints the contents.
 
int ytData_getType (const ytData *this, int j)
 Returns the type of the variable.
 
int ytData_parseType (const char *name)
 Returns the type ID of the given type name.
 
const char * ytData_getName (const ytData *this, int j)
 Returns the name of the variable.
 
const char * ytData_getTypeName (const ytData *this, int j)
 Returns the string expression of the type of the specified variable.
 
int ytData_findName (const ytData *this, const char *name)
 Returns the index of the variable of the specified name.
 
void ytData_dynamic (ytData *this)
 Converts data for the dynamic model.
 
ytData * ytData_dynamic2 (const ytData *this, int *T)
 Generates time expanded data.
 
ytData * ytData_bootstrap (const ytData *this, ytRNG *rng, ytData *data)
 Performs the bootstrap resampling.
 
ytData * ytData_pseudoBootstrap (const ytData *this, ytRNG *rng, int blocks, ytData *data)
 Performs the pseudo bootstrap resampling for dynamic data.
 
ytData * ytData_pidBootstrap (const ytData *this, ytRNG *rng, int n, int F, ytData *data)
 Resampling primary IDs for the bootstrap method.
 
ytData * ytData_listBootstrap (const ytData *this, ytRNG *rng, int n, ytArray *listSet, int F, ytData *data)
 Resampling lists of primary IDs for the bootstrap method.
 
ytArray * ytData_readPrimaryIDList (const ytData *this, const char *file)
 Reads a primary ID list file.
 
void ytData_extractRange (const ytData *this, ytKeyValues *kv)
 Extracts value ranges.
 
void ytData_checkRange (const ytData *this, ytDoubleArray *xlar, ytDoubleArray *xrar)
 Checks if the range arrays are valid.
 
ytData * ytData_hybrid (ytData *this, int N)
 Generates a new ytData instance for static-dynamic hybrid model.
 
ytData * ytData_dehybrid (ytData *this, int N)
 De-hybridize time-extended static-dynamic hybrid data.
 
ytData * ytData_dbn (ytData *this, int T)
 Converts data for the time-expanded DBN model.
 
ytArray * ytData_collectPrimaryId (const ytData *this)
 Collects sample IDs with respect to the primary ID.
 
int ytData_maxSecondaryId (const ytData *this)
 Returns the maximum secondary ID of the samples.
 
const ytStrArray * ytData_getCategories (const ytData *this, int j)
 Returns the dictionary (categories) of the variable.
 
void ytData_convertAllToReal (ytData *this)
 Converts all values to real values.
 
void ytData_splitXY (ytData *this)
 Converts the data to explanatory/objective variable separated data.
 
ytData * ytData_selectVars (const ytData *this, const ytStrArray *names)
 Selects variables by their names.
 
int ytData_countNAN (const ytData *this)
 Counts the number of NaNs.
 
void ytData_MPI_Bcast (ytData **data, int root, MPI_Comm comm)
 Broadcasts the ytData instance with MPI.
 

Public Attributes

ytObject obj
 
int n
 The number of samples.
 
int p
 The number of variables.
 
double * X
 n x p explanatory data matrix.
 
double * Y
 n x p target data matrix.
 
ytStrArray * names
 Names of variables.
 
ytIntArray * types
 Value types of the variables. The j-th element represents the type ID of the j-th variable. The type ID is one of ytData_TYPE_REAL, ytData_TYPE_ORDINAL, ytData_TYPE_CATEGORICAL, and ytData_TYPE_DISCRETE.
 
ytKeyValues * sampleAttrs
 Attributes for samples. The value associated with each key is an array whose type depends on the attribute.
 
ytKeyValues * varAttrs
 Attributes for variables.
 
ytArray * dict
 Dictionary for categories. The elements are ytStrArray instances, and the j-th element is the dictionary for the j-th variable. If the variable is not categorical, the element must be NULL.
 
ytKeyValues * meta
 Meta data.
 

Detailed Description

General data container.

Predefined attributes

sample attributes

primaryid
[ytIntArray] Integer primary ID of samples. The primary ID generally identifies the same individual, gene, etc.
secondaryid
[ytIntArray] Integer secondary ID of samples. The secondary ID generally identifies the time point, year, etc.
See also
ytGDF

Member Function Documentation

◆ ytData_bootstrap()

ytData * ytData_bootstrap ( const ytData * this,
ytRNG * rng,
ytData * data
)

Performs the bootstrap resampling.

Currently, only "primaryid" and "secondaryid" are set in ytData::sampleAttrs.

If the original data has these IDs, the corresponding resampled IDs are set. If the original data does not have primary IDs, the resampled sample indices are set as the primary IDs. If the original data does not have secondary IDs, they are not set in the new data.

This does not set ytData::varAttrs and ytData::meta.

Parameters
this
rng
data  ytData instance where the bootstrap results are stored. If NULL, a new ytData instance is allocated and returned. Otherwise, the instance must be one previously returned by this function.
Returns
Bootstrapped data. Only some of the fields are set.
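
The resampling step can be sketched in standalone C. This is only an illustration under stated assumptions: `bootstrap_rows` is a hypothetical helper, not part of the ytData API, and it uses the standard `rand()` in place of ytRNG, whose interface is not described here. The index array `idx` plays the role the text gives to the primary ID when the original data has none.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ytData): draw n row indices with
 * replacement and copy the chosen rows of the n x p column-major matrix X
 * into out (also n x p, column-major). idx[i] records the source row of
 * output row i. rand() stands in for ytRNG (assumption). */
static void bootstrap_rows(const double *X, int n, int p,
                           double *out, int *idx, unsigned int seed)
{
    assert(n > 0 && p > 0);
    srand(seed);
    for (int i = 0; i < n; i++) {
        idx[i] = rand() % n;   /* slight modulo bias; acceptable for a sketch */
        for (int j = 0; j < p; j++)
            out[i + j * n] = X[idx[i] + j * n];
    }
}
```

Every output row is a copy of some input row, so the same sample may appear several times and others not at all.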

◆ ytData_checkRange()

void ytData_checkRange ( const ytData * this,
ytDoubleArray * xlar,
ytDoubleArray * xrar
)

Checks if the range arrays are valid.

If the ranges in the given arrays exceed the values in data, then this changes them.

◆ ytData_convertAllToReal()

void ytData_convertAllToReal ( ytData * this)

Converts all values to real values.

This converts the types of all variables to real (ytData_TYPE_REAL). Categorical values are converted to the integer values of their internal indices.

Parameters
this  ytData instance.

◆ ytData_countNAN()

int ytData_countNAN ( const ytData * this)

Counts the number of NaNs.

This counts the number of NaNs only in ytData::X.
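
As a rough illustration of what such a count involves, the following standalone sketch scans an n x p buffer laid out like ytData::X. The function `count_nan` is a hypothetical helper and not part of the ytData API.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of ytData): count the NaN entries in an
 * n x p matrix stored as one contiguous buffer of n * p doubles. */
static int count_nan(const double *X, int n, int p)
{
    assert(X != NULL && n >= 0 && p >= 0);
    int count = 0;
    for (int k = 0; k < n * p; k++) {
        if (isnan(X[k]))
            count++;
    }
    return count;
}
```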

◆ ytData_dbn()

ytData * ytData_dbn ( ytData * this,
int T
)

Converts data for the time-expanded DBN model.

This converts data with p × T variables to p variables with T samples. If the original data set has N samples, these are regarded as data for different primary IDs.

Parameters
this
T

◆ ytData_dehybrid()

ytData * ytData_dehybrid ( ytData * this,
int N
)

De-hybridize time-extended static-dynamic hybrid data.

Note: The current implementation supports only N = 2.

Parameters
this
N  depth

◆ ytData_dynamic()

void ytData_dynamic ( ytData * this)

Converts data for the dynamic model.

This replaces ytData::n, ytData::X, ytData::Y.

In addition, the sample attributes primaryid and secondaryid are replaced. Note that other sample attributes are not set after calling this.

◆ ytData_dynamic2()

ytData * ytData_dynamic2 ( const ytData * this,
int * T
)

Generates time expanded data.

This converts the given data to time-expanded data, where each time point (secondary ID) of a variable is regarded as a different variable.

If the data contains T time points with p variables, new data with T × p variables is generated. The new data therefore has P samples, whereas the old data set has P × T samples, where P is the number of unique primary IDs.

Note that this assumes that each primary ID has the same lengths of time points (secondary IDs).

New sample attributes for P samples are taken from the old ones at the first time point (secondary ID) of the particular primary IDs.

P is identical to the number of samples in the new data set.

Parameters
[in] this  ytData instance to convert.
[out] T  the number of time points (secondary IDs).
Returns
Generated ytData instance. The number of secondary IDs is returned through T; the number of primary IDs is the number of samples of the generated instance.
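
The reshaping described above can be sketched on a plain column-major buffer. Both orderings used here are assumptions for illustration and are not taken from the library: input rows are assumed to be grouped by primary ID with time points ascending (row = k * T + t), and output column t * p + j is assumed to hold variable j at time t. `time_expand` is a hypothetical helper, not part of the ytData API.

```c
#include <assert.h>

/* Hypothetical helper (not part of ytData): turn a (P*T) x p column-major
 * matrix X into a P x (T*p) column-major matrix out.
 * Assumed layout: input row k*T + t is primary ID k at time t;
 * output column t*p + j is variable j at time t. */
static void time_expand(const double *X, int P, int T, int p, double *out)
{
    assert(P > 0 && T > 0 && p > 0);
    int n = P * T;   /* number of samples in the old data */
    for (int k = 0; k < P; k++)
        for (int t = 0; t < T; t++)
            for (int j = 0; j < p; j++)
                out[k + (t * p + j) * P] = X[(k * T + t) + j * n];
}
```

With P = 2, T = 2 and p = 1, the four input samples become two samples with two variables each, one per time point.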

◆ ytData_extractRange()

void ytData_extractRange ( const ytData * this,
ytKeyValues * kv
)

Extracts value ranges.

This extracts the minimum and maximum values for each variable, and stores them as ytDoubleArray instances. The arrays are set in the given ytKeyValues instance as values with keys "xl" and "xr".

Note that this does not consider extra outer regions. This simply searches for the max and min of each variable.

This is to fix parent value ranges for B-spline modeling when bootstrapping.
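
A minimal standalone sketch of the per-variable min/max search follows. `column_ranges` is a hypothetical helper, not part of the ytData API; the output arrays `xl` and `xr` are named after the keys mentioned above, and skipping NaN entries is an assumption of this sketch rather than documented behavior.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not part of ytData): for each of the p columns of
 * the n x p column-major matrix X, store the minimum in xl[j] and the
 * maximum in xr[j]. NaN entries are skipped (assumption). */
static void column_ranges(const double *X, int n, int p,
                          double *xl, double *xr)
{
    assert(n > 0 && p > 0);
    for (int j = 0; j < p; j++) {
        xl[j] = INFINITY;
        xr[j] = -INFINITY;
        for (int i = 0; i < n; i++) {
            double v = X[i + j * n];
            if (isnan(v))
                continue;
            if (v < xl[j]) xl[j] = v;
            if (v > xr[j]) xr[j] = v;
        }
    }
}
```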

◆ ytData_findName()

int ytData_findName ( const ytData * this,
const char * name
)

Returns the index of the variable of the specified name.

Returns
-1 if not found.
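
The lookup contract (index on success, -1 otherwise) can be illustrated with a plain array of C strings. `find_name` is a hypothetical helper, not part of the ytData API; a linear search is assumed here, and entries may be NULL because a variable name may be unset.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of ytData): return the index of the first
 * entry equal to name, or -1 if no entry matches. NULL entries (unset
 * names) are skipped. */
static int find_name(const char *const *names, int p, const char *name)
{
    assert(name != NULL);
    for (int j = 0; j < p; j++) {
        if (names[j] != NULL && strcmp(names[j], name) == 0)
            return j;
    }
    return -1;
}
```

Callers should test the result against -1 before using it as a column index.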

◆ ytData_getName()

const char * ytData_getName ( const ytData * this,
int j
)

Returns the name of the variable.

If the name is not set, this returns NULL.

Parameters
this
j  index of the variable.
Returns
variable name, or NULL if not available.

◆ ytData_getType()

int ytData_getType ( const ytData * this,
int j
)

Returns the type of the variable.

Parameters
this
j  index
Returns
type of the j-th variable.

◆ ytData_hybrid()

ytData * ytData_hybrid ( ytData * this,
int N
)

Generates a new ytData instance for static-dynamic hybrid model.

The order of the new data set is t=0, t=-1, ..., t=-N.

Parameters
N  Specifies that the new data covers the time points T-N to T-0.

◆ ytData_listBootstrap()

ytData * ytData_listBootstrap ( const ytData * this,
ytRNG * rng,
int n,
ytArray * listSet,
int F,
ytData * data
)

Resampling lists of primary IDs for the bootstrap method.

Parameters
listSet  ytArray instance containing ytIntArray instances as its elements defining lists of primary IDs.
F  If true, checks the consistency of the lengths of the secondary IDs and the list lengths.

◆ ytData_maxSecondaryId()

int ytData_maxSecondaryId ( const ytData * this)

Returns the maximum secondary ID of the samples.

Note: This returns the maximum value of the internal, predefined secondaryid sample attribute.

Parameters
this  ytData instance.
Returns
Maximum value of the secondary ID (the predefined secondaryid sample attribute). If the attribute does not exist, 0 is returned.
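
The return contract can be sketched on a plain integer array standing in for the secondaryid attribute. `max_secondary_id` is a hypothetical helper, not part of the ytData API; a NULL array models a missing attribute, and secondary IDs are assumed to be non-negative so that 0 is a safe starting value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ytData): return the largest value in
 * secondaryid, or 0 when the array is absent (NULL). IDs are assumed to
 * be non-negative. */
static int max_secondary_id(const int *secondaryid, int n)
{
    assert(n >= 0);
    int max = 0;
    if (secondaryid == NULL)
        return 0;
    for (int i = 0; i < n; i++) {
        if (secondaryid[i] > max)
            max = secondaryid[i];
    }
    return max;
}
```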

◆ ytData_MPI_Bcast()

void ytData_MPI_Bcast ( ytData **  data,
int  root,
MPI_Comm  comm 
)

Broadcasts the ytData instance with MPI.

Parameters
[in,out] data  Pointer to the pointer to the ytData instance to send or receive.
root  Root rank in the communicator that broadcasts. Other ranks receive the broadcast data.
comm  MPI communicator.

◆ ytData_parseType()

int ytData_parseType ( const char *  name)

Returns the type ID of the given type name.

Parameters
name

◆ ytData_pidBootstrap()

ytData * ytData_pidBootstrap ( const ytData * this,
ytRNG * rng,
int n,
int F,
ytData * data
)

Resampling primary IDs for the bootstrap method.

The new "bootstrapped" ytData instance contains only three sample attributes: "primaryid", "secondaryid", and "orig_primaryid".

"orig_primaryid" keeps track of the original primary IDs.

Parameters
this
rng
n  Number of IDs to resample. If 0, the number of primary IDs in the original data set is used.
F  If true, checks the consistency of the lengths of the secondary IDs.
data  If NULL, a new ytData instance is generated and returned.
Returns
data, or a newly generated ytData instance, holding the resampled data.
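
Resampling whole primary IDs, rather than single rows, can be sketched as follows. Everything here is an assumption for illustration: `pid_bootstrap` is a hypothetical helper and not part of the ytData API, `rand()` stands in for ytRNG, every primary ID 0..P-1 is assumed to own exactly T consecutive rows, and the resampled blocks are assumed to be renumbered 0..P-1 while the original ID is kept separately, mirroring the "orig_primaryid" attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ytData): draw P primary IDs with
 * replacement. For each output row r it records the source row, the new
 * primary ID, and the original primary ID.
 * Assumed layout: primary ID k owns rows k*T .. k*T + T-1. */
static void pid_bootstrap(int P, int T, unsigned int seed,
                          int *src_row, int *primaryid, int *orig_primaryid)
{
    assert(P > 0 && T > 0);
    srand(seed);
    for (int b = 0; b < P; b++) {
        int k = rand() % P;            /* resampled original primary ID */
        for (int t = 0; t < T; t++) {
            int r = b * T + t;
            src_row[r] = k * T + t;    /* keep the whole time series together */
            primaryid[r] = b;
            orig_primaryid[r] = k;
        }
    }
}
```

Because each draw copies all T rows of the chosen ID, the time series of an individual is never split across bootstrap samples.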

◆ ytData_pseudoBootstrap()

ytData * ytData_pseudoBootstrap ( const ytData * this,
ytRNG * rng,
int blocks,
ytData * data
)

Performs the pseudo bootstrap resampling for dynamic data.

See also
ytData_bootstrap()

◆ ytData_readPrimaryIDList()

ytArray * ytData_readPrimaryIDList ( const ytData * this,
const char * file
)

Reads a primary ID list file.

Each line defines a list of sample names; the samples in a list are resampled together in the "list" bootstrap mode.

◆ ytData_selectVars()

ytData * ytData_selectVars ( const ytData * this,
const ytStrArray * names
)

Selects variables by their names.

If a given name is not found in the original ytData instance, this only prints a warning message. To detect this case, check the number of variables in the returned instance.

◆ ytData_splitXY()

void ytData_splitXY ( ytData * this)

Converts the data to explanatory/objective variable separated data.

This assumes that the given data set consists of separate sets of samples for the explanatory and objective variables. The first half of the samples (rows) is regarded as the explanatory variables and the second half as the objective variables. Thus, the number of samples is halved after applying this routine.

Parameters
this
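
The row split described above can be sketched on plain column-major buffers. `split_xy` is a hypothetical helper, not part of the ytData API; it writes into two caller-provided buffers instead of modifying a ytData instance in place, which is a simplification of this sketch.

```c
#include <assert.h>

/* Hypothetical helper (not part of ytData): split the n2 x p column-major
 * matrix D into X (first n2/2 rows) and Y (last n2/2 rows), each stored
 * column-major with n2/2 rows. n2 must be even. */
static void split_xy(const double *D, int n2, int p, double *X, double *Y)
{
    assert(n2 > 0 && n2 % 2 == 0 && p > 0);
    int m = n2 / 2;   /* number of samples after the split */
    for (int j = 0; j < p; j++) {
        for (int i = 0; i < m; i++) {
            X[i + j * m] = D[i + j * n2];
            Y[i + j * m] = D[(i + m) + j * n2];
        }
    }
}
```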

◆ ytData_stat()

void ytData_stat ( const ytData * this,
FILE * fp,
int level
)

Prints or checks data statistics.

Parameters
level  0 - only warnings.

◆ ytData_typeName()

const char * ytData_typeName ( int  type)

Returns the string expression of the type value.

Parameters
type  type value returned by ytData_getType().
Returns
type name of the specified type value.

Member Data Documentation

◆ X

double* ytData::X

n x p explanatory data matrix.

This is a column major matrix. The (i,j) element of X can be accessed by X[i + j * n] where n is the number of samples. Here columns represent variables and rows represent samples.

The values are stored as doubles, but they may also represent integers or category indices. The type of each variable is stored in the types field.
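
The indexing rule X[i + j * n] can be wrapped in small accessors, as in the following standalone sketch. `cm_get` and `cm_set` are hypothetical helpers, not part of the ytData API; they operate on any buffer laid out as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not part of ytData): read and write the (i, j)
 * element of a column-major matrix with n rows, where i is the sample
 * (row) index and j is the variable (column) index. */
static double cm_get(const double *X, int n, int i, int j)
{
    assert(i >= 0 && i < n && j >= 0);
    return X[(size_t)i + (size_t)j * (size_t)n];
}

static void cm_set(double *X, int n, int i, int j, double v)
{
    assert(i >= 0 && i < n && j >= 0);
    X[(size_t)i + (size_t)j * (size_t)n] = v;
}
```

Because the layout is column-major, iterating over i in the inner loop touches memory contiguously and walks through all samples of one variable.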

◆ Y

double* ytData::Y

n x p target data matrix.

Note that if X and Y are different, it is assumed that Y is allocated by malloc() and free() is called when deleting the instance.

