Package 'MMDai' reference manual

Title:	Multivariate Multinomial Distribution Approximation and Imputation for Incomplete Categorical Data
Description:	A method to impute missing values in categorical data. For details, see the paper <doi:10.4310/SII.2020.v13.n1.a2>.
Authors:	Chaojie Wang
Maintainer:	Chaojie Wang <[email protected]>
License:	GPL (>= 2)
Version:	2.0.0
Built:	2025-02-26 05:45:46 UTC
Source:	https://github.com/cran/MMDai

Generate random dataset

Description

This function generates random datasets that follow a mixture of product multinomial distributions.

Usage

GenerateData(
  n,
  p,
  d,
  k = 3,
  theta = rdirichlet(1, rep(10, k)),
  psi = InitialPsi(p, d, k)
)

Arguments

`n`	- number of samples
`p`	- number of variables
`d`	- a vector which denotes the number of categories for each variable. It could be distinct among variables.
`k`	- number of latent classes
`theta`	- vector of probabilities for the latent classes
`psi`	- probabilities of each category for each variable within each latent class

Value

data - generated random dataset, a matrix with n rows and p columns.

Examples

# dimension parameters
n<-200; p<-5; d<-rep(2,p);
# generate complete data
Complete<-GenerateData(n, p, d, k = 3)
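
The sampling scheme described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. It assumes that categories are coded 1..d[j] and that psi is stored as a list of p matrices with one row per latent class; the fixed theta values and the names z and sim are hypothetical.

```r
# Sketch: sample n rows from a mixture of product multinomials (base R only).
# ASSUMPTION: psi[[j]][h, c] = P(x_j = c | class h); categories coded 1..d[j].
set.seed(1)
n <- 200; p <- 5; d <- rep(2, p); k <- 3

# Class weights (hypothetical fixed values; they must sum to 1)
theta <- c(0.5, 0.3, 0.2)

# Random category probabilities: each row of psi[[j]] sums to 1
psi <- lapply(d, function(dj) {
  m <- matrix(runif(k * dj), nrow = k)
  m / rowSums(m)
})

# Step 1: draw a latent class for every row
z <- sample.int(k, n, replace = TRUE, prob = theta)

# Step 2: given the class, draw each variable independently
sim <- matrix(NA_integer_, nrow = n, ncol = p)
for (j in seq_len(p)) {
  for (i in seq_len(n)) {
    sim[i, j] <- sample.int(d[j], 1, prob = psi[[j]][z[i], ])
  }
}
```

Within a class the variables are independent, which is what makes the joint distribution a product of multinomials.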

Imputation

Description

This function is used to perform multiple imputation for missing data given the joint distribution.

Usage

Imputation(data, theta, psi)

Arguments

`data`	- incomplete dataset
`theta`	- vector of probability for each component
`psi`	- specific probability for each variable in each component

Value

ImputedData - the imputed dataset.
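
To illustrate what imputing "given the joint distribution" can look like, here is a rough base R sketch of a single imputation draw. It is not the package's code. It assumes categories coded 1..d[j] and psi stored as a list of matrices with one row per component; impute_once and the toy inputs are hypothetical.

```r
# Sketch: one imputation draw given theta and psi (base R only).
# ASSUMPTIONS: psi[[j]][h, c] = P(x_j = c | component h); categories coded
# 1..d[j]. impute_once is a hypothetical helper, not a package function.
impute_once <- function(data, theta, psi) {
  k <- length(theta)
  out <- data
  for (i in seq_len(nrow(data))) {
    obs <- which(!is.na(data[i, ]))
    mis <- which(is.na(data[i, ]))
    if (length(mis) == 0) next
    # Posterior component weights: theta_h times the probability of the
    # observed entries of row i under component h
    w <- theta
    for (j in obs) w <- w * psi[[j]][, data[i, j]]
    h <- sample.int(k, 1, prob = w)
    # Fill each missing entry from the drawn component's distribution
    for (j in mis) {
      out[i, j] <- sample.int(ncol(psi[[j]]), 1, prob = psi[[j]][h, ])
    }
  }
  out
}

set.seed(2)
theta <- c(0.6, 0.4)
psi <- list(rbind(c(0.9, 0.1), c(0.2, 0.8)),
            rbind(c(0.7, 0.3), c(0.1, 0.9)))
x <- rbind(c(1, NA), c(NA, 2), c(2, 2))
filled <- impute_once(x, theta, psi)
```

Repeating the draw several times yields multiple imputed datasets; observed entries are never altered.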

Initial psi

Description

This function creates a psi list in which each component has equal weight.

Usage

InitialPsi(p, d, k)

Arguments

`p`	- number of variables
`d`	- a vector which denotes the number of categories for each variable. It could be distinct among variables.
`k`	- number of components

Value

psi - a list in which each component has equal weight

Identify the suitable number of components k

Description

This function is used to find the suitable number of components k.

Usage

kIdentifier(data, d, TT = 1000, alpha = 0.25)

Arguments

`data`	- data in matrix format with n rows and p columns
`d`	- number of categories for each variable
`TT`	- number of iterations in the Gibbs sampler; the default value is 1000. TT should be an even number for 'burn-in'.
`alpha`	- hyperparameter that could be regarded as the pseudo-count of the number of samples in the new component

Value

k_est - posterior estimation of k

k_track - track of k in the iteration process
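
As a rough illustration of how a trace such as k_track can be summarised, the sketch below discards the first half of the iterations as burn-in and takes the most frequent remaining value. The trace is made up, and the "mode after burn-in" rule is an assumption for illustration; the package may compute k_est differently.

```r
# Sketch: summarise a trace of k after discarding the first half as burn-in.
# ASSUMPTIONS: k_track is a made-up trace; the post-burn-in mode is an
# illustrative estimator, not necessarily the rule used by kIdentifier().
k_track <- c(1, 2, 2, 3, 4, 3, 3, 3, 2, 3)
TT <- length(k_track)                      # even, so TT / 2 is an integer
kept <- k_track[(TT / 2 + 1):TT]           # drop the first TT / 2 iterations
counts <- table(kept)
k_mode <- as.integer(names(counts)[which.max(counts)])
k_mode
```

An even TT makes the split into burn-in and retained draws exact, which is one plausible reading of the note on the TT argument.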

Examples

# dimension parameters
n<-200; p<-5; d<-rep(2,p);
# generate complete data
Complete<-GenerateData(n, p, d, k = 3)
# mask 20% of the entries completely at random (MCAR)
Incomplete<-Complete
Incomplete[sample(1:(n*p),0.2*n*p,replace = FALSE)]<-NA
# k identify
K<-kIdentifier(data = Incomplete, d, TT = 10)
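
When masking cells completely at random, keep in mind that `:` binds tighter than `*` in R, so `1:n*p` evaluates as `(1:n)*p` and covers only n positions; the full index range is `1:(n*p)`. The base R sketch below shows the difference on a toy matrix; x, idx and masked are hypothetical names.

```r
# Sketch: masking 20% of the cells of a matrix completely at random (base R).
set.seed(3)
n <- 200; p <- 5
x <- matrix(sample.int(2, n * p, replace = TRUE), nrow = n, ncol = p)

# `:` binds tighter than `*`, so 1:n*p is (1:n)*p = p, 2p, ..., n*p
length(1:n * p)     # only n candidate positions
length(1:(n * p))   # all n * p cells

masked <- x
idx <- sample(seq_len(n * p), size = round(0.2 * n * p), replace = FALSE)
masked[idx] <- NA
mean(is.na(masked))  # fraction of masked cells
```

Using seq_len(n * p) sidesteps the precedence question entirely.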

Real application dataset

Description

This is a real application dataset. The original data come from the ratings dataset in Harper and Konstan (2016) <DOI:10.1145/2827872>. The dataset is used to evaluate the performance of the package in real applications.

Author(s)

Chaojie Wang

Estimate theta and psi in multinomial mixture model

Description

This function is used to estimate theta and psi in multinomial mixture model given the number of components k.

Usage

ParEst(data, d, k, TT = 1000)

Arguments

`data`	- data in matrix format with n rows and p columns
`d`	- number of categories for each variable
`k`	- number of components
`TT`	- number of iterations in the Gibbs sampler; the default value is 1000. TT should be an even number for 'burn-in'.

Value

theta - vector of probability for each component

psi - specific probability for each variable in each component

Examples

# dimension parameters
n<-200; p<-5; d<-rep(2,p);
# generate complete data
Complete<-GenerateData(n, p, d, k = 3)
# mask 20% of the entries completely at random (MCAR)
Incomplete<-Complete
Incomplete[sample(1:(n*p),0.2*n*p,replace = FALSE)]<-NA
# k identify
K<-kIdentifier(data = Incomplete, d, TT = 10)
Par<-ParEst(data = Incomplete, d, k = K$k_est, TT = 10)
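
The estimated parameters can then be passed on to Imputation(). The sketch below chains the documented functions end to end and runs only if the MMDai package is installed. It assumes that ParEst() returns a list with elements named theta and psi, as the Value section suggests; ran and Imputed are hypothetical variable names, and TT = 10 is far too small for real use.

```r
# Sketch of the full workflow; the body runs only if MMDai is installed.
# ASSUMPTION: ParEst() returns a list with elements `theta` and `psi`.
ran <- requireNamespace("MMDai", quietly = TRUE)
if (ran) {
  library(MMDai)
  set.seed(4)
  n <- 200; p <- 5; d <- rep(2, p)
  Complete <- GenerateData(n, p, d, k = 3)
  # mask 20% of the entries completely at random
  Incomplete <- Complete
  Incomplete[sample(seq_len(n * p), 0.2 * n * p, replace = FALSE)] <- NA
  # choose k, estimate the parameters, then impute
  K <- kIdentifier(data = Incomplete, d, TT = 10)
  Par <- ParEst(data = Incomplete, d, k = K$k_est, TT = 10)
  Imputed <- Imputation(data = Incomplete, theta = Par$theta, psi = Par$psi)
  str(Imputed)
}
```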

Generate random samples from the Dirichlet distribution

Description

This function generates random samples from a Dirichlet distribution.

Usage

rdirichlet(n = 1, alpha = c(1, 1))

Arguments

`n`	- sample size
`alpha`	- parameters in Dirichlet distribution

Value

out - generated data

Examples

# draw 10 samples from a Dirichlet distribution with alpha = (1, 1, 1)
rdirichlet(n=10,alpha=c(1,1,1))
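
For readers unfamiliar with Dirichlet sampling, one standard construction normalises independent Gamma draws so that each row sums to 1. The base R sketch below shows that construction; rdirichlet_sketch is a hypothetical name, and the package's own rdirichlet() is not necessarily implemented this way.

```r
# Sketch: Dirichlet draws via normalised independent Gamma variates (base R).
# rdirichlet_sketch is a hypothetical name, not the package's function.
rdirichlet_sketch <- function(n = 1, alpha = c(1, 1)) {
  m <- length(alpha)
  # Column j holds n draws from Gamma(shape = alpha[j], rate = 1)
  g <- matrix(rgamma(n * m, shape = rep(alpha, each = n)), nrow = n, ncol = m)
  # Divide every row by its sum so that each row lies on the simplex
  g / rowSums(g)
}

set.seed(5)
draws <- rdirichlet_sketch(n = 10, alpha = c(1, 1, 1))
rowSums(draws)  # each row sums to 1 up to floating-point error
```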
