Identifying Trx-fold Proteins
- Chang Wang, Stephen D. Scott, Jun Zhang, Qingping Tao, Dmitri
E. Fomenko, and Vadim N. Gladyshev. A Study in Modeling Low-Conservation
Protein Superfamilies. Technical report UNL-CSE-2004-0003, University
of Nebraska, 2004. [pdf]
(Includes a full description of the source of the data. The data set
provided here is referred to as "MIL (Motif-based alignment)".)
- Qingping Tao, Stephen D. Scott and N. V. Vinodchandran. SVM-Based
Generalized Multiple-Instance Learning via Approximate Box Counting.
In Proceedings of the Twenty-First International Conference on Machine
Learning (ICML 2004), pages 779-806, Banff, Alberta, Canada, July 2004.
[pdf]
- Database file: primary.des.db
- Specification file: primary.des.spec
- Partitions for the jackknife test:
a. positive examples: pos.svm
b. negative examples: neg.svm.*
- Database file:
  - Each line contains one bag.
  - Each line (bag) has the format: <label> <number of points in the bag> point_1 point_2 ...
  - For example, a bag with label 1 and 3 points, (1,1,1), (2,2,2),
    (3,3,3), is written as
    1 3 1 1 1 2 2 2 3 3 3
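A minimal Python sketch of reading one bag from the database file, assuming the whole bag sits on a single whitespace-separated line as described above. The function name `parse_bag_line` is hypothetical, and treating the label as an integer is an assumption. The dimension of each point is not stored explicitly, so the sketch infers it from the number of values divided by the number of points.

```python
def parse_bag_line(line):
    """Parse one database line into (label, list of points).

    Assumes the layout: <label> <number of points> point_1 point_2 ...
    where every point has the same number of coordinates.
    """
    tokens = line.split()
    label = int(tokens[0])        # assumption: labels are integers
    n_points = int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if n_points <= 0 or len(values) % n_points != 0:
        raise ValueError("value count does not match the number of points")
    dim = len(values) // n_points  # inferred dimension of each point
    points = [values[i * dim:(i + 1) * dim] for i in range(n_points)]
    return label, points


# The example bag from the description above.
label, points = parse_bag_line("1 3 1 1 1 2 2 2 3 3 3")
print(label, points)
```

To load the whole file, the same function could be applied to every non-empty line of primary.des.db.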
- Specification file:
  <number of classes>
  <class label 1> <class label 2>
  ... ...
  <dummy, no meaning>
  <number of dimensions>
  <minimum value for dimension 1> <maximum value for dimension 1>
  <minimum value for dimension 2> <maximum value for dimension 2>
  ... ...
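The field order above can be read with a short Python sketch. It rests on several assumptions that are not confirmed by the description: the dummy field is a single token, the class labels number exactly <number of classes>, and the fields are whitespace-separated. The parser works on tokens rather than lines, so it does not depend on how the fields are wrapped. The function name `parse_spec` and the sample text are hypothetical.

```python
def parse_spec(text):
    """Parse specification text into (class_labels, bounds).

    bounds is a list of (minimum, maximum) pairs, one per dimension.
    """
    tokens = text.split()
    pos = 0
    n_classes = int(tokens[pos])
    pos += 1
    class_labels = tokens[pos:pos + n_classes]  # kept as strings
    pos += n_classes
    pos += 1  # skip the dummy field (assumed to be one token)
    n_dims = int(tokens[pos])
    pos += 1
    bounds = []
    for _ in range(n_dims):
        lo, hi = float(tokens[pos]), float(tokens[pos + 1])
        bounds.append((lo, hi))
        pos += 2
    return class_labels, bounds


# Hypothetical specification with 2 classes and 2 dimensions.
sample = "2\n0 1\n0\n2\n0.0 1.0\n-5 5\n"
print(parse_spec(sample))
```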
- *.svm* files:
  - Each line refers to one bag in the database file.
  - Each line has the format: <label> 1:<index of the bag in the database file>
  - Indices start at 0.
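A minimal Python sketch of resolving a partition line to a bag, assuming the `<label> 1:<index>` format above and that the database lines are held in a list in file order. The function names `parse_partition_line` and `select_bags` and the sample lines are hypothetical.

```python
def parse_partition_line(line):
    """Parse '<label> 1:<index>' into (label, zero-based bag index)."""
    label_text, feature = line.split()
    key, index_text = feature.split(":")
    if key != "1":
        raise ValueError("expected a single feature numbered 1")
    return int(label_text), int(index_text)


def select_bags(partition_lines, database_lines):
    """Return (label, database line) pairs for the bags in a partition."""
    selected = []
    for line in partition_lines:
        if not line.strip():
            continue  # ignore blank lines
        label, index = parse_partition_line(line)
        selected.append((label, database_lines[index]))  # index starts at 0
    return selected


# Hypothetical data: two bags in the database, one partition entry.
database_lines = ["1 1 0 0 0", "-1 1 9 9 9"]
print(select_bags(["-1 1:1"], database_lines))
```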