ReleaseNotes: Fix matrix operations + other details
Rahul Iyer committed Jul 27, 2015
1 parent e4aba58 commit 5b57779
Showing 1 changed file with 22 additions and 12 deletions.
34 changes: 22 additions & 12 deletions ReleaseNotes.txt
@@ -8,38 +8,48 @@ A complete list of changes for each release can be obtained by viewing the git
commit history located at https://github.com/madlib/madlib/commits/master.

Current list of bugs and issues can be found at http://jira.madlib.net.

--------------------------------------------------------------------------------
MADlib v1.8

Release Date: 2015-July-17

New features:
* Latent Dirichlet Allocation (LDA) Performance Improvement
- Function lda_train() is 1.5X ~ 3X faster
- Improve the scalability support (vocabulary size multiplied by the
number of topics up to 250 million)
* Matrix Operations
-
* Improved Latent Dirichlet Allocation (LDA) Performance
- Function lda_train() is about twice as fast.
- Improved the scalability of the function
(vocabulary size x number of topics can be up to 250 million).
* New module: Matrix operations
Added the following operations/functions for dense and sparse matrices:
- Mathematical operations: addition, subtraction, multiplication,
element-wise multiplication, scalar and vector multiplication.
- Aggregation operations: apply various operations including
max, min, sum, mean along a specified dimension.
- Visitor methods: extract row/column from matrix.
- Representation: convert a matrix to either dense or sparse representation.
* Quotation and International Character Support
- Most modules now support table and column names that are quoted and
contain international characters, including:
- Regression models (GLMs, linear regression, elastic net, etc.)
- Decision trees and random forests
- Unsupervised learning models (association rules, k-means, LDA, etc.)
- Summary, Pearson's correlation, and principal component analysis
- Summary, Pearson's correlation, and PCA
* Array Norms and Distances
- Generic p-norm distance
- Jaccard distance
- Cosine similarity
* Text Analysis:
- Text utility for term frequency and vocabulary construction (prepares
documents for input to LDA).
* Miscellaneous
- Text utility for term frequency and vocabulary construction
- Low-rank matrix factorization: 32-bit integer support (MADLIB-903)
- Cross-validation: classification support (MADLIB-908)
- Clean up functions for junk tables
- Improved organization of User and Developer guide at doc.madlib.net/latest.
- Low-rank matrix factorization: added 32-bit integer support (MADLIB-903).
- Cross-validation: added classification support (MADLIB-908).
- Added a new clean-up function for removing MADlib temporary tables.

Note:
- LDA models that are trained using MADlib v1.7.1 or earlier need to be
re-trained for the use of MADlib v1.8
re-trained to be used in MADlib v1.8.

Known issues:
- Performance for decision tree with cross-validation is poor on a HAWQ
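
For readers unfamiliar with the three measures named under "Array Norms and Distances" in the notes above, the following is a minimal Python sketch of their textbook definitions. It is not MADlib code and does not reflect MADlib's SQL interface; the function names (pnorm_distance, jaccard_distance, cosine_similarity) and the handling of edge cases (empty sets, zero vectors) are assumptions made for illustration only.

```python
import math


def pnorm_distance(x, y, p=2.0):
    """Generic p-norm (Minkowski) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A intersect B| / |A union B|, inputs treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption for this sketch: two empty sets are considered identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption for this sketch: reject zero vectors rather than return NaN.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

With p=1 the first function gives the Manhattan distance and with p=2 the Euclidean distance. For the actual function names and argument types that MADlib exposes, consult the MADlib documentation rather than this sketch.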