Commit 56049a7

Move the data files under the db directory
1 parent 5453b06 commit 56049a7
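This commit is a bulk path rewrite: every data reference that pointed at the old `input/` directory now points at `db/`. The actual migration script is not part of the commit; below is a minimal sketch of the kind of substitution involved, where `rewrite_paths` is a hypothetical helper, not code from the repository.

```python
import re

def rewrite_paths(text):
    """Repoint references to the old input/ data directory at db/.

    The \\b word boundary keeps identifiers such as inputFile.txt
    untouched: they contain no 'input/' path segment.
    """
    return re.sub(r"\binput/", "db/", text)

before = "datMat = loadDataSet('input/13.PCA/secom.data', ' ')"
print(rewrite_paths(before))  # datMat = loadDataSet('db/13.PCA/secom.data', ' ')
```

Note that a real migration over 3,664 files would walk the tree and apply this per file; the sketch only shows the per-line substitution.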

3,664 files changed: +193 −193 lines changed

blog/ml/13.利用PCA来简化数据.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -115,7 +115,7 @@
 
 ```python
 def replaceNanWithMean():
-    datMat = loadDataSet('input/13.PCA/secom.data', ' ')
+    datMat = loadDataSet('db/13.PCA/secom.data', ' ')
     numFeat = shape(datMat)[1]
     for i in range(numFeat):
        # take the mean of the values that are not NaN
````

blog/ml/14.利用SVD简化数据.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -402,7 +402,7 @@ def imgCompress(numSV=3, thresh=0.8):
         thresh      the decision threshold
     """
     # build a list
-    myMat = imgLoadData('input/14.SVD/0_5.txt')
+    myMat = imgLoadData('db/14.SVD/0_5.txt')
 
     print "****original matrix****"
     # run SVD on the original image and reconstruct it
```

blog/ml/15.大数据与MapReduce.md

Lines changed: 8 additions & 8 deletions

````diff
@@ -64,15 +64,15 @@ cat inputFile.txt | python mapper.py | sort | python reducer.py > outputFile.txt
 ```
 # Test the Mapper
 # Linux
-cat input/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanMapper.py
+cat db/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanMapper.py
 # Windows
-# python src/python/15.BigData_MapReduce/mrMeanMapper.py < input/15.BigData_MapReduce/inputFile.txt
+# python src/python/15.BigData_MapReduce/mrMeanMapper.py < db/15.BigData_MapReduce/inputFile.txt
 
 # Test the Reducer
 # Linux
-cat input/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanMapper.py | python src/python/15.BigData_MapReduce/mrMeanReducer.py
+cat db/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanMapper.py | python src/python/15.BigData_MapReduce/mrMeanReducer.py
 # Windows
-# python src/python/15.BigData_MapReduce/mrMeanMapper.py < input/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanReducer.py
+# python src/python/15.BigData_MapReduce/mrMeanMapper.py < db/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapReduce/mrMeanReducer.py
 ```
 
 ### MapReduce machine learning
@@ -93,17 +93,17 @@ cat input/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapR
 * mrjob is a nice learning tool; it was open-sourced at the end of 2010 and comes from Yelp (a restaurant-review site).
 
 ```Shell
-python src/python/15.BigData_MapReduce/mrMean.py < input/15.BigData_MapReduce/inputFile.txt > input/15.BigData_MapReduce/myOut.txt
+python src/python/15.BigData_MapReduce/mrMean.py < db/15.BigData_MapReduce/inputFile.txt > db/15.BigData_MapReduce/myOut.txt
 ```
 
 > Hands-on scripts
 
 ```
 # Test the mrjob example
 # First, try just the mapper
-# python src/python/15.BigData_MapReduce/mrMean.py --mapper < input/15.BigData_MapReduce/inputFile.txt
+# python src/python/15.BigData_MapReduce/mrMean.py --mapper < db/15.BigData_MapReduce/inputFile.txt
 # To run the whole program, simply drop --mapper
-python src/python/15.BigData_MapReduce/mrMean.py < input/15.BigData_MapReduce/inputFile.txt
+python src/python/15.BigData_MapReduce/mrMean.py < db/15.BigData_MapReduce/inputFile.txt
 ```
 
 ### Project case: the Pegasos algorithm for distributed SVM
@@ -213,7 +213,7 @@ def batchPegasos(dataSet, labels, lam, T, k):
 
 [Complete code](https://github.com/apachecn/AiLearning/blob/master/src/py2.x/ml/15.BigData_MapReduce/pegasos.py): <https://github.com/apachecn/AiLearning/blob/master/src/py2.x/ml/15.BigData_MapReduce/pegasos.py>
 
-How to run: `python /opt/git/MachineLearning/src/python/15.BigData_MapReduce/mrSVM.py < input/15.BigData_MapReduce/inputFile.txt`
+How to run: `python /opt/git/MachineLearning/src/python/15.BigData_MapReduce/mrSVM.py < db/15.BigData_MapReduce/inputFile.txt`
 [MapReduce version of the code](https://github.com/apachecn/AiLearning/blob/master/src/py2.x/ml/15.BigData_MapReduce/mrSVM.py): <https://github.com/apachecn/AiLearning/blob/master/src/py2.x/ml/15.BigData_MapReduce/mrSVM.py>
 
 * * *
````
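The mrMeanMapper/mrMeanReducer pipeline in the diff above streams numbers through a mapper and a reducer to compute a mean. As a hedged illustration of that idea (hypothetical helpers, not the repository's actual scripts): each mapper emits a partial `(count, mean)` summary for its chunk, and the reducer merges the partials into the global mean.

```python
def map_mean(values):
    """Mapper: summarize one input chunk as (count, mean)."""
    n = len(values)
    return (n, sum(values) / n)

def reduce_mean(partials):
    """Reducer: merge (count, mean) pairs into the overall mean."""
    total = sum(n for n, _ in partials)          # total number of values
    weighted = sum(n * m for n, m in partials)   # sum of all values
    return weighted / total

# Two mapper outputs over a split data set combine to the overall mean.
partials = [map_mean([1.0, 2.0, 3.0]), map_mean([4.0, 5.0])]
print(reduce_mean(partials))  # 3.0
```

The weighting by count is what makes the merge correct even when the chunks have different sizes, which is why streaming mappers emit counts alongside their partial means.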

blog/ml/2.k-近邻算法.md

Lines changed: 7 additions & 7 deletions

```diff
@@ -98,7 +98,7 @@ knn 算法按照距离最近的三部电影的类型,决定未知电影的类
 
 > Collect data: a text file is provided
 
-Helen keeps the data on these dates in the text file [datingTestSet2.txt](https://github.com/apachecn/AiLearning/tree/dev/input/2.KNN/datingTestSet2.txt), 1,000 rows in total. Helen's dates mainly have the following 3 features:
+Helen keeps the data on these dates in the text file [datingTestSet2.txt](https://github.com/apachecn/AiLearning/tree/ddb/2.KNN/datingTestSet2.txt), 1,000 rows in total. Helen's dates mainly have the following 3 features:
 
 * Frequent-flyer miles earned per year
 * Percentage of time spent playing video games
@@ -288,7 +288,7 @@ def datingClassTest():
     # Set the proportion of data used for testing (training-set proportion = 1 - hoRatio)
     hoRatio = 0.1  # test fraction: one part for testing, the rest as samples
     # Load the data from the file
-    datingDataMat, datingLabels = file2matrix('input/2.KNN/datingTestSet2.txt')  # load data set from file
+    datingDataMat, datingLabels = file2matrix('db/2.KNN/datingTestSet2.txt')  # load data set from file
     # Normalize the data
     normMat, ranges, minVals = autoNorm(datingDataMat)
     # m is the number of rows, i.e. the first dimension of the matrix
@@ -361,7 +361,7 @@ You will probably like this person: in small doses
 
 > Collect data: text files are provided
 
-The directory [trainingDigits](../input/2.KNN/trainingDigits) contains about 2,000 examples, each like the image below, with roughly 200 samples per digit; the directory [testDigits](../input/2.KNN/testDigits) contains about 900 test examples.
+The directory [trainingDigits](db/2.KNN/trainingDigits) contains about 2,000 examples, each like the image below, with roughly 200 samples per digit; the directory [testDigits](db/2.KNN/testDigits) contains about 900 test examples.
 
 ![Examples from the handwritten-digit data set](/img/ml/2.KNN/knn_2_handWriting.png)
 
@@ -402,7 +402,7 @@ array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1
 def handwritingClassTest():
     # 1. Load the training data
     hwLabels = []
-    trainingFileList = listdir('input/2.KNN/trainingDigits')  # load the training set
+    trainingFileList = listdir('db/2.KNN/trainingDigits')  # load the training set
     m = len(trainingFileList)
     trainingMat = zeros((m, 1024))
     # hwLabels stores the label (0-9) for each index; trainingMat holds the image vector at each position
@@ -412,17 +412,17 @@ def handwritingClassTest():
         classNumStr = int(fileStr.split('_')[0])
         hwLabels.append(classNumStr)
         # convert the 32*32 matrix into a 1*1024 vector
-        trainingMat[i, :] = img2vector('input/2.KNN/trainingDigits/%s' % fileNameStr)
+        trainingMat[i, :] = img2vector('db/2.KNN/trainingDigits/%s' % fileNameStr)
 
     # 2. Load the test data
-    testFileList = listdir('input/2.KNN/testDigits')  # iterate through the test set
+    testFileList = listdir('db/2.KNN/testDigits')  # iterate through the test set
     errorCount = 0.0
     mTest = len(testFileList)
     for i in range(mTest):
         fileNameStr = testFileList[i]
         fileStr = fileNameStr.split('.')[0]  # take off .txt
         classNumStr = int(fileStr.split('_')[0])
-        vectorUnderTest = img2vector('input/2.KNN/testDigits/%s' % fileNameStr)
+        vectorUnderTest = img2vector('db/2.KNN/testDigits/%s' % fileNameStr)
         classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
         print "the classifier came back with: %d, the real answer is: %d" % (classifierResult, classNumStr)
         if (classifierResult != classNumStr): errorCount += 1.0
```
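The `img2vector` calls in `handwritingClassTest` above flatten each 32×32 text image (rows of `0`/`1` characters) into a length-1024 vector, with pixel (i, j) landing at flat index 32*i + j. A stdlib-only sketch of that step, assuming this file layout (the repository's version returns a NumPy 1×1024 array instead of a plain list):

```python
def img2vector(filename):
    """Flatten a 32x32 text image of '0'/'1' characters into a
    flat list of 1024 ints; pixel (i, j) goes to index 32*i + j."""
    vect = [0] * 1024
    with open(filename) as fr:
        for i in range(32):
            line = fr.readline()
            for j in range(32):
                vect[32 * i + j] = int(line[j])
    return vect
```

`classify0` can then treat each image as an ordinary point in a 1024-dimensional space and run plain kNN distance comparisons on it.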

blog/ml/4.朴素贝叶斯.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -502,11 +502,11 @@ def spamTest():
     fullText = []
     for i in range(1, 26):
         # split and parse the data, label it class 1
-        wordList = textParse(open('input/4.NaiveBayes/email/spam/%d.txt' % i).read())
+        wordList = textParse(open('db/4.NaiveBayes/email/spam/%d.txt' % i).read())
         docList.append(wordList)
         classList.append(1)
         # split and parse the data, label it class 0
-        wordList = textParse(open('input/4.NaiveBayes/email/ham/%d.txt' % i).read())
+        wordList = textParse(open('db/4.NaiveBayes/email/ham/%d.txt' % i).read())
         docList.append(wordList)
         fullText.extend(wordList)
         classList.append(0)
```
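The `spamTest` loop above calls `textParse` on each raw email before building the vocabulary. A plausible sketch of that tokenizer, assuming the usual approach for this example (the repository's own implementation may differ in detail): split on non-alphanumeric characters, drop very short tokens, and lowercase the rest.

```python
import re

def textParse(bigString):
    """Tokenize raw email text: split on runs of non-word characters,
    keep only tokens longer than 2 characters, lowercased."""
    tokens = re.split(r"\W+", bigString)
    return [tok.lower() for tok in tokens if len(tok) > 2]

print(textParse("This book is the best book on Python I have read"))
```

Dropping 1- and 2-character tokens discards URL fragments and stray punctuation pieces that would otherwise pollute the vocabulary.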

blog/ml/5.Logistic回归.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -297,7 +297,7 @@ def plotBestFit(dataArr, labelMat, weights):
 ```python
 def testLR():
     # 1. Collect and prepare the data
-    dataMat, labelMat = loadDataSet("input/5.Logistic/TestSet.txt")
+    dataMat, labelMat = loadDataSet("db/5.Logistic/TestSet.txt")
 
     # print dataMat, '---\n', labelMat
     # 2. Train the model: the matrix (a1,b2,..,nn).T in f(x)=a1*x1+b2*x2+..+nn*xn
@@ -576,8 +576,8 @@ def colicTest():
     Returns:
         errorRate -- the classification error rate
     '''
-    frTrain = open('input/5.Logistic/horseColicTraining.txt')
-    frTest = open('input/5.Logistic/horseColicTest.txt')
+    frTrain = open('db/5.Logistic/horseColicTraining.txt')
+    frTest = open('db/5.Logistic/horseColicTest.txt')
     trainingSet = []
     trainingLabels = []
     # Parse the features and labels in the training data set
````

blog/ml/6.支持向量机.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -526,7 +526,7 @@ def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)):
 def testDigits(kTup=('rbf', 10)):
 
     # 1. Load the training data
-    dataArr, labelArr = loadImages('input/6.SVM/trainingDigits')
+    dataArr, labelArr = loadImages('db/6.SVM/trainingDigits')
     b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup)
     datMat = mat(dataArr)
     labelMat = mat(labelArr).transpose()
@@ -544,7 +544,7 @@ def testDigits(kTup=('rbf', 10)):
     print("the training error rate is: %f" % (float(errorCount) / m))
 
     # 2. Load the test data
-    dataArr, labelArr = loadImages('input/6.SVM/testDigits')
+    dataArr, labelArr = loadImages('db/6.SVM/testDigits')
     errorCount = 0
     datMat = mat(dataArr)
     labelMat = mat(labelArr).transpose()
```

blog/ml/7.集成方法-随机森林和AdaBoost.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -498,13 +498,13 @@ def adaClassify(datToClass, classifierArr):
 ```python
 # Horse colic data set
 # Training set
-dataArr, labelArr = loadDataSet("input/7.AdaBoost/horseColicTraining2.txt")
+dataArr, labelArr = loadDataSet("db/7.AdaBoost/horseColicTraining2.txt")
 weakClassArr, aggClassEst = adaBoostTrainDS(dataArr, labelArr, 40)
 print weakClassArr, '\n-----\n', aggClassEst.T
 # Compute the AUC, the area under the ROC curve
 plotROC(aggClassEst.T, labelArr)
 # Test set
-dataArrTest, labelArrTest = loadDataSet("input/7.AdaBoost/horseColicTest2.txt")
+dataArrTest, labelArrTest = loadDataSet("db/7.AdaBoost/horseColicTest2.txt")
 m = shape(dataArrTest)[0]
 predicting10 = adaClassify(dataArrTest, weakClassArr)
 errArr = mat(ones((m, 1)))
````

blog/ml/8.回归.md

Lines changed: 11 additions & 11 deletions

````diff
@@ -161,7 +161,7 @@ def standRegres(xArr,yArr):
 
 
 def regression1():
-    xArr, yArr = loadDataSet("input/8.Regression/data.txt")
+    xArr, yArr = loadDataSet("db/8.Regression/data.txt")
     xMat = mat(xArr)
     yMat = mat(yArr)
     ws = standRegres(xArr, yArr)
@@ -325,7 +325,7 @@ def lwlrTestPlot(xArr,yArr,k=1.0):
 
 # test for LWLR
 def regression2():
-    xArr, yArr = loadDataSet("input/8.Regression/data.txt")
+    xArr, yArr = loadDataSet("db/8.Regression/data.txt")
     yHat = lwlrTest(xArr, xArr, yArr, 0.003)
     xMat = mat(xArr)
     srtInd = xMat[:,1].argsort(0)  # argsort() sorts the elements of x in ascending order and returns their indices
@@ -418,7 +418,7 @@ def abaloneTest():
         None
     '''
     # Load the data
-    abX, abY = loadDataSet("input/8.Regression/abalone.txt")
+    abX, abY = loadDataSet("db/8.Regression/abalone.txt")
     # Predict with different kernels
     oldyHat01 = lwlrTest(abX[0:99], abX[0:99], abY[0:99], 0.1)
     oldyHat1 = lwlrTest(abX[0:99], abX[0:99], abY[0:99], 1)
@@ -540,7 +540,7 @@ def ridgeTest(xArr,yArr):
 
 # test for ridgeRegression
 def regression3():
-    abX,abY = loadDataSet("input/8.Regression/abalone.txt")
+    abX,abY = loadDataSet("db/8.Regression/abalone.txt")
     ridgeWeights = ridgeTest(abX, abY)
     fig = plt.figure()
     ax = fig.add_subplot(111)
@@ -619,7 +619,7 @@ def stageWise(xArr,yArr,eps=0.01,numIt=100):
 
 # test for stageWise
 def regression4():
-    xArr,yArr=loadDataSet("input/8.Regression/abalone.txt")
+    xArr,yArr=loadDataSet("db/8.Regression/abalone.txt")
     print(stageWise(xArr,yArr,0.01,200))
     xMat = mat(xArr)
     yMat = mat(yArr).T
@@ -745,12 +745,12 @@ def scrapePage(retX, retY, inFile, yr, numPce, origPrc):
 
 # Read the data for the six Lego sets in turn and build the data matrix
 def setDataCollect(retX, retY):
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego8288.html', 2006, 800, 49.99)
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego10030.html', 2002, 3096, 269.99)
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego10179.html', 2007, 5195, 499.99)
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego10181.html', 2007, 3428, 199.99)
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego10189.html', 2008, 5922, 299.99)
-    scrapePage(retX, retY, 'input/8.Regression/setHtml/lego10196.html', 2009, 3263, 249.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego8288.html', 2006, 800, 49.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego10030.html', 2002, 3096, 269.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego10179.html', 2007, 5195, 499.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego10181.html', 2007, 3428, 199.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego10189.html', 2008, 5922, 299.99)
+    scrapePage(retX, retY, 'db/8.Regression/setHtml/lego10196.html', 2009, 3263, 249.99)
 ```
 
 > Test the algorithm: use cross-validation to test the different models and see which works best
````
