
Word count over local files in Scala
Published: 2021-05-14 05:35:39
    import java.io.File
    import scala.collection.mutable.Map
    import scala.io.Source

    val textFilePath = "D:/doc/spark/input"
    // The input directory
    val dirFile = new File(textFilePath)
    // Every entry (file or subdirectory) directly inside the directory
    val files = dirFile.listFiles()
    // Maps each word to its number of occurrences
    val resultMap1 = Map.empty[String, Int]

    // Read each regular file; subdirectories are skipped in this version
    for (file <- files if file.isFile) {
      val data = Source.fromFile(file)
      try {
        // Split every line into words on single spaces
        val words = data.getLines().flatMap(s => s.split(" "))
        // Increment the count of a known word, or insert a new word with count 1
        words.foreach { word =>
          if (resultMap1.contains(word)) resultMap1(word) += 1
          else resultMap1 += (word -> 1)
        }
      } finally data.close()
    }

    // Print the raw result
    resultMap1.foreach(x => println((x._1, x._2)))
    println("--------------")
    // Drop the empty string, which appears when a line contains consecutive spaces
    val resultMap2 = resultMap1.filter(x => x._1.nonEmpty)
    resultMap2.foreach(x => println((x._1, x._2)))
    println("--------------")
    // Sort by ascending count
    val resultMap3 = resultMap2.toList.sortBy(_._2)
    resultMap3.foreach(x => println((x._1, x._2)))
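The counting step above can also be written without a mutable map. The following is a minimal sketch rather than the author's method: `countWords` is a hypothetical helper name, the input is a list of in-memory lines instead of files, and the split is on runs of whitespace (`\\s+`) instead of a single space, so consecutive spaces no longer produce empty tokens.

```scala
object WordCountSketch {
  // Hypothetical helper: counts words in lines of text that are already loaded.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line on runs of whitespace
      .filter(_.nonEmpty)         // a line with leading whitespace still yields one empty token
      .groupBy(identity)          // word -> all occurrences of that word
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = countWords(Seq("a b a", "b  a c"))
    // Sort by descending count, then alphabetically so that ties are deterministic.
    counts.toList
      .sortBy { case (word, n) => (-n, word) }
      .foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Grouping and then taking the size of each group keeps the whole computation immutable, at the cost of holding every occurrence in memory until the sizes are taken.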
Recursively traversing a directory to read every file in it and in all of its subdirectories
    /* Word count */
    import java.io.File
    import scala.collection.mutable.Map
    import scala.io.Source

    val textFilePath = "D:/doc/spark/input"
    // Maps each word to its number of occurrences
    val resultMap1 = Map.empty[String, Int]

    /**
     * Recursively visits every file under rootPath and adds its words to resultMap1.
     */
    def toGetAllFile(rootPath: File): Map[String, Int] = {
      rootPath.listFiles().foreach { x =>
        if (x.isDirectory) {
          // A directory: recurse into it
          toGetAllFile(x)
        } else {
          // A file: read its contents
          val source = Source.fromFile(x)
          try {
            // Split every line into words on single spaces
            val words = source.getLines().flatMap(s => s.split(" "))
            words.foreach { s =>
              // Increment the count of a known word, or insert a new word with count 1
              if (resultMap1.contains(s)) resultMap1(s) += 1
              else resultMap1 += (s -> 1)
            }
          } finally source.close()
        }
      }
      resultMap1
    }

    toGetAllFile(new File(textFilePath))
    // Drop the empty string
    val stringToInt = resultMap1.filter(x => x._1.nonEmpty)
    // Sort by ascending count
    val tuples1 = stringToInt.toList.sortBy(x => x._2)
    tuples1.foreach(x => println(("tuples1 is: ", x._1, x._2)))
    println("--------------------------------")
    // Sort by descending count
    val tuples2 = stringToInt.toList.sortBy(x => -x._2)
    tuples2.foreach(x => println(("tuples2 is: ", x._1, x._2)))
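The recursive traversal can be separated from the counting by first collecting the files and then reading them. This is a sketch under a few assumptions: `listFilesRecursively` and `countWordsIn` are hypothetical names, files are read as UTF-8, and a `null` result from `listFiles()` (which `java.io.File` returns when the path is not a directory or cannot be read) is treated as an empty directory.

```scala
import java.io.File
import scala.io.Source

object RecursiveWordCountSketch {
  // Hypothetical helper: every regular file under root, found depth-first.
  def listFilesRecursively(root: File): List[File] =
    Option(root.listFiles()).map(_.toList).getOrElse(Nil).flatMap { f =>
      if (f.isDirectory) listFilesRecursively(f) else List(f)
    }

  // Hypothetical helper: reads each file and counts its non-empty words.
  def countWordsIn(root: File): Map[String, Int] =
    listFilesRecursively(root).foldLeft(Map.empty[String, Int]) { (acc, file) =>
      val source = Source.fromFile(file, "UTF-8") // assumed encoding
      try {
        source.getLines()
          .flatMap(_.split(" "))
          .filter(_.nonEmpty)
          .foldLeft(acc)((m, w) => m.updated(w, m.getOrElse(w, 0) + 1))
      } finally source.close()
    }

  def main(args: Array[String]): Unit = {
    // The path used in the article; replace it with a directory of your own.
    val counts = countWordsIn(new File("D:/doc/spark/input"))
    counts.toList.sortBy(-_._2).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Returning the file list as a value means the traversal no longer depends on a map defined outside the function, so both helpers can be exercised on a temporary directory.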