Posts
MijazzChan

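Try Understanding Lombok What is Lombok Project Lombok MVN Repo A little over a year ago, when I first came across Spring Boot's design patterns, I learned about the layered design of Entity, Service, Repository and so on, along with things like POJOs. The project at the time used Spring Data JPA for its persistence layer, and the data and objects passing through the several layers in between...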

Code Scoping in Data Practicing Personal Notes cast() in Spark Scala I found this useful when importing or operating on a DataFrame. The source code is as follows: /** * Casts the column to ...

Data Practicing-EP8 When working by date, grouping by day or by month is fairly convenient because the data has an index. days = ['Mon','Tue','Wed', 'Thur', 'Fri', 'Sat', 'Sun'] fullData.groupby([fullData.index.dayofweek]).size().plot(kind='barh') plt.yticks(n...

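Data Practicing-EP7 Visualization in Python Using Pandas together with a notebook still feels more convenient for visualizing this dataset of several million rows, which was first processed by Spark. Let's start by importing dependencies and cleaning the data. # -*- coding: utf-8 -*- # Python Version == 3.8.6 import os import pandas as pd...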

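Data Practicing-EP6 Introduce pyspark Working with Spark is similar in many ways in Scala and Python. After moving to PySpark, two PySpark-specific features, toPandas and collect() => List, make visualization more convenient than in Scala. Note, however, that Spark.DataFrame and Pandas.DataFrame are two...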

Data Practicing-EP5 Get Weather Data StaticTool.java - +(Add Row) + public static final String WEATHER_DATA = DATA_PATH + "temperature.full.csv"; MergeWeather.scala - 1 package edu.zstu.mi...

Data Practicing-EP4 Find Data Chicago Crime Data is from CHICAGO DATA PORTAL Visit Here This time I am using Chicago's Crime Data, covering 2001 to the present. [email protected] ~/devEnvs $ wc -l chicagoCrimeData.csv 7212...

Data Practicing-EP3 Introduce Spark Here are a few official docs: Spark Overview, Java API Docs, Scala API Docs, Spark SQL Docs. This post only takes notes on the Spark RDD. RDD -> Resilient Distributed Datasets. It is a scalable, resilient, distributed...

Data Practicing-EP2 Testing Spark The spark from EP0 [email protected] ~/devEnvs $ ll -a total 161M drwxr-xr-x 8 mijazz mijazz 4.0K Nov 27 17:27 . drwx------ 38 mijazz mijazz 4.0K Nov 27 17:27 .. drwxr-...

Data Practicing-EP1 Testing Hadoop The hadoop-3.1.4.tar.gz given in EP0 [email protected] ~/devEnvs $ tar -xf ./hadoop-3.1.4.tar.gz # File structure [email protected] ~/devEnvs $ tree -L 1 ./hadoop-3.1.4 ./hadoop-3...