Final Project: NYC Parking Tickets
Final Project Big Data Technologies 201A
This is where the big data file and the sample data are.
This is a cool map I couldn't get to work. Maybe I will attempt it in the next course.
This is a way to split CSV files on a Windows machine.
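One possible way to split a large CSV, not necessarily the method referenced above, is a small Scala script, which runs on Windows as long as a JVM is installed. This is a minimal sketch: `splitCsv`, its parameters, and the `-partN.csv` naming are hypothetical, and it assumes no quoted field contains an embedded line break.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical helper: split a CSV into files holding at most `rowsPerFile`
// data rows each, repeating the header line at the top of every part.
// Returns the names of the part files that were written.
def splitCsv(inputPath: String, outputPrefix: String, rowsPerFile: Int): Seq[String] = {
  val source = Source.fromFile(inputPath)
  try {
    val lines = source.getLines()
    if (!lines.hasNext) Seq.empty
    else {
      val header = lines.next()
      // grouped() reads the file lazily, so only one chunk is in memory at a time
      lines.grouped(rowsPerFile).zipWithIndex.map { case (rows, i) =>
        val partName = s"$outputPrefix-part$i.csv"
        val writer = new PrintWriter(partName)
        try {
          writer.println(header)
          rows.foreach(row => writer.println(row))
        } finally writer.close()
        partName
      }.toList // force all writes before the source is closed
    }
  } finally source.close()
}
```

For example, `splitCsv("C:\\data\\DateDim.csv", "C:\\data\\DateDim", 100000)` would write `DateDim-part0.csv`, `DateDim-part1.csv`, and so on next to the original file.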
Moving a local CSV file from the local drive to the VM (Docker container)
Command Prompt
Move to Linux server
pscp -i c:\BigDataTechnolgies\ServerKey\tjpauley_azure.ppk C:\data\DateDim.csv tjpauley@IPAddress:
Note: I used the FAQ on PuTTY's website.
Linux Command
Copy to sandbox data folder
sudo cp DateDim.csv /data
Zeppelin Notebook
Create data frame
case class DateDim(DateNo: String, Weekday: String, Year: Integer, Month: Integer, Day: Integer)
// Read the CSV with a header row and let Spark infer the column types
val DateDims = spark.read.option("inferSchema", "true").option("header", "true").csv("file:///data/DateDim.csv")
// Register the DataFrame as a temp table so it can be queried with SQL
DateDims.toDF().registerTempTable("DateDim")
DateDims.show(1)
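Once the temp table is registered, it can be queried from Scala as well as from a %sql paragraph. The following is a minimal sketch, assuming the DateDim temp table exists and that the CSV header includes a Weekday column, as the case class suggests. The query and the name `daysPerWeekday` are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides a session; getOrCreate() returns that existing one.
val session = SparkSession.builder().getOrCreate()

// Assumption: the temp table "DateDim" is registered and has a Weekday column.
val daysPerWeekday = session.sql(
  "SELECT Weekday, COUNT(*) AS days FROM DateDim GROUP BY Weekday ORDER BY days DESC")

// Print one row per weekday with the number of dates that fall on it
daysPerWeekday.show()
```

The same SELECT statement can be pasted into a paragraph that starts with %sql, in which case Zeppelin renders the result as a table or chart.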
Zeppelin Notebook Commands
%md = markdown
%sql = SQL
http://IPAddress:9995 Note: Open port 9995
Jupyter Notebook Commands
http://IPAddress:9999 Note: Open port 9999