Posts

Showing posts from December, 2017

Final Project: NYC Parking Tickets

Final Project, Big Data Technologies 201A

The full data file and the sample data are here:
https://www.kaggle.com/new-york-city/nyc-parking-tickets/data

This is a cool map I couldn't get to work. Maybe I will attempt it in the next course:
http://www.bigendiandata.com/2017-06-27-Mapping_in_Jupyter/

This is a way to split CSV files on a Windows machine:
https://www.addictivetips.com/windows-tips/csv-splitter-for-windows/

Moving a local CSV file from the local drive to the VM (Docker container)

Command Prompt: move the file to the Linux server

pscp -i c:\BigDataTechnolgies\ServerKey\tjpauley_azure.ppk C:\data\DateDim.csv tjpauley@IPAddress:

Note: I used the FAQ at PuTTY's website.

Linux command: copy the file to the sandbox data folder

sudo cp DateDim.csv /data

Zeppelin Notebook: create the data frame

case class DateDim(DateNo: String, Weekday: String, Year: Integer, Month: Integer, Day: Integer)
val ParkingFourteens = spark.read.option("inferSchema", "true&