glob, a preinstalled standard-library module that finds files whose paths match a pattern. For details, see the glob documentation.
pandas, a package used for data manipulation. To install it, run pip install pandas on the command line. For details, see the pandas site.
virtualenvwrapper, a package that sets up virtual environments for Python. For details, see the related documentation.
This Kaggle dataset for the CSV data.
Create a project called etl_car_sales with PyCharm.
Create a virtual environment from the command line with mkvirtualenv etl_car_sales.
Install pandas and virtualenvwrapper.
Extract the zip file and move the car_sales CSV files to your etl_car_sales directory.
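To confirm that the CSV files are in the project directory, a quick check with glob could look like the sketch below. The helper name list_csv_files is hypothetical and not part of the original tutorial, and the default argument assumes the script runs from the etl_car_sales directory.

```python
import glob
import os


def list_csv_files(directory="."):
    """Return the sorted CSV file paths found directly inside `directory`."""
    # glob expands the wildcard pattern; this pattern does not descend
    # into subdirectories.
    pattern = os.path.join(directory, "*.csv")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Print one path per line so a missing file is easy to spot.
    for path in list_csv_files():
        print(path)
```

An empty list here most likely means the files were extracted to a different folder.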
Write an extract_from_csv function that reads a CSV file in your main directory and returns a DataFrame using the pandas function read_csv.
Write an extract function that reads the data from the extracted CSV files and appends it to the DataFrame returned by the extract_from_csv function.
Assign the output of extract to an extracted_data variable and display it to check the result.
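The functions described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the tutorial's original code: the directory parameter is hypothetical, and pd.concat is used to combine the files because DataFrame.append was removed in pandas 2.0. The columns you see depend on the Kaggle dataset.

```python
import glob
import os

import pandas as pd


def extract_from_csv(file_to_process):
    """Read one CSV file and return its contents as a DataFrame."""
    return pd.read_csv(file_to_process)


def extract(directory="."):
    """Combine every CSV file in `directory` into a single DataFrame."""
    # Build one DataFrame per file; sorted() keeps the order predictable.
    frames = [
        extract_from_csv(path)
        for path in sorted(glob.glob(os.path.join(directory, "*.csv")))
    ]
    if not frames:
        # No CSV files found: return an empty DataFrame instead of failing.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows consecutively across all files.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    extracted_data = extract()
    print(extracted_data.head())
```

Collecting the per-file DataFrames in a list and concatenating once avoids repeatedly copying a growing DataFrame inside a loop.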