Import / Export PostgreSQL data to / from SparkR dataframe

Connecting PostgreSQL table to SparkR

To connect to and load data from a PostgreSQL table into a Spark dataframe, we need the PostgreSQL JDBC connector, so first download the PostgreSQL connector JAR file from the official PostgreSQL site.

Put the downloaded jar file (postgresql-9.4.1212.jre6.jar in my case) in the “jars\” directory within your Spark directory (in my case, the path of the jars folder is F:\spark-2.1.0\jars).
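As a side note, instead of copying the jar you can register it when starting the SparkR session via the sparkJars argument; a quick sketch, with an illustrative download path:

    # Alternative to copying the jar into "jars\": pass the driver jar
    # when starting the session (the path below is illustrative)
    sparkR.session(sparkJars = "F:/downloads/postgresql-9.4.1212.jre6.jar")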

Now, coming back to PostgreSQL: I have created a new database called dataxone, which currently doesn’t have any data.
So let’s grab some dummy data generated from https://www.mockaroo.com/. Our data will be in CSV format. We will dump this CSV into the PostgreSQL database by first importing the data into a SparkR dataframe and then exporting it directly to PostgreSQL using the write.jdbc() function in SparkR.

Importing data from CSV into a SparkR Dataframe using read.df()

Read the https://dataxone.com/import-export-csv-data-sparkr-dataframe/ post to get a detailed idea of how to import a CSV into a SparkR dataframe.

So, the code to import data into PostgreSQL from the CSV will look something like the sketch below. The CSV path, host, port, and credentials are placeholders; adjust them to your own setup:
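    # A minimal sketch, assuming a local Spark at F:/spark-2.1.0 and a
    # PostgreSQL server on localhost; paths and credentials are placeholders
    Sys.setenv(SPARK_HOME = "F:/spark-2.1.0")
    library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

    # Set some variables: source CSV and connection details
    csvPath <- "F:/data/MOCK_DATA.csv"
    jdbcUrl <- "jdbc:postgresql://localhost:5432/dataxone"
    dbUser  <- "postgres"
    dbPass  <- "postgres"

    # Initiate the SparkR session
    sparkR.session(master = "local[*]", appName = "csv-to-postgres")

    # Read the CSV into a SparkR dataframe, inferring column types
    df <- read.df(csvPath, source = "csv", header = "true", inferSchema = "true")

    # Export the dataframe directly to PostgreSQL; this creates the
    # "rawdata" table under the dataxone database
    write.jdbc(df, jdbcUrl, tableName = "rawdata", mode = "overwrite",
               user = dbUser, password = dbPass)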

In the above code, we first set some variables and initiated the SparkR session, then read the CSV using the read.df() function.
Next, we created a JDBC URL containing the details of the server as well as the database. We then passed the dataframe, along with the JDBC URL and user details, to the write.jdbc() function, which will create a table called “rawdata” under the dataxone database. I did not have to create the table explicitly in dataxone; write.jdbc() does it for me. However, the data types of the new table may need to be taken care of by you, since Spark derives them from the dataframe’s schema.
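Before writing, it can help to inspect the schema Spark inferred from the CSV, since write.jdbc() translates it into the new table’s column types:

    # Print the inferred schema that determines the PostgreSQL column types
    printSchema(df)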

Exporting data from a PostgreSQL table to a SparkR Dataframe and writing it to CSV

The following code will read data from PostgreSQL into a SparkR dataframe and then dump it to CSV at the given path. Again, the output path and credentials below are placeholders:
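    # A minimal sketch; assumes the SparkR session from the import
    # example is still running, with placeholder credentials and paths
    jdbcUrl <- "jdbc:postgresql://localhost:5432/dataxone"

    # Read the "rawdata" table from PostgreSQL into a SparkR dataframe
    df <- read.jdbc(jdbcUrl, "rawdata", user = "postgres", password = "postgres")

    # Write the dataframe out as CSV; Spark produces a directory of
    # part files at this path, not a single CSV file
    write.df(df, path = "F:/data/rawdata_export", source = "csv",
             mode = "overwrite", header = "true")

    # Stop the session when done
    sparkR.session.stop()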

We basically use read.jdbc() to read the data from PostgreSQL and then write.df() to write it out as CSV. Keep in mind that Spark writes the CSV output as a directory of part files rather than a single file.
