Create Spark Dataset from Cassandra Table

Sample Program:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder().appName("cassandratable").master("local[2]").getOrCreate();

Map<String, String> cassandraMap = new HashMap<String, String>();

cassandraMap.put("spark.cassandra.connection.host", "<cassandraHostName>"); // e.g. localhost
cassandraMap.put("spark.cassandra.connection.port", "<cassandraPort>");     // e.g. 9042
cassandraMap.put("keyspace", "<cassandraKeySpace>");
cassandraMap.put("table", "<cassandraTable>");

Dataset<Row> cassandraTableData = sparkSession.read().format("org.apache.spark.sql.cassandra").options(cassandraMap).load();

System.out.println("Column Names : " + Arrays.asList(cassandraTableData.columns()));
System.out.println("Table Schema : " + cassandraTableData.schema());
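The 'org.apache.spark.sql.cassandra' format is provided by the DataStax Spark Cassandra Connector, which must be on the application classpath. A minimal Maven dependency sketch (the Scala suffix and version shown are assumptions; choose the ones matching your Spark build):

```xml
<!-- Assumed coordinates: verify the artifact suffix/version against your Spark version -->
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.12</artifactId>
    <version>3.4.1</version>
</dependency>
```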

Bullet Points:
1.) In the above code, we first create a SparkSession to run the Spark job in local (standalone) mode.
2.) In the second step, we provide the Cassandra table details: hostname, port number, keyspace, and table name.
3.) In the third step, we read the Cassandra table using the 'org.apache.spark.sql.cassandra' format and load it into the cassandraTableData Dataset with the load() method.
4.) We can get the table's column names using the columns() method.
5.) We can get the table's schema using the schema() method.
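The connector options used in step 2 are plain string key/value pairs, so they can be assembled once in a small helper and reused across reads. A minimal, Spark-free sketch (the method name and placeholder values are assumptions for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class CassandraOptions {

    // Builds the option map expected by the Cassandra data source.
    // Host, port, keyspace and table are caller-supplied placeholders.
    public static Map<String, String> build(String host, String port,
                                            String keyspace, String table) {
        Map<String, String> opts = new HashMap<>();
        opts.put("spark.cassandra.connection.host", host);
        opts.put("spark.cassandra.connection.port", port);
        opts.put("keyspace", keyspace);
        opts.put("table", table);
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = build("localhost", "9042", "test_ks", "test_table");
        // The map can be passed straight to sparkSession.read().options(opts)
        System.out.println(opts.get("spark.cassandra.connection.host"));
        System.out.println(opts.size());
    }
}
```

This keeps the connection settings in one place; only the keyspace and table arguments change between reads.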
