Mar 14, 2015 · 83 thoughts on “Spark Architecture”. Raja, March 17, 2015 at 5:06 pm: Nice observation. I feel that enough RAM or enough nodes will help, despite using an LRU cache. I think incorporating Tachyon helps a little too, for example by de-duplicating in-memory data, along with some unrelated features such as speed, sharing, and safety.

Feb 17, 2017 · Importing Data into Hive Tables Using Spark. Apache Spark is a modern processing engine that is focused on in-memory processing. Spark’s primary data abstraction is an immutable distributed collection of items called a resilient distributed dataset (RDD).

Tables in Spark

Spark uses both HiveCatalog and HadoopTables to load tables. Hive is used when the identifier passed to load or save is not a path; otherwise Spark assumes it is a path-based table. To read and write to tables from Spark see: Reading a table in Spark; Appending to a table in Spark; Overwriting data in a table in Spark.

In another scenario, the Spark logs showed that reading every line of every file took a handful of repetitive operations: validate the file, open the file, seek to the next ... When processing the full set of logs we would see out-of-memory heap errors or complaints about exceeding Spark's data frame size.
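
The table-loading rule quoted earlier (the Hive catalog when the identifier is not a path, a path-based table otherwise) can be sketched in plain Python. This is only an illustration under stated assumptions: `TableRef`, `resolve_table`, and the "contains a slash" test are hypothetical names and logic made up for the example, not the actual Spark or Hive API or its real detection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical result type: where a table would be loaded from."""
    kind: str      # "path" for path-based tables, "catalog" for Hive-catalog tables
    location: str  # the filesystem path or the table identifier, unchanged


def resolve_table(identifier: str) -> TableRef:
    """Decide how an identifier passed to load/save would be treated.

    Assumption for this sketch: an identifier is considered a path
    if it contains a slash; anything else is a catalog table name.
    """
    if "/" in identifier:
        return TableRef(kind="path", location=identifier)
    return TableRef(kind="catalog", location=identifier)


if __name__ == "__main__":
    # A dotted name is treated as a catalog identifier.
    print(resolve_table("db.events"))
    # Anything that looks like a filesystem location is treated as a path.
    print(resolve_table("hdfs://namenode/warehouse/db/events"))
```

Keeping the decision in one small function makes the dispatch explicit: the caller passes a single string, and the loader picks the catalog or the path-based route from it.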