Apache Spark is an open-source framework for big-data processing, used by data scientists and engineers to run computations over very large datasets. Its wide adoption comes down to how it works. Spark's core data structure is the RDD (Resilient Distributed Dataset): an immutable, distributed collection of objects. These datasets can hold any type of object from Python, Java, or Scala, including user-defined classes. Because processing large amounts of data demands speed, the engine itself must be efficient: Spark combines a DAG scheduler, in-memory caching, and optimized query execution to process data as fast as possible, which is what makes it well suited to large-scale workloads.
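To make the RDD ideas above concrete, here is a minimal pure-Python sketch of how an immutable, lazily evaluated dataset can work: transformations only record a step in the lineage (the DAG), and nothing executes until an action like `collect()` is called. The `MiniRDD` class and its methods are illustrative stand-ins, not the real PySpark API (in PySpark you would use `sc.parallelize(...)` with `map`, `filter`, `cache`, and `collect`).

```python
class MiniRDD:
    """Toy stand-in for a Spark RDD: an immutable collection whose
    transformations are recorded lazily and only run on an action."""

    def __init__(self, data, lineage=()):
        self._data = tuple(data)        # source data, never mutated
        self._lineage = tuple(lineage)  # recorded transformations (the "DAG")
        self._materialized = None       # filled by the first action if cached
        self._should_cache = False

    def map(self, fn):
        # Transformations return a NEW MiniRDD; the original is unchanged.
        return MiniRDD(self._data, self._lineage + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._data, self._lineage + (("filter", pred),))

    def cache(self):
        # Mark for in-memory caching; like Spark, this is also lazy.
        self._should_cache = True
        return self

    def _compute(self):
        # Replay the recorded lineage over the source data.
        rows = list(self._data)
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:  # "filter"
                rows = [r for r in rows if pred_holds(fn, r)]
        return rows

    def collect(self):
        # Action: triggers actual computation (and caching, if requested).
        if self._materialized is not None:
            return self._materialized
        rows = self._compute()
        if self._should_cache:
            self._materialized = rows
        return rows

    def count(self):
        return len(self.collect())


def pred_holds(pred, row):
    return bool(pred(row))


rdd = MiniRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
print(rdd.collect())            # original data is untouched
```

The key design point mirrored here is laziness: building `evens_squared` does no work at all, so an engine like Spark can inspect the whole chain of transformations and optimize it before executing anything.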