Introduction
Principles of Distributed Computing
Apache Spark
Hadoop
Principles of Data Serialization
How a data object is passed over the network
Serialization of objects
Serialization approaches
Thrift
Protocol Buffers
Apache Avro
Data structure
Size, speed, and format characteristics
Persistent data storage
Integration with dynamic languages
Dynamic typing
Schemas
Untagged data
Change management
Data Serialization and Distributed Computing
Avro as a subproject of Hadoop
Java serialization
Hadoop serialization
Avro serialization
Using Avro with
Hive (AvroSerDe)
Pig (AvroStorage)
Porting Existing RPC Frameworks
Summary and Conclusion