1. Hortonworks and Cloudera comparison
In addition to functionality, the two should also be compared on operation and maintenance
2. The environment deployment plan needs refinement
Which components are deployed
Which web services are deployed
How components are distributed across the nodes
How memory and disk resources are allocated
Whether to enable high availability for components
If component high availability is enabled, consider deploying 2 separate ZooKeeper ensembles
How the offline cluster interoperates with the online cluster
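For the high-availability item, one point worth checking in the plan is the size of each ZooKeeper ensemble. ZooKeeper stays available only while a majority of its servers is up, so an even-sized ensemble tolerates no more failures than the next smaller odd size. The sketch below only illustrates that arithmetic; the function names are made up for this note and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print the trade-off for a few candidate ensemble sizes.
    for n in (3, 4, 5, 6, 7):
        print(n, "servers: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

In practice this means 3 or 5 servers per ensemble are the usual candidates; 4 or 6 add nodes without adding fault tolerance.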
3. The tables stored in the big data components should be listed
List the stored tables
Give the data logic for each table: where the data comes from, where it goes, and how it is used
Whether historical data needs to be kept, and whether to use a zipper table
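To make the zipper table question concrete: a zipper table keeps one row per version of a record, with a start date and an end date, and marks the current version with a far-future end date. The sketch below shows the idea in plain Python rather than Hive SQL. The field names (key, value, start_date, end_date), the 9999-12-31 marker, and the helper names are assumptions chosen for illustration, not taken from the plan under review.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current"


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    start_date: date
    end_date: date = OPEN_END


def apply_snapshot(history, snapshot, load_date):
    """Merge one day's full snapshot {key: value} into the history."""
    result = []
    unchanged = set()
    for row in history:
        if row.end_date != OPEN_END:
            result.append(row)  # closed versions are never touched again
        elif snapshot.get(row.key) == row.value:
            result.append(row)  # still current, nothing to do
            unchanged.add(row.key)
        else:
            # Changed or deleted: close the old version the day before the load.
            result.append(replace(row, end_date=load_date - timedelta(days=1)))
    for key, value in snapshot.items():
        if key not in unchanged:
            result.append(Version(key, value, load_date))  # open a new version
    return result


def as_of(history, day):
    """Reconstruct the {key: value} state that was valid on a given day."""
    return {r.key: r.value for r in history if r.start_date <= day <= r.end_date}
```

A table that changes slowly gains only one extra row per change, instead of one full copy per day, while any past day can still be queried with a date-range filter as in as_of.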
4. The data migration solution needs optimization
Whether to stop service during migration (no)
Which data must be migrated and which can be left unmigrated
How data in MySQL is migrated
How data in Hive, HBase, and HDFS is migrated
Why use Sqoop?
Why not distcp?
Estimated duration of the migration
How to verify the data after migration
"Data Migration" considerations
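The duration estimate and the post-migration check can both be sketched in a few lines. The first function is a rough time estimate from data volume and sustained throughput. The second compares source and target tables by row count and an order-independent checksum. Everything here is an assumption for illustration: the function names, the 0.7 efficiency factor, and the in-memory rows. On real clusters the counts and checksums would be computed inside MySQL, Hive, or HBase, and rows would need to be normalized to the same representation on both sides before hashing.

```python
import hashlib


def estimate_hours(data_gib, throughput_mib_s, efficiency=0.7):
    """Rough transfer time: volume divided by effective throughput."""
    if throughput_mib_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("throughput must be > 0 and efficiency in (0, 1]")
    seconds = data_gib * 1024 / (throughput_mib_s * efficiency)
    return seconds / 3600


def table_fingerprint(rows):
    """Return (row_count, checksum); the checksum ignores row order."""
    count = 0
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row hashes makes the result independent of row order.
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{total:064x}"


def find_mismatches(source, target):
    """Names of tables that are missing on one side or differ in content."""
    bad = []
    for name in sorted(set(source) | set(target)):
        if name not in source or name not in target:
            bad.append(name)
        elif table_fingerprint(source[name]) != table_fingerprint(target[name]):
            bad.append(name)
    return bad
```

An empty list from find_mismatches means every table matched on both count and checksum; any name it returns is a table to re-migrate or inspect by hand.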