Recent technological advances have produced increasingly capable parallel computing hardware and complex I/O workloads, comprising Machine Learning, Deep Learning, and other artificial intelligence techniques. These advances have made the existing parallel I/O stack more complex and challenging to tune, and a stack that is not optimized properly can incur massive overheads and performance degradation. Given the ever-increasing complexity of the I/O stack deployed on large-scale systems, one needs an in-depth understanding of the I/O behavior of these systems and an awareness of the performance modeling and prediction tools required to evaluate and optimize I/O. It is therefore critical to have a comprehensive study that end users can use as a guide to evaluate and optimize parallel I/O in their applications. This paper presents such a study by surveying the current landscape of parallel I/O characterization and evaluation on large-scale HPC systems. By taking a deep dive into the different layers of the I/O stack, this paper shows how access patterns are shaped as an I/O request travels down the I/O stack and what optimizations can be applied to these access patterns. The paper also examines workload generation methodologies and the profiling and tracing tools that can collect performance statistics for these workloads. It further discusses parallel I/O evaluation techniques such as statistical analysis, machine learning, and replay-based modeling. Lastly, it ties this discussion to a currently active area of research in parallel I/O: automatically evaluating, analyzing, and optimizing parallel I/O in applications without involving an I/O expert in the loop.