Sparse subspace clustering (SSC) is a representative data clustering paradigm that has been broadly applied in the unsupervised classification of hyperspectral images (HSIs). Existing SSC methods usually produce a subspace affinity matrix between representations of hyperspectral pixels first, followed by spectral clustering for the affinity matrix. To this end, the separated framework fails to exploit the dualities contained in both features and pixels or higher order entities at the same time, and thus, it is difficult to compute coclusters simultaneously. In addition, SSC methods often require expensive computational consumption and memory capacity to approximate the spectral decomposition of the affinity matrix, thus hindering the applicability of SSC methods for large HSIs. To overcome these limitations, we propose a novel graph convolutional sparse subspace coclustering (GCSSC) model with nonnegative orthogonal factorization for large HSIs in which affinity matrix learning and spectral coclustering are integrated into a unified optimizing model to obtain the optimal clustering results. Specifically, to form a more compact self-representation, the superpixel-based adaptive dictionary construction strategy is proposed instead of the global dictionary to precisely represent the pixels. To explore the spatial-contextual and spectral neighboring characteristics between dictionary atoms, graph convolution is incorporated into the dictionary atoms to aggregate the local neighborhood information, and the affinity matrix in the proposed coclustering framework is constructed under a joint sparsity constrained representation model. To reduce high computational consumption and memory capacity, a nonnegative orthogonal factorization constraint is proposed to offer an alternative spectral clustering for hyperspectral pixels and dictionary atoms simultaneously. The clustering performance of the proposed method is evaluated for three classical HSIs, and the experimental results illustrate that the proposed method is memory and computationally efficient and outperforms the state-of-the-art HSI clustering methods.