This paper investigates 4-D light field (LF) reconstruction from 2-D measurements captured by a coded aperture camera. To tackle this ill-posed inverse problem, we propose a cycle-consistent reconstruction network (CR-Net). Specifically, based on the intrinsic linear imaging model of the coded aperture, CR-Net reconstructs an LF by progressively eliminating the residuals between the measurements projected from the reconstructed LF and the input measurements. Moreover, to address the crucial issue of efficiently and effectively extracting representative features from high-dimensional LF data, we formulate the problem in a probability space and propose to approximate a posterior distribution over a set of carefully defined LF processing events, covering both layer-wise spatial-angular feature extraction and network-level feature aggregation. By applying drop-path to a densely connected template network, we derive an adaptively learned spatial-angular fusion strategy, in sharp contrast to existing approaches that combine spatial and angular features empirically. Extensive experiments on both simulated measurements and measurements captured by a real coded aperture camera demonstrate the significant advantage of our method over state-of-the-art ones, with our method improving reconstruction quality by 4.5 dB.
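The residual-elimination principle above can be illustrated with a minimal sketch. Here the coded-aperture imaging model is a linear operator `Phi` mapping a (flattened) 4-D LF to 2-D measurements, and CR-Net's learned refinement stages are replaced by a plain Landweber-style gradient update; all names, dimensions, and the step size are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

# Toy stand-in for the coded-aperture linear imaging model y = Phi @ x,
# with the learned refinement replaced by a Landweber update
# (assumption: this classical iteration, not CR-Net's learned stages).
rng = np.random.default_rng(0)

n_lf, n_meas = 64, 16                 # flattened LF size, measurement size (toy)
Phi = rng.standard_normal((n_meas, n_lf)) / np.sqrt(n_lf)  # imaging operator
x_true = rng.standard_normal(n_lf)    # ground-truth light field (flattened)
y = Phi @ x_true                      # simulated coded-aperture measurements

x = np.zeros(n_lf)                    # initial LF estimate
step = 0.5                            # illustrative step size
for _ in range(2000):
    residual = y - Phi @ x            # measurement-domain residual
    x = x + step * Phi.T @ residual   # progressively eliminate the residual

# The projection of the final estimate should match the input measurements.
print(np.linalg.norm(y - Phi @ x))    # small measurement residual
```

In CR-Net the hand-set update `step * Phi.T @ residual` is replaced by learned network stages, but the loop structure, projecting the current estimate, comparing against the input measurements, and refining, is the same cycle-consistency idea.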