In this paper, a full-reference video quality assessment (VQA) model is designed for the perceptual quality assessment of the screen content videos (SCVs), called the hybrid spatiotemporal feature-based model (HSFM). The SCVs are of hybrid structure including screen and natural scenes, which are perceived by the human visual system (HVS) with different visual effects. With this consideration, the three dimensional Laplacian of Gaussian (3D-LOG) filter and three dimensional Natural Scene Statistics (3D-NSS) are exploited to extract the screen and natural spatiotemporal features, based on the reference and distorted SCV sequences separately. The similarities of these extracted features are then computed independently, followed by generating the distorted screen and natural quality scores for screen and natural scenes. After that, an adaptive screen and natural quality fusion scheme through the local video activity is developed to combine them for arriving at the final VQA score of the distorted SCV under evaluation. The experimental results on the Screen Content Video Database (SCVD) and Compressed Screen Content Video Quality (CSCVQ) databases have shown that the proposed HSFM is more in line with the perceptual quality assessment of the SCVs perceived by the HVS, compared with a variety of classic and latest IQA/VQA models.