The immunohistochemistry (IHC) test of biopsy tissue is crucial to develop targeted treatment and evaluate prognosis for cancer patients. The IHC staining slide is usually digitized into the whole-slide image (WSI) with gigapixels for quantitative image analysis. To perform a whole image prediction (e.g., IHC scoring, survival prediction, and cancer grading) from this kind of high-dimensional image, algorithms are often developed based on multi-instance learning (MIL) framework. However, the multi-scale information of WSI and the associations among instances are not well explored in existing MIL based studies. Inspired by the fact that pathologists jointly analyze visual fields at multiple powers of objective for diagnostic predictions, we propose a Pathologist-Tree Network (PTree-Net) to sparsely model the WSI efficiently in multi-scale manner. Specifically, we propose a Focal-Aware Module (FAM) that can approximately estimate diagnosis-related regions with an extractor trained using the thumbnail of WSI. With the initial diagnosis-related regions, we hierarchically model the multi-scale patches in a tree structure, where both the global and local information can be captured. To explore this tree structure in an end-to-end network, we propose a patch Relevance-enhanced Graph Convolutional Network (RGCN) to explicitly model the correlations of adjacent parent-child nodes, accompanied by patch relevance to exploit the implicit contextual information among distant nodes. In addition, tree-based self-supervision is devised to improve representation learning and suppress irrelevant instances adaptively. Extensive experiments are performed on a large-scale IHC HER2 dataset. The ablation study confirms the effectiveness of our design, and our approach outperforms state-of-the-art by a large margin.