Segmentation of scanning electron microscopy (SEM) images is critical yet time-consuming for geological analyses, because it requires delineating the boundaries between mineral objects to support subsequent analyses such as porosity calculation. Recently, various machine learning methods, especially convolutional neural networks (CNNs), have been explored for segmenting SEM images of fine-grained shale samples. However, we found that general-purpose CNNs perform suboptimally because training data are scarce and the objects in SEM images are imbalanced. This work revises the U-Net architecture, a popular approach for biomedical image analysis, by incorporating a loss function that addresses the imbalance issue. Furthermore, we use ensemble learning to train multiple models and combine their results to improve overall segmentation performance. In our experiments, we prepared 2162 sub-images from raw SEM images and divided them into training, validation, and testing datasets. The results show that our method improves the average Intersection over Union (IoU) of mineral objects from 0.49 to 0.58 compared with the original U-Net model. Our method clearly delineates the boundaries that separate each object from the others, even in highly imbalanced images. Training our models takes less than 3 min on a single GPU, whereas manual labeling can take up to 3 h per image. The method therefore helps geoscientists gain insights quickly and effectively by building neural network models from a small dataset of SEM images.
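For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a per-class IoU and its average can be computed from integer label maps, together with one common way of combining an ensemble. The function names `per_class_iou` and `ensemble_labels`, the three-class toy data, and the choice of averaging class probabilities across models are assumptions made for illustration; the abstract does not specify how the model outputs are combined.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return the IoU of each class; NaN where a class is absent from both maps."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, target_c).sum() / union
    return ious


def ensemble_labels(prob_maps):
    """Average class probabilities over models and take the most likely class.

    prob_maps has shape (models, classes, height, width). Averaging is an
    assumed combination rule, not necessarily the one used in this work.
    """
    return np.mean(prob_maps, axis=0).argmax(axis=0)


# Toy 2x3 label maps with three hypothetical classes (0, 1, 2).
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
ious = per_class_iou(pred, target, num_classes=3)
mean_iou = np.nanmean(ious)  # average over the classes that are present

# Two hypothetical models, two classes, one row of two pixels.
prob_maps = np.array([
    [[[0.9, 0.4]], [[0.1, 0.6]]],
    [[[0.6, 0.3]], [[0.4, 0.7]]],
])
combined = ensemble_labels(prob_maps)
```

Classes that appear in neither the prediction nor the ground truth are skipped in the average, so that absent minerals do not distort the score for a given image.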