The past decade has witnessed a growing demand for drone-based fire detection systems, driven by escalating concerns about wildfires exacerbated by climate change, as corroborated by environmental studies. However, deploying existing drone-based fire detection systems in real-world operational conditions poses practical challenges, notably the intricate and unstructured environments and the dynamic nature of UAV-mounted cameras, often leading to false alarms and inaccurate detections. In this paper, we describe a two-stage framework for fire detection and geo-localization. The key features of the proposed work included the compilation of a large dataset from several sources to capture various visual contexts related to fire scenes. The bounding boxes of the regions of interest were labeled using three target levels, namely fire, non-fire, and smoke. The second feature was the investigation of YOLO models to undertake the detection and localization tasks. YOLO-NAS was retained as the best performing model using the compiled dataset with an average mAP50 of 0.71 and an F1_score of 0.68. Additionally, a fire localization scheme based on stereo vision was introduced, and the hardware implementation was executed on a drone equipped with a Pixhawk microcontroller. The test results were very promising and showed the ability of the proposed approach to contribute to a comprehensive and effective fire detection system.