Tracing the Milky Way’s Past with HDBSCAN: Finding the Ghosts of Ancient Galaxies
Astronomers believe that galaxies like the Milky Way grew over billions of years by swallowing smaller galaxies. The debris from these mergers now orbits in our galaxy’s outer regions, where it makes up much of the stellar halo. Detecting these remnants is tricky because, over time, their stars become mixed together and can resemble stars formed in the Milky Way itself. In this study, Andrea Sante and collaborators test whether HDBSCAN, a machine-learning clustering algorithm, can help untangle this complicated history.
Why Clustering?
When a smaller galaxy falls into the Milky Way, its stars share common traits: similar chemical “fingerprints,” motions, and ages. In principle, these similarities mean the stars should clump together in “feature space,” an abstract map in which each star is placed according to its measured properties. Density-based clustering algorithms such as HDBSCAN can find these clumps without being told in advance how many groups to look for. HDBSCAN is particularly appealing because it can recover irregularly shaped groups and label stars that belong to no group as noise. Simpler methods such as k-means, by contrast, need the number of groups up front, tend to find roughly round clusters, and assign every star to some cluster.
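To make the contrast with k-means concrete, here is a minimal pure-Python sketch of DBSCAN, a simpler predecessor of HDBSCAN swapped in here for brevity; it is not the study’s code. HDBSCAN extends the same idea by considering all density thresholds at once and keeping the most stable clusters, so it does not need the single distance cutoff `eps` used below. The two-dimensional “stars” are made-up toy points: a thin, stream-like line, a compact clump, and one isolated star.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`. Clusters grow outward from core
    points; points not reachable from any core point stay noise.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, so keep growing
    return labels

# Hypothetical toy data in a 2-D "feature space".
stars = [
    (0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 0.0), (0.8, 0.0), (1.0, 0.0),  # thin stream
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),                          # compact clump
    (2.5, 9.0),                                                              # isolated star
]
labels = dbscan(stars, eps=0.3, min_samples=3)
print(labels)  # → [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The elongated line comes out as a single cluster and the isolated star is labeled -1 (noise). k-means would instead have to be told that there are two groups, and it would still force the isolated star into one of them.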
The Auriga Test Bed
To put HDBSCAN to the test, the authors used the Auriga simulations, detailed cosmological computer models of Milky Way-like galaxies. These simulations include the physics of star formation, supernova explosions, and black hole activity. The team focused on three simulated galaxies (Au7, Au25, and Au27), each with a very different assembly history. Au7, for example, experienced major mergers early on whose debris is now well mixed, while Au25 underwent a large, recent accretion event that left behind clear stellar streams. Au27 even had a Gaia-Enceladus-Sausage-like merger, resembling the event thought to have shaped our own Milky Way.
Teaching HDBSCAN to See
The researchers gave HDBSCAN a 12-dimensional feature space, combining stellar motions, positions, ages, and chemical abundances, to maximize the information available to it. Because the algorithm’s performance depends heavily on its hyperparameters, they tuned these with Optuna, an optimization framework that searches the parameter space for the best-scoring configuration. They evaluated the results with metrics such as the V-measure, which scores how well the recovered clusters match the true progenitor galaxies known from the simulations. The V-measure is the harmonic mean of homogeneity (each cluster contains stars from only one progenitor) and completeness (all stars from a given progenitor end up in the same cluster).
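The V-measure can be computed from label counts alone. The sketch below follows its standard entropy-based definition in plain Python; in practice `sklearn.metrics.v_measure_score` computes the same quantity. The progenitor names and cluster labels are invented for illustration, and this code would treat HDBSCAN’s noise label as just another cluster, an assumption the study may handle differently. A function returning such a score is the kind of objective a tuner like Optuna can maximize, although the exact objective the authors used is not given here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(L) of a list of labels (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(A | B): uncertainty left about label a once label b is known."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[lb])
                for (_, lb), n_ab in joint.items())

def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    h_true = entropy(true_labels)
    h_pred = entropy(cluster_labels)
    # Homogeneity: does each cluster hold stars from a single progenitor?
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, cluster_labels) / h_true)
    # Completeness: do all stars from a progenitor land in a single cluster?
    completeness = 1.0 if h_pred == 0 else (
        1.0 - conditional_entropy(cluster_labels, true_labels) / h_pred)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)

progenitor = ["A", "A", "A", "A", "B", "B", "B", "B"]
perfect    = [0, 0, 0, 0, 1, 1, 1, 1]
fragmented = [0, 0, 1, 1, 2, 2, 3, 3]  # each progenitor split in two
merged     = [0, 0, 0, 0, 0, 0, 0, 0]  # everything lumped together

print(v_measure(progenitor, perfect))               # → 1.0
print(round(v_measure(progenitor, fragmented), 3))  # → 0.667
print(v_measure(progenitor, merged))                # → 0.0
```

Splitting each progenitor in two keeps homogeneity at 1 but halves completeness, which pulls the V-measure down to 2/3; lumping everything together has the opposite failure and scores 0.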
Results: A Mixed Bag
When applied to accreted-only haloes (stars from merged galaxies, without contamination from stars born in the host galaxy), HDBSCAN successfully recovered most major merger events. It was especially good at identifying stellar streams, the thin, elongated structures left by more recent mergers, and recovered them with high purity (most stars in each such cluster came from a single progenitor). Recall (the fraction of a progenitor’s stars captured by its cluster) was lower, however, often because large progenitors fragmented into multiple clusters or because many of their stars were marked as “noise.” When in situ stars (formed inside the host galaxy itself) were added, the algorithm struggled more. Older mergers were often hidden within the host’s own stellar population, leaving only the more recent events detectable. Still, the clusters it did identify remained highly pure.
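The purity-versus-recall pattern described above can be reproduced with a few counts. The sketch below uses one plausible definition, assumed here rather than taken from the paper: a cluster’s purity is the fraction of its stars that come from its dominant progenitor, and its recall is the fraction of that progenitor’s stars that landed in the cluster. The labels are hypothetical: progenitor P1 is a stream recovered whole (plus one in situ interloper), while P2 is split into two clusters with two of its stars left as noise (-1).

```python
from collections import Counter

NOISE = -1  # label for stars assigned to no cluster, as HDBSCAN does

def purity_and_recall(true_labels, cluster_labels):
    """Map each cluster id to (dominant progenitor, purity, recall).

    Noise stars belong to no cluster but still count toward the size of
    their progenitor, so leaving stars as noise lowers recall.
    """
    progenitor_sizes = Counter(true_labels)
    members = {}
    for origin, cluster in zip(true_labels, cluster_labels):
        if cluster != NOISE:
            members.setdefault(cluster, []).append(origin)
    report = {}
    for cluster, origins in sorted(members.items()):
        dominant, n_dominant = Counter(origins).most_common(1)[0]
        purity = n_dominant / len(origins)
        recall = n_dominant / progenitor_sizes[dominant]
        report[cluster] = (dominant, purity, recall)
    return report

# Hypothetical labels for 15 stars.
true_labels = ["P1"] * 6 + ["P2"] * 8 + ["in situ"]
cluster_labels = [0] * 6 + [1, 1, 1, 2, 2, 2, NOISE, NOISE] + [0]

for cluster, (progenitor, purity, recall) in purity_and_recall(true_labels, cluster_labels).items():
    print(f"cluster {cluster}: {progenitor}, purity={purity:.2f}, recall={recall:.2f}")
```

Cluster 0 has purity 6/7 and recall 1, while clusters 1 and 2 are perfectly pure but each recovers only 3/8 of P2: fragmentation and noise lower recall without hurting purity.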
What This Means for Galactic Archaeology
The study shows that HDBSCAN, when carefully optimized, can reconstruct significant parts of a galaxy’s assembly history. It works best for recently accreted, dynamically cold stellar streams and is less reliable for ancient, well-mixed debris. Contamination from in situ stars is a major challenge, limiting how far back in time mergers can be traced. The authors conclude that while clustering is a powerful tool, it must be applied carefully and combined with other methods to fully piece together the Milky Way’s history.
Source: Sante