This is exactly the kind of work that needs to happen before we start plugging GenAI into mission systems. Shows how dangerous it is to assume general performance equals military readiness. If a model fumbles MCDP 6, that’s not a bug, it’s a liability.
Also appreciate the nod to cost tradeoffs. There’s a place for smaller, cheaper models, but only if they’re tested where it counts. This is how we close the gap between Silicon Valley hype and actual warfighting utility.
Great one, I can imagine doing a lot with this idea. I saw DeepSeek does very well, which makes me wonder whether a locally hosted version could be safe enough to use.
For the test, I opted to use Azure's AI Foundry APIs instead of DeepSeek's. If I had the compute locally accessible, I would have gone the locally hosted route, similar to how I ran the Llama 3.1-8B tests.
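Both routes mentioned above (Azure AI Foundry's serverless endpoints and local runtimes like Ollama or vLLM) expose an OpenAI-style chat completions interface, so switching between them is mostly a matter of swapping the endpoint and model name. Here is a minimal sketch of that idea; the endpoint URLs and model identifiers are illustrative placeholders, not the author's actual configuration:

```python
def build_chat_request(prompt: str, local: bool) -> tuple[str, dict]:
    """Return (endpoint, payload) for an OpenAI-style chat completion.

    Sketch only: URLs and model names below are hypothetical examples,
    not the configuration used in the tests described above.
    """
    if local:
        # Hypothetical locally hosted server (e.g. Ollama's default port).
        endpoint = "http://localhost:11434/v1/chat/completions"
        model = "llama3.1:8b"
    else:
        # Hypothetical Azure AI Foundry serverless endpoint.
        endpoint = "https://<your-resource>.services.ai.azure.com/models/chat/completions"
        model = "DeepSeek-R1"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic output helps when benchmarking
    }
    return endpoint, payload
```

The same payload can then be POSTed to either endpoint, which is what makes the hosted-versus-local comparison in the tests above relatively cheap to set up.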