Enhancing DeepSeek Models with MLA and FP8 Optimizations in vLLM (neuralmagic.com)
2 points by hochmartinez 53 days ago

Multimodal Model Quantization Support Through LLM Compressor by Neural Magic (neuralmagic.com)
1 point by BUFU 60 days ago

What happens if we remove 50 percent of Llama? (neuralmagic.com)
231 points by BUFU 4 months ago | 132 comments

We Ran Over Half a Million Evaluations on Quantized LLMs (neuralmagic.com)
12 points by eldar_ciki 6 months ago | 2 comments

Pushing the Boundaries of Mixed-Precision LLM Inference with Marlin (neuralmagic.com)
2 points by mwitiderrick 10 months ago

Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse (neuralmagic.com)
238 points by mwitiderrick on Nov 23, 2023 | 26 comments

Build Scalable NLP and Computer Vision Pipelines with DeepSparse (neuralmagic.com)
1 point by mwitiderrick on June 8, 2023

Achieving 1,000X CPU Performance Boost with Sparse Models in MLPerf (neuralmagic.com)
1 point by NM_Ricky on April 5, 2023 | 1 comment

SparseGPT: Remove 100B Parameters for Free (neuralmagic.com)
3 points by homarp on March 24, 2023 | 1 comment

SparseGPT: Remove 100B Parameters for Free (neuralmagic.com)
2 points by todsacerdoti on March 24, 2023

Sparsify Image Classification Models Faster with SparseML and Deep Lake (neuralmagic.com)
1 point by mwitiderrick on March 16, 2023

YOLOv8 Detection 10x Faster with DeepSparse (neuralmagic.com)
1 point by mwitiderrick on Jan 19, 2023

Image Segmentation: Your Ultimate Guide to Easy Deployment and Fast Inferencing (neuralmagic.com)
2 points by mwitiderrick on Jan 5, 2023 | 2 comments

Search Documents Quickly with Extractive Question Answering (neuralmagic.com)
1 point by mwitiderrick on Dec 15, 2022 | 1 comment

Accelerate Customer Review Classification with Sparse Transformers (neuralmagic.com)
1 point by mwitiderrick on Nov 22, 2022 | 1 comment

Neural Network inference on commodity CPUs using sparsity (neuralmagic.com)
2 points by atylerrice on Sept 21, 2022 | 3 comments

Using compound sparsification for faster BERT on CPUs with better accuracy (neuralmagic.com)
4 points by szpcela on Sept 24, 2021

YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance (neuralmagic.com)
121 points by T-A on Sept 10, 2021 | 53 comments

Show HN: YOLOv3 – Pruning and Quantizing to Improve Object Detection Performance (neuralmagic.com)
4 points by markurtz on June 23, 2021

A Software Architecture for the Future of ML (neuralmagic.com)
2 points by beefman on May 29, 2021