SciKnowEval
Introduction
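"Study extensively, enquire earnestly, think profoundly, discern clearly, and practice assiduously."
Book of Rites, "Doctrine of the Mean" (《礼记·中庸》)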
The Scientific Knowledge Evaluation (SciKnowEval) benchmark for Large Language Models (LLMs) is inspired by the profound principles outlined in the “Doctrine of the Mean” from ancient Chinese philosophy. This benchmark is designed to assess LLMs based on their proficiency in Studying Extensively, Enquiring Earnestly, Thinking Profoundly, Discerning Clearly, and Practicing Assiduously. Each of these dimensions offers a unique perspective on evaluating the capabilities of LLMs in handling scientific knowledge.
L1: Studying Extensively
(Knowledge Memory)
This dimension evaluates the breadth of an LLM’s knowledge across various scientific domains. It measures the model’s ability to remember a wide range of scientific concepts.
L2: Enquiring Earnestly
(Knowledge Comprehension)
This aspect focuses on the LLM’s capacity for deep enquiry and exploration within scientific contexts, such as analyzing scientific texts, identifying key concepts, and questioning relevant information.
L3: Thinking Profoundly
(Knowledge Reasoning)
This criterion examines the model’s capacity for critical thinking, logical deduction, numerical calculation, function prediction, and the ability to engage in reflective reasoning to solve problems.
L4: Discerning Clearly
(Knowledge Discernment)
This aspect evaluates the LLM’s ability to make correct, secure, and ethical decisions based on scientific knowledge, including assessing the harmfulness and toxicity of information, and understanding the ethical implications and safety concerns related to scientific endeavors.
L5: Practicing Assiduously
(Knowledge Application)
The final dimension assesses the LLM’s capability to apply scientific knowledge effectively in real-world scenarios, such as analyzing complex scientific problems and creating innovative solutions.
SciKnowEval represents a comprehensive benchmark for assessing the capability of LLMs in processing and utilizing scientific knowledge. It aims to promote the development of scientific LLMs that not only possess extensive knowledge but also demonstrate ethical discernment and practical applicability, ultimately contributing to the advancement of scientific research.
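As a reading aid for the `L1` to `L5` codes used in the tables on this page, the five levels can be written down as a small lookup. This is only an illustrative sketch: the `AbilityLevel` enum and the `describe` helper are hypothetical names introduced here, not part of any SciKnowEval release; the level names and abilities are taken from the descriptions above.

```python
from enum import Enum


class AbilityLevel(Enum):
    """Hypothetical lookup: level code -> (principle, ability evaluated)."""

    L1 = ("Studying Extensively", "Knowledge Memory")
    L2 = ("Enquiring Earnestly", "Knowledge Comprehension")
    L3 = ("Thinking Profoundly", "Knowledge Reasoning")
    L4 = ("Discerning Clearly", "Knowledge Discernment")
    L5 = ("Practicing Assiduously", "Knowledge Application")

    @property
    def principle(self) -> str:
        return self.value[0]

    @property
    def ability(self) -> str:
        return self.value[1]


def describe(code: str) -> str:
    """Turn a level code such as 'L4' into a readable label."""
    level = AbilityLevel[code]
    return f"{level.name}: {level.principle} ({level.ability})"


if __name__ == "__main__":
    for level in AbilityLevel:
        print(describe(level.name))
```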
News
2024-09-20: Released the SciKnowEval report for OpenAI o1.
2024-07-28: Added Physics and Materials to SciKnowEval.
2024-06-21: First release of SciKnowEval.
Leaderboards
Last updated: 22 July, 2024
Models | Overall | Biology | Chemistry | Material | Physics |
---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 1 | 5.36 | 2.83 | 3.06 | 3.42 |
GPT-4o-2024-05-13 | 2 | 5.20 | 5.50 | 4.18 | 2.83 |
Qwen2-72B-Inst | 3 | 8.12 | 7.17 | 4.88 | 3.17 |
GPT-4-Turbo-2024-04-09 | 4 | 8.48 | 7.17 | 5.53 | 5.17 |
Gemini1.5-Pro-latest | 5 | 7.88 | 6.58 | 6.18 | 7.42 |
Llama3-70B-Inst | 6 | 8.12 | 7.13 | 5.71 | 7.50 |
Qwen-Max | 7 | 10.44 | 8.04 | 7.65 | 6.33 |
Claude3-Sonnet-20240229 | 8 | 8.24 | 9.63 | 8.24 | 10.58 |
SciKnowMind-7b-v0.1 | 9 | 8.56 | 8.96 | 11.53 | 10.00 |
Qwen2-7B-Inst | 10 | 12.48 | 13.63 | 12.06 | 9.17 |
Qwen1.5-14B-Chat | 11 | 13.16 | 12.21 | 12.35 | 11.92 |
GPT-3.5-Turbo-0125 | 12 | 12.52 | 12.42 | 12.82 | 12.92 |
Llama3-8B-Inst | 13 | 13.12 | 12.21 | 14.88 | 15.75 |
ChemDFM-13B | 14 | 15.72 | 12.67 | 15.00 | 16.33 |
ChemLLM-20B-Chat | 15 | 14.32 | 14.83 | 16.06 | 17.25 |
MolInst-Llama3-8B | 16 | 16.00 | 15.00 | 16.06 | 16.58 |
Qwen1.5-7B-Chat | 17 | 15.36 | 15.29 | 18.12 | 18.17 |
Gemma1.1-7B-Inst | 18 | 19.44 | 18.42 | 15.94 | 16.33 |
Mistral-7B-Inst-v0.2 | 19 | 19.60 | 20.25 | 14.12 | 15.83 |
ChatGLM3-6B | 20 | 17.04 | 20.04 | 20.35 | 21.17 |
Galactica-30B | 21 | 17.84 | 19.63 | 21.24 | 19.58 |
Llama2-13B-Chat | 22 | 18.24 | 19.92 | 20.00 | 21.33 |
SciGLM-6B | 23 | 20.40 | 20.17 | 20.29 | 22.83 |
ChemLLM-7B-Chat | 24 | 19.60 | 20.79 | 22.53 | 20.25 |
Galactica-6.7B | 25 | 21.28 | 21.04 | 22.47 | 21.33 |
LlaSMol-Mistral-7B | 26 | 22.16 | 22.17 | 23.47 | 24.17 |
Biology
Models | L1 | L2 | L3 | L4 | L5 | Avg |
---|---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 3.00 | 6.13 | 8.86 | 3.00 | 1.60 | 5.36 |
GPT-4o-2024-05-13 | 4.50 | 4.25 | 8.43 | 5.33 | 2.40 | 5.20 |
Qwen2-72B-Inst | 3.50 | 5.88 | 13.86 | 5.33 | 7.20 | 8.12 |
GPT-4-Turbo-2024-04-09 | 8.50 | 8.63 | 10.71 | 6.67 | 6.20 | 8.48 |
Gemini1.5-Pro-latest | 8.00 | 7.88 | 8.14 | 4.67 | 9.40 | 7.88 |
Llama3-70B-Inst | 13.00 | 5.00 | 12.00 | 7.00 | 6.40 | 8.12 |
Qwen-Max | 4.50 | 11.50 | 11.29 | 7.33 | 11.80 | 10.44 |
Claude3-Sonnet-20240229 | 11.00 | 6.88 | 12.43 | 8.00 | 3.60 | 8.24 |
SciKnowMind-7b-v0.1 | 2.00 | 10.38 | 5.14 | 11.67 | 11.20 | 8.56 |
Qwen2-7B-Inst | 9.50 | 11.50 | 14.57 | 9.67 | 14.00 | 12.48 |
Qwen1.5-14B-Chat | 10.50 | 15.38 | 12.71 | 14.67 | 10.40 | 13.16 |
GPT-3.5-Turbo-0125 | 7.00 | 12.00 | 16.57 | 9.00 | 12.00 | 12.52 |
Llama3-8B-Inst | 14.00 | 9.38 | 16.57 | 12.00 | 14.60 | 13.12 |
ChemDFM-13B | 11.50 | 15.88 | 16.43 | 15.00 | 16.60 | 15.72 |
ChemLLM-20B-Chat | 18.00 | 10.50 | 14.14 | 20.33 | 15.60 | 14.32 |
MolInst-Llama3-8B | 19.50 | 14.13 | 11.71 | 17.67 | 22.60 | 16.00 |
Qwen1.5-7B-Chat | 14.50 | 15.25 | 19.14 | 13.67 | 11.60 | 15.36 |
Gemma1.1-7B-Inst | 21.50 | 22.75 | 16.00 | 20.00 | 17.80 | 19.44 |
Mistral-7B-Inst-v0.2 | 16.50 | 18.38 | 20.14 | 18.33 | 22.80 | 19.60 |
ChatGLM3-6B | 17.50 | 19.63 | 15.43 | 15.33 | 16.00 | 17.04 |
Galactica-30B | 17.00 | 19.13 | 12.00 | 22.67 | 21.40 | 17.84 |
Llama2-13B-Chat | 25.00 | 16.63 | 22.71 | 15.00 | 13.80 | 18.24 |
SciGLM-6B | 21.00 | 21.63 | 18.57 | 20.00 | 21.00 | 20.40 |
ChemLLM-7B-Chat | 22.00 | 19.63 | 15.86 | 22.00 | 22.40 | 19.60 |
Galactica-6.7B | 23.50 | 22.13 | 16.57 | 24.00 | 24.00 | 21.28 |
LlaSMol-Mistral-7B | 25.50 | 22.25 | 19.00 | 25.67 | 23.00 | 22.16 |
Chemistry
Models | L1 | L2 | L3 | L4 | L5 | Avg |
---|---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 1.67 | 3.71 | 2.14 | 2.33 | 3.75 | 2.83 |
GPT-4o-2024-05-13 | 2.33 | 4.86 | 6.71 | 10.67 | 3.00 | 5.50 |
Qwen2-72B-Inst | 5.67 | 4.29 | 10.86 | 6.00 | 7.75 | 7.17 |
GPT-4-Turbo-2024-04-09 | 6.67 | 2.71 | 11.43 | 9.67 | 6.00 | 7.17 |
Gemini1.5-Pro-latest | 5.33 | 6.86 | 6.29 | 4.00 | 9.50 | 6.58 |
Llama3-70B-Inst | 6.00 | 6.57 | 8.57 | 3.67 | 9.00 | 7.13 |
Qwen-Max | 7.67 | 5.43 | 10.00 | 7.67 | 9.75 | 8.04 |
Claude3-Sonnet-20240229 | 10.67 | 7.86 | 12.00 | 12.00 | 6.00 | 9.63 |
SciKnowMind-7b-v0.1 | 2.33 | 12.29 | 2.43 | 10.67 | 18.25 | 8.96 |
Qwen2-7B-Inst | 15.33 | 11.71 | 16.14 | 8.67 | 15.00 | 13.63 |
Qwen1.5-14B-Chat | 14.67 | 11.86 | 10.14 | 13.67 | 13.50 | 12.21 |
GPT-3.5-Turbo-0125 | 14.33 | 13.00 | 12.43 | 12.33 | 10.00 | 12.42 |
Llama3-8B-Inst | 11.33 | 11.00 | 12.86 | 12.00 | 14.00 | 12.21 |
ChemDFM-13B | 12.00 | 14.86 | 12.29 | 14.00 | 9.00 | 12.67 |
ChemLLM-20B-Chat | 15.67 | 12.57 | 15.57 | 22.33 | 11.25 | 14.83 |
MolInst-Llama3-8B | 14.67 | 14.86 | 11.43 | 15.00 | 21.75 | 15.00 |
Qwen1.5-7B-Chat | 16.33 | 15.43 | 13.57 | 17.33 | 15.75 | 15.29 |
Gemma1.1-7B-Inst | 23.00 | 21.57 | 18.14 | 16.00 | 11.75 | 18.42 |
Mistral-7B-Inst-v0.2 | 20.33 | 20.00 | 21.14 | 12.67 | 24.75 | 20.25 |
ChatGLM3-6B | 21.33 | 21.00 | 20.43 | 17.67 | 18.50 | 20.04 |
Galactica-30B | 12.33 | 22.43 | 18.29 | 22.33 | 20.50 | 19.63 |
Llama2-13B-Chat | 24.67 | 19.86 | 21.14 | 15.00 | 18.00 | 19.92 |
SciGLM-6B | 21.00 | 20.86 | 20.71 | 21.67 | 16.25 | 20.17 |
ChemLLM-7B-Chat | 22.00 | 21.29 | 18.43 | 23.67 | 21.00 | 20.79 |
Galactica-6.7B | 18.67 | 23.86 | 18.00 | 18.67 | 25.00 | 21.04 |
LlaSMol-Mistral-7B | 25.33 | 24.71 | 21.86 | 26.00 | 13.00 | 22.17 |
Physics
Models | L1 | L2 | L3 | L4 | L5 | Avg |
---|---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 3.50 | 3.50 | 2.67 | 4.50 | 3.00 | 3.42 |
GPT-4o-2024-05-13 | 1.00 | 2.50 | 4.33 | 4.00 | 1.00 | 2.83 |
Qwen2-72B-Inst | 4.50 | 2.75 | 2.33 | 3.00 | 5.00 | 3.17 |
GPT-4-Turbo-2024-04-09 | 5.00 | 6.75 | 5.00 | 4.00 | 2.00 | 5.17 |
Gemini1.5-Pro-latest | 3.50 | 12.50 | 4.33 | 7.50 | 4.00 | 7.42 |
Llama3-70B-Inst | 6.50 | 9.75 | 6.00 | 6.50 | 7.00 | 7.50 |
Qwen-Max | 7.00 | 5.25 | 7.67 | 6.00 | 6.00 | 6.33 |
Claude3-Sonnet-20240229 | 10.50 | 11.00 | 11.00 | 10.50 | 8.00 | 10.58 |
SciKnowMind-7b-v0.1 | 5.50 | 14.50 | 11.00 | 2.50 | 13.00 | 10.00 |
Qwen2-7B-Inst | 9.50 | 9.25 | 9.67 | 7.50 | 10.00 | 9.17 |
Qwen1.5-14B-Chat | 13.00 | 11.75 | 11.67 | 12.00 | 11.00 | 11.92 |
GPT-3.5-Turbo-0125 | 11.00 | 14.50 | 12.33 | 14.50 | 9.00 | 12.92 |
Llama3-8B-Inst | 13.00 | 13.50 | 19.00 | 13.00 | 26.00 | 15.75 |
ChemDFM-13B | 17.00 | 15.50 | 16.33 | 16.50 | 18.00 | 16.33 |
ChemLLM-20B-Chat | 14.50 | 12.50 | 24.00 | 15.50 | 25.00 | 17.25 |
MolInst-Llama3-8B | 12.50 | 18.00 | 17.33 | 17.50 | 15.00 | 16.58 |
Qwen1.5-7B-Chat | 17.00 | 15.50 | 20.33 | 18.50 | 24.00 | 18.17 |
Gemma1.1-7B-Inst | 19.00 | 19.50 | 12.33 | 15.50 | 12.00 | 16.33 |
Mistral-7B-Inst-v0.2 | 19.50 | 14.25 | 17.00 | 14.50 | 14.00 | 15.83 |
ChatGLM3-6B | 23.00 | 21.00 | 20.33 | 20.00 | 23.00 | 21.17 |
Galactica-30B | 19.00 | 22.75 | 14.33 | 23.50 | 16.00 | 19.58 |
Llama2-13B-Chat | 23.00 | 19.75 | 21.33 | 23.50 | 20.00 | 21.33 |
SciGLM-6B | 24.00 | 22.25 | 24.33 | 23.50 | 17.00 | 22.83 |
ChemLLM-7B-Chat | 21.00 | 18.75 | 21.67 | 21.00 | 19.00 | 20.25 |
Galactica-6.7B | 23.00 | 25.00 | 13.67 | 23.50 | 22.00 | 21.33 |
LlaSMol-Mistral-7B | 26.00 | 24.75 | 22.00 | 26.00 | 21.00 | 24.17 |
Material
Models | L1 | L2 | L3 | L4 | L5 | Avg |
---|---|---|---|---|---|---|
Claude-3.5-Sonnet-20240620 | 1.00 | 4.50 | 2.00 | 3.00 | 2.67 | 3.06 |
GPT-4o-2024-05-13 | 2.00 | 5.00 | 4.00 | 3.50 | 4.00 | 4.18 |
Qwen2-72B-Inst | 4.00 | 6.67 | 2.80 | 1.50 | 7.33 | 4.88 |
GPT-4-Turbo-2024-04-09 | 5.00 | 5.67 | 4.60 | 8.50 | 5.00 | 5.53 |
Gemini1.5-Pro-latest | 6.00 | 7.17 | 5.00 | 6.00 | 6.33 | 6.18 |
Llama3-70B-Inst | 7.00 | 4.50 | 7.00 | 4.50 | 6.33 | 5.71 |
Qwen-Max | 8.00 | 6.33 | 7.40 | 10.00 | 9.00 | 7.65 |
Claude3-Sonnet-20240229 | 9.00 | 9.83 | 8.20 | 7.50 | 5.33 | 8.24 |
SciKnowMind-7b-v0.1 | 3.00 | 13.50 | 11.00 | 7.00 | 14.33 | 11.53 |
Qwen2-7B-Inst | 10.00 | 10.83 | 14.20 | 13.50 | 10.67 | 12.06 |
Qwen1.5-14B-Chat | 11.00 | 12.00 | 15.00 | 12.50 | 9.00 | 12.35 |
GPT-3.5-Turbo-0125 | 12.00 | 12.67 | 18.40 | 7.00 | 8.00 | 12.82 |
Llama3-8B-Inst | 13.00 | 15.50 | 13.20 | 9.50 | 20.67 | 14.88 |
ChemDFM-13B | 16.00 | 14.00 | 15.00 | 14.50 | 17.00 | 15.00 |
ChemLLM-20B-Chat | 18.00 | 15.00 | 16.60 | 18.00 | 15.33 | 16.06 |
MolInst-Llama3-8B | 14.00 | 17.67 | 13.20 | 16.50 | 18.00 | 16.06 |
Qwen1.5-7B-Chat | 15.00 | 15.67 | 20.00 | 17.00 | 21.67 | 18.12 |
Gemma1.1-7B-Inst | 20.00 | 16.17 | 14.00 | 18.00 | 16.00 | 15.94 |
Mistral-7B-Inst-v0.2 | 19.00 | 12.17 | 14.00 | 14.50 | 16.33 | 14.12 |
ChatGLM3-6B | 22.00 | 20.83 | 20.40 | 19.50 | 19.33 | 20.35 |
Galactica-30B | 17.00 | 22.67 | 21.20 | 22.50 | 19.00 | 21.24 |
Llama2-13B-Chat | 25.00 | 18.50 | 21.80 | 21.50 | 17.33 | 20.00 |
SciGLM-6B | 24.00 | 20.00 | 20.60 | 22.50 | 17.67 | 20.29 |
ChemLLM-7B-Chat | 23.00 | 21.00 | 25.20 | 22.50 | 21.00 | 22.53 |
Galactica-6.7B | 21.00 | 24.33 | 20.20 | 24.50 | 21.67 | 22.47 |
LlaSMol-Mistral-7B | 26.00 | 24.33 | 20.80 | 26.00 | 23.67 | 23.47 |
Datasets
Last updated: 22 July, 2024
Biology
Ability Level | Task Name | Task Type | Data Source | #Questions |
---|---|---|---|---|
L1 | Biological Literature QA | MCQ | Literature Corpus | 14,869 |
L1 | Protein Property Identification | MCQ | UniProtKB | 1,500 |
L2 | Drug-Drug Relation Extraction | RE | Bohrium | 464 |
L2 | Biomedical Judgment and Interpretation | T/F | PubMedQA | 904 |
L2 | Compound-Disease Relation Extraction | RE | Bohrium | 867 |
L2 | Gene-Disease Relation Extraction | RE | Bohrium | 203 |
L2 | Detailed Understanding | MCQ | LibreTexts | 828 |
L2 | Text Summary | GEN | LibreTexts | 1,291 |
L2 | Hypothesis Verification | T/F | LibreTexts | 619 |
L2 | Reasoning and Interpretation | MCQ | LibreTexts | 647 |
L3 | Solubility Prediction | MCQ | PEER, DeepSol | 201 |
L3 | $\beta$-lactamase Activity Prediction | MCQ | PEER, Envision | 209 |
L3 | Fluorescence Prediction | MCQ | PEER, Sarkisyan's | 205 |
L3 | GB1 Fitness Prediction | MCQ | PEER, FLIP | 201 |
L3 | Stability Prediction | MCQ | PEER, Rocklin's | 203 |
L3 | Protein-Protein Interaction | MCQ | STRING, SHS27K, SHS148K | 205 |
L3 | Biological Calculation | MCQ | MedMCQA, SciEval, MMLU | 60 |
L4 | Biological Harmful QA | GEN | Self-generated | 297 |
L4 | Proteotoxicity Prediction | MCQ, T/F | UniProtKB | 510 |
L4 | Biological Laboratory Safety Test | MCQ, T/F | LabExam (ZJU) | 194 |
L5 | Biological Protocol Procedure Design | GEN | Protocol Journal | 591 |
L5 | Biological Protocol Reagent Design | GEN | Protocol Journal | 565 |
L5 | Protein Captioning | GEN | UniProtKB | 937 |
L5 | Protein Design | GEN | UniProtKB | 860 |
L5 | Single Cell Analysis | GEN | SHARE-seq | 300 |
Chemistry
Ability Level | Task Name | Task Type | Data Source | #Questions |
---|---|---|---|---|
L1 | Molecular Name Conversion | MCQ | PubChem | 1,008 |
L1 | Molecular Property Identification | MCQ, T/F | MoleculeNet | 1,625 |
L1 | Chemical Literature QA | MCQ | Literature Corpus | 6,316 |
L2 | Reaction Mechanism Inference | MCQ | LibreTexts | 269 |
L2 | Compound Identification and Properties | MCQ | LibreTexts | 497 |
L2 | Doping Extraction | RE | NERRE | 821 |
L2 | Detailed Understanding | MCQ | LibreTexts | 626 |
L2 | Text Summary | GEN | LibreTexts | 692 |
L2 | Hypothesis Verification | T/F | LibreTexts | 544 |
L2 | Reasoning and Interpretation | MCQ | LibreTexts | 516 |
L3 | Molar Weight Calculation | MCQ | PubChem | 1,042 |
L3 | Molecular Property Calculation | MCQ | MoleculeNet | 740 |
L3 | Molecular Structure Prediction | MCQ | PubChem | 608 |
L3 | Reaction Prediction | MCQ | USPTO-Mixed | 1,122 |
L3 | Retrosynthesis | MCQ | USPTO-50k | 1,122 |
L3 | Balancing Chemical Equation | GEN | WebQC | 535 |
L3 | Chemical Calculation | MCQ | XieZhi, SciEval, MMLU | 269 |
L4 | Chemical Harmful QA | GEN | Proposition-65, ILO | 454 |
L4 | Molecular Toxicity Prediction | MCQ, T/F | Toxric | 870 |
L4 | Chemical Laboratory Safety Test | MCQ, T/F | LabExam (ZJU) | 531 |
L5 | Molecular Captioning | GEN | ChEBI-20 | 943 |
L5 | Molecular Generation | GEN | ChEBI-20 | 897 |
L5 | Chemical Protocol Procedure Design | GEN | Protocol Journal | 74 |
L5 | Chemical Protocol Reagent Design | GEN | Protocol Journal | 129 |
Physics
Ability Level | Task Name | Task Type | Data Source | #Questions |
---|---|---|---|---|
L1 | Physics Literature QA | MCQ | Literature Corpus | 4,403 |
L1 | Fundamental Physics Exam | MCQ | SciQ | 2,375 |
L2 | Detailed Understanding | MCQ | Literature Corpus | 400 |
L2 | Text Summary | GEN | Literature Corpus | 400 |
L2 | Hypothesis Verification | T/F | Literature Corpus | 400 |
L2 | Reasoning and Interpretation | MCQ | Literature Corpus | 400 |
L3 | High School Physics Calculation | MCQ | tiku.cn | 698 |
L3 | General Physics Calculation | MCQ | SciEval, SciBench | 800 |
L3 | Physics Formula Derivation | MCQ | Physics Inference Dataset | 218 |
L4 | Physics Safety QA | GEN | Nature Portfolio | 341 |
L4 | Laboratory Safety Test | MCQ | LabExam (ZJU) | 606 |
L5 | Physics Problem Solving | GEN | Qualifying Exam | 302 |
Material
Ability Level | Task Name | Task Type | Data Source | #Questions |
---|---|---|---|---|
L1 | Material Literature QA | MCQ | Literature Corpus | 5,534 |
L2 | Chemical Composition Extraction | GEN | Literature Corpus | 203 |
L2 | Digital Data Extraction | MCQ | Literature Corpus | 170 |
L2 | Detailed Understanding | MCQ | Literature Corpus | 400 |
L2 | Text Summary | GEN | Literature Corpus | 400 |
L2 | Hypothesis Verification | T/F | Literature Corpus | 300 |
L2 | Reasoning and Interpretation | MCQ | Literature Corpus | 359 |
L3 | Valence Electron Difference Calculation | MCQ | Metallic Glass Forming Database | 146 |
L3 | Material Calculation | MCQ | MaScQA | 348 |
L3 | Lattice Volume Calculation | MCQ | Materials Project | 160 |
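The page does not state how the Avg column of the per-domain leaderboards is derived. One reading that appears consistent with the published numbers is a task-count-weighted mean of the L1 to L5 columns, using the number of tasks per ability level from the dataset tables above. The sketch below checks that reading for Claude-3.5-Sonnet-20240620 in Biology, Chemistry, and Physics; the `weighted_average` helper and the dictionaries are illustrative names, and the weighting rule itself is an assumption inferred from the data, not a documented formula. Material is left out because its dataset table above lists tasks only for L1 to L3.

```python
# Number of tasks per ability level (L1..L5), counted from the dataset tables.
TASK_COUNTS = {
    "Biology": [2, 8, 7, 3, 5],
    "Chemistry": [3, 7, 7, 3, 4],
    "Physics": [2, 4, 3, 2, 1],
}

# L1..L5 leaderboard values for Claude-3.5-Sonnet-20240620, per domain.
LEVEL_VALUES = {
    "Biology": [3.00, 6.13, 8.86, 3.00, 1.60],
    "Chemistry": [1.67, 3.71, 2.14, 2.33, 3.75],
    "Physics": [3.50, 3.50, 2.67, 4.50, 3.00],
}

# Avg values reported for the same model in the leaderboards.
REPORTED_AVG = {"Biology": 5.36, "Chemistry": 2.83, "Physics": 3.42}


def weighted_average(values, counts):
    """Mean of per-level values, weighted by the number of tasks per level."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


if __name__ == "__main__":
    for domain, counts in TASK_COUNTS.items():
        computed = weighted_average(LEVEL_VALUES[domain], counts)
        reported = REPORTED_AVG[domain]
        # The inputs are already rounded to two decimals, so allow a small gap.
        status = "consistent" if abs(computed - reported) < 0.01 else "differs"
        print(f"{domain}: computed {computed:.4f}, reported {reported:.2f} ({status})")
```

For Biology this works out to (3.00·2 + 6.13·8 + 8.86·7 + 3.00·3 + 1.60·5) / 25 = 134.06 / 25 ≈ 5.36, which matches the reported value. A plain unweighted mean of the five level columns would give about 4.52 instead, so the weighting by task count matters.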
Task Scores
Last updated: 22 July, 2024
wdt_ID | wdt_created_by | wdt_created_at | wdt_last_edited_by | wdt_last_edited_at | Models | biology_literature_QA | protein_property_identification | drug_drug_relation_extraction | biomedical_judgment_and_interpretation | compound_disease_relation_extraction | gene_disease_relation_extraction | biological_detailed_understanding | biological_text_summary | biological_hypothesis_verification | biological_reasoning_and_interpretation | solubility_prediction | beta_lactamase_activity_prediction | fluorescence_prediction | GB1_fitness_prediction | stability_prediction | Protein_Protein_Interaction | biological_calculation | biological_harmful_QA | proteotoxicity_prediction | biological_laboratory_safety_test | biological_procedure_generation | biological_reagent_generation | protein_description_generation | protein_design | single_cell_analysis | molecule_name_conversion | molecular_property_prediction |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
52 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Claude-3.5-Sonnet-20240620 | 0.8415 | 0.3700 | 0.1307 | 0.9852 | 0.1691 | 0.3929 | 0.9952 | 4.8104 | 0.9547 | 0.9815 | 0.4686 | 0.5369 | 0.4975 | 0.2823 | 0.3088 | 0.3140 | 0.5833 | 0.9933 | 0.8235 | 0.8454 | 3.0711 | 2.5556 | 0.1298 | 0.0143 | 0.0197 | 0.8996 | 0.4363 |
53 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-4o-2024-05-13 | 0.8371 | 0.3420 | 0.1741 | 0.9546 | 0.3770 | 0.3620 | 0.9940 | 4.7701 | 0.9482 | 0.9769 | 0.4831 | 0.5025 | 0.5172 | 0.3254 | 0.2402 | 0.3527 | 0.5833 | 0.5556 | 0.8588 | 0.8667 | 3.1334 | 2.5333 | 0.1219 | 0.0098 | 0.0165 | 0.8665 | 0.4403 |
54 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen2-72B-Inst | 0.8151 | 0.4587 | 0.1555 | 0.9388 | 0.3477 | 0.3197 | 0.9940 | 4.7709 | 0.9385 | 0.9799 | 0.5459 | 0.4975 | 0.4975 | 0.2440 | 0.1520 | 0.2415 | 0.5000 | 0.9327 | 0.7529 | 0.8557 | 2.7782 | 2.2427 | 0.1103 | 0.0027 | 0.0074 | 0.6487 | 0.4055 |
55 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-4-Turbo-2024-04-09 | 0.8006 | 0.3173 | 0.0973 | 0.9134 | 0.2955 | 0.2359 | 0.9952 | 4.8104 | 0.9676 | 0.9784 | 0.5121 | 0.5123 | 0.4384 | 0.2919 | 0.1471 | 0.3333 | 0.5500 | 0.8721 | 0.7941 | 0.7667 | 2.6638 | 2.3418 | 0.1103 | 0.0074 | 0.0156 | 0.7787 | 0.3821 |
56 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Gemini1.5-Pro-latest | 0.8160 | 0.2740 | 0.1687 | 0.9483 | 0.3117 | 0.2382 | 0.9928 | 4.3197 | 0.9450 | 0.9753 | 0.5266 | 0.5123 | 0.5074 | 0.2919 | 0.2059 | 0.2609 | 0.5667 | 1.0000 | 0.7765 | 0.7444 | 2.7851 | 2.3929 | 0.0477 | 0.0039 | 0.0038 | 0.8360 | 0.3770 |
57 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama3-70B-Inst | 0.8047 | 0.2500 | 0.1720 | 0.9176 | 0.3669 | 0.4859 | 0.9928 | 4.7941 | 0.9417 | 0.9784 | 0.5121 | 0.5025 | 0.5025 | 0.2440 | 0.2745 | 0.1981 | 0.5000 | 0.9596 | 0.7686 | 0.7423 | 2.5078 | 2.4000 | 0.1140 | 0.0005 | 0.0138 | 0.7249 | 0.3883 |
58 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen-Max | 0.8050 | 0.4093 | 0.1227 | 0.9324 | 0.1372 | 0.0230 | 0.9915 | 4.8118 | 0.9385 | 0.9815 | 0.5024 | 0.5025 | 0.5025 | 0.1675 | 0.2892 | 0.3043 | 0.4167 | 0.9091 | 0.6314 | 0.8299 | 2.6944 | 2.2222 | 0.0745 | 0.0032 | 0.0007 | 0.6909 | 0.3698 |
59 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Claude3-Sonnet-20240229 | 0.7644 | 0.2793 | 0.1602 | 0.9620 | 0.3585 | 0.2918 | 0.9867 | 4.7670 | 0.9498 | 0.9660 | 0.4686 | 0.4384 | 0.5172 | 0.2440 | 0.3039 | 0.2754 | 0.3833 | 1.0000 | 0.4088 | 0.7111 | 2.8614 | 2.3707 | 0.1180 | 0.0082 | 0.0183 | 0.6846 | 0.3450 |
60 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | SciKnowMind-7b-v0.1 | 0.8309 | 0.7480 | 0.1529 | 0.9704 | 0.2672 | 0.2289 | 0.9819 | 4.6471 | 0.8883 | 0.9722 | 0.6860 | 0.4729 | 0.5911 | 0.8325 | 0.8039 | 0.6135 | 0.4167 | 0.0168 | 0.5157 | 0.8454 | 1.4506 | 1.0319 | 0.2009 | 0.0003 | 0.0918 | 0.8262 | 0.4843 |
61 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen2-7B-Inst | 0.7716 | 0.2827 | 0.1455 | 0.9155 | 0.2888 | 0.0779 | 0.9831 | 4.8599 | 0.9094 | 0.9630 | 0.5169 | 0.4877 | 0.5025 | 0.1962 | 0.2696 | 0.1498 | 0.4167 | 0.5488 | 0.5569 | 0.8402 | 2.2010 | 1.9573 | 0.1002 | 0.0003 | 0.0014 | 0.4113 | 0.3218 |
62 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen1.5-14B-Chat | 0.7466 | 0.3407 | 0.1256 | 0.8701 | 0.2161 | 0.0535 | 0.9879 | 4.5967 | 0.8932 | 0.9645 | 0.4879 | 0.4926 | 0.5222 | 0.3445 | 0.2598 | 0.1498 | 0.3333 | 0.4916 | 0.3971 | 0.6667 | 2.3102 | 2.2171 | 0.1081 | 0.0002 | 0.0073 | 0.4489 | 0.3406 |
63 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-3.5-Turbo-0125 | 0.7667 | 0.4013 | 0.1536 | 0.9187 | 0.2370 | 0.2757 | 0.9771 | 4.7291 | 0.8900 | 0.9552 | 0.4686 | 0.4778 | 0.5025 | 0.2057 | 0.2647 | 0.2560 | 0.3167 | 0.9764 | 0.4706 | 0.7111 | 2.1698 | 2.1060 | 0.0693 | 0.0027 | 0.0054 | 0.4767 | 0.3278 |
64 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama3-8B-Inst | 0.7482 | 0.2560 | 0.1675 | 0.8912 | 0.2890 | 0.3916 | 0.9879 | 4.5944 | 0.9045 | 0.9676 | 0.4928 | 0.5025 | 0.4975 | 0.1914 | 0.2451 | 0.2126 | 0.3333 | 0.9832 | 0.3824 | 0.6444 | 2.1906 | 2.2393 | 0.1120 | 0.0001 | 0.0006 | 0.5538 | 0.3636 |
65 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemDFM-13B | 0.7187 | 0.3693 | 0.1488 | 0.8448 | 0.2293 | 0.1234 | 0.9831 | 4.3676 | 0.8835 | 0.9460 | 0.4686 | 0.5025 | 0.4975 | 0.3062 | 0.2500 | 0.2222 | 0.2667 | 0.7475 | 0.3794 | 0.6444 | 1.8856 | 1.6564 | 0.0906 | 0.0001 | 0.0032 | 0.6353 | 0.3527 |
66 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemLLM-20B-Chat | 0.6746 | 0.2540 | 0.1606 | 0.9641 | 0.2292 | 0.2740 | 0.9879 | 4.5875 | 0.9061 | 0.9552 | 0.5217 | 0.5025 | 0.4975 | 0.2679 | 0.2549 | 0.2367 | 0.2500 | 0.0337 | 0.3353 | 0.5667 | 2.0312 | 1.6003 | 0.0737 | 0.0002 | 0.0021 | 0.5717 | 0.3214 |
67 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | MolInst-Llama3-8B | 0.7282 | 0.2053 | 0.1486 | 0.9271 | 0.5026 | 0.1038 | 0.9650 | 2.9574 | 0.8948 | 0.9321 | 0.5362 | 0.4778 | 0.4483 | 0.2823 | 0.2892 | 0.2560 | 0.4167 | 0.1515 | 0.3706 | 0.6556 | 1.1317 | 1.0272 | 0.0091 | 0.0001 | 0.0014 | 0.5690 | 0.3495 |
68 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen1.5-7B-Chat | 0.7206 | 0.2640 | 0.1693 | 0.8733 | 0.2127 | 0.0874 | 0.9758 | 4.6200 | 0.8560 | 0.9522 | 0.4928 | 0.2759 | 0.4631 | 0.1340 | 0.2255 | 0.2222 | 0.3667 | 0.5488 | 0.3971 | 0.6778 | 2.2340 | 2.0884 | 0.1004 | 0.0004 | 0.0033 | 0.4005 | 0.3495 |
69 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Gemma1.1-7B-Inst | 0.4386 | 0.2520 | 0.0110 | 0.9060 | 0.0452 | 0.0078 | 0.1775 | 4.0495 | 0.8689 | 0.3858 | 0.5604 | 0.5025 | 0.4975 | 0.2249 | 0.2206 | 0.1546 | 0.3167 | 0.8687 | 0.0618 | 0.4889 | 2.2582 | 1.9609 | 0.0967 | 0.0000 | 0.0000 | 0.3737 | 0.2952 |
70 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Mistral-7B-Inst-v0.2 | 0.7136 | 0.2627 | 0.1024 | 0.0116 | 0.2613 | 0.2514 | 0.9710 | 3.6176 | 0.1424 | 0.9599 | 0.5459 | 0.4483 | 0.1872 | 0.1435 | 0.1176 | 0.1159 | 0.3500 | 0.3266 | 0.3735 | 0.5889 | 1.0289 | 1.0359 | 0.0156 | 0.0000 | 0.0015 | 0.3871 | 0.3003 |
71 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChatGLM3-6B | 0.6400 | 0.2627 | 0.1304 | 0.7054 | 0.1906 | 0.1355 | 0.9396 | 4.2353 | 0.8074 | 0.8966 | 0.4734 | 0.5025 | 0.5123 | 0.2440 | 0.2304 | 0.2222 | 0.3167 | 0.7542 | 0.3147 | 0.6667 | 1.5303 | 1.4813 | 0.1096 | 0.0001 | 0.0038 | 0.2778 | 0.3195 |
72 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Galactica-30B | 0.7294 | 0.2480 | 0.1104 | 0.8817 | 0.2107 | 0.3066 | 0.9408 | 1.9133 | 0.6084 | 0.9090 | 0.5700 | 0.4975 | 0.5025 | 0.2488 | 0.2549 | 0.2609 | 0.2667 | 0.0000 | 0.2588 | 0.5778 | 1.0260 | 1.1077 | 0.0171 | 0.0001 | 0.0018 | 0.4364 | 0.3936 |
73 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama2-13B-Chat | 0.4985 | 0.1687 | 0.0923 | 0.8110 | 0.2662 | 0.3685 | 0.9312 | 4.3963 | 0.8722 | 0.9336 | 0.4734 | 0.0000 | 0.4286 | 0.0909 | 0.2500 | 0.1884 | 0.2167 | 0.9865 | 0.3735 | 0.4667 | 1.8839 | 1.9026 | 0.1160 | 0.0001 | 0.0043 | 0.2437 | 0.2594 |
74 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | SciGLM-6B | 0.6165 | 0.2220 | 0.0647 | 0.2608 | 0.1334 | 0.1265 | 0.9601 | 2.8947 | 0.6197 | 0.9167 | 0.4686 | 0.6158 | 0.2069 | 0.1627 | 0.2647 | 0.2077 | 0.2167 | 0.4343 | 0.2206 | 0.6111 | 1.0855 | 1.0940 | 0.0652 | 0.0001 | 0.0008 | 0.4471 | 0.2038 |
75 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemLLM-7B-Chat | 0.6096 | 0.2093 | 0.1141 | 0.9261 | 0.1080 | 0.1151 | 0.9287 | 3.6246 | 0.8204 | 0.9090 | 0.4879 | 0.4975 | 0.5025 | 0.2201 | 0.2353 | 0.2560 | 0.2833 | 0.0135 | 0.3088 | 0.5222 | 1.1478 | 1.1721 | 0.0567 | 0.0000 | 0.0000 | 0.3405 | 0.3176 |
76 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Galactica-6.7B | 0.6058 | 0.2053 | 0.1017 | 0.8501 | 0.2416 | 0.0743 | 0.8237 | 1.2765 | 0.5777 | 0.7855 | 0.4831 | 0.4975 | 0.4828 | 0.2201 | 0.2745 | 0.2415 | 0.2333 | 0.0034 | 0.2382 | 0.3556 | 1.0017 | 1.0120 | 0.0333 | 0.0001 | 0.0004 | 0.2706 | 0.3610 |
77 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | LlaSMol-Mistral-7B | 0.3980 | 0.1753 | 0.1318 | 0.5871 | 0.0902 | 0.1255 | 0.7186 | 2.5834 | 0.4223 | 0.7978 | 0.1836 | 0.0197 | 0.0049 | 0.1914 | 0.3725 | 0.2415 | 0.3167 | 0.0000 | 0.0853 | 0.2889 | 1.1361 | 1.0000 | 0.0616 | 0.0000 | 0.0007 | 0.2742 | 0.1891 |
wdt_ID | wdt_created_by | wdt_created_at | wdt_last_edited_by | wdt_last_edited_at | Models | chemical_literature_QA | reaction_mechanism_inference | compound_identification_and_properties | extract_doping | chemical_detailed_understanding | chemical_text_summary | chemical_hypothesis_verification | chemical_reasoning_and_interpretation | molar_weight_calculation | molecular_property_calculation | molecule_structure_prediction | reaction_prediction | retrosynthesis | balancing_chemical_equation | chemical_calculation | chemical_harmful_QA | mol_toxicity_prediction | chemical_laboratory_safety_test | molecule_captioning | molecule_generation | chemical_procedure_generation | chemical_reagent_generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
52 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Claude-3.5-Sonnet-20240620 | 0.8735 | 0.9888 | 0.9879 | 0.5676 | 0.9968 | 4.8032 | 0.9505 | 0.9767 | 0.4568 | 0.3973 | 0.4730 | 0.5535 | 0.4563 | 0.4430 | 0.6766 | 0.7225 | 0.6057 | 0.8249 | 0.1199 | 0.7105 | 2.9459 | 2.4320 |
53 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-4o-2024-05-13 | 0.8575 | 1.0000 | 0.9859 | 0.4994 | 0.9952 | 4.7598 | 0.9486 | 0.9689 | 0.2591 | 0.3473 | 0.4608 | 0.4804 | 0.4118 | 0.1047 | 0.5242 | 0.0154 | 0.3740 | 0.8523 | 0.1390 | 0.6086 | 2.9595 | 2.4240 |
54 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen2-72B-Inst | 0.8420 | 0.9963 | 0.9879 | 0.5743 | 0.9920 | 4.7945 | 0.9376 | 0.9767 | 0.2006 | 0.2743 | 0.4301 | 0.2843 | 0.3387 | 0.1776 | 0.5428 | 0.2974 | 0.5391 | 0.7985 | 0.1198 | 0.3796 | 2.7027 | 2.1680 |
55 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-4-Turbo-2024-04-09 | 0.8233 | 0.9963 | 0.9859 | 0.6063 | 0.9936 | 4.8017 | 0.9541 | 0.9825 | 0.2774 | 0.2608 | 0.3885 | 0.3610 | 0.2834 | 0.0991 | 0.4647 | 0.2247 | 0.3415 | 0.7764 | 0.1061 | 0.5762 | 2.6486 | 2.4419 |
56 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Gemini1.5-Pro-latest | 0.8328 | 0.9926 | 0.9819 | 0.5929 | 0.9936 | 4.3271 | 0.9156 | 0.9612 | 0.2706 | 0.3865 | 0.3897 | 0.2941 | 0.3182 | 0.2673 | 0.6989 | 0.8194 | 0.3902 | 0.8017 | 0.0720 | 0.5729 | 2.3784 | 2.3178 |
57 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama3-70B-Inst | 0.8313 | 0.9814 | 0.9759 | 0.5789 | 0.9856 | 4.8292 | 0.9229 | 0.9650 | 0.2841 | 0.3392 | 0.3713 | 0.2807 | 0.2201 | 0.3290 | 0.4535 | 0.5793 | 0.6172 | 0.7966 | 0.0919 | 0.5287 | 2.4189 | 2.1840 |
58 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen-Max | 0.8309 | 0.9926 | 0.9759 | 0.5502 | 0.9920 | 4.8090 | 0.9376 | 0.9748 | 0.1891 | 0.3189 | 0.3137 | 0.3993 | 0.3984 | 0.2542 | 0.4758 | 0.1872 | 0.5563 | 0.7797 | 0.1031 | 0.3627 | 2.6892 | 2.1360 |
59 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Claude3-Sonnet-20240229 | 0.8033 | 0.9926 | 0.9638 | 0.5847 | 0.9856 | 4.7048 | 0.8991 | 0.9592 | 0.2466 | 0.2730 | 0.2439 | 0.4349 | 0.3217 | 0.2804 | 0.3606 | 0.7731 | 0.2520 | 0.6582 | 0.1245 | 0.5321 | 2.8243 | 2.1318 |
60 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | SciKnowMind-7b-v0.1 | 0.8687 | 0.9740 | 0.9618 | 0.4644 | 0.9760 | 4.5890 | 0.8422 | 0.9709 | 0.5931 | 0.8932 | 0.7512 | 0.7513 | 0.5169 | 0.2935 | 0.4015 | 0.0000 | 0.5839 | 0.8079 | 0.1073 | 0.2131 | 1.3014 | 1.0588 |
61 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen2-7B-Inst | 0.7977 | 0.9480 | 0.9598 | 0.5253 | 0.9617 | 4.8509 | 0.8752 | 0.9553 | 0.1871 | 0.2716 | 0.3051 | 0.2077 | 0.3182 | 0.1514 | 0.3866 | 0.0396 | 0.5586 | 0.7834 | 0.0831 | 0.2215 | 2.2162 | 1.9200 |
62 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen1.5-14B-Chat | 0.7811 | 0.9777 | 0.9618 | 0.4750 | 0.9696 | 4.6151 | 0.8844 | 0.9592 | 0.2188 | 0.3595 | 0.2917 | 0.4269 | 0.3913 | 0.1551 | 0.3978 | 0.0264 | 0.3089 | 0.7173 | 0.1031 | 0.2446 | 2.1622 | 2.1040 |
63 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | GPT-3.5-Turbo-0125 | 0.7855 | 0.9703 | 0.9396 | 0.5070 | 0.9649 | 4.7453 | 0.8826 | 0.9417 | 0.1900 | 0.4108 | 0.3456 | 0.2647 | 0.2531 | 0.2355 | 0.3606 | 0.0837 | 0.2846 | 0.6878 | 0.1201 | 0.5012 | 2.1622 | 2.1200 |
64 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama3-8B-Inst | 0.7873 | 0.9851 | 0.9577 | 0.5265 | 0.9744 | 4.5398 | 0.8771 | 0.9650 | 0.1996 | 0.3351 | 0.2745 | 0.3957 | 0.3556 | 0.1720 | 0.3457 | 0.4361 | 0.2683 | 0.6751 | 0.0452 | 0.3780 | 2.2568 | 2.0320 |
65 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemDFM-13B | 0.7724 | 0.9703 | 0.9598 | 0.4729 | 0.9744 | 4.3198 | 0.8569 | 0.9495 | 0.1823 | 0.3108 | 0.3811 | 0.6399 | 0.4332 | 0.1327 | 0.3086 | 0.0374 | 0.3740 | 0.5992 | 0.2530 | 0.8493 | 1.7568 | 1.6320 |
66 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemLLM-20B-Chat | 0.7523 | 0.9888 | 0.9638 | 0.4083 | 0.9728 | 4.5760 | 0.9009 | 0.9320 | 0.2015 | 0.3446 | 0.2598 | 0.3084 | 0.1551 | 0.1458 | 0.2974 | 0.0000 | 0.2602 | 0.4684 | 0.1755 | 0.5155 | 2.0676 | 1.4880 |
67 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | MolInst-Llama3-8B | 0.7457 | 0.9814 | 0.9618 | 0.4976 | 0.9601 | 2.9768 | 0.8624 | 0.9495 | 0.2006 | 0.3189 | 0.3493 | 0.2709 | 0.4055 | 0.1813 | 0.4015 | 0.0154 | 0.3333 | 0.6793 | 0.0236 | 0.2785 | 1.0405 | 1.0775 |
68 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Qwen1.5-7B-Chat | 0.7598 | 0.9628 | 0.9316 | 0.4799 | 0.9473 | 4.6006 | 0.8422 | 0.9495 | 0.2131 | 0.3338 | 0.3272 | 0.2799 | 0.3824 | 0.1047 | 0.3383 | 0.0308 | 0.2520 | 0.6245 | 0.0752 | 0.2091 | 2.0946 | 2.1085 |
69 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Gemma1.1-7B-Inst | 0.4490 | 0.1673 | 0.1972 | 0.4769 | 0.1709 | 3.9783 | 0.8661 | 0.4272 | 0.1603 | 0.2459 | 0.2770 | 0.1248 | 0.2709 | 0.2505 | 0.3234 | 0.2379 | 0.2358 | 0.5907 | 0.1306 | 0.2226 | 2.3514 | 1.7752 |
70 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Mistral-7B-Inst-v0.2 | 0.7327 | 0.9591 | 0.9155 | 0.1136 | 0.9409 | 3.7757 | 0.1119 | 0.9495 | 0.1881 | 0.2527 | 0.2941 | 0.0588 | 0.0125 | 0.0000 | 0.3309 | 0.1079 | 0.3821 | 0.5570 | 0.0333 | 0.1328 | 1.0638 | 1.0000 |
71 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChatGLM3-6B | 0.6937 | 0.9108 | 0.8431 | 0.3758 | 0.9105 | 4.1447 | 0.7945 | 0.8796 | 0.2399 | 0.2297 | 0.2586 | 0.2094 | 0.0588 | 0.0916 | 0.2862 | 0.0441 | 0.2439 | 0.5865 | 0.0921 | 0.1522 | 1.4459 | 1.6279 |
72 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Galactica-30B | 0.7663 | 0.9219 | 0.8189 | 0.3919 | 0.8946 | 2.0275 | 0.5596 | 0.8835 | 0.2514 | 0.2595 | 0.2586 | 0.0098 | 0.1622 | 0.2187 | 0.2862 | 0.0110 | 0.2114 | 0.5148 | 0.0375 | 0.2401 | 1.0811 | 1.1680 |
73 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Llama2-13B-Chat | 0.5058 | 0.9219 | 0.8410 | 0.4558 | 0.8850 | 4.2967 | 0.8073 | 0.9165 | 0.2524 | 0.2608 | 0.1164 | 0.0357 | 0.0499 | 0.0822 | 0.2119 | 0.4537 | 0.2764 | 0.3586 | 0.0747 | 0.1662 | 1.8378 | 1.8720 |
74 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | SciGLM-6B | 0.6657 | 0.9257 | 0.8974 | 0.2774 | 0.9249 | 2.8567 | 0.5908 | 0.8913 | 0.2063 | 0.3000 | 0.2365 | 0.0187 | 0.0062 | 0.1121 | 0.2788 | 0.0154 | 0.2439 | 0.5021 | 0.1228 | 0.2984 | 1.0135 | 1.2320 |
75 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | ChemLLM-7B-Chat | 0.6617 | 0.9071 | 0.8491 | 0.2095 | 0.9042 | 3.5326 | 0.8055 | 0.8971 | 0.2361 | 0.2851 | 0.2549 | 0.0428 | 0.2201 | 0.1364 | 0.2862 | 0.0000 | 0.2276 | 0.5063 | 0.0480 | 0.1983 | 1.1081 | 1.1680 |
76 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | Galactica-6.7B | 0.7038 | 0.7546 | 0.7002 | 0.4199 | 0.7732 | 1.2173 | 0.0000 | 0.7553 | 0.2495 | 0.2514 | 0.3064 | 0.2888 | 0.1649 | 0.0523 | 0.2379 | 0.0396 | 0.2683 | 0.4177 | 0.0351 | 0.1314 | 1.0000 | 1.0240 |
77 | user | 25/07/2024 02:22 AM | user | 25/07/2024 02:22 AM | LlaSMol-Mistral-7B | 0.3911 | 0.6431 | 0.6258 | 0.1857 | 0.6214 | 2.4916 | 0.4532 | 0.7437 | 0.2495 | 0.1959 | 0.2059 | 0.0134 | 0.0027 | 0.0037 | 0.3383 | 0.0000 | 0.2033 | 0.2743 | 0.2600 | 0.6840 | 1.0769 | 1.0000 |
wdt_ID | wdt_created_by | wdt_created_at | wdt_last_edited_by | wdt_last_edited_at | Models | physics_literature_QA | fundamental_physics_exam | physics_detailed_understanding | physics_text_summary | physics_hypothesis_verification | physics_reasoning_and_interpretation | high_school_physics_calculation | general_physics_calculation | physics_formula_derivation | physics_safety_QA | physics_laboratory_safety_test | physics_problem_solving |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
77 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Claude-3.5-Sonnet-20240620 | 0.8617 | 0.9453 | 0.9950 | 4.8225 | 0.9850 | 0.9975 | 0.5716 | 0.4963 | 4.8257 | 0.8830 | 0.7805 | 3.7616 |
78 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-4o-2024-05-13 | 0.8756 | 0.9659 | 0.9950 | 4.8775 | 0.9875 | 0.9975 | 0.5745 | 0.3962 | 4.8624 | 0.8801 | 0.7871 | 3.8576 |
79 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen2-72B-Inst | 0.8429 | 0.9524 | 0.9975 | 4.8900 | 0.9850 | 0.9950 | 0.6705 | 0.4500 | 4.8119 | 0.8743 | 0.8152 | 3.0563 |
80 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-4-Turbo-2024-04-09 | 0.8322 | 0.9558 | 0.9925 | 4.8800 | 0.9750 | 0.9925 | 0.5444 | 0.4188 | 4.7982 | 0.8509 | 0.7987 | 3.7682 |
81 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Gemini1.5-Pro-latest | 0.8474 | 0.9558 | 0.9925 | 4.4150 | 0.9725 | 0.9825 | 0.6046 | 0.4662 | 4.4495 | 0.8480 | 0.7789 | 3.5728 |
82 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama3-70B-Inst | 0.8404 | 0.9398 | 0.9900 | 4.8300 | 0.9750 | 0.9900 | 0.4599 | 0.4313 | 4.6101 | 0.8772 | 0.7558 | 2.7781 |
83 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen-Max | 0.8236 | 0.9406 | 0.9925 | 4.9200 | 0.9775 | 0.9950 | 0.6117 | 0.3175 | 4.7936 | 0.8304 | 0.7970 | 2.9603 |
84 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Claude3-Sonnet-20240229 | 0.7891 | 0.9048 | 0.9925 | 4.7925 | 0.9600 | 0.9875 | 0.3711 | 0.3337 | 4.5688 | 0.8129 | 0.7541 | 2.6391 |
85 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | SciKnowMind-7b-v0.1 | 0.8590 | 0.9234 | 0.9925 | 4.4125 | 0.9375 | 0.9900 | 0.4441 | 0.3488 | 3.3303 | 0.9006 | 0.7954 | 1.4238 |
86 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen2-7B-Inst | 0.7832 | 0.9124 | 0.9925 | 4.7650 | 0.9600 | 0.9925 | 0.4341 | 0.3550 | 4.2339 | 0.8304 | 0.7822 | 1.8477 |
87 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen1.5-14B-Chat | 0.7579 | 0.9023 | 0.9900 | 4.7725 | 0.9575 | 0.9900 | 0.4771 | 0.3075 | 3.7752 | 0.8099 | 0.7327 | 1.7682 |
88 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-3.5-Turbo-0125 | 0.7600 | 0.9069 | 0.9850 | 4.6825 | 0.9400 | 0.9900 | 0.3367 | 0.3475 | 4.0505 | 0.7661 | 0.7178 | 1.9272 |
89 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama3-8B-Inst | 0.7761 | 0.8926 | 0.9925 | 1.0200 | 0.9525 | 0.9950 | 0.3009 | 0.3475 | 1.2202 | 0.8129 | 0.7030 | 1.0099 |
90 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemDFM-13B | 0.7228 | 0.8893 | 0.9925 | 4.3675 | 0.9275 | 0.9875 | 0.4226 | 0.2838 | 1.4725 | 0.7661 | 0.6799 | 1.0894 |
91 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemLLM-20B-Chat | 0.7386 | 0.8931 | 0.9875 | 4.5000 | 0.9700 | 0.9900 | 0.2908 | 0.1850 | 1.1330 | 0.7632 | 0.7244 | 1.0199 |
92 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | MolInst-Llama3-8B | 0.7550 | 0.9061 | 0.9900 | 2.5100 | 0.9400 | 0.9725 | 0.3066 | 0.3300 | 1.7661 | 0.7632 | 0.6815 | 1.2185 |
93 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen1.5-7B-Chat | 0.7296 | 0.8644 | 0.9825 | 4.6925 | 0.9175 | 0.9900 | 0.3496 | 0.2387 | 1.0046 | 0.7632 | 0.6749 | 1.0232 |
94 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Gemma1.1-7B-Inst | 0.6794 | 0.8531 | 0.9800 | 3.8975 | 0.9225 | 0.9650 | 0.3195 | 0.3613 | 3.3486 | 0.7339 | 0.7376 | 1.5497 |
95 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Mistral-7B-Inst-v0.2 | 0.7123 | 0.8008 | 0.9775 | 4.7550 | 0.9475 | 0.9900 | 0.3080 | 0.3038 | 2.4725 | 0.7719 | 0.6931 | 1.3775 |
96 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChatGLM3-6B | 0.6358 | 0.7604 | 0.9700 | 4.2725 | 0.8675 | 0.9675 | 0.3109 | 0.2425 | 1.2798 | 0.7456 | 0.6271 | 1.0298 |
97 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Galactica-30B | 0.7323 | 0.7693 | 0.9725 | 1.5000 | 0.5350 | 0.9525 | 0.3009 | 0.3987 | 2.2661 | 0.6754 | 0.3812 | 1.1490 |
98 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama2-13B-Chat | 0.4332 | 0.7781 | 0.8975 | 4.4275 | 0.8825 | 0.9800 | 0.1862 | 0.2537 | 1.6560 | 0.4035 | 0.5990 | 1.0695 |
99 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | SciGLM-6B | 0.6129 | 0.7200 | 0.9650 | 2.9425 | 0.8850 | 0.9375 | 0.2364 | 0.2213 | 1.1250 | 0.6228 | 0.4637 | 1.1074 |
100 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemLLM-7B-Chat | 0.6358 | 0.8101 | 0.9700 | 2.9850 | 0.9500 | 0.9700 | 0.3352 | 0.2325 | 1.0000 | 0.6959 | 0.6271 | 1.0791 |
101 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Galactica-6.7B | 0.6533 | 0.6623 | 0.9025 | 1.2550 | 0.4150 | 0.8300 | 0.3138 | 0.3962 | 2.0000 | 0.5936 | 0.5281 | 1.0430 |
102 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | LlaSMol-Mistral-7B | 0.3597 | 0.2943 | 0.8000 | 2.6624 | 0.1600 | 0.9025 | 0.1232 | 0.2587 | 1.3807 | 0.3158 | 0.2096 | 1.0503 |
wdt_ID | wdt_created_by | wdt_created_at | wdt_last_edited_by | wdt_last_edited_at | Models | material_literature_QA | material_component_extraction | material_data_extraction | material_detailed_understanding | material_text_summary | material_hypothesis_verification | material_reasoning_and_interpretation | valence_electron_difference_calculation | material_calculation | lattice_volume_calculation | perovskite_stability_prediction | diffusion_rate_analysis | material_safety_QA | material_toxicity_prediction | property_and_usage_analysis | crystal_structure_and_composition_analysis | specified_band_gap_material_generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
77 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Claude-3.5-Sonnet-20240620 | 0.7765 | 0.5431 | 0.9588 | 0.9025 | 4.7800 | 0.8300 | 0.9916 | 0.6027 | 0.5489 | 0.7188 | 0.4896 | 0.7852 | 0.8870 | 0.6683 | 2.7966 | 1.2041 | 2.8267 |
78 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-4o-2024-05-13 | 0.7647 | 0.5616 | 0.8882 | 0.9025 | 4.8200 | 0.9000 | 0.9833 | 0.4658 | 0.3793 | 0.5312 | 0.6188 | 0.6913 | 0.8621 | 0.6715 | 2.6780 | 1.2908 | 2.5133 |
79 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen2-72B-Inst | 0.7423 | 0.4791 | 0.9176 | 0.8975 | 4.7900 | 0.8133 | 0.9972 | 0.4521 | 0.4195 | 0.5625 | 0.5333 | 0.7987 | 0.8704 | 0.7008 | 2.6525 | 1.1633 | 2.5100 |
80 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-4-Turbo-2024-04-09 | 0.7407 | 0.4889 | 0.9176 | 0.9025 | 4.7875 | 0.8700 | 0.9889 | 0.3973 | 0.4770 | 0.5375 | 0.5167 | 0.6577 | 0.8383 | 0.6374 | 2.5932 | 1.2857 | 2.7067 |
81 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Gemini1.5-Pro-latest | 0.7385 | 0.6084 | 0.9353 | 0.8975 | 4.3975 | 0.7900 | 0.9833 | 0.4795 | 0.5029 | 0.4188 | 0.4396 | 0.7785 | 0.8526 | 0.6520 | 2.8390 | 1.1837 | 2.2833 |
82 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama3-70B-Inst | 0.7295 | 0.6786 | 0.9235 | 0.9025 | 4.7725 | 0.7133 | 0.9944 | 0.4315 | 0.3764 | 0.4500 | 0.4854 | 0.4497 | 0.8490 | 0.6764 | 2.6525 | 1.2551 | 2.5033 |
83 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen-Max | 0.7161 | 0.5209 | 0.8941 | 0.8975 | 4.8250 | 0.8500 | 0.9833 | 0.3904 | 0.3391 | 0.5000 | 0.5021 | 0.4497 | 0.8407 | 0.6081 | 2.4831 | 1.1786 | 2.4567 |
84 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Claude3-Sonnet-20240229 | 0.7085 | 0.6207 | 0.8824 | 0.8925 | 4.6025 | 0.6867 | 0.9554 | 0.4658 | 0.3506 | 0.3937 | 0.3563 | 0.6980 | 0.8335 | 0.6650 | 2.7458 | 1.1276 | 2.6967 |
85 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | SciKnowMind-7b-v0.1 | 0.7454 | 0.4828 | 0.8471 | 0.9000 | 4.3125 | 0.5367 | 0.9805 | 0.3699 | 0.2931 | 0.4313 | 0.3875 | 0.3691 | 0.8609 | 0.6309 | 2.6949 | 1.0357 | 1.3867 |
86 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen2-7B-Inst | 0.6877 | 0.5825 | 0.8118 | 0.8900 | 4.7425 | 0.6833 | 0.9805 | 0.3082 | 0.2500 | 0.3937 | 0.3542 | 0.4362 | 0.7895 | 0.5919 | 2.7119 | 1.0714 | 2.1267 |
87 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen1.5-14B-Chat | 0.6789 | 0.5431 | 0.8412 | 0.8750 | 4.7175 | 0.6900 | 0.9638 | 0.2808 | 0.3218 | 0.4250 | 0.3187 | 0.3087 | 0.8014 | 0.5593 | 2.7542 | 1.0510 | 2.4367 |
88 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-3.5-Turbo-0125 | 0.6706 | 0.5185 | 0.8588 | 0.8725 | 4.5375 | 0.7600 | 0.9415 | 0.3151 | 0.2759 | 0.2313 | 0.3542 | 0.2148 | 0.7919 | 0.6846 | 2.6186 | 1.2551 | 2.3333 |
89 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama3-8B-Inst | 0.6690 | 0.0000 | 0.9059 | 0.8875 | 1.1825 | 0.6500 | 0.9833 | 0.2877 | 0.2759 | 0.5000 | 0.3542 | 0.3893 | 0.8002 | 0.6390 | 1.0763 | 1.1173 | 1.0000 |
90 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemDFM-13B | 0.6523 | 0.5185 | 0.8235 | 0.8750 | 4.3050 | 0.7200 | 0.9610 | 0.4110 | 0.2213 | 0.3312 | 0.3292 | 0.4161 | 0.7634 | 0.6081 | 2.3305 | 1.0204 | 2.0733 |
91 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemLLM-20B-Chat | 0.6446 | 0.0468 | 0.8471 | 0.9025 | 4.4675 | 0.4500 | 0.9666 | 0.2945 | 0.2241 | 0.3187 | 0.3146 | 0.4564 | 0.7646 | 0.4878 | 1.9237 | 1.0918 | 2.0100 |
92 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | MolInst-Llama3-8B | 0.6534 | 0.4113 | 0.8471 | 0.8850 | 2.4575 | 0.5133 | 0.9471 | 0.3699 | 0.2270 | 0.4437 | 0.3563 | 0.3624 | 0.7610 | 0.5480 | 1.4915 | 1.0816 | 1.2867 |
93 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen1.5-7B-Chat | 0.6531 | 0.3953 | 0.8235 | 0.8525 | 4.5475 | 0.6467 | 0.9694 | 0.1986 | 0.1954 | 0.4125 | 0.3417 | 0.2483 | 0.7872 | 0.5008 | 1.1525 | 1.0306 | 1.2767 |
94 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Gemma1.1-7B-Inst | 0.6113 | 0.6527 | 0.8118 | 0.8700 | 3.7425 | 0.5667 | 0.9304 | 0.3219 | 0.2845 | 0.3625 | 0.3458 | 0.3691 | 0.7325 | 0.5268 | 2.4068 | 1.0000 | 2.4433 |
95 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Mistral-7B-Inst-v0.2 | 0.6297 | 0.5123 | 0.8706 | 0.8775 | 4.7000 | 0.7433 | 0.9387 | 0.3219 | 0.2787 | 0.3187 | 0.4292 | 0.3691 | 0.7515 | 0.6163 | 2.5424 | 1.0000 | 2.3167 |
96 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChatGLM3-6B | 0.5593 | 0.1330 | 0.6118 | 0.8225 | 4.0975 | 0.6267 | 0.9164 | 0.2808 | 0.2155 | 0.3125 | 0.3125 | 0.2685 | 0.7229 | 0.5089 | 2.1102 | 1.0153 | 1.4867 |
97 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Galactica-30B | 0.6456 | 0.4002 | 0.5882 | 0.8425 | 1.6598 | 0.2667 | 0.8914 | 0.2466 | 0.2816 | 0.2562 | 0.2687 | 0.2081 | 0.6778 | 0.3593 | 1.1795 | 1.1020 | 1.1933 |
98 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Llama2-13B-Chat | 0.4387 | 0.5505 | 0.7059 | 0.7725 | 4.2475 | 0.5200 | 0.9387 | 0.2260 | 0.2241 | 0.3438 | 0.2771 | 0.1812 | 0.4185 | 0.5187 | 2.1864 | 1.0255 | 2.0067 |
99 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | SciGLM-6B | 0.5365 | 0.4828 | 0.7647 | 0.8425 | 2.9223 | 0.5933 | 0.8830 | 0.2466 | 0.2471 | 0.3000 | 0.2729 | 0.2550 | 0.6409 | 0.4683 | 1.9068 | 1.0459 | 1.6633 |
100 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | ChemLLM-7B-Chat | 0.5492 | 0.0012 | 0.6176 | 0.8375 | 3.3000 | 0.6633 | 0.9053 | 0.2329 | 0.2069 | 0.0563 | 0.2562 | 0.1611 | 0.7170 | 0.3041 | 1.9661 | 1.0204 | 1.0833 |
101 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Galactica-6.7B | 0.5799 | 0.2561 | 0.4471 | 0.7875 | 1.2323 | 0.2300 | 0.7521 | 0.2808 | 0.2816 | 0.2687 | 0.2646 | 0.2349 | 0.5672 | 0.2992 | 1.2143 | 1.0357 | 1.0333 |
102 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | LlaSMol-Mistral-7B | 0.3150 | 0.0254 | 0.7059 | 0.6775 | 2.6624 | 0.1933 | 0.6964 | 0.2329 | 0.3851 | 0.0563 | 0.2354 | 0.2148 | 0.3341 | 0.2065 | 1.4576 | 1.0051 | 1.0100 |
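The tables above can be read programmatically. The following is a minimal sketch, assuming the tables are available as pipe-delimited text exactly as shown; the helper names `parse_leaderboard` and `rank_by_task` are hypothetical and not part of any SciKnowEval tooling. It drops the `wdt_` bookkeeping columns and ranks models on a single task column. Because some columns hold values above 1 (for example `physics_text_summary`), scores are not all on one scale, so the sketch compares models within one column only.

```python
def parse_leaderboard(text):
    """Parse a pipe-delimited table into a list of {column: value} dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row made only of dashes (---|---|...).
        if all(cell and set(cell) <= {"-", ":"} for cell in cells):
            continue
        record = {}
        for name, value in zip(header, cells):
            if name.startswith("wdt_"):
                continue  # bookkeeping column, not a score
            record[name] = value if name == "Models" else float(value)
        rows.append(record)
    return rows


def rank_by_task(rows, task):
    """Return (model, score) pairs sorted from highest to lowest score."""
    ranked = sorted(rows, key=lambda row: row[task], reverse=True)
    return [(row["Models"], row[task]) for row in ranked]


# Excerpt of the physics table above (first three models, first two score columns).
EXCERPT = """
wdt_ID | wdt_created_by | wdt_created_at | wdt_last_edited_by | wdt_last_edited_at | Models | physics_literature_QA | fundamental_physics_exam |
---|---|---|---|---|---|---|---|
77 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Claude-3.5-Sonnet-20240620 | 0.8617 | 0.9453 |
78 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | GPT-4o-2024-05-13 | 0.8756 | 0.9659 |
79 | user | 25/07/2024 02:23 AM | user | 25/07/2024 02:23 AM | Qwen2-72B-Inst | 0.8429 | 0.9524 |
"""

if __name__ == "__main__":
    for model, score in rank_by_task(parse_leaderboard(EXCERPT), "physics_literature_QA"):
        print(f"{model}: {score:.4f}")
```

Pasting a full table in place of `EXCERPT` and changing the task name gives the ranking for any other column.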
Submission
Upload your results as a JSON file.
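Before uploading, it can help to confirm that the results file parses as JSON. The sketch below checks only that. The expected fields are not specified here, so no schema is assumed, and both the function name `check_results_file` and the file name `results.json` are hypothetical.

```python
import json


def check_results_file(path):
    """Return the parsed content if `path` holds well-formed JSON, else raise ValueError."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except json.JSONDecodeError as err:
        # Re-raise with the file name so the failing upload candidate is obvious.
        raise ValueError(f"{path} is not valid JSON: {err}") from err


if __name__ == "__main__":
    import sys

    # Hypothetical default file name; pass the real path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "results.json"
    check_results_file(target)
    print(f"{target} parsed successfully")
```

This only guards against malformed files. Whether the content matches what the evaluation expects still has to be checked against the example results.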
FAQ
The dataset is available in our GitHub repository.
Example result files are available in our GitHub repository.
Evaluation takes some time after submission, usually less than a week. Please be patient.
Please contact Mr. Junjie Huang at junjie6282@zju.edu.cn.