Introduction

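Study extensively, enquire earnestly, think profoundly, discern clearly, practice assiduously.

From the Book of Rites, "Doctrine of the Mean" (《礼记 · 中庸》)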

The Scientific Knowledge Evaluation (SciKnowEval) benchmark for Large Language Models (LLMs) is inspired by the profound principles outlined in the “Doctrine of the Mean” from ancient Chinese philosophy. This benchmark is designed to assess LLMs based on their proficiency in Studying Extensively, Enquiring Earnestly, Thinking Profoundly, Discerning Clearly, and Practicing Assiduously. Each of these dimensions offers a unique perspective on evaluating the capabilities of LLMs in handling scientific knowledge.

L1: Studying Extensively
(Knowledge Memory)

This dimension evaluates the breadth of an LLM’s knowledge across various scientific domains. It measures the model’s ability to remember a wide range of scientific concepts.

L2: Enquiring Earnestly
(Knowledge Comprehension)

This aspect focuses on the LLM’s capacity for deep enquiry and exploration within scientific contexts, such as analyzing scientific texts, identifying key concepts, and questioning relevant information.

L3: Thinking Profoundly
(Knowledge Reasoning)

This criterion examines the model’s capacity for critical thinking, logical deduction, numerical calculation, function prediction, and the ability to engage in reflective reasoning to solve problems.

L4: Discerning Clearly
(Knowledge Discernment)

This aspect evaluates the LLM’s ability to make correct, secure, and ethical decisions based on scientific knowledge, including assessing the harmfulness and toxicity of information, and understanding the ethical implications and safety concerns related to scientific endeavors.

L5: Practicing Assiduously
(Knowledge Application)

The final dimension assesses the LLM’s capability to apply scientific knowledge effectively in real-world scenarios, such as analyzing complex scientific problems and creating innovative solutions.

SciKnowEval represents a comprehensive benchmark for assessing the capability of LLMs in processing and utilizing scientific knowledge. It aims to promote the development of scientific LLMs that not only possess extensive knowledge but also demonstrate ethical discernment and practical applicability, ultimately contributing to the advancement of scientific research.

News

2024-09-20: Released the SciKnowEval report for OpenAI o1.
2024-07-28: Added the Physics and Materials domains to SciKnowEval.
2024-06-21: First release of SciKnowEval.

Leaderboards

Last updated: 22 July, 2024

Models L1 L2 L3 L4 L5 Avg
Claude-3.5-Sonnet-20240620 3.00 6.13 8.86 3.00 1.60 5.36
GPT-4o-2024-05-13 4.50 4.25 8.43 5.33 2.40 5.20
Qwen2-72B-Inst 3.50 5.88 13.86 5.33 7.20 8.12
GPT-4-Turbo-2024-04-09 8.50 8.63 10.71 6.67 6.20 8.48
Gemini1.5-Pro-latest 8.00 7.88 8.14 4.67 9.40 7.88
Llama3-70B-Inst 13.00 5.00 12.00 7.00 6.40 8.12
Qwen-Max 4.50 11.50 11.29 7.33 11.80 10.44
Claude3-Sonnet-20240229 11.00 6.88 12.43 8.00 3.60 8.24
SciKnowMind-7b-v0.1 2.00 10.38 5.14 11.67 11.20 8.56
Qwen2-7B-Inst 9.50 11.50 14.57 9.67 14.00 12.48
Qwen1.5-14B-Chat 10.50 15.38 12.71 14.67 10.40 13.16
GPT-3.5-Turbo-0125 7.00 12.00 16.57 9.00 12.00 12.52
Llama3-8B-Inst 14.00 9.38 16.57 12.00 14.60 13.12
ChemDFM-13B 11.50 15.88 16.43 15.00 16.60 15.72
ChemLLM-20B-Chat 18.00 10.50 14.14 20.33 15.60 14.32
MolInst-Llama3-8B 19.50 14.13 11.71 17.67 22.60 16.00
Qwen1.5-7B-Chat 14.50 15.25 19.14 13.67 11.60 15.36
Gemma1.1-7B-Inst 21.50 22.75 16.00 20.00 17.80 19.44
Mistral-7B-Inst-v0.2 16.50 18.38 20.14 18.33 22.80 19.60
ChatGLM3-6B 17.50 19.63 15.43 15.33 16.00 17.04
Galactica-30B 17.00 19.13 12.00 22.67 21.40 17.84
Llama2-13B-Chat 25.00 16.63 22.71 15.00 13.80 18.24
SciGLM-6B 21.00 21.63 18.57 20.00 21.00 20.40
ChemLLM-7B-Chat 22.00 19.63 15.86 22.00 22.40 19.60
Galactica-6.7B 23.50 22.13 16.57 24.00 24.00 21.28
LlaSMol-Mistral-7B 25.50 22.25 19.00 25.67 23.00 22.16
Models L1 L2 L3 L4 L5 Avg
Claude-3.5-Sonnet-20240620 1.67 3.71 2.14 2.33 3.75 2.83
GPT-4o-2024-05-13 2.33 4.86 6.71 10.67 3.00 5.50
Qwen2-72B-Inst 5.67 4.29 10.86 6.00 7.75 7.17
GPT-4-Turbo-2024-04-09 6.67 2.71 11.43 9.67 6.00 7.17
Gemini1.5-Pro-latest 5.33 6.86 6.29 4.00 9.50 6.58
Llama3-70B-Inst 6.00 6.57 8.57 3.67 9.00 7.13
Qwen-Max 7.67 5.43 10.00 7.67 9.75 8.04
Claude3-Sonnet-20240229 10.67 7.86 12.00 12.00 6.00 9.63
SciKnowMind-7b-v0.1 2.33 12.29 2.43 10.67 18.25 8.96
Qwen2-7B-Inst 15.33 11.71 16.14 8.67 15.00 13.63
Qwen1.5-14B-Chat 14.67 11.86 10.14 13.67 13.50 12.21
GPT-3.5-Turbo-0125 14.33 13.00 12.43 12.33 10.00 12.42
Llama3-8B-Inst 11.33 11.00 12.86 12.00 14.00 12.21
ChemDFM-13B 12.00 14.86 12.29 14.00 9.00 12.67
ChemLLM-20B-Chat 15.67 12.57 15.57 22.33 11.25 14.83
MolInst-Llama3-8B 14.67 14.86 11.43 15.00 21.75 15.00
Qwen1.5-7B-Chat 16.33 15.43 13.57 17.33 15.75 15.29
Gemma1.1-7B-Inst 23.00 21.57 18.14 16.00 11.75 18.42
Mistral-7B-Inst-v0.2 20.33 20.00 21.14 12.67 24.75 20.25
ChatGLM3-6B 21.33 21.00 20.43 17.67 18.50 20.04
Galactica-30B 12.33 22.43 18.29 22.33 20.50 19.63
Llama2-13B-Chat 24.67 19.86 21.14 15.00 18.00 19.92
SciGLM-6B 21.00 20.86 20.71 21.67 16.25 20.17
ChemLLM-7B-Chat 22.00 21.29 18.43 23.67 21.00 20.79
Galactica-6.7B 18.67 23.86 18.00 18.67 25.00 21.04
LlaSMol-Mistral-7B 25.33 24.71 21.86 26.00 13.00 22.17
Models L1 L2 L3 L4 L5 Avg
Claude-3.5-Sonnet-20240620 3.50 3.50 2.67 4.50 3.00 3.42
GPT-4o-2024-05-13 1.00 2.50 4.33 4.00 1.00 2.83
Qwen2-72B-Inst 4.50 2.75 2.33 3.00 5.00 3.17
GPT-4-Turbo-2024-04-09 5.00 6.75 5.00 4.00 2.00 5.17
Gemini1.5-Pro-latest 3.50 12.50 4.33 7.50 4.00 7.42
Llama3-70B-Inst 6.50 9.75 6.00 6.50 7.00 7.50
Qwen-Max 7.00 5.25 7.67 6.00 6.00 6.33
Claude3-Sonnet-20240229 10.50 11.00 11.00 10.50 8.00 10.58
SciKnowMind-7b-v0.1 5.50 14.50 11.00 2.50 13.00 10.00
Qwen2-7B-Inst 9.50 9.25 9.67 7.50 10.00 9.17
Qwen1.5-14B-Chat 13.00 11.75 11.67 12.00 11.00 11.92
GPT-3.5-Turbo-0125 11.00 14.50 12.33 14.50 9.00 12.92
Llama3-8B-Inst 13.00 13.50 19.00 13.00 26.00 15.75
ChemDFM-13B 17.00 15.50 16.33 16.50 18.00 16.33
ChemLLM-20B-Chat 14.50 12.50 24.00 15.50 25.00 17.25
MolInst-Llama3-8B 12.50 18.00 17.33 17.50 15.00 16.58
Qwen1.5-7B-Chat 17.00 15.50 20.33 18.50 24.00 18.17
Gemma1.1-7B-Inst 19.00 19.50 12.33 15.50 12.00 16.33
Mistral-7B-Inst-v0.2 19.50 14.25 17.00 14.50 14.00 15.83
ChatGLM3-6B 23.00 21.00 20.33 20.00 23.00 21.17
Galactica-30B 19.00 22.75 14.33 23.50 16.00 19.58
Llama2-13B-Chat 23.00 19.75 21.33 23.50 20.00 21.33
SciGLM-6B 24.00 22.25 24.33 23.50 17.00 22.83
ChemLLM-7B-Chat 21.00 18.75 21.67 21.00 19.00 20.25
Galactica-6.7B 23.00 25.00 13.67 23.50 22.00 21.33
LlaSMol-Mistral-7B 26.00 24.75 22.00 26.00 21.00 24.17
Models L1 L2 L3 L4 L5 Avg
Claude-3.5-Sonnet-20240620 1.00 4.50 2.00 3.00 2.67 3.06
GPT-4o-2024-05-13 2.00 5.00 4.00 3.50 4.00 4.18
Qwen2-72B-Inst 4.00 6.67 2.80 1.50 7.33 4.88
GPT-4-Turbo-2024-04-09 5.00 5.67 4.60 8.50 5.00 5.53
Gemini1.5-Pro-latest 6.00 7.17 5.00 6.00 6.33 6.18
Llama3-70B-Inst 7.00 4.50 7.00 4.50 6.33 5.71
Qwen-Max 8.00 6.33 7.40 10.00 9.00 7.65
Claude3-Sonnet-20240229 9.00 9.83 8.20 7.50 5.33 8.24
SciKnowMind-7b-v0.1 3.00 13.50 11.00 7.00 14.33 11.53
Qwen2-7B-Inst 10.00 10.83 14.20 13.50 10.67 12.06
Qwen1.5-14B-Chat 11.00 12.00 15.00 12.50 9.00 12.35
GPT-3.5-Turbo-0125 12.00 12.67 18.40 7.00 8.00 12.82
Llama3-8B-Inst 13.00 15.50 13.20 9.50 20.67 14.88
ChemDFM-13B 16.00 14.00 15.00 14.50 17.00 15.00
ChemLLM-20B-Chat 18.00 15.00 16.60 18.00 15.33 16.06
MolInst-Llama3-8B 14.00 17.67 13.20 16.50 18.00 16.06
Qwen1.5-7B-Chat 15.00 15.67 20.00 17.00 21.67 18.12
Gemma1.1-7B-Inst 20.00 16.17 14.00 18.00 16.00 15.94
Mistral-7B-Inst-v0.2 19.00 12.17 14.00 14.50 16.33 14.12
ChatGLM3-6B 22.00 20.83 20.40 19.50 19.33 20.35
Galactica-30B 17.00 22.67 21.20 22.50 19.00 21.24
Llama2-13B-Chat 25.00 18.50 21.80 21.50 17.33 20.00
SciGLM-6B 24.00 20.00 20.60 22.50 17.67 20.29
ChemLLM-7B-Chat 23.00 21.00 25.20 22.50 21.00 22.53
Galactica-6.7B 21.00 24.33 20.20 24.50 21.67 22.47
LlaSMol-Mistral-7B 26.00 24.33 20.80 26.00 23.67 23.47
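
The leaderboard values read as ranks rather than scores (lower is better), and the Avg column looks consistent with a task-count-weighted mean of the five per-level ranks. The page does not state this formula, so the sketch below is an assumption. It also assumes that the third table above is the Physics leaderboard, and weights two of its rows by the number of Physics tasks per level listed under Datasets (2, 4, 3, 2, 1). The function name `weighted_avg_rank` is hypothetical.

```python
# Assumed formula, not confirmed by the page:
#   Avg = sum(rank_i * n_i) / sum(n_i), where n_i is the number of tasks at level i.

def weighted_avg_rank(level_ranks, task_counts):
    """Task-count-weighted mean of per-level average ranks."""
    if len(level_ranks) != len(task_counts):
        raise ValueError("need one task count per level")
    total_tasks = sum(task_counts)
    weighted_sum = sum(r * n for r, n in zip(level_ranks, task_counts))
    return weighted_sum / total_tasks

# Tasks per level (L1..L5), counted from the Physics dataset table.
physics_task_counts = [2, 4, 3, 2, 1]

# Per-level ranks copied from two rows of the third leaderboard table.
rows = {
    "Claude-3.5-Sonnet-20240620": [3.50, 3.50, 2.67, 4.50, 3.00],
    "GPT-4o-2024-05-13": [1.00, 2.50, 4.33, 4.00, 1.00],
}

for model, ranks in rows.items():
    print(model, weighted_avg_rank(ranks, physics_task_counts))
```

Under these assumptions both results fall within rounding distance of the published Avg values for those rows (3.42 and 2.83).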

Datasets

Last updated: 22 July, 2024

Ability Level | Task Name | Task Type | Data Source | #Questions
L1 | Molecular Name Conversion | MCQ | PubChem | 1,008
L1 | Molecular Property Identification | MCQ, T/F | MoleculeNet | 1,625
L1 | Chemical Literature QA | MCQ | Literature Corpus | 6,316
L2 | Reaction Mechanism Inference | MCQ | LibreTexts | 269
L2 | Compound Identification and Properties | MCQ | LibreTexts | 497
L2 | Doping Extraction | RE | NERRE | 821
L2 | Detailed Understanding | MCQ | LibreTexts | 626
L2 | Text Summary | GEN | LibreTexts | 692
L2 | Hypothesis Verification | T/F | LibreTexts | 544
L2 | Reasoning and Interpretation | MCQ | LibreTexts | 516
L3 | Molar Weight Calculation | MCQ | PubChem | 1,042
L3 | Molecular Property Calculation | MCQ | MoleculeNet | 740
L3 | Molecular Structure Prediction | MCQ | PubChem | 608
L3 | Reaction Prediction | MCQ | USPTO-Mixed | 1,122
L3 | Retrosynthesis | MCQ | USPTO-50k | 1,122
L3 | Balancing Chemical Equation | GEN | WebQC | 535
L3 | Chemical Calculation | MCQ | XieZhi, SciEval, MMLU | 269
L4 | Chemical Harmful QA | GEN | Proposition-65, ILO | 454
L4 | Molecular Toxicity Prediction | MCQ, T/F | Toxric | 870
L4 | Chemical Laboratory Safety Test | MCQ, T/F | LabExam (ZJU) | 531
L5 | Molecular Captioning | GEN | ChEBI-20 | 943
L5 | Molecular Generation | GEN | ChEBI-20 | 897
L5 | Chemical Protocol Procedure Design | GEN | Protocol Journal | 74
L5 | Chemical Protocol Reagent Design | GEN | Protocol Journal | 129
Ability Level | Task Name | Task Type | Data Source | #Questions
L1 | Physics Literature QA | MCQ | Literature Corpus | 4,403
L1 | Fundamental Physics Exam | MCQ | SciQ | 2,375
L2 | Detailed Understanding | MCQ | Literature Corpus | 400
L2 | Text Summary | GEN | Literature Corpus | 400
L2 | Hypothesis Verification | T/F | Literature Corpus | 400
L2 | Reasoning and Interpretation | MCQ | Literature Corpus | 400
L3 | High School Physics Calculation | MCQ | tiku.cn | 698
L3 | General Physics Calculation | MCQ | SciEval, SciBench | 800
L3 | Physics Formula Derivation | MCQ | Physics Inference Dataset | 218
L4 | Physics Safety QA | GEN | Nature Portfolio | 341
L4 | Laboratory Safety Test | MCQ | LabExam (ZJU) | 606
L5 | Physics Problem Solving | GEN | Qualifying Exam | 302
Ability Level | Task Name | Task Type | Data Source | #Questions
L1 | Material Literature QA | MCQ | Literature Corpus | 5,534
L2 | Chemical Composition Extraction | GEN | Literature Corpus | 203
L2 | Digital Data Extraction | MCQ | Literature Corpus | 170
L2 | Detailed Understanding | MCQ | Literature Corpus | 400
L2 | Text Summary | GEN | Literature Corpus | 400
L2 | Hypothesis Verification | T/F | Literature Corpus | 300
L2 | Reasoning and Interpretation | MCQ | Literature Corpus | 359
L3 | Valence Electron Difference Calculation | MCQ | Metallic Glass Forming Database | 146
L3 | Material Calculation | MCQ | MaScQA | 348
L3 | Lattice Volume Calculation | MCQ | Materials Project | 160

Task Scores

Last updated: 22 July, 2024

Models chemical_literature_QA reaction_mechanism_inference compound_identification_and_properties extract_doping chemical_detailed_understanding chemical_text_summary chemical_hypothesis_verification chemical_reasoning_and_interpretation molar_weight_calculation molecular_property_calculation molecule_structure_prediction reaction_prediction retrosynthesis balancing_chemical_equation chemical_calculation chemical_harmful_QA mol_toxicity_prediction chemical_laboratory_safety_test molecule_captioning molecule_generation chemical_procedure_generation chemical_reagent_generation
Claude-3.5-Sonnet-20240620 0.8735 0.9888 0.9879 0.5676 0.9968 4.8032 0.9505 0.9767 0.4568 0.3973 0.4730 0.5535 0.4563 0.4430 0.6766 0.7225 0.6057 0.8249 0.1199 0.7105 2.9459 2.4320
GPT-4o-2024-05-13 0.8575 1.0000 0.9859 0.4994 0.9952 4.7598 0.9486 0.9689 0.2591 0.3473 0.4608 0.4804 0.4118 0.1047 0.5242 0.0154 0.3740 0.8523 0.1390 0.6086 2.9595 2.4240
Qwen2-72B-Inst 0.8420 0.9963 0.9879 0.5743 0.9920 4.7945 0.9376 0.9767 0.2006 0.2743 0.4301 0.2843 0.3387 0.1776 0.5428 0.2974 0.5391 0.7985 0.1198 0.3796 2.7027 2.1680
GPT-4-Turbo-2024-04-09 0.8233 0.9963 0.9859 0.6063 0.9936 4.8017 0.9541 0.9825 0.2774 0.2608 0.3885 0.3610 0.2834 0.0991 0.4647 0.2247 0.3415 0.7764 0.1061 0.5762 2.6486 2.4419
Gemini1.5-Pro-latest 0.8328 0.9926 0.9819 0.5929 0.9936 4.3271 0.9156 0.9612 0.2706 0.3865 0.3897 0.2941 0.3182 0.2673 0.6989 0.8194 0.3902 0.8017 0.0720 0.5729 2.3784 2.3178
Llama3-70B-Inst 0.8313 0.9814 0.9759 0.5789 0.9856 4.8292 0.9229 0.9650 0.2841 0.3392 0.3713 0.2807 0.2201 0.3290 0.4535 0.5793 0.6172 0.7966 0.0919 0.5287 2.4189 2.1840
Qwen-Max 0.8309 0.9926 0.9759 0.5502 0.9920 4.8090 0.9376 0.9748 0.1891 0.3189 0.3137 0.3993 0.3984 0.2542 0.4758 0.1872 0.5563 0.7797 0.1031 0.3627 2.6892 2.1360
Claude3-Sonnet-20240229 0.8033 0.9926 0.9638 0.5847 0.9856 4.7048 0.8991 0.9592 0.2466 0.2730 0.2439 0.4349 0.3217 0.2804 0.3606 0.7731 0.2520 0.6582 0.1245 0.5321 2.8243 2.1318
SciKnowMind-7b-v0.1 0.8687 0.9740 0.9618 0.4644 0.9760 4.5890 0.8422 0.9709 0.5931 0.8932 0.7512 0.7513 0.5169 0.2935 0.4015 0.0000 0.5839 0.8079 0.1073 0.2131 1.3014 1.0588
Qwen2-7B-Inst 0.7977 0.9480 0.9598 0.5253 0.9617 4.8509 0.8752 0.9553 0.1871 0.2716 0.3051 0.2077 0.3182 0.1514 0.3866 0.0396 0.5586 0.7834 0.0831 0.2215 2.2162 1.9200
Qwen1.5-14B-Chat 0.7811 0.9777 0.9618 0.4750 0.9696 4.6151 0.8844 0.9592 0.2188 0.3595 0.2917 0.4269 0.3913 0.1551 0.3978 0.0264 0.3089 0.7173 0.1031 0.2446 2.1622 2.1040
GPT-3.5-Turbo-0125 0.7855 0.9703 0.9396 0.5070 0.9649 4.7453 0.8826 0.9417 0.1900 0.4108 0.3456 0.2647 0.2531 0.2355 0.3606 0.0837 0.2846 0.6878 0.1201 0.5012 2.1622 2.1200
Llama3-8B-Inst 0.7873 0.9851 0.9577 0.5265 0.9744 4.5398 0.8771 0.9650 0.1996 0.3351 0.2745 0.3957 0.3556 0.1720 0.3457 0.4361 0.2683 0.6751 0.0452 0.3780 2.2568 2.0320
ChemDFM-13B 0.7724 0.9703 0.9598 0.4729 0.9744 4.3198 0.8569 0.9495 0.1823 0.3108 0.3811 0.6399 0.4332 0.1327 0.3086 0.0374 0.3740 0.5992 0.2530 0.8493 1.7568 1.6320
ChemLLM-20B-Chat 0.7523 0.9888 0.9638 0.4083 0.9728 4.5760 0.9009 0.9320 0.2015 0.3446 0.2598 0.3084 0.1551 0.1458 0.2974 0.0000 0.2602 0.4684 0.1755 0.5155 2.0676 1.4880
MolInst-Llama3-8B 0.7457 0.9814 0.9618 0.4976 0.9601 2.9768 0.8624 0.9495 0.2006 0.3189 0.3493 0.2709 0.4055 0.1813 0.4015 0.0154 0.3333 0.6793 0.0236 0.2785 1.0405 1.0775
Qwen1.5-7B-Chat 0.7598 0.9628 0.9316 0.4799 0.9473 4.6006 0.8422 0.9495 0.2131 0.3338 0.3272 0.2799 0.3824 0.1047 0.3383 0.0308 0.2520 0.6245 0.0752 0.2091 2.0946 2.1085
Gemma1.1-7B-Inst 0.4490 0.1673 0.1972 0.4769 0.1709 3.9783 0.8661 0.4272 0.1603 0.2459 0.2770 0.1248 0.2709 0.2505 0.3234 0.2379 0.2358 0.5907 0.1306 0.2226 2.3514 1.7752
Mistral-7B-Inst-v0.2 0.7327 0.9591 0.9155 0.1136 0.9409 3.7757 0.1119 0.9495 0.1881 0.2527 0.2941 0.0588 0.0125 0.0000 0.3309 0.1079 0.3821 0.5570 0.0333 0.1328 1.0638 1.0000
ChatGLM3-6B 0.6937 0.9108 0.8431 0.3758 0.9105 4.1447 0.7945 0.8796 0.2399 0.2297 0.2586 0.2094 0.0588 0.0916 0.2862 0.0441 0.2439 0.5865 0.0921 0.1522 1.4459 1.6279
Galactica-30B 0.7663 0.9219 0.8189 0.3919 0.8946 2.0275 0.5596 0.8835 0.2514 0.2595 0.2586 0.0098 0.1622 0.2187 0.2862 0.0110 0.2114 0.5148 0.0375 0.2401 1.0811 1.1680
Llama2-13B-Chat 0.5058 0.9219 0.8410 0.4558 0.8850 4.2967 0.8073 0.9165 0.2524 0.2608 0.1164 0.0357 0.0499 0.0822 0.2119 0.4537 0.2764 0.3586 0.0747 0.1662 1.8378 1.8720
SciGLM-6B 0.6657 0.9257 0.8974 0.2774 0.9249 2.8567 0.5908 0.8913 0.2063 0.3000 0.2365 0.0187 0.0062 0.1121 0.2788 0.0154 0.2439 0.5021 0.1228 0.2984 1.0135 1.2320
ChemLLM-7B-Chat 0.6617 0.9071 0.8491 0.2095 0.9042 3.5326 0.8055 0.8971 0.2361 0.2851 0.2549 0.0428 0.2201 0.1364 0.2862 0.0000 0.2276 0.5063 0.0480 0.1983 1.1081 1.1680
Galactica-6.7B 0.7038 0.7546 0.7002 0.4199 0.7732 1.2173 0.0000 0.7553 0.2495 0.2514 0.3064 0.2888 0.1649 0.0523 0.2379 0.0396 0.2683 0.4177 0.0351 0.1314 1.0000 1.0240
LlaSMol-Mistral-7B 0.3911 0.6431 0.6258 0.1857 0.6214 2.4916 0.4532 0.7437 0.2495 0.1959 0.2059 0.0134 0.0027 0.0037 0.3383 0.0000 0.2033 0.2743 0.2600 0.6840 1.0769 1.0000
Models physics_literature_QA fundamental_physics_exam physics_detailed_understanding physics_text_summary physics_hypothesis_verification physics_reasoning_and_interpretation high_school_physics_calculation general_physics_calculation physics_formula_derivation physics_safety_QA physics_laboratory_safety_test physics_problem_solving
Claude-3.5-Sonnet-20240620 0.8617 0.9453 0.9950 4.8225 0.9850 0.9975 0.5716 0.4963 4.8257 0.8830 0.7805 3.7616
GPT-4o-2024-05-13 0.8756 0.9659 0.9950 4.8775 0.9875 0.9975 0.5745 0.3962 4.8624 0.8801 0.7871 3.8576
Qwen2-72B-Inst 0.8429 0.9524 0.9975 4.8900 0.9850 0.9950 0.6705 0.4500 4.8119 0.8743 0.8152 3.0563
GPT-4-Turbo-2024-04-09 0.8322 0.9558 0.9925 4.8800 0.9750 0.9925 0.5444 0.4188 4.7982 0.8509 0.7987 3.7682
Gemini1.5-Pro-latest 0.8474 0.9558 0.9925 4.4150 0.9725 0.9825 0.6046 0.4662 4.4495 0.8480 0.7789 3.5728
Llama3-70B-Inst 0.8404 0.9398 0.9900 4.8300 0.9750 0.9900 0.4599 0.4313 4.6101 0.8772 0.7558 2.7781
Qwen-Max 0.8236 0.9406 0.9925 4.9200 0.9775 0.9950 0.6117 0.3175 4.7936 0.8304 0.7970 2.9603
Claude3-Sonnet-20240229 0.7891 0.9048 0.9925 4.7925 0.9600 0.9875 0.3711 0.3337 4.5688 0.8129 0.7541 2.6391
SciKnowMind-7b-v0.1 0.8590 0.9234 0.9925 4.4125 0.9375 0.9900 0.4441 0.3488 3.3303 0.9006 0.7954 1.4238
Qwen2-7B-Inst 0.7832 0.9124 0.9925 4.7650 0.9600 0.9925 0.4341 0.3550 4.2339 0.8304 0.7822 1.8477
Qwen1.5-14B-Chat 0.7579 0.9023 0.9900 4.7725 0.9575 0.9900 0.4771 0.3075 3.7752 0.8099 0.7327 1.7682
GPT-3.5-Turbo-0125 0.7600 0.9069 0.9850 4.6825 0.9400 0.9900 0.3367 0.3475 4.0505 0.7661 0.7178 1.9272
Llama3-8B-Inst 0.7761 0.8926 0.9925 1.0200 0.9525 0.9950 0.3009 0.3475 1.2202 0.8129 0.7030 1.0099
ChemDFM-13B 0.7228 0.8893 0.9925 4.3675 0.9275 0.9875 0.4226 0.2838 1.4725 0.7661 0.6799 1.0894
ChemLLM-20B-Chat 0.7386 0.8931 0.9875 4.5000 0.9700 0.9900 0.2908 0.1850 1.1330 0.7632 0.7244 1.0199
MolInst-Llama3-8B 0.7550 0.9061 0.9900 2.5100 0.9400 0.9725 0.3066 0.3300 1.7661 0.7632 0.6815 1.2185
Qwen1.5-7B-Chat 0.7296 0.8644 0.9825 4.6925 0.9175 0.9900 0.3496 0.2387 1.0046 0.7632 0.6749 1.0232
Gemma1.1-7B-Inst 0.6794 0.8531 0.9800 3.8975 0.9225 0.9650 0.3195 0.3613 3.3486 0.7339 0.7376 1.5497
Mistral-7B-Inst-v0.2 0.7123 0.8008 0.9775 4.7550 0.9475 0.9900 0.3080 0.3038 2.4725 0.7719 0.6931 1.3775
ChatGLM3-6B 0.6358 0.7604 0.9700 4.2725 0.8675 0.9675 0.3109 0.2425 1.2798 0.7456 0.6271 1.0298
Galactica-30B 0.7323 0.7693 0.9725 1.5000 0.5350 0.9525 0.3009 0.3987 2.2661 0.6754 0.3812 1.1490
Llama2-13B-Chat 0.4332 0.7781 0.8975 4.4275 0.8825 0.9800 0.1862 0.2537 1.6560 0.4035 0.5990 1.0695
SciGLM-6B 0.6129 0.7200 0.9650 2.9425 0.8850 0.9375 0.2364 0.2213 1.1250 0.6228 0.4637 1.1074
ChemLLM-7B-Chat 0.6358 0.8101 0.9700 2.9850 0.9500 0.9700 0.3352 0.2325 1.0000 0.6959 0.6271 1.0791
Galactica-6.7B 0.6533 0.6623 0.9025 1.2550 0.4150 0.8300 0.3138 0.3962 2.0000 0.5936 0.5281 1.0430
LlaSMol-Mistral-7B 0.3597 0.2943 0.8000 2.6624 0.1600 0.9025 0.1232 0.2587 1.3807 0.3158 0.2096 1.0503
Models material_literature_QA material_component_extraction material_data_extraction material_detailed_understanding material_text_summary material_hypothesis_verification material_reasoning_and_interpretation valence_electron_difference_calculation material_calculation lattice_volume_calculation perovskite_stability_prediction diffusion_rate_analysis material_safety_QA material_toxicity_prediction property_and_usage_analysis crystal_structure_and_composition_analysis specified_band_gap_material_generation
Claude-3.5-Sonnet-20240620 0.7765 0.5431 0.9588 0.9025 4.7800 0.8300 0.9916 0.6027 0.5489 0.7188 0.4896 0.7852 0.8870 0.6683 2.7966 1.2041 2.8267
GPT-4o-2024-05-13 0.7647 0.5616 0.8882 0.9025 4.8200 0.9000 0.9833 0.4658 0.3793 0.5312 0.6188 0.6913 0.8621 0.6715 2.6780 1.2908 2.5133
Qwen2-72B-Inst 0.7423 0.4791 0.9176 0.8975 4.7900 0.8133 0.9972 0.4521 0.4195 0.5625 0.5333 0.7987 0.8704 0.7008 2.6525 1.1633 2.5100
GPT-4-Turbo-2024-04-09 0.7407 0.4889 0.9176 0.9025 4.7875 0.8700 0.9889 0.3973 0.4770 0.5375 0.5167 0.6577 0.8383 0.6374 2.5932 1.2857 2.7067
Gemini1.5-Pro-latest 0.7385 0.6084 0.9353 0.8975 4.3975 0.7900 0.9833 0.4795 0.5029 0.4188 0.4396 0.7785 0.8526 0.6520 2.8390 1.1837 2.2833
Llama3-70B-Inst 0.7295 0.6786 0.9235 0.9025 4.7725 0.7133 0.9944 0.4315 0.3764 0.4500 0.4854 0.4497 0.8490 0.6764 2.6525 1.2551 2.5033
Qwen-Max 0.7161 0.5209 0.8941 0.8975 4.8250 0.8500 0.9833 0.3904 0.3391 0.5000 0.5021 0.4497 0.8407 0.6081 2.4831 1.1786 2.4567
Claude3-Sonnet-20240229 0.7085 0.6207 0.8824 0.8925 4.6025 0.6867 0.9554 0.4658 0.3506 0.3937 0.3563 0.6980 0.8335 0.6650 2.7458 1.1276 2.6967
SciKnowMind-7b-v0.1 0.7454 0.4828 0.8471 0.9000 4.3125 0.5367 0.9805 0.3699 0.2931 0.4313 0.3875 0.3691 0.8609 0.6309 2.6949 1.0357 1.3867
Qwen2-7B-Inst 0.6877 0.5825 0.8118 0.8900 4.7425 0.6833 0.9805 0.3082 0.2500 0.3937 0.3542 0.4362 0.7895 0.5919 2.7119 1.0714 2.1267
Qwen1.5-14B-Chat 0.6789 0.5431 0.8412 0.8750 4.7175 0.6900 0.9638 0.2808 0.3218 0.4250 0.3187 0.3087 0.8014 0.5593 2.7542 1.0510 2.4367
GPT-3.5-Turbo-0125 0.6706 0.5185 0.8588 0.8725 4.5375 0.7600 0.9415 0.3151 0.2759 0.2313 0.3542 0.2148 0.7919 0.6846 2.6186 1.2551 2.3333
Llama3-8B-Inst 0.6690 0.0000 0.9059 0.8875 1.1825 0.6500 0.9833 0.2877 0.2759 0.5000 0.3542 0.3893 0.8002 0.6390 1.0763 1.1173 1.0000
ChemDFM-13B 0.6523 0.5185 0.8235 0.8750 4.3050 0.7200 0.9610 0.4110 0.2213 0.3312 0.3292 0.4161 0.7634 0.6081 2.3305 1.0204 2.0733
ChemLLM-20B-Chat 0.6446 0.0468 0.8471 0.9025 4.4675 0.4500 0.9666 0.2945 0.2241 0.3187 0.3146 0.4564 0.7646 0.4878 1.9237 1.0918 2.0100
MolInst-Llama3-8B 0.6534 0.4113 0.8471 0.8850 2.4575 0.5133 0.9471 0.3699 0.2270 0.4437 0.3563 0.3624 0.7610 0.5480 1.4915 1.0816 1.2867
Qwen1.5-7B-Chat 0.6531 0.3953 0.8235 0.8525 4.5475 0.6467 0.9694 0.1986 0.1954 0.4125 0.3417 0.2483 0.7872 0.5008 1.1525 1.0306 1.2767
Gemma1.1-7B-Inst 0.6113 0.6527 0.8118 0.8700 3.7425 0.5667 0.9304 0.3219 0.2845 0.3625 0.3458 0.3691 0.7325 0.5268 2.4068 1.0000 2.4433
Mistral-7B-Inst-v0.2 0.6297 0.5123 0.8706 0.8775 4.7000 0.7433 0.9387 0.3219 0.2787 0.3187 0.4292 0.3691 0.7515 0.6163 2.5424 1.0000 2.3167
ChatGLM3-6B 0.5593 0.1330 0.6118 0.8225 4.0975 0.6267 0.9164 0.2808 0.2155 0.3125 0.3125 0.2685 0.7229 0.5089 2.1102 1.0153 1.4867
Galactica-30B 0.6456 0.4002 0.5882 0.8425 1.6598 0.2667 0.8914 0.2466 0.2816 0.2562 0.2687 0.2081 0.6778 0.3593 1.1795 1.1020 1.1933
Llama2-13B-Chat 0.4387 0.5505 0.7059 0.7725 4.2475 0.5200 0.9387 0.2260 0.2241 0.3438 0.2771 0.1812 0.4185 0.5187 2.1864 1.0255 2.0067
SciGLM-6B 0.5365 0.4828 0.7647 0.8425 2.9223 0.5933 0.8830 0.2466 0.2471 0.3000 0.2729 0.2550 0.6409 0.4683 1.9068 1.0459 1.6633
ChemLLM-7B-Chat 0.5492 0.0012 0.6176 0.8375 3.3000 0.6633 0.9053 0.2329 0.2069 0.0563 0.2562 0.1611 0.7170 0.3041 1.9661 1.0204 1.0833
Galactica-6.7B 0.5799 0.2561 0.4471 0.7875 1.2323 0.2300 0.7521 0.2808 0.2816 0.2687 0.2646 0.2349 0.5672 0.2992 1.2143 1.0357 1.0333
LlaSMol-Mistral-7B 0.3150 0.0254 0.7059 0.6775 2.6624 0.1933 0.6964 0.2329 0.3851 0.0563 0.2354 0.2148 0.3341 0.2065 1.4576 1.0051 1.0100
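
Comparing these scores with the Leaderboards section suggests that each leaderboard cell is a model's rank on the tasks of one level, averaged over those tasks. The page does not document the procedure, so the following is a sketch under that assumption. Tie handling in particular is unknown and is sidestepped here by using a column without ties. The scores are the `physics_problem_solving` values of the seven highest-scoring models, and `rank_descending` is a hypothetical helper.

```python
def rank_descending(scores):
    """Map each model to its 1-based rank, highest score first (no tie handling)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}

# physics_problem_solving scores copied from the Physics task-score table.
physics_problem_solving = {
    "Claude-3.5-Sonnet-20240620": 3.7616,
    "GPT-4o-2024-05-13": 3.8576,
    "Qwen2-72B-Inst": 3.0563,
    "GPT-4-Turbo-2024-04-09": 3.7682,
    "Gemini1.5-Pro-latest": 3.5728,
    "Llama3-70B-Inst": 2.7781,
    "Qwen-Max": 2.9603,
}

ranks = rank_descending(physics_problem_solving)
for model, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, model)
```

Physics has a single L5 task, so under this assumption the ranks should equal the L5 column of the matching leaderboard table. The third leaderboard table lists 1.00 for GPT-4o-2024-05-13, 2.00 for GPT-4-Turbo-2024-04-09 and 3.00 for Claude-3.5-Sonnet-20240620, which agrees with the order computed here.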

Submission

Upload your results as a JSON file.

FAQ

Where can I download the SciKnowEval dataset?

The dataset is available in our GitHub repository.

What is the format of uploaded results?

Example result files are available in our GitHub repository.

How long will it take for the results to appear on the leaderboard?

Evaluation takes some time after submission, usually less than a week.

How can I contact you?

Please contact Mr. Junjie Huang at junjie6282@zju.edu.cn.